Generating simulated data with a large language model
7 June 2024
Jeffrey Kerley, Derek T. Anderson, Andrew R. Buck, Brendan Alvey
Abstract
Enabling abstraction within a programming language has benefits. However, the complexity of such abstractions often poses a steep learning curve for users. While user interfaces or visual scripting can alleviate this to some extent, they often lack readability and reproducibility, especially as complexity grows. Herein, we explore the use of Large Language Models (LLMs) as an intermediary between the nuanced, syntactical programming language and the natural (human) way of describing the world. Our formal language, LSCENE, is a way to procedurally generate realistic synthetic scenes in the Unreal Engine. This tool is useful because artificial intelligence (AI) typically requires large volumes of varied, labeled data. To generate such data for training and evaluating AI, we employ an LLM to interpret and sample LSCENEs that are compatible with user input. Through this approach, we demonstrate a reduction in abstract complexity, the elimination of syntax complexity, and the ability to tackle complex tasks in LSCENE using natural language. To illustrate our findings, we present three experiments with quantitative results focused on spatial reasoning, along with a more intricate qualitative example of automatically generating an environment for a specific biome.
(2024) Published by SPIE. Downloading of the abstract is permitted for personal use only.
Jeffrey Kerley, Derek T. Anderson, Andrew R. Buck, and Brendan Alvey "Generating simulated data with a large language model", Proc. SPIE 13035, Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications II, 1303508 (7 June 2024); https://doi.org/10.1117/12.3013460
Keywords
Artificial intelligence
3D modeling
Data modeling
Education and training
Contamination
Mathematical modeling
Performance modeling