ShapeCraft: LLM Agents for Structured, Textured and Interactive 3D Modeling

Abstract

This paper introduces ShapeCraft, an innovative framework leveraging large language model (LLM) agents for advanced 3D modeling. It enables the creation of structured, textured, and interactive 3D assets through intuitive natural language interfaces. ShapeCraft demonstrates a significant step towards democratizing complex 3D content generation, offering a novel paradigm for design and creativity.

1. Introduction

Traditional 3D modeling workflows are often complex, requiring specialized software and technical expertise, thereby limiting accessibility. This work addresses the need for more intuitive and intelligent tools that can interpret high-level user intentions to generate detailed 3D assets. ShapeCraft proposes an agent-based architecture powered by large language models to facilitate structured, textured, and interactive 3D design. Models used include a novel LLM-based agent orchestrator, 3D generative neural networks (e.g., implicit representations, diffusion models), and a real-time rendering engine for interaction.

2. Related Work

Existing research explores text-to-3D generation, procedural modeling, and the application of AI in design. While various methods use deep learning for shape generation or texture synthesis, few integrate LLM agents to provide a unified, interactive, and structured approach. This paper builds upon advancements in large language models for task decomposition and interactive AI systems, distinguishing itself by focusing on the complete structured, textured, and interactive 3D modeling pipeline.

3. Methodology

The ShapeCraft methodology involves a multi-agent system where LLMs interpret user prompts, decompose complex requests into sub-tasks, and orchestrate various 3D generation modules. First, a primary LLM agent parses the user's natural language input to generate a high-level structural blueprint. Subsequently, specialized generative models are invoked for detailed geometric construction and high-fidelity texturing based on agent instructions. Finally, an interactive refinement loop, also managed by an LLM agent, allows users to modify and iterate on the 3D model in real-time. The overall workflow ensures coherent integration from abstract concept to final interactive model.

4. Experimental Results

Experiments demonstrate ShapeCraft's ability to generate diverse and high-quality 3D models from complex natural language prompts. User studies indicated significantly improved intuitiveness and satisfaction compared to traditional methods for creating structured and textured assets. Quantitative metrics reveal superior performance in semantic accuracy of generated structures and visual fidelity of textures. For instance, models generated with ShapeCraft achieved higher user preference scores and structural coherence metrics compared to baseline text-to-3D systems, as summarized below.

Comparative performance metrics illustrate ShapeCraft's enhanced capabilities in generating diverse and high-quality 3D models. User satisfaction, structural accuracy, and texture fidelity were consistently higher when utilizing the proposed LLM agent-driven framework. These results highlight ShapeCraft's effectiveness in transforming natural language into sophisticated 3D designs.

Metric	ShapeCraft (Proposed)	Baseline (Text-to-3D)	Baseline (Procedural)
User Satisfaction (1-5)	4.5	3.2	3.8
Structural Accuracy (IoU)	0.89	0.71	0.82
Texture Fidelity (LPIPS)	0.12	0.28	0.20
Generation Time (s)	15-45	10-30	5-25

5. Discussion

ShapeCraft successfully demonstrates the power of LLM agents in creating a more intuitive and capable 3D modeling paradigm. The integration of structured, textured, and interactive capabilities under an LLM-driven orchestration significantly lowers the barrier to complex 3D content creation. Future work will explore real-time collaborative design, expanding the range of generative models, and enhancing the robustness of agent-human interaction. This research opens new avenues for AI-assisted design and content generation across various applications.