3Dify: Bridging Natural Language and 3D Graphics Generation

TLDR: 3Dify is a novel framework that enables users to generate complex 3D computer graphics (3D-CG) solely through natural language instructions. Built on the open-source Dify platform, it integrates Large Language Models (LLMs) with advanced technologies like Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG). The system automates Digital Content Creation (DCC) tools such as Blender and Unreal Engine, incorporates an interactive feedback loop for refining generated images, and supports the use of local LLMs to enhance security and reduce costs. This allows for efficient, flexible, and accessible 3D content creation without manual modeling.

Creating intricate 3D computer graphics (3D-CG) has traditionally been complex and time-consuming, often requiring specialized skills and extensive manual effort. A new framework called 3Dify aims to change this by letting users generate detailed 3D content from simple natural language instructions, powered by Large Language Models (LLMs).

Developed by researchers from Nagoya University and Kyushu University, 3Dify aims to democratize 3D-CG production, making it accessible even to non-experts. The framework is built upon Dify, an open-source platform for AI application development, and integrates cutting-edge LLM technologies such as the Model Context Protocol (MCP) and Retrieval-Augmented Generation (RAG).

How 3Dify Works: A Seamless Workflow

The process of generating 3D content with 3Dify is designed to be intuitive and iterative:

First, users simply input their desired 3D image description in natural language. For example, “Create a desktop gaming PC model with side panel removed, keeping all internal components fully visible.”

Next, an LLM presents multiple 2D image candidates as pre-visualizations. Users can select the images closest to their vision, and the LLM learns from these selections to generate new, refined candidates. This feedback loop continues until the user is satisfied with a pre-visualization that closely matches their intent.

Finally, based on the refined pre-visualization, a Digital Content Creation (DCC) tool, such as Blender or Unreal Engine, automatically creates the corresponding 3D-CG. Users do not need to perform any manual 3D modeling operations within the DCC tool; 3Dify automates the entire process.
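The select-and-refine loop in the second step can be pictured with a small sketch. It is only an illustration under stated assumptions: the article says the LLM learns from the user's selections, and here a simple range-narrowing rule stands in for that learning. The parameter names (`fan_glow`, `case_width`) and the `prefers_bright_fans` stand-in for the user are hypothetical.

```python
import random

Candidate = dict[str, float]
Ranges = dict[str, tuple[float, float]]


def propose(ranges: Ranges, n: int, rng: random.Random) -> list[Candidate]:
    """Sample n candidates uniformly from the current parameter ranges."""
    return [
        {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n)
    ]


def narrow(ranges: Ranges, selected: list[Candidate]) -> Ranges:
    """Shrink each range to the span covered by the selected candidates."""
    if not selected:
        return ranges  # no selection: keep exploring the same ranges
    return {
        name: (min(c[name] for c in selected), max(c[name] for c in selected))
        for name in ranges
    }


def feedback_loop(ranges: Ranges, select, rounds: int = 3, n: int = 6, seed: int = 0) -> Ranges:
    """Alternate between proposing candidates and narrowing on the selection."""
    rng = random.Random(seed)
    for _ in range(rounds):
        candidates = propose(ranges, n, rng)
        ranges = narrow(ranges, select(candidates))
    return ranges


def prefers_bright_fans(candidates: list[Candidate]) -> list[Candidate]:
    """Hypothetical stand-in for the user: keep the two brightest candidates."""
    return sorted(candidates, key=lambda c: c["fan_glow"], reverse=True)[:2]


initial: Ranges = {"fan_glow": (0.0, 1.0), "case_width": (0.3, 0.6)}
final = feedback_loop(initial, prefers_bright_fans)
```

Each round can only keep or shrink the search ranges, so the candidates drift toward whatever the selections have in common. A real system would render images and let an LLM infer the pattern instead of using fixed numeric ranges.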

Key Innovations and Features

3Dify stands out with several distinctive features:

Dify-based Implementation: By extending Dify, an open-source platform, 3Dify can rapidly adopt the latest AI technologies and easily switch between LLMs from providers like OpenAI, Anthropic, and Google. Its open-source nature also supports long-term maintainability and extensibility.

Automated DCC Tool Operation: 3Dify employs two primary methods to control DCC tools. The Model Context Protocol (MCP) provides a simple and secure way for LLM agents to interact with applications. For tools or functions not supporting MCP, 3Dify utilizes the Computer-Using Agent (CUA) method, which allows LLMs to directly operate graphical user interfaces (GUIs) through screenshots, using specialized models like UI-TARS.

Retrieval-Augmented Generation (RAG): To enhance its generation capabilities and maintainability, 3Dify uses RAG. This allows LLMs to reference external information, such as DCC tool manuals and documentation, improving functional coverage and adaptability to software updates.

Image-Selection Feedback Loop: This interactive mechanism allows users to iteratively refine the generated images. When users select their preferred candidates, the LLM automatically recognizes the variable patterns behind those choices and applies them to subsequent generations, so that the final output aligns closely with user preferences.

Support for Local LLMs: Users can integrate locally deployed LLMs, leveraging their own computational resources. This reduces the cost of external API calls, allows the use of custom-developed models, and keeps sensitive information from leaking to external services.

Extensibility: Beyond 3D-CG production, 3Dify’s use of CUA enables it to access and automate a wide range of features within DCC tools, including game development and animation creation, making it a versatile framework for broader applications.
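Two of the features above, RAG lookups and the MCP-or-CUA choice, can be sketched together in a few lines. This is a toy illustration, not 3Dify's implementation: a keyword-overlap ranking stands in for a real retriever, plain Python callables stand in for an MCP server and a CUA model, and every name here (`Action`, `retrieve`, `make_dispatcher`, the operation names and the manual snippets) is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str   # hypothetical operation name, e.g. "add_cube"
    args: dict


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank manual snippets by words shared with the query."""
    words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda d: len(words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def make_dispatcher(
    mcp_tools: dict[str, Callable[[dict], str]],
    cua_fallback: Callable[[Action], str],
) -> Callable[[Action], tuple[str, str]]:
    """Prefer an MCP-exposed operation; fall back to GUI control otherwise."""
    def dispatch(action: Action) -> tuple[str, str]:
        handler = mcp_tools.get(action.name)
        if handler is not None:
            return "mcp", handler(action.args)
        return "cua", cua_fallback(action)
    return dispatch


# Stubs standing in for a DCC tool manual, an MCP server and a CUA model.
manual = [
    "Add a cube from mesh menu",
    "Emission strength controls how strongly a material will glow",
]
mcp_tools = {"add_cube": lambda args: f"cube size={args['size']}"}
dispatch = make_dispatcher(mcp_tools, lambda a: f"GUI steps for {a.name}")

context = retrieve("make fan material glow", manual)
route_a = dispatch(Action("add_cube", {"size": 2}))
route_b = dispatch(Action("open_render_settings", {}))
```

In this sketch `route_a` goes through the MCP stub because `add_cube` is registered, while `route_b` falls back to the CUA stub. The retrieved snippet in `context` is the kind of text that would be added to the LLM's prompt before it plans an operation.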

Under the Hood: Multiple LLM Agents and Smart Interactions

The framework’s architecture involves three distinct LLM agents:

Visualizer LLM: Responsible for generating the initial 2D pre-visualization images and refining them based on user feedback.

Planner LLM: Analyzes the refined pre-visualization, predicts the necessary variations for the 3D model, extracts procedural parameters, and communicates the procedure to the Manager LLM.

Manager LLM: Receives instructions from the Planner LLM and directly operates the DCC tool to create the 3D-CG. It can also interact with the user for clarification if needed.

The system also uses Dify’s Chatflow feature to manage complex, multi-turn interactions and dynamic workflows, ensuring smooth communication between agents and the user.
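The hand-off between the three agents can be sketched as a simple pipeline. This is a minimal illustration with plain functions in place of LLM calls; the data shapes (`PreVis`, `Plan`) and the step strings are assumptions made for the example, not structures described in the paper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PreVis:
    prompt: str
    attributes: dict[str, str]  # preferences gathered from user feedback


@dataclass
class Plan:
    steps: list[str]


def visualizer(prompt: str, feedback: dict[str, str]) -> PreVis:
    """Stand-in for the Visualizer LLM: fold user feedback into a pre-visualization."""
    return PreVis(prompt=prompt, attributes=dict(feedback))


def planner(previs: PreVis) -> Plan:
    """Stand-in for the Planner LLM: turn the pre-visualization into procedural steps."""
    steps = [f"create base model for: {previs.prompt}"]
    steps += [f"set {key} = {value}" for key, value in sorted(previs.attributes.items())]
    return Plan(steps)


def manager(plan: Plan, execute: Callable[[str], str]) -> list[str]:
    """Stand-in for the Manager LLM: run each step against the DCC tool."""
    return [execute(step) for step in plan.steps]


previs = visualizer("desktop gaming PC with side panel removed", {"fan_glow": "on"})
plan = planner(previs)
log = manager(plan, lambda step: f"done: {step}")
```

Here `execute` is a placeholder for whatever actually drives the DCC tool. Keeping it as a parameter mirrors the separation in the article: the Planner decides what to do, and the Manager decides how to carry it out.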

Demonstration and Future Outlook

In a demonstration, 3Dify successfully generated a 3D model of a desktop PC in Blender from a single natural language prompt. Further instructions, such as making the case fans glow, were also executed successfully. Challenges remain, such as maintaining spatial coherence when a scene contains numerous objects or the instructions are complex. The demonstration relied primarily on MCP for automation, and integrating visual information through CUA could bring further gains in accuracy and versatility.

3Dify represents a significant step forward in procedural 3D-CG generation, offering an efficient and flexible approach to creating complex 3D content. By combining the power of LLMs with automated DCC tool operations and interactive feedback, it paves the way for a future where 3D design is as simple as describing your vision in words. You can find more details about this innovative framework in the full research paper: 3Dify: a Framework for Procedural 3D-CG Generation Assisted by LLMs Using MCP and RAG.
