TLDR: A systematic review of literature from 2023-2025 explores the integration of Generative AI with Extended Reality (XR), covering applications, technologies, and challenges. Key findings indicate that generative AI is transforming content creation and user interaction in XR, particularly in design, education, and training, with VR and AR being dominant platforms. Diffusion Models and Large Language Models are frequently used, with natural language as a primary input. Significant challenges include multi-modal interaction, latency, system integration, security, user autonomy, child protection, and long-term performance.
The rapid advancements in Generative Artificial Intelligence (AI) are opening up unprecedented possibilities, especially when combined with Extended Reality (XR). XR, an umbrella term encompassing Virtual Reality (VR), Augmented Reality (AR), and Mixed Reality (MR), offers immersive and interactive experiences that blend digital and physical worlds. A recent systematic review delves into the applications, key technologies, and future directions of this powerful convergence.
The research paper, titled "When Generative Artificial Intelligence meets Extended Reality: A Systematic Review", was written by Xinyu NING, Yan ZHUO, Xian WANG, Chan-In Devin SIO, and Lik-Hang LEE. Their work systematically reviews literature published from 2023 to 2025, analyzing 26 articles to summarize current trends and identify research gaps.
Key Application Areas
The review highlights several domains where generative AI is significantly impacting XR. Design, along with education and training, emerged as the most prominent fields. In design, generative AI facilitates virtual prototyping, architectural design, and interactive product design, allowing users to quickly create high-quality 3D models using simple gestures and voice commands in XR environments. For instance, systems like MS2Mesh-XR enable multi-modal sketch-to-mesh generation.
In education and training, XR combined with generative AI provides immersive and interactive learning experiences, such as virtual experiments, simulated training, and customized content. This enhances visual perception and learner engagement; one example is an AI-powered virtual assistant in VR that answers anatomy questions.
Other important application areas include transmission (efficient XR content delivery), medical treatment (virtual therapies and patient monitoring), and architecture (generating 360-degree images for immersive design). Emerging domains like commerce, gaming, traffic, and cultural heritage are also beginning to explore this synergy, from simplifying 3D model creation for e-commerce to reconstructing virtual monuments for digital preservation.
Technological Landscape
In terms of XR technology, Virtual Reality (VR) and Augmented Reality (AR) currently dominate the applications of generative AI. While Mixed Reality (MR) and broader ‘pan-XR’ environments are also explored, VR and AR remain the primary platforms for these innovations.
A variety of generative AI models are being utilized. Diffusion Models, known for generating detailed visual content, are the most frequently applied. Large Language Models (LLMs) are increasingly used for generative dialogue and narrative creation, while Generative Adversarial Networks (GANs) are employed for realistic content generation, such as virtual characters and scenes. Visual-Linguistic Models (VLMs) and Transformer models also play roles in combining visual and linguistic information and processing complex data.
User interaction modes are evolving, with natural language input (voice and text) being the most dominant method for interacting with generative AI in virtual worlds. Image input is also gaining traction, while pose and behavior inputs are still in exploratory stages. The primary output or ‘feedback cue’ from these systems is image or model generation, followed by text and voice generation, enhancing immersion and interactivity.
The Fusion of AI and XR
The integration of generative AI into XR takes two main forms: object generation based on the virtual environment, and dynamic content generation based on user input. In the first, AI creates or enhances largely static content with only indirect user input, for example generating the story content and visual elements for a VR narrative. The second, more interactive approach has AI respond in real time to specific user inputs such as speech or gestures, instantly generating personalized content or adjusting the virtual environment, as in systems that convert speech into 3D models in mixed reality.
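To make the two integration modes easier to compare, here is a minimal Python sketch. It is not taken from the review or from any system it covers. `Scene`, `populate_from_environment`, `on_user_input`, and `fake_generate` are hypothetical names, and the "generator" is a placeholder standing in for a real diffusion model or text-to-3D pipeline. The only point is structural: mode 1 produces content ahead of time from the environment, while mode 2 generates content on demand for each user event.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for any generative model (diffusion model, LLM, ...):
# it maps a text prompt to a generated asset, represented here as a string.
Generator = Callable[[str], str]


@dataclass
class Scene:
    objects: list[str] = field(default_factory=list)


def populate_from_environment(scene: Scene, theme: str,
                              generate: Generator, count: int = 3) -> None:
    """Mode 1: object generation based on the virtual environment.

    Content is produced ahead of time from a scene description; the user
    influences it only indirectly (here, through the chosen theme).
    """
    for i in range(count):
        scene.objects.append(generate(f"{theme} object {i}"))


def on_user_input(scene: Scene, utterance: str, generate: Generator) -> str:
    """Mode 2: dynamic content generation based on user input.

    Each (already transcribed) speech or gesture event immediately triggers
    generation, and the result is placed into the scene.
    """
    asset = generate(utterance)
    scene.objects.append(asset)
    return asset


def fake_generate(prompt: str) -> str:
    # Placeholder model: a real system would call a text-to-3D pipeline here.
    return f"mesh<{prompt}>"


if __name__ == "__main__":
    scene = Scene()
    populate_from_environment(scene, "forest", fake_generate, count=2)
    on_user_input(scene, "a red chair", fake_generate)
    print(scene.objects)
    # prints ['mesh<forest object 0>', 'mesh<forest object 1>', 'mesh<a red chair>']
```

Keeping the generator behind a plain callable means the same scene code could, in principle, be driven by different model types, which echoes the modular architectures the review calls for.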
Challenges Ahead
Despite the immense potential, several technical and ethical challenges need to be addressed for the widespread adoption of generative AI in XR:
- Deep Integration of Multi-Modal Interaction: Effectively combining inputs from visual, auditory, and tactile modalities to create adaptive and natural user experiences remains a significant hurdle.
- Latency and Stuttering: Real-time XR applications demand minimal latency, but generative models often require substantial computational resources. Optimizing models for edge devices is crucial.
- System Integration and Standardization: A lack of unified standards and interfaces limits interoperability between different generative AI and XR systems, necessitating modular and extensible architectures.
- AI-enabled Virtuality in Physical Worlds: While VR is prominent, AR-specific generative AI pipelines are rapidly advancing, enabling in-situ object creation and manipulation in real-world contexts for design, education, and remote collaboration.
- Security and Privacy: XR systems collect extensive user data, including biometrics and personal habits. Protecting this sensitive information from breaches through robust encryption and data minimization is paramount.
- User Autonomy and Decision-Making: Over-reliance on AI for cognitive tasks in XR could potentially degrade users’ independent thinking and problem-solving skills. Designing AI to support rather than replace human cognition is essential.
- Child Protection Concerns: The risk of AI-generated content containing inappropriate or harmful information for children necessitates strong censorship mechanisms and parental control features.
- Long-term Performance & Sustainability: Most research focuses on short-term evaluations. Long-term testing in real-world environments is needed to ensure system stability, durability, and maintainability.
- AI in Remote Collaboration and Social XR: While AI can create shared collaboration spaces and intelligent assistants, challenges remain in real-time responsiveness, multi-user synchronization, identity authentication, and maintaining social presence and emotional realism.
- Compliance and Legal Challenges: The expressive power of generative AI in XR amplifies issues like bias and misinformation, demanding customized ethical and legal frameworks, transparency (Explainable AI), and accountability.
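The latency challenge can be made concrete with a little arithmetic. An XR headset redraws at a fixed refresh rate, so each frame has a hard time budget (1000 / 90 ≈ 11.1 ms at 90 Hz). The sketch below computes that budget and how many frames would pass while a single generation call runs. The 2,000 ms inference time is a hypothetical figure chosen for illustration, not a result reported in the review, and `frame_budget_ms` and `frames_blocked` are illustrative helper names.

```python
import math


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available to render one frame at the given refresh rate."""
    return 1000.0 / refresh_hz


def frames_blocked(inference_ms: float, refresh_hz: float) -> int:
    """Number of display frames that elapse during one generation call.

    Computed as inference_ms * refresh_hz / 1000 to avoid rounding error
    from dividing by a non-terminating frame budget.
    """
    return math.ceil(inference_ms * refresh_hz / 1000.0)


if __name__ == "__main__":
    print(f"{frame_budget_ms(90):.1f} ms per frame at 90 Hz")  # prints 11.1 ms per frame at 90 Hz
    # Hypothetical 2-second generation call (assumed, not from the review):
    print(frames_blocked(2000, 90))  # prints 180
```

If a call like this ran on the render path, the display would stall for about 180 frames. This is one reason why, assuming models cannot be made fast enough for per-frame use, generation is typically run asynchronously or offloaded, with the scene updated once the result arrives. It also motivates the model optimization for edge devices that the review highlights.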
In conclusion, the convergence of generative AI and XR is poised to redefine content creation and user interaction, offering more natural and immersive virtual experiences. Addressing the identified challenges will be key to unlocking the full potential of this transformative technological frontier.


