
Reality Proxy: Bridging Physical and Digital Interaction in Mixed Reality

TLDR: Reality Proxy is a novel Mixed Reality (MR) system that simplifies interaction with real-world objects by creating abstract digital ‘proxies’ for them. These proxies, enriched with AI-derived information, allow users to easily select, filter, group, and manipulate objects regardless of their physical distance or occlusion, using familiar gestures. The system aims to reduce physical strain and enhance user understanding of complex environments, demonstrated across applications like information retrieval, building navigation, and drone control.

Interacting with real-world objects in Mixed Reality (MR) environments can often be challenging. Imagine trying to select a specific book on a crowded, distant shelf or controlling multiple drones scattered across a large area. Traditional methods, like pointing with a hand ray or relying solely on gaze, often fall short when objects are far away, partially hidden, or tightly packed. These difficulties stem from the need to interact directly with physical objects, which are bound by their inherent physical limitations like size, position, and arrangement.

A new system called Reality Proxy offers a fresh approach to these challenges. Its core idea is to separate the act of interaction from the physical constraints of real-world objects by introducing ‘proxies.’ These proxies are abstract, digital representations of physical objects. When you interact with a proxy, it’s functionally the same as interacting with the actual object, but without the physical limitations.

Reality Proxy seamlessly shifts your interaction target from the physical object to its digital proxy during selection. This means you can easily select distant objects or perform complex manipulations using familiar gestures, without needing to learn new commands or navigate cumbersome menus. The system enhances these proxies with information derived from Artificial Intelligence (AI), including semantic attributes (like a book’s topic or color) and hierarchical spatial relationships (like a book being on a shelf, which is in a room).

How Reality Proxy Works

The process involves three main steps: activating, generating, and interacting with the proxies.

First, when a user performs a simple gesture, like a ‘pinch’ while gazing at an object, Reality Proxy activates. It uses an AI-driven pipeline to understand the scene. This involves detecting objects in hierarchical structures, meaning it can identify a whole bookshelf, individual books on it, or even smaller components like buttons on a microwave. It also extracts semantic attributes for each object, allowing for rich descriptions like ‘red book’ or ‘kitchen appliance.’ This detailed understanding of the scene forms the foundation for the proxies.
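The hierarchical, attribute-rich scene understanding described above can be pictured as a simple tree of detected objects. The following is a minimal sketch, not the paper's actual pipeline; the `SceneObject` class, its attribute names, and the `find` helper are all hypothetical illustrations of the idea:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the hierarchical scene representation an
# AI pipeline might produce: every detected object carries semantic
# attributes and may contain child objects (shelf -> books -> ...).
@dataclass
class SceneObject:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def find(self, **attrs):
        """Recursively collect objects whose attributes match all filters."""
        match = all(self.attributes.get(k) == v for k, v in attrs.items())
        hits = [self] if match else []
        for child in self.children:
            hits.extend(child.find(**attrs))
        return hits

shelf = SceneObject("bookshelf", {"type": "furniture"}, [
    SceneObject("book_1", {"type": "book", "color": "red", "topic": "AI"}),
    SceneObject("book_2", {"type": "book", "color": "blue", "topic": "history"}),
])

red_books = shelf.find(type="book", color="red")
```

A query like `find(type="book", color="red")` is what lets a rich description such as 'red book' resolve to concrete objects in the scene.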

Next, the system generates these proxies. By default, it creates proxies for the primary objects within the user’s gaze, placing them conveniently near the user’s hand. These proxies are fixed-size, rectangular 3D objects, but crucially, they preserve the relative spatial relationships of the real objects. This ensures that even though you’re interacting with a digital representation, the spatial layout feels natural and coherent. For example, if two books are next to each other on a shelf, their proxies will also appear next to each other.
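One way to preserve relative spatial relationships while placing proxies near the hand is to normalize the objects' world positions into a small palm-anchored region. This is a minimal sketch under that assumption; the function name, the cube size, and the coordinates are all illustrative, not taken from the system:

```python
# Hypothetical sketch: scale real-world object positions into a small
# region near the hand so the proxies keep the objects' relative layout.

def layout_proxies(world_positions, hand_pos, region=0.2):
    """world_positions: dict of name -> (x, y, z) in metres.
    Returns proxy centres scaled into a `region`-metre cube at the hand."""
    xs, ys, zs = zip(*world_positions.values())
    mins = (min(xs), min(ys), min(zs))
    spans = [max(c) - min(c) or 1.0 for c in (xs, ys, zs)]
    scale = region / max(spans)
    return {
        name: tuple(h + (p - m) * scale for p, m, h in zip(pos, mins, hand_pos))
        for name, pos in world_positions.items()
    }

# Two books sitting side by side on a distant shelf...
books = {"book_a": (2.0, 1.5, 5.0), "book_b": (2.3, 1.5, 5.0)}
# ...produce proxies that are still side by side, but near the hand.
proxies = layout_proxies(books, hand_pos=(0.0, 1.0, 0.4))
```

Because only a uniform scale and translation are applied, neighbouring objects stay neighbours in the proxy layout, which is what keeps the digital representation spatially coherent.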

Finally, interacting with the proxies is designed to keep the user focused on the real world. When you manipulate a proxy, visual feedback, such as a highlight, appears directly on the corresponding physical object. To keep the proxies easily accessible, a ‘lazy-follow’ mechanism ensures they stay near your hand without constantly reacting to minor movements, allowing for fluid transitions between focusing on the real world and glancing at the proxy.
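A common way to implement this kind of lazy-follow behaviour is a dead zone combined with eased motion: the proxies ignore small hand movements and only drift toward the hand once it moves far enough. The sketch below assumes that approach; the thresholds and function name are illustrative:

```python
# Hypothetical sketch of a 'lazy-follow' controller: the proxy cluster
# only starts moving once the hand drifts past a dead-zone radius, and
# then eases toward the hand rather than tracking it rigidly.

def lazy_follow(proxy_pos, hand_pos, dead_zone=0.1, smoothing=0.2):
    dist = sum((h - p) ** 2 for p, h in zip(proxy_pos, hand_pos)) ** 0.5
    if dist <= dead_zone:
        return proxy_pos  # ignore minor hand movements
    # otherwise ease a fraction of the way toward the hand each frame
    return tuple(p + (h - p) * smoothing for p, h in zip(proxy_pos, hand_pos))

pos = (0.0, 0.0, 0.0)
pos = lazy_follow(pos, (0.05, 0.0, 0.0))  # inside dead zone: no motion
pos = lazy_follow(pos, (0.5, 0.0, 0.0))   # outside: eases toward hand
```

The dead zone is what allows fluid glances and small gestures without the proxies jittering along, while the easing keeps them within reach when the user actually moves.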

Fluid Interactions Enabled

Reality Proxy unlocks several advanced interactions that were previously difficult in MR:

  • Skim and Preview Objects: Users can quickly browse information by sliding a finger across multiple proxies, with details appearing near the actual object.
  • Multiple Selection through Brushing: Selecting several objects at once becomes easy by brushing over their proxies, even if the real objects are distant.
  • Filtering Objects by Attribute: Objects can be filtered based on their semantic attributes (e.g., all books on ‘AI’ or all ‘red’ items), simplifying subset selection.
  • Interactions Leveraging Physical Affordance: Physical surfaces, like a table, can be transformed into touchpads for interacting with proxies, using familiar gestures like dragging or spreading fingers.
  • Grouping Objects via Spatial Zooming: Users can intuitively navigate hierarchical groups (e.g., zooming from a building to a floor, then to individual rooms) using a two-handed zoom gesture.
  • Grouping Objects by Semantic Attributes: Double-tapping a proxy can group other objects by shared attributes, like grouping rooms by department.
  • Creating Custom Groups: Users can create their own custom groups of objects by brushing an empty space to form a container and then adding proxies to it.
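The attribute-based grouping interaction above (double-tapping to cluster objects by a shared attribute, such as grouping rooms by department) amounts to a group-by over the proxies' semantic attributes. This is a minimal sketch of that idea; the room names and departments are invented examples:

```python
from collections import defaultdict

# Hypothetical sketch of attribute-based grouping: proxies that share
# a value for the chosen attribute are clustered together.

def group_by(proxies, attribute):
    groups = defaultdict(list)
    for name, attrs in proxies.items():
        groups[attrs.get(attribute, "unknown")].append(name)
    return dict(groups)

rooms = {
    "room_101": {"department": "Physics"},
    "room_102": {"department": "Physics"},
    "room_201": {"department": "Chemistry"},
}
clusters = group_by(rooms, "department")
```

The same operation also covers filtering by attribute: selecting one cluster (e.g. all 'Physics' rooms) is equivalent to filtering the scene by that attribute value.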


Real-World Applications

The versatility of Reality Proxy has been demonstrated across various scenarios:

  • Everyday Information Retrieval: Users can easily scan objects in an office or kitchen to retrieve associated data, such as finding the price of books on a shelf or interacting with scattered kitchen items.
  • Building Navigation: The system allows for fluid exploration of large-scale environments like multi-floor buildings, even revealing structures that are otherwise invisible or occluded.
  • Controlling Drones: Reality Proxy enables direct and efficient control of dynamic objects like multiple drones, allowing users to select and command them based on spatial position or attributes like battery level.
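Selecting drones by an attribute such as battery level and issuing a single command to the selection can be pictured as a filter followed by a broadcast. The sketch below is purely illustrative; the drone names, battery threshold, and `command` helper are assumptions, not part of the system's API:

```python
# Hypothetical sketch: select drone proxies by battery level, then
# issue one command to every drone in the selection.

drones = {
    "drone_1": {"battery": 82, "pos": (10, 5, 3)},
    "drone_2": {"battery": 17, "pos": (14, 6, 2)},
    "drone_3": {"battery": 9,  "pos": (8, 7, 4)},
}

# attribute-based selection, analogous to brushing or filtering proxies
low_battery = [name for name, state in drones.items() if state["battery"] < 20]

def command(selected, action):
    """Map one action onto every selected drone."""
    return {name: action for name in selected}

orders = command(low_battery, "return_to_base")
```

The point of the proxy layer is that this selection works the same whether the drones are clustered nearby or scattered across a large area.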

An expert evaluation involving experienced XR developers and researchers provided highly positive feedback, praising the system’s usefulness, ease of learning, and usability. Participants noted that Reality Proxy reduces physical fatigue, increases interaction expressiveness, and enhances their understanding of scene organization. While some minor accuracy and alignment challenges were noted, the overall reception was strong, highlighting its potential for diverse MR scenarios, including those with large-scale environments, very small or hard-to-reach objects, and applications requiring enhanced accessibility or collaboration.

Reality Proxy represents a significant step towards more fluid, flexible, and expressive interaction with real-world objects in mixed reality environments. By abstracting physical objects into manipulable digital proxies, it opens up new possibilities for how we engage with the blended physical and digital worlds. For more details, you can refer to the full research paper: Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
