
Reality Proxy: Bridging Physical and Digital Interaction in Mixed Reality

TLDR: Reality Proxy is a novel Mixed Reality (MR) system that simplifies interaction with real-world objects by creating abstract digital ‘proxies’ for them. These proxies, enriched with AI-derived information, allow users to easily select, filter, group, and manipulate objects regardless of their physical distance or occlusion, using familiar gestures. The system aims to reduce physical strain and enhance user understanding of complex environments, demonstrated across applications like information retrieval, building navigation, and drone control.

Interacting with real-world objects in Mixed Reality (MR) environments can often be challenging. Imagine trying to select a specific book on a crowded, distant shelf or controlling multiple drones scattered across a large area. Traditional methods, like pointing with a hand ray or relying solely on gaze, often fall short when objects are far away, partially hidden, or tightly packed. These difficulties stem from the need to interact directly with physical objects, which are bound by their inherent physical limitations like size, position, and arrangement.

A new system called Reality Proxy offers a fresh approach to these challenges. Its core idea is to separate the act of interaction from the physical constraints of real-world objects by introducing ‘proxies.’ These proxies are abstract, digital representations of physical objects. When you interact with a proxy, it’s functionally the same as interacting with the actual object, but without the physical limitations.

Reality Proxy seamlessly shifts your interaction target from the physical object to its digital proxy during selection. This means you can easily select distant objects or perform complex manipulations using familiar gestures, without needing to learn new commands or navigate cumbersome menus. The system enhances these proxies with information derived from Artificial Intelligence (AI), including semantic attributes (like a book’s topic or color) and hierarchical spatial relationships (like a book being on a shelf, which is in a room).

How Reality Proxy Works

The process involves three main steps: activating, generating, and interacting with the proxies.

First, when a user performs a simple gesture, like a ‘pinch’ while gazing at an object, Reality Proxy activates. It uses an AI-driven pipeline to understand the scene. This involves detecting objects in hierarchical structures, meaning it can identify a whole bookshelf, individual books on it, or even smaller components like buttons on a microwave. It also extracts semantic attributes for each object, allowing for rich descriptions like ‘red book’ or ‘kitchen appliance.’ This detailed understanding of the scene forms the foundation for the proxies.
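The hierarchical, attribute-rich scene understanding described above can be pictured as a simple tree of detected objects. The following is a minimal sketch, not the paper's actual pipeline; the `SceneObject` class, its attribute names, and the `find` helper are all hypothetical illustrations of the idea:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the hierarchical scene representation an
# AI pipeline might produce: every detected object carries semantic
# attributes and may contain child objects (shelf -> books -> ...).
@dataclass
class SceneObject:
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def find(self, **attrs):
        """Recursively collect objects whose attributes match all filters."""
        match = all(self.attributes.get(k) == v for k, v in attrs.items())
        hits = [self] if match else []
        for child in self.children:
            hits.extend(child.find(**attrs))
        return hits

shelf = SceneObject("bookshelf", {"type": "furniture"}, [
    SceneObject("book_1", {"type": "book", "color": "red", "topic": "AI"}),
    SceneObject("book_2", {"type": "book", "color": "blue", "topic": "history"}),
])

red_books = shelf.find(type="book", color="red")
```

A query like `find(type="book", color="red")` is what lets a rich description such as 'red book' resolve to concrete objects in the scene.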

Next, the system generates these proxies. By default, it creates proxies for the primary objects within the user’s gaze, placing them conveniently near the user’s hand. These proxies are fixed-size, rectangular 3D objects, but crucially, they preserve the relative spatial relationships of the real objects. This ensures that even though you’re interacting with a digital representation, the spatial layout feels natural and coherent. For example, if two books are next to each other on a shelf, their proxies will also appear next to each other.
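One way to preserve relative spatial relationships while placing proxies near the hand is to normalize the objects' world positions into a small palm-anchored region. This is a minimal sketch under that assumption; the function name, the cube size, and the coordinates are all illustrative, not taken from the system:

```python
# Hypothetical sketch: scale real-world object positions into a small
# region near the hand so the proxies keep the objects' relative layout.

def layout_proxies(world_positions, hand_pos, region=0.2):
    """world_positions: dict of name -> (x, y, z) in metres.
    Returns proxy centres scaled into a `region`-metre cube at the hand."""
    xs, ys, zs = zip(*world_positions.values())
    mins = (min(xs), min(ys), min(zs))
    spans = [max(c) - min(c) or 1.0 for c in (xs, ys, zs)]
    scale = region / max(spans)
    return {
        name: tuple(h + (p - m) * scale for p, m, h in zip(pos, mins, hand_pos))
        for name, pos in world_positions.items()
    }

# Two books sitting side by side on a distant shelf...
books = {"book_a": (2.0, 1.5, 5.0), "book_b": (2.3, 1.5, 5.0)}
# ...produce proxies that are still side by side, but near the hand.
proxies = layout_proxies(books, hand_pos=(0.0, 1.0, 0.4))
```

Because only a uniform scale and translation are applied, neighbouring objects stay neighbours in the proxy layout, which is what keeps the digital representation spatially coherent.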

Finally, interacting with the proxies is designed to keep the user focused on the real world. When you manipulate a proxy, visual feedback, such as a highlight, appears directly on the corresponding physical object. To keep the proxies easily accessible, a ‘lazy-follow’ mechanism ensures they stay near your hand without constantly reacting to minor movements, allowing for fluid transitions between focusing on the real world and glancing at the proxy.
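A common way to implement this kind of lazy-follow behaviour is a dead zone combined with eased motion: the proxies ignore small hand movements and only drift toward the hand once it moves far enough. The sketch below assumes that approach; the thresholds and function name are illustrative:

```python
# Hypothetical sketch of a 'lazy-follow' controller: the proxy cluster
# only starts moving once the hand drifts past a dead-zone radius, and
# then eases toward the hand rather than tracking it rigidly.

def lazy_follow(proxy_pos, hand_pos, dead_zone=0.1, smoothing=0.2):
    dist = sum((h - p) ** 2 for p, h in zip(proxy_pos, hand_pos)) ** 0.5
    if dist <= dead_zone:
        return proxy_pos  # ignore minor hand movements
    # otherwise ease a fraction of the way toward the hand each frame
    return tuple(p + (h - p) * smoothing for p, h in zip(proxy_pos, hand_pos))

pos = (0.0, 0.0, 0.0)
pos = lazy_follow(pos, (0.05, 0.0, 0.0))  # inside dead zone: no motion
pos = lazy_follow(pos, (0.5, 0.0, 0.0))   # outside: eases toward hand
```

The dead zone is what allows fluid glances and small gestures without the proxies jittering along, while the easing keeps them within reach when the user actually moves.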

Fluid Interactions Enabled

Reality Proxy unlocks several advanced interactions that were previously difficult in MR:

  • Skim and Preview Objects: Users can quickly browse information by sliding a finger across multiple proxies, with details appearing near the actual object.
  • Multiple Selection through Brushing: Selecting several objects at once becomes easy by brushing over their proxies, even if the real objects are distant.
  • Filtering Objects by Attribute: Objects can be filtered based on their semantic attributes (e.g., all books on ‘AI’ or all ‘red’ items), simplifying subset selection.
  • Interactions Leveraging Physical Affordance: Physical surfaces, like a table, can be transformed into touchpads for interacting with proxies, using familiar gestures like dragging or spreading fingers.
  • Grouping Objects via Spatial Zooming: Users can intuitively navigate hierarchical groups (e.g., zooming from a building to a floor, then to individual rooms) using a two-handed zoom gesture.
  • Grouping Objects by Semantic Attributes: Double-tapping a proxy can group other objects by shared attributes, like grouping rooms by department.
  • Creating Custom Groups: Users can create their own custom groups of objects by brushing an empty space to form a container and then adding proxies to it.
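The attribute-based grouping interaction above (double-tapping to cluster objects by a shared attribute, such as grouping rooms by department) amounts to a group-by over the proxies' semantic attributes. This is a minimal sketch of that idea; the room names and departments are invented examples:

```python
from collections import defaultdict

# Hypothetical sketch of attribute-based grouping: proxies that share
# a value for the chosen attribute are clustered together.

def group_by(proxies, attribute):
    groups = defaultdict(list)
    for name, attrs in proxies.items():
        groups[attrs.get(attribute, "unknown")].append(name)
    return dict(groups)

rooms = {
    "room_101": {"department": "Physics"},
    "room_102": {"department": "Physics"},
    "room_201": {"department": "Chemistry"},
}
clusters = group_by(rooms, "department")
```

The same operation also covers filtering by attribute: selecting one cluster (e.g. all 'Physics' rooms) is equivalent to filtering the scene by that attribute value.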


Real-World Applications

The versatility of Reality Proxy has been demonstrated across various scenarios:

  • Everyday Information Retrieval: Users can easily scan objects in an office or kitchen to retrieve associated data, such as finding the price of books on a shelf or interacting with scattered kitchen items.
  • Building Navigation: The system allows for fluid exploration of large-scale environments like multi-floor buildings, even revealing structures that are otherwise invisible or occluded.
  • Controlling Drones: Reality Proxy enables direct and efficient control of dynamic objects like multiple drones, allowing users to select and command them based on spatial position or attributes like battery level.
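Selecting drones by an attribute such as battery level and issuing a single command to the selection can be pictured as a filter followed by a broadcast. The sketch below is purely illustrative; the drone names, battery threshold, and `command` helper are assumptions, not part of the system's API:

```python
# Hypothetical sketch: select drone proxies by battery level, then
# issue one command to every drone in the selection.

drones = {
    "drone_1": {"battery": 82, "pos": (10, 5, 3)},
    "drone_2": {"battery": 17, "pos": (14, 6, 2)},
    "drone_3": {"battery": 9,  "pos": (8, 7, 4)},
}

# attribute-based selection, analogous to brushing or filtering proxies
low_battery = [name for name, state in drones.items() if state["battery"] < 20]

def command(selected, action):
    """Map one action onto every selected drone."""
    return {name: action for name in selected}

orders = command(low_battery, "return_to_base")
```

The point of the proxy layer is that this selection works the same whether the drones are clustered nearby or scattered across a large area.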

An expert evaluation involving experienced XR developers and researchers provided highly positive feedback, praising the system’s usefulness, ease of learning, and usability. Participants noted that Reality Proxy reduces physical fatigue, increases interaction expressiveness, and enhances their understanding of scene organization. While some minor accuracy and alignment challenges were noted, the overall reception was strong, highlighting its potential for diverse MR scenarios, including those with large-scale environments, very small or hard-to-reach objects, and applications requiring enhanced accessibility or collaboration.

Reality Proxy represents a significant step towards more fluid, flexible, and expressive interaction with real-world objects in mixed reality environments. By abstracting physical objects into manipulable digital proxies, it opens up new possibilities for how we engage with the blended physical and digital worlds. For more details, you can refer to the full research paper: Reality Proxy: Fluid Interactions with Real-World Objects in MR via Abstract Representations.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
