TLDR: This research introduces a perspectivist annotation scheme for the MapTask corpus to track how understanding emerges and diverges in asymmetric dialogues. Using an LLM-powered pipeline, the study annotated 13,000 reference expressions, revealing that while full misunderstandings are rare, ‘multiplicity discrepancies’ (where a landmark appears multiple times on one map but fewer on another) systematically lead to referential misalignments. The framework provides a resource and analytical lens for studying grounded misunderstanding and evaluating AI’s capacity to model perspective-dependent grounding.
In our daily conversations, especially when we’re working together on a task, we constantly try to make sure we’re on the same page. This process, known as ‘establishing common ground,’ is crucial for effective communication. However, what happens when people think they understand each other, but are actually referring to different things? This is particularly challenging in ‘asymmetric’ situations where participants have different pieces of information.
A recent research paper, “Grounded Misunderstandings in Asymmetric Dialogue: A Perspectivist Annotation Scheme for MapTask” by Nan Li, Albert Gatt, and Massimo Poesio, addresses this question. The researchers introduce a way to analyze how understanding develops, diverges, and is repaired over time in collaborative dialogue. They use the HCRC MapTask corpus, a well-known dataset in which one participant describes a route to another while each works from a slightly different map, which creates room for communication breakdowns.
The Challenge of Asymmetric Information
Traditional approaches to understanding how people refer to things in conversation often assume that once an agreement is reached, both speakers are successfully talking about the same entity. However, the MapTask scenario highlights that this isn’t always true. Participants might confirm understanding, but still have different mental pictures of what’s being discussed because their maps aren’t identical. Previous studies on MapTask have shown that while full misunderstandings are rare, subtle differences in interpretation can persist.
A New Way to Track Understanding
To address this, the researchers developed a ‘perspectivist annotation scheme.’ For each reference, the scheme records separately what the speaker intends to refer to and what the listener actually interprets, which makes it possible to track in detail how understanding evolves. They also created a new system for identifying landmarks on the maps, which clearly distinguishes landmarks that are identical on both maps from those that differ.
Beyond just identifying landmarks, the scheme uses five binary attributes to describe the nature of the reference and the listener’s state of understanding:
- is_quantificational: Is the speaker asking if something exists, or referring to a specific item?
- is_specified: Is there enough information in the conversation to know what the listener understood?
- is_accommodated: Did the listener acknowledge the reference without showing confusion?
- is_grounded: Did the listener link the reference to a specific landmark on their map?
- is_imagined: Did the listener mentally picture a landmark that wasn’t on their map, based on the speaker’s description?
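To make the scheme concrete, a single annotated reference can be pictured as a small record that keeps the speaker's and the listener's perspective in separate fields alongside the five flags. The sketch below is illustrative only: the class name, the `expression`, `speaker_landmark`, and `listener_landmark` fields, and the landmark ID strings are assumptions, not the paper's actual data format. It also assumes that a landmark shared by both maps carries the same ID on each.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReferenceAnnotation:
    """One annotated reference expression (hypothetical layout)."""
    expression: str                    # the words used, e.g. "the cliffs"
    speaker_landmark: Optional[str]    # what the speaker intends (assumed ID format)
    listener_landmark: Optional[str]   # what the listener interprets, if anything
    is_quantificational: bool
    is_specified: bool
    is_accommodated: bool
    is_grounded: bool
    is_imagined: bool

    def perspectives_diverge(self) -> bool:
        """True when both sides point at a landmark, but not the same one."""
        return (
            self.speaker_landmark is not None
            and self.listener_landmark is not None
            and self.speaker_landmark != self.listener_landmark
        )


# A listener who accepts "the cliffs" but links it to a different instance.
example = ReferenceAnnotation(
    expression="the cliffs",
    speaker_landmark="cliffs_1",
    listener_landmark="cliffs_2",
    is_quantificational=False,
    is_specified=True,
    is_accommodated=True,
    is_grounded=True,
    is_imagined=False,
)
print(example.perspectives_diverge())
```

Keeping the two landmark fields separate is the point of the perspectivist design: a reference can be accommodated and grounded from the listener's side and still diverge from what the speaker meant.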
Leveraging AI for Annotation
To apply this detailed scheme across the entire MapTask corpus, the researchers employed an advanced AI model, GPT-5. They designed a specific ‘scheme-constrained prompt’ to guide the AI, ensuring it followed the annotation rules and produced structured outputs. This AI-powered pipeline successfully annotated over 13,000 reference expressions, demonstrating high reliability when compared to human annotations.
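The paper's prompt and output format are not reproduced in this summary, but the idea of constraining a model to structured output can be sketched as a validation step. The example below assumes, hypothetically, that the model replies with a JSON object containing the five attribute names as booleans; the function name `parse_annotation` and the JSON layout are illustrative assumptions, and no model is actually called.

```python
import json

# The five binary attributes of the scheme, as named in the list above.
REQUIRED_FLAGS = (
    "is_quantificational",
    "is_specified",
    "is_accommodated",
    "is_grounded",
    "is_imagined",
)


def parse_annotation(raw_reply: str) -> dict:
    """Parse a model reply and reject anything that breaks the scheme."""
    record = json.loads(raw_reply)  # raises a ValueError subclass on bad JSON
    if not isinstance(record, dict):
        raise ValueError("reply must be a JSON object")
    for flag in REQUIRED_FLAGS:
        if flag not in record:
            raise ValueError(f"missing attribute: {flag}")
        if not isinstance(record[flag], bool):
            raise ValueError(f"{flag} must be true or false")
    return record


# Hypothetical reply text standing in for a model response.
reply = (
    '{"is_quantificational": false, "is_specified": true, '
    '"is_accommodated": true, "is_grounded": true, "is_imagined": false}'
)
print(parse_annotation(reply)["is_grounded"])
```

A check of this kind lets a pipeline retry or flag replies that drift from the scheme instead of silently storing them.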
Key Insights into Misunderstandings
The analysis of these annotations revealed fascinating patterns in how understanding unfolds:
- Rarity of Full Misunderstandings: Initially, about 7% of references were classified as ‘misunderstood.’ However, after accounting for ‘lexical discrepancies’ (where the same landmark had slightly different names on the maps, like ‘cliffs’ vs. ‘sandstone cliffs’), the misunderstanding rate dropped significantly to just 1.82%. This supports the idea that people actively work to repair communication breakdowns.
- Multiplicity Discrepancies are Tricky: The most significant source of misunderstandings came from ‘multiplicity discrepancies.’ These occur when a landmark appears more often on one map than on the other, for example twice on the speaker’s map but only once on the listener’s. These situations accounted for over 50% of all misunderstandings, even though they represented a small fraction of all references. This highlights how easily people can assume uniqueness when it doesn’t exist.
- Tracking Understanding Over Time: By analyzing ‘reference chains’ (repeated references to the same landmark), the study found that resolving ‘multiplicity discrepancies’ often required more turns of conversation. Participants had to coordinate not just on the name, but on which specific instance was being referred to.
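Two of the ideas above, multiplicity discrepancies and reference chains, are simple enough to sketch in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: maps are represented as plain lists of landmark names, references as `(turn, landmark_id)` pairs, and all function names and sample data are made up for the example.

```python
from collections import Counter, defaultdict


def multiplicity_discrepancies(map_a, map_b):
    """Names that occur more than once on one map and fewer times on the other."""
    counts_a = Counter(map_a)
    counts_b = Counter(map_b)
    names = set(counts_a) | set(counts_b)
    return sorted(
        name for name in names
        if counts_a[name] != counts_b[name]
        and max(counts_a[name], counts_b[name]) > 1
    )


def reference_chains(references):
    """Group (turn, landmark_id) pairs into one ordered list of turns per landmark."""
    chains = defaultdict(list)
    for turn, landmark_id in references:
        chains[landmark_id].append(turn)
    return dict(chains)


# Invented sample data: "cliffs" appears twice on one map, once on the other.
speaker_map = ["cliffs", "cliffs", "lake", "mill"]
listener_map = ["cliffs", "lake", "bridge"]
print(multiplicity_discrepancies(speaker_map, listener_map))

# Invented sample references: the turn number at which each landmark is mentioned.
refs = [(3, "cliffs_1"), (5, "lake_1"), (9, "cliffs_1"), (12, "cliffs_1")]
print(reference_chains(refs))
```

Note that a landmark present on only one map ("mill", "bridge") is a different kind of mismatch and is deliberately not counted here; only differing counts with at least one duplicate qualify under the definition used in this sketch.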
Implications for AI and Future Research
This research provides a valuable resource and a new way to study how misunderstandings occur in collaborative dialogue. It also sets a benchmark for evaluating the ability of Large Language Models (LLMs) and Vision-Language Models (VLMs) to understand perspective-dependent grounding. The findings suggest that future AI systems need to be better at modeling different perspectives and tracking evolving interpretations, rather than assuming an ‘omniscient’ view.
The paper acknowledges some limitations, such as relying solely on text transcripts and not capturing non-verbal cues like intonation or eye contact, which can also influence understanding. Nevertheless, this work is a significant step towards building more sophisticated AI that can truly grasp the nuances of human communication. You can read the full research paper for more details: Grounded Misunderstandings in Asymmetric Dialogue.


