TLDR: ToMMeR is a lightweight model that efficiently detects entity mentions in text by probing early layers of large language models. It achieves high recall (93% zero-shot) and precision (90%+) across 13 NER benchmarks, demonstrating that LLMs naturally encode entity boundaries as an emergent capability. The model can also be extended to achieve competitive Named Entity Recognition performance, offering an efficient and transferable solution for information extraction.
A new research paper introduces ToMMeR, an efficient approach to identifying entity mentions in text using large language models (LLMs). The result matters for information extraction, a foundational natural language processing task whose existing systems are typically large and expensive to train.
Traditionally, identifying the text spans that refer to entities, a task known as mention detection, has been treated as complex and often conflated with entity typing (classifying a mention as a person, organization, or location, for example). Existing systems typically require extensive training on task-specific annotations and run to hundreds of millions of parameters. Recent evidence, however, suggests that LLMs may already encode entity-like spans as a byproduct of pretraining.
Introducing ToMMeR: A Lightweight Solution
ToMMeR, which stands for Token Matching for Mention Recognition, is a lightweight model designed to probe and extract these inherent mention detection capabilities from the early layers of any LLM backbone. With fewer than 300,000 parameters, ToMMeR is remarkably efficient and can be trained in a matter of hours, without modifying the underlying LLM.
The core idea behind ToMMeR is to leverage the latent binding signals within LLM representations. It uses a simple feed-forward head that aggregates token-matching and token-value features. This allows it to score spans directly from the frozen LLM’s representations, eliminating the need for schema input, prompting, or text generation.
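To make the setup concrete, here is a minimal sketch of probing a frozen backbone, assuming a Hugging Face `transformers` model. The choice of `gpt2` and of layer 1 is illustrative, not the paper's configuration.

```python
# Minimal sketch: obtain early-layer hidden states from a frozen LLM.
# Model and layer choices are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
backbone = AutoModel.from_pretrained("gpt2").eval()  # frozen: never fine-tuned

text = "Barack Obama visited Paris."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():  # the backbone receives no gradient updates
    outputs = backbone(**inputs, output_hidden_states=True)

# ToMMeR probes an early layer; layer 1 is used here for illustration.
h = outputs.hidden_states[1].squeeze(0)  # (seq_len, hidden_dim)

# A lightweight probe (under 300K parameters) maps span features derived
# from `h` to a mention probability; see the scoring sketch below.
```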
How ToMMeR Works
ToMMeR operates by analyzing how tokens within an LLM’s early layers relate to each other. It adapts the transformer’s attention mechanism to quantify the association between token pairs, using a cosine similarity metric. This helps identify internal token bindings within a potential entity span. Complementing this, token-level information is incorporated to provide crucial cues about a span’s boundaries and context. A logistic model then predicts the probability of a span being a valid entity mention based on these matching scores and token values.
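The paragraph above maps naturally onto a small scoring module. The sketch below is an illustration of that shape, not the paper's exact implementation: mean pairwise cosine similarity supplies the matching score, a learned linear map supplies per-token value features, and a logistic function combines them into a span probability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpanScorer(nn.Module):
    """Illustrative span scorer: cosine token matching + token values."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.value = nn.Linear(hidden_dim, 1)          # per-token value feature
        self.w_match = nn.Parameter(torch.tensor(1.0))  # weight on matching score
        self.w_value = nn.Parameter(torch.tensor(1.0))  # weight on value score
        self.bias = nn.Parameter(torch.tensor(0.0))

    def forward(self, h: torch.Tensor, start: int, end: int) -> torch.Tensor:
        # h: (seq_len, hidden_dim) frozen hidden states; span is [start, end].
        span = h[start : end + 1]
        # Matching score: mean cosine similarity over all token pairs in the
        # span (self-pairs included), a proxy for internal token binding.
        normed = F.normalize(span, dim=-1)
        match = (normed @ normed.T).mean()
        # Value score: mean of per-token value features over the span.
        value = self.value(span).mean()
        # Logistic model over the aggregated features.
        logit = self.w_match * match + self.w_value * value + self.bias
        return torch.sigmoid(logit)  # P(span is an entity mention)
```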
The model is trained on Pile-NER, a diverse dataset of GPT-3.5 annotations from The Pile, which offers broad semantic coverage. To address the common issue of class imbalance in mention detection, ToMMeR employs a Balanced Binary Cross-Entropy loss function, ensuring fair contribution from both entity and non-entity spans.
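One common formulation of a balanced loss averages the positive-class and negative-class cross-entropy terms separately, so each class contributes equally no matter how rare mentions are. A minimal sketch of that formulation follows; whether it matches the paper's exact weighting is an assumption.

```python
import torch

def balanced_bce(probs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Balanced BCE: average positive and negative losses separately,
    so mentions and non-mentions contribute equally despite class imbalance."""
    eps = 1e-8
    pos = labels.bool()
    pos_loss = -torch.log(probs[pos] + eps).mean() if pos.any() else probs.new_zeros(())
    neg_loss = -torch.log(1 - probs[~pos] + eps).mean() if (~pos).any() else probs.new_zeros(())
    return 0.5 * (pos_loss + neg_loss)
```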
Key Findings and Performance
ToMMeR demonstrates impressive performance across various benchmarks:
- High Recall and Precision: Across 13 diverse Named Entity Recognition (NER) benchmarks, ToMMeR achieves a remarkable 93% zero-shot recall. Its precision, validated by an LLM-as-a-judge evaluation, stands at over 90%, indicating that it rarely produces incorrect predictions despite its high coverage.
- Emergent Capability: A cross-model analysis involving LLMs ranging from 14 million to 15 billion parameters revealed that diverse architectures converge on similar mention boundaries (Dice scores > 0.75; a minimal Dice computation is sketched after this list). This strongly suggests that mention detection is a shared, emergent capability of language modeling rather than an artifact of a specific dataset or architecture.
- Early Layer Detection: The ability to detect mentions emerges very early in the LLM’s computational process, with near-optimal performance achieved using representations from the first transformer layer. This suggests that entity-related signals are established almost immediately and then preserved through the model’s depth.
- Extension to Full NER: When extended with span classification heads, ToMMeR achieves competitive performance (80-87% F1 score) on standard NER benchmarks, proving its utility as a foundational component for complete information extraction pipelines.
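For reference, the Dice agreement between two models' predicted mention sets can be computed as below. Treating spans as exact (start, end) matches is an assumption of this sketch; the paper may instead score token-level overlap.

```python
def dice_score(spans_a: set[tuple[int, int]], spans_b: set[tuple[int, int]]) -> float:
    """Dice coefficient between two sets of (start, end) mention spans.
    Exact-match agreement; 1.0 means identical predictions."""
    if not spans_a and not spans_b:
        return 1.0
    overlap = len(spans_a & spans_b)
    return 2 * overlap / (len(spans_a) + len(spans_b))

# Example: two backbones agreeing on 3 of their 4 predicted spans each.
a = {(0, 1), (5, 6), (10, 12), (20, 21)}
b = {(0, 1), (5, 6), (10, 12), (30, 31)}
print(dice_score(a, b))  # 0.75
```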
Impact and Future Directions
The introduction of ToMMeR offers both practical and conceptual contributions. Practically, it provides a lightweight, transferable, and high-coverage method for mention detection that can be integrated into any LLM, enabling real-time streaming deployment with minimal overhead. Conceptually, it offers compelling evidence that LLMs develop structured entity representations in their early layers, which can be efficiently recovered through simple probing mechanisms.
This work positions ToMMeR at the forefront of efficient probing methods and practical information extraction systems, paving the way for more modular and schema-agnostic extraction pipelines. For more details, see the full research paper.