TLDR: CodeNER is a novel method that uses code-based prompting to enhance Named Entity Recognition (NER) capabilities of Large Language Models (LLMs). By embedding detailed BIO schema instructions within structured code prompts, CodeNER helps LLMs better understand and perform NER, overcoming limitations of traditional text-based prompting. Experimental results show CodeNER consistently outperforms text-based methods across various languages and models, demonstrating improved accuracy in identifying entity boundaries and handling complex text structures.
Named Entity Recognition, or NER, is a fundamental task in natural language processing (NLP) that involves identifying and classifying named entities in text, such as people, locations, and organizations. Traditionally, NER has been framed as a sequence labeling problem, where a model assigns a tag to each token in a sentence, typically following the BIO scheme (Beginning, Inside, Outside), to mark entity boundaries. While conventional methods achieve high performance, they often require extensive labeled datasets for training.
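To make the BIO scheme concrete, here is a small invented example (the sentence and entity types are illustrative, not drawn from the paper's datasets):

```python
# Illustrative BIO tagging of a tokenized sentence (invented example).
# B-X marks the first token of an entity of type X, I-X a continuation,
# and O a token outside any entity.
tokens = ["Barack", "Obama", "visited", "Berlin", "yesterday", "."]
bio_tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]

for token, tag in zip(tokens, bio_tags):
    print(f"{token}\t{tag}")
```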
Recently, large language models (LLMs) have shown remarkable capabilities in various NLP tasks, including NER, through in-context learning and zero-shot task-solving. However, applying LLMs to NER using traditional text-based prompting methods presents a challenge. LLMs typically operate on a “text-in-text-out” schema, which doesn’t naturally align with the “text-in-span-out” nature of NER, where the goal is to identify specific spans of text as entities. This mismatch can lead to difficulties in accurately identifying entity boundaries and handling the sequential aspects of NER.
To address these limitations, the authors propose CodeNER, a method that leverages code-based prompting to improve LLMs' understanding and performance on NER. Instead of relying solely on natural language instructions, CodeNER embeds detailed BIO schema labeling instructions within structured code prompts, typically written in Python. This exploits LLMs' inherent ability to comprehend programming language structures, helping them identify entity boundaries and process text sequentially.
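The paper's exact prompt is not reproduced here, but a minimal sketch of what such a code-style prompt might look like follows; the variable names and wording are assumptions for illustration:

```python
# Hypothetical sketch of a code-style NER prompt (the paper's actual
# format may differ). This text is sent to the LLM, which is asked to
# complete the function so that each token receives one BIO tag.
code_prompt = '''
sentence = "Barack Obama visited Berlin yesterday ."
tokens = sentence.split()

# Allowed BIO tag labels for this dataset.
labels = ["B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG", "O"]

def tag_tokens(tokens):
    """Return exactly one tag from `labels` for each token, in order."""
    tags = []
    for token in tokens:
        # <the LLM fills in the tagging decision for each token>
        ...
    return tags
'''
```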
The core idea behind CodeNER is to provide explicit guidance for sequential processing, which is crucial for accurate NER, especially in zero-shot and few-shot scenarios where direct supervision is minimal. By defining variables for sentences and NER tag labels, and including a function that iterates through tokens to apply BIO tags, CodeNER guides the LLM to dynamically define and populate an entity dictionary. This structured approach helps overcome issues like misinterpretation and variability often encountered with purely text-based prompts.
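As a rough sketch of what populating such an entity dictionary could look like once BIO tags are produced (an illustrative reconstruction under the assumptions above, not the paper's actual code):

```python
# Illustrative post-processing: collect BIO-tagged tokens into a
# dictionary mapping each entity span to its type.
def build_entity_dict(tokens, tags):
    """Collect BIO-tagged tokens into {entity_text: entity_type}."""
    entities = {}
    span, span_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):             # a new entity begins
            if span:
                entities[" ".join(span)] = span_type
            span, span_type = [token], tag[2:]
        elif tag.startswith("I-") and span:  # the current entity continues
            span.append(token)
        else:                                # O (or a stray I-): close any open span
            if span:
                entities[" ".join(span)] = span_type
            span, span_type = [], None
    if span:                                 # flush an entity ending the sentence
        entities[" ".join(span)] = span_type
    return entities

tokens = ["Barack", "Obama", "visited", "Berlin", "yesterday", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
print(build_entity_dict(tokens, tags))
# {'Barack Obama': 'PER', 'Berlin': 'LOC'}
```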
Experiments were conducted on ten benchmark datasets spanning English, Arabic, Finnish, Danish, and German, using both closed models (GPT-4 and GPT-4 Turbo) and open models (Llama-3-8B and Phi-3-mini-128k-instruct). The results consistently showed that CodeNER outperforms conventional text-based prompting. For instance, CodeNER delivered notable gains on datasets such as FIN and MIT Restaurant, and improved average F1 scores across all ten datasets for both GPT-4 and GPT-4 Turbo. Broken down by entity label (Person, Location, Organization, Miscellaneous), CodeNER was generally stronger, particularly on the Miscellaneous category.
Further analysis with the open models Phi-3 and Llama-3-8B also confirmed CodeNER's advantage over vanilla text-based prompts and over other partially code-based prompting approaches such as GoLLIE and GNER. The study highlighted that CodeNER's effectiveness is closely tied to an LLM's ability to interpret programming language instructions, and that integrating the BIO schema into the code-based prompts is particularly important for accurately identifying entity span boundaries.
Case studies revealed that CodeNER is more robust in complex scenarios. For example, it correctly recognized long tokens such as website URLs as single units, whereas text-based methods tended to break them apart. It also handled duplicate and repeated tokens effectively, ensuring each occurrence was labeled correctly without overlapping classifications, and it captured special characters attached to words more reliably than vanilla prompts, which often missed them.
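As a small invented illustration of the URL case (not one of the paper's actual examples):

```python
# Invented illustration: the URL is a single long token and should
# receive one tag rather than being split into pieces.
tokens = ["Book", "a", "table", "at", "https://example.com/menu", "tonight"]
tags = ["O", "O", "O", "O", "B-MISC", "O"]  # the whole URL is one unit
```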
The researchers also explored combining CodeNER with Chain-of-Thought (CoT) prompting, a technique that encourages step-by-step reasoning. This combination further improved performance in zero-shot settings, suggesting that structured, programming-language-style prompts can enhance LLMs' understanding of long-range scopes. Interestingly, testing CodeNER with other programming languages, such as C++, yielded results comparable to Python on some datasets, indicating that dataset characteristics can matter more than the choice of programming language. Some languages, however, such as Java, proved less effective, possibly because LLMs acquire less knowledge of Java during pre-training.
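A rough sketch of how CoT might be layered on top of the code-style prompt (the instruction wording is an assumption, and `code_prompt` refers to the hypothetical sketch above):

```python
# Hypothetical CoT + CodeNER combination: a step-by-step reasoning
# instruction is prepended to the code-style prompt sketched earlier.
cot_instruction = (
    "Let's think step by step: consider each token in order, decide "
    "whether it begins, continues, or lies outside an entity, and only "
    "then produce the final BIO tag sequence."
)
combined_prompt = cot_instruction + "\n\n" + code_prompt
```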
In summary, CodeNER offers a significant advantage by bridging the gap between LLMs’ text-in-text-out schema and NER’s text-in-span-out nature. It provides a structured, sequential approach to labeling that reduces errors like overlapping tags and improves the recognition of individual tokens. While highly effective, CodeNER may be less advantageous for datasets with very long sentences containing many function words, where a simpler, context-focused approach might sometimes perform better. The research paper, available at https://arxiv.org/pdf/2507.20423, concludes that CodeNER consistently outperforms text-based prompting methods, demonstrating the effectiveness of explicitly structuring NER instructions within a code-based framework.


