
Improving Hindi Named Entity Recognition with External Knowledge

TLDR: This research explores how adding external information, specifically from Wikipedia, can significantly boost the accuracy of Named Entity Recognition (NER) in Hindi, particularly for short, “low-context” sentences. It compares various AI models, showing that Retrieval Augmentation generally enhances performance for fine-tuned models like XLM-R and MuRIL, and for generative models like GPT-3.5 Turbo, though not consistently for all large language models.

Named Entity Recognition, or NER, is a fundamental task in natural language processing that involves identifying and categorizing key information in text, such as names of people, organizations, locations, and products. While crucial for many applications like knowledge graphs and question answering, NER faces significant challenges, especially when dealing with languages like Hindi and in situations where the context around an entity is very limited.

A recent study delves into these challenges, focusing on enhancing Hindi NER, particularly for “low-context” data – sentences that are very short, often just a few words long. The researchers investigated a technique called Retrieval Augmentation (RA), which involves enriching the input text with relevant information retrieved from external sources, in this case, Wikipedia.

The core idea behind Retrieval Augmentation is to provide the language model with additional context that might not be present in the original short sentence. For instance, if a sentence is just “Vicky Burmese Literature,” the system would retrieve related information about “Burmese Literature” from Wikipedia and add it to the input. This expanded context helps the model better understand and categorize the entities within the original sentence.
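The mechanics can be sketched in a few lines. This is a minimal illustration, not the paper's pipeline: `retrieve_wikipedia` is a hypothetical stand-in for a real Wikipedia retriever, and the `[SEP]` separator is an assumption about how retrieved text might be joined to the input.

```python
def retrieve_wikipedia(query):
    # Hypothetical stand-in for a real Wikipedia retriever; a production
    # system would query an index of Wikipedia passages.
    snippets = {
        "Burmese Literature": "Burmese literature is the body of written "
                              "works in the Burmese language.",
    }
    return [text for title, text in snippets.items() if title in query]

def augment(sentence, sep=" [SEP] "):
    """Append retrieved passages to a low-context sentence, giving the
    NER model extra signal about the entities it contains."""
    passages = retrieve_wikipedia(sentence)
    if not passages:
        return sentence
    return sentence + sep + sep.join(passages)

print(augment("Vicky Burmese Literature"))
```

The augmented string, rather than the bare sentence, is what gets fed to the NER model.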

The study experimented with several types of AI models. The researchers fine-tuned two multilingual transformer-based encoders with strong Hindi coverage, MuRIL and XLM-R, both with and without Retrieval Augmentation. They also evaluated larger generative models, Llama-2-7B, Llama-2-70B, Llama-3-70B, and GPT-3.5 Turbo, using a "few-shot" prompting approach, again with and without RA.
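For the generative models, few-shot prompting means showing the model a handful of solved examples before the target sentence. The sketch below is illustrative only: the example pair, the tag set, and the template wording are assumptions, not the paper's actual prompt.

```python
# Illustrative few-shot examples (made up for this sketch, not from the paper).
FEW_SHOT = [
    ("sachin tendulkar mumbai", "sachin tendulkar=PERSON; mumbai=LOCATION"),
]

def build_prompt(sentence, examples=FEW_SHOT):
    """Assemble a few-shot NER prompt: task instruction, solved
    examples, then the target sentence left for the model to complete."""
    lines = ["Tag the named entities (PERSON, LOCATION, ORGANIZATION) "
             "in each sentence."]
    for text, tags in examples:
        lines.append(f"Sentence: {text}\nEntities: {tags}")
    lines.append(f"Sentence: {sentence}\nEntities:")
    return "\n\n".join(lines)

print(build_prompt("Vicky Burmese Literature"))
```

With RA, the retrieved Wikipedia text would be appended to the target sentence before it is placed in the prompt.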

The findings highlight the effectiveness of Retrieval Augmentation. For the fine-tuned models, XLM-R showed a remarkable improvement in its macro F1 score (the unweighted average of per-entity-class F1 scores, balancing precision and recall across classes) from 0.495 without RA to 0.71 with RA. MuRIL also saw an increase from 0.69 to 0.70. This demonstrates that providing external knowledge significantly helps these models identify named entities more accurately, especially in those tricky low-context scenarios.
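Since macro F1 weights every entity class equally, rare classes count as much as common ones, which is why it is a demanding metric for NER. A minimal computation, with made-up per-class numbers rather than the paper's results:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall for one entity class."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class):
    """Unweighted mean of per-class F1; per_class is a list of
    (precision, recall) pairs, one per entity type."""
    scores = [f1(p, r) for p, r in per_class]
    return sum(scores) / len(scores)

# e.g. PERSON, LOCATION, ORGANIZATION (illustrative values only)
print(round(macro_f1([(0.8, 0.7), (0.6, 0.5), (0.9, 0.85)]), 3))
```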

Interestingly, while fine-tuned Llama-2-7B also improved significantly after fine-tuning (reaching a macro F1 score of 0.37), Retrieval Augmentation did not provide an additional boost for this specific model. Among the larger generative models tested with few-shot prompting, GPT-3.5 Turbo benefited notably from RA, with its F1 score increasing from 0.20 to 0.33. However, Llama-2-70B and Llama-3-70B did not show similar improvements with RA, which the researchers suggest might be due to their relatively shorter context window sizes compared to GPT-3.5 Turbo, limiting the amount of augmented data they could effectively process.
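The context-window explanation is easy to visualize: a model with a smaller window must truncate the retrieved passages, discarding exactly the external knowledge RA was meant to supply. The sketch below uses naive whitespace token counts purely for illustration; real models tokenize differently.

```python
def fit_to_window(prompt, augmented_context, window_tokens):
    """Keep only as much retrieved context as fits in the remaining
    token budget after the prompt itself. Whitespace 'tokens' are a
    simplification for illustration."""
    budget = window_tokens - len(prompt.split())
    tokens = augmented_context.split()
    return " ".join(tokens[:max(budget, 0)])

prompt = "Tag the named entities in the sentence below."
context = " ".join(f"w{i}" for i in range(100))

# A small window forces heavy truncation; a large one keeps everything.
print(len(fit_to_window(prompt, context, 30).split()))
print(len(fit_to_window(prompt, context, 200).split()))
```

A model like GPT-3.5 Turbo, with a larger window, can consume more of the augmented text intact, which is consistent with the researchers' explanation for the divergent RA results.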

Overall, the research concludes that Retrieval Augmentation is a powerful method for enhancing NER performance, particularly for languages with limited resources and for data with low contextual information. While fine-tuned transformer-encoder models like XLM-R and MuRIL generally outperformed the larger generative models due to being trained on the full dataset, the study also indicates the potential of few-shot prompting with LLMs when extensive training data is not available. The paper provides a detailed look into their methodology and results, which can be found in their full publication: Enhancing Hindi NER in Low Context: A Comparative study of Transformer-based models with vs. without Retrieval Augmentation.


The study also touched upon resource efficiency, noting that transformer-encoder models require less computational power and time compared to the much larger generative models. This suggests that for specific tasks like NER with sufficient datasets, the former might be a more efficient choice.

Meera Iyer (https://blogs.edgentiq.com)
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
