TLDR: OpenAI has launched IndQA, a benchmark that evaluates how well AI models understand and reason about questions grounded in Indian languages, culture, and context. Developed with 261 Indian experts, IndQA features 2,278 natively written questions across 12 languages and 10 cultural domains, with the aim of fostering AI that is relevant and inclusive for India’s diverse population.
OpenAI has officially unveiled IndQA, a pioneering benchmark dataset aimed at rigorously testing how effectively artificial intelligence models comprehend and engage with questions deeply rooted in Indian languages, cultural nuances, and local contexts. This initiative marks OpenAI’s first dedicated effort to create a region-specific evaluation framework, with plans to extend similar benchmarks to other languages and regions globally.
IndQA is designed to bridge the gap between global AI development and the specific needs of India’s diverse linguistic and cultural landscape. The benchmark comprises 2,278 questions spanning 12 languages: Hindi, Hinglish, Gujarati, Punjabi, Kannada, Odia, Marathi, Malayalam, Tamil, Bengali, Telugu, and Urdu. The questions cover 10 cultural domains: Law and ethics, Architecture and design, Food and cuisine, Everyday life, Religion and spirituality, Sports and recreation, Literature and linguistics, Media and entertainment, Arts and culture, and History.
The development of IndQA was a collaborative effort, involving 261 Indian domain experts, including journalists, linguists, scholars, artists, and industry practitioners. A key distinguishing feature of IndQA is that its questions are natively written, rather than translated, ensuring they accurately reflect the unique ways people in India think, speak, and pose questions. Each question is accompanied by a detailed grading rubric and an ideal response, crafted by these experts, to provide a precise measure of AI performance.
OpenAI stated that the questions were intentionally designed to be challenging even for leading AI models such as GPT-4o and GPT-5. Initial results show top models scoring below 40%, underscoring how difficult deep non-English language understanding and cultural contextual awareness remain. Srinivas Narayanan, CTO of B2B Applications at OpenAI, highlighted India’s strategic importance, noting it was chosen as an ‘obvious starting point given its market size, linguistic diversity with approximately one billion people who don’t use English as their primary language, and cultural richness.’ He added, ‘This dataset helps our models understand Indian nuances more deeply. The experts also provide evaluation rubrics, so we can measure how well the AI performs on culturally grounded questions. Our goal is to take this as a playbook and use it in other countries too.’
Nick Turley, OpenAI’s Vice President and Head of ChatGPT, emphasized the company’s commitment to inclusivity: ‘India is a rapidly growing, digitally connected market with a unique linguistic and cultural fabric. IndQA enables us to push the boundaries on creating AI that truly understands and supports India’s diverse population.’
This launch is part of OpenAI’s broader engagement strategy in India, which includes the establishment of a local office in New Delhi and initiatives like the ChatGPT Go promotion offering free premium access to Indian users. India currently stands as OpenAI’s second-largest market for ChatGPT, which has about 800 million weekly active users globally. The rollout of IndQA is expected to guide future AI product development tailored to India’s heterogeneous ecosystem, supporting applications in education, healthcare, governance, and entertainment that resonate with Indian users. The benchmark will also serve as a tool to track AI improvement over time and to support the development of similar evaluation frameworks for other low-resource languages worldwide.


