TLDR: The Khana dataset is a new, comprehensive benchmark for Indian cuisine, featuring 131,000 images across 80 categories. It addresses the significant gap in food AI research for Indian dishes, which are complex and diverse. The dataset supports classification, segmentation, and retrieval tasks, providing a structured taxonomy and baseline evaluations using state-of-the-art models like ConvNeXT, which achieved the highest accuracy. Khana aims to advance food recognition technology for Indian culinary traditions.
As the world’s palate expands and interest in diverse culinary experiences grows, the demand for advanced food image models is soaring. These models are crucial for applications ranging from accurate food recognition and recipe suggestions to dietary tracking and automated meal planning. However, despite a wealth of food datasets available, a significant void has existed in comprehensively capturing the rich and varied nuances of Indian cuisine.
This gap is now being addressed with the introduction of Khana, a groundbreaking benchmark dataset specifically designed for food image classification, segmentation, and retrieval of Indian dishes. Khana stands out by establishing a detailed taxonomy of Indian cuisine and offering an impressive collection of approximately 131,000 images, spread across 80 distinct labels, each with a resolution of 500×500 pixels.
The Challenge of Indian Cuisine in AI
Indian cuisine is a vibrant tapestry of flavors and textures, characterized by vast regional diversity, intricate preparation methods, and subtle visual distinctions. Close visual resemblances between distinct dishes pose a unique challenge for image classification algorithms. While food classification has seen considerable effort for Western and other Asian cuisines (predominantly Japanese and Chinese), Indian cuisine has remained largely underrepresented in research.
Khana directly tackles this issue by providing a comprehensive and challenging benchmark. It aims to bridge the gap between academic research and practical development, serving as a valuable resource for researchers and developers alike who are keen on leveraging the rich culinary heritage of India in real-world applications.
Building the Khana Dataset
The creation of Khana involved a meticulous process of data collection and cataloguing. Images were gathered from popular search engines and online food delivery platforms such as Swiggy and Zomato using web crawlers. Duplicate images were carefully removed, and low-quality images were filtered out to ensure the dataset’s integrity.
A key feature of Khana is its innovative taxonomy, which organizes food items hierarchically based on their preparation methods, regional origins, and cultural significance. This structure provides well-defined categories and subcategories, such as ‘breakfast’, ‘main course’, ‘snacks’, and ‘beverages’, with specific dishes like dosa, biryani, gulab jamun, and chaas. The dataset also accounts for multilingual conventions, grouping varied Hinglish keywords for the same dish (e.g., ‘pani puri’, ‘pani poori’, ‘golgappa’). Manual verification by annotators further ensured label accuracy.
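The paper does not publish its full alias table, but the keyword-grouping step described above can be sketched as a simple lookup from raw search terms to a canonical dish label. The aliases below are illustrative examples (drawn from the dish names mentioned in this article), not the dataset's actual mapping:

```python
# Illustrative alias table: canonical label -> Hinglish/spelling variants.
# These entries are examples only, not Khana's actual mapping.
ALIASES = {
    "pani puri": ["pani puri", "pani poori", "golgappa"],
    "chaas": ["chaas", "chhaas", "buttermilk"],
    "biryani": ["biryani", "biriyani"],
}

# Invert the table into a direct lookup: raw keyword -> canonical label.
KEYWORD_TO_LABEL = {
    kw: label for label, variants in ALIASES.items() for kw in variants
}

def canonical_label(keyword: str) -> str:
    """Normalize a raw crawler keyword to its canonical dish label.
    Unknown keywords pass through unchanged."""
    return KEYWORD_TO_LABEL.get(keyword.strip().lower(), keyword.strip().lower())
```

In practice such a table would be built and checked by the human annotators the article mentions; the lookup itself is the trivial part.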
Dataset Statistics and Characteristics
The Khana dataset comprises around 131,000 images across 80 different classes, with each image standardized to 500×500 pixels. It is split into training, validation, and test sets with a 70%, 15%, and 15% distribution, respectively. While comprehensive, the dataset does exhibit an imbalanced class distribution, with popular dishes like masala dosa and biryani having more samples than niche items, a common challenge that may require data augmentation techniques for optimal model performance.
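The paper does not spell out the exact splitting procedure, but a per-class (stratified) 70/15/15 split is the natural choice for an imbalanced dataset like this one, since it preserves each class's share in every partition. A minimal sketch, under that assumption:

```python
import random

def stratified_split(samples_by_class, fracs=(0.70, 0.15, 0.15), seed=0):
    """Split each class independently into train/val/test so that all
    three partitions keep the (imbalanced) class distribution.
    `samples_by_class` maps a label to its list of image paths."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val, test = [], [], []
    for label, items in samples_by_class.items():
        items = items[:]          # copy before shuffling
        rng.shuffle(items)
        n_train = int(len(items) * fracs[0])
        n_val = int(len(items) * fracs[1])
        train += [(label, x) for x in items[:n_train]]
        val += [(label, x) for x in items[n_train:n_train + n_val]]
        test += [(label, x) for x in items[n_train + n_val:]]
    return train, val, test
```

With roughly 131,000 images, a 70/15/15 split works out to about 91,700 training, 19,650 validation, and 19,650 test images overall.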
Experimental Baselines and Promising Results
To establish initial benchmarks, the creators of Khana evaluated several state-of-the-art models, including Residual Networks (ResNet), EfficientNet, Vision Transformer (ViT), and ConvNeXT. These models, pre-trained on the extensive ImageNet dataset, were fine-tuned on Khana for image classification tasks.
The experimental analysis revealed that the ConvNeXT-S model achieved the highest performance, boasting a top-1 accuracy of 86.72% and a top-5 accuracy of 97.58%. This performance surpassed other leading models, demonstrating the dataset’s utility in pushing the boundaries of food recognition technology.
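Top-1 and top-5 accuracy are the standard classification metrics here: a prediction counts as a top-k hit if the true label appears among the model's k highest-scoring classes. A minimal pure-Python illustration of the computation:

```python
def top_k_accuracy(scores, labels, k=5):
    """Fraction of samples whose true label is among the k
    highest-scoring predicted classes.
    `scores` is a list of per-class score rows; `labels` the true indices."""
    hits = 0
    for row, label in zip(scores, labels):
        # indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)
```

Top-5 accuracy is especially informative for Khana, since visually similar dishes often land in each other's top predictions even when the top-1 guess is wrong.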
Looking Ahead
While Khana represents a significant leap forward, the researchers acknowledge limitations such as class imbalance and the need for more fine-grained distinctions in evaluation metrics. Future work includes expanding the dataset with more images for underrepresented categories, incorporating new cuisines, improving annotations, and exploring the potential of multi-modal Large Language Models (LLMs) for querying images and comparing embeddings.
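The embedding-comparison idea mentioned above is essentially nearest-neighbor retrieval over image embeddings. A minimal sketch using cosine similarity (the gallery names and vectors below are purely illustrative, not actual Khana embeddings):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_emb, gallery, k=3):
    """Return the names of the k gallery items most similar to the query.
    `gallery` maps an item name to its embedding vector."""
    ranked = sorted(
        gallery.items(),
        key=lambda kv: cosine_similarity(query_emb, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]
```

Real embeddings would come from a fine-tuned vision backbone (or a multi-modal model, as the authors suggest), and a large gallery would use an approximate nearest-neighbor index rather than a full sort.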
Khana is poised to empower research, fuel innovation, and celebrate the diversity and richness of Indian food, one pixel at a time. For more in-depth information, see the full research paper.