TLDR: EHSAN is a framework that combines ChatGPT-generated labels with human review to build what its authors describe as the first explainable dataset for aspect-based sentiment analysis of Arabic patient feedback in healthcare. It addresses challenges such as dialect diversity and scarce labeled data. The study found that models trained on AI-generated labels perform nearly as well as models trained on fully human-reviewed labels, so Arabic-specific models can analyze sentiment and aspects in patient reviews accurately with only partial human oversight.
Patient feedback is crucial for improving healthcare quality, but analyzing free-form reviews, especially in Arabic, is difficult. Dialects vary widely, and sentiment labels tied to specific aspects of care are lacking, which hinders automated assessment. To address this, researchers have introduced EHSAN, a data-focused system that combines ChatGPT labeling with human review to create the first explainable dataset for Arabic aspect-based sentiment analysis in healthcare.
EHSAN works by taking Arabic patient reviews and breaking them down into individual sentences. Each sentence is then labeled with a specific aspect of healthcare it refers to (e.g., medical staff, billing, facilities) and the sentiment expressed (positive, negative, or neutral). A unique feature of EHSAN is that ChatGPT also provides a brief explanation for each label, making the AI’s decisions more transparent and verifiable.
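As a rough illustration of what one labeled sentence might look like, here is a minimal Python sketch. The class name, field names, and the three-item aspect set are assumptions made for this example (the article names only a few aspects), not the schema of the published dataset.

```python
from dataclasses import dataclass

# Hypothetical label sets: the article gives these aspects only as examples.
ASPECTS = {"medical staff", "billing", "facilities"}
SENTIMENTS = {"positive", "negative", "neutral"}


@dataclass(frozen=True)
class LabeledSentence:
    """One review sentence with its aspect, sentiment, and explanation."""

    sentence: str
    aspect: str
    sentiment: str
    explanation: str  # short rationale that accompanies the label

    def __post_init__(self):
        # Reject labels outside the allowed sets and empty explanations.
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {self.aspect!r}")
        if self.sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {self.sentiment!r}")
        if not self.explanation.strip():
            raise ValueError("explanation must not be empty")


# Invented example sentence ("The doctor was very cooperative").
example = LabeledSentence(
    sentence="الطبيب كان متعاونا جدا",
    aspect="medical staff",
    sentiment="positive",
    explanation="The reviewer praises the doctor's helpfulness.",
)
```

Keeping the explanation next to the label lets a human reviewer check the reasoning for each sentence rather than only the final label.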
To understand how much human involvement is needed, the researchers created three versions of the training data: one fully reviewed by humans, one with 50% human review, and one relying entirely on AI-generated labels. They then used these datasets to train two types of AI models: AraBERT, which is specifically designed for Arabic, and DistilBERT, a more general multilingual model.
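The three supervision levels can be sketched as follows. This is a simplified illustration, not the authors' code: the function name, the record fields, and the choice of a seeded random sample to pick which sentences receive human review are all assumptions, since the article does not say how the 50% subset was selected.

```python
import random


def build_variant(records, review_fraction, seed=0):
    """Return training rows where a fraction of labels are human-reviewed.

    records: list of dicts with 'text', 'ai_label', and 'human_label' keys
    (hypothetical field names). Reviewed rows take the human label; the
    rest keep the AI-generated label.
    """
    if not 0.0 <= review_fraction <= 1.0:
        raise ValueError("review_fraction must be between 0 and 1")
    n_reviewed = round(len(records) * review_fraction)
    rng = random.Random(seed)  # seeded so the split is reproducible
    reviewed = set(rng.sample(range(len(records)), n_reviewed))
    return [
        {
            "text": rec["text"],
            "label": rec["human_label"] if i in reviewed else rec["ai_label"],
            "reviewed": i in reviewed,
        }
        for i, rec in enumerate(records)
    ]


# Toy records; labels are invented for illustration.
records = [
    {"text": "s1", "ai_label": "positive", "human_label": "positive"},
    {"text": "s2", "ai_label": "neutral", "human_label": "negative"},
    {"text": "s3", "ai_label": "negative", "human_label": "negative"},
    {"text": "s4", "ai_label": "positive", "human_label": "neutral"},
]

variants = {
    "fully_reviewed": build_variant(records, 1.0),
    "half_reviewed": build_variant(records, 0.5),
    "ai_only": build_variant(records, 0.0),
}
```

Training the same model on each variant and comparing the scores then isolates how much the human review contributes.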
The results favor the hybrid approach. The Arabic-specific model, AraBERT, performed well even when trained mostly on AI-generated labels with minimal human supervision. Performance dropped only slightly when the model was trained on ChatGPT’s labels alone rather than on fully human-reviewed data. This suggests that AI can be a reliable tool for creating high-quality datasets, especially in languages where human-annotated data is scarce.
Another important finding was that simplifying the aspect categories improved the models’ performance. When the number of aspect categories was reduced from 17 detailed ones to 6 broader ones, classification accuracy increased noticeably. This indicates that a coarser, more focused label set is easier for the models to learn.
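Merging fine-grained aspects into broader ones amounts to a label mapping applied before training. The sketch below shows the idea with a handful of invented category names; the article does not list the actual 17 or 6 categories, so every name here is hypothetical.

```python
# Hypothetical fine-to-coarse mapping; the real category names are not
# given in the article.
FINE_TO_COARSE = {
    "doctors": "medical staff",
    "nurses": "medical staff",
    "billing": "administration",
    "appointments": "administration",
    "cleanliness": "facilities",
    "parking": "facilities",
}


def coarsen(labels, mapping=FINE_TO_COARSE):
    """Map each fine-grained aspect label to its broader category."""
    missing = sorted(set(labels) - set(mapping))
    if missing:
        # Fail loudly instead of silently dropping unmapped labels.
        raise KeyError(f"no coarse category for: {missing}")
    return [mapping[label] for label in labels]


fine_labels = ["nurses", "parking", "billing", "doctors"]
coarse_labels = coarsen(fine_labels)
```

With fewer classes, each class has more training examples and fewer near-duplicate neighbors, which is one plausible reason a classifier would find the coarser set easier.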
While AI-generated labels proved highly effective, human review still added value, particularly for refining very specific or nuanced categories. A hybrid approach that combines AI’s efficiency with targeted human validation appears to be a cost-effective way to build robust datasets. The study also highlighted the importance of language-specific models: AraBERT consistently outperformed the general-purpose DistilBERT, a gap attributed to its better handling of Arabic nuances.
The EHSAN framework offers a practical and scalable solution for analyzing patient feedback in Arabic, providing detailed and explainable insights into patient experiences. This can help hospitals and healthcare providers make data-driven improvements. Future work will explore applying this framework to different Arabic dialects and healthcare settings, refining AI prompting strategies, and integrating AI explanations directly into model training for even greater transparency.
The anonymized EHSAN dataset and experimental code are publicly available for further research and development. You can find more details in the full research paper: EHSAN: Leveraging ChatGPT in a Hybrid Framework for Arabic Aspect-Based Sentiment Analysis in Healthcare.


