TLDR: This research introduces an explainable AI approach using counterfactual reasoning to predict antidepressant (SSRI vs. SNRI) selection for Major Depressive Disorder (MDD) patients. It identifies how specific symptom changes, measured by the HAM-D scale, causally influence medication choice at both individual and population levels, enhancing interpretability for clinical decision support systems.
The research paper “Explainable Counterfactual Reasoning in Depression Medication Selection at Multi-Levels (Personalized and Population)” delves into how artificial intelligence can assist medical professionals in selecting the most suitable medication for individuals suffering from Major Depressive Disorder (MDD). The study specifically examines two widely used classes of antidepressants: Selective Serotonin Reuptake Inhibitors (SSRIs) and Serotonin-Norepinephrine Reuptake Inhibitors (SNRIs).
A significant challenge in treating MDD is that a considerable number of patients do not respond adequately to their initial antidepressant, leading to persistent symptoms and increased healthcare expenses. This often stems from the difficulty in accurately predicting which medication will be most effective for a patient’s unique symptom profile. This paper proposes that AI can enhance this process by identifying specific patterns within individual symptom presentations.
The researchers employed a technique known as “explainable counterfactual reasoning,” which utilizes counterfactual explanations (CFs). This method helps to clarify how alterations in a patient’s symptoms, as quantified by the Hamilton Rating Scale for Depression (HAM-D), could lead to a different medication recommendation. Essentially, it addresses hypothetical scenarios, such as “If a patient’s anxiety symptoms were less severe, would a different antidepressant be suggested?” This provides deeper insights into the causal relationships influencing medication selection, moving beyond simple correlations.
The study used a dataset compiled from clinical trials, covering 1468 participants and focusing on their baseline HAM-D scores. The team evaluated 17 machine learning models on the task of predicting the medication class (SSRI or SNRI). Random Forest was the top performer, scoring roughly 0.85 on accuracy, F1 score, precision, recall, and ROC-AUC.
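To make the five reported metrics concrete, here is a minimal sketch that computes them from scratch for a binary SSRI/SNRI task. The labels and scores are made-up toy values, not the study's data, and the function names `binary_metrics` and `roc_auc` are illustrative assumptions rather than anything from the paper.

```python
# Toy illustration of the five metrics named above (standard library only).
# All data here is invented; 1 = SNRI, 0 = SSRI is an arbitrary encoding.

def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a correct pair."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]   # model's SNRI probability
y_pred = [1 if s >= 0.5 else 0 for s in scores]      # 0.5 decision threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold, while ROC-AUC summarizes ranking quality across all thresholds, which is why the two kinds of metric can differ on the same predictions.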
A core strength of this research is that it offers insights at both the individual and the population level. For personalized analysis, the method generates “sample-based counterfactual explanations.” For instance, if the model recommends an SSRI for a patient, the system can show the minimal changes in that patient's HAM-D item scores (e.g., a higher score on the appetite or sleep-onset-delay item) that would flip the recommendation to an SNRI. The search can also respect real-world constraints, for example by holding fixed a symptom (such as psychic anxiety) that cannot realistically change in the short term. These adjustments show quantitatively how changes in particular depressive symptoms lead to different treatment suggestions, linking symptom severity to medication class and helping clinicians see exactly which symptom changes would alter the recommendation.
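The idea of a minimal, constraint-aware counterfactual can be sketched with a brute-force search. This is not the authors' algorithm: the classifier below is a hypothetical weighted-sum rule standing in for the trained model, the weights and threshold are invented, and only four HAM-D items are included to keep the search small.

```python
# Brute-force counterfactual search over a toy, hypothetical classifier.
# Weights, threshold, and the patient are invented for illustration only.
from itertools import product

FEATURES = ["depressed_mood", "insomnia_early", "psychic_anxiety", "appetite"]
# Assumed maximum item scores for these four HAM-D items.
MAX_SCORE = {"depressed_mood": 4, "insomnia_early": 2,
             "psychic_anxiety": 4, "appetite": 2}
WEIGHTS = {"depressed_mood": 1.0, "insomnia_early": 1.5,
           "psychic_anxiety": 2.0, "appetite": 1.5}
THRESHOLD = 9.0

def predict(patient):
    """Stand-in model: recommend SNRI when the weighted score is high."""
    score = sum(WEIGHTS[f] * patient[f] for f in FEATURES)
    return "SNRI" if score >= THRESHOLD else "SSRI"

def minimal_counterfactual(patient, immutable=()):
    """Find the profile closest to `patient` (L1 distance) that flips the
    prediction, keeping every feature listed in `immutable` unchanged."""
    original = predict(patient)
    ranges = [[patient[f]] if f in immutable else range(MAX_SCORE[f] + 1)
              for f in FEATURES]
    best, best_cost = None, None
    for values in product(*ranges):
        candidate = dict(zip(FEATURES, values))
        if predict(candidate) == original:
            continue  # not a counterfactual: recommendation is unchanged
        cost = sum(abs(candidate[f] - patient[f]) for f in FEATURES)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost

patient = {"depressed_mood": 3, "insomnia_early": 0,
           "psychic_anxiety": 2, "appetite": 0}

free_cf, free_cost = minimal_counterfactual(patient)
fixed_cf, fixed_cost = minimal_counterfactual(
    patient, immutable=("psychic_anxiety",))
print(predict(patient), free_cf, free_cost)
print(fixed_cf, fixed_cost)
```

In this toy setup, freezing psychic anxiety forces the search to find a more costly change in other items, which mirrors how an actionability constraint changes the explanation a clinician would see. Exhaustive enumeration only works for a handful of features; a practical system over all HAM-D items would need a smarter search.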
To further illuminate the model’s decision-making process, the researchers calculated “local feature importance.” This metric indicates which individual symptoms are most influential in predicting medication for a specific patient. For example, symptoms such as depressed mood, psychomotor agitation, loss of appetite, tiredness/pain, and suicidal thoughts or actions were identified as highly impactful for a particular patient. This capability enables the development of highly customized treatment strategies.
At the population level, the study determined “global feature importance” by aggregating the local importance scores from all patients. This analysis revealed that “depressed mood” (HAM-D01) was the most significant symptom influencing medication selection across the entire dataset, while “weight loss” (HAM-D16) had the least impact. These findings are consistent with clinical understanding, reinforcing the importance of focusing on key symptoms that clinicians prioritize when making treatment decisions.
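The aggregation step can be illustrated with a short sketch. The source says only that local scores are aggregated across patients, so the choice of averaging absolute values is an assumption here, and the per-patient scores are invented toy numbers for three HAM-D items.

```python
# Aggregating per-patient (local) importance into a population-level
# (global) ranking. Averaging absolute values is an assumed aggregator;
# the scores below are invented, not results from the study.

def global_importance(local_scores):
    """Average absolute local importance per feature across patients and
    return (feature, score) pairs sorted from most to least important."""
    totals = {}
    for patient_scores in local_scores:
        for feature, value in patient_scores.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    n_patients = len(local_scores)
    means = {feature: total / n_patients for feature, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)

local_scores = [
    {"HAM-D01": 0.50, "HAM-D09": 0.25, "HAM-D16": 0.00},  # patient 1
    {"HAM-D01": 0.75, "HAM-D09": 0.25, "HAM-D16": 0.25},  # patient 2
    {"HAM-D01": 0.25, "HAM-D09": 0.50, "HAM-D16": 0.00},  # patient 3
]

ranking = global_importance(local_scores)
for feature, score in ranking:
    print(f"{feature}: {score:.3f}")
```

A feature can rank first globally without being the most influential item for every individual patient (patient 3 above is an example), which is why the article reports local and global importance separately.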
The authors highlight that this work contributes to the development of quantifiable AI Clinical Decision Support Systems (CDSSs) for MDD medication selection. It offers personalized treatment recommendations and aims to build trust in AI by providing transparent explanations. While the findings are promising, the study acknowledges certain limitations, such as the dataset potentially not fully representing all patient groups and the computational intensity of the algorithm. Future research will focus on validating these findings with more diverse patient cohorts and optimizing the algorithms for practical clinical deployment.
For more detailed information, you can read the full research paper available at arXiv:2508.17207.


