TLDR: BiPETE is a novel deep learning model that uses a bi-positional embedding strategy to analyze electronic health records (EHRs) and predict the risk of alcohol and substance use disorders (ASUD) in patients with depression or PTSD. It integrates relative and absolute temporal information from patient visits and achieves AUROC and AUPRC scores above 90% without large-scale pretraining. The model also provides interpretable insights by identifying specific clinical factors and medications associated with increased or decreased ASUD risk, which makes it a practical tool for early intervention.
Predicting the risk of developing alcohol and substance use disorders (ASUD) is a critical challenge in healthcare, especially for individuals already struggling with mental health conditions like depression and post-traumatic stress disorder (PTSD). Electronic Health Records (EHRs) contain a wealth of information that could help, but their complex, irregular, and time-sensitive nature makes them difficult for traditional models to interpret effectively.
Researchers have introduced a new deep learning model called BiPETE, which stands for Bi-Positional Embedding Transformer Encoder. It is designed to improve the accuracy of ASUD risk prediction by better capturing the temporal patterns within a patient’s medical history. Unlike many other advanced models, BiPETE does not require extensive pretraining on massive datasets, which makes it easier to deploy in a variety of clinical settings.
The core strength of BiPETE lies in its unique approach to handling time-related information in EHRs. It uses a dual positional encoding strategy, combining two types of embeddings: Rotary Positional Embeddings (RoPE) and Sinusoidal Positional Embeddings (SPE). RoPE helps the model understand the relative time differences between patient visits, which can vary greatly. For example, it can discern if two events happened 5 days apart versus 50 days apart. SPE, on the other hand, maintains the absolute chronological order of visits, ensuring the model knows which visit came first, second, and so on. This combination allows BiPETE to capture both the precise timing and the sequence of medical events, offering a more nuanced understanding of a patient’s health trajectory.
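To make the two encodings concrete, here is a minimal NumPy sketch under stated assumptions. Sinusoidal embeddings are added to each visit's representation using the visit's index (absolute order), and RoPE rotates the attention queries and keys using days since the first visit (relative timing). The function names, the day-based positions, the random projections, and the additive combination are illustrative choices, not BiPETE's published implementation. The final check shows the property that makes RoPE relative: shifting every visit date by the same amount leaves the attention scores unchanged, so only the gaps between visits matter.

```python
import numpy as np

def sinusoidal_embedding(positions, dim):
    """Absolute (sinusoidal) positional embedding for integer visit indices."""
    pos = np.asarray(positions, dtype=float)[:, None]        # (n, 1)
    i = np.arange(dim // 2)[None, :]                          # (1, dim/2)
    angles = pos / np.power(10000.0, 2 * i / dim)             # (n, dim/2)
    pe = np.zeros((pos.shape[0], dim))
    pe[:, 0::2] = np.sin(angles)                              # even features: sine
    pe[:, 1::2] = np.cos(angles)                              # odd features: cosine
    return pe

def apply_rope(x, positions, base=10000.0):
    """Rotary embedding: rotate each (even, odd) feature pair by position * frequency."""
    dim = x.shape[1]
    pos = np.asarray(positions, dtype=float)[:, None]         # (n, 1)
    freqs = 1.0 / np.power(base, np.arange(0, dim, 2) / dim)  # (dim/2,)
    theta = pos * freqs[None, :]                              # (n, dim/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

rng = np.random.default_rng(0)
dim = 8
visit_days = np.array([0, 5, 55])           # hypothetical days since first visit
visit_index = np.arange(len(visit_days))    # 0, 1, 2: absolute visit order

codes = rng.normal(size=(3, dim))           # stand-in for per-visit code embeddings
h = codes + sinusoidal_embedding(visit_index, dim)   # SPE: absolute order
W_q, W_k = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))  # hypothetical projections

def attention_scores(days):
    q = apply_rope(h @ W_q, days)           # RoPE on queries
    k = apply_rope(h @ W_k, days)           # RoPE on keys
    return q @ k.T

scores = attention_scores(visit_days)
shifted = attention_scores(visit_days + 100)   # same gaps, later calendar dates
print(np.allclose(scores, shifted))            # True: only relative timing matters
```

Because the rotation for position m and the rotation for position n combine into a single rotation by n minus m inside the dot product, the 5-day and 50-day gaps in this toy sequence are what the attention scores respond to, while the sinusoidal term still tells the model which visit came first.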
To make the EHR data more manageable and meaningful, the researchers preprocessed it by grouping diagnosis codes into broader categories. This significantly reduced the complexity and redundancy of the vocabulary, allowing the model to focus on more impactful patterns. The model was trained and evaluated on EHR data from two specific mental health cohorts: patients with depressive disorders and those with PTSD, sourced from the National Institutes of Health (NIH) All of Us (AoU) program.
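The exact grouping scheme is not described here, so the sketch below assumes one common option: truncating ICD-10-CM diagnosis codes to their three-character category (for example, `F32.1` becomes `F32`). Other groupings, such as PheCodes or AHRQ's CCSR categories, follow the same idea using a lookup table. The `group_icd10` helper and the example visits are hypothetical.

```python
def group_icd10(code: str) -> str:
    """Hypothetical grouping: keep the 3-character ICD-10-CM category ('F32.1' -> 'F32')."""
    return code.replace(".", "").upper()[:3]

# Hypothetical visit sequence for one patient; codes chosen for illustration only.
visits = [
    ["F32.0", "M54.5"],
    ["F32.1", "M54.50", "K21.9"],
    ["F43.10", "K21.0"],
]

# Map every code to its category and drop duplicates within a visit.
grouped_visits = [sorted({group_icd10(c) for c in visit}) for visit in visits]

raw_vocab = {c for visit in visits for c in visit}
grouped_vocab = {c for visit in grouped_visits for c in visit}

print(grouped_visits)                             # [['F32', 'M54'], ['F32', 'K21', 'M54'], ['F43', 'K21']]
print(len(raw_vocab), "->", len(grouped_vocab))   # 7 -> 4
```

Shrinking seven distinct codes to four categories mirrors, at toy scale, the vocabulary reduction described above. The trade-off is that finer clinical distinctions, such as a mild versus a moderate depressive episode (`F32.0` versus `F32.1`), are merged into one token.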
BiPETE demonstrated strong performance in predicting ASUD risk. In the depression cohort, it achieved an AUROC (Area Under the Receiver Operating Characteristic curve) of 96.46% and an AUPRC (Area Under the Precision-Recall Curve) of 93.18%. In the PTSD cohort, the scores were similarly high, with an AUROC of 96.50% and an AUPRC of 94.04%. BiPETE outperformed baseline models such as BiGRU, Logistic Regression, and Bernoulli Naive Bayes, which highlights the effectiveness of its dual positional encoding strategy.
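AUROC and AUPRC measure different things, and a small pure-Python sketch makes the definitions concrete. AUROC is the probability that a randomly chosen positive patient is scored above a randomly chosen negative one. AUPRC, computed here as average precision, averages the precision at the rank of each true positive, so its chance-level baseline is the positive-class prevalence rather than 0.5. The labels and scores below are made up; in practice, scikit-learn's `roc_auc_score` and `average_precision_score` compute these metrics.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auprc(labels, scores):
    """Average precision: mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)

# Made-up labels (1 = developed ASUD) and model risk scores for six patients.
labels = [1, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]

print(round(auroc(labels, scores), 4))   # 0.6667
print(round(auprc(labels, scores), 4))   # 0.7556
```

Here six of the nine positive/negative pairs are ordered correctly, giving an AUROC of 6/9, while the precisions at the three positives (1, 2/3, 3/5) average to 34/45. Scores in the mid-90s, as reported for BiPETE, mean nearly every such pair is ranked correctly.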
Beyond just prediction, BiPETE also offers valuable insights into *why* it makes certain predictions. Using a method called Integrated Gradients, the researchers identified specific clinical features that either increase or decrease the risk of ASUD. For patients with depression, indicators of higher ASUD risk included abnormal lymphocyte and coagulation markers (suggesting chronic inflammation and liver dysfunction), and medications like Vancomycin, Metronidazole, and Acyclovir. Comorbidities such as chronic pain, musculoskeletal injuries, and gastrointestinal conditions also increased risk. Conversely, lower ASUD risk was associated with normal red blood cell indices, medications like Naloxone, Cefazolin, Amoxicillin, and certain chronic but manageable conditions that encourage regular healthcare engagement.
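Integrated Gradients attributes a prediction to input features by averaging the model's gradient along a straight path from a baseline input to the actual input, then scaling by the difference between the two. The minimal NumPy sketch below applies it to a toy logistic-regression risk model with analytic gradients. BiPETE itself is a transformer, where the gradients would come from automatic differentiation, and the all-zeros baseline, weights, and features here are assumptions for illustration. A positive attribution means the feature pushed the predicted risk up, and a negative one means it pulled the risk down. The attributions should also approximately sum to the difference between the prediction at the input and at the baseline (the completeness property).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy risk model (logistic regression) standing in for the real network."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to its input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """IG_i = (x_i - x'_i) * average gradient along baseline -> x (midpoint Riemann sum)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), w, b) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and a hypothetical patient's three input features.
w = np.array([1.5, -2.0, 0.5])
b = -0.3
x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)          # assumed "no information" reference input

attr = integrated_gradients(x, baseline, w, b)
print(attr)                     # positive entries raised the predicted risk, negative ones lowered it
print(np.isclose(attr.sum(), model(x, w, b) - model(baseline, w, b), atol=1e-6))  # True
```

In this toy example, features 0 and 2 receive positive attributions and feature 1 a negative one, analogous to the risk-increasing and risk-decreasing factors listed above. Like those clinical findings, the attributions describe what the model relies on, not causal effects.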
In the PTSD cohort, higher ASUD risk was linked to abnormalities in hematological and metabolic markers (such as serum albumin and mean platelet volume) and to medications such as Hydrocodone, Oxybutynin, and Aripiprazole. Neurological conditions and chronic pain-related issues also contributed to increased risk. Factors associated with lower ASUD risk included adequate levels of vitamin B12 and ferritin, and medications such as Lidocaine, Hydroxyzine, Lamotrigine, and Escitalopram. As in the depression cohort, certain comorbidities requiring structured medical oversight, such as bone disorders or COVID-19, were also associated with lower risk.
This research presents a practical and interpretable framework for disease risk prediction using EHR data. By effectively modeling the temporal dynamics of patient records, BiPETE can achieve strong performance without relying on large-scale pretraining, making it a valuable tool for clinicians. The insights gained from its interpretability features can help in early identification and personalized interventions for patients with co-occurring psychiatric and substance use disorders. You can read the full research paper here.


