TLDR: A study by Rodrigo Tertulino and Ricardo Almeida applied a multi-level machine learning approach to Brazil’s SAEB microdata to identify the factors that influence student performance. Their Random Forest model achieved 90.2% accuracy. Explainable AI (XAI) analysis showed that the school’s average socioeconomic level is the dominant predictor, indicating that systemic factors have a greater impact on academic outcomes than isolated individual characteristics. The research offers actionable insights for policymakers and school leaders who want to improve educational equity by addressing school-level disparities.
Understanding what truly drives student performance in basic education is a critical challenge, especially in a diverse country like Brazil. A recent study by Rodrigo Tertulino and Ricardo Almeida delves into this complex issue, using a sophisticated machine learning approach to analyze vast microdata from Brazil’s System of Assessment of Basic Education (SAEB). Their findings offer profound insights, suggesting that academic success is less about individual student traits and more about the broader school environment and socioeconomic context.
Unpacking the Data: A Multi-Level Approach
The researchers developed a unique multi-level machine learning model, integrating four distinct data sources from the SAEB assessment: student socioeconomic characteristics, teacher professional profiles, school indicators, and director management profiles. This comprehensive dataset allowed for a holistic view of the factors at play, moving beyond isolated variables to understand their intricate interplay.
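To make the multi-level idea concrete, here is a minimal Python sketch of how student records might be enriched with teacher, school, and director attributes. The join keys (`class_id`, `school_id`) and every field name are hypothetical placeholders; the article does not describe the actual SAEB identifiers or the authors' merging procedure.

```python
# Toy records with hypothetical field names (not the real SAEB columns).
students = [
    {"student_id": 1, "school_id": "A", "class_id": "A1", "mother_education": 3},
    {"student_id": 2, "school_id": "A", "class_id": "A1", "mother_education": 1},
    {"student_id": 3, "school_id": "B", "class_id": "B1", "mother_education": 2},
]
# Higher-level tables, keyed by the level they describe (assumed keys).
teachers = {"A1": {"teacher_experience_years": 12}, "B1": {"teacher_experience_years": 4}}
schools = {"A": {"school_ses_level": 5}, "B": {"school_ses_level": 2}}
directors = {"A": {"director_years_in_role": 7}, "B": {"director_years_in_role": 1}}


def build_multilevel_rows(students, teachers, schools, directors):
    """Attach teacher, school, and director attributes to each student row."""
    rows = []
    for student in students:
        row = dict(student)  # copy so the input records stay untouched
        row.update(teachers.get(student["class_id"], {}))
        row.update(schools.get(student["school_id"], {}))
        row.update(directors.get(student["school_id"], {}))
        rows.append(row)
    return rows


rows = build_multilevel_rows(students, teachers, schools, directors)
```

Each resulting row describes one student together with the context they share with classmates, which is what lets a single model weigh individual and school-wide variables side by side.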
The Power of Prediction: Random Forest Leads the Way
To identify the most effective predictive model, the study compared four tree-based ensemble algorithms: Random Forest, XGBoost, LightGBM, and CatBoost. Random Forest performed best, with 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. In practical terms, the model correctly classified a 9th-grade or high school student as performing above or below average in about 9 out of 10 cases.
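Accuracy and AUC measure different things, which is why the two figures are not the same. The sketch below computes both on made-up labels and scores, assuming the usual ROC definition of AUC: the probability that a randomly chosen above-average student receives a higher score than a randomly chosen below-average one. The data and the 0.5 threshold are illustrative assumptions, not values from the study.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of cases where the thresholded score matches the true label."""
    predictions = [1 if score >= threshold else 0 for score in y_score]
    hits = sum(pred == label for pred, label in zip(predictions, y_true))
    return hits / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for label, s in zip(y_true, y_score) if label == 1]
    negatives = [s for label, s in zip(y_true, y_score) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = above average, 0 = below average.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1, 0.95, 0.35]

acc = accuracy(y_true, y_score)  # 8 of 10 correct at the 0.5 threshold
auc = roc_auc(y_true, y_score)   # 24 of 25 pairs ranked correctly
```

Here accuracy is 0.80 while AUC is 0.96: two students fall on the wrong side of the threshold, yet the scores still rank almost every above-average student ahead of every below-average one.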
Beyond Prediction: Explaining What Matters Most
The study didn’t stop at predicting performance; it also sought to explain *why* the model made its predictions. Using Explainable AI (XAI) techniques, specifically SHAP (SHapley Additive exPlanations), the researchers ranked the factors by their contribution to the model's output. The school’s average socioeconomic level emerged as the single strongest predictor of student performance. This finding indicates that systemic factors, such as the collective background of students within a school, have a greater impact than individual characteristics alone.
Other significant factors included parental education levels, access to home resources (like computers and the number of bedrooms), the percentage of teachers with adequate training, and the student participation rate in the SAEB test. Individual teacher characteristics, such as years of experience, also appeared among the predictors, but their influence was weaker than that of the broader school-wide indicators.
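The idea behind SHAP can be illustrated with exact Shapley values on a tiny example. The sketch below is not the SHAP library and not the paper's Random Forest: the scoring function, feature names, and baseline are hypothetical, and a single reference student stands in for the background data a real explainer would average over. It computes each feature's average marginal contribution across all orderings in which features can be switched from the baseline to the student's actual values.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)  # start from the reference student
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # reveal one feature at a time
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: totals[name] / len(orderings) for name in names}


def toy_model(x):
    """Hypothetical score with one interaction term (illustration only)."""
    return (0.5 * x["school_ses"]
            + 0.2 * x["mother_education"]
            + 0.1 * x["school_ses"] * x["home_computer"])


instance = {"school_ses": 4.0, "mother_education": 3.0, "home_computer": 1.0}
baseline = {"school_ses": 2.0, "mother_education": 1.0, "home_computer": 0.0}

phi = shapley_values(toy_model, instance, baseline)
top_feature = max(phi, key=phi.get)
```

In this toy setup the contributions come out to roughly 1.1 for `school_ses`, 0.4 for `mother_education`, and 0.3 for `home_computer`, and they sum to the gap between the student's score and the baseline score. That additivity is what makes it possible to rank features by their share of a prediction, as the study does at much larger scale.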
Actionable Insights for Educational Equity
The implications of these findings are substantial for policymakers and school leaders in Brazil. For policymakers, the study provides data-driven evidence to justify and design policies that promote equitable resource distribution. Instead of equal allocation, resources can be strategically directed to schools with lower socioeconomic profiles, including investments in infrastructure, qualified teachers, and pedagogical support. The model also offers a tool for longitudinal policy evaluation, allowing officials to track the effectiveness of interventions over time.
School leaders can leverage these insights for targeted interventions. By understanding the specific drivers of underperformance within their schools, they can move beyond generic tutoring to create tailored programs. For instance, if low maternal education is a significant factor for a group of students, specific family outreach programs could be developed. The study ultimately advocates for structural policies that reduce systemic disparities between schools, rather than focusing solely on individual-level programs.
This research underscores that academic performance is a systemic phenomenon, deeply tied to the school’s ecosystem. It provides an interpretable, data-driven tool to inform policies aimed at fostering educational equity by addressing disparities between schools. You can read the full research paper for more details at https://arxiv.org/pdf/2510.22266.


