Recommender Systems: A Call for Fundamental Rethinking, Fifteen Years On

TLDR: This research paper argues that despite significant algorithmic advancements, recommender systems research continues to operate on flawed foundations identified over a decade ago. The authors contend that the field’s focus on narrow accuracy metrics, inadequate evaluation practices, and a lack of transparency has led to sophisticated systems built on unstable ground. They highlight new issues like environmental impact and ethical fragility, advocating for a fundamental shift towards human-centered, sustainable, and epistemically humble research that prioritizes understanding, reflection, and accountability over mere optimization.

Fifteen years ago, a stark warning was issued to the field of recommender systems: “We’re doing it all wrong.” Today, a new research paper titled “We’re Still Doing It (All) Wrong: Recommender Systems, Fifteen Years Later” by Alan Said, Maria Soledad Pera, and Michael D. Ekstrand argues that this critique remains as relevant as ever. Despite significant advancements in algorithms and technology, the fundamental issues identified back then have not been corrected; instead, new layers of complexity have been built upon the same shaky foundations.

The Enduring Flaws in Recommender Systems Research

The core problem, as highlighted by the authors, stems from a persistent misinterpretation of data and methodological shortcuts. For instance, ratings, which are inherently ordinal (representing a sequence or order), are often treated as interval data (where differences between values are meaningful), leading to the application of inappropriate statistical methods. This fundamental misunderstanding has driven a narrow focus on prediction accuracy, where researchers often chase minuscule performance gains in metrics like RMSE (Root Mean Square Error) or nDCG (Normalized Discounted Cumulative Gain), without truly improving the user experience.
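To make the ordinal-versus-interval distinction concrete, here is a minimal Python sketch (toy ratings, not data from the paper): RMSE assumes the *gap* between rating values is meaningful (an interval assumption), while a rank correlation such as Spearman's uses only their order (the ordinal view), and the two can disagree about which predictor is better.

```python
import numpy as np
from math import sqrt

# Toy data (illustrative only): true ratings and two sets of
# predictions on a 1-5 star scale.
true = np.array([1, 2, 3, 4, 5])
pred_a = np.array([2, 3, 4, 5, 6])              # shifted up, order preserved
pred_b = np.array([1.5, 1.0, 3.0, 4.5, 4.0])    # close values, order scrambled

def rmse(y, p):
    # Interval assumption: the *magnitude* of each error is meaningful.
    return sqrt(np.mean((y - p) ** 2))

def spearman(y, p):
    # Ordinal view: only the ranking induced by the values matters.
    ry = np.argsort(np.argsort(y)).astype(float)
    rp = np.argsort(np.argsort(p)).astype(float)
    ry -= ry.mean()
    rp -= rp.mean()
    return float(ry @ rp / np.sqrt((ry @ ry) * (rp @ rp)))
```

On this toy data the uniformly shifted predictions (`pred_a`) have the worse RMSE but a perfect ranking, while the value-accurate but order-scrambled predictions (`pred_b`) win on RMSE; an interval-based metric can thus reward the wrong behaviour for what is, in practice, a ranking task.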

The paper points out that evaluation practices are largely dataset-driven, meaning models are typically judged on their ability to predict what users have already done, rather than their capacity to genuinely enhance future user experiences or help users discover new, relevant items they haven’t encountered before. This approach creates a disconnect between what is optimized in research and what truly benefits users in the real world.
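The gap between predicting logged behaviour and supporting discovery is easy to see in a toy offline evaluation. The sketch below (hypothetical interaction logs, not the paper's setup) scores a simple popularity baseline with a leave-one-out hit rate: a model can look strong on such a metric simply by echoing what most users already did.

```python
from collections import Counter

# user -> chronologically ordered list of consumed item ids (toy data)
logs = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b", "d"],
    "u3": ["a", "c", "b"],
}

# Leave-one-out split: each user's last interaction is the "test" item.
train = {u: items[:-1] for u, items in logs.items()}
test = {u: items[-1] for u, items in logs.items()}

# A popularity baseline recommends the globally most frequent items.
pop = Counter(i for items in train.values() for i in items)

def hit_rate_at_k(k=2):
    # Fraction of users whose held-out item appears in the top-k
    # recommendations (excluding items they already consumed).
    hits = 0
    for u, held_out in test.items():
        seen = set(train[u])
        recs = [i for i, _ in pop.most_common() if i not in seen][:k]
        hits += held_out in recs
    return hits / len(test)
```

A respectable hit rate here says only that the baseline reproduces the popularity structure already present in the log; it says nothing about whether a user would have been better served by something they had not yet encountered.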

Reproducibility and New Challenges

Beyond these long-standing issues, the research also sheds light on problems with reproducibility. Even simple algorithms can yield inconsistent results across different frameworks due to varying default settings, metrics, and evaluation logic. Details like data preprocessing, timestamp filtering, or risks of data leakage between training and testing sets are often overlooked or not disclosed, making it difficult to compare and build upon existing work reliably.
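One of these reproducibility pitfalls, leakage between training and test sets, often comes down to split logic that frameworks implement differently. Here is a minimal Python sketch (synthetic interaction log; the interleaved "shuffle" is a deterministic stand-in for a random split) showing how a split that ignores timestamps lets future interactions into training, while a temporal split does not.

```python
# (user, item, timestamp) interactions, in chronological order (synthetic)
events = [("u1", f"i{k}", t) for k, t in enumerate(range(10))]

def temporal_split(evts, test_frac=0.3):
    # Train on the earliest interactions, test on the latest.
    evts = sorted(evts, key=lambda e: e[2])
    cut = int(len(evts) * (1 - test_frac))
    return evts[:cut], evts[cut:]

def shuffled_split(evts, test_frac=0.3):
    # Deterministic stand-in for a random shuffle: interleave the log.
    mixed = evts[1::2] + evts[0::2]
    cut = int(len(mixed) * (1 - test_frac))
    return mixed[:cut], mixed[cut:]

def leaked(train, test):
    # Count test events that occur before the last training event: the
    # model has, in effect, already seen part of the "future" it is
    # being judged on.
    last_train_t = max(t for _, _, t in train)
    return sum(t < last_train_t for _, _, t in test)
```

Because defaults like these are rarely reported, two papers using the "same" dataset and metric can be evaluating under materially different conditions.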

The authors identify “new sins” that have emerged with the field’s evolution:

  • Environmental Neglect: Modern recommender systems, especially those leveraging large language models (LLMs), are increasingly resource-intensive. Yet, there’s a significant lack of transparency regarding their compute costs and carbon footprint.

  • Unchecked Reliance on LLMs: LLMs are often integrated without verifying that they are actually necessary or that their benefits outweigh their substantial costs; in some cases they perform only on par with far less resource-intensive alternatives.

  • Ethical Fragility: While algorithmic fairness and user autonomy are discussed, they are rarely integrated into the core design or evaluation of these systems, often being considered only as an afterthought.

  • User Disempowerment: Recommender systems frequently “push” content rather than “negotiate” with users, leading to a lack of transparency, control, and genuine interaction from the user’s perspective.

A Vision for “Doing It Right”

The paper argues that meaningful change requires more than just new metrics or better tools; it demands a fundamental reframing of what recommender systems research is for, who it serves, and how knowledge is produced and validated. “Doing it right” means asking better questions and reflecting deeply on the answers. This includes:

  • Utilizing diverse datasets and evaluating systems across varied contexts.

  • Focusing on human-aligned evaluation that prioritizes meaningful outcomes over mere offline precision.

  • Ensuring transparent reporting of all experimental details, from preprocessing to compute costs and code.

  • Adopting sustainable practices, making energy consumption a primary consideration.

  • Embracing epistemic humility, acknowledging the inherent noise, preference volatility, and limitations of modeling.

  • Grounding research in normative and human goals, explicitly connecting technical work to broader societal values like justice, inclusion, and well-being.

Community-led initiatives, such as the AltRecSys, NORMalize, and RecSoGood workshops, are attempting to shift this paradigm by encouraging critical thinking, value-sensitive research, and participatory design approaches where users are involved as co-designers and co-evaluators. Platforms like Informfully and POPROX are also emerging to support more holistic, in-vivo evaluations.

Ultimately, the authors contend that recommender systems are not just algorithms; they are sociotechnical interventions that profoundly shape what people see, believe, and desire. Therefore, research must move beyond narrow technical optimization to a broader understanding of influence and impact, ensuring these systems truly serve people and society. You can read the full research paper for more details: We’re Still Doing It (All) Wrong: Recommender Systems, Fifteen Years Later.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
