TLDR: Initial findings from a University of Sheffield project, funded by the Economic and Social Research Council, indicate that generative artificial intelligence, such as large language models like ChatGPT, can evaluate the quality of academic papers more effectively than conventional citation data. This points to significant potential for integrating AI into national research assessment exercises such as the Research Excellence Framework.
A research project still in its early stages has produced evidence that generative artificial intelligence (AI) can judge the quality of academic research better than traditional citation data. Led by Professor Mike Thelwall, professor of data science at the University of Sheffield, the initiative is funded by the Economic and Social Research Council (ESRC).
The project investigates how well large language models (LLMs) such as ChatGPT can assess the quality of academic journal articles. Professor Thelwall's preliminary findings suggest that these models perform the task proficiently, potentially outperforming citation counts, the long-standing primary metric for research evaluation.
This development has significant implications for national research assessment exercises, such as the UK's Research Excellence Framework (REF). If further validated, generative AI could change how academic output is evaluated, moving beyond quantitative citation metrics to a more nuanced, AI-driven qualitative assessment. Citation data has been a cornerstone of academic evaluation for decades, but its limitations in capturing the full scope of research quality and impact have long been debated. An AI-based alternative offering potentially deeper insight would therefore mark a notable shift.
Experts in the field note that AI's ability to process vast amounts of text, understand context, and identify subtle indicators of quality could provide a more comprehensive and less biased evaluation than raw citation counts. However, the academic community also stresses responsible AI development and deployment: transparency, bias mitigation, and human oversight must remain central to any critical assessment process. The ongoing work by Professor Thelwall and his team will be crucial for documenting the methodology and validating how robustly AI performs in this complex domain.
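To make the general idea concrete, the sketch below shows one way an LLM could be prompted to rate an article against REF-style quality levels. It is a minimal illustration under stated assumptions, not the Sheffield project's actual methodology: the prompt wording, the 1 to 4 star scale, the model name, the `score_article` helper, and the averaging over repeated runs are all assumptions for demonstration; only the OpenAI chat completions API calls are real.

```python
# Hypothetical sketch: rating an article abstract with an LLM on a
# REF-style 1-4 quality scale. Not the project's actual method; the
# prompt, model name, and scoring scheme are illustrative assumptions.
import re
from statistics import mean

from openai import OpenAI  # official openai Python package (v1 API)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You are an expert research assessor. Rate the following journal "
    "article on the UK REF quality scale from 1 (recognised nationally) "
    "to 4 (world-leading), considering originality, significance, and "
    "rigour. Reply with a single digit.\n\n"
    "Title: {title}\n\nAbstract: {abstract}"
)

def score_article(title: str, abstract: str, runs: int = 5) -> float:
    """Average several LLM ratings to smooth run-to-run variation."""
    scores = []
    for _ in range(runs):
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model choice
            messages=[{
                "role": "user",
                "content": PROMPT.format(title=title, abstract=abstract),
            }],
        )
        # Extract the first digit 1-4 from the model's reply, if any.
        match = re.search(r"[1-4]", response.choices[0].message.content)
        if match:
            scores.append(int(match.group()))
    return mean(scores) if scores else float("nan")
```

Averaging over repeated runs is one plausible way to dampen the variability of individual model responses; any real assessment pipeline would additionally need the validation, bias auditing, and human oversight that the research community calls for.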


