TLDR: The paper introduces CoCoNUTS, a new benchmark dataset, and CoCoDet, a novel AI detector, designed to identify AI involvement in academic peer reviews by focusing on the substantive content rather than just stylistic cues. This approach aims to accurately distinguish between legitimate AI assistance and deceptive AI-generated content, demonstrating superior detection performance and revealing a rising trend of AI usage in real-world peer reviews.
The integration of large language models (LLMs) into academic peer review has brought both opportunities and challenges. While LLMs can assist reviewers with language refinement, there’s a growing concern about their use to generate the core content of reviews. Existing AI text detectors often fall short because they primarily focus on stylistic cues, making them vulnerable to paraphrasing attacks and unable to differentiate between minor language polishing and substantial AI-generated content. This can lead to unfairly flagging legitimate AI assistance or missing deceptively humanized AI reviews.
To address this critical issue, the researchers propose a shift in how AI involvement in peer review is detected: concentrate on content rather than textual style. This approach is embodied in CoCoNUTS, a content-oriented benchmark, and CoCoDet, an AI review detector.
Introducing CoCoNUTS: A Comprehensive Benchmark
CoCoNUTS is a comprehensive peer review benchmark designed to enable fair and robust evaluation of LLM involvement. It features a fine-grained dataset of peer reviews spanning six distinct modes of human-AI collaboration, grouped into three classes by their content composition: Human, Mix, and AI.
The dataset was constructed by collecting reviews and papers from top-tier conferences such as ICLR, NeurIPS, and EMNLP. To create the diverse collaboration modes, advanced LLMs such as DeepSeek, Gemini, Llama, and Qwen were used. The six modes are:
- Human-Written (HW)
- Human-Written & Machine-Translated (HWMT)
- Human-Written & Machine-Polished (HWMP)
- Human-Written & Machine-Generated (HWMG)
- Machine-Generated (MG)
- Machine-Generated & Machine-Polished (MGMP)
This categorization allows for a nuanced understanding of how AI contributes to review content.
The core task defined by CoCoNUTS is a ternary classification: identifying whether a review’s substantive origin is purely Human, purely AI, or a Mix of both. This content-focused approach aims to guide detection models toward more equitable and reliable outcomes.
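To make the labeling concrete, here is a minimal Python sketch of how the six collaboration modes could map onto the three content classes. The mapping itself is an assumption inferred from the content-centric framing (translation and polishing are treated as style-only edits that leave the substantive content human-authored); the paper defines the authoritative assignment.

```python
# Assumed mapping from CoCoNUTS collaboration modes to content classes.
# Translation and polishing are treated here as style-only edits, so the
# content stays human-authored; this is an inference, not copied verbatim
# from the paper.
MODE_TO_CLASS = {
    "HW": "Human",    # Human-Written
    "HWMT": "Human",  # Machine-Translated, content still human
    "HWMP": "Human",  # Machine-Polished, content still human
    "HWMG": "Mix",    # content contributed by both human and machine
    "MG": "AI",       # Machine-Generated
    "MGMP": "AI",     # Machine-Generated, then Machine-Polished
}

def content_class(mode: str) -> str:
    """Return the ternary content-composition label for a collaboration mode."""
    return MODE_TO_CLASS[mode]

assert content_class("HWMP") == "Human"  # polishing alone should not flag a review as AI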
CoCoDet: A Content-Focused Detector
To overcome the limitations of style-reliant detectors, CoCoDet was developed. This Content-Concentrated Detector utilizes a multi-task learning framework. It integrates a primary task of Content Composition Identification with three auxiliary tasks: Collaboration Mode Attribution, Content Source Attribution, and Textual Style Attribution. These auxiliary tasks are crucial for enabling the model to disentangle content features from stylistic ones.
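A rough sketch of such a multi-task setup is shown below: one shared transformer encoder feeding four classification heads, one per task. The encoder checkpoint and the label counts for the auxiliary heads (n_modes, n_sources, n_styles) are placeholders, not the configuration reported in the paper.

```python
import torch.nn as nn
from transformers import AutoModel

class MultiTaskReviewDetector(nn.Module):
    """Shared encoder with one head per task; an illustrative sketch,
    not the paper's exact architecture."""

    def __init__(self, encoder_name: str = "microsoft/deberta-v3-base",
                 n_modes: int = 6, n_sources: int = 5, n_styles: int = 5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.composition_head = nn.Linear(hidden, 3)      # primary: Human / Mix / AI
        self.mode_head = nn.Linear(hidden, n_modes)       # auxiliary: collaboration mode
        self.source_head = nn.Linear(hidden, n_sources)   # auxiliary: who produced the content
        self.style_head = nn.Linear(hidden, n_styles)     # auxiliary: who produced the final style

    def forward(self, input_ids, attention_mask):
        # Pool the first-token representation as the review embedding.
        h = self.encoder(input_ids=input_ids,
                         attention_mask=attention_mask).last_hidden_state[:, 0]
        return {
            "composition": self.composition_head(h),
            "mode": self.mode_head(h),
            "source": self.source_head(h),
            "style": self.style_head(h),
        }
```

The intuition behind the split heads is that forcing the encoder to answer the style question separately keeps stylistic evidence from leaking into the content decision.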
The primary task identifies the review’s origin as Human, AI, or Mix. To enhance class separability and penalize critical errors between Human and AI classifications, CoCoDet employs a Cost-Sensitive Margin Loss (CSM-Loss). The auxiliary tasks help the model learn robust representations: Content Source Attribution traces the content back to the specific model or human that initially generated it, while Textual Style Attribution identifies the model responsible for the review’s final textual style. Collaboration Mode Attribution compels the model to understand the fine-grained compositional provenance of the text.
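The paper's exact formulation of CSM-Loss isn't reproduced here, but a common way to build a cost-sensitive margin loss is to widen the required logit margin in proportion to how damaging each confusion is, so Human-vs-AI mistakes must be beaten by a larger gap than confusions involving Mix. A minimal sketch under that assumption:

```python
import torch
import torch.nn.functional as F

HUMAN, MIX, AI = 0, 1, 2

# Hypothetical cost matrix: rows are true classes, columns are predictions.
# Human <-> AI confusions carry the largest cost; the diagonal is zero.
COST = torch.tensor([
    [0.0, 1.0, 2.0],  # true Human
    [1.0, 0.0, 1.0],  # true Mix
    [2.0, 1.0, 0.0],  # true AI
])

def csm_loss(logits: torch.Tensor, targets: torch.Tensor,
             margin: float = 1.0) -> torch.Tensor:
    """Cost-sensitive margin loss sketch: boost the logits of wrong classes
    by a cost-scaled margin before cross-entropy, so costly confusions must
    be defeated by a wider gap."""
    costs = COST.to(logits.device)[targets]   # (batch, 3) per-example costs
    adjusted = logits + margin * costs        # true-class logit is unchanged
    return F.cross_entropy(adjusted, targets)
```

Because the diagonal is zero, the true class's logit is untouched; each wrong class gets a head start proportional to its cost, so the model must separate Human from AI by twice the margin it needs against Mix.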
Key Findings and Real-World Impact
Experiments on the CoCoNUTS benchmark revealed that traditional LLM-based detectors and general AI-generated text detectors struggle with content-focused detection, often relying on superficial stylistic cues. In contrast, CoCoDet achieved state-of-the-art performance, with a macro F1-score exceeding 98% on the ternary detection task, significantly outperforming other models.
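Macro F1 is the unweighted mean of the per-class F1 scores, so Human, Mix, and AI count equally regardless of class frequency. Computed with scikit-learn on purely illustrative labels:

```python
from sklearn.metrics import f1_score, classification_report

# Illustrative predictions only: 0 = Human, 1 = Mix, 2 = AI.
y_true = [0, 0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 0, 1, 2, 1, 1, 0, 2]

print(f1_score(y_true, y_pred, average="macro"))
print(classification_report(y_true, y_pred, target_names=["Human", "Mix", "AI"]))
```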
When applied to real-world conference reviews from the post-ChatGPT era, CoCoDet uncovered a clear year-over-year increase in AI usage. This trend includes not only the common practice of AI-assisted language polishing but also a concerning rise in fully machine-generated reviews. It underscores the urgent need for robust, content-based detection methods to maintain the fairness and reliability of scholarly evaluation.
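Measuring such a trend boils down to running the detector over each year's reviews and tracking the class shares over time; a small sketch with hypothetical predictions:

```python
import pandas as pd

# Hypothetical per-review detector outputs, tagged with submission year.
preds = pd.DataFrame({
    "year":  [2021, 2021, 2022, 2022, 2023, 2023, 2023, 2024, 2024, 2024],
    "label": ["Human", "Human", "Human", "Mix", "Human", "Mix", "AI",
              "Mix", "AI", "AI"],
})

# Fraction of reviews per content class, per year.
share = (preds.groupby("year")["label"]
              .value_counts(normalize=True)
              .unstack(fill_value=0.0))
print(share)
```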
This research provides a practical foundation for evaluating the use of LLMs in peer review and contributes to the development of more precise, equitable, and reliable detection methods for real-world scholarly applications. For more details, you can refer to the full research paper: CoCoNUTS: Concentrating on Content while Neglecting Uninformative Textual Styles for AI-Generated Peer Review Detection.


