TLDR: The CiteAgent framework uses LLM-based agents to simulate human behavior in academic citation networks, successfully replicating real-world phenomena like power-law citation distributions, citational distortion, and shrinking network diameter. Through LLM-based Survey and Laboratory Experiments, it reveals that structural factors like preferential attachment and the uneven global distribution of authors, rather than intentional bias, drive many observed citation patterns. The research also demonstrates the potential of LLMs for conducting idealized social experiments and offers new insights into academic collaboration.
The field of social science research is constantly seeking new ways to understand complex human behaviors and societal dynamics. Traditional methods often rely on simplified models that struggle to capture the intricate, heterogeneous nature of human interaction. However, a groundbreaking new framework called CiteAgent, developed by a team of researchers including Jiarui Ji, Runlin Lei, and Zhewei Wei, is leveraging the power of Large Language Models (LLMs) to create more realistic and nuanced simulations of human behavior, particularly in the context of academic citation networks.
The core idea behind CiteAgent is to use LLM-based agents to mimic authors and their interactions within an academic ecosystem. These agents are endowed with distinct attributes, such as expertise, institution, and nationality, allowing for a diverse and dynamic simulation. The framework operates in iterative steps, each involving three main stages: Initialization, Socialization, and Creation. In the Initialization stage, new LLM-based authors are introduced. During Socialization, these authors engage in discussions and collaborations, sharing insights and developing paper drafts. Finally, in the Creation stage, authors utilize a simulated scholarly search engine to find relevant papers and finalize their drafts, making citation decisions.
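The iterative loop described above can be sketched in a few lines of Python. This is a minimal illustrative skeleton, not the paper's actual implementation: all function names, author attributes, and the uniform-sampling stand-in for the scholarly search engine are assumptions for the sake of the sketch.

```python
import random

# Hypothetical sketch of the CiteAgent-style iteration. The stage names
# (Initialization, Socialization, Creation) follow the article; every
# data structure and helper below is an illustrative assumption.

def initialize_authors(n, step):
    # Initialization: introduce new LLM-based authors with attributes.
    fields = ["NLP", "graph theory", "systems"]
    return [{"id": f"a{step}-{i}", "expertise": random.choice(fields)}
            for i in range(n)]

def socialize(authors):
    # Socialization: pair authors into collaborations producing drafts.
    random.shuffle(authors)
    return [{"authors": pair, "draft": True}
            for pair in zip(authors[::2], authors[1::2])]

def create(drafts, corpus):
    # Creation: each draft queries the corpus and cites prior papers
    # (a uniform sample stands in for the search engine here).
    for d in drafts:
        k = min(3, len(corpus))
        d["references"] = random.sample(range(len(corpus)), k)
        d["draft"] = False
    corpus.extend(drafts)
    return corpus

corpus, population = [], []
for step in range(5):
    population += initialize_authors(4, step)   # Initialization
    drafts = socialize(population)              # Socialization
    corpus = create(drafts, corpus)             # Creation
print(len(corpus), "papers after 5 steps")
```

In the real framework, each of these stages is driven by LLM calls (discussion, drafting, and citation decisions) rather than random sampling; the skeleton only conveys how the three stages compose into a growing corpus.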
One of the most compelling aspects of CiteAgent is its ability to accurately replicate well-established phenomena observed in real-world citation networks. For instance, the simulations successfully capture the power-law distribution of citations, where a few papers receive a disproportionately high number of citations, mirroring the “rich-get-richer” effect. The framework also demonstrates citational distortion, where certain groups (e.g., authors from “core” countries) appear to receive more citations, and the shrinking diameter phenomenon, indicating that the network becomes more interconnected over time.
To rigorously analyze these phenomena, the researchers introduced two LLM-based research paradigms: LLM-SE (LLM-based Survey Experiment) and LLM-LE (LLM-based Laboratory Experiment). LLM-SE involves posing structured questionnaires to LLM agents to understand their reference selection motivations, similar to human surveys. LLM-LE, on the other hand, manipulates independent variables in controlled settings to infer causality, much like traditional laboratory experiments.
Using these paradigms, CiteAgent provided fascinating insights into the mechanisms behind citation patterns. For example, in studying the power-law distribution, experiments revealed that both the recommendation algorithms of scholarly search engines and the LLM agents’ inherent inclination to cite highly referenced papers (preferential attachment) contribute significantly to this distribution. Interestingly, more advanced LLMs like GPT-4o-mini and LLAMA-3-70B were found to be more effective at simulating this preferential referencing behavior compared to GPT-3.5, suggesting that LLM capabilities directly impact simulation authenticity.
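The preferential-attachment mechanism itself is easy to demonstrate in isolation. The toy simulation below (not the paper's code) has each new paper cite existing papers with probability proportional to their current citation count plus one; even this bare mechanism produces the heavy-tailed, rich-get-richer skew described above.

```python
import random

random.seed(0)

# Minimal preferential-attachment sketch: each arriving paper makes two
# citations, weighted by (citations + 1), so highly cited papers keep
# attracting more citations. All parameters here are illustrative.
citations = [0] * 10              # start with 10 uncited papers
for _ in range(500):              # 500 new papers arrive
    weights = [c + 1 for c in citations]
    cited = random.choices(range(len(citations)), weights=weights, k=2)
    for idx in set(cited):
        citations[idx] += 1
    citations.append(0)           # the new paper enters the pool uncited

citations.sort(reverse=True)
print("top 5 citation counts:", citations[:5])
print("mean citation count:", sum(citations) / len(citations))
```

The top papers end up with citation counts far above the mean, while most papers collect almost none, which is the qualitative signature of a power-law distribution.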
The framework also delved into the complex issue of citational distortion, a phenomenon where papers from certain countries appear to receive more citations for similar content. Previous research used metrics like the beta coefficient and self-citation rate (SCR) to suggest intentional bias. However, CiteAgent’s LLM-LE experiments challenged this view. They demonstrated that preferential attachment alone could explain the elevated beta coefficient for “core” countries. Furthermore, while core countries did show higher SCRs, this disparity vanished when the distribution of authors across countries was equalized. This strongly suggests that the observed inequality in international citations primarily stems from the uneven distribution of researchers globally, rather than deliberate bias based on nationality. To provide a more reliable measure, the researchers proposed a new metric, the Referencing Preference Score (RPS), which normalizes for paper volume; under RPS, their simulations showed no evidence of intentional country-wise citation bias.
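The article does not reproduce the exact RPS formula, but the normalization idea can be sketched. The reading assumed below is that a country's RPS compares the share of references it actually receives against the share one would expect from its paper volume alone, so a value near 1.0 means no preference beyond volume; the country labels and counts are toy numbers.

```python
from collections import Counter

# Hypothetical reading of the Referencing Preference Score (RPS): the
# paper's exact formula is not given here, so this sketch assumes
# RPS(source -> target) = (target's share of source's references)
#                       / (target's share of all papers),
# i.e., a volume-normalized preference where 1.0 means no bias.
papers_by_country = {"core": 800, "periphery": 200}        # toy volumes
refs_from_core = Counter({"core": 410, "periphery": 90})   # toy counts

total_papers = sum(papers_by_country.values())
total_refs = sum(refs_from_core.values())

def rps(source_refs, target):
    observed = source_refs[target] / total_refs
    expected = papers_by_country[target] / total_papers
    return observed / expected

for target in papers_by_country:
    print(f"RPS(core -> {target}) = {rps(refs_from_core, target):.2f}")
```

With these toy numbers both scores land near 1.0, illustrating the paper's finding: once paper volume is normalized away, the apparent country-level preference largely disappears.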
Beyond validating existing theories, CiteAgent also serves as a platform for “idealized social experiments.” It successfully replicated network evolution properties like densification (increasing ratio of edges to nodes) and shrinking diameter. It also extended traditional Citation Content Analysis (CCA) by using LLM-SE to examine citation importance levels based on motivation and placement. Furthermore, a co-authorship experiment revealed that multi-author collaborations significantly enhance the creativity and volume of published papers compared to single-author efforts, offering valuable insights for real-world academic environments.
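Both network-evolution properties are straightforward to measure on a simulated graph. The sketch below (again, not the paper's code) grows a toy network whose out-degree rises logarithmically with size, an illustrative assumption, then reports the edge-to-node ratio (densification) and the exact diameter via BFS from every node.

```python
import math
import random
from collections import deque

random.seed(1)

# Toy growing network: node `new` links to min(k, new) earlier nodes,
# with k growing logarithmically as an illustrative assumption. Edges
# are treated as undirected for the diameter computation.
adj = {0: set()}
for new in range(1, 120):
    k = max(1, int(math.log2(new + 1)))
    adj.setdefault(new, set())
    for tgt in random.sample(range(new), min(k, new)):
        adj[new].add(tgt)
        adj[tgt].add(new)

def diameter(graph):
    # Exact diameter: longest shortest path, via BFS from every node.
    best = 0
    for src in graph:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(f"nodes={len(adj)} edges={edges} "
      f"ratio={edges / len(adj):.2f} diameter={diameter(adj)}")
```

Because each node always connects to at least one earlier node, the graph stays connected; as out-degree grows faster than node count, the edge-to-node ratio rises while the diameter stays small, the densification and shrinking-diameter pattern described above.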
While CiteAgent represents a significant leap forward, the researchers acknowledge its limitations. Simulation authenticity depends heavily on the underlying LLM's ability to mimic human behavior, and less capable models such as GPT-3.5 struggle to simulate complex preferences. The current models also simplify author attributes, omitting factors like academic reputation. Nevertheless, the work demonstrates the immense potential of LLMs for advancing the science of science, offering a scalable and reproducible environment for studying human reference behaviors. You can find the full research paper here.


