TLDR: The CiteAgent framework uses LLM-based agents to simulate human behavior in academic citation networks, successfully replicating real-world phenomena like power-law citation distributions, citational distortion, and shrinking network diameter. Through LLM-based Survey and Laboratory Experiments, it reveals that structural factors like preferential attachment and the uneven global distribution of authors, rather than intentional bias, drive many observed citation patterns. The research also demonstrates the potential of LLMs for conducting idealized social experiments and offers new insights into academic collaboration.
The field of social science research is constantly seeking new ways to understand complex human behaviors and societal dynamics. Traditional methods often rely on simplified models that struggle to capture the intricate, heterogeneous nature of human interaction. However, a groundbreaking new framework called CiteAgent, developed by a team of researchers including Jiarui Ji, Runlin Lei, and Zhewei Wei, is leveraging the power of Large Language Models (LLMs) to create more realistic and nuanced simulations of human behavior, particularly in the context of academic citation networks.
The core idea behind CiteAgent is to use LLM-based agents to mimic authors and their interactions within an academic ecosystem. These agents are endowed with distinct attributes, such as expertise, institution, and nationality, allowing for a diverse and dynamic simulation. The framework operates in iterative steps, each involving three main stages: Initialization, Socialization, and Creation. In the Initialization stage, new LLM-based authors are introduced. During Socialization, these authors engage in discussions and collaborations, sharing insights and developing paper drafts. Finally, in the Creation stage, authors utilize a simulated scholarly search engine to find relevant papers and finalize their drafts, making citation decisions.
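The iterative loop described above can be sketched in a few lines of Python. This is a minimal illustrative skeleton, not the paper's actual implementation: all function names, author attributes, and the uniform-sampling stand-in for the scholarly search engine are assumptions for the sake of the sketch.

```python
import random

# Hypothetical sketch of the CiteAgent-style iteration. The stage names
# (Initialization, Socialization, Creation) follow the article; every
# data structure and helper below is an illustrative assumption.

def initialize_authors(n, step):
    # Initialization: introduce new LLM-based authors with attributes.
    fields = ["NLP", "graph theory", "systems"]
    return [{"id": f"a{step}-{i}", "expertise": random.choice(fields)}
            for i in range(n)]

def socialize(authors):
    # Socialization: pair authors into collaborations producing drafts.
    random.shuffle(authors)
    return [{"authors": pair, "draft": True}
            for pair in zip(authors[::2], authors[1::2])]

def create(drafts, corpus):
    # Creation: each draft queries the corpus and cites prior papers
    # (a uniform sample stands in for the search engine here).
    for d in drafts:
        k = min(3, len(corpus))
        d["references"] = random.sample(range(len(corpus)), k)
        d["draft"] = False
    corpus.extend(drafts)
    return corpus

corpus, population = [], []
for step in range(5):
    population += initialize_authors(4, step)   # Initialization
    drafts = socialize(population)              # Socialization
    corpus = create(drafts, corpus)             # Creation
print(len(corpus), "papers after 5 steps")
```

In the real framework, each of these stages is driven by LLM calls (discussion, drafting, and citation decisions) rather than random sampling; the skeleton only conveys how the three stages compose into a growing corpus.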
One of the most compelling aspects of CiteAgent is its ability to accurately replicate well-established phenomena observed in real-world citation networks. For instance, the simulations successfully capture the power-law distribution of citations, where a few papers receive a disproportionately high number of citations, mirroring the “rich-get-richer” effect. The framework also demonstrates citational distortion, where certain groups (e.g., authors from “core” countries) appear to receive more citations, and the shrinking diameter phenomenon, indicating that the network becomes more interconnected over time.
To rigorously analyze these phenomena, the researchers introduced two LLM-based research paradigms: LLM-SE (LLM-based Survey Experiment) and LLM-LE (LLM-based Laboratory Experiment). LLM-SE involves posing structured questionnaires to LLM agents to understand their reference selection motivations, similar to human surveys. LLM-LE, on the other hand, manipulates independent variables in controlled settings to infer causality, much like traditional laboratory experiments.
Using these paradigms, CiteAgent provided fascinating insights into the mechanisms behind citation patterns. For example, in studying the power-law distribution, experiments revealed that both the recommendation algorithms of scholarly search engines and the LLM agents’ inherent inclination to cite highly referenced papers (preferential attachment) contribute significantly to this distribution. Interestingly, more advanced LLMs like GPT-4o-mini and LLAMA-3-70B were found to be more effective at simulating this preferential referencing behavior compared to GPT-3.5, suggesting that LLM capabilities directly impact simulation authenticity.
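The preferential-attachment mechanism itself is easy to demonstrate in isolation. The toy simulation below (not the paper's code) has each new paper cite existing papers with probability proportional to their current citation count plus one; even this bare mechanism produces the heavy-tailed, rich-get-richer skew described above.

```python
import random

random.seed(0)

# Minimal preferential-attachment sketch: each arriving paper makes two
# citations, weighted by (citations + 1), so highly cited papers keep
# attracting more citations. All parameters here are illustrative.
citations = [0] * 10              # start with 10 uncited papers
for _ in range(500):              # 500 new papers arrive
    weights = [c + 1 for c in citations]
    cited = random.choices(range(len(citations)), weights=weights, k=2)
    for idx in set(cited):
        citations[idx] += 1
    citations.append(0)           # the new paper enters the pool uncited

citations.sort(reverse=True)
print("top 5 citation counts:", citations[:5])
print("mean citation count:", sum(citations) / len(citations))
```

The top papers end up with citation counts far above the mean, while most papers collect almost none, which is the qualitative signature of a power-law distribution.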
The framework also delved into the complex issue of citational distortion, a phenomenon where papers from certain countries appear to receive more citations for similar content. Previous research used metrics like the beta coefficient and self-citation rate (SCR) to suggest intentional bias. However, CiteAgent’s LLM-LE experiments challenged this view. They demonstrated that preferential attachment alone could explain the elevated beta coefficient for “core” countries. Furthermore, while core countries did show higher SCRs, this disparity vanished when the distribution of authors across countries was equalized. This strongly suggests that the observed inequality in international citations primarily stems from the uneven distribution of researchers globally, rather than deliberate bias based on nationality. To provide a more reliable measure, the researchers proposed a new metric, the Referencing Preference Score (RPS), which normalizes for paper volume; under RPS, their simulations showed no evidence of intentional country-wise citation bias.
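The article does not reproduce the exact RPS formula, but the normalization idea can be sketched. The reading assumed below is that a country's RPS compares the share of references it actually receives against the share one would expect from its paper volume alone, so a value near 1.0 means no preference beyond volume; the country labels and counts are toy numbers.

```python
from collections import Counter

# Hypothetical reading of the Referencing Preference Score (RPS): the
# paper's exact formula is not given here, so this sketch assumes
# RPS(source -> target) = (target's share of source's references)
#                       / (target's share of all papers),
# i.e., a volume-normalized preference where 1.0 means no bias.
papers_by_country = {"core": 800, "periphery": 200}        # toy volumes
refs_from_core = Counter({"core": 410, "periphery": 90})   # toy counts

total_papers = sum(papers_by_country.values())
total_refs = sum(refs_from_core.values())

def rps(source_refs, target):
    observed = source_refs[target] / total_refs
    expected = papers_by_country[target] / total_papers
    return observed / expected

for target in papers_by_country:
    print(f"RPS(core -> {target}) = {rps(refs_from_core, target):.2f}")
```

With these toy numbers both scores land near 1.0, illustrating the paper's finding: once paper volume is normalized away, the apparent country-level preference largely disappears.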
Beyond validating existing theories, CiteAgent also serves as a platform for “idealized social experiments.” It successfully replicated network evolution properties like densification (increasing ratio of edges to nodes) and shrinking diameter. It also extended traditional Citation Content Analysis (CCA) by using LLM-SE to examine citation importance levels based on motivation and placement. Furthermore, a co-authorship experiment revealed that multi-author collaborations significantly enhance the creativity and volume of published papers compared to single-author efforts, offering valuable insights for real-world academic environments.
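Both network-evolution properties are straightforward to measure on a simulated graph. The sketch below (again, not the paper's code) grows a toy network whose out-degree rises logarithmically with size, an illustrative assumption, then reports the edge-to-node ratio (densification) and the exact diameter via BFS from every node.

```python
import math
import random
from collections import deque

random.seed(1)

# Toy growing network: node `new` links to min(k, new) earlier nodes,
# with k growing logarithmically as an illustrative assumption. Edges
# are treated as undirected for the diameter computation.
adj = {0: set()}
for new in range(1, 120):
    k = max(1, int(math.log2(new + 1)))
    adj.setdefault(new, set())
    for tgt in random.sample(range(new), min(k, new)):
        adj[new].add(tgt)
        adj[tgt].add(new)

def diameter(graph):
    # Exact diameter: longest shortest path, via BFS from every node.
    best = 0
    for src in graph:
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        best = max(best, max(dist.values()))
    return best

edges = sum(len(nbrs) for nbrs in adj.values()) // 2
print(f"nodes={len(adj)} edges={edges} "
      f"ratio={edges / len(adj):.2f} diameter={diameter(adj)}")
```

Because each node always connects to at least one earlier node, the graph stays connected; as out-degree grows faster than node count, the edge-to-node ratio rises while the diameter stays small, the densification and shrinking-diameter pattern described above.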
While CiteAgent represents a significant leap forward, the researchers acknowledge its limitations. Simulation authenticity depends heavily on the underlying LLM's ability to mimic human behavior, and less capable models such as GPT-3.5 struggle to simulate complex preferences. The current models also simplify author attributes, omitting factors like academic reputation. Nevertheless, the work demonstrates the immense potential of LLMs for advancing the science of science, offering a scalable and reproducible environment for studying human reference behaviors. You can find the full research paper here.


