AEGIS: Automating Geographic Paper Identification in Academia

TLDR: AEGIS is a novel, fully automated AI system designed to streamline scholarly discovery. It identifies research papers from specific geographic regions within conference proceedings and then uses Robotic Process Automation (RPA) to perform predefined actions, such as submitting nomination forms. Validated on 586 papers, the system achieved 100% recall and near-perfect accuracy (99.4%), demonstrating its potential to significantly accelerate academic workflows by transitioning from data discovery to direct action.

In the fast-paced world of academia, keeping up with the ever-growing volume of research literature is a monumental task for researchers, funding bodies, and academic societies alike. The manual effort involved in discovering relevant scholarly work can be incredibly time-consuming, often diverting valuable resources from actual research.

Addressing this challenge, a novel, fully automated system named AEGIS (An Agent for Extraction and Geographic Identification in Scholarly Proceedings) has been introduced. This innovative pipeline transitions seamlessly from data discovery to direct action, promising to significantly accelerate academic workflows.

At the heart of AEGIS is a specialized AI agent, ‘Agent-E’, tasked with a crucial mission: identifying papers from specific geographic regions within conference proceedings. Agent-E doesn’t stop at identification; the system then runs a Robotic Process Automation (RPA) routine to complete predefined actions, such as submitting a nomination form for these papers.

How AEGIS Works: A Seamless Automation Pipeline

The AEGIS system operates through a meticulously designed workflow, integrating several advanced components:

Data Ingestion: The process begins with a simple URL of a conference’s proceedings page. An automated web browser framework ensures that dynamically loaded content is fully rendered before extracting the complete HTML source code.
HTML Parsing and Hyperlink Discovery: The raw HTML is converted into a structured parse tree, allowing the system to systematically extract every hyperlink present on the page.
Layout-Aware Link Normalization: Recognizing that different conference websites have varying formats, AEGIS employs a smart module that analyzes the conference identifier to determine the structural layout. It then applies specialized extraction strategies, whether for flat-list structures (like IEEE Xplore) or track-based structures (like ACM and ACL), to create a standardized list of paper URLs.
Prompt Engineering and AI Agent Invocation: For each normalized paper URL, a dynamic prompt is generated and sent to Agent-E’s REST API. This allows for real-time processing of the agent’s output.
AI Response Parsing and Data Structuring: Agent-E’s semi-structured text response is converted into a reliable format using a multi-stage parsing module. This includes rule-based parsing for simple data and robust JSON parsing for complex information, with fallback mechanisms for inconsistencies. A crucial verification layer ensures data quality, confirming that author and institution lists are not empty.
Nomination via Robotic Process Automation (RPA): In the final stage, the system submits nomination forms automatically. Using the Selenium framework, it navigates to the target form, dynamically adds author fields, populates the required fields (title, author names, affiliation, research area) from the structured data, submits the form, and verifies that the submission succeeded.
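
As a rough illustration of the response-parsing stage described above, here is a minimal Python sketch. The reply format (a `Title:` line, optional `Authors:` and `Institutions:` lines separated by semicolons, and an embedded JSON object with `authors` and `institutions` keys) and every function name are assumptions made for this example; the article does not specify Agent-E's actual output format or the authors' implementation.

```python
import json
import re


def _line_value(text, key):
    """Return the value of a 'Key: value' line, or None if the line is absent."""
    match = re.search(rf"^{re.escape(key)}:[ \t]*(.+)$", text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def _as_str_list(value):
    """Keep only non-empty strings from a JSON list; anything else becomes []."""
    if not isinstance(value, list):
        return []
    return [item.strip() for item in value if isinstance(item, str) and item.strip()]


def parse_agent_response(text):
    """Parse a semi-structured reply; return a dict, or None if verification fails."""
    # Stage 1: rule-based parsing for simple fields (hypothetical 'Title:' line).
    record = {"title": _line_value(text, "Title"), "authors": [], "institutions": []}

    # Stage 2: JSON parsing for the list-valued fields (hypothetical key names).
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        try:
            payload = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            payload = None  # malformed JSON: leave the lists empty for the fallback
        if isinstance(payload, dict):
            record["authors"] = _as_str_list(payload.get("authors"))
            record["institutions"] = _as_str_list(payload.get("institutions"))

    # Stage 3: fallback to semicolon-separated lines when JSON gave nothing.
    for key, label in (("authors", "Authors"), ("institutions", "Institutions")):
        if not record[key]:
            line = _line_value(text, label) or ""
            record[key] = [part.strip() for part in line.split(";") if part.strip()]

    # Verification layer: reject records with empty author or institution lists.
    if not record["authors"] or not record["institutions"]:
        return None
    return record


reply = 'Title: Sample Paper\n{"authors": ["A. Author"], "institutions": ["Example University"]}'
print(parse_agent_response(reply))
```

Returning `None` for records that fail verification lets a caller skip or retry a paper rather than pass an incomplete record on to the form-submission stage.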

Validation and Impact

The effectiveness of AEGIS was evaluated on 586 papers from five diverse conference datasets. The system achieved 100% accuracy, precision, and recall on three of them (ACM SIGKDD, ACL, and NeurIPS). On the more challenging IEEE ICDM dataset and a custom collection, it maintained 99% accuracy, with the few errors attributed to ambiguous affiliation strings.

Crucially, AEGIS achieved a perfect recall rate of 1.00 across all 586 papers, meaning zero relevant papers were missed. For a discovery and nomination pipeline, ensuring no relevant paper is overlooked is paramount, making this a significant achievement.
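
To make the recall figure concrete, the sketch below computes the three reported metrics from confusion-matrix counts. The counts are hypothetical, chosen only to show how recall can stay at 1.00 while a few false positives pull precision and accuracy below 100%; they are not the paper's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: share of flagged papers that were truly relevant.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of truly relevant papers that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts (NOT from the paper): no relevant paper missed (fn=0),
# but two irrelevant papers flagged by mistake (fp=2).
metrics = classification_metrics(tp=40, fp=2, tn=58, fn=0)
print(metrics)
```

With zero false negatives, recall is exactly 1.0 regardless of how many false positives occur, which is why a pipeline can report perfect recall alongside slightly lower accuracy.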

This demonstration highlights the immense potential of task-oriented AI agents to not only filter information but also to actively participate in and accelerate the workflows of the academic community. By bridging the gap between simple information extraction and meaningful, real-world task execution, AEGIS offers a powerful tool for scholarly discovery.

Future work aims to expand the system’s capabilities to a wider array of publishers and repositories, and to further refine the AI agent’s ability to disambiguate complex affiliation strings, thereby reducing any remaining false positives. To learn more about this system, see the full research paper.
