TLDR: AEGIS is a novel, fully automated AI system designed to streamline scholarly discovery. It identifies research papers from specific geographic regions within conference proceedings and then uses Robotic Process Automation (RPA) to perform predefined actions, such as submitting nomination forms. Validated on 586 papers, the system achieved 100% recall and near-perfect accuracy (99.4%), demonstrating its potential to significantly accelerate academic workflows by transitioning from data discovery to direct action.
In the fast-paced world of academia, keeping up with the ever-growing volume of research literature is a monumental task for researchers, funding bodies, and academic societies alike. The manual effort involved in discovering relevant scholarly work can be incredibly time-consuming, often diverting valuable resources from actual research.
Addressing this challenge, a novel, fully automated system named AEGIS (An Agent for Extraction and Geographic Identification in Scholarly Proceedings) has been introduced. This innovative pipeline transitions seamlessly from data discovery to direct action, promising to significantly accelerate academic workflows.
At the heart of AEGIS is a specialized AI agent, ‘Agent-E’, tasked with identifying papers from specific geographic regions within conference proceedings. Once a paper is identified, the system runs a Robotic Process Automation (RPA) routine to complete predefined actions, such as submitting a nomination form for that paper.
How AEGIS Works: A Seamless Automation Pipeline
The AEGIS system operates through a meticulously designed workflow, integrating several advanced components:
- Data Ingestion: The process begins with a simple URL of a conference’s proceedings page. An automated web browser framework ensures that dynamically loaded content is fully rendered before extracting the complete HTML source code.
- HTML Parsing and Hyperlink Discovery: The raw HTML is converted into a structured parse tree, allowing the system to systematically extract every hyperlink present on the page.
- Layout-Aware Link Normalization: Recognizing that different conference websites have varying formats, AEGIS employs a smart module that analyzes the conference identifier to determine the structural layout. It then applies specialized extraction strategies, whether for flat-list structures (like IEEE Xplore) or track-based structures (like ACM and ACL), to create a standardized list of paper URLs.
- Prompt Engineering and AI Agent Invocation: For each normalized paper URL, a dynamic prompt is generated and sent to Agent-E’s REST API. This allows for real-time processing of the agent’s output.
- AI Response Parsing and Data Structuring: Agent-E’s semi-structured text response is converted into a reliable format using a multi-stage parsing module. This includes rule-based parsing for simple data and robust JSON parsing for complex information, with fallback mechanisms for inconsistencies. A crucial verification layer ensures data quality, confirming that author and institution lists are not empty.
- Nomination via Robotic Process Automation (RPA): The final stage involves automated submission of nomination forms. Using the Selenium framework, the system navigates to the target form, dynamically adds author fields, populates all necessary fields (title, author names, affiliation, research area) from the structured data, and submits the form, verifying a successful transaction.
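The middle stages of this pipeline (hyperlink discovery, layout-aware normalization, and response parsing with a verification check) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the URL patterns in `PAPER_PATTERNS`, the layout names `"flat"` and `"track"`, the function names, and the "Key: value" fallback format are all assumptions made for illustration, and the order of the parsing stages (JSON first, rule-based second) is a guess at one reasonable arrangement.

```python
import json
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag encountered while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


# Hypothetical layout table: the URL pattern that marks a paper link
# for each layout family. Real sites would need their own rules.
PAPER_PATTERNS = {
    "flat": re.compile(r"/document/\d+"),
    "track": re.compile(r"/doi/\d+\.\d+/"),
}


def normalize_links(html, base_url, layout):
    """Return a de-duplicated list of absolute paper URLs for one layout."""
    collector = LinkCollector()
    collector.feed(html)
    pattern = PAPER_PATTERNS[layout]
    seen, papers = set(), []
    for href in collector.links:
        url = urljoin(base_url, href)
        if pattern.search(url) and url not in seen:
            seen.add(url)
            papers.append(url)
    return papers


def parse_agent_response(text):
    """Parse a semi-structured reply: JSON first, rule-based fallback second."""
    record = None
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        try:
            record = json.loads(match.group(0))
        except json.JSONDecodeError:
            record = None
    if record is None:
        # Fallback (assumed format): "Key: value" lines, lists split on ";".
        record = {}
        for line in text.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                record[key.strip().lower()] = value.strip()
        for field in ("authors", "institutions"):
            if isinstance(record.get(field), str):
                record[field] = [v.strip() for v in record[field].split(";") if v.strip()]
    # Verification layer: reject records with empty author or institution lists.
    if not record.get("authors") or not record.get("institutions"):
        raise ValueError("author and institution lists must not be empty")
    return record
```

In a sketch like this, a record that fails the verification check raises an error instead of flowing on to the form-submission stage, which is one simple way to keep incomplete data out of the RPA step.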
Validation and Impact
AEGIS was evaluated on 586 papers from five conference datasets. The system achieved 100% accuracy, precision, and recall on three of them (ACM SIGKDD, ACL, and NeurIPS). On the more challenging datasets, IEEE ICDM and a custom collection, it maintained 99% accuracy, with the few errors attributed to ambiguous affiliation strings.
Crucially, AEGIS achieved a perfect recall rate of 1.00 across all 586 papers, meaning zero relevant papers were missed. For a discovery and nomination pipeline, ensuring no relevant paper is overlooked is paramount, making this a significant achievement.
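To make the relationship between these metrics concrete, the short sketch below computes accuracy, precision, and recall from confusion-matrix counts. The counts in the usage example are invented for illustration and are not the paper's actual confusion matrix; the function name is likewise hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 40 relevant papers all found (fn = 0),
# plus 2 papers flagged by mistake (fp = 2) out of 200 in total.
accuracy, precision, recall = classification_metrics(tp=40, fp=2, fn=0, tn=158)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

With zero false negatives, recall stays at 1.00 even when accuracy is below 100%, so any remaining errors must be false positives: papers nominated that should not have been.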
This demonstration highlights the immense potential of task-oriented AI agents to not only filter information but also to actively participate in and accelerate the workflows of the academic community. By bridging the gap between simple information extraction and meaningful, real-world task execution, AEGIS offers a powerful tool for scholarly discovery.
Future work aims to expand the system’s capabilities to a wider array of publishers and repositories, and to further refine the AI agent’s ability to disambiguate complex affiliation strings, thereby reducing any remaining false positives. The full research paper describes the system in more detail.


