TLDR: AquiLLM is a new, open-source Retrieval-Augmented Generation (RAG) system designed specifically for research groups. It helps capture, store, and retrieve informal and “tacit” knowledge—like meeting notes, emails, and experimental data—that is often fragmented and hard to access. Unlike general RAG tools, AquiLLM prioritizes privacy, supports diverse document types, and is easy to deploy and maintain on a group’s own infrastructure, fostering better collaboration and institutional memory.
Research groups, from university labs to scientific collaborations, constantly generate a vast amount of information. While formal publications and structured data are typically well-managed, a significant portion of a group’s collective knowledge often remains informal, fragmented, or undocumented. This includes crucial insights shared in meetings, through mentoring, or in day-to-day discussions, often referred to as ‘tacit knowledge’. This informal, experience-based expertise is vital but incredibly difficult to capture, store, and retrieve, making it challenging for new members to get up to speed or for existing members to find specific historical context.
Traditional search methods, like simple keyword searches, often fall short because they require users to know the exact terminology, which can vary widely across documents or over time. Information is scattered across different systems—from lab notebooks to email exchanges—making a comprehensive search a manual, time-consuming task. Furthermore, research evolves, and older documents might contain outdated information, leading to inconsistencies that traditional tools cannot resolve.
Enter Retrieval-Augmented Generation (RAG) systems, which combine information retrieval with large language models (LLMs) to provide answers grounded in source material. While many RAG-LLM applications focus on public documents, they often overlook the specific needs and privacy concerns of internal research materials. This is where AquiLLM (pronounced ah-quill-em) steps in.
Introducing AquiLLM: A Tailored Solution for Research Teams
AquiLLM is a lightweight, modular RAG system specifically designed to address the unique challenges faced by research groups. It aims to make both formal and informal knowledge more accessible by supporting varied document types and configurable privacy settings. The system is built with the academic ethos in mind, prioritizing self-hosting and control over infrastructure and data, which is crucial for confidentiality and operational independence.
One of AquiLLM’s core strengths is its ability to handle diverse information. It can ingest everything from formal publications to experimental notes, meeting minutes, and even email communications. By creating a unified knowledge base, AquiLLM allows researchers to pose natural language questions and receive coherent, contextual responses, even if the information is scattered across multiple sources and uses different terminology. For instance, if there’s conflicting information, AquiLLM’s embedded LLM can provide temporal context and highlight discrepancies, helping users make informed judgments.
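The semantic retrieval described above can be sketched in a few lines of Python. This is an illustrative toy, not AquiLLM's actual code: the `embed` function below is a simple bag-of-words stand-in for the neural embedding model a real RAG system would use, and the document snippets are invented.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words frequency vector.
    A real RAG system would use a neural embedding model here."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query,
    ranked by embedding similarity rather than exact keywords."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Meeting notes: calibration of the spectrograph drifted in March.",
    "Email: onboarding checklist for new lab members.",
    "Paper draft: galaxy rotation curves from the 2023 survey.",
]
top = retrieve("spectrograph calibration drift", docs, k=1)
print(top[0])
```

Retrieved passages are then handed to the LLM as grounding context, which is what lets the answer cite meeting notes and emails rather than hallucinate.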
Key Advantages for Research Groups
AquiLLM offers several significant benefits:
- Semantic Search: Researchers can ask questions in natural language, and AquiLLM understands the concepts, not just keywords, finding relevant information even if the exact words aren’t present.
- Unified Knowledge Base: It synthesizes information from various document types—publications, notes, emails—into comprehensive answers, saving researchers from extensive manual review.
- Conflict Resolution: The system can highlight discrepancies and provide historical context when information conflicts, aiding in understanding how ideas have evolved.
- Enhanced Collaboration: By centralizing knowledge, AquiLLM acts as a hub for collaboration, making insights and methodologies readily discoverable, which is particularly valuable for new team members.
Designed for Academic Environments
AquiLLM understands that research groups often have limited IT resources and prefer to maintain control over their data. Therefore, it is designed for minimal deployment overhead. Small groups can deploy the entire system using a single bash script on various Linux devices, including on-premise hardware or commercial cloud instances. It uses established technologies like Django and PostgreSQL, avoiding reliance on rapidly evolving AI-specific libraries to ensure long-term stability and maintainability.
For maximum data sovereignty, AquiLLM can integrate with Ollama, an open-source tool for hosting models locally, ensuring no group data ever leaves the group’s hardware. It also supports importing papers directly from academic repositories like arXiv and Zotero and offers single sign-on through popular identity providers used by universities, such as Google, Microsoft, and GitHub.
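A local Ollama client might look like the sketch below, which uses only the Python standard library and Ollama's standard `/api/generate` endpoint on its default port. The model name, prompt template, and helper names are illustrative placeholders, not AquiLLM's actual implementation; the key point is that the request goes to `localhost`, so no group data leaves the machine.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_prompt(question, passages):
    """Ground the question in retrieved passages so answers cite sources."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the sources below; cite them as [n].\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def ask_local(question, passages, model="llama3"):
    """Send the grounded prompt to a locally hosted model via Ollama."""
    payload = json.dumps({
        "model": model,
        "prompt": build_prompt(question, passages),
        "stream": False,  # return one JSON object instead of a token stream
    }).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `ask_local("When did the calibration drift start?", passages)` would run inference entirely on the group's own hardware, provided an Ollama server is running locally.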
How AquiLLM Works
Interaction with AquiLLM involves two main processes: ingestion and conversation. During ingestion, users upload documents or import them via integrations with arXiv and Zotero into AquiLLM’s database. In conversation, users interact with a chat interface, similar to popular LLM tools, but with the added ability to specify which collections of documents the LLM should query. Unlike simpler RAG tools, AquiLLM uses ‘tool calling’ to give the LLM more sophisticated control over search functions, allowing it to explore the document collection more effectively to answer complex questions.
Security is a paramount concern, especially when dealing with private documents. AquiLLM allows groups to configure their deployment to meet specific security needs, from fully on-premise setups behind a VPN to cloud instances with robust permission systems. Collections of documents are private by default, with owners able to grant view and edit permissions to other users.
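The private-by-default permission model can be illustrated with a minimal sketch. The class and method names are hypothetical (AquiLLM's real schema lives in Django and will differ); what the sketch captures is the stated behavior: a collection is visible only to its owner until the owner explicitly grants view or edit access.

```python
class Collection:
    """Private-by-default document collection; only the owner grants access.
    Illustrative model only, not AquiLLM's actual data model."""

    def __init__(self, owner):
        self.owner = owner
        self.viewers = set()
        self.editors = set()

    def grant(self, user, edit=False):
        """Grant view access, or view plus edit access."""
        self.viewers.add(user)
        if edit:
            self.editors.add(user)

    def can_view(self, user):
        return user == self.owner or user in self.viewers

    def can_edit(self, user):
        return user == self.owner or user in self.editors

c = Collection(owner="alice")
c.grant("bob")               # bob may read but not modify
c.grant("carol", edit=True)  # carol may read and modify
```

Anyone not explicitly granted access (say, a user `"dave"`) can neither view nor edit, which is the private-by-default guarantee.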
Early Successes and Future Outlook
A functional beta version of AquiLLM is already deployed for a group of astronomers at UCLA. Users have successfully ingested research papers, meeting notes, and transcripts. A new lab member found AquiLLM particularly useful for catching up on the group’s research and understanding past decisions, demonstrating its effectiveness in addressing the very problem it was designed for. Another beta group, environmental scientists, is exploring its utility for informal data like recordings of meetings and training sessions.
AquiLLM fills a crucial gap in research infrastructure by providing a practical, privacy-conscious, and easy-to-manage system for accessing the often-hidden tacit knowledge within research teams. By focusing on the specific needs of scholarly groups, AquiLLM promises to enhance collaboration, streamline onboarding, and ensure greater continuity of knowledge within scientific endeavors. For more details, you can refer to the original research paper.