TL;DR: KnowThyself is an agentic AI assistant that simplifies large language model (LLM) interpretability. It provides a chat-based interface where users can ask natural language questions about their models, receive interactive visualizations, and get clear explanations. The platform consolidates fragmented interpretability tools into an accessible conversational workflow, featuring a multi-agent architecture with an orchestrator LLM, an agent router, and specialized agents for tasks like attention visualization and bias detection.
Large Language Models (LLMs) have become incredibly powerful, excelling in tasks from understanding language to complex reasoning. However, their inner workings often remain a mystery, a ‘black box’ that makes it hard to understand why they make certain decisions. This lack of transparency raises concerns about trust and accountability, and while research has tried to shed light on LLM behavior, interpretability tools have often been fragmented, difficult to use, and required deep technical expertise.
Introducing KnowThyself: Your Conversational AI Interpreter
To bridge this gap, researchers have developed KnowThyself, an innovative agentic assistant designed to make LLM interpretability accessible to everyone. Imagine being able to simply ask your LLM questions in plain language and receive clear, interactive visualizations and explanations, all without writing a single line of code. That’s precisely what KnowThyself offers.
KnowThyself unifies various interpretability tools into a single, chat-based interface. Users can upload their models, pose natural language questions, and get guided explanations alongside interactive visualizations. This design significantly lowers the technical barriers that typically prevent practitioners from engaging with cutting-edge interpretability research.
How KnowThyself Works: A Multi-Agent Approach
At its core, KnowThyself operates through a sophisticated multi-agent orchestration framework:
- Orchestrator LLM: This supervisory model manages user interactions, reformulates queries, and generates the necessary subtasks. Crucially, it contextualizes the results into coherent, natural-language explanations, making complex data understandable.
- Agent Router: Using embedding-based similarity search, this component dispatches each user query to the most appropriate specialized agent, ensuring that the right tool is used for the right question (a minimal sketch follows this list).
- Specialized Agents: KnowThyself integrates several modular agents, each encapsulating a specific interpretation method. These include BertViz for visualizing attention mechanisms, TransformerLens for analyzing fine-grained layer- and head-level activations, a RAG explainer that grounds responses in relevant literature, and BiasEval, which assesses safety and demographic disparities using metrics like toxicity, regard, and HONEST scores.
- Conversational Interface: This user-friendly chat interface is where all the magic happens. Users can upload their models, ask questions, and explore results with interactive visualizations, making the entire process intuitive and accessible.
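The paper describes the router only at this level of abstraction, but embedding-based routing can be as simple as comparing a query embedding against short capability descriptions for each agent. The sketch below assumes the sentence-transformers library and an off-the-shelf embedding model; the agent names and descriptions are hypothetical, not taken from KnowThyself.

```python
# Sketch of embedding-based agent routing; the agent descriptions and
# the embedding model ("all-MiniLM-L6-v2") are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

# Each specialized agent is summarized by a short capability description.
AGENTS = {
    "attention_viz": "visualize attention patterns across tokens and heads",
    "activation_analysis": "inspect fine-grained layer and head activations",
    "rag_explainer": "explain interpretability concepts grounded in literature",
    "bias_eval": "measure toxicity, regard, and demographic bias",
}

encoder = SentenceTransformer("all-MiniLM-L6-v2")
names = list(AGENTS)
agent_vecs = encoder.encode(list(AGENTS.values()), convert_to_tensor=True)

def route(query: str) -> str:
    """Dispatch a query to the agent with the most similar description."""
    query_vec = encoder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, agent_vecs)[0]
    return names[int(scores.argmax())]

print(route("Does my model show gender bias?"))  # -> "bias_eval"
```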
Practical Applications and Future Vision
KnowThyself supports a variety of practical scenarios. For instance, a user could upload a model and ask, “Show me how the model attends across tokens for the word ‘she’ in a sentence.” The system would then synthesize an example sentence, use TransformerLens to compute attention maps, and present an interactive visualization with a clear explanation. In the same session, the user could then inquire, “Does my model show gender bias in how it answers questions?” KnowThyself would seamlessly switch tasks, use BiasEval to run evaluations, and summarize the bias scores.
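To make the first scenario concrete, here is a minimal sketch of the attention-map step using TransformerLens with GPT-2. The sentence, layer, and head are illustrative choices rather than values from the paper, and KnowThyself would render the result as an interactive visualization rather than printed numbers.

```python
# Sketch of the attention-map step with TransformerLens and GPT-2;
# the sentence, layer, and head below are illustrative choices.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")
sentence = "When she arrived, the doctor greeted her warmly."

# run_with_cache stores intermediate activations, including the
# post-softmax attention pattern for every layer.
logits, cache = model.run_with_cache(sentence)
tokens = model.to_str_tokens(sentence)

layer = 5
pattern = cache["pattern", layer][0]  # [head, query_pos, key_pos]

# How much each query token attends to " she" in head 0.
she_idx = tokens.index(" she")
print(dict(zip(tokens, pattern[0, :, she_idx].tolist())))
```

For the bias scenario, the metrics the paper names (toxicity, regard, and HONEST) all have implementations in the Hugging Face evaluate library; a BiasEval-style check might look roughly like the following, with illustrative completions standing in for real model outputs.

```python
# BiasEval-style check using the Hugging Face `evaluate` library;
# the paired completions are illustrative stand-ins for model output.
import evaluate

female = ["She answered the technical question confidently."]
male = ["He answered the technical question confidently."]

toxicity = evaluate.load("toxicity")
regard = evaluate.load("regard", module_type="measurement")
honest = evaluate.load("honest", "en")

print(toxicity.compute(predictions=female + male))
print(regard.compute(data=female, references=male))  # compares the two groups
print(honest.compute(predictions=[c.split() for c in female + male]))
```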
The platform is built with LangGraph, uses Gemma3-27B as the orchestrator model, and serves the LLMs under inspection, including GPT-2, BERT, and LLaMA2-13B, through Ollama. Its modular design ensures that new interpretation tools can be easily integrated without disrupting the core system.
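The paper does not publish the graph definition itself, but a LangGraph orchestration of this shape might be wired roughly as follows; the state fields, node bodies, and routing logic are simplified stand-ins.

```python
# Hypothetical LangGraph wiring for an orchestrator plus two agents;
# state fields, node bodies, and the router are simplified stand-ins.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class ChatState(TypedDict):
    query: str
    result: str

def orchestrate(state: ChatState) -> dict:
    # Query reformulation and subtask generation would happen here.
    return {"query": state["query"].strip()}

def route_query(state: ChatState) -> str:
    # Embedding-based routing would go here; keyword match as a stand-in.
    return "bias_eval" if "bias" in state["query"].lower() else "attention_viz"

def attention_viz(state: ChatState) -> dict:
    return {"result": "attention map rendered"}

def bias_eval(state: ChatState) -> dict:
    return {"result": "bias scores computed"}

builder = StateGraph(ChatState)
builder.add_node("orchestrator", orchestrate)
builder.add_node("attention_viz", attention_viz)
builder.add_node("bias_eval", bias_eval)
builder.add_edge(START, "orchestrator")
builder.add_conditional_edges("orchestrator", route_query,
                              ["attention_viz", "bias_eval"])
builder.add_edge("attention_viz", END)
builder.add_edge("bias_eval", END)

graph = builder.compile()
print(graph.invoke({"query": "Does my model show gender bias?"}))
```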
KnowThyself represents a significant step forward in democratizing LLM interpretability. By streamlining the process through a conversational workflow and providing literature-grounded explanations, it empowers a broader audience to engage with and understand complex AI models more effectively. While currently supporting a limited set of tools and text inputs, future work aims to expand tool coverage, support multimodal models, and introduce richer visualization capabilities for even deeper insights.
For those interested in exploring the implementation, the project is publicly available on GitHub. This work was accepted for publication at the Demonstration Track of the 40th AAAI Conference on Artificial Intelligence (AAAI’26).