
Self-Adaptive AI: Giving Scientists Control Over Reasoning Processes

TLDR: CLIO (Cognitive Loop via In-Situ Optimization) is a new AI approach that allows large language models (LLMs) to self-adapt their reasoning in real-time without extra training. It significantly improves accuracy on science questions (e.g., 22.37% on HLE biology/medicine with GPT-4.1, a 161.64% relative increase over base GPT-4.1) and provides transparency into its thought process through graph structures and uncertainty monitoring. This gives scientists unprecedented control and understanding of AI’s decision-making, fostering better human-AI collaboration in scientific discovery.

Artificial intelligence is rapidly transforming scientific discovery, but a key challenge remains: giving scientists precise control over how AI models think and reason. Traditional AI development often falls short, either by embedding human-like thought patterns into non-reasoning models or by abstracting away the intricate details of reasoning from the user. This lack of steerability can be a significant hurdle, especially in high-stakes scientific domains where accuracy and transparency are paramount.

Addressing this, a new approach called Cognitive Loop via In-Situ Optimization (CLIO) has been introduced. Developed by Newman Cheng, Gordon Broadbent, and William Chappell from Microsoft Discovery and Quantum, CLIO empowers large language models (LLMs) to self-formulate problem-solving strategies, adapt their behavior when uncertain, and ultimately provide scientists with well-reasoned answers. Unlike methods that rely on extensive post-training, CLIO optimizes thinking in real-time during inference, without requiring additional data or training cycles. This innovative system is designed to be an alternative or complement to reinforcement learning post-training, enhancing non-reasoning models’ ability to tackle complex problems and choose the most effective approach.

One of CLIO’s core strengths is its open design, which allows scientists to observe the model’s uncertainty levels and understand how its final conclusions are reached through graph structures. This transparency is crucial for building trust and enabling human experts to interject corrections when needed. The system’s ability to adapt and self-correct is inspired by the neuroplasticity of the human brain, which can create, modify, or remove neural connections based on experience. CLIO embodies this by dynamically adjusting its internal strategy through editable parameters, particularly to resolve self-recognized uncertainties during execution.
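The idea of an editable internal strategy that reacts to self-recognized uncertainty can be sketched in a few lines. This is an illustrative toy, not the paper's implementation; the state fields, threshold, and adaptation rule are all assumptions for demonstration.

```python
from dataclasses import dataclass, field

@dataclass
class CognitiveState:
    """Hypothetical editable strategy state, loosely inspired by CLIO's
    self-adjusting internal parameters (field names are illustrative)."""
    exploration_breadth: int = 3
    uncertainty_log: list = field(default_factory=list)

def step(state: CognitiveState, uncertainty: float, threshold: float = 0.5) -> CognitiveState:
    """Record self-reported uncertainty; widen exploration when the model
    flags that it is unsure (a stand-in for in-situ self-adaptation)."""
    state.uncertainty_log.append(uncertainty)
    if uncertainty > threshold:
        state.exploration_breadth += 1  # adapt the strategy mid-run
    return state

state = CognitiveState()
for u in [0.2, 0.7, 0.6, 0.3]:  # two readings exceed the threshold
    step(state, u)
print(state.exploration_breadth)  # 3 + 2 = 5
```

The key point the sketch captures is that adaptation happens during execution, driven by the model's own uncertainty signal, rather than by retraining.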

CLIO’s architecture incorporates both breadth-wise and depth-wise exploration capabilities. For breadth, it draws inspiration from existing techniques like chain-of-thought prompting, allowing it to explore many different options. For depth, CLIO introduces a novel recursive mechanism, enabling it to invoke itself and create independent “thought channels.” These clean context windows prevent the pollution of aggregated context with incomplete thoughts, allowing for deep dives into specific areas of exploration. To prevent endless exploration, CLIO includes algorithmic controls over its “cognitive depth,” similar to configurable reasoning effort levels in other models, ensuring efficient and focused problem-solving.
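The recursive "thought channel" idea, where each sub-exploration gets a clean context bounded by a depth budget, can be illustrated with a toy recursion. The `think` callable stands in for an LLM call that proposes sub-questions; the function and its parameters are illustrative assumptions, not CLIO's actual interface.

```python
def explore(question, think, depth=0, max_depth=2, breadth=2):
    """Toy recursive thought channel: each call gets its own fresh
    context list (a 'clean context window'), so half-formed thoughts
    from one branch never pollute another. `max_depth` caps cognitive
    depth, analogous to a configurable reasoning-effort level."""
    context = []  # clean context window for this channel
    for sub in think(question)[:breadth]:  # breadth-wise exploration
        if depth < max_depth:
            # depth-wise exploration: recurse into an independent channel
            context.append(explore(sub, think, depth + 1, max_depth, breadth))
        else:
            context.append(sub)  # depth budget exhausted: keep the leaf thought
    return {"question": question, "thoughts": context}

# Dummy "LLM" that splits a question into two sub-questions.
toy_think = lambda q: [q + ".a", q + ".b"]
tree = explore("q", toy_think, max_depth=1)
print(tree["thoughts"][0]["thoughts"])  # ['q.a.a', 'q.a.b']
```

Because every recursive call builds its own `context` list, deep dives stay isolated until their results are aggregated back up, which is the property the article attributes to CLIO's independent thought channels.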

A significant innovation in CLIO is its method for overcoming the “over-indexing challenge” often faced by agentic optimization approaches that ensemble multiple perspectives. Instead of relying solely on prompt-based reduction, CLIO leverages graph structures to reduce noise and synthesize a balanced perspective. It uses GPT-4.1 to extract entities and relationships from its thought processes, which are then clustered and summarized. This graph representation is then queried to produce a final answer, especially when CLIO is configured for “more thinking,” which involves multiple runs with different configurations to build a comprehensive joint graph of all sampled thought sequences.
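The noise-reduction effect of merging many sampled thought sequences into one joint graph can be shown with a minimal sketch. The (subject, relation, object) triple format and the support threshold are assumptions for illustration; in CLIO the extraction is done by GPT-4.1, not by hand.

```python
from collections import Counter

def build_joint_graph(runs):
    """Merge (subject, relation, object) triples from multiple thought
    sequences into one weighted edge set; claims repeated across runs
    accumulate support, damping noise from any single run."""
    edges = Counter()
    for triples in runs:
        edges.update(triples)
    return edges

def answer_from_graph(edges, min_support=2):
    """Keep only claims supported by at least `min_support` runs."""
    return [edge for edge, count in edges.items() if count >= min_support]

# Two sampled thought sequences agree on one claim and diverge on others.
runs = [
    [("aspirin", "inhibits", "COX-1"), ("aspirin", "treats", "fever")],
    [("aspirin", "inhibits", "COX-1"), ("aspirin", "causes", "ulcers")],
]
print(answer_from_graph(build_joint_graph(runs)))
# -> [('aspirin', 'inhibits', 'COX-1')]
```

The point is structural rather than prompt-based reduction: instead of asking a model to reconcile conflicting ensemble outputs in text, agreement emerges from edge weights in the joint graph.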

In evaluations, CLIO demonstrated impressive performance. When paired with OpenAI’s GPT-4.1, CLIO achieved an accuracy of 22.37% on text-based biology and medicine questions from Humanity’s Last Exam (HLE), without any further post-training. That is a net gain of 13.82 percentage points (a 161.64% relative increase) over the base GPT-4.1 model. Furthermore, CLIO surpassed OpenAI’s o3 model at both high and low reasoning-effort settings, showing that it can raise a completion model’s performance to be on par with reasoning-class models. The system also exhibited greater stability and less variability than o3 across multiple runs, thanks to its graph-based structure and multi-resolution information querying.
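The reported figures are internally consistent, as a quick arithmetic check confirms:

```python
# A 13.82-point net gain over base GPT-4.1 implies a base accuracy of
# 22.37 - 13.82 = 8.55%, and 13.82 / 8.55 ~= 161.64% relative improvement,
# matching the relative increase quoted in the article.
clio_acc, net_gain = 22.37, 13.82
base_acc = clio_acc - net_gain
relative = net_gain / base_acc * 100
print(round(base_acc, 2), round(relative, 2))  # 8.55 161.64
```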

Beyond accuracy, CLIO provides critical insights into its internal workings. The research revealed that oscillations in its internal uncertainty measures are key indicators of result accuracy. For instance, a negative gradient of uncertainty over time often correlates with correct answers, while a positive gradient or high volatility signals incorrect answers or areas where human intervention might be beneficial. This transparency allows scientists to understand when the model’s decisions can be trusted and when experts need to interject, fostering a more effective human-machine collaboration. The chains of thought produced by CLIO were also found to be more similar to human-annotated rationales than those of base models, further enhancing trust and explainability.
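The gradient-of-uncertainty signal can be illustrated with a simple least-squares slope over a recorded uncertainty series. This is an illustrative proxy for the paper's analysis, not its actual metric:

```python
def uncertainty_trend(series):
    """Least-squares slope of an uncertainty time series. Per the
    article, a falling trend (negative gradient) tended to accompany
    correct answers, while a rising or volatile trend flagged answers
    that may warrant expert review."""
    n = len(series)
    mean_x, mean_y = (n - 1) / 2, sum(series) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(series))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Steadily falling uncertainty: the kind of trace that correlated
# with correct answers in the reported experiments.
print(uncertainty_trend([0.9, 0.7, 0.5, 0.3]) < 0)  # True
```

A monitoring loop built on a signal like this is what lets a scientist decide mid-run whether to trust the model or step in.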

The development of CLIO marks a significant step towards creating AI agents that are not only powerful but also transparent and steerable. By enabling real-time adaptation and exposing its internal belief states, CLIO puts scientists in the driver’s seat, allowing them to correct thought patterns and understand the reasoning process. This is particularly vital for long-running LLM agents engaged in high-value tasks like drug discovery or materials science, where the ability to control and monitor the AI’s reasoning is essential for reliable and defensible scientific outcomes. For more details, you can refer to the full research paper: Cognitive Loop via In-Situ Optimization: Self-Adaptive Reasoning for Science.

Future work on CLIO will focus on optimizing its performance across accuracy, cost, and time. Researchers are exploring how control variables like temperature and depth influence performance, and how CLIO can effectively combine different reasoning and non-reasoning models (e.g., GPT-4.1 with o3, or Microsoft’s Phi-4 and xAI’s Grok-4) to solve problems that individual models cannot. While CLIO’s recursive design demands computational resources, the potential for novel scientific discoveries often outweighs the cost. Early tests also show CLIO’s capacity to autonomously orchestrate scientific tools for extended periods, paving the way for mid-stream steering to influence scientific outcomes directly.

Karthik Mehta
