Aleks: An AI System for Autonomous Scientific Discovery in Plant Science

TLDR: Aleks is an AI-powered multi-agent system designed to autonomously conduct scientific discovery in plant science. It integrates domain knowledge, data analysis, and machine learning to formulate problems, explore modeling strategies, and refine solutions without human intervention. In a case study on grapevine red blotch disease, Aleks successfully identified meaningful features and produced robust, interpretable models. Ablation studies confirmed the critical roles of domain knowledge and a comprehensive memory system for coherent and biologically relevant outcomes, demonstrating its potential to accelerate research and foster human-AI collaboration.

Modern plant science is increasingly dealing with vast and varied datasets, but researchers often face hurdles in designing experiments, preparing data, and ensuring their findings can be reproduced. These challenges can slow down the pace of scientific discovery. To address this, a new AI-powered system called Aleks has been developed. Aleks is a multi-agent system designed to autonomously conduct scientific discovery by integrating domain knowledge, data analysis, and machine learning.

Once given a research question and a dataset, Aleks works independently, without human intervention. It iteratively formulates problems, explores different modeling strategies, and refines solutions over multiple cycles. This means it can take a scientific problem from start to finish, making its own decisions along the way.

How Aleks Works: A Team of AI Agents

Aleks is built around three specialized AI agents that collaborate through a shared memory system, mimicking how a team of human experts might work together:

Domain Scientist (DS) Agent: This agent brings specialized knowledge to the table. In the case study, it acted as a plant pathologist, evaluating modeling suggestions and machine learning results for their biological relevance. It helps ensure that the AI’s approach makes sense from a scientific perspective and suggests improvements based on domain expertise.
Data Analyst (DA) Agent: The DA agent is responsible for refining analysis strategies. It proposes modeling approaches, evaluates the outcomes, and improves data preprocessing and feature engineering (creating new features or selecting existing ones). It considers the scientific question and dataset, along with feedback from the DS agent.
Machine Learning Engineer (MLE) Agent: This agent automates the technical aspects. It generates executable Python code for training and evaluating machine learning models. It reviews suggestions from the DA agent, previews the data, and constructs prompts for its underlying language model to write the code. It also handles errors, refining the code until a valid solution is achieved.
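
The MLE agent's generate-run-repair cycle can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `fake_code_generator` stands in for the language model call, and `generate_until_valid` is a hypothetical helper, not part of Aleks. The sketch only shows the general pattern of feeding an execution error back into the next generation attempt.

```python
import traceback


def fake_code_generator(task, previous_error=None):
    """Hypothetical stand-in for the language model call.

    Returns buggy code first, then a corrected version once it is
    shown an error from the previous attempt.
    """
    if previous_error is None:
        return "result = sum(values) / count"   # 'count' is undefined
    return "result = sum(values) / len(values)"


def generate_until_valid(task, data, generator, max_attempts=3):
    """Run generated code, feeding any error back into the next attempt."""
    error = None
    for attempt in range(1, max_attempts + 1):
        code = generator(task, previous_error=error)
        namespace = {"values": data}
        try:
            exec(code, namespace)  # a real system would sandbox this step
            return namespace["result"], attempt
        except Exception:
            # Keep the traceback so the next prompt can include it.
            error = traceback.format_exc()
    raise RuntimeError(f"no valid solution after {max_attempts} attempts")


result, attempts = generate_until_valid(
    "compute the mean", [2.0, 4.0, 6.0], fake_code_generator
)
print(result, attempts)
```

In this toy run the first attempt fails with a NameError and the second succeeds, so the loop stops after two attempts.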

A central shared memory system allows these agents to communicate and maintain a continuous record of the entire research process, from initial questions to experimental results and feedback. This ensures that agents have access to the necessary context, whether it’s the full history for the DA agent or just the current iteration’s details for the MLE agent.
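
As a rough sketch of the idea above, a shared memory can be modeled as an append-only log with different views per agent. The `Record` schema, the `SharedMemory` class and its method names are assumptions for illustration only; the article does not describe how Aleks stores its records.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One entry in the shared memory (hypothetical schema)."""
    iteration: int
    agent: str    # "DS", "DA", or "MLE"
    kind: str     # e.g. "suggestion", "feedback", "result"
    content: str


@dataclass
class SharedMemory:
    """Append-only log that every agent reads from and writes to."""
    records: list = field(default_factory=list)

    def write(self, iteration, agent, kind, content):
        self.records.append(Record(iteration, agent, kind, content))

    def full_history(self):
        # View for the DA agent: everything recorded so far.
        return list(self.records)

    def current_iteration(self, iteration):
        # View for the MLE agent: only the records of one iteration.
        return [r for r in self.records if r.iteration == iteration]


memory = SharedMemory()
memory.write(1, "DA", "suggestion", "Frame as binary classification")
memory.write(1, "MLE", "result", "validation score recorded")
memory.write(1, "DS", "feedback", "Consider spatial spread between vines")
memory.write(2, "DA", "suggestion", "Add neighbour-based infection feature")

da_context = memory.full_history()
mle_context = memory.current_iteration(2)
print(len(da_context), len(mle_context))
```

Keeping a single log and exposing filtered views means each agent sees only the context it needs while the full record stays intact.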

A Case Study: Grapevine Red Blotch Disease

To test its capabilities, Aleks was applied to a critical problem in plant science: predicting Grapevine Red Blotch Disease (GRBD) infection. GRBD is a significant threat to wine-grape yield and quality, but its symptoms are hard to diagnose reliably and appear late in the season. Molecular tests are accurate but costly and time-consuming, highlighting the need for better sampling strategies.

Aleks was tasked with predicting GRBD infection status in 2023 or 2024 using a multi-year vineyard dataset. The system was given only the research question and the dataset; all subsequent decisions, including how to frame the problem, engineer features, and build models, were handled autonomously by Aleks.

The results were promising. Aleks consistently ran the full workflow without human input: it formulated the problem (sometimes as classification, sometimes as regression), performed the analysis, and summarized the results. It reliably identified relevant features, such as historical GRBD counts, geospatial coordinates, and canopy traits, which are known to be important in GRBD biology and epidemiology. As iterations progressed, Aleks also proposed new, domain-informed features, such as ‘GRBD infection lag’, which incorporated spatial information.
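
The article does not say how the ‘GRBD infection lag’ feature was computed. One plausible reading, shown below purely as an assumption, is a count of nearby vines that were infected in the previous year. The function name, the radius and the toy coordinates are all made up for illustration.

```python
import math

# Toy records: (vine_id, x, y, infected_last_year). Values are made up.
vines = [
    ("v1", 0.0, 0.0, 1),
    ("v2", 1.0, 0.0, 0),
    ("v3", 0.0, 1.0, 1),
    ("v4", 5.0, 5.0, 0),
]


def spatial_infection_lag(vines, radius):
    """For each vine, count neighbours within `radius` infected last year."""
    feature = {}
    for vid, x, y, _ in vines:
        count = 0
        for other_id, ox, oy, infected in vines:
            if other_id == vid:
                continue  # a vine is not its own neighbour
            if infected and math.hypot(x - ox, y - oy) <= radius:
                count += 1
        feature[vid] = count
    return feature


lag = spatial_infection_lag(vines, radius=1.5)
print(lag)
```

A feature of this kind combines the temporal signal (last year's status) with the spatial one (distance between vines), which is the combination the article attributes to the proposed feature.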

Why Each Agent Matters: Insights from Ablation Studies

Experiments were conducted to understand the importance of each component:

The Domain Scientist Agent is Crucial: Without the DS agent, Aleks became a purely data-driven optimizer, focusing on statistical correlations rather than biological meaning. This led to less informative features and sometimes premature termination of experiments, underscoring the DS agent’s role in guiding the AI towards scientifically relevant solutions.
Full Experiment History is Key: When the DA agent only had access to the most recent experimental records instead of the complete history, Aleks showed less consistency in feature selection. It sometimes re-evaluated features already deemed unhelpful, and a case of data leakage occurred, highlighting the importance of long-term memory for coherent reasoning.
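
The memory ablation can be illustrated with a small sketch: a feature rejected early is still visible in the full history but falls out of a short recent window, so it may be proposed again. The history entries, feature names and helper functions below are hypothetical and do not come from the study.

```python
# Each entry: (iteration, feature, verdict). Hypothetical history.
history = [
    (1, "row_number", "rejected"),
    (2, "canopy_trait", "kept"),
    (3, "historical_grbd_count", "kept"),
    (4, "geospatial_coordinates", "kept"),
]


def rejected_features(records):
    """Collect every feature the records mark as unhelpful."""
    return {feat for _, feat, verdict in records if verdict == "rejected"}


def can_propose(feature, records):
    """A feature may be proposed only if the visible records never rejected it."""
    return feature not in rejected_features(records)


full_view = history          # complete experiment history
recent_view = history[-2:]   # only the two most recent records

blocked_with_full = not can_propose("row_number", full_view)
allowed_with_recent = can_propose("row_number", recent_view)
print(blocked_with_full, allowed_with_recent)
```

With the full view the earlier rejection blocks the proposal; with the truncated view the rejection is no longer visible and the same feature is considered again.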

The study also found that the models developed by Aleks demonstrated strong generalizability, meaning a model trained for one year could effectively predict GRBD in another year with minimal adaptation. This robustness is largely attributed to the incorporation of domain knowledge into the modeling process.
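
Cross-year generalization of this kind is typically checked by fitting on one year and scoring on another. The sketch below shows that evaluation pattern on made-up toy rows with a deliberately trivial threshold rule; it is not the model or data used in the study, and its output says nothing about Aleks's actual performance.

```python
# Toy rows: (year, prior_infected_neighbours, infected). Values are made up.
rows = [
    (2023, 0, 0), (2023, 1, 0), (2023, 2, 1), (2023, 3, 1),
    (2024, 0, 0), (2024, 1, 0), (2024, 3, 1), (2024, 2, 1),
]


def split_by_year(rows, train_year, test_year):
    """Hold out one whole year for testing so no test-year rows leak in."""
    train = [(x, y) for yr, x, y in rows if yr == train_year]
    test = [(x, y) for yr, x, y in rows if yr == test_year]
    return train, test


def accuracy(data, threshold):
    """Share of rows where the rule 'x >= threshold' matches the label."""
    hits = sum((x >= threshold) == bool(y) for x, y in data)
    return hits / len(data)


def fit_threshold(train):
    """Pick the cut-off on x that maximises training accuracy."""
    candidates = sorted({x for x, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))


train, test = split_by_year(rows, train_year=2023, test_year=2024)
threshold = fit_threshold(train)
test_accuracy = accuracy(test, threshold)
print(threshold, test_accuracy)
```

Splitting by year rather than at random keeps the test year entirely unseen during fitting, which is what makes the score a measure of transfer across seasons.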

The Future of Scientific Discovery

Aleks represents a significant step towards fully autonomous scientific discovery. It can manage the entire research process independently, integrating knowledge across domains such as data science and plant science to produce scientifically sound outcomes without constant human intervention. This shortens the experimentation cycle, from ideation to feedback, to just a few hours, which can substantially accelerate the pace of discovery.

While Aleks shows great promise, several limitations remain. These include verifying the factual accuracy of outputs from large language models, extending the system beyond tabular data to a wider range of computational tools, and integrating hardware components for collecting new datasets. A critical ongoing challenge is defining the right balance between human oversight and AI autonomy in scientific collaboration.

This exploratory work highlights the potential of agentic AI as an autonomous collaborator, paving the way for a new paradigm of human-AI collaboration in science.
