
How2: A Framework for Lifelong Learning in AI Agents Through Procedural Questions

TLDR: The How2 framework enables AI agents to learn complex procedures by asking “how-to” questions, storing the answers, and reusing them for future tasks. Evaluated in a Minecraft-like environment, it shows that abstract, high-level answers are more beneficial for long-term learning and reusability than highly specific, executable instructions, significantly reducing the agent’s reliance on external guidance over time.

In the rapidly evolving landscape of artificial intelligence, equipping agents with the ability to learn continuously and adapt to new challenges is paramount. A recent research paper introduces ‘How2’, a novel framework designed to enhance AI agents’ planning capabilities by enabling them to learn from procedural ‘how-to’ questions. This approach allows agents to ask questions, store the answers, and effectively reuse this knowledge for lifelong learning in interactive environments.

The core challenge addressed by How2 lies in the open-ended nature of ‘how-to’ questions. Answers can range from precise, executable actions to high-level descriptions of sub-goals, making it difficult for AI agents to interpret and for experts to provide in a universally useful format. How2 tackles this by introducing a memory agent framework that not only facilitates asking these questions but also intelligently processes and stores the responses for future application.

The How2 Framework Explained

The How2 framework operates within a student-teacher setup, where an agent (the student) interacts with an environment and, when faced with uncertainty, queries a teacher for guidance. This guidance is then processed and stored in a memory module. The framework comprises several key components:

  • Actor: This is the main agent that decides on the next action, which can be an environment action, a ‘think’ action for internal reasoning, or a ‘read-memory’ action to query its stored knowledge.

  • Memory: A simple key-value store that caches answers from the teacher. Retrieval is based on exact string matching of the query, making the storage and recall process straightforward.

  • Relevance Check: Before using a stored memory, an AI model assesses its applicability to the current game state, ensuring that only pertinent information is utilized.

  • Question Generation: If no relevant memory is found, the agent formulates a ‘how-to’ question for the teacher, often conditioning it on observed items in the environment.

  • Teacher Model: This component provides procedural responses to the agent’s questions. The paper explores different types of teachers, varying in the level of abstraction and context-dependency of their answers.

  • Parse Answer: A crucial step where the teacher’s response is abstracted to remove state-specific details (e.g., replacing a specific inventory slot with a generic item name). This abstraction is vital for making the memory entry generalizable and reusable across different situations. It also generates relevant tags for broader retrieval.
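Putting these components together, the question-asking loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions: `How2Agent`, `parse_answer`, and the callable teacher and relevance models are hypothetical names, not the paper's actual API.

```python
# Minimal sketch of the How2 question loop; all names are illustrative.

def parse_answer(answer: str, state: dict) -> str:
    """Abstract state-specific details: replace inventory slot IDs
    (e.g. 'I12') with the generic item name they currently hold."""
    for slot, item in state.get("inventory", {}).items():
        answer = answer.replace(slot, item)
    return answer

class How2Agent:
    def __init__(self, teacher, is_relevant):
        self.memory = {}                # key-value store: question -> parsed answer
        self.teacher = teacher          # callable: (question, state) -> raw answer
        self.is_relevant = is_relevant  # callable: (memory, state) -> bool

    def get_guidance(self, question: str, state: dict) -> str:
        # Retrieval is exact string matching on the question text.
        cached = self.memory.get(question)
        # Relevance check: reuse a memory only if it fits the current state.
        if cached is not None and self.is_relevant(cached, state):
            return cached
        # No usable memory: ask the teacher, abstract the answer, cache it.
        raw = self.teacher(question, state)
        parsed = parse_answer(raw, state)
        self.memory[question] = parsed
        return parsed
```

In this sketch, a repeated question hits the cache instead of the teacher, which is the mechanism behind the reduced reliance on teacher interventions that the evaluation measures.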

Teacher Strategies and Their Impact

The researchers designed four distinct teacher models to understand how different levels of abstraction in responses affect an agent’s learning and reusability:

  • Executable Teacher: Provides complete, step-by-step action sequences that are immediately actionable and fully conditioned on the current game state. While highly effective for immediate task success, these plans are tightly coupled to specific states, limiting their reusability.

  • Partially-Executable Teacher: Offers answers that remove state-specific information, replacing it with generics (e.g., ‘move the glass to A1’ instead of ‘move from I12 to A1’). The agent must still identify where to retrieve the item from.

  • Subgoal-Partially-Executable Teacher: Structures the partially-executable plan into identifiable subgoals, providing a more organized sequence of actions.

  • Non-Executable Teacher: Delivers high-level instructions using unconstrained language and pattern abstractions (e.g., ‘arrange in a V shape’). This type of teacher is closest to how a human might answer, relying least on environment specifics.
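To make the abstraction spectrum concrete, here are hypothetical answers to the same crafting question at each level, along with a simple proxy check for state coupling. The slot-ID convention and the exact phrasing are assumptions extrapolated from the article's examples, not taken from the paper.

```python
import re

# Hypothetical answers to "how do I craft a glass pane?" at each
# abstraction level; wording is illustrative, not from the paper.
ANSWERS = {
    "executable": "move glass from I12 to A1; move glass from I13 to A2; craft",
    "partially_executable": "move the glass to A1; move the glass to A2; craft",
    "subgoal": "subgoal 1: place glass in A1 and A2; subgoal 2: craft the pane",
    "non_executable": "arrange the glass in a row along the top, then craft",
}

def mentions_inventory_slots(answer: str) -> bool:
    # Inventory slot IDs like 'I12' couple a plan to one specific game
    # state, which is what limits the executable teacher's reusability.
    return bool(re.search(r"\bI\d+\b", answer))
```

Only the executable answer references inventory slots, so it is the only one that cannot be replayed verbatim from a different starting inventory; the other three trade immediate precision for reusability.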

Key Findings and Lifelong Learning

The How2 framework was evaluated in Plancraft, a Minecraft crafting environment. The results highlight a significant trade-off between the immediate utility of an answer and its long-term reusability. While fully executable plans from the teacher offered the highest immediate success rate, their performance dropped dramatically when reused in different contexts, confirming that highly specific instructions have low reusability.

In contrast, abstracting answers, particularly into subgoal structures, significantly enhanced reusability. The subgoal-partially-executable teacher, for instance, saw only a modest drop in success rate when answers were reused, demonstrating the effectiveness of generalized knowledge. The full How2 framework, integrating memory with both parsing and relevance checks, achieved a balance between immediate performance and long-term autonomy. It significantly reduced the agent’s reliance on teacher interventions (by over 40% in high-repetition settings) while maintaining a high success rate.

Notably, the non-executable teacher, which provides high-level, human-like instructions, achieved its highest success rate within the full How2 setup. This indicates that the parsing and relevance modules effectively ground abstract knowledge into reusable, actionable plans, allowing the agent to operationalize less specific guidance.

This research demonstrates that learning from ‘how-to’ questions is a powerful mechanism for improving AI planning capabilities, especially when answers are abstracted from the current state. The How2 framework offers a promising path for LLM-based agents to become more effective and self-sufficient learners over time. For more details, you can read the full paper here.

Karthik Mehta (https://blogs.edgentiq.com)

Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
