TLDR: DelvePO is a novel framework for optimizing prompts for Large Language Models (LLMs). It addresses limitations of current methods by decoupling prompts into components and using a ‘working memory’ to guide a self-evolving optimization process. The result is more stable, transferable prompts, and DelvePO consistently outperforms existing state-of-the-art methods across various tasks and LLMs while also improving interpretability and efficiency.
Large Language Models (LLMs) have become incredibly powerful tools, capable of handling a wide array of tasks. However, getting the best performance out of them often requires carefully crafted instructions, known as prompts. This process, called Prompt Optimization, is crucial but faces several challenges. Existing methods often rely on an LLM’s ability to randomly rewrite prompts, which can leave the search stuck in suboptimal solutions or yield prompts that don’t perform consistently across different tasks.
To tackle these issues, researchers have introduced DelvePO, which stands for Direction-Guided Self-Evolving Framework for Flexible Prompt Optimization. The framework optimizes prompts in a self-evolving manner, adapting to a variety of tasks without task-specific adjustments. You can find the full research paper here: DelvePO Research Paper.
How DelvePO Works
The core idea behind DelvePO is to break a prompt down into distinct, functional components, much like genes (Loci) and their variations (Alleles) in genetics. This decoupling lets the framework explore how different factors within a prompt influence performance on different tasks. To guide the evolutionary process, DelvePO introduces a ‘working memory’ system, consisting of two parts (sketched in code after this list):
- Component Memory: This memory tracks the evolution of individual prompt components, learning which variations perform better.
- Prompt Memory: This memory stores entire prompts and their performance, helping the LLM understand the relationships between components and guide the overall prompt optimization.
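To make the decoupling concrete, here is a minimal Python sketch of how a component-based prompt and the two memories might be represented. The class names, component types, and fields are hypothetical illustrations for this article, not the paper’s actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical component types ("Loci"); each takes one of several
# candidate values ("Alleles").
COMPONENT_TYPES = ["role", "task_description", "output_format", "examples"]

@dataclass
class Prompt:
    """A prompt decoupled into named components."""
    components: dict[str, str]   # component type -> chosen value
    score: float | None = None   # task performance once evaluated

    def render(self) -> str:
        # Concatenate the chosen values into the final prompt string.
        return "\n".join(self.components[t] for t in COMPONENT_TYPES
                         if t in self.components)

@dataclass
class ComponentMemory:
    """Tracks how well individual component values have performed."""
    # component type -> {value: list of observed scores}
    history: dict[str, dict[str, list[float]]] = field(default_factory=dict)

    def record(self, prompt: Prompt) -> None:
        for ctype, value in prompt.components.items():
            self.history.setdefault(ctype, {}).setdefault(value, []).append(prompt.score)

@dataclass
class PromptMemory:
    """Stores whole prompts with their scores, best-first."""
    entries: list[Prompt] = field(default_factory=list)

    def record(self, prompt: Prompt) -> None:
        self.entries.append(prompt)
        self.entries.sort(key=lambda p: p.score, reverse=True)
```

Roughly, Component Memory answers “which value of this component tends to work?”, while Prompt Memory keeps whole high-scoring prompts so the LLM can reason about how components interact.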
The framework operates through a four-module process (a minimal sketch of the full loop follows the list):
- Initialization & Sampling: It starts by generating an initial set of diverse prompts by combining candidate values for each component. Then, it samples one or two prompts for evolution.
- Task-Evolution: Guided by the Component Memory, this module determines which components (types or values) need to evolve next, identifying promising directions for improvement.
- Solution-Evolution: Using insights from the Prompt Memory and the directions from Task-Evolution, this module performs actual evolutionary operations (mutation and crossover) on the prompt components to generate new, potentially better prompts.
- Memory-Evolution: After new prompts are generated and evaluated, this module updates both the Component Memory and Prompt Memory with the latest performance data, ensuring that the system continuously learns and refines its guidance for future evolutions.
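Putting the four modules together, the overall loop might look like the sketch below, building on the structures above. The helpers `evaluate`, `pick_direction`, and `mutate_or_crossover` are toy stand-ins for the paper’s LLM-driven operations; they only show where each module fits, not how DelvePO actually implements it.

```python
import random

# --- Toy stand-ins for the paper's LLM-driven operations ---

def evaluate(prompt: Prompt) -> float:
    """Placeholder: score the rendered prompt on a held-out dev set."""
    return random.random()  # toy scorer so the sketch runs end to end

def pick_direction(memory: ComponentMemory) -> str:
    """Placeholder Task-Evolution: choose which component type to evolve,
    e.g. the one whose recorded values show the most room to improve."""
    return random.choice(COMPONENT_TYPES)

def mutate_or_crossover(parents: list[Prompt], direction: str,
                        memory: PromptMemory) -> Prompt:
    """Placeholder Solution-Evolution: crossover on the chosen component."""
    child = Prompt(components=dict(parents[0].components))
    if len(parents) > 1 and direction in parents[1].components:
        child.components[direction] = parents[1].components[direction]
    return child

def optimize(population: list[Prompt],
             component_memory: ComponentMemory,
             prompt_memory: PromptMemory,
             iterations: int = 20) -> Prompt:
    """Sketch of the four-module loop; not the paper's actual code."""
    for prompt in population:                     # Initialization: score seeds
        prompt.score = evaluate(prompt)
        component_memory.record(prompt)
        prompt_memory.record(prompt)

    for _ in range(iterations):
        parents = random.sample(population, k=min(2, len(population)))  # Sampling
        direction = pick_direction(component_memory)                    # Task-Evolution
        child = mutate_or_crossover(parents, direction, prompt_memory)  # Solution-Evolution
        child.score = evaluate(child)
        component_memory.record(child)                                  # Memory-Evolution
        prompt_memory.record(child)
        population.append(child)

    return max(population, key=lambda p: p.score)
```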
Experimental Success and Efficiency
Extensive experiments were conducted across 11 datasets and three different LLMs: DeepSeek-R1-Distill-Llama-8B (open-source), Qwen2.5-7B-Instruct (open-source), and GPT-4o-mini (closed-source). The results consistently showed that DelvePO outperforms previous state-of-the-art prompt optimization methods as well as manually crafted prompts, demonstrating its effectiveness and its ability to transfer across different tasks and LLM architectures.
While DelvePO proved highly effective, the researchers also analyzed its cost. For closed-source LLMs like GPT-4o-mini, the token usage was higher compared to some baselines. This is primarily because the working memory content is included as part of the input to the LLMs. However, the superior performance often justifies this increased expenditure, and future work aims to integrate prompt compression techniques to reduce this overhead.
The Impact of Memory
An ablation study highlighted the critical role of the memory mechanisms: removing either Component Memory or Prompt Memory, or both, caused DelvePO’s performance to drop significantly. This confirms that the two memory modules are essential and work together synergistically to guide the prompt optimization process.
DelvePO represents a significant step forward in prompt optimization. By decoupling prompts into components and employing a memory-guided evolutionary process, it reduces the randomness seen in other methods and improves optimization efficiency and interpretability. The framework makes it easier for non-experts to leverage the full potential of LLMs for complex tasks, paving the way for more polished, adaptable prompts across applications.


