Delethink: Enabling LLMs to Think Longer with Linear Compute

TLDR: A new research paper introduces ‘Markovian Thinking,’ a reasoning paradigm, and ‘Delethink,’ an RL environment that trains Large Language Models (LLMs) to perform long chain-of-thought reasoning with linear computational cost and constant memory. Delethink structures reasoning into fixed-size chunks and carries over only a small textual state, which avoids the quadratic compute scaling of traditional methods. This approach matches or surpasses existing LongCoT-RL performance, significantly reduces training costs, and scales better at test time. It points toward highly efficient and scalable reasoning LLMs.

Large Language Models (LLMs) have shown remarkable capabilities in complex reasoning tasks, often by generating a ‘long chain of thought’ (LongCoT) before arriving at an answer. This approach, while powerful, comes with a significant drawback: the computational cost grows quadratically as the length of these thought processes increases. This means that as LLMs try to think longer, the resources required for training and inference skyrocket, making very long reasoning prohibitively expensive and slow.

A new research paper introduces a paradigm called ‘Markovian Thinking’ and its practical implementation, ‘Delethink,’ to address this fundamental challenge. The core idea is to decouple the length of an LLM’s thought process from the size of the context it must process at any given moment. This lets the model think for much longer with significantly less compute.

The Problem with Traditional LLM Reasoning

In standard Reinforcement Learning (RL) setups for LLMs, the ‘state’ is typically the initial prompt combined with all the reasoning tokens generated so far, so it grows with every token the model produces. Most LLMs are attention-based, and in these models each new token attends to this entire ever-growing context. As a result, compute grows quadratically with thinking length, while memory (the KV cache) grows linearly. This quadratic growth is the bottleneck that keeps LLMs from engaging in truly extensive reasoning.

Introducing Markovian Thinking and Delethink

The researchers propose ‘Markovian Thinking,’ a paradigm where the LLM’s policy advances its reasoning by conditioning on a constant-size state. This means that regardless of how long the model has been thinking, the amount of information it needs to actively process at any single step remains fixed. Delethink is an RL environment designed to train LLMs to become native Markovian Thinkers.

Here’s how Delethink works: Instead of generating one continuous, ever-growing chain of thought, reasoning is structured into a sequence of fixed-size ‘chunks.’ Within each chunk, the model thinks as usual. However, at the boundary of each chunk, the environment resets the context. The next chunk’s prompt is then reinitialized using the original query and a small ‘carryover’ of textual information from the end of the previous chunk. This carryover acts as the ‘textual Markovian state.’ Through RL, the LLM learns to write a concise, sufficient textual state at the end of each chunk, allowing it to seamlessly continue its reasoning after the context reset.
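The chunk-and-reset loop described above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the paper's implementation, and it rests on several assumptions:

- The `policy` callable stands in for an LLM.
- Tokens are plain strings.
- The `<answer>` end marker is an assumed convention.
- The carryover is simply the last `carryover_size` tokens of each chunk.

What it demonstrates is that the prompt given to the policy never exceeds the query plus the carryover, however many chunks are generated.

```python
from typing import Callable, List

# Hypothetical policy interface (assumption): given a prompt (list of tokens) and a
# token budget, return newly generated tokens. A real system would call an LLM here.
Policy = Callable[[List[str], int], List[str]]

END = "<answer>"  # hypothetical marker meaning "reasoning is finished"


def delethink_rollout(query: List[str], policy: Policy, chunk_size: int,
                      carryover_size: int, max_chunks: int) -> List[str]:
    """Generate reasoning in fixed-size chunks, resetting the context between chunks.

    Returns the concatenated trace for inspection; the policy itself only ever
    sees the query plus at most `carryover_size` carried-over tokens.
    """
    trace: List[str] = []
    carryover: List[str] = []
    for _ in range(max_chunks):
        # Context reset: rebuild the prompt from the original query and the
        # short textual state carried over from the previous chunk.
        prompt = query + carryover
        chunk = policy(prompt, chunk_size)[:chunk_size]
        trace.extend(chunk)
        if END in chunk:
            break
        # Assumption: the tail of this chunk becomes the next (Markovian) state.
        carryover = chunk[-carryover_size:] if carryover_size > 0 else []
    return trace


def counting_policy(prompt: List[str], budget: int) -> List[str]:
    """Toy stand-in for an LLM that recovers its progress only from the prompt."""
    last = 0
    for tok in prompt:
        if tok.startswith("step"):
            last = int(tok[len("step"):])
    out: List[str] = []
    for _ in range(budget):
        last += 1
        if last > 10:  # the toy task is "solved" after ten steps
            out.append(END)
            break
        out.append(f"step{last}")
    return out


# Usage: record each prompt's length to confirm the context stays bounded.
prompt_lengths: List[int] = []


def recording_policy(prompt: List[str], budget: int) -> List[str]:
    prompt_lengths.append(len(prompt))
    return counting_policy(prompt, budget)


trace = delethink_rollout(["query"], recording_policy,
                          chunk_size=4, carryover_size=2, max_chunks=10)
print(trace)           # ten reasoning steps followed by the end marker
print(prompt_lengths)  # [1, 3, 3]: the query alone, then query + 2 carried tokens
```

In this toy run, three chunks produce ten reasoning steps, yet no prompt is longer than three tokens. The toy policy is Markovian by construction, because the last step number is all it needs. In Delethink, the model instead has to learn through RL to write a carryover that is sufficient to continue.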

Significant Computational Benefits

The direct consequence of Delethink’s design is that longer thinking requires linear compute and constant memory with respect to the total thinking length. This is a major improvement over the quadratic compute scaling of traditional LongCoT methods. For instance, the paper estimates costs for an average thinking length of 96,000 tokens: LongCoT-RL would need approximately 27 H100-months of training, whereas Delethink would need only 7 H100-months. That is roughly a 4× reduction in training time and resources.
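The toy Python model below shows where the linear-versus-quadratic difference comes from. It counts attention interactions (one per query-key pair) during generation. This is a back-of-the-envelope sketch under simplifying assumptions, not the paper's cost model:

- It ignores MLP layers and other per-token costs, which are linear for both methods, so at long lengths it exaggerates the gap relative to the paper's 27 vs. 7 H100-month estimate.
- The prompt, chunk and carryover sizes are hypothetical.

```python
from typing import Tuple


def longcot_cost(prompt_len: int, think_len: int) -> int:
    """Attention interactions when each new token attends to everything before it."""
    # Token t attends to the prompt plus t thinking tokens (including itself):
    # sum_{t=1}^{T} (P + t) = P*T + T(T+1)/2, which is quadratic in T.
    return prompt_len * think_len + think_len * (think_len + 1) // 2


def delethink_profile(prompt_len: int, think_len: int,
                      chunk: int, carry: int) -> Tuple[int, int]:
    """Return (attention interactions, peak context length) for chunked thinking.

    Each chunk restarts from the prompt plus `carry` carried-over tokens
    (no carryover before the first chunk) and adds up to `chunk` new tokens.
    """
    total, peak, remaining, first = 0, 0, think_len, True
    while remaining > 0:
        n = min(chunk, remaining)                    # new thinking tokens this chunk
        base = prompt_len + (0 if first else carry)  # context size when the chunk starts
        total += base * n + n * (n + 1) // 2         # bounded per chunk since n <= chunk
        peak = max(peak, base + n)                   # largest context (KV-cache length)
        remaining -= n
        first = False
    return total, peak


P, C, M = 200, 8192, 512  # hypothetical prompt, chunk, and carryover sizes (tokens)
for T in (24_000, 48_000, 96_000):
    lc = longcot_cost(P, T)
    dc, dpeak = delethink_profile(P, T, C, M)
    print(f"T={T:>6}  LongCoT/Delethink attention ratio={lc / dc:5.1f}  "
          f"peak context: LongCoT={P + T}, Delethink={dpeak}")
```

Doubling T roughly quadruples the LongCoT count but only about doubles the Delethink count, so the printed ratio widens as thinking gets longer. Meanwhile, Delethink's peak context stays fixed at P + M + C. Because the model counts attention alone, its ratios are not comparable to the paper's H100-month figures.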

Empirical results demonstrate that Delethink is highly effective. An R1-Distill 1.5B model trained with Delethink, reasoning in 8K-token chunks, can think up to 24K tokens, matching or even surpassing LongCoT-RL models trained with the same 24K budget on math benchmarks. Furthermore, Delethink shows superior ‘test-time scaling,’ meaning it continues to improve performance when allowed to think beyond its training-time limits, while LongCoT-RL methods tend to plateau.

Why Delethink Works So Well

A key insight from the research is that many off-the-shelf reasoning LLMs, even without explicit training for Markovian Thinking, already exhibit a latent ability to generate ‘Markovian traces’ zero-shot. This means they can naturally produce reasoning sequences that can be effectively chunked and continued with a limited state. This strong initial capability provides a favorable starting point for RL training, making Delethink highly effective at scale.

The researchers also tested Delethink’s compatibility with larger, state-of-the-art models such as GPT-OSS 120B and Qwen3 30B-A3B. These models also showed robust Markovian Thinking zero-shot across diverse tasks, including PhD-level questions, coding, and math competitions. This suggests that Delethink can scale to the most advanced LLMs.

Implications for Future LLMs

The success of Markovian Thinking, as demonstrated by Delethink, highlights the RL environment itself as a powerful lever for progress in LLM development. By decoupling thinking length from context size, it opens a path toward efficient, scalable reasoning LLMs that could potentially think for millions of tokens. This paradigm shift could also make non-quadratic sequence architectures (like state-space models or sparse attention mechanisms) particularly beneficial for reasoning models, as they align well with the constant-memory, linear-compute nature of Markovian Thinking.

This research suggests that the way LLMs process and retain information during reasoning can be fundamentally redesigned to overcome current computational barriers, paving the way for more capable and efficient AI systems. You can read the full research paper here.
