Enhancing LLM Parallel Generations Through Interdependent Information Sharing

TLDR: Bridge is a new method that allows Large Language Models (LLMs) to generate multiple responses in parallel while sharing information between them. By adding small “Bridge blocks” to the LLM architecture, it enables interdependent generations, significantly improving accuracy and consistency in tasks like math reasoning, with minimal additional parameters and full flexibility in generation width.

Large Language Models (LLMs) have made incredible strides in performance, especially when it comes to complex tasks. A common approach to improve LLM outputs is to scale inference-time compute, which often involves generating multiple responses for a single input prompt. However, a significant limitation of this method has been that these parallel responses are typically generated independently, meaning they don’t share any information with each other. This leaves a lot of potentially useful insights untapped, limiting the overall quality of the generated response set.

Researchers from Meta, Carnegie Mellon University, and Yale University have introduced a novel approach called “Bridge” that aims to overcome this challenge. Bridge allows LLMs to generate interdependent responses in parallel, fundamentally rethinking how batched LLM hidden states are processed. Instead of treating them as independent slices, Bridge views them as holistic tensors, enabling information to flow between different generation sequences.

How Bridge Works

At its core, Bridge introduces small, efficient architectural additions called “Bridge blocks” into existing LLMs. These blocks are placed after each feedforward block and add only a minimal amount of new parameters (ranging from 2.8% to 5.1% of the original model’s parameters). The magic happens within these Bridge blocks: they perform attention operations not just within a single sequence, but across different sequences that originate from the same input prompt at each step of the generation process. This is similar to a concept known as axial attention, where the model can blend information across different dimensions of its internal representations.
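To make the cross-sequence idea concrete, here is a minimal sketch of attention applied across parallel generations at a single token position. It is an illustration under stated assumptions, not the paper's implementation: the function name `cross_sequence_attention` is hypothetical, and the learned query, key, and value projections that a real attention layer would have are left out (treated as identity), so only the mixing pattern is shown.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_sequence_attention(hidden):
    """Mix hidden states across parallel sequences at one generation step.

    hidden: list of W vectors, one per parallel response to the same prompt.
    Returns W vectors; each is a softmax-weighted average of all W inputs.
    Assumption: identity query/key/value projections (no learned weights).
    """
    dim = len(hidden[0])
    scale = 1.0 / math.sqrt(dim)
    mixed = []
    for query in hidden:
        # Score this sequence against every sequence in the group.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in hidden]
        weights = softmax(scores)
        # Weighted average of the group's hidden states.
        context = [
            sum(w * value[d] for w, value in zip(weights, hidden))
            for d in range(dim)
        ]
        mixed.append(context)
    return mixed


# Three parallel responses, each with a 2-dimensional hidden state (toy values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shared = cross_sequence_attention(states)
```

Because the attention runs over the list of sequences rather than over a fixed-size weight matrix, the same function accepts any number of parallel responses, which is one way to picture why the generation width need not be fixed in advance.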

Crucially, Bridge maintains full generation parallelism. This means that while the responses are now interdependent and share information, the actual generation of tokens for each response still happens simultaneously. The design is also flexible, allowing for any number of parallel generations at test-time without needing to be retrained for different “widths” of parallelism.

Significant Performance Gains

The impact of Bridge on LLM performance is substantial. The research reports that Bridge improves the relative mean accuracy gains from reinforcement learning with verifiable rewards (RLVR) by up to 50%. In other words, when LLMs are fine-tuned against automatically checkable outcomes, such as whether a final math answer is correct, Bridge helps them improve more effectively. It also boosts the consistency of correct responses, leading to higher-quality and more reliable output sets.

For example, experiments on various math reasoning benchmarks showed that a DeepSeek-R1-Distill-Qwen-7B model equipped with Bridge blocks achieved a 50% further improvement with RLVR compared to the next best method. Furthermore, the rate at which all responses to a single competition math problem were correct increased from 15.3% to 18.1% with Bridge, a gain of 2.8 percentage points.
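The two kinds of numbers quoted here, mean accuracy and the rate at which every response to a problem is correct, can be computed from per-response correctness flags. The sketch below shows one plausible way to do so; the function names and the toy data are hypothetical and are not taken from the paper.

```python
def mean_accuracy(results):
    """Average fraction of correct responses per problem.

    results: list of problems, each a list of booleans (one per response).
    """
    per_problem = [sum(flags) / len(flags) for flags in results]
    return sum(per_problem) / len(per_problem)


def all_correct_rate(results):
    """Fraction of problems for which every parallel response is correct."""
    return sum(all(flags) for flags in results) / len(results)


def relative_change(before, after):
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Toy data (hypothetical): 4 problems, 4 parallel responses each.
toy = [
    [True, True, True, True],
    [True, False, True, True],
    [False, False, False, False],
    [True, True, True, True],
]
accuracy = mean_accuracy(toy)        # fraction of correct responses
consistency = all_correct_rate(toy)  # fraction of fully correct problems

# The article's all-correct figures, expressed as a relative change.
gain = relative_change(15.3, 18.1)
```

Applied to the figures above, `relative_change(15.3, 18.1)` comes to roughly 0.18, so the 2.8-point absolute increase corresponds to about an 18% relative increase in the all-correct rate.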

Bridge also proves to be highly versatile. Once trained, it scales seamlessly to any generation width, consistently outperforming independent generation methods. It’s robust to discrepancies between the width used during training and testing. The method also generalizes well to longer generation lengths, showing improved accuracy and consistency even when extrapolating beyond its training length.

This innovative approach unlocks a more general mode of parallel scaling, effectively leveraging information between sequences. It is also compatible with any post-generation aggregation technique, meaning it can be combined with other methods that synthesize multiple responses into a final output.
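As an illustration of what a post-generation aggregation step can look like, the sketch below applies majority voting to the final answers of a set of parallel responses. Majority voting is used here only as a familiar example of such a technique; the article does not say which aggregation methods were evaluated, and the function name and sample answers are hypothetical.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent final answer among parallel responses.

    answers: non-empty list of hashable final answers, one per response.
    Ties are resolved in favor of the answer that appeared first.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]


# Final answers extracted from four parallel responses (toy values).
final_answers = ["42", "41", "42", "7"]
chosen = majority_vote(final_answers)
```

The aggregation only looks at the finished responses, so it is indifferent to whether they were generated independently or with information shared between them.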

The full details of this research can be found in the paper: Generalized Parallel Scaling with Interdependent Generations.
