Enhancing Reliability and Speed in Analog In-Memory Computing with Hybrid Grouping

TLDR: This research introduces a ‘row-column hybrid grouping’ technique and an ILP-based compiler pipeline to address two problems in analog In-Memory Computing (IMC) systems: computational unreliability caused by stuck-at faults (SAFs) and high compilation overhead. Hybrid grouping improves fault tolerance by adding redundancy across rows and columns, while the compiler speeds up fault mitigation by reformulating it as an Integer Linear Programming task. Experiments on CNNs and language models show up to 8 percentage points higher accuracy, 150x faster compilation, and 2x better energy efficiency, making IMC systems more scalable and fault-resilient.

In-Memory Computing (IMC) is an exciting new approach to computer architecture that aims to speed up artificial intelligence (AI) tasks by performing calculations directly within memory. This non-traditional method significantly reduces the time and energy spent moving data between the processor and memory, which is a major bottleneck in conventional computer systems. Analog IMC systems, particularly those based on Resistive Random Access Memory (ReRAM) arrays, are especially promising for their energy-efficient execution of matrix-vector multiplication, a fundamental operation in deep learning.

However, the widespread adoption of analog IMC faces two significant hurdles: the unreliability of computations due to permanent defects called stuck-at faults (SAFs), and the substantial time required to run existing fault-mitigation algorithms at compile time. An SAF permanently locks a memory cell into one state (either Stuck-At-Zero or Stuck-At-One), so the value stored in that cell can no longer be changed. The resulting weight errors can severely degrade the accuracy of deep neural networks (DNNs), potentially rendering them unusable.

Current solutions to SAFs, such as fault-aware retraining or hardware-based compensation, often come with their own set of problems. Retraining is often impractical because fault patterns vary from chip to chip, and access to original training data is limited. Hardware-based methods add extra components, increasing energy consumption and physical space requirements. A technique called Fault-Free (FF) offers a hardware-overhead-free way to mitigate SAFs by finding redundant data representations that can mask these faults. While effective, FF suffers from extremely long compilation times, sometimes taking up to 8 hours for larger models. This is a major issue because compilation must be done for each individual chip due to unique fault patterns, and it’s a recurring cost with every model update.

A New Approach to Fault Resilience

A recent research paper, titled “Row-Column Hybrid Grouping for Fault-Resilient Multi-Bit Weight Representation on IMC Arrays,” introduces a novel solution to these challenges. The authors, Kang Eun Jeon, Sangheum Yeon, Jinhee Kim, Hyeonsu Bang, Johnny Rhe, and Jong Hwan Ko, propose a two-pronged approach: a new multi-bit weight representation technique called row-column hybrid grouping, and an innovative compiler pipeline. You can read the full paper here: https://arxiv.org/pdf/2508.15685.

The core idea behind row-column hybrid grouping is to enhance fault tolerance by introducing redundancy in how weights are stored. Traditional methods, like column grouping, distribute parts of a weight across multiple columns. Hybrid grouping takes this a step further by also grouping rows. Imagine two rows in a memory array receiving the same input voltage; their cells effectively add up to form a single, larger weight. This creates more ways to represent the same target weight, making the representation more resilient to individual cell failures. Because the weight's value is spread across more cells, a fault in any one cell has a less severe impact on the overall weight.
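The redundancy argument can be made concrete with a small counting sketch. It is not the authors' code: it assumes 2-bit cells (levels 0 to 3), unsigned weights, and a significance of 4^k per column, none of which the article states, and the names `LEVELS`, `group_value`, and `count_representations` are hypothetical.

```python
from itertools import product

LEVELS = 4  # assumption: 2-bit cells, each storing a level in 0..3


def group_value(cells, rows, cols):
    """Weight encoded by a rows x cols group of cells (flat, row-major).

    Assumption: column c has significance LEVELS ** (cols - 1 - c), and
    rows that share an input simply add their contributions.
    """
    total = 0
    for r in range(rows):
        for c in range(cols):
            total += cells[r * cols + c] * LEVELS ** (cols - 1 - c)
    return total


def count_representations(rows, cols):
    """Map each representable weight to the number of cell patterns encoding it."""
    counts = {}
    for cells in product(range(LEVELS), repeat=rows * cols):
        value = group_value(cells, rows, cols)
        counts[value] = counts.get(value, 0) + 1
    return counts


r1c4 = count_representations(1, 4)  # column grouping only
r2c2 = count_representations(2, 2)  # row-column hybrid grouping

print(len(r1c4), max(r1c4.values()))  # 256 1  -> every weight has one encoding
print(len(r2c2), r2c2[15])            # 31 16  -> weight 15 has 16 encodings
```

Under these assumptions, both groups use four cells, but R1C4 gives each weight exactly one encoding, so any stuck cell that disagrees with it causes an error. R2C2 covers fewer distinct weights and in exchange offers many encodings for most of them, which is the slack a fault-aware compiler can exploit.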

Accelerating Compilation with Smart Algorithms

To tackle the compilation overhead, the researchers designed a new compiler pipeline that reformulates the complex fault-aware weight decomposition problem as an Integer Linear Programming (ILP) task. This allows the use of highly efficient, off-the-shelf solvers to find optimal solutions much faster. The pipeline also incorporates theoretical insights that identify specific fault patterns that can be solved trivially, further reducing computation time.
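One plausible way to write such a problem as an ILP is sketched below. This is an illustrative formulation, not necessarily the one used in the paper. The symbols are assumptions: $w$ is the integer target weight, $x_{ij}$ the level programmed into the cell at row $i$ and column $j$ of an $R \times C$ group, $s_j$ the significance of column $j$, $L$ the number of levels per cell, $f_{ij}$ the value a stuck cell is locked to, and $e$ an auxiliary variable that linearizes the absolute error.

```latex
\begin{aligned}
\min_{x,\,e}\quad & e \\
\text{s.t.}\quad & e \ge w - \sum_{i=1}^{R}\sum_{j=1}^{C} s_j\, x_{ij}, \\
& e \ge \sum_{i=1}^{R}\sum_{j=1}^{C} s_j\, x_{ij} - w, \\
& x_{ij} = f_{ij} \quad \text{for every stuck cell } (i,j), \\
& x_{ij} \in \{0, 1, \dots, L-1\} \quad \text{for every other cell.}
\end{aligned}
```

At the optimum, $e$ equals the smallest achievable distance between the target and a weight that the faulty group can still represent. Because the objective and constraints are all linear in integer variables, a general-purpose ILP solver can handle the problem directly.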

The compilation process works in stages. First, it checks the ‘representable range’ of weights for a given fault map. If a target weight falls outside this range, a simple, trivial solution can be applied. If it’s within the range, the pipeline then checks for ‘consecutivity’ – whether the representable weights form a continuous set. Depending on these checks, it either uses a table-based approach (for smaller configurations) or the more powerful ILP-based algorithms (for larger, more complex scenarios) to find the best way to represent the weight while mitigating faults.
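The staged checks can be sketched as follows. This is a simplified illustration, not the paper's pipeline: it assumes 2-bit cells, unsigned weights, Stuck-At-Zero and Stuck-At-One pinning a cell to its lowest and highest level, and clamping to the nearest bound as the trivial out-of-range solution. Brute-force enumeration stands in for both the table lookup and the ILP solver, and all names are hypothetical.

```python
from itertools import product

LEVELS = 4               # assumption: 2-bit cells, levels 0..3
SA0, SA1 = 0, LEVELS - 1  # assumption: values a stuck cell is pinned to


def representable_set(rows, cols, faults):
    """All weights a rows x cols group can encode given faults {(row, col): value}."""
    free = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in faults]
    # Stuck cells contribute a fixed offset.
    base = sum(v * LEVELS ** (cols - 1 - c) for (r, c), v in faults.items())
    values = set()
    for assignment in product(range(LEVELS), repeat=len(free)):
        values.add(base + sum(x * LEVELS ** (cols - 1 - c)
                              for x, (r, c) in zip(assignment, free)))
    return values


def decompose(target, rows, cols, faults):
    """Return (achieved weight, stage that resolved it)."""
    values = representable_set(rows, cols, faults)
    lo, hi = min(values), max(values)
    # Stage 1: range check. Outside the range, clamp to the nearest bound.
    if target < lo:
        return lo, "out-of-range"
    if target > hi:
        return hi, "out-of-range"
    # Stage 2: consecutivity check. No gaps means the target is exactly representable.
    if len(values) == hi - lo + 1:
        return target, "consecutive"
    # Stage 3: search for the closest representable weight
    # (stand-in for the table-based or ILP-based step).
    return min(values, key=lambda v: (abs(v - target), v)), "search"


# Same fault (least-significant column stuck high), two groupings:
print(decompose(8, 1, 4, {(0, 3): SA1}))  # (7, 'search')
print(decompose(8, 2, 2, {(0, 1): SA1}))  # (8, 'consecutive')
```

In this toy setting, the R1C4 group can only reach weights whose lowest digit is 3, so the target 8 is approximated by 7. The R2C2 group absorbs the same fault because its second row can compensate, and the representable set stays gap-free.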

Impressive Results Across AI Models

The experimental results are encouraging. On convolutional networks, the proposed method achieved up to an 8 percentage point improvement in accuracy, 150 times faster compilation, and a 2 times gain in energy efficiency compared to existing fault mitigation baselines. For instance, the R2C2 configuration (2 rows, 2 columns grouped) with 4.95-bit precision even outperformed the traditional R1C4 (1 row, 4 columns grouped) at 8-bit precision, which suggests that fault-induced errors can be more detrimental than quantization loss.
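The two precision figures are consistent with a simple count of representable levels, assuming 2-bit cells (levels 0 to 3) and a significance of 4^k per column. The article does not state the cell precision, so this is an inference that happens to reproduce both numbers:

```latex
\begin{aligned}
\text{R1C4:}\quad & 4^4 = 256 \text{ levels} \;\Rightarrow\; \log_2 256 = 8 \text{ bits} \\
\text{R2C2:}\quad & (3+3)\cdot 4 + (3+3)\cdot 1 = 30 \text{ maximum, so } 31 \text{ levels } (0 \text{ to } 30)
  \;\Rightarrow\; \log_2 31 \approx 4.95 \text{ bits}
\end{aligned}
```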

The approach also proved effective for compact language models like OPT-125M and OPT-350M, demonstrating superior fault tolerance and keeping model performance much closer to ideal, fault-free conditions. Crucially, the accelerated compilation pipeline made it possible to compile these larger language models in just a few minutes, a task that would take prohibitively long with previous methods. This scalability is vital for deploying fault-resilient IMC systems in real-world AI applications.

Furthermore, hardware evaluations showed significant energy savings, up to 50%, primarily due to improved array utilization in the hybrid grouping configurations. By reducing column usage and increasing row utilization, especially in shallower network layers, the method enhances overall energy efficiency.

Conclusion

This research marks a significant step forward in making analog IMC systems more reliable and scalable. By introducing row-column hybrid grouping and an efficient ILP-based compilation pipeline, the authors have addressed critical challenges related to computational unreliability and compilation overhead. The theoretical framework for understanding fault-induced errors, coupled with the practical gains in accuracy, speed, and energy efficiency, paves the way for broader deployment of fault-resilient analog IMC in both computer vision and natural language processing applications. The open-sourcing of their implementation further supports reproducibility and future advancements in this field.
