TLDR: This research introduces a ‘row-column hybrid grouping’ technique and an ILP-based compiler pipeline to address two problems in analog In-Memory Computing (IMC) systems: computational unreliability caused by stuck-at faults (SAFs) and high compilation overhead. Hybrid grouping improves fault tolerance by adding redundancy across rows and columns, while the new compiler speeds up fault mitigation by reformulating it as an Integer Linear Programming task. Experiments show up to 8 percentage points of accuracy improvement, 150x faster compilation, and 2x better energy efficiency on CNNs and language models, making IMC systems more scalable and fault-resilient.
In-Memory Computing (IMC) is a computer architecture approach that aims to speed up artificial intelligence (AI) tasks by performing calculations directly within memory. This reduces the time and energy spent moving data between the processor and memory, which is a major bottleneck in conventional computer systems. Analog IMC systems, particularly those based on Resistive Random Access Memory (ReRAM) arrays, are especially promising because they execute matrix-vector multiplication, a fundamental operation in deep learning, with high energy efficiency.
However, the widespread adoption of analog IMC faces two significant hurdles: the unreliability of computations due to permanent defects called stuck-at faults (SAFs), and the substantial time required to compile existing fault-mitigation algorithms. SAFs are like permanent glitches in memory cells, locking them into a specific state (either Stuck-At-Zero or Stuck-At-One) and causing irreversible errors in stored data. These errors can severely degrade the accuracy of deep neural networks (DNNs), potentially rendering them unusable.
Current solutions to SAFs, such as fault-aware retraining or hardware-based compensation, often come with their own set of problems. Retraining is often impractical because fault patterns vary from chip to chip, and access to original training data is limited. Hardware-based methods add extra components, increasing energy consumption and physical space requirements. A technique called Fault-Free (FF) offers a hardware-overhead-free way to mitigate SAFs by finding redundant data representations that can mask these faults. While effective, FF suffers from extremely long compilation times, sometimes taking up to 8 hours for larger models. This is a major issue because compilation must be done for each individual chip due to unique fault patterns, and it’s a recurring cost with every model update.
A New Approach to Fault Resilience
A recent research paper, titled “Row-Column Hybrid Grouping for Fault-Resilient Multi-Bit Weight Representation on IMC Arrays,” introduces a novel solution to these challenges. The authors, Kang Eun Jeon, Sangheum Yeon, Jinhee Kim, Hyeonsu Bang, Johnny Rhe, and Jong Hwan Ko, propose a two-pronged approach: a new multi-bit weight representation technique called row-column hybrid grouping, and an innovative compiler pipeline. You can read the full paper here: https://arxiv.org/pdf/2508.15685.
The core idea behind row-column hybrid grouping is to enhance fault tolerance by introducing redundancy in how weights are stored. Traditional methods, like column grouping, distribute parts of a weight across multiple columns. Hybrid grouping takes this a step further by also grouping rows. Imagine two rows in a memory array receiving the same input voltage: their cell values add up, so together they act as a single, larger weight. This creates more ways to represent the same target weight, making the representation more resilient to individual cell failures. Because the weight's significance is spread across more cells, a fault in any one cell has a less severe impact on the overall weight value.
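The effect of grouping on fault impact can be illustrated with a small enumeration. The sketch below is not the authors' code. It assumes 2-bit cells (values 0 to 3), unsigned weights, column significance in powers of 4, and that a stuck-at-one cell is pinned to the highest level; the names `LEVELS`, `representable`, and the `faults` dictionary format are hypothetical.

```python
from itertools import product

LEVELS = 4  # assumption: 2-bit cells, each storing a value in 0..3


def column_weights(cols, base=LEVELS):
    """Significance of each column, most significant first."""
    return [base ** (cols - 1 - c) for c in range(cols)]


def representable(rows, cols, faults=None):
    """Return the set of weight values a rows x cols group can encode.

    faults maps (row, col) -> stuck value: 0 for stuck-at-zero,
    LEVELS - 1 for stuck-at-one (an assumption of this sketch).
    """
    faults = faults or {}
    sig = column_weights(cols)
    # One list of allowed values per cell, in row-major order.
    choices = []
    for r in range(rows):
        for c in range(cols):
            if (r, c) in faults:
                choices.append([faults[(r, c)]])
            else:
                choices.append(list(range(LEVELS)))
    values = set()
    for cells in product(*choices):
        # Cells in the same column share a significance; rows simply add up.
        total = sum(v * sig[i % cols] for i, v in enumerate(cells))
        values.add(total)
    return values


if __name__ == "__main__":
    fault = {(0, 0): 0}  # most significant cell of the first row stuck at zero
    for name, (r, c) in {"R1C4": (1, 4), "R2C2": (2, 2)}.items():
        healthy = len(representable(r, c))
        faulty = len(representable(r, c, fault))
        print(f"{name}: {faulty} of {healthy} values survive the fault")
```

Under these assumptions, a single stuck-at-zero fault in the most significant position leaves R1C4 with 64 of its 256 values, while R2C2 keeps 19 of its 31, which is the sense in which hybrid grouping spreads the damage of one bad cell.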
Accelerating Compilation with Smart Algorithms
To tackle the compilation overhead, the researchers designed a new compiler pipeline that reformulates the complex fault-aware weight decomposition problem as an Integer Linear Programming (ILP) task. This allows the use of highly efficient, off-the-shelf solvers to find optimal solutions much faster. The pipeline also incorporates theoretical insights that identify specific fault patterns that can be solved trivially, further reducing computation time.
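One way to write such a problem down is shown below. This is an illustrative formulation, not necessarily the one used in the paper. It assumes a group of R rows and C columns, cells with L levels, a column significance s_c = L^{C-c}, a target weight w*, and a fault map that pins each faulty cell (r, c) to a stuck value f_{r,c}. All symbols are introduced here for illustration.

```latex
\begin{aligned}
\min_{x,\,e}\quad & e \\
\text{s.t.}\quad
  & e \ge w^{*} - \sum_{r=1}^{R}\sum_{c=1}^{C} s_c\, x_{r,c}, \\
  & e \ge \sum_{r=1}^{R}\sum_{c=1}^{C} s_c\, x_{r,c} - w^{*}, \\
  & x_{r,c} = f_{r,c} \quad \text{for every faulty cell } (r,c), \\
  & x_{r,c} \in \{0, 1, \dots, L-1\} \quad \text{for every fault-free cell } (r,c).
\end{aligned}
```

The auxiliary variable e linearizes the absolute error between the target weight and the weight the cells actually encode, so the objective and all constraints are linear and the problem can be handed to a standard ILP solver.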
The compilation process works in stages. First, it checks the ‘representable range’ of weights for a given fault map. If a target weight falls outside this range, a trivial solution can be applied. If it is within the range, the pipeline then checks for ‘consecutivity’, that is, whether the representable weights form a continuous set with no gaps. Depending on these checks, it uses either a table-based approach (for smaller configurations) or the more powerful ILP-based algorithms (for larger, more complex scenarios) to find the best way to represent the weight while mitigating faults.
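The staged checks can be sketched for a small configuration using only a lookup table. This is a simplified illustration under stated assumptions, not the paper's pipeline: it assumes 2-bit cells, unsigned weights, and column significance in powers of 4; it treats the "trivial solution" for out-of-range targets as clamping to the nearest bound; and it replaces the ILP stage with a nearest-value table search. The functions `build_table`, `value_of`, and `compile_weight` are hypothetical names.

```python
from itertools import product

LEVELS = 4  # assumption: 2-bit cells storing 0..3


def value_of(cells, cols, levels=LEVELS):
    """Weight encoded by a row-major tuple of cell values."""
    return sum(v * levels ** (cols - 1 - i % cols) for i, v in enumerate(cells))


def build_table(rows, cols, faults, levels=LEVELS):
    """Map each representable weight to one cell assignment that encodes it."""
    choices = [
        [faults[(r, c)]] if (r, c) in faults else range(levels)
        for r in range(rows)
        for c in range(cols)
    ]
    table = {}
    for cells in product(*choices):
        # Keep the first assignment found for each value.
        table.setdefault(value_of(cells, cols, levels), cells)
    return table


def compile_weight(target, table):
    """Return (cells, stage) for one target weight."""
    lo, hi = min(table), max(table)
    # Stage 1: range check. Out-of-range targets clamp to the nearest bound.
    if target < lo:
        return table[lo], "out-of-range"
    if target > hi:
        return table[hi], "out-of-range"
    # Stage 2: consecutivity check. With no gaps, every in-range target is exact.
    if len(table) == hi - lo + 1:
        return table[target], "exact"
    # Stage 3: gaps exist, so pick the closest representable value
    # (ties go to the smaller value).
    nearest = min(table, key=lambda v: (abs(v - target), v))
    return table[nearest], "nearest"


if __name__ == "__main__":
    # R1C2 with the least significant cell stuck at zero: only 0, 4, 8, 12 remain.
    table = build_table(1, 2, {(0, 1): 0})
    print(sorted(table), compile_weight(5, table))
```

Enumerating every assignment only scales to small groups, which is consistent with the article's point that larger configurations need a solver-based stage instead of a table.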
Impressive Results Across AI Models
The experimental results are highly encouraging. On convolutional networks, the proposed method achieved up to an 8 percentage point improvement in accuracy, 150 times faster compilation, and a 2x gain in energy efficiency compared to existing fault mitigation baselines. For instance, the R2C2 configuration (2 rows, 2 columns grouped) with 4.95-bit precision even outperformed the traditional R1C4 (1 row, 4 columns grouped) at 8-bit precision, which shows that faults can hurt accuracy more than the loss of precision from coarser quantization.
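The two precision figures can be reproduced with a short calculation, under an assumption that this article does not state: that each cell stores 2 bits (L = 4 levels) and that precision is the base-2 logarithm of the number of distinct weight values N a group can encode, counting every integer from 0 to the maximum. The symbols below are introduced only for this check.

```latex
\begin{aligned}
N(R, C) &= R\,(L-1)\sum_{c=0}^{C-1} L^{c} + 1, \qquad L = 4 \\
N(2, 2) &= 2 \cdot 3 \cdot (1 + 4) + 1 = 31, & \log_2 31 &\approx 4.95 \\
N(1, 4) &= 1 \cdot 3 \cdot (1 + 4 + 16 + 64) + 1 = 256, & \log_2 256 &= 8
\end{aligned}
```

Read this way, R2C2 uses the same four cells as R1C4 but trades about three bits of precision for redundancy.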
The approach also proved effective for compact language models like OPT-125M and OPT-350M, demonstrating superior fault tolerance and maintaining model performance much closer to ideal, fault-free conditions. Crucially, the accelerated compilation pipeline made it possible to compile these larger language models in just a few minutes, a task that would take prohibitively long with previous methods. This scalability is vital for deploying fault-resilient IMC systems in real-world AI applications.
Furthermore, hardware evaluations showed significant energy savings, up to 50%, primarily due to improved array utilization in the hybrid grouping configurations. By reducing column usage and increasing row utilization, especially in shallower network layers, the method enhances overall energy efficiency.
Conclusion
This research marks a significant step forward in making analog IMC systems more reliable and scalable. By introducing row-column hybrid grouping and an efficient ILP-based compilation pipeline, the authors have addressed critical challenges related to computational unreliability and compilation overhead. The theoretical framework for understanding fault-induced errors, coupled with the practical gains in accuracy, speed, and energy efficiency, paves the way for broader deployment of fault-resilient analog IMC in both computer vision and natural language processing applications. The open-sourcing of their implementation further supports reproducibility and future advancements in this field.


