Boosting Edge AI Efficiency: A New Dataflow Minimizes Memory Traffic in Computing-In-Memory Systems

TLDR: Researchers have developed a novel Computing-In-Memory (CIM) dataflow that significantly reduces buffer traffic and improves memory utilization for lightweight AI models like MobileNet and EfficientNet. By introducing ‘Convolution with Duplicated Kernels’ (ConvDK) and a ‘BIG/LITTLE scheduler’, the new dataflow achieves 77.4–87.0% less buffer traffic, leading to 10.1–17.9% lower data traffic energy and 15.6–27.8% reduced latency compared to conventional methods, making edge AI devices more efficient.

In the rapidly evolving world of artificial intelligence, especially for devices at the ‘edge’ like smartphones and smart sensors, efficient computing is paramount. One of the biggest hurdles in achieving this efficiency is the ‘memory wall’ – the bottleneck created by the constant movement of data between processing units and memory. Computing-In-Memory (CIM) emerges as a promising solution, aiming to minimize this data movement by performing computations directly within the memory itself.

While CIM offers significant energy efficiency, it faces particular challenges when handling lightweight AI models such as MobileNet and EfficientNet, which heavily rely on a process called depthwise convolution for feature extraction. The main issues are underutilization of CIM memory and, critically, heavy buffer traffic. Buffer traffic, which refers to data moving between different levels of on-chip memory buffers, has often been overlooked despite its substantial impact on a device’s speed (latency) and power consumption (energy).

Addressing these challenges, researchers Choongseok Song and Doo Seok Jeong from Hanyang University have introduced a novel CIM dataflow. It is designed to reduce buffer traffic by maximizing data reuse and to improve memory utilization during depthwise convolution operations. Their work is detailed in the research paper Computing-In-Memory Dataflow for Minimal Buffer Traffic.

A Novel Approach to Data Handling

The core of their proposed dataflow lies in two key innovations: a new data mapping method called ‘Convolution with Duplicated Kernels’ (ConvDK) and a ‘BIG/LITTLE scheduler’.

ConvDK works by duplicating parts of the computational ‘kernel’ (a small matrix of weights used in convolution) within the CIM memory. With the copies in place, input data that has been loaded once can be reused for several outputs instead of being reloaded from the buffers, which cuts buffer traffic. The researchers ground the method in theoretical principles.
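The reuse idea can be illustrated with a small one-dimensional sketch. This is not the authors' mapping: the function names, the 1-D single-channel setting, and the way buffer reads are counted are all assumptions made for illustration. It stores shifted copies of one kernel as columns, so a single input window fetched from the buffer yields several outputs at once.

```python
def conv1d_naive(x, k):
    """Valid 1-D convolution that fetches a fresh K-wide window per output."""
    K = len(k)
    n_out = len(x) - K + 1
    reads = 0
    out = []
    for i in range(n_out):
        window = list(x[i:i + K])    # K buffer reads for a single output
        reads += K
        out.append(sum(w * v for w, v in zip(k, window)))
    return out, reads


def conv1d_duplicated(x, k, copies):
    """Same result, but `copies` shifted kernel replicas share one fetch.

    Hypothetical illustration only: column d holds the kernel shifted by d
    positions, so one window of K + copies - 1 inputs gives `copies` outputs.
    """
    K = len(k)
    n_out = len(x) - K + 1
    span = K + copies - 1
    columns = [[0] * d + list(k) + [0] * (copies - 1 - d) for d in range(copies)]
    reads = 0
    out = []
    for start in range(0, n_out, copies):
        window = list(x[start:start + span])
        reads += len(window)                          # one shared fetch
        window += [0] * (span - len(window))          # pad a final partial window
        valid = min(copies, n_out - start)
        for d in range(valid):
            out.append(sum(c * v for c, v in zip(columns[d], window)))
    return out, reads
```

In this toy setting, a 3-tap kernel with four copies over ten inputs needs 12 buffer reads instead of 24, while producing identical outputs. The cost is extra memory cells for the zero-padded copies, which is the kind of trade-off a duplication-based mapping has to balance.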

The BIG/LITTLE scheduler is designed to maintain high memory utilization across various sizes of feature maps (the intermediate data representations in a neural network). Depending on the width of the input feature map, the scheduler intelligently partitions and distributes data across multiple CIM tiles (processing units), ensuring that the memory is always used efficiently, whether the data is large or small.
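To make the idea of width-dependent partitioning concrete, here is a minimal sketch. It does not reproduce the BIG/LITTLE scheduler, whose rules the article does not spell out. The function `plan_mapping`, the `tile_width` capacity, and the "split" and "pack" modes are hypothetical names chosen only to show how utilization can be kept high for both wide and narrow feature maps.

```python
import math


def plan_mapping(fmap_width, tile_width):
    """Pick a hypothetical mapping for one feature-map row.

    Assumption: each CIM tile can hold `tile_width` input columns.
    Wide rows are split across tiles; narrow rows are packed side by side.
    """
    if fmap_width <= 0 or tile_width <= 0:
        raise ValueError("widths must be positive")
    if fmap_width >= tile_width:
        # Wide row: distribute it over as many tiles as needed.
        tiles = math.ceil(fmap_width / tile_width)
        used = fmap_width / (tiles * tile_width)
        return {"mode": "split", "tiles": tiles, "rows_per_tile": 1,
                "utilization": used}
    # Narrow row: place several rows in the same tile.
    rows = tile_width // fmap_width
    used = rows * fmap_width / tile_width
    return {"mode": "pack", "tiles": 1, "rows_per_tile": rows,
            "utilization": used}
```

With a hypothetical 64-column tile, a 112-wide row is split over two tiles and a 14-wide row is packed four to a tile. Both cases fill 87.5% of the available columns, whereas mapping a single 14-wide row per tile would fill under a quarter.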

Impact on Performance and Efficiency

The researchers rigorously evaluated their new dataflow against conventional methods, including traditional weight-stationary (WS) and input-stationary (IS) dataflows. The results are compelling. When applied to MobileNet and EfficientNet models, their dataflow reduced buffer traffic by an impressive 77.4% to 87.0% compared to the conventional weight-stationary approach. This significant reduction in data movement translated into substantial energy and latency savings.

Specifically, the total data traffic energy (including both off-chip memory and on-chip buffer traffic) was reduced by 10.1% to 17.9%, and the overall latency (the time it takes to complete operations) saw a reduction of 15.6% to 27.8%. Even when applied to input-stationary dataflows, their method showed similar improvements, highlighting its versatility.
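A short back-of-the-envelope sketch shows why a large cut in buffer traffic produces a smaller cut in total data traffic energy. It assumes that energy scales linearly with traffic and that off-chip traffic is unchanged. Neither assumption is stated in the article, and the `buffer_share` value used below is hypothetical rather than a reported figure.

```python
def total_traffic_energy_saving(buffer_share, buffer_reduction):
    """Fraction of total data-traffic energy saved when only buffer traffic shrinks.

    buffer_share:     assumed fraction of data-traffic energy spent on buffers
    buffer_reduction: fractional reduction in buffer traffic
    """
    if not (0.0 <= buffer_share <= 1.0 and 0.0 <= buffer_reduction <= 1.0):
        raise ValueError("both arguments must be fractions in [0, 1]")
    return buffer_share * buffer_reduction


# Hypothetical case: buffers account for 20% of traffic energy, traffic cut by 80%.
saving = total_traffic_energy_saving(0.2, 0.8)
print(f"total data-traffic energy saved: {saving:.0%}")
```

Under these assumptions the total saving is the product of the two fractions, so it can never exceed the share of energy that buffer traffic accounted for in the first place.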

Hardware Considerations

The proposed dataflow is implemented using an analog CIM macro based on 8T-SRAM technology. Key hardware components include a multi-access-capable memory for kernel duplication, which allows multiple identical copies of the data to be written simultaneously, and an IA shift-and-mask unit that enables efficient input data shifting and block-wise operations. The design allows data transfer from off-chip memory (DRAM) to be pipelined, so that it does not add extra latency to the overall computation.

Beyond Current Limitations

This research stands out by directly tackling the often-overlooked issue of buffer traffic, which previous works like Morphable CIM and MobiLattice did not fully address. By supporting both weight-stationary and input-stationary dataflows and focusing on maximizing data reuse within the CIM macro, this novel dataflow offers greater flexibility and superior performance, making it a significant step forward for energy-efficient AI at the edge.

In conclusion, the Computing-In-Memory dataflow with kernel duplication and the BIG/LITTLE scheduler represents a crucial advancement in optimizing AI computations for edge devices. By intelligently managing data movement and maximizing memory utilization, it paves the way for more powerful and energy-efficient AI applications in the future.
