Boosting Edge AI Efficiency: A New Dataflow Minimizes Memory Traffic in Computing-In-Memory Systems

TLDR: Researchers have developed a novel Computing-In-Memory (CIM) dataflow that significantly reduces buffer traffic and improves memory utilization for lightweight AI models like MobileNet and EfficientNet. By introducing ‘Convolution with Duplicated Kernels’ (ConvDK) and a ‘BIG/LITTLE scheduler’, the new dataflow achieves 77.4–87.0% less buffer traffic, leading to 10.1–17.9% lower data traffic energy and 15.6–27.8% reduced latency compared to conventional methods, making edge AI devices more efficient.

In the rapidly evolving world of artificial intelligence, especially for devices at the ‘edge’ like smartphones and smart sensors, efficient computing is paramount. One of the biggest hurdles in achieving this efficiency is the ‘memory wall’ – the bottleneck created by the constant movement of data between processing units and memory. Computing-In-Memory (CIM) emerges as a promising solution, aiming to minimize this data movement by performing computations directly within the memory itself.

While CIM offers significant energy efficiency, it faces particular challenges when handling lightweight AI models such as MobileNet and EfficientNet, which heavily rely on a process called depthwise convolution for feature extraction. The main issues are underutilization of CIM memory and, critically, heavy buffer traffic. Buffer traffic, which refers to data moving between different levels of on-chip memory buffers, has often been overlooked despite its substantial impact on a device’s speed (latency) and power consumption (energy).

Addressing these challenges, researchers Choongseok Song and Doo Seok Jeong from Hanyang University have introduced a novel CIM dataflow. The approach is designed to reduce buffer traffic by maximizing data reuse and to improve memory utilization during depthwise convolution operations. Their work is detailed in the research paper ‘Computing-In-Memory Dataflow for Minimal Buffer Traffic’.

A Novel Approach to Data Handling

The core of their proposed dataflow lies in two key innovations: a new data mapping method called ‘Convolution with Duplicated Kernels’ (ConvDK) and a ‘BIG/LITTLE scheduler’.

ConvDK works by duplicating parts of the computational ‘kernel’ (a small matrix used in convolution) within the CIM memory. With several copies of the kernel stored side by side, input data that has been loaded once can be reused for multiple outputs instead of being reloaded from the buffers outside the CIM macro, which is what cuts buffer traffic. The authors back the method with a theoretical analysis.
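To make the reuse idea concrete, here is a minimal one-dimensional sketch in Python with NumPy. It is not the paper's actual mapping: the function names, the single-channel 1D setting, the duplication factor `d`, and the way buffer reads are counted are all assumptions chosen for illustration. The baseline fetches a fresh input window for every output, while the duplicated version stores `d` shifted copies of the kernel as columns of one matrix, so a single slightly longer window produces `d` outputs.

```python
import numpy as np

def conv_single_kernel(x, k):
    """Baseline: one stored kernel copy, one input window fetched per output."""
    K = len(k)
    n_out = len(x) - K + 1
    out = np.empty(n_out)
    reads = 0  # input elements fetched from the buffer (illustrative count)
    for i in range(n_out):
        window = x[i:i + K]
        reads += len(window)
        out[i] = window @ k
    return out, reads

def conv_duplicated_kernel(x, k, d):
    """Hypothetical duplication: d shifted kernel copies share one input window."""
    K = len(k)
    n_out = len(x) - K + 1
    # Column j holds the kernel shifted down by j rows.
    W = np.zeros((K + d - 1, d))
    for j in range(d):
        W[j:j + K, j] = k
    out = np.empty(n_out)
    reads = 0
    for start in range(0, n_out, d):
        window = x[start:start + K + d - 1]   # fetched once, reused by all copies
        reads += len(window)
        m = min(d, n_out - start)             # the last group may be smaller
        out[start:start + m] = window @ W[:len(window), :m]
    return out, reads

x = np.arange(10.0)
k = np.array([1.0, 2.0, 3.0])
base, base_reads = conv_single_kernel(x, k)
dup, dup_reads = conv_duplicated_kernel(x, k, d=4)
print(np.allclose(base, dup), base_reads, dup_reads)
```

In this toy setting both functions return the same outputs, but the duplicated version fetches 12 input elements instead of 24. The price is the extra memory cells that hold the kernel copies, which is the trade the article describes.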

The BIG/LITTLE scheduler is designed to maintain high memory utilization across various sizes of feature maps (the intermediate data representations in a neural network). Depending on the width of the input feature map, the scheduler partitions and distributes data across multiple CIM tiles (processing units), so that the memory stays well utilized whether the feature map is wide or narrow.
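The article does not spell out the scheduler's rules, so the following Python sketch only illustrates the general idea of a width-dependent mapping. Everything in it is a hypothetical stand-in: the `schedule` function, the `tile_width` parameter, the threshold that picks a mode, and the packing of several channels into one tile are assumptions, not the paper's algorithm. Only the two mode names are borrowed from the article.

```python
def plan_strips(width, tile_width):
    """Split one feature-map row into column strips of at most tile_width."""
    return [(s, min(s + tile_width, width)) for s in range(0, width, tile_width)]

def schedule(width, tile_width):
    """Pick a hypothetical mapping mode from the feature-map width."""
    if width >= tile_width:
        # Wide map: spread one row over several tiles, one strip per tile.
        mode = "BIG"
        strips = plan_strips(width, tile_width)
        channels_per_tile = 1
        used = width
        capacity = len(strips) * tile_width
    else:
        # Narrow map: pack rows from several channels into a single tile.
        mode = "LITTLE"
        strips = [(0, width)]
        channels_per_tile = tile_width // width
        used = channels_per_tile * width
        capacity = tile_width
    return {
        "mode": mode,
        "strips": strips,
        "channels_per_tile": channels_per_tile,
        "utilization": used / capacity,
    }

wide = schedule(width=112, tile_width=32)
narrow = schedule(width=7, tile_width=32)
print(wide["mode"], len(wide["strips"]), wide["utilization"])
print(narrow["mode"], narrow["channels_per_tile"], narrow["utilization"])
```

With these made-up numbers, a 7-wide map mapped one channel per tile would use only 7 of 32 columns, while packing four channels raises utilization to 87.5%. That is the kind of gap a width-aware scheduler is meant to close.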

Impact on Performance and Efficiency

The researchers rigorously evaluated their new dataflow against conventional methods, including traditional weight-stationary (WS) and input-stationary (IS) dataflows. The results are compelling. When applied to MobileNet and EfficientNet models, their dataflow reduced buffer traffic by an impressive 77.4% to 87.0% compared to the conventional weight-stationary approach. This significant reduction in data movement translated into substantial energy and latency savings.

Specifically, the total data traffic energy (including both off-chip memory and on-chip buffer traffic) was reduced by 10.1% to 17.9%, and the overall latency (the time it takes to complete operations) saw a reduction of 15.6% to 27.8%. Even when applied to input-stationary dataflows, their method showed similar improvements, highlighting its versatility.
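To put the reported percentages in perspective, the short Python snippet below converts a percentage reduction into the equivalent improvement factor. The helper name `improvement_factor` is ours; the input figures are the ones quoted in this article.

```python
def improvement_factor(percent_reduction):
    """Convert 'X% less' into 'N times less', e.g. 50% less -> 2.0x."""
    return 1.0 / (1.0 - percent_reduction / 100.0)

reported = {
    "buffer traffic": (77.4, 87.0),
    "data traffic energy": (10.1, 17.9),
    "latency": (15.6, 27.8),
}

for metric, (low, high) in reported.items():
    print(f"{metric}: {improvement_factor(low):.2f}x to {improvement_factor(high):.2f}x")
```

Buffer traffic drops by roughly 4.4x to 7.7x, while latency improves by about 1.18x to 1.39x. The energy figure improves by a smaller factor than buffer traffic does, which is consistent with it covering off-chip memory traffic as well as on-chip buffer traffic.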

Hardware Considerations

The proposed dataflow is implemented using an analog CIM macro based on 8T-SRAM technology. Key hardware components include a multi-access-capable memory for kernel duplication, which allows multiple identical copies of the same data to be written simultaneously, and an input-activation (IA) shift-and-mask unit that enables efficient input data shifting and block-wise operations. The design allows data transfer from off-chip memory (DRAM) to be pipelined with computation, so the transfer does not add extra latency to the overall computation.
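The claim about pipelined DRAM transfers can be illustrated with a simple two-stage latency model in Python. This is a generic textbook model, not the paper's timing analysis: the function names and the time units are assumptions, and it presumes the next block of data can be loaded while the current block is being computed.

```python
def serial_latency(n_blocks, t_load, t_compute):
    """Each block is loaded, then computed, with no overlap."""
    return n_blocks * (t_load + t_compute)

def pipelined_latency(n_blocks, t_load, t_compute):
    """Loading block i+1 overlaps with computing block i (two-stage pipeline)."""
    if n_blocks == 0:
        return 0
    # The first load and the last compute cannot overlap with anything;
    # every step in between costs the slower of the two stages.
    return t_load + (n_blocks - 1) * max(t_load, t_compute) + t_compute

# Hypothetical numbers: 8 blocks, 3 time units to load, 5 to compute.
print(serial_latency(8, 3, 5), pipelined_latency(8, 3, 5))
```

In this example the serial schedule takes 64 time units and the pipelined one takes 43. As long as loading a block is no slower than computing it, the pipelined total is one initial load plus the compute time, so the remaining transfers are hidden.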

Beyond Current Limitations

This research stands out by directly tackling the often-overlooked issue of buffer traffic, which previous works like Morphable CIM and MobiLattice did not fully address. By supporting both weight-stationary and input-stationary dataflows and focusing on maximizing data reuse within the CIM macro, this novel dataflow offers greater flexibility and superior performance, making it a significant step forward for energy-efficient AI at the edge.

In conclusion, the Computing-In-Memory dataflow with kernel duplication and the BIG/LITTLE scheduler represents a crucial advancement in optimizing AI computations for edge devices. By intelligently managing data movement and maximizing memory utilization, it paves the way for more powerful and energy-efficient AI applications in the future.

Karthik Mehta (https://blogs.edgentiq.com)
Karthik Mehta is a data journalist known for his data-rich, insightful coverage of AI news and developments. Armed with a degree in Data Science from IIT Bombay and years of newsroom experience, Karthik merges storytelling with metrics to surface deeper narratives in AI-related events. His writing cuts through hype, revealing the real-world impact of Generative AI on industries, policy, and society. You can reach him at: [email protected]
