TLDR: This research introduces a method to significantly improve the performance of AI workloads on RISC-V processors equipped with the Vector Extension (RVV). By integrating RVV into TVM’s MetaSchedule framework, the authors developed a system that automatically tunes tensor operations for the target hardware. The approach outperforms traditional compiler autovectorization and existing hand-crafted libraries like muRISCV-NN, delivering mean speedups of 84% for matrix multiplications and 46% for full AI models while also reducing the code memory footprint, making it well suited to a wide range of RISC-V-based AI applications, especially in embedded systems.
Artificial Intelligence (AI) models are becoming increasingly prevalent, from powerful data centers to compact embedded devices. The RISC-V Instruction Set Architecture (ISA), known for its open-source nature and scalability, is an excellent candidate for accelerating these AI workloads across diverse hardware platforms.
While RISC-V’s Vector Extension (RVV) has gained support in various commercial and research platforms, writing software that uses these vector units efficiently for AI workloads remains a challenge. Existing solutions, such as compiler autovectorization (e.g., GCC, LLVM) or hand-crafted libraries like muRISCV-NN, often fall short: autovectorization does not always maximize vector unit efficiency, and hand-crafted libraries struggle to adapt to different hardware configurations, leading to suboptimal performance.
This research introduces a novel approach to optimize AI workloads for RISC-V vector units by integrating the RVV extension into TVM’s MetaSchedule framework. MetaSchedule is a probabilistic program framework designed for tuning tensor operations, allowing for an efficient exploration of various mapping possibilities for AI workloads onto RISC-V vector units.
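To give a feel for the probabilistic-program idea (this is a toy sketch in pure Python, not TVM’s actual API), the snippet below randomly samples tile sizes for a matrix multiplication and keeps the candidate that scores best under a simple, made-up cost proxy. MetaSchedule does the same at scale, with real schedules and measurements on the target hardware; the constants and cost function here are illustrative assumptions only.

```python
import random

# Toy stand-in for MetaSchedule's probabilistic search: sample tile
# sizes for an N x N matmul and score them with a simple cost proxy.
# All names and numbers here are illustrative, not TVM's real API.

N = 512
VLEN_ELEMS = 8          # assumed vector width in fp32 elements

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def cost(tile_i, tile_j):
    # Toy proxy: prefer tiles whose inner dimension fills whole
    # vectors and whose working set stays small (a stand-in for
    # real measurements on the target).
    vector_waste = (-tile_j) % VLEN_ELEMS          # padding elements
    working_set = tile_i * tile_j                  # elements touched
    return vector_waste * 1000 + abs(working_set - 256)

def sample_schedule(rng):
    # A "probabilistic program": each run makes random choices,
    # yielding a different point in the schedule design space.
    return rng.choice(divisors(N)), rng.choice(divisors(N))

rng = random.Random(0)
best = min((sample_schedule(rng) for _ in range(200)),
           key=lambda t: cost(*t))
print("best tile:", best, "cost:", cost(*best))
```

In the real framework the "cost" is obtained by compiling and running each candidate schedule, and the sampled choices cover loop splits, orderings, and intrinsic matches rather than just two tile sizes.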
The core of the approach is extending MetaSchedule with specialized “tensor intrinsics” that map onto RVV instructions. These intrinsics define small, fixed-shape tensor operations that the target hardware can accelerate directly. By using probabilistic sampling, MetaSchedule explores a vast design space of candidate schedules for each tensor operation, identifying the most efficient ones.
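To make the intrinsic idea concrete, here is a hedged pure-Python sketch: a tiled matmul whose innermost step is a micro-kernel of a fixed shape, which is the role a tensor intrinsic plays. On real hardware that body would be RVV vector loads, multiply-accumulates, and stores; the tile size and function names here are assumptions for illustration.

```python
# Illustrative sketch: a 4x4 micro-kernel stands in for an RVV-backed
# tensor intrinsic; the outer loops tile the matmul down to its shape.
TILE = 4

def micro_kernel(A, B, C, i0, j0, K):
    # Fixed-shape inner operation (the "tensor intrinsic"); on RVV
    # hardware this body would be vector instructions, not Python.
    for i in range(i0, i0 + TILE):
        for k in range(K):
            a = A[i][k]
            for j in range(j0, j0 + TILE):
                C[i][j] += a * B[k][j]

def tiled_matmul(A, B, M, N, K):
    # Assumes M and N are multiples of TILE, for brevity.
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, TILE):
        for j0 in range(0, N, TILE):
            micro_kernel(A, B, C, i0, j0, K)
    return C
```

The scheduler’s job is then to arrange the surrounding loops (tiling, ordering, parallelism) so that the fixed-shape inner step is invoked as efficiently as possible.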
A key challenge addressed is the flexibility of RVV, particularly its variable vector lengths. The researchers tackled this by registering multiple versions of the same tensor intrinsic within TVM, each configured with a different vector length. This allows MetaSchedule to match and accelerate both large and small tensor operations effectively.
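One way to picture the multi-length registration (again a pure-Python sketch with hypothetical names, not TVM’s registration API): several variants of the same kernel are generated for different vector lengths, and the widest one that evenly divides the operation’s size is selected, so large and small tensors each get a matching intrinsic.

```python
# Sketch: register one "intrinsic" variant per vector length and pick
# the widest one that evenly divides the row length. Names and the
# selection rule are illustrative assumptions.
def make_axpy(vl):
    # y[j:j+vl] += a * x[j:j+vl], processed vl elements at a time,
    # standing in for an RVV kernel built for that vector length.
    def kernel(a, x, y):
        for j in range(0, len(x), vl):
            for t in range(vl):
                y[j + t] += a * x[j + t]
    return kernel

INTRINSICS = {vl: make_axpy(vl) for vl in (16, 8, 4, 1)}  # widest first

def select_intrinsic(n):
    # Widest registered length dividing n; vl=1 is the scalar
    # fallback, so a match always exists.
    return next(INTRINSICS[vl] for vl in INTRINSICS if n % vl == 0)

x = [1.0] * 12
y = [0.0] * 12
select_intrinsic(len(x))(2.0, x, y)   # n=12 selects the vl=4 variant
print(y)
```

In the actual workflow this matching happens inside MetaSchedule’s search, which can also weigh the candidates against each other rather than applying a fixed rule.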
The evaluation of this new workflow involved implementing various RISC-V Systems-on-Chip (SoCs) on an FPGA and also testing on a commercially available Banana Pi BPI-F3 board. A wide range of AI workloads, including matrix multiplications and complete neural networks (like MobileNetV2, BERT, and MobileLLM), were tuned and compared against existing methods.
The results are compelling. For single matrix multiplications, the proposed solution demonstrated a mean improvement of 84% compared to GCC’s autovectorization and 50% against muRISCV-NN. For complete AI models, the improvements were 46% against GCC’s autovectorization and 29% against muRISCV-NN. On the Banana Pi board, the solution provided a 35% speedup for complete AI models over standard LLVM autovectorization.
Furthermore, an analysis of instruction traces revealed that the optimized schedules generated by this approach utilize vector registers more efficiently, leading to fewer instructions executed and a significantly smaller code memory footprint (around 90% reduction in many cases) compared to muRISCV-NN. This makes the resulting binaries more suitable for embedded devices with limited memory.
While the tuning process requires some time, the significant performance gains achieved make it a worthwhile investment for deploying AI workloads on RISC-V platforms. The open-source nature of this work encourages further expansion to other RISC-V extensions, benefiting both embedded devices and high-performance computing applications. For more technical details, you can refer to the original research paper: Tensor Program Optimization for the RISC-V Vector Extension Using Probabilistic Programs.


