Accelerating Hyperspectral AI for Autonomous Vehicles on FPGA Hardware

TLDR: This research presents practical optimization techniques for deploying deep neural network-based hyperspectral imaging segmentation on FPGA-based Systems-on-Chip for autonomous driving. By combining hardware/software co-design, hardware-aware data preprocessing, and aggressive model compression (cutting operations to roughly a quarter and parameters to about 1% of the original model), the system preserves segmentation accuracy while improving speed and power efficiency, addressing the challenges of real-time edge deployment.

Autonomous driving systems (ADS) rely heavily on vision sensors to understand their surroundings, but traditional greyscale and RGB cameras have limitations, especially when different materials appear similar under certain lighting conditions, a phenomenon known as metamerism. Hyperspectral imaging (HSI) offers a promising solution: it captures many more spectral bands than an RGB camera, providing richer data that can help separate such look-alike materials and improve the accuracy of detection and scene understanding.

However, integrating advanced computer algorithms like deep neural networks (DNNs) with HSI for real-time applications in safety-critical systems like ADS presents significant challenges. DNNs are often computationally intensive, and HSI data requires extensive preprocessing. Deploying these complex systems on edge platforms, which have limited resources, demands a careful co-design of both software and hardware to ensure efficiency, low latency, and reduced resource consumption.

A Practical Approach to Optimization

Researchers from the University of the Basque Country (UPV/EHU) have developed a set of optimization techniques for a DNN-based HSI segmentation processor deployed on a field-programmable gate array (FPGA)-based System-on-Chip (SoC) specifically for ADS. Their work, detailed in the paper “Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach”, focuses on practical co-design strategies.

The core of their solution involves several key optimizations: a functional distribution of tasks between software and hardware, hardware-aware preprocessing of HSI data, and significant compression of the machine learning model. They also implemented a complete pipelined deployment, ensuring smooth and efficient operation.

Model Compression and Performance

The study used a U-Net architecture, a type of DNN well-suited for image segmentation, and trained it on the HSI-Drive v2.0 dataset. While U-Net is simpler than some state-of-the-art models, it still required optimization for edge deployment. The team applied compression techniques including post-training quantization, which converted the model from floating-point arithmetic to more efficient 8-bit integer operations. This substantially reduced the model’s memory footprint without noticeable degradation in segmentation accuracy.
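To make the idea of post-training quantization concrete, here is a minimal sketch using NumPy. It assumes a simple symmetric, per-tensor scheme and a randomly generated weight tensor; the article does not specify which quantization scheme or toolchain the researchers used, so the function names and the 32-bit starting precision are illustrative assumptions only.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization of float weights to int8 (assumed scheme)."""
    max_abs = float(np.max(np.abs(weights)))
    # One scale for the whole tensor maps [-max_abs, max_abs] onto [-127, 127].
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Hypothetical convolution weights: 64 filters, 32 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(64, 32, 3, 3)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

max_err = float(np.max(np.abs(w - w_hat)))  # bounded by about scale / 2
size_ratio = w.nbytes / q.nbytes            # float32 vs int8 storage
```

Under these assumptions the int8 tensor occupies a quarter of the float32 storage, and the rounding error per weight stays within about half a quantization step, which is why accuracy often survives the conversion.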

Beyond quantization, the team employed an iterative structured pruning method. Pruning removes the least significant parameters from the DNN. This technique reduced the designed DNN to 24.34% of its original operations (roughly a 4.1x reduction) and to just 1.02% of its original number of parameters (roughly 98x fewer). The result was a 2.86x speed-up in inference, the process of making predictions, without any noticeable loss in segmentation accuracy. The researchers found that iterative pruning consistently outperformed one-shot pruning and even pre-training pruning methods, maintaining higher accuracy while achieving greater compression.
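The following sketch illustrates what an iterative structured pruning loop can look like on a single convolution layer. It assumes filters are ranked by L1 norm and that a fixed fraction is removed per round; the article does not state the ranking criterion or schedule the researchers used, so these choices, along with all function names, are hypothetical.

```python
import numpy as np

def prune_filters_once(weights: np.ndarray, fraction: float) -> np.ndarray:
    """Remove the given fraction of output filters with the smallest L1 norm."""
    n_filters = weights.shape[0]
    n_keep = max(1, n_filters - int(n_filters * fraction))
    l1 = np.abs(weights).reshape(n_filters, -1).sum(axis=1)
    # Keep the strongest filters, preserving their original order.
    keep = np.sort(np.argsort(l1)[-n_keep:])
    return weights[keep]

def iterative_prune(weights: np.ndarray, fraction_per_round: float, rounds: int) -> np.ndarray:
    """Prune a little at a time instead of all at once."""
    for _ in range(rounds):
        weights = prune_filters_once(weights, fraction_per_round)
        # In a real workflow the network would be fine-tuned here, and the
        # matching input channels of the next layer would be removed too.
    return weights

# Hypothetical layer: 64 filters, 32 input channels, 3x3 kernels.
rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3)).astype(np.float32)

pruned = iterative_prune(w, fraction_per_round=0.25, rounds=3)
param_ratio = pruned.size / w.size  # fraction of parameters that remain
```

Because whole filters are removed, the pruned layer is simply a smaller dense tensor, which is what lets the reduction translate into fewer operations on hardware rather than just a sparser weight matrix.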

Optimizing Data Preprocessing and Deployment

A crucial, yet often overlooked, aspect of HSI systems is the intensive data preprocessing required to convert raw 2D camera data into 3D hyperspectral cubes compatible with DNNs. This stage was identified as a significant bottleneck. To address this, the researchers implemented a refined hardware/software co-design approach on the AMD-Xilinx KV260 board, a platform tailored for edge vision applications.
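As a rough illustration of what turning a 2D raw frame into a 3D cube can involve, the sketch below assumes a snapshot mosaic sensor in which each spectral band occupies a fixed position inside a repeating square tile. The article does not describe the sensor layout, so the tile size, the function name, and the band ordering are all assumptions made for the example.

```python
import numpy as np

def mosaic_to_cube(raw: np.ndarray, pattern: int) -> np.ndarray:
    """Rearrange a 2D mosaic frame into a (bands, height, width) cube.

    Assumes a repeating pattern x pattern filter tile; band index is
    row_in_tile * pattern + col_in_tile.
    """
    h, w = raw.shape
    h_c, w_c = h // pattern, w // pattern
    raw = raw[: h_c * pattern, : w_c * pattern]       # drop incomplete tiles
    tiles = raw.reshape(h_c, pattern, w_c, pattern)   # (tile_row, a, tile_col, b)
    # Bring the in-tile offsets (a, b) to the front and merge them into one band axis.
    return tiles.transpose(1, 3, 0, 2).reshape(pattern * pattern, h_c, w_c)

# Hypothetical 10x15 raw frame with a 5x5 mosaic, giving 25 bands of 2x3 pixels.
raw = np.arange(10 * 15).reshape(10, 15)
cube = mosaic_to_cube(raw, pattern=5)
```

Each output band here has lower spatial resolution than the raw frame, and every pixel has to be moved once, which hints at why this stage can dominate processing time on an embedded CPU.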

They optimized the preprocessing pipeline by carefully managing memory arrangement and inter-task communication. For instance, they found that converting raw data first to Band Sequential (BSQ) format, which stores each band as a contiguous block, was more efficient for the early channel-wise operations. Later in the pipeline, just before DNN inference, the data was converted to Band Interleaved by Pixel (BIP) format, which stores all bands of a pixel together; this is the layout required by the DPU (Deep Learning Processor Unit) inference engine and is faster for pixel-wise operations. Deferring the conversion avoided the performance penalties of a premature format change.
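The difference between the two layouts is easy to see with a small NumPy array. In this sketch, per-band min-max normalization stands in for a channel-wise operation; that specific operation and the array sizes are assumptions for illustration and are not taken from the article.

```python
import numpy as np

bands, height, width = 4, 2, 3

# BSQ: shape (bands, height, width), so each band is one contiguous block in memory.
bsq = np.arange(bands * height * width, dtype=np.float32).reshape(bands, height, width)

# Hypothetical channel-wise step: normalize every band to [0, 1] independently.
band_min = bsq.min(axis=(1, 2), keepdims=True)
band_max = bsq.max(axis=(1, 2), keepdims=True)
normalized = (bsq - band_min) / (band_max - band_min)

# BIP: shape (height, width, bands), so all bands of one pixel sit next to each other.
# ascontiguousarray forces the actual memory reordering, not just a strided view.
bip = np.ascontiguousarray(normalized.transpose(1, 2, 0))
```

The transpose itself is cheap, but the physical copy into the new layout is not, which is consistent with doing the conversion only once, at the point where the inference engine requires it.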

To further enhance throughput, the entire application was restructured into a multi-stage pipeline. Instead of a single sequential process, they created three concurrently executing stages: two for preprocessing on the ARM processor and one for DNN inference on the DPU. This parallelization, combined with the model compression, significantly reduced the overall latency and improved the frames per second (FPS) processed by the system. The overall throughput increased by 8.18 times from the least optimized to the most optimized scenario, while maintaining efficient power consumption.
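A three-stage pipeline of this kind can be sketched with threads and bounded queues from the Python standard library. The stage functions below are hypothetical placeholders that only tag strings; the real system runs preprocessing on the ARM cores and inference on the DPU, and the article does not describe its actual threading or queuing mechanism.

```python
import queue
import threading

_STOP = object()  # sentinel that tells each stage to shut down

def run_stage(func, inbox, outbox):
    """Apply func to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))

# Hypothetical stand-ins for the work done in each stage.
def build_cube(frame):      # preprocessing stage 1
    return f"{frame}>cube"

def prepare_input(cube):    # preprocessing stage 2
    return f"{cube}>bip"

def infer(tensor):          # DNN inference stage
    return f"{tensor}>mask"

def run_pipeline(frames):
    """Push frames through three concurrently running stages, in order."""
    q_in, q_mid, q_dnn = (queue.Queue(maxsize=2) for _ in range(3))
    q_out = queue.Queue()  # unbounded so the last stage never blocks
    stages = [
        (build_cube, q_in, q_mid),
        (prepare_input, q_mid, q_dnn),
        (infer, q_dnn, q_out),
    ]
    workers = [threading.Thread(target=run_stage, args=s) for s in stages]
    for t in workers:
        t.start()
    for frame in frames:
        q_in.put(frame)  # blocks briefly if the first stage falls behind
    q_in.put(_STOP)
    for t in workers:
        t.join()
    results = []
    while True:
        item = q_out.get()
        if item is _STOP:
            return results
        results.append(item)

masks = run_pipeline(["f0", "f1", "f2"])
```

Once the pipeline is full, a new result emerges at the pace of the slowest stage rather than the sum of all three, which is the source of the throughput gain. In Python the threads mainly illustrate the structure; on the actual platform the stages run on separate processing resources.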

Future Outlook

This research demonstrates that a holistic hardware/software co-design approach, coupled with targeted optimization techniques like iterative pruning and strategic data handling, can enable the practical deployment of HSI-based intelligent vision systems for autonomous driving on embedded platforms. The work paves the way for more robust and accurate ADS by leveraging the rich spectral information of HSI without compromising real-time performance or power efficiency. Future work will explore further accelerating raw image preprocessing through specialized hardware or integrating it directly into the DNN feature extraction module, and investigating stream or dataflow-type accelerators for even higher inference throughput.
