TLDR: This research presents practical optimization techniques for deploying deep neural network-based hyperspectral imaging segmentation on FPGA-based Systems-on-Chip for autonomous driving. By combining hardware/software co-design, hardware-aware data preprocessing, and significant model compression (reducing operations to roughly a quarter and parameters to about 1% of the original), the system maintains segmentation accuracy while improving speed and power efficiency, addressing the challenges of real-time edge deployment.
Autonomous driving systems (ADS) rely heavily on vision sensors to understand their surroundings, but traditional greyscale and RGB cameras have limitations, especially when different materials appear similar under certain lighting conditions—a phenomenon known as metamerism. Hyperspectral imaging (HSI) offers a promising solution by capturing a wider range of spectral information, providing richer data that can help overcome these limitations and improve the accuracy of detection and scene understanding.
However, combining deep neural networks (DNNs) with HSI for real-time use in safety-critical systems like ADS presents significant challenges. DNNs are often computationally intensive, and HSI data requires extensive preprocessing. Deploying such systems on resource-limited edge platforms demands careful co-design of software and hardware to achieve low latency and low resource consumption.
A Practical Approach to Optimization
Researchers from the University of the Basque Country (UPV/EHU) have developed a set of optimization techniques for a DNN-based HSI segmentation processor deployed on a field-programmable gate array (FPGA)-based System-on-Chip (SoC) specifically for ADS. Their work, detailed in the paper “Optimization of DNN-based HSI Segmentation FPGA-based SoC for ADS: A Practical Approach”, focuses on practical co-design strategies.
The core of their solution involves several key optimizations: a functional distribution of tasks between software and hardware, hardware-aware preprocessing of HSI data, and significant compression of the machine learning model. They also implemented a complete pipelined deployment, ensuring smooth and efficient operation.
Model Compression and Performance
The study utilized a U-Net architecture, a type of DNN well-suited for image segmentation, and trained it on the HSI-Drive v2.0 dataset. While U-Net is simpler than some state-of-the-art models, it still required optimization for edge deployment. The team applied advanced compression techniques, including post-training quantization, which converted the model from high-accuracy floating-point arithmetic to more efficient 8-bit integer operations. This drastically reduced the model’s memory footprint without noticeable degradation in segmentation accuracy.
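As a rough illustration of what converting floating-point weights to 8-bit integers involves, the sketch below applies symmetric per-tensor quantization to a random weight tensor with NumPy. The scheme, the tensor shape, and the function names are assumptions chosen for illustration; the article does not describe the quantizer the authors actually used.

```python
import numpy as np


def quantize_int8(weights):
    """Symmetric per-tensor quantization of float weights to int8 (illustrative)."""
    max_abs = float(np.max(np.abs(weights)))
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Map int8 values back to approximate float32 weights."""
    return q.astype(np.float32) * scale


# Hypothetical convolution kernel: 3x3 window, 16 input and 32 output channels.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(3, 3, 16, 32)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

max_err = float(np.max(np.abs(w - w_hat)))  # bounded by about half a quantization step
size_ratio = w.nbytes / q.nbytes            # float32 -> int8 storage ratio
```

Storing each weight in one byte instead of four is where the memory saving comes from, and the rounding error per weight stays within about half a quantization step.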
Beyond quantization, the team employed an iterative structured pruning method. Pruning removes the least significant parameters from the DNN; in its structured form, whole groups of parameters are removed together. This reduced the complexity of the designed DNN to just 24.34% of its original operations (about 4.1 times fewer) and 1.02% of its original number of parameters (about 98 times fewer). The reduction led to a 2.86x speed-up in the inference task, the process of making predictions, without any noticeable loss in segmentation accuracy. The researchers found that iterative pruning consistently outperformed both one-shot pruning and pre-training pruning methods, maintaining higher accuracy while achieving greater compression.
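The idea of iterative structured pruning can be sketched on a single convolution layer: in each round, drop a fraction of whole output filters and then retrain before pruning again. The ranking criterion used here (smallest L1 norm), the pruning fraction, and the no-op `fine_tune` placeholder are assumptions for illustration, not the authors' published procedure.

```python
import numpy as np


def prune_filters_once(weights, fraction):
    """Drop the given fraction of output filters with the smallest L1 norm."""
    # weights has shape (out_channels, in_channels, kernel_h, kernel_w).
    norms = np.abs(weights).sum(axis=(1, 2, 3))
    n_drop = int(weights.shape[0] * fraction)
    # Keep the strongest filters, preserving their original order.
    keep = np.sort(np.argsort(norms)[n_drop:])
    return weights[keep]


def iterative_prune(weights, fraction_per_step, steps, fine_tune=lambda w: w):
    """Alternate small pruning steps with a (placeholder) fine-tuning step."""
    for _ in range(steps):
        weights = prune_filters_once(weights, fraction_per_step)
        weights = fine_tune(weights)  # hypothetical hook: retraining would go here
    return weights


rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))

# Three rounds removing 25% of the remaining filters each time: 64 -> 48 -> 36 -> 27.
pruned = iterative_prune(w, fraction_per_step=0.25, steps=3)
```

Because whole filters are removed, the layer becomes physically smaller, which is why structured pruning lowers both the parameter count and the number of operations. In a real network the next layer's input channels would have to be trimmed to match.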
Optimizing Data Preprocessing and Deployment
A crucial, yet often overlooked, aspect of HSI systems is the intensive data preprocessing required to convert raw 2D camera data into 3D hyperspectral cubes compatible with DNNs. This stage was identified as a significant bottleneck. To address this, the researchers implemented a refined hardware/software co-design approach on the AMD-Xilinx KV260 board, a platform tailored for edge vision applications.
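To make the 2D-to-3D conversion concrete, the sketch below assumes a snapshot mosaic sensor in which each p x p block of pixels holds one sample per spectral band. The article does not state the sensor layout, so the mosaic pattern, the value of `p`, and the function name are all assumptions for illustration.

```python
import numpy as np


def mosaic_to_cube(raw, p):
    """Rearrange a 2D mosaic frame into a (bands, height, width) cube.

    Assumes (hypothetically) that pixel (i*p + a, j*p + b) of the raw frame
    holds band a*p + b for spatial location (i, j).
    """
    h, w = raw.shape
    # Crop so both dimensions are multiples of the mosaic period.
    h, w = h - h % p, w - w % p
    blocks = raw[:h, :w].reshape(h // p, p, w // p, p)
    # Move the two in-block offsets to the front and merge them into a band axis.
    return blocks.transpose(1, 3, 0, 2).reshape(p * p, h // p, w // p)


# Tiny synthetic frame: 10 x 15 pixels with a 5 x 5 mosaic gives 25 bands of 2 x 3.
raw = np.arange(10 * 15, dtype=np.uint16).reshape(10, 15)
cube = mosaic_to_cube(raw, p=5)
```

Under these assumptions, the spatial resolution of each band is the raw resolution divided by `p` in each direction, and the result is laid out band by band.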
They optimized the preprocessing pipeline by carefully managing memory arrangement and inter-task communication. For instance, they found that first converting the raw data to Band Sequential (BSQ) format, which stores each spectral band contiguously, was more efficient for the early channel-wise operations. Later in the pipeline, just before DNN inference, the data was converted to Band Interleaved by Pixel (BIP) format, which is required by the DPU (Deep Learning Processor Unit) inference engine and is faster for pixel-wise operations. Deferring the conversion in this way avoided the performance penalty of changing formats too early.
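The difference between the two layouts is easiest to see in terms of array axes. In the NumPy sketch below, a BSQ cube has shape (bands, height, width) and a BIP cube has shape (height, width, bands). The cube dimensions are arbitrary values chosen for illustration.

```python
import numpy as np

bands, height, width = 25, 4, 6  # illustrative sizes only

# BSQ: each band is one contiguous 2D plane in memory.
cube_bsq = np.arange(bands * height * width, dtype=np.uint16).reshape(
    bands, height, width
)

# Channel-wise work (for example, per-band scaling) touches contiguous memory in BSQ.
band_plane = cube_bsq[3]

# BIP: all bands of one pixel sit next to each other in memory.
# ascontiguousarray performs the actual copy into the new layout.
cube_bip = np.ascontiguousarray(cube_bsq.transpose(1, 2, 0))

# Pixel-wise work (reading one full spectrum) is contiguous in BIP.
spectrum = cube_bip[2, 3]
```

The conversion itself costs a full pass over the data, which is one reason to do it only once, at the point where the downstream consumer needs the other layout.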
To further enhance throughput, the entire application was restructured into a multi-stage pipeline. Instead of a single sequential process, they created three concurrently executing stages: two for preprocessing on the ARM processor and one for DNN inference on the DPU. This parallelization, combined with the model compression, significantly reduced the overall latency and improved the frames per second (FPS) processed by the system. The overall throughput increased by 8.18 times from the least optimized to the most optimized scenario, while maintaining efficient power consumption.
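A three-stage pipeline of this kind can be sketched with one worker thread per stage connected by bounded queues. The stage functions below are trivial stand-ins, and the names `run_pipeline` and `stage_worker` are hypothetical; on the real system the first two stages would run on the ARM cores and the third would hand work to the DPU.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the frame stream


def stage_worker(fn, inbox, outbox):
    """Apply fn to every item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            break
        outbox.put(fn(item))


def run_pipeline(frames, stage_fns, depth=2):
    """Run the stages concurrently; bounded queues limit frames in flight."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stage_fns) + 1)]

    def feed():
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=stage_worker, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stage_fns)
    ]
    for t in threads:
        t.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Stand-ins for: preprocessing part 1, preprocessing part 2, DNN inference.
stages = [lambda x: x * 2, lambda x: x + 1, lambda x: str(x)]
outputs = run_pipeline(range(5), stages)
```

Once the pipeline is full, a new result is produced at the pace of the slowest stage rather than the sum of all stage times, which is where the throughput gain of pipelining comes from.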
Future Outlook
This research demonstrates that a holistic hardware/software co-design approach, coupled with targeted optimization techniques like iterative pruning and strategic data handling, can enable the practical deployment of HSI-based intelligent vision systems for autonomous driving on embedded platforms. The work paves the way for more robust and accurate ADS by leveraging the rich spectral information of HSI without compromising real-time performance or power efficiency. Future work will explore further accelerating raw image preprocessing through specialized hardware or integrating it directly into the DNN feature extraction module, and investigating stream or dataflow-type accelerators for even higher inference throughput.


