Proactive Beam Selection in Connected Vehicles Using AI

TLDR: A new framework uses multi-modal sensing (camera images and GPS data) and Transformer models to proactively predict optimal millimeter-wave (mmWave) beams for vehicle-to-vehicle (V2V) communication. The approach reduces beam training overhead, reaches up to 77.58% accuracy in top-15 beam prediction, and lowers average power loss compared to single-modality methods, making V2V communication more efficient and reliable.

Future transportation systems are rapidly evolving with connected and autonomous vehicles, promising significant improvements in traffic efficiency, mobility, and safety. A crucial element for this revolution is seamless, high-throughput, and low-latency communication, especially for sharing vast amounts of sensor data between vehicles, infrastructure, and their surroundings. Millimeter-wave (mmWave) communication, operating at frequencies above 24 GHz, is an ideal candidate for these demands due to its abundant spectrum resources.

However, mmWave signals face a significant challenge: severe atmospheric path loss due to their short wavelengths. To overcome this, beamforming techniques are employed, using phased-antenna arrays to direct narrow beams of RF energy, ensuring sufficient signal strength and reliable links. The current 5G New Radio (5G-NR) standard defines a codebook-based beam training approach where communicating nodes exchange pilot signals and measurements to select the best beam direction. While effective, this method introduces high beam training overheads and reduces available airtime, particularly in highly dynamic vehicular environments where beam directions change constantly.
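To make the overhead concrete, here is a minimal sketch of a codebook sweep. It is an illustration only, not the 5G-NR procedure: the uniform linear array with half-wavelength spacing, the single line-of-sight channel, the 16 antennas, the 64-beam codebook, and every function name are assumptions chosen for the example. The point is that an exhaustive sweep costs one pilot measurement per beam, while a sweep restricted to a shortlist costs only as many measurements as the shortlist holds.

```python
import cmath
import math

def steering_vector(num_antennas, angle_rad):
    """Unit-norm steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    return [cmath.exp(1j * math.pi * n * math.sin(angle_rad)) / math.sqrt(num_antennas)
            for n in range(num_antennas)]

def make_codebook(num_antennas, num_beams):
    """Hypothetical codebook: beams evenly spaced in angle between -90 and 90 degrees."""
    angles = [-math.pi / 2 + math.pi * (b + 0.5) / num_beams for b in range(num_beams)]
    return [steering_vector(num_antennas, a) for a in angles]

def beam_power(beam, channel):
    """Received power |w^H h|^2 for beam w and channel h (noise ignored)."""
    return abs(sum(w.conjugate() * h for w, h in zip(beam, channel))) ** 2

def sweep(codebook, channel, candidates=None):
    """Measure each candidate beam once; return the best index and the number of measurements."""
    if candidates is None:
        candidates = range(len(codebook))  # exhaustive sweep over the whole codebook
    measurements = {b: beam_power(codebook[b], channel) for b in candidates}
    best = max(measurements, key=measurements.get)
    return best, len(measurements)

codebook = make_codebook(num_antennas=16, num_beams=64)
channel = steering_vector(16, math.radians(20))  # toy line-of-sight channel

best_full, pilots_full = sweep(codebook, channel)

# Stand-in for a predictor's shortlist: the 15 beams around the true best one.
shortlist = [b for b in range(best_full - 7, best_full + 8) if 0 <= b < len(codebook)]
best_short, pilots_short = sweep(codebook, channel, shortlist)

print(pilots_full, pilots_short)
```

In this toy setup the shortlist is built from the known answer, which a real system cannot do. Supplying that shortlist from sensor data instead of from pilot measurements is the job of the learned predictor described in the rest of the article.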

To address these limitations, a new research paper, “Multi-Modal Sensing Aided mmWave Beamforming for V2V Communications with Transformers”, proposes an innovative multi-modal sensing and fusion learning framework. This framework aims to significantly reduce beam training overheads by proactively predicting the best line-of-sight communication links between vehicles.

How the Framework Works

The core idea is to use out-of-band contextual information from multiple sensors. The proposed framework extracts features separately from two sensing modalities, camera images and GPS coordinates, using a specialized encoder for each. These modality-specific features are then fused to predict a limited set of top-k beams, narrowing the search space for the optimal beam.

The system model considers two moving vehicles in a downlink vehicular communication system, where one acts as a receiver and the other as a transmitter. The receiver vehicle is equipped with a camera sensor, while the transmitter carries a GPS receiver. The collected multi-modal data (GPS latitude/longitude and visual images) is preprocessed and fed into two distinct encoders:

Position Encoder: This component uses an embedding layer and a multi-layered Transformer encoder to process GPS coordinates, transforming them into a higher-dimensional feature space and capturing rich contextual relationships.
Visual Encoder: A variant of the vision transformer, named multi-axis vision transformer (MaxViT), is employed. This model combines the benefits of both convolution and transformer architectures to extract rich visual features from the camera images, adapting well to high-resolution and dense prediction tasks.

After individual feature extraction, the outputs from both encoders are concatenated (fused) to create a unified representation. This fused output is then passed through a multi-layer perceptron network, which uses a Softmax function in its final layer to make a probabilistic prediction, determining the top-k best beams.
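The fusion and prediction step can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the paper's implementation: the feature sizes, the single hidden layer with ReLU, the untrained random weights, and the 64 output beams are all hypothetical, and the two feature vectors merely stand in for the outputs of the position and visual encoders.

```python
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def predict_top_k(position_features, visual_features, layers, k):
    """Concatenate both feature vectors, run the MLP, and return the k most probable beams."""
    hidden = position_features + visual_features  # fusion by concatenation
    for weights, biases in layers[:-1]:
        hidden = [max(0.0, h) for h in dense(hidden, weights, biases)]  # ReLU
    weights, biases = layers[-1]
    probabilities = softmax(dense(hidden, weights, biases))
    ranked = sorted(range(len(probabilities)),
                    key=lambda beam: probabilities[beam], reverse=True)
    return ranked[:k], probabilities

def random_layer(rng, n_in, n_out):
    """Hypothetical untrained layer, present only so the sketch runs."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
position_features = [rng.uniform(-1, 1) for _ in range(4)]  # stand-in for position encoder output
visual_features = [rng.uniform(-1, 1) for _ in range(6)]    # stand-in for visual encoder output
layers = [random_layer(rng, 10, 8), random_layer(rng, 8, 64)]  # 64 beams is an assumption

top_beams, probabilities = predict_top_k(position_features, visual_features, layers, k=15)
print(top_beams)
```

Returning a ranked shortlist rather than a single beam leaves room for a short confirmation sweep over only those k candidates, which is where the overhead saving comes from.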

Experimental Validation and Results

To demonstrate the generalizability and effectiveness of the proposed framework, a comprehensive experiment was conducted using the DeepSense 6G dataset, which contains real-world multi-modal sensor observations. The study focused on four different vehicle-to-vehicle (V2V) communication scenarios.

The results were promising. The proposed framework achieved an accuracy of up to 77.58% in predicting the top-15 beams, outperforming approaches that rely on a single modality (GPS or visual data alone). The framework also incurred a low average power loss, as low as roughly 2.32 dB, and reduced the beam search space by 76.56% for top-15 beams compared to standard methods.
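For readers who want to see how such figures are typically computed, the sketch below defines the three metrics in plain Python. The function names are hypothetical, and the power loss formula is one common definition (the dB ratio between the best beam's power and the selected beam's power), so the paper's exact definition may differ. The article does not state the codebook size. A 64-beam codebook is assumed here because searching 15 of 64 beams gives a reduction of 1 - 15/64 = 76.56%, which matches the reported number.

```python
import math

def top_k_accuracy(predicted_top_k, best_beams):
    """Fraction of samples whose true best beam appears in the predicted top-k list."""
    hits = sum(1 for candidates, best in zip(predicted_top_k, best_beams)
               if best in candidates)
    return hits / len(best_beams)

def search_space_reduction(k, codebook_size):
    """Share of the codebook that no longer needs to be swept when only k beams are tried."""
    return 1 - k / codebook_size

def power_loss_db(best_power, selected_power):
    """Power lost, in dB, by using the selected beam instead of the best one (assumed definition)."""
    return 10 * math.log10(best_power / selected_power)

# Codebook size of 64 is an assumption consistent with the reported 76.56%.
print(search_space_reduction(15, 64))  # 0.765625

# Toy data: the true best beam is in the shortlist for two of three samples.
toy_accuracy = top_k_accuracy([[3, 1, 2], [0, 4, 5], [7, 8, 9]], [1, 6, 7])
```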

Conclusion

This research presents a compelling solution for 60 GHz mmWave-enabled connected vehicles. By integrating multi-modal sensing information (visual and GPS) with a Transformer-based fusion framework, the system can proactively predict optimal beams, addressing the high beam training overhead and latency of dynamic V2V environments. The findings underscore the framework’s ability to meet the high-throughput and low-latency demands of the next generation of connected vehicles, paving the way for more efficient and reliable communication in future transportation systems.
