TLDR: A new framework uses a Deep Reinforcement Learning (DRL) agent with Integrated Sensing and Communications (ISAC) to proactively defend millimeter-wave (mmWave) communication systems against sophisticated beam-stealing attacks. The DRL agent, trained with an intensive curriculum learning strategy, learns to dynamically adjust beamforming and sensing effort. In simulations it achieves a 92.8% mean attacker detection rate while keeping the average user SINR above 13 dB, compared with a 68% detection rate for the physics-based SecBeam protocol.
Millimeter-wave (mmWave) communication systems are at the forefront of next-generation wireless technology, promising incredibly high data rates for applications like augmented reality/virtual reality (AR/VR) and connected vehicles. However, this advanced technology, which relies on highly directional beamforming, also introduces new vulnerabilities, particularly to sophisticated ‘beam-stealing’ attacks. These attacks can compromise the integrity and confidentiality of communication links by hijacking or eavesdropping on the beams.
A new research paper, titled “Secure mmWave Beamforming with Proactive-ISAC Defense Against Beam-Stealing Attacks,” by Seyed Bagher Hashemi Natanzi, Hossein Mohammadi, Bo Tang, and Vuk Marojevic, introduces a groundbreaking framework to tackle this critical security challenge. The core of their innovation lies in leveraging Integrated Sensing and Communications (ISAC) capabilities, not just for communication, but as an active, intelligent tool for threat assessment and defense.
The proposed framework employs an advanced Deep Reinforcement Learning (DRL) agent, built on the Proximal Policy Optimization (PPO) algorithm. This DRL agent is designed to proactively and adaptively defend against these intelligent attacks. Imagine a base station that can not only communicate with a legitimate user but also actively ‘sense’ its environment for suspicious activities. This is precisely what the DRL agent enables, dynamically controlling ISAC probing actions to investigate potential threats.
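For readers unfamiliar with PPO, its core is a clipped surrogate objective that keeps each policy update close to the previous policy. The sketch below shows that standard objective in plain Python. It is not the authors' implementation: the function name is hypothetical, and the clip value of 0.2 is a commonly used default, since the article does not report the paper's hyperparameters.

```python
import math


def ppo_clipped_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    new_logps / old_logps: log-probabilities of the taken actions under the
    current and the previous policy. advantages: advantage estimates.
    clip_eps=0.2 is an assumed default, not a value from the paper.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio to the interval [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the more pessimistic of the clipped and unclipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that collected the data, which is one reason PPO is a common choice for control problems like this one.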
One of the significant hurdles in training such an intelligent defense system is the complexity of the environment, where successful threat detection can be a rare event. To overcome this, the researchers introduced an intensive curriculum learning strategy. This strategy ensures that the DRL agent experiences successful detections during its training phase, helping it to understand the value of security-oriented actions and guiding it towards a robust and adaptive defense policy. This ‘forced success’ mechanism in the early training phases is crucial for the agent to learn effectively.
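One way to picture such a curriculum is a schedule that forces detection-friendly episodes early on and then phases them out. The sketch below is an illustration under stated assumptions, not the paper's schedule: the function names, the warm-up length, and the linear decay are all hypothetical.

```python
import random


def forced_success_probability(episode, warmup_episodes=200, decay_episodes=800):
    """Probability that an episode is set up so that detection succeeds.

    Hypothetical schedule: always force success during warm-up, then decay
    linearly to zero. The episode counts are illustrative assumptions.
    """
    if episode < warmup_episodes:
        return 1.0
    progress = (episode - warmup_episodes) / decay_episodes
    return max(0.0, 1.0 - progress)


def is_forced_episode(episode, rng):
    """Decide whether this training episode uses the forced-success setup."""
    return rng.random() < forced_success_probability(episode)


if __name__ == "__main__":
    rng = random.Random(0)
    for ep in (0, 400, 800, 1200):
        print(ep, forced_success_probability(ep), is_forced_episode(ep, rng))
```

The idea is that the agent sees the reward for a successful detection often enough at the start to value sensing actions, and is then gradually left to find detections on its own.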
The system works by having the DRL agent dynamically adjust the base station’s beam direction and ISAC sensing effort. The agent’s ‘brain’ consists of two neural networks: an Actor Network that learns the best actions to take, and a Critic Network that evaluates the quality of those actions. The reward system for the agent is carefully designed to balance proactive defense with maintaining excellent communication quality for the legitimate user.
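The balance described above can be sketched as a weighted per-step reward. This is a minimal illustration, not the paper's reward function: the function name, the weights, the sensing-cost term, and the use of 13 dB as a target are assumptions made for the example.

```python
def step_reward(sinr_db, detected, isac_effort,
                w_comm=1.0, w_sec=2.0, w_cost=0.5, sinr_target_db=13.0):
    """Hypothetical per-step reward balancing communication and security.

    sinr_db: legitimate user's SINR in dB.
    detected: True if the attacker was detected at this step.
    isac_effort: sensing effort in [0, 1].
    All weights and the 13 dB target are illustrative assumptions.
    """
    # Communication term: saturates at 1 once the SINR target is met.
    comm_term = min(sinr_db / sinr_target_db, 1.0)
    # Security term: bonus for a successful detection.
    sec_term = 1.0 if detected else 0.0
    # Cost term: discourages spending sensing effort when it is not needed.
    cost_term = isac_effort
    return w_comm * comm_term + w_sec * sec_term - w_cost * cost_term
```

With a structure like this, the Critic Network would learn to predict the expected sum of such rewards, and the Actor Network would be pushed toward beam and sensing choices that raise it.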
Numerical results from the simulations are promising. The framework achieved a mean attacker detection rate of 92.8%, compared with 68% for SecBeam, a physics-based defense protocol. This level of security was achieved while maintaining an average user SINR (Signal-to-Interference-plus-Noise Ratio) of over 13 dB, indicating strong communication performance. The median detection rate was even higher at 100%, meaning that in at least half of the autonomous episodes, the agent detected the attacker at every single step.
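A mean below the median indicates that a minority of weaker episodes pulls the average down. The short example below shows the effect with made-up per-episode detection rates, chosen only so that the mean lands near the reported figure; they are not data from the paper.

```python
import statistics

# Made-up per-episode detection rates, for illustration only.
# Most episodes detect the attacker at every step; a few do worse.
episode_detection_rates = [1.0] * 7 + [0.9, 0.8, 0.58]

mean_rate = statistics.mean(episode_detection_rates)
median_rate = statistics.median(episode_detection_rates)

print(f"mean detection rate:   {mean_rate:.3f}")
print(f"median detection rate: {median_rate:.3f}")
```

Here seven of ten episodes are perfect, so the median is a perfect score even though three weaker episodes lower the mean.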
A key finding is that the DRL agent learns an intelligent, adaptive policy rather than a rigid, static one. It can dynamically adjust its strategy based on the immediate threat level, sometimes sacrificing a bit of communication quality for near-perfect security when the threat is high, and other times achieving both high detection and excellent communication when the threat is less immediate. This adaptability is a hallmark of an intelligent defense system.
Furthermore, the analysis of the ISAC effort strategy showed that the agent learns to be resource-efficient. It intensifies its sensing efforts (allocating maximum ISAC effort) primarily when an attacker is perceived to be near (e.g., within 75 meters). When the attacker is far away, it conserves resources by using lower sensing effort. This targeted approach demonstrates a sophisticated, security-driven use of ISAC.
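The learned behavior can be approximated, very roughly, by a distance threshold. The sketch below is a hand-written rule of thumb, not the agent's actual policy, which is a neural network reacting to the full observation. The 75-meter threshold comes from the example above; the function name and the two effort levels are hypothetical.

```python
def isac_effort_rule(estimated_distance_m, near_threshold_m=75.0,
                     max_effort=1.0, low_effort=0.3):
    """Rule-of-thumb approximation of the learned sensing strategy.

    Returns maximum ISAC effort when the attacker is estimated to be near,
    and a lower effort otherwise. The effort levels are assumed values.
    """
    if estimated_distance_m <= near_threshold_m:
        return max_effort
    return low_effort


if __name__ == "__main__":
    for distance in (20.0, 75.0, 150.0):
        print(distance, isac_effort_rule(distance))
```

A fixed rule like this would need its threshold tuned by hand, whereas the DRL agent arrives at comparable behavior from the reward signal alone.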
While the framework shows great promise, the researchers acknowledge limitations: the current model handles only a single attacker, and its performance depends on the fidelity of the ISAC sensing data. Future work will focus on multi-attacker scenarios, imperfect Channel State Information (CSI), and real-world testbed validation. This research represents a significant step towards securing future mmWave communication systems against evolving threats.


