TLDR: AquaVLM is a novel underwater communication system that uses mobile Vision-Language Models (VLMs) on smartphones to generate and transmit context-aware messages. It addresses limitations of traditional systems by analyzing images and sensor data to create relevant messages, and employs error-resilient fine-tuning for reliable transmission. Evaluated through VR simulations and real-world tests, AquaVLM significantly improves diver situational awareness and communication effectiveness, demonstrating the potential of on-device AI in extreme environments.
Exploring the underwater world, whether for recreation or scientific research, is an incredible experience. However, maintaining safety and effective communication among divers has always been a significant challenge. Traditional underwater communication systems are often cumbersome, expensive, or rely on predefined messages that lack the crucial context of the surrounding environment.
Imagine being able to effortlessly share detailed observations and critical status updates with your dive buddy, just by tapping your smartphone. This is precisely the vision behind AquaVLM, a groundbreaking new system designed to enhance underwater situational awareness using the power of mobile Vision-Language Models (VLMs).
AquaVLM transforms ubiquitous smartphones into smart underwater communication devices. It allows divers to ‘tap-and-send’ messages that are automatically generated and highly context-aware. Instead of relying on a limited set of pre-programmed phrases, AquaVLM analyzes multimodal data – including images captured by the phone’s camera and sensor readings from the phone or a diving watch – to understand the current diving situation. Based on this understanding, it generates suitable message options for the diver to choose from.
The system works in two main stages. First, an existing mobile VLM is specially ‘instruct-tuned’ for underwater scenarios. This involves training it on a custom dataset of underwater conversations, which helps it understand context, generate relevant messages, and even recover corrupted messages. This fine-tuning process incorporates different communication purposes, such as safety alerts or environmental descriptions, allowing the VLM to produce fewer, yet highly relevant, message options, thus reducing computational load on the mobile device.
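To make the idea of purpose-conditioned training samples concrete, here is a minimal sketch of how one such prompt might be assembled from multimodal context. The purpose names, field names, and prompt format below are illustrative assumptions, not the paper's actual schema:

```python
from dataclasses import dataclass

# Hypothetical purpose labels; the paper mentions safety alerts and
# environmental descriptions among the communication purposes.
PURPOSES = ["safety_alert", "environment_description", "status_update"]

@dataclass
class DiveContext:
    depth_m: float        # e.g. from the phone or a dive watch
    air_bar: int          # remaining tank pressure
    image_caption: str    # stand-in for the camera frame fed to the VLM

def build_instruction(ctx: DiveContext, purpose: str) -> str:
    """Assemble one instruct-tuning prompt from dive context and a purpose."""
    assert purpose in PURPOSES, f"unknown purpose: {purpose}"
    return (
        f"Purpose: {purpose}\n"
        f"Depth: {ctx.depth_m} m, Air: {ctx.air_bar} bar\n"
        f"Scene: {ctx.image_caption}\n"
        "Generate three short messages the diver could send."
    )

prompt = build_instruction(
    DiveContext(depth_m=18.0, air_bar=90, image_caption="reef shark approaching"),
    "safety_alert",
)
```

Conditioning on an explicit purpose is what lets the model emit a handful of targeted options instead of a long open-ended list, which keeps on-device inference cheap.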
Second, AquaVLM features ‘error-resilient fine-tuning’. Underwater acoustic transmission is notoriously prone to errors. To combat this, the mobile VLM is further trained on datasets containing randomly corrupted messages. This unique approach allows the VLM to interpret and recover messages even when they contain a certain degree of character corruption, much like how humans can understand text with typos.
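The corruption-augmentation step can be sketched as follows: randomly substitute characters in clean messages to mimic acoustic channel errors, then train on (corrupted, clean) pairs. The error rate and alphabet here are illustrative assumptions, not values from the paper:

```python
import random

def corrupt_message(text: str, error_rate: float = 0.1, seed: int = 0) -> str:
    """Randomly substitute characters to simulate acoustic transmission errors."""
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz "
    chars = list(text)
    for i in range(len(chars)):
        if rng.random() < error_rate:
            chars[i] = rng.choice(alphabet)
    return "".join(chars)

# Build (corrupted, clean) training pairs for error-resilient fine-tuning.
clean = "low on air, ascending now"
pairs = [(corrupt_message(clean, error_rate=0.1, seed=s), clean) for s in range(4)]
```

Training on pairs like these teaches the model to map a garbled received string back to its most plausible clean form, the "typo tolerance" the article describes.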
To evaluate AquaVLM’s effectiveness, the researchers developed both a virtual reality (VR) simulator and a fully functional prototype on the iOS platform. The VR simulator allowed users to experience AquaVLM in a realistic underwater environment, encountering various events like shark encounters or equipment malfunctions, and communicating with virtual divers. This subjective evaluation showed an impressive 80% ‘purpose-align rate’, meaning the generated messages largely matched the users’ intended communication goals.
Real-world experiments in a lake tested the system’s reliability at distances up to 20 meters. The results were promising: AquaVLM maintained an average of 90% semantic similarity between sent and received messages at ranges up to 15 meters, meaning the messages’ meaning was largely preserved despite the challenges of underwater acoustic transmission. The system also showed low Bit Error Rates (BER) and latency acceptable for a messaging application.
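For readers unfamiliar with the BER metric, it is simply the fraction of transmitted bits that arrive flipped. A minimal sketch of computing it over two equal-length payloads (the example strings are made up, not from the paper's experiments):

```python
def bit_error_rate(sent: bytes, received: bytes) -> float:
    """Fraction of bits that differ between two equal-length payloads."""
    assert len(sent) == len(received), "payloads must be the same length"
    # XOR each byte pair and count the set bits (flipped bits).
    errors = sum(bin(a ^ b).count("1") for a, b in zip(sent, received))
    return errors / (8 * len(sent))

ber = bit_error_rate(b"low air", b"low oir")  # one corrupted character
```

Semantic similarity is a softer measure: it compares the meaning of the recovered text against the original (typically via text embeddings), so a message can score high even when some characters are corrupted, which is exactly the regime AquaVLM's error-resilient fine-tuning targets.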
Compared to existing methods, AquaVLM stands out by offering context-rich, informative messaging through readily available smartphones, without the need for bulky or expensive specialized equipment. It represents a significant leap forward from traditional hand signals, basic diving computer messages, or costly underwater talking devices.
The development of AquaVLM showcases the immense potential of deploying large language models on mobile devices, not just for everyday tasks, but for critical applications in challenging environments. While the current system has an effective transmission distance of around 20 meters and some latency due to VLM inference and transmission, future improvements could include smaller, more efficient models and lightweight underwater modems for greater range and speed.
AquaVLM is more than just a communication tool; it’s a step towards a future where divers can have a richer, safer, and more informed experience exploring the underwater world, all powered by the device in their pocket. For more technical details, you can refer to the full research paper here.


