TLDR: This research compares the performance of YOLOv8, YOLOv9, YOLOv10, and YOLOv11 models on two underwater datasets (coral disease and fish species). It finds that while accuracy improvements are minimal after YOLOv9 in marine environments, inference speed significantly improves across versions, with YOLOv10 offering the best speed-accuracy balance for autonomous underwater vehicles. The study also highlights challenges in interpreting these complex models using Grad-CAM.
Autonomous underwater vehicles (AUVs) are becoming increasingly vital for tasks like mapping marine habitats, monitoring ecosystems, and inspecting underwater infrastructure. These vehicles rely heavily on computer vision systems to understand their surroundings. However, the underwater environment presents unique challenges for these systems, including poor lighting, murky water, and often small, densely packed objects like marine organisms. Additionally, AUVs have limited computational power, making efficient computer vision models crucial.
Understanding the Challenges of Underwater Vision
Traditional computer vision models, especially two-stage detectors, can be too slow and computationally demanding for real-time deployment on AUVs. This is where the YOLO (You Only Look Once) family of models comes in. YOLO models are known for their ability to combine object localization and classification into a single, fast network, making them ideal for time-sensitive applications like autonomous navigation.
While YOLO models have shown impressive performance on land-based benchmarks like COCO and PASCAL VOC, their effectiveness in the marine domain has been less explored. The significant differences between terrestrial and underwater imagery mean that performance on one doesn’t necessarily translate to the other. This research paper addresses this gap by providing a controlled comparison of recent YOLO versions in underwater settings.
The Study’s Approach
Researchers curated two publicly available datasets to evaluate YOLOv8-s, YOLOv9-s, YOLOv10-s, and YOLOv11-s. The first, a Coral Disease dataset, contained 4,480 images across 18 classes, while the second, a Fish Species dataset, had 7,500 images with 20 distinct classes. To understand how data availability affects performance, models were trained using 25%, 50%, 75%, and 100% of the training images, while validation and test sets remained consistent.
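As a rough illustration of how such data-fraction splits can be built, the sketch below samples a subset of training image paths into a list file that a dataset configuration could point to. The directory layout, file names, and fractions are placeholders for illustration, not details taken from the paper.

```python
# Minimal sketch: build 25/50/75/100 % training subsets as image-path list files.
import random
from pathlib import Path

def make_subset(train_dir: str, fraction: float, out_file: str, seed: int = 0) -> None:
    """Write a reproducible random fraction of the training images to a list file."""
    images = sorted(Path(train_dir).glob("*.jpg"))          # all training frames
    random.Random(seed).shuffle(images)                      # fixed seed for repeatability
    keep = images[: max(1, int(len(images) * fraction))]     # keep the requested fraction
    Path(out_file).write_text("\n".join(str(p) for p in keep))

for frac in (0.25, 0.50, 0.75, 1.00):
    make_subset("datasets/coral/images/train", frac, f"train_{int(frac * 100)}.txt")
```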
All models were trained with identical settings (100 epochs, 640 px input, batch size 16, on a T4 GPU) and evaluated on standard accuracy metrics (precision, recall, mAP50, and mAP50-95) as well as per-image inference time and frames per second (FPS) to assess speed. The study also used Grad-CAM visualizations to understand which features the models were focusing on during their predictions.
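For readers who want to reproduce a comparable setup, a minimal sketch using the Ultralytics Python API might look like the following. The weight file and dataset YAML names are illustrative assumptions in Ultralytics style, not the paper's exact configuration.

```python
# Hedged sketch of the training setup described above (Ultralytics API assumed).
from ultralytics import YOLO

for weights in ["yolov8s.pt", "yolov9s.pt", "yolov10s.pt", "yolo11s.pt"]:
    model = YOLO(weights)                 # small ("s") variant of each version
    model.train(
        data="coral_disease.yaml",        # placeholder dataset config
        epochs=100,
        imgsz=640,
        batch=16,
        device=0,                         # single GPU, mirroring the T4 setup
    )
    metrics = model.val(split="test")     # precision, recall, mAP50, mAP50-95
```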
Each YOLO version introduces its own architectural innovations:
- YOLOv8 moved to an anchor-free head and decoupled the regression and classification tasks for better small-object detection.
- YOLOv9 introduced a hybrid detection head and a Generalized Efficient Layer Aggregation Network (GELAN) for improved accuracy and efficiency, along with a Dynamic Receptive Field Selection (DRS) block to handle closely packed objects.
- YOLOv10 is a lightweight model optimized for edge devices, using Neural Architecture Search (NAS) and an improved C3 module.
- YOLOv11, the latest version, incorporates CNN-based backbones with attention mechanisms, a dynamic scaling mechanism for choosing width and depth, and lightweight transformers in its neck design for faster context understanding.
Key Findings: Accuracy vs. Speed
The study revealed clear trends. Across both the Coral Disease and Fish Species datasets, the accuracy of the YOLO models, as measured by mAP50 and mAP50-95, tended to saturate after YOLOv9. This suggests that while newer versions introduce architectural innovations, these primarily target efficiency rather than significant accuracy gains in marine environments. In many cases, YOLOv8 and YOLOv9 even achieved comparable or slightly better accuracy than YOLOv10 and YOLOv11 across the different training-data fractions.
However, inference speed showed a marked improvement across successive YOLO versions. YOLOv8 was the slowest, while YOLOv10 consistently demonstrated the best inference speed, often outperforming YOLOv11. This indicates that YOLOv10 offers the most favorable speed-accuracy trade-off, making it particularly suitable for deployment on resource-constrained AUVs.
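As a rough sketch, per-image latency and FPS can be estimated by timing predictions over a set of test frames, as below. The model weights and image folder are placeholders, and the numbers produced this way would depend on the hardware used.

```python
# Rough latency/FPS measurement sketch (Ultralytics API assumed, paths are placeholders).
import time
from pathlib import Path
from ultralytics import YOLO

model = YOLO("yolov10s.pt")
frames = sorted(Path("datasets/fish/images/test").glob("*.jpg"))

# Warm-up pass so one-time initialisation does not skew the timing.
model.predict(frames[0], imgsz=640, verbose=False)

start = time.perf_counter()
for frame in frames:
    model.predict(frame, imgsz=640, verbose=False)
elapsed = time.perf_counter() - start

per_image_ms = 1000 * elapsed / len(frames)
fps = len(frames) / elapsed
print(f"{per_image_ms:.1f} ms/image  ->  {fps:.1f} FPS")
```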
Insights from Visual Attention
To understand how these models make predictions, Grad-CAM visualizations were employed. These heatmaps highlight the regions of an image that a model considers most important for its classification. The analysis showed that despite advancements, YOLO models could still rely on irrelevant or ‘spurious’ features, sometimes focusing on background elements rather than the object itself. This was particularly evident in the Coral Disease dataset, where performance did not always improve with more training data and results were inconsistent.
The researchers also noted the inherent limitations of applying Grad-CAM to complex, regression-based object detectors like YOLO. Grad-CAM assumes a single class prediction, whereas YOLO generates outputs for every grid cell, often leading to heatmaps that highlight background noise. This discrepancy suggests that current explainability techniques may not fully capture the intricate workings of multi-task models like YOLO.
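To make the mechanism concrete, the sketch below computes a Grad-CAM heatmap for a standard image classifier using forward and backward hooks. It uses a torchvision ResNet-18 as a stand-in rather than the paper's YOLO models, precisely because YOLO's per-cell regression and classification outputs offer no single obvious class score to backpropagate, which is the limitation discussed above.

```python
# Minimal Grad-CAM sketch on a standard classifier (illustrative stand-in, not the paper's setup).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
target_layer = model.layer4[-1]                          # last convolutional block

feats, grads = {}, {}
target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224)                          # stand-in for a preprocessed frame
scores = model(x)
scores[0, scores.argmax()].backward()                    # gradient of the top class score

weights = grads["a"].mean(dim=(2, 3), keepdim=True)      # channel-wise importance weights
cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8) # normalised heatmap in [0, 1]
```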
In conclusion, while newer YOLO versions offer significant improvements in inference speed, their accuracy gains in underwater object detection are minimal beyond YOLOv9. YOLOv10 stands out for its optimal balance of speed and accuracy, making it a strong candidate for AUV deployment. The study also underscores the need for better explainability metrics tailored for complex, multi-task computer vision models. For a deeper dive into the methodology and detailed results, you can access the full research paper here.


