TLDR: COXNet is a new framework for detecting tiny objects in RGBT (visible and thermal) drone imagery. It uses a Cross-Layer Fusion Module to combine high-level visible and low-level thermal features, a Dynamic Alignment and Scale Refinement module to correct misalignments and handle varying object sizes, and a GeoShape-based label assignment for precise localization. This approach significantly improves detection accuracy for small, occluded objects while maintaining efficiency, making it suitable for real-time drone applications.
Detecting small objects in images captured by drones, especially when combining visible light (RGB) and thermal infrared data (RGBT), presents a significant challenge in computer vision. These tiny objects are hard to spot: they occupy only a handful of pixels, blend into cluttered backgrounds, and are further obscured by spatial misalignment between the visible and thermal cameras, low-light conditions, and occlusion. Traditional methods often struggle to effectively combine the complementary information from these two different types of imagery.
Addressing these critical issues, researchers have introduced a novel framework called COXNet. This new system is specifically designed for RGBT tiny object detection and brings three key innovations to the forefront, aiming to improve accuracy and efficiency in complex environments like those encountered by drones.
Cross-Layer Fusion for Enhanced Detail
The first innovation is the Cross-Layer Fusion Module (CLFM). Unlike conventional approaches that merge features from similar processing stages, CLFM intelligently combines high-level visible features with low-level thermal features. This unique fusion strategy enhances both semantic understanding (what the object is) and spatial accuracy (where the object is). It achieves this by using a wavelet-based alignment technique, which effectively separates and combines different frequency components of the images. This allows COXNet to precisely align and fuse complementary information from visible and thermal modalities, preserving fine details crucial for tiny objects while reducing computational complexity.
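The frequency-separation idea behind CLFM can be sketched with a one-level Haar wavelet transform. The hard band swap below (low-frequency band from the thermal map, detail bands from the visible map) is purely illustrative; COXNet learns its fusion rather than hard-swapping sub-bands, and all function names here are hypothetical.

```python
def haar2d(x):
    """One-level 2D Haar transform of an even-sized grid (list of lists).
    Returns (LL, LH, HL, HH) sub-bands at half resolution."""
    h, w = len(x), len(x[0])
    LL, LH, HL, HH = [], [], [], []
    for i in range(0, h, 2):
        ll, lh, hl, hh = [], [], [], []
        for j in range(0, w, 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            ll.append((a + b + c + d) / 4)  # low-frequency average
            lh.append((a - b + c - d) / 4)  # horizontal detail
            hl.append((a + b - c - d) / 4)  # vertical detail
            hh.append((a - b - c + d) / 4)  # diagonal detail
        LL.append(ll); LH.append(lh); HL.append(hl); HH.append(hh)
    return LL, LH, HL, HH

def inverse_haar2d(LL, LH, HL, HH):
    """Invert the transform above, restoring full resolution."""
    h, w = len(LL), len(LL[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            ll, lh, hl, hh = LL[i][j], LH[i][j], HL[i][j], HH[i][j]
            out[2 * i][2 * j]         = ll + lh + hl + hh
            out[2 * i][2 * j + 1]     = ll - lh + hl - hh
            out[2 * i + 1][2 * j]     = ll + lh - hl - hh
            out[2 * i + 1][2 * j + 1] = ll - lh - hl + hh
    return out

def fuse(visible, thermal):
    """Toy cross-modal fusion: low-frequency content from the thermal map,
    high-frequency detail from the visible map. Illustrative only."""
    _, v_lh, v_hl, v_hh = haar2d(visible)
    t_ll, _, _, _ = haar2d(thermal)
    return inverse_haar2d(t_ll, v_lh, v_hl, v_hh)
```

Because the transform is exactly invertible, separating and recombining sub-bands loses no information, which is what lets a wavelet-based fusion preserve the fine detail that tiny objects depend on.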
Dynamic Alignment and Scale Refinement
The second core component is the Dynamic Alignment and Scale Refinement (DASR) module. This module is crucial for correcting spatial misalignments between the visible and thermal data and for handling objects of varying sizes. DASR consists of two parts: the Adaptive Alignment Module (AAM) and the Dynamic Scale Refinement (DSR) mechanism. AAM dynamically adjusts the positions of visible and thermal features at a pixel level, ensuring they correspond accurately. DSR, on the other hand, uses different-sized convolution kernels and dynamic weighting to adjust feature scales, effectively capturing both fine-grained details and broader contextual information. This dual approach ensures precise alignment and robust multi-scale feature adjustment, which is particularly beneficial for drone-based detection in challenging conditions.
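The two DASR ideas, per-pixel alignment and multi-scale blending, can be sketched in one dimension. In COXNet the alignment offsets and branch weights are learned; here the offsets are given and the weighting heuristic (a softmax over branch "energy") is an assumption made for illustration. All names are hypothetical.

```python
import math

def align(x, offsets):
    """Toy AAM: resample x at positions i + offsets[i] (nearest neighbour).
    COXNet predicts such offsets per pixel; here they are supplied."""
    n = len(x)
    return [x[min(n - 1, max(0, round(i + o)))]
            for i, o in zip(range(n), offsets)]

def smooth_1d(x, k):
    """Moving average with window size k (odd): a stand-in for a
    convolution branch with a k-sized kernel."""
    r = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - r):min(len(x), i + r + 1)]
        out.append(sum(window) / len(window))
    return out

def dynamic_scale_refine(x, kernel_sizes=(1, 3, 5)):
    """Toy DSR: run parallel branches with different receptive fields,
    then blend them with weights derived from the signal itself."""
    branches = [smooth_1d(x, k) for k in kernel_sizes]
    energies = [sum(abs(v) for v in b) for b in branches]
    exps = [math.exp(e - max(energies)) for e in energies]
    weights = [e / sum(exps) for e in exps]
    return [sum(w * b[i] for w, b in zip(weights, branches))
            for i in range(len(x))]
```

The small-kernel branch preserves fine detail while the larger kernels gather context; the dynamic weights decide, per input, how much of each to keep.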
Optimized Label Assignment for Precision
Finally, COXNet introduces an optimized label assignment strategy that utilizes the GeoShape Similarity Measure. Traditional methods often rely on Intersection over Union (IoU) for assigning labels, which can be overly sensitive to small shifts, especially for tiny objects. The GeoShape Similarity Measure is more robust because it considers not only the overlap but also the spatial and shape characteristics of the bounding boxes. This ensures a more accurate and adaptive assignment process, significantly improving the localization accuracy of tiny objects even under difficult circumstances.
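The paper's exact GeoShape formula is not reproduced here, but the intuition can be shown by contrasting plain IoU with an illustrative similarity that mixes center distance and shape agreement. The `geoshape_like` function and its equal weighting are assumptions for demonstration only.

```python
import math

def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def geoshape_like(a, b):
    """Illustrative similarity combining center distance and
    width/height agreement; not the paper's exact formula."""
    # Center-distance term, normalized by the enclosing box diagonal.
    cax, cay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cbx, cby = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diag = math.hypot(ex2 - ex1, ey2 - ey1)
    dist = 1.0 - math.hypot(cbx - cax, cby - cay) / diag
    # Shape term: how closely the widths and heights agree.
    wa, ha = a[2] - a[0], a[3] - a[1]
    wb, hb = b[2] - b[0], b[3] - b[1]
    shape = (min(wa, wb) / max(wa, wb)) * (min(ha, hb) / max(ha, hb))
    return 0.5 * dist + 0.5 * shape

# A 2-pixel shift of a 6x6 box collapses IoU, but barely moves
# a measure built on distance and shape:
gt, pred = (10, 10, 16, 16), (12, 12, 18, 18)
print(round(iou(gt, pred), 3))            # 0.286
print(round(geoshape_like(gt, pred), 3))  # 0.875
```

For a 6-pixel box, a 2-pixel shift is tiny in absolute terms yet drops IoU below 0.3, which is why overlap-only assignment penalizes tiny objects so harshly.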
Performance and Efficiency
Extensive experiments were conducted on several challenging datasets, including RGBTDronePerson, VTUAV-det, and NII-CU. The results consistently show that COXNet significantly outperforms existing state-of-the-art methods. For instance, on the RGBTDronePerson dataset, COXNet achieved a notable 3.32% improvement in mAP50 (mean Average Precision at an IoU threshold of 0.5) over previous leading models. Crucially, despite its enhanced accuracy, COXNet maintains competitive efficiency, making it suitable for real-time applications in resource-constrained environments like drone-based surveillance. This balance of high detection accuracy and modest computational demand positions COXNet as a leading solution for RGBT tiny object detection.
The effectiveness of COXNet’s architectural innovations, including the DASR and CLFM modules, is evident in its ability to enhance detection capabilities while managing resource requirements. This makes it a strong choice for real-world scenarios where both precision and real-time performance are paramount. For more technical details, refer to the full research paper.


