TLDR: A new post-detection framework significantly improves fire and smoke detection in compact AI models like YOLOv5n and YOLOv8n. It refines detection confidence by combining statistical uncertainty (from single-pass dropout) with domain-relevant visual features (color, edge, texture) using a lightweight Confidence Refinement Network (CRN). This approach boosts precision, recall, and mAP, reducing false alarms and missed detections with only a modest increase in computational overhead, making it ideal for real-world deployment on resource-constrained devices.
In the critical domain of safety and disaster response, accurate and timely fire and smoke detection is paramount. However, current vision-based systems, especially those relying on compact deep learning models like YOLOv5n and YOLOv8n, often struggle to balance efficiency with reliability. These smaller models, ideal for deployment on drones, CCTV, and IoT devices, can suffer from false alarms or missed detections due to their reduced processing capacity. Traditional methods for refining detections, such as Non-Maximum Suppression (NMS), only consider how much bounding boxes overlap, which can lead to errors in complex or crowded scenes involving fire and smoke.
Addressing these challenges, independent researchers Aniruddha Srinivas Joshi, Godwyn James William, and Shreyas Srinivas Joshi have proposed an innovative uncertainty-aware post-detection framework. This framework aims to significantly enhance fire and smoke detection in compact deep learning models without altering their core architecture. The core idea is to refine the confidence scores of detected objects by considering both the model’s statistical uncertainty and relevant visual characteristics of fire and smoke.
A Smarter Approach to Confidence
The proposed framework introduces a lightweight Confidence Refinement Network (CRN) that acts as a crucial post-processing step. Instead of relying on simple overlap rules, the CRN integrates several key pieces of information to adjust detection scores:
- Uncertainty Estimation: The framework uses a clever technique called single-pass dropout during inference. While dropout is typically used during training to prevent overfitting, here it’s repurposed to estimate how confident (or uncertain) the model is about its predictions. This provides a statistical measure of reliability for each detected bounding box.
- Feature-Aware Confidence Normalization: To ensure detections align with the physical appearance of fire and smoke, the framework analyzes specific visual cues within each detected region (a rough code sketch follows this list). These include:
- Color: Using HSV histograms, it assesses color intensity and saturation, recognizing that fire exhibits strong red-orange saturation, while smoke appears more diffuse.
- Edge: Canny edge detection is employed to identify smooth, gradient-like transitions typical of fire and smoke, helping to filter out false positives that often have sharp, unnatural edges.
- Texture: Haralick texture features, such as contrast and homogeneity, are used to differentiate the high-frequency patterns of fire from the smoother textures of smoke.
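To make these cues concrete, here is a minimal sketch of how such color, edge, and texture descriptors could be computed for a cropped detection using OpenCV and scikit-image. The function name, histogram size, Canny thresholds, and GLCM settings are illustrative assumptions, not the authors' implementation.

```python
# Illustrative per-detection feature extraction (not the authors' code).
import cv2
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def region_features(bgr_crop):
    """Compute simple color, edge, and texture cues for one detected region."""
    # Color: HSV hue histogram plus saturation/brightness statistics
    # (fire tends toward strongly saturated red-orange hues).
    hsv = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2HSV)
    hue_hist = cv2.calcHist([hsv], [0], None, [32], [0, 180]).flatten()
    hue_hist /= hue_hist.sum() + 1e-6
    mean_sat = hsv[..., 1].mean() / 255.0
    mean_val = hsv[..., 2].mean() / 255.0

    # Edge: Canny edge density (fire and smoke boundaries are soft and
    # gradient-like, unlike many sharp-edged false positives).
    gray = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)
    edge_density = edges.mean() / 255.0

    # Texture: Haralick-style GLCM contrast and homogeneity to separate
    # high-frequency fire patterns from smoother smoke.
    glcm = graycomatrix((gray // 4).astype(np.uint8), distances=[1],
                        angles=[0], levels=64, symmetric=True, normed=True)
    contrast = graycoprops(glcm, "contrast")[0, 0]
    homogeneity = graycoprops(glcm, "homogeneity")[0, 0]

    return np.concatenate([hue_hist,
                           [mean_sat, mean_val, edge_density, contrast, homogeneity]])
```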
The detector's raw confidence scores, the uncertainty estimates, and the visual features are then fed into the CRN. The CRN, a compact neural network, learns to combine these inputs into a more accurate, refined confidence score for each detection. This learned approach replaces heuristic-based adjustments, making the detection pipeline more adaptive and robust.
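The PyTorch sketch below illustrates one way such a refinement head could look: a dropout layer kept stochastic at inference provides a cheap per-detection uncertainty signal in a single pass, and a small MLP fuses the raw confidence, that uncertainty, and the visual feature vector into a refined score. The layer sizes, the particular uncertainty proxy, and the feature dimension (matching the sketch above) are assumptions for illustration only.

```python
# Hedged sketch of a Confidence Refinement Network; details are assumptions.
import torch
import torch.nn as nn

class ConfidenceRefinementNetwork(nn.Module):
    def __init__(self, feature_dim, hidden_dim=64, p_drop=0.2):
        super().__init__()
        self.dropout = nn.Dropout(p_drop)  # deliberately kept active at inference
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim + 2, hidden_dim),  # +2: raw confidence, uncertainty
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),                            # refined confidence in [0, 1]
        )

    def forward(self, raw_conf, features):
        # Single-pass dropout proxy: perturb the features once and treat the
        # deviation from the unperturbed input as a per-detection uncertainty.
        self.dropout.train()                         # keep dropout stochastic
        perturbed = self.dropout(features)
        uncertainty = (perturbed - features).abs().mean(dim=1, keepdim=True)

        x = torch.cat([raw_conf.unsqueeze(1), uncertainty, features], dim=1)
        return self.mlp(x).squeeze(1)

# Usage: refine a batch of detections whose handcrafted feature vectors
# have 37 entries (32-bin hue histogram + 5 scalar cues, as sketched above).
crn = ConfidenceRefinementNetwork(feature_dim=37)
raw_conf = torch.tensor([0.62, 0.35])   # detector scores for two boxes
features = torch.rand(2, 37)            # per-box visual features
refined = crn(raw_conf, features)
```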
Experimental Validation and Promising Results
The researchers rigorously evaluated their framework using the D-Fire dataset, a benchmark specifically designed for fire and smoke detection, containing over 21,000 images. They applied their method to two popular compact models, YOLOv5n and YOLOv8n, and compared its performance against several existing post-detection techniques, including NMS, Soft-NMS, and various feature-based filters.
The results were compelling. For YOLOv8n, the framework significantly improved precision from 0.712 to 0.845 and recall from 0.674 to 0.820. The mean Average Precision (mAP), a comprehensive measure of detection accuracy, also saw a boost from 0.625 to 0.651. Similar improvements were observed with YOLOv5n, where precision rose from 0.703 to 0.840 and recall from 0.659 to 0.818, with mAP increasing from 0.609 to 0.641. Notably, the framework showed strong gains in detecting both fire and smoke categories.
While the framework introduces a modest increase in processing time (from approximately 12-14 ms to 20-23 ms per image), this overhead is considered well within acceptable limits for many real-time applications, especially fixed surveillance systems where sub-second responses are sufficient. This demonstrates a justifiable trade-off between a slight increase in latency and substantial gains in accuracy, which is crucial for safety-critical applications.
Impact and Future Directions
This research offers a practical and effective solution for enhancing the reliability of compact deep learning models in fire and smoke detection. Its model-agnostic nature means it can be integrated with existing detectors without requiring extensive retraining or architectural changes, making it highly suitable for deployment on resource-constrained edge devices. The framework’s ability to reduce false positives and recover true positives, which heuristic methods might miss, is a significant step forward in building more robust vision-based fire safety systems.
The authors acknowledge certain limitations, such as the use of a single-pass dropout approximation for efficiency and the handcrafted nature of visual features optimized for fire and smoke. Future work aims to explore alternative uncertainty estimation strategies, extend the framework to video-based detection to leverage temporal information, and validate its adaptability in other object detection domains by designing new domain-specific visual features. For more in-depth technical details, you can refer to the full research paper here.


