AI Models Predict Pedestrian Road Crossing Intent for Enhanced Autonomous Vehicle Safety

TLDR: The research paper summarized here describes a system that uses pose detection (MediaPipe) and deep learning sequence models (LSTM, GRU, 1D CNN) to predict a pedestrian’s intent to cross the road. The study aims to improve the safety and decision-making of autonomous vehicles in complex urban environments by analyzing pedestrian movements from video data. The GRU model achieved the highest test AUC and the LSTM the highest test accuracy, while the 1D CNN offered the fastest inference, making it a candidate for real-time applications despite its lower accuracy and AUC. The findings are relevant to intelligent transportation systems and have potential applications beyond autonomous driving.

The rapid advancement of artificial intelligence (AI) is transforming various aspects of our daily lives, from manufacturing robots to autopilot systems in aviation and self-driving cars. While autonomous vehicles excel on highways, they face significant challenges in bustling urban environments where pedestrian behavior can be highly unpredictable. This unpredictability poses a substantial risk of accidents, making it crucial for self-driving cars to accurately anticipate pedestrian actions.

A recent research paper, “Predicting Road Crossing Behaviour using Pose Detection and Sequence Modelling”, addresses this critical issue by developing a system designed to predict a pedestrian’s intent to cross the road based on their movements. The goal is to enhance the decision-making capabilities of autonomous vehicles, particularly in complex urban settings where human drivers often switch to manual control.

How the Study Was Conducted

The researchers created an experimental setup where participants simulated various road-crossing behaviors, which were meticulously recorded on video. This rich dataset allowed the team to train deep learning models to recognize subtle cues and patterns indicative of a person’s intent to cross. By analyzing sequences of movements, the models learned to differentiate between genuine crossing attempts and other movements that might not result in crossing the road.

A key part of the methodology involved using MediaPipe, a Python package maintained by the Google AI team, for pose detection. MediaPipe’s pre-trained model extracts 33 key points (landmarks) of the human body from each video frame. For this study, only the X and Y coordinates of these 33 points (66 coordinates per frame) were used, simplifying the data while still capturing essential movement information. Each frame was then manually labeled as either ‘crossing’ or ‘not crossing’ based on observed behavior.
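The reduction from pose landmarks to a 66-value feature vector can be sketched as follows. This is a minimal illustration, not the paper's code: the `Landmark` class is a hypothetical stand-in for one detected landmark (MediaPipe's pose landmarks also carry `z` and `visibility` values), and the interleaved `[x0, y0, x1, y1, ...]` ordering is an assumption.

```python
from typing import List, NamedTuple, Sequence

NUM_LANDMARKS = 33  # landmark count per frame, as stated in the article


class Landmark(NamedTuple):
    """Hypothetical stand-in for one detected pose landmark."""
    x: float
    y: float
    z: float = 0.0
    visibility: float = 1.0


def frame_features(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten one frame into [x0, y0, x1, y1, ...], dropping z and visibility.

    The interleaved ordering is an assumption made for illustration.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    features: List[float] = []
    for lm in landmarks:
        features.extend((lm.x, lm.y))
    return features


# Synthetic landmarks, not real detector output
fake_frame = [Landmark(x=i / 100, y=i / 50) for i in range(NUM_LANDMARKS)]
vector = frame_features(fake_frame)
print(len(vector))  # 66
```

In a real pipeline the synthetic `fake_frame` would be replaced by the landmarks the pose detector returns for each video frame.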

Deep Learning Models for Prediction

The study employed three different sequence modeling techniques to process and interpret the video data: Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and one-dimensional Convolutional Neural Networks (1D CNN). These models are particularly well-suited for understanding temporal dynamics in video sequences.

LSTM and GRU: These are types of recurrent neural networks (RNNs) known for their ability to handle sequences and maintain a memory of previous frames. GRU models are generally faster than LSTMs due to having fewer gates, often with comparable performance.
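1D CNN: A convolutional neural network whose filters slide along a single dimension, here the time axis of the frame sequence. Because it processes the whole window in parallel rather than frame by frame, it is typically faster at inference than LSTMs and GRUs.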
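One way to see why fewer gates can make a GRU cheaper than an LSTM is to count the weights in a single recurrent layer. The sketch below uses the textbook formulation, in which an LSTM has four weight blocks (three gates plus the candidate cell state) and a GRU has three (two gates plus the candidate state). The hidden size of 64 is a hypothetical value, not one reported in the paper, and some library implementations add extra bias terms.

```python
def recurrent_params(input_dim: int, hidden: int, blocks: int) -> int:
    """Parameter count for one recurrent layer.

    Each block holds input weights, recurrent weights and one bias vector.
    """
    per_block = hidden * input_dim + hidden * hidden + hidden
    return blocks * per_block


INPUT_DIM = 66  # x and y coordinates for 33 landmarks
HIDDEN = 64     # hypothetical layer width, not taken from the paper

lstm = recurrent_params(INPUT_DIM, HIDDEN, blocks=4)  # input, forget, output gates + candidate
gru = recurrent_params(INPUT_DIM, HIDDEN, blocks=3)   # update, reset gates + candidate
print(lstm, gru)  # 33536 25152
```

Under these assumptions the GRU layer carries three quarters of the LSTM's parameters, which is consistent with it needing less computation per time step.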

The models were trained to classify the 16th frame based on the preceding 15 frames, predicting whether a pedestrian was crossing the road. An interesting aspect of the training involved reclassifying frames where pedestrians initially appeared to cross but then decided to backtrack, allowing the models to learn these nuanced changes in intent.
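The 15-frames-in, 16th-frame-label-out setup can be sketched as a sliding window over a labeled clip. This is an illustrative sketch only: the function name `make_windows`, the stride of one frame, and the synthetic data are assumptions, not details from the paper.

```python
from typing import List, Sequence, Tuple

WINDOW = 15  # number of preceding frames fed to the model, per the article


def make_windows(
    frames: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int = WINDOW,
) -> Tuple[List[List[Sequence[float]]], List[int]]:
    """Pair each run of `window` consecutive frames with the label of the next frame."""
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    inputs: List[List[Sequence[float]]] = []
    targets: List[int] = []
    for end in range(window, len(frames)):
        inputs.append(list(frames[end - window:end]))  # the 15 preceding frames
        targets.append(labels[end])                    # label of the 16th frame
    return inputs, targets


# Synthetic clip: 20 frames of 66 features; the last 8 frames are labeled 'crossing' (1)
clip = [[float(t)] * 66 for t in range(20)]
clip_labels = [0] * 12 + [1] * 8
X, y = make_windows(clip, clip_labels)
print(len(X), len(X[0]), len(X[0][0]))  # 5 15 66
print(y)  # [1, 1, 1, 1, 1]
```

A 20-frame clip yields five training examples here because the first 15 frames have no complete history of their own.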

Key Findings and Performance

After training, the models were evaluated on a separate test dataset. The results showed:

GRU: Achieved the highest Test AUC (Area Under the ROC Curve) at 89.24% and a Test Accuracy of 85.52%. Its inference time was 2 ms.
LSTM: Showed a Test AUC of 86.38% and the highest Test Accuracy at 87.00%. Its inference time was 3 ms.
1D CNN: Showed the lowest performance of the three (Test Accuracy of 81.95% and Test AUC of 74.27%), but was the fastest, with an inference time of just 1 ms.
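The results above rank the models differently by accuracy and by AUC, which can happen because accuracy depends on a single decision threshold while AUC measures how well scores rank positives above negatives across all thresholds. The sketch below shows this on synthetic scores. The data, the 0.5 threshold, and the names `model_a` and `model_b` are invented for illustration and are not the paper's outputs.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of examples classified correctly at a fixed threshold."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]                 # synthetic ground truth
model_a = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]    # ranks well, but two errors at threshold 0.5
model_b = [0.6, 0.6, 0.6, 0.4, 0.4, 0.95]   # one confident error that hurts the ranking

print(round(accuracy(labels, model_a), 3), round(roc_auc(labels, model_a), 3))  # 0.667 0.889
print(round(accuracy(labels, model_b), 3), round(roc_auc(labels, model_b), 3))  # 0.833 0.667
```

Here the hypothetical `model_b` wins on accuracy while `model_a` wins on AUC, so neither metric alone settles which model is "best".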

The study highlighted that while the GRU achieved the highest AUC and was faster than the LSTM, the 1D CNN offered the best speed, making it a strong candidate for real-time applications where inference speed is critical, at the cost of lower accuracy and AUC. The end-to-end framework, combining MediaPipe with these sequence models, processed frames at an average of 43 milliseconds per frame. That is faster than some previous research, but it corresponds to about 23 frames per second and exceeds the roughly 33 ms per-frame budget of 30 FPS video, so the system would still lag behind such a stream.

Future Implications

This research demonstrates the effectiveness of using pose detection and sequence modeling for predicting pedestrian behavior. The ability to accurately anticipate pedestrian intent is invaluable for enhancing the safety and decision-making capabilities of autonomous driving systems. Beyond autonomous vehicles, this methodology could be applied to other areas requiring sequential analysis of human movements, such as sports analytics or detecting suspicious activities in restricted areas.

The study also acknowledged limitations, including focusing on a single pedestrian in videos and the need to further optimize inference time for real-time performance with higher frame rates. Future research aims to address these challenges, paving the way for more robust and efficient pedestrian prediction systems.
