AI Models Predict Pedestrian Road Crossing Intent for Enhanced Autonomous Vehicle Safety

TLDR: This research paper details a system that uses pose detection (MediaPipe) and deep learning sequence models (LSTM, GRU, 1D CNN) to predict a pedestrian’s intent to cross the road. The study aims to improve the safety and decision-making of autonomous vehicles in complex urban environments by analyzing pedestrian movements in video. The GRU model achieved the highest AUC and the LSTM model the highest accuracy, while the 1D CNN model offered the fastest inference, making it attractive for real-time applications despite its lower accuracy. The findings are crucial for advancing intelligent transportation systems and have potential applications beyond autonomous driving.

The rapid advancement of artificial intelligence (AI) is transforming various aspects of our daily lives, from manufacturing robots to autopilot systems in aviation and self-driving cars. While autonomous vehicles excel on highways, they face significant challenges in bustling urban environments where pedestrian behavior can be highly unpredictable. This unpredictability poses a substantial risk of accidents, making it crucial for self-driving cars to accurately anticipate pedestrian actions.

A recent research paper, “Predicting Road Crossing Behaviour using Pose Detection and Sequence Modelling”, addresses this critical issue by developing a system designed to predict a pedestrian’s intent to cross the road based on their movements. The goal is to enhance the decision-making capabilities of autonomous vehicles, particularly in complex urban settings where human drivers often take over manual control.

How the Study Was Conducted

The researchers created an experimental setup where participants simulated various road-crossing behaviors, which were meticulously recorded on video. This rich dataset allowed the team to train deep learning models to recognize subtle cues and patterns indicative of a person’s intent to cross. By analyzing sequences of movements, the models learned to differentiate between genuine crossing attempts and other movements that might not result in crossing the road.

A key part of the methodology involved using MediaPipe, an open-source framework developed by Google and used here through its Python package, for pose detection. MediaPipe’s pre-trained pose model extracts 33 key points (landmarks) of the human body from each video frame. For this study, only the X and Y coordinates of these 33 points (66 coordinates per frame) were used, simplifying the data while still capturing essential movement information. Each frame was then manually labeled as either ‘crossing’ or ‘not crossing’ based on observed behavior.
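As a rough illustration of the feature layout described above, the sketch below flattens one frame's 33 landmarks into the 66-value vector used in the study. It is not the authors' code: the function name `landmarks_to_features` is hypothetical, and it only assumes that each landmark object exposes normalized `x` and `y` attributes, as MediaPipe's pose landmarks do.

```python
from typing import Iterable, List, Protocol


class Landmark(Protocol):
    """Anything with normalized x and y attributes, e.g. a MediaPipe pose landmark."""
    x: float
    y: float


NUM_LANDMARKS = 33  # MediaPipe's pose model outputs 33 body landmarks


def landmarks_to_features(landmarks: Iterable[Landmark]) -> List[float]:
    """Flatten 33 landmarks into [x0, y0, x1, y1, ...] (hypothetical helper).

    Only x and y are kept, as in the study; z and visibility are dropped.
    """
    points = list(landmarks)
    if len(points) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(points)}")
    features: List[float] = []
    for lm in points:
        features.extend((lm.x, lm.y))
    return features  # 66 values per frame
```

With MediaPipe's legacy Solutions API, for example, one frame's landmarks would come from `results.pose_landmarks.landmark` after calling `process()` on an `mp.solutions.pose.Pose` instance; the sketch itself does not depend on MediaPipe, so it can be tested with any objects that have `x` and `y`.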

Deep Learning Models for Prediction

The study employed three different sequence modeling techniques to interpret the sequences of pose landmarks extracted from the video: Long Short-Term Memory (LSTM) networks, Gated Recurrent Units (GRU), and one-dimensional Convolutional Neural Networks (1D CNN). All three are well suited to modeling how a pedestrian’s pose changes over time.

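  • LSTM and GRU: These are types of recurrent neural networks (RNNs) that process a sequence one step at a time while carrying a hidden state that summarizes previous frames. A GRU uses two gates (update and reset) and no separate cell state, whereas an LSTM uses three gates (input, forget, and output) plus a cell state, so GRUs have fewer parameters and are generally faster, often with comparable performance.
  • 1D CNN: A convolutional neural network that slides its filters along the time axis of the sequence. Because each output depends only on a fixed window of frames rather than on the previous step, all time steps can be computed in parallel, which typically makes inference much faster than with LSTMs and GRUs.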
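To make these architectural differences more concrete, here is a minimal Python sketch (not from the paper). The two counting functions use the textbook formulation with one bias vector per gate; PyTorch and Keras add extra bias vectors in some layers, so their reported counts differ slightly. The layer width of 64 is a hypothetical choice, since the article does not give the models' sizes. The `conv1d_over_time` function is a single-filter, stride-1 temporal convolution written out by hand to show why a 1D CNN parallelizes well: no output step waits on a previous one.

```python
from typing import List, Sequence


def lstm_params(input_dim: int, units: int) -> int:
    """Parameters of one LSTM layer: input, forget and output gates plus the cell candidate."""
    # Each of the 4 transforms has input weights, recurrent weights and one bias vector.
    return 4 * (units * (input_dim + units) + units)


def gru_params(input_dim: int, units: int) -> int:
    """Parameters of one GRU layer: update and reset gates plus the hidden candidate."""
    # Only 3 transforms, so roughly 3/4 of an equally wide LSTM.
    return 3 * (units * (input_dim + units) + units)


def conv1d_over_time(
    frames: Sequence[Sequence[float]],
    kernel: Sequence[Sequence[float]],
    bias: float = 0.0,
) -> List[float]:
    """Single-filter, stride-1, 'valid' 1D convolution along the time axis.

    frames: T frames of D features each; kernel: K taps of D weights each.
    As in deep-learning frameworks, this is a cross-correlation (no kernel flip).
    Each output uses only its own K-frame window, so all outputs are independent.
    """
    k = len(kernel)
    outputs = []
    for t in range(len(frames) - k + 1):
        acc = bias
        for j in range(k):
            acc += sum(w * x for w, x in zip(kernel[j], frames[t + j]))
        outputs.append(acc)
    return outputs


# Hypothetical sizes: 66 pose coordinates per frame, 64 hidden units.
print(lstm_params(66, 64))  # 33536
print(gru_params(66, 64))   # 25152
```

At equal width, the GRU needs exactly three quarters of the LSTM's weights in this formulation, which is one reason it tends to run faster.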

The models were trained to classify the 16th frame based on the preceding 15 frames, predicting whether a pedestrian was crossing the road. An interesting aspect of the training involved reclassifying frames where pedestrians initially appeared to cross but then decided to backtrack, allowing the models to learn these nuanced changes in intent.
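The windowing scheme described above, with 15 frames of history labeled by the 16th, can be sketched as a sliding window over per-frame feature vectors. This is an illustration rather than the study's pipeline: the stride of one frame, the 1/0 encoding for ‘crossing’/‘not crossing’, and the name `make_windows` are all assumptions, and the backtracking relabeling step is not modeled because the article does not say exactly how it was done.

```python
from typing import List, Sequence, Tuple

WINDOW = 15  # per the article: the 15 preceding frames are used to classify the 16th


def make_windows(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    window: int = WINDOW,
) -> List[Tuple[List[Sequence[float]], int]]:
    """Pair each run of `window` consecutive frames with the label of the next frame.

    Hypothetical helper; labels are assumed to be 1 = 'crossing', 0 = 'not crossing'.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have one entry per frame")
    samples = []
    for start in range(len(features) - window):  # stride of 1 frame (an assumption)
        history = list(features[start:start + window])  # frames start .. start+window-1
        target = labels[start + window]                 # the frame right after the history
        samples.append((history, target))
    return samples
```

Each sample is a 15 × 66 history plus one label, which is the kind of input an LSTM, GRU, or 1D CNN sequence classifier would consume.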

Key Findings and Performance

After training, the models were evaluated on a separate test dataset. The results showed:

  • GRU: Achieved the highest Test AUC (Area Under the ROC Curve) at 89.24% and a Test Accuracy of 85.52%. Its inference time was 2 ms.
  • LSTM: Showed a Test AUC of 86.38% and the highest Test Accuracy at 87.00%. Its inference time was 3 ms.
  • 1D CNN: Despite lower performance (Test Accuracy of 81.95% and Test AUC of 74.27%), it was the fastest model, with an inference time of just 1 ms.
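In these results, GRU has the higher AUC while LSTM has the higher accuracy. That is possible because accuracy scores predictions at a single decision threshold, whereas AUC measures how well a model ranks crossing frames above non-crossing frames across all thresholds. The minimal Python sketch below illustrates this with made-up scores (not the study's data); `accuracy` assumes a 0.5 threshold, and `roc_auc` uses the pairwise-ranking definition of ROC AUC, counting ties as half.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of frames whose thresholded score matches the label (1 = crossing)."""
    correct = sum((s >= threshold) == bool(y) for y, s in zip(y_true, scores))
    return correct / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: model A ranks perfectly but is badly calibrated for a 0.5 threshold.
y = [0, 0, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.4]
model_b = [0.2, 0.6, 0.55, 0.9]
print(roc_auc(y, model_a), accuracy(y, model_a))  # 1.0 0.5
print(roc_auc(y, model_b), accuracy(y, model_b))  # 0.75 0.75
```

Model A would win on AUC and model B on accuracy, mirroring (in exaggerated form) the split between the GRU and LSTM results.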

The study highlighted that GRU achieved the best AUC while running faster than LSTM, whereas the 1D CNN model was the fastest of all, making it a strong candidate for real-time applications where inference speed is critical, at some cost in accuracy. The end-to-end framework, combining MediaPipe with these sequence models, processed frames at an average of 43 milliseconds per frame. This is faster than some previous research, but it still exceeds the roughly 33 ms available per frame at 30 FPS, so the system lags behind real-time 30 FPS video.
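The real-time gap is simple arithmetic: assuming frames are processed one after another, each frame at 30 FPS must be handled within 1000 / 30 ≈ 33.3 ms. A small Python check (the helper names are illustrative) makes the comparison with the reported 43 ms average explicit:

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, to keep up with a given frame rate."""
    return 1000.0 / fps


def sustainable_fps(ms_per_frame: float) -> float:
    """Highest frame rate a pipeline with this per-frame latency can keep up with."""
    return 1000.0 / ms_per_frame


budget = frame_budget_ms(30)
print(f"budget at 30 FPS: {budget:.1f} ms")                      # budget at 30 FPS: 33.3 ms
print(f"max rate at 43 ms/frame: {sustainable_fps(43):.1f} FPS")  # max rate at 43 ms/frame: 23.3 FPS
print(f"over budget by: {43 - budget:.1f} ms per frame")          # over budget by: 9.7 ms per frame
```

In other words, at 43 ms per frame the pipeline can sustain roughly 23 FPS, so on a 30 FPS stream it would have to skip frames or fall progressively behind.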

Future Implications

This research demonstrates the effectiveness of using pose detection and sequence modeling for predicting pedestrian behavior. The ability to accurately anticipate pedestrian intent is invaluable for enhancing the safety and decision-making capabilities of autonomous driving systems. Beyond autonomous vehicles, this methodology could be applied to other areas requiring sequential analysis of human movements, such as sports analytics or detecting suspicious activities in restricted areas.

The study also acknowledged limitations, including that the videos featured only a single pedestrian and that inference time must be further optimized for real-time performance at higher frame rates. Future research aims to address these challenges, paving the way for more robust and efficient pedestrian prediction systems.

Ananya Rao
https://blogs.edgentiq.com
Ananya Rao is a tech journalist with a passion for dissecting the fast-moving world of Generative AI. With a background in computer science and a sharp editorial eye, she connects the dots between policy, innovation, and business. Ananya excels in real-time reporting and specializes in uncovering how startups and enterprises in India are navigating the GenAI boom. She brings urgency and clarity to every breaking news piece she writes. You can reach her at: [email protected]
