TLDR: This research introduces a novel competence metric for AI-based trajectory planners in automated driving. It assesses whether an autonomous vehicle’s AI is adequately trained for a given situation by combining two factors: the “coverage” (how often similar situations appeared in training data) and the “complexity” of the driving scene. Driving scenes are modeled as knowledge graphs, allowing for detailed analysis of sub-scene configurations. The metric helps identify situations where the AI might be unreliable, enhancing trustworthiness and safety. Experiments on the NuPlan dataset show a correlation between this competence metric and the actual performance of a trained trajectory planner.
Automated driving functions increasingly rely on advanced machine learning algorithms for critical tasks like perception and trajectory planning. While these AI systems offer immense potential, they often operate as ‘black boxes,’ making it difficult to understand their internal workings and predict their behavior in all situations. This lack of transparency makes it harder to establish trust in, and public acceptance of, autonomous vehicles.
To address this, a new research paper titled “What Did I Learn? Operational Competence Assessment for AI-Based Trajectory Planners” by Michiel Braat, Maren Buermann, Marijke van Weperen, and Jan-Pieter Paardekooper introduces a novel method to assess the operational competence of AI-based trajectory planners. The core idea is to determine if an automated vehicle has been sufficiently trained for a specific driving situation, thereby identifying potential operational risks.
Modeling Driving Data with Knowledge Graphs
The researchers propose a unique approach: modeling driving data as knowledge graphs (KGs). Imagine a detailed map of a driving scene where every object – from the ego vehicle itself to other cars, pedestrians, lane segments, and intersections – is represented as a ‘node.’ The relationships between these objects, such as a car being ‘ahead of’ another or a lane being ‘connected to’ an intersection, are represented as ‘edges.’ These knowledge graphs provide a human-understandable, symbolic description of complex driving scenes.
These KGs are constructed by taking snapshots of the world model data, parsing infrastructure elements like lanes and connectors, and adding dynamic actors like the ego vehicle and other objects. Each node and edge is enriched with labels and attributes to distinguish their types and properties, such as a lane’s speed limit or an object’s velocity.
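To make the node-and-edge description concrete, here is a minimal sketch of one scene snapshot as a labeled property graph. It is not the authors’ implementation: the `SceneGraph` class, the relation names (`drives_on`, `ahead_of`, `connected_to`), the attribute names, and all values are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """A tiny labeled property graph for one driving-scene snapshot."""
    nodes: dict = field(default_factory=dict)  # node_id -> {"label": ..., attributes}
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, label, **attributes):
        # Store the node type under "label" next to its attributes.
        self.nodes[node_id] = {"label": label, **attributes}

    def add_edge(self, source_id, relation, target_id):
        # Refuse edges whose endpoints were never added as nodes.
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source_id, relation, target_id))

    def neighbors(self, node_id, relation):
        """Return ids reachable from node_id via edges with the given relation."""
        return [t for s, r, t in self.edges if s == node_id and r == relation]


def build_example_scene():
    """Build a hypothetical snapshot: infrastructure first, then dynamic actors."""
    g = SceneGraph()
    g.add_node("lane_1", "Lane", speed_limit_mps=13.9)
    g.add_node("intersection_1", "Intersection")
    g.add_node("ego", "EgoVehicle", velocity_mps=8.0)
    g.add_node("car_7", "Vehicle", velocity_mps=6.5)
    g.add_edge("lane_1", "connected_to", "intersection_1")
    g.add_edge("ego", "drives_on", "lane_1")
    g.add_edge("car_7", "drives_on", "lane_1")
    g.add_edge("car_7", "ahead_of", "ego")
    return g
```

Calling `build_example_scene().neighbors("ego", "drives_on")` returns the lane the ego vehicle is on. A real pipeline would populate such a graph from world model data rather than from hand-written calls.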
Measuring Competence: Coverage and Complexity
The competence metric developed in this paper combines two crucial aspects: ‘coverage’ and ‘complexity.’
- Coverage: This measures how often a specific driving situation, or ‘sub-scene,’ has been encountered in the AI’s training dataset. Sub-scenes are specific patterns extracted from the knowledge graphs using queries – for example, an ‘ego vehicle driving on a straight road’ or ‘ego vehicle approaching an intersection.’ The more frequently a sub-scene appears in the training data, the higher the coverage.
- Complexity: This assesses how challenging a particular driving scene is. The complexity is broken down into three components: environment complexity (number of unique elements), road obstacle complexity (number of distinct obstacles like traffic cones), and dynamic entities complexity (number, speed, and distance of other road users, and the ego vehicle’s own velocity). More complex scenes naturally require greater coverage in the training data for the AI to achieve high competence.
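As a rough illustration of the coverage idea, the sketch below treats each scene as a set of (subject label, relation, object label) triples and a sub-scene as a smaller set of triples that must all be present. Coverage is then taken to be the fraction of training scenes containing the pattern. That definition, the triple vocabulary, and the toy scenes are all assumptions; the paper’s exact formula and queries may differ.

```python
def contains_sub_scene(scene, pattern):
    """True if every triple of the sub-scene pattern occurs in the scene."""
    return pattern <= scene


def coverage(training_scenes, pattern):
    """Fraction of training scenes containing the pattern (assumed definition)."""
    if not training_scenes:
        return 0.0
    hits = sum(1 for scene in training_scenes if contains_sub_scene(scene, pattern))
    return hits / len(training_scenes)


# Hypothetical sub-scene patterns, written as sets of label-level triples.
EGO_ON_STRAIGHT_ROAD = frozenset({("EgoVehicle", "drives_on", "StraightLane")})
EGO_APPROACHING_INTERSECTION = frozenset({
    ("EgoVehicle", "drives_on", "Lane"),
    ("Lane", "connected_to", "Intersection"),
})

# Four made-up training scenes; not taken from any real dataset.
TOY_TRAINING_SCENES = [
    frozenset({("EgoVehicle", "drives_on", "StraightLane")}),
    frozenset({("EgoVehicle", "drives_on", "StraightLane"),
               ("Vehicle", "ahead_of", "EgoVehicle")}),
    frozenset({("EgoVehicle", "drives_on", "Lane"),
               ("Lane", "connected_to", "Intersection")}),
    frozenset({("EgoVehicle", "drives_on", "StraightLane"),
               ("Pedestrian", "near", "EgoVehicle")}),
]

straight_coverage = coverage(TOY_TRAINING_SCENES, EGO_ON_STRAIGHT_ROAD)
intersection_coverage = coverage(TOY_TRAINING_SCENES, EGO_APPROACHING_INTERSECTION)
```

Matching on labels alone ignores node identity: it cannot tell whether the lane the ego vehicle drives on is the same lane that connects to the intersection. A proper graph query would bind variables to specific nodes, which this sketch leaves out for brevity.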
The competence score is calculated by multiplying the coverage by (1 minus complexity). This means a high competence score is achieved when coverage is high and complexity is low. For more complex scenes, a significantly higher coverage is needed to maintain adequate competence.
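The product described above can be sketched in a few lines. The `competence` function follows the stated formula directly. Everything else is an assumption made for illustration: that coverage and complexity are normalized to [0, 1], that the three complexity components are combined by an unweighted mean, and the helper `required_coverage`, which simply inverts the formula.

```python
def clamp01(x):
    """Clamp a value into the interval [0, 1]."""
    return max(0.0, min(1.0, x))


def scene_complexity(environment, obstacles, dynamic):
    """Unweighted mean of three component scores in [0, 1] (assumed combination)."""
    parts = [clamp01(environment), clamp01(obstacles), clamp01(dynamic)]
    return sum(parts) / len(parts)


def competence(coverage, complexity):
    """Competence = coverage * (1 - complexity), with both inputs in [0, 1]."""
    if not (0.0 <= coverage <= 1.0 and 0.0 <= complexity <= 1.0):
        raise ValueError("coverage and complexity must lie in [0, 1]")
    return coverage * (1.0 - complexity)


def required_coverage(target_competence, complexity):
    """Coverage needed to reach a target competence at a given complexity.

    A result above 1.0 means the target cannot be reached at that complexity.
    """
    if complexity >= 1.0:
        return float("inf")
    return target_competence / (1.0 - complexity)


# Same coverage, different complexity: the simpler scene scores higher.
simple_scene = competence(coverage=0.5, complexity=0.25)
complex_scene = competence(coverage=0.5, complexity=0.75)

# Coverage needed for a competence of 0.25 as complexity grows.
needed_at_half = required_coverage(0.25, 0.5)
needed_at_three_quarters = required_coverage(0.25, 0.75)
```

With these toy numbers, reaching the same competence at complexity 0.75 takes twice the coverage needed at complexity 0.5, which is the trade-off the formula encodes.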
Real-World Application and Results
The method was applied to the NuPlan dataset, which includes driving data from cities like Boston and Singapore. The researchers processed thousands of scenes, converting them into knowledge graphs and analyzing the coverage and complexity of various sub-scene configurations.
Experiments demonstrated that the Singapore dataset generally contained more diverse contexts, leading to higher coverage for many sub-scenes compared to the Boston data. Conversely, Boston’s data exhibited slightly higher complexity values, likely because the Boston recordings capture a more urban environment.
To validate the competence metric, a deep neural network (DNN) trajectory planner was trained on the Singapore data and evaluated on the Boston data. The results showed a weak but statistically significant negative correlation between the competence metric and common trajectory prediction error metrics (Miss Rate, minADE, minFDE, and brier-minFDE). Because lower values of these metrics mean better predictions, a negative correlation indicates that the planner’s errors tend to shrink as the competence score rises and to grow as it falls. In simpler terms, when the AI is deemed more competent for a situation, its predictions tend to be more accurate.
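The sketch below shows how such a relationship could be checked with a rank correlation between per-scene competence scores and an error metric such as minADE. The article does not say which correlation coefficient the authors used, so Spearman’s rho is an assumption here, and the numbers are invented toy values, not results from the paper. No significance test is included.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (std_x * std_y)


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented toy values: one competence score and one minADE error per scene.
toy_competence = [0.1, 0.4, 0.5, 0.9]
toy_min_ade = [3.0, 2.0, 2.5, 0.5]

rho = spearman(toy_competence, toy_min_ade)
```

A negative `rho` means that scenes with higher competence scores tend to have lower error, which is the direction reported in the study.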
The research highlights that while a low competence score suggests an unseen or complex situation where the model’s output might be unpredictable, the model could still occasionally infer a correct trajectory. This underscores the need for continuous improvement in modeling context and expanding the range of sub-scenes queried.
Towards Trustworthy AI in Automated Driving
This novel competence metric offers a valuable tool for enabling safer AI in automated driving. By providing insights into what an AI model has truly ‘learned’ and where its operational limits might lie, it helps predict whether a trajectory planning output is trustworthy. Future work aims to refine the competence equation, enhance the knowledge graph model to include more context and temporal interactions, and expand the list of sub-scene queries to reduce ‘unknown’ scene classifications.
For more details, see the full research paper, “What Did I Learn? Operational Competence Assessment for AI-Based Trajectory Planners.”


