
SoccerHigh: Advancing Automatic Highlight Generation for Soccer Videos

TLDR: Researchers introduce SoccerHigh, a new public dataset of 237 soccer matches with professionally curated highlight summaries from Spanish, French, and Italian leagues. They also propose a semi-automated annotation pipeline, a baseline AI model for summarization, and a novel evaluation metric, establishing a comprehensive benchmark to advance automatic soccer video highlight generation.

In the fast-paced world of sports media, highlight reels are essential for capturing the most thrilling moments of a game. From goals and pivotal plays to crowd reactions, these summaries allow fans to quickly catch up on the action. However, the process of creating these highlights manually is time-consuming and labor-intensive for video editors. This challenge is particularly pronounced in soccer, where matches are long and the visual content can be highly repetitive.

A significant hurdle in developing automated systems for sports video summarization has been the scarcity of publicly available, well-annotated datasets. Researchers Artur Díaz-Juan, Coloma Ballester, and Gloria Haro from Universitat Pompeu Fabra have addressed this critical gap by introducing SoccerHigh: A Benchmark Dataset for Automatic Soccer Video Summarization. This new dataset is designed to serve as a robust foundation for training and evaluating models that can automatically generate soccer highlights.

Introducing the SoccerHigh Dataset

The SoccerHigh dataset is a meticulously curated collection of 237 soccer matches from the Spanish, French, and Italian leagues. What makes it unique is that each full-match broadcast video is paired with its corresponding official highlight summary, professionally crafted by media and soccer experts. These summaries are sourced from the SoccerNet dataset, a well-known resource in soccer video analysis.

The diversity of leagues and seasons included in SoccerHigh is crucial. It captures a wide range of editorial styles, demonstrating that video summarization goes beyond simply identifying in-game events. For instance, the average length of summaries varies significantly, with Spanish league summaries ranging from over five minutes to under two minutes across different seasons, while Italian and French leagues show more consistent durations. This variability highlights the subjective nature of highlight creation, which is heavily influenced by editorial preferences.

The dataset also provides insights into the composition of these summaries. On average, about 84% of the summary content is actual gameplay, with the remaining 16% consisting of non-playable phases like pre-match, half-time, and post-match segments. This breakdown further illustrates the distinct editorial approaches across different leagues, such as the differing emphasis on half-time content in Spanish versus Italian and French summaries.

A Smart Approach to Annotation

To minimize the extensive manual effort typically required for dataset annotation, the researchers developed a semi-automated pipeline. This innovative method involves two main stages: an initial shot segmentation of the summary video, followed by alignment with the corresponding segments in the full-match broadcast video. A final manual refinement step ensures accuracy and consistency.

The pipeline leverages advanced computer vision techniques. For shot boundary detection, a custom k-Nearest Neighbors (kNN) frame comparison method was found to be most effective. For extracting frame-level features, the DINOv2 model, pretrained on a large visual dataset, proved to be highly efficient and accurate. This semi-automated approach significantly speeds up annotation, making it feasible to expand the dataset as more content becomes available.
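To give a feel for how feature-based shot segmentation works, here is a minimal sketch that flags a cut wherever consecutive frame embeddings (such as DINOv2 features) diverge sharply. This is an illustrative simplification, not the authors' exact kNN comparison method; the threshold value is a hypothetical parameter.

```python
import numpy as np

def detect_shot_boundaries(features: np.ndarray, threshold: float = 0.5) -> list:
    """Flag a shot boundary wherever consecutive frame embeddings diverge.

    features: (num_frames, dim) array of per-frame embeddings
    (e.g. DINOv2 features). Returns indices of frames that start a new shot.
    """
    # Normalize rows so the dot product becomes cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Similarity between each frame and the next one.
    sims = np.sum(normed[:-1] * normed[1:], axis=1)
    # A sharp similarity drop between adjacent frames suggests a cut.
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

In practice, the detected summary shots would then be aligned against the full-match broadcast by matching their features, with a final manual pass to correct errors, as the paper describes.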

A Baseline for Future Innovation

Alongside the dataset, the paper introduces a baseline model specifically designed for automatic soccer video summarization. This model serves as a starting point for future research and development in the field. The architecture comprises three main components: a feature extraction stage, a Transformer encoder, and a classification head. It processes video in fixed-length “chunks” to manage computational costs and identify relevant shots.

Through extensive experiments, the researchers identified the optimal configuration for this baseline model. The VideoMAEv2 giant encoder was found to be the most effective for extracting rich visual features, capturing both spatial and short-term temporal dynamics. Chunks of 60 seconds provided the best balance of contextual information without introducing too much redundant content. The model also benefits from combining both classification and regression heads, along with Non-Maximum Suppression (NMS) during inference to refine shot proposals, and MixUp data augmentation during training to improve generalization.
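Non-Maximum Suppression over temporal shot proposals can be sketched as follows. This is a generic greedy NMS over 1-D segments, assuming proposals come as (start, end, score) tuples; the paper's exact inference procedure may differ.

```python
def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy non-maximum suppression over 1-D temporal segments.

    proposals: list of (start, end, score) tuples. Higher-scoring segments
    suppress lower-scoring ones that overlap them too heavily.
    """
    def iou(a, b):
        # Temporal intersection-over-union of two (start, end, ...) segments.
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    for p in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(iou(p, k) < iou_threshold for k in kept):
            kept.append(p)
    return kept
```

Here, two proposals covering nearly the same stretch of the match collapse into the single higher-confidence one, which keeps the predicted summary from repeating the same moment.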


A New Way to Evaluate Summaries

One of the key contributions of this research is the proposal of a new, objective evaluation metric. Unlike traditional methods that constrain summaries to a fixed percentage of the original video length, this new metric aligns the predicted summary length with that of the ground truth. During evaluation, predicted shots are ranked by importance, and only the top-ranked shots are selected until their cumulative duration matches the ground truth summary.

This approach, termed Precision@T, Recall@T, and F1 Score@T, allows for a more objective assessment of a model’s ability to identify truly important moments, such as goals, while being less influenced by subjective editorial choices. The baseline model achieved an F1 Score@T of 0.3883 using this new metric, setting a clear benchmark for future advancements.
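One plausible implementation of this duration-matched scoring, written as a sketch rather than the paper's exact definition, ranks predicted shots by confidence, keeps them until their total length reaches the ground-truth summary duration T, and then measures temporal overlap. The tuple formats and non-overlapping-shot assumption are mine, not the authors'.

```python
def f1_at_t(pred_shots, gt_shots):
    """Sketch of Precision@T / Recall@T / F1 Score@T.

    pred_shots: list of (start, end, score) predictions.
    gt_shots:   list of (start, end) ground-truth summary shots,
                assumed non-overlapping.
    """
    T = sum(e - s for s, e in gt_shots)  # target summary duration

    # Keep top-ranked shots until their cumulative duration reaches T.
    selected, total = [], 0.0
    for s, e, _ in sorted(pred_shots, key=lambda p: p[2], reverse=True):
        if total >= T:
            break
        selected.append((s, e))
        total += e - s

    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    # Total time where the selection and the ground truth coincide.
    hit = sum(overlap(p, g) for p in selected for g in gt_shots)
    pred_len = sum(e - s for s, e in selected)
    precision = hit / pred_len if pred_len else 0.0
    recall = hit / T if T else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because the selected duration is tied to the ground-truth length rather than a fixed percentage of the match, a model is rewarded for ranking the genuinely important shots first instead of for matching an arbitrary length budget.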

The SoccerHigh dataset, its semi-automated annotation pipeline, the proposed baseline model, and the innovative evaluation metrics collectively establish a comprehensive benchmark for automatic soccer video summarization. This work fills a critical void in the literature, providing researchers with the tools needed to develop more robust and effective AI models for generating compelling sports highlights. You can find more details about this research in the full paper available here.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
