
SoccerHigh: Advancing Automatic Highlight Generation for Soccer Videos

TLDR: Researchers introduce SoccerHigh, a new public dataset of 237 soccer matches with professionally curated highlight summaries from Spanish, French, and Italian leagues. They also propose a semi-automated annotation pipeline, a baseline AI model for summarization, and a novel evaluation metric, establishing a comprehensive benchmark to advance automatic soccer video highlight generation.

In the fast-paced world of sports media, highlight reels are essential for capturing the most thrilling moments of a game. From goals and pivotal plays to crowd reactions, these summaries allow fans to quickly catch up on the action. However, the process of creating these highlights manually is time-consuming and labor-intensive for video editors. This challenge is particularly pronounced in soccer, where matches are long and the visual content can be highly repetitive.

A significant hurdle in developing automated systems for sports video summarization has been the scarcity of publicly available, well-annotated datasets. Researchers Artur Díaz-Juan, Coloma Ballester, and Gloria Haro from Universitat Pompeu Fabra have addressed this critical gap by introducing SoccerHigh: A Benchmark Dataset for Automatic Soccer Video Summarization. This new dataset is designed to serve as a robust foundation for training and evaluating models that can automatically generate soccer highlights.

Introducing the SoccerHigh Dataset

The SoccerHigh dataset is a meticulously curated collection of 237 soccer matches from the Spanish, French, and Italian leagues. What makes it unique is that each full-match broadcast video is paired with its corresponding official highlight summary, professionally crafted by media and soccer experts. These summaries are sourced from the SoccerNet dataset, a well-known resource in soccer video analysis.

The diversity of leagues and seasons included in SoccerHigh is crucial. It captures a wide range of editorial styles, demonstrating that video summarization goes beyond simply identifying in-game events. For instance, the average length of summaries varies significantly, with Spanish league summaries ranging from over five minutes to under two minutes across different seasons, while Italian and French leagues show more consistent durations. This variability highlights the subjective nature of highlight creation, which is heavily influenced by editorial preferences.

The dataset also provides insights into the composition of these summaries. On average, about 84% of the summary content is actual gameplay, with the remaining 16% consisting of non-playable phases like pre-match, half-time, and post-match segments. This breakdown further illustrates the distinct editorial approaches across different leagues, such as the differing emphasis on half-time content in Spanish versus Italian and French summaries.

A Smart Approach to Annotation

To minimize the extensive manual effort typically required for dataset annotation, the researchers developed a semi-automated pipeline. This innovative method involves two main stages: an initial shot segmentation of the summary video, followed by alignment with the corresponding segments in the full-match broadcast video. A final manual refinement step ensures accuracy and consistency.

The pipeline leverages advanced computer vision techniques. For shot boundary detection, a custom k-Nearest Neighbors (kNN) frame comparison method was found to be most effective. For extracting frame-level features, the DINOv2 model, pretrained on a large visual dataset, proved to be highly efficient and accurate. This semi-automated approach significantly speeds up annotation, making it feasible to expand the dataset as more content becomes available.
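To give a feel for how feature-based shot segmentation works, here is a minimal sketch that flags a cut wherever consecutive frame embeddings (such as DINOv2 features) diverge sharply. This is an illustrative simplification, not the authors' exact kNN comparison method; the threshold value is a hypothetical parameter.

```python
import numpy as np

def detect_shot_boundaries(features: np.ndarray, threshold: float = 0.5) -> list:
    """Flag a shot boundary wherever consecutive frame embeddings diverge.

    features: (num_frames, dim) array of per-frame embeddings
    (e.g. DINOv2 features). Returns indices of frames that start a new shot.
    """
    # Normalize rows so the dot product becomes cosine similarity.
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    # Similarity between each frame and the next one.
    sims = np.sum(normed[:-1] * normed[1:], axis=1)
    # A sharp similarity drop between adjacent frames suggests a cut.
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

In practice, the detected summary shots would then be aligned against the full-match broadcast by matching their features, with a final manual pass to correct errors, as the paper describes.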

A Baseline for Future Innovation

Alongside the dataset, the paper introduces a baseline model specifically designed for automatic soccer video summarization. This model serves as a starting point for future research and development in the field. The architecture comprises three main components: a feature extraction stage, a Transformer encoder, and a classification head. It processes video in fixed-length “chunks” to manage computational costs and identify relevant shots.

Through extensive experiments, the researchers identified the optimal configuration for this baseline model. The VideoMAEv2 giant encoder was found to be the most effective for extracting rich visual features, capturing both spatial and short-term temporal dynamics. Chunks of 60 seconds provided the best balance of contextual information without introducing too much redundant content. The model also benefits from combining both classification and regression heads, along with Non-Maximum Suppression (NMS) during inference to refine shot proposals, and MixUp data augmentation during training to improve generalization.
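Non-Maximum Suppression over temporal shot proposals can be sketched as follows. This is a generic greedy NMS over 1-D segments, assuming proposals come as (start, end, score) tuples; the paper's exact inference procedure may differ.

```python
def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy non-maximum suppression over 1-D temporal segments.

    proposals: list of (start, end, score) tuples. Higher-scoring segments
    suppress lower-scoring ones that overlap them too heavily.
    """
    def iou(a, b):
        # Temporal intersection-over-union of two (start, end, ...) segments.
        inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
        union = (a[1] - a[0]) + (b[1] - b[0]) - inter
        return inter / union if union > 0 else 0.0

    kept = []
    for p in sorted(proposals, key=lambda p: p[2], reverse=True):
        if all(iou(p, k) < iou_threshold for k in kept):
            kept.append(p)
    return kept
```

Here, two proposals covering nearly the same stretch of the match collapse into the single higher-confidence one, which keeps the predicted summary from repeating the same moment.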


A New Way to Evaluate Summaries

One of the key contributions of this research is the proposal of a new, objective evaluation metric. Unlike traditional methods that constrain summaries to a fixed percentage of the original video length, this new metric aligns the predicted summary length with that of the ground truth. During evaluation, predicted shots are ranked by importance, and only the top-ranked shots are selected until their cumulative duration matches the ground truth summary.

This approach, termed Precision@T, Recall@T, and F1 Score@T, allows for a more objective assessment of a model’s ability to identify truly important moments, such as goals, while being less influenced by subjective editorial choices. The baseline model achieved an F1 Score@T of 0.3883 using this new metric, setting a clear benchmark for future advancements.
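One plausible implementation of this duration-matched scoring, written as a sketch rather than the paper's exact definition, ranks predicted shots by confidence, keeps them until their total length reaches the ground-truth summary duration T, and then measures temporal overlap. The tuple formats and non-overlapping-shot assumption are mine, not the authors'.

```python
def f1_at_t(pred_shots, gt_shots):
    """Sketch of Precision@T / Recall@T / F1 Score@T.

    pred_shots: list of (start, end, score) predictions.
    gt_shots:   list of (start, end) ground-truth summary shots,
                assumed non-overlapping.
    """
    T = sum(e - s for s, e in gt_shots)  # target summary duration

    # Keep top-ranked shots until their cumulative duration reaches T.
    selected, total = [], 0.0
    for s, e, _ in sorted(pred_shots, key=lambda p: p[2], reverse=True):
        if total >= T:
            break
        selected.append((s, e))
        total += e - s

    def overlap(a, b):
        return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

    # Total time where the selection and the ground truth coincide.
    hit = sum(overlap(p, g) for p in selected for g in gt_shots)
    pred_len = sum(e - s for s, e in selected)
    precision = hit / pred_len if pred_len else 0.0
    recall = hit / T if T else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because the selected duration is tied to the ground-truth length rather than a fixed percentage of the match, a model is rewarded for ranking the genuinely important shots first instead of for matching an arbitrary length budget.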

The SoccerHigh dataset, its semi-automated annotation pipeline, the proposed baseline model, and the innovative evaluation metrics collectively establish a comprehensive benchmark for automatic soccer video summarization. This work fills a critical void in the literature, providing researchers with the tools needed to develop more robust and effective AI models for generating compelling sports highlights. You can find more details about this research in the full paper available here.

Meera Iyer
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist in a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She's particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
