
Deep Learning Continues to Lead in Information Retrieval: Insights from TREC 2021

TLDR: The TREC 2021 Deep Learning Track evaluated advanced information retrieval methods using significantly expanded MS MARCO datasets. The track confirmed the superior performance of deep neural ranking models, especially those utilizing large-scale pretraining, over traditional methods. It also highlighted the growing trend towards single-stage retrieval with deep models. The report discusses the challenges of dataset scale, judgment completeness, and the impact of query length on evaluation, offering insights into future directions for information retrieval research and benchmarking.

The TREC Deep Learning Track, now in its third year, continues to be a pivotal benchmark for ad hoc retrieval methods, particularly in the context of vast datasets. The 2021 edition brought significant updates, leveraging refreshed and substantially expanded versions of the MS MARCO datasets. These datasets are crucial as they provide hundreds of thousands of human-annotated training labels for both passage and document ranking tasks.

A major highlight of the TREC 2021 track was the introduction of the MS MARCO v2 dataset. This new version dramatically increased the scale of the collections, with the document collection growing nearly four times and the passage collection expanding by almost sixteen times. This expansion aimed to provide a more realistic large-data environment for evaluating retrieval systems and to incorporate additional metadata, such as passage-to-document mappings, which can be valuable for ranking.

Key Tasks and Evaluation

Similar to previous years, the 2021 Deep Learning Track featured two primary tasks: document retrieval and passage retrieval. Participants could submit up to three runs for each task, detailing the external data, pre-trained models, and other resources used, as well as the model style. A consistent set of 477 queries was used across both tasks, with a subset selected for judging based on query length (short vs. long queries).

For evaluation, judgments were collected on a four-point scale, ranging from ‘Irrelevant’ (0) to ‘Perfectly relevant’ (3). The document retrieval task included two subtasks: full retrieval, which models an end-to-end scenario from the entire document collection, and top-100 reranking, where participants re-ranked an initial set of 100 documents provided by Pyserini. The passage retrieval task followed a similar structure, with full retrieval from a large passage collection and a top-100 reranking subtask.
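With graded judgments on a 0–3 scale like these, the standard evaluation measure is nDCG, which rewards placing highly relevant results near the top of the ranking. The following is a minimal sketch of nDCG@k over such four-point labels (using the common exponential gain, `2^rel − 1`); it is illustrative only, and the track's official scores come from the NIST evaluation tooling rather than this code.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain over a ranked list of graded labels."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))

def ndcg(ranked_rels, k=10):
    """nDCG@k for four-point judgments (0 = Irrelevant .. 3 = Perfectly relevant).

    Normalizes the DCG of the system ranking by the DCG of the ideal
    (descending-label) ordering of the same judged results.
    """
    ideal_dcg = dcg(sorted(ranked_rels, reverse=True)[:k])
    return dcg(ranked_rels[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that places a 'Perfectly relevant' result first scores higher than one that buries it, which is exactly the behavior the graded scale is meant to capture.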

The Evolution of MS MARCO Datasets

The MS MARCO dataset originated from a natural language generation task, where crowd workers generated answers to queries based on provided passages. This data was later adapted for ranking tasks, leading to the v1 datasets used in TREC 2019 and 2020. The v1 datasets, while valuable, had some limitations, such as corpus generation based on queries and character set issues.

The MS MARCO v2 dataset, used for the first time in TREC 2021, addressed many of these issues. It started by identifying documents, expanding the collection to 11.9 million documents, and then identifying promising passages within them, resulting in 138 million passages. The v2 data also fixed character encoding and whitespace issues, making it a cleaner and more comprehensive resource for information retrieval research. For more in-depth details, refer to the full research paper.

Performance Trends: Neural vs. Traditional Methods

The track saw participation from 19 groups, submitting a total of 129 runs. A notable trend observed was the continued dominance of deep neural ranking models, particularly those employing large-scale pretraining (categorized as ‘nnlm’). These ‘nnlm’ runs consistently outperformed traditional retrieval methods (‘trad’) across both document and passage ranking tasks. The percentage of ‘nnlm’ submissions significantly increased, while runs without pre-trained models (‘nn’) almost disappeared, indicating a convergence in the neural information retrieval community towards large language models.

While ‘nnlm’ runs showed clear superiority, the paper also explored the performance of single-stage retrieval methods. Surprisingly, these methods performed well, though they still lagged behind multi-stage retrieval pipelines. The analysis also delved into how system performance varied with query length, finding that longer queries might be more discriminative for evaluation, especially for neural systems.
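The multi-stage pipelines referenced above follow a common pattern: a cheap first-stage retriever scans the whole collection, and an expensive (typically neural) reranker reorders only the top candidates. Below is a minimal, schematic sketch of that pattern; the scoring functions are placeholders, not any participant's actual system, and real pipelines use an inverted index or approximate nearest-neighbor search rather than scoring every document.

```python
def rerank_pipeline(query, corpus, first_stage_score, rerank_score, k=100):
    """Two-stage retrieval: a cheap first-stage scorer selects the top-k
    candidates from the full corpus, then a costlier reranker reorders
    just those candidates (mirroring the track's top-100 reranking setup).
    """
    # Stage 1: score everything cheaply and keep the top-k candidates.
    candidates = sorted(corpus,
                        key=lambda doc: first_stage_score(query, doc),
                        reverse=True)[:k]
    # Stage 2: reorder only the candidates with the expensive scorer.
    return sorted(candidates,
                  key=lambda doc: rerank_score(query, doc),
                  reverse=True)
```

A single-stage system, by contrast, applies one (deep) model directly to full retrieval, skipping the reranking pass; the track's finding was that this simpler design performed well but had not yet closed the gap with cascaded pipelines.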


Challenges and Future Directions

The increased size of the v2 datasets, while beneficial for realism, posed challenges for judgment completeness due to budget constraints for NIST assessors. This led to concerns about the reusability of the dataset for benchmarking outside of TREC settings. The paper also discussed the agreement between NIST judgments and the original sparse MS MARCO labels, noting a decrease in agreement over the years, partly attributed to an ‘oldness’ artifact where models learned to favor older documents in the corpus due to the training data’s characteristics.

Looking ahead, the track organizers are considering options for future evaluations, such as focusing on a ‘v1 universe’ for development set evaluations or adjusting training procedures to mitigate the ‘oldness’ bias. The potential for inferring document-level labels from passage-level labels was also explored as a way to create more complete test collections, suggesting a hybrid evaluation dataset combining inferred and actual labels for future tracks.
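One natural way to infer document-level labels from passage-level labels, given the v2 passage-to-document mappings, is to take the maximum judged passage label per document. This is only a plausible heuristic for illustration, not necessarily the exact aggregation the organizers would adopt.

```python
def infer_document_labels(passage_labels, passage_to_doc):
    """Infer a document-level relevance label as the maximum label of any
    judged passage mapped to that document (max-aggregation heuristic).

    passage_labels: dict mapping passage id -> graded label (0..3)
    passage_to_doc: dict mapping passage id -> containing document id
    """
    doc_labels = {}
    for pid, label in passage_labels.items():
        doc = passage_to_doc.get(pid)
        if doc is not None:
            # A document is at least as relevant as its best passage.
            doc_labels[doc] = max(doc_labels.get(doc, 0), label)
    return doc_labels
```

Labels inferred this way could then be combined with actual document judgments to build the hybrid evaluation sets the organizers describe.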

In conclusion, the TREC 2021 Deep Learning Track reinforced the strong performance of pre-trained deep neural models in information retrieval, while also shedding light on the complexities of large-scale dataset creation, judgment completeness, and evaluation methodologies in this rapidly evolving field.

Meera Iyer
https://blogs.edgentiq.com
Meera Iyer is an AI news editor who blends journalistic rigor with storytelling elegance. Formerly a content strategist at a leading tech firm, Meera now tracks the pulse of India's Generative AI scene, from policy updates to academic breakthroughs. She is particularly focused on bringing nuanced, balanced perspectives to the fast-evolving world of AI-powered tools and media. You can reach her at: [email protected]
