TLDR: This research paper compares eleven Information Retrieval (IR) techniques applied to IT support tickets to help analysts quickly find solutions. Using a dataset of over 20,000 tickets, the study found that Sentence-BERT, particularly its multi-language variation, achieved the best results: for 78.7% of test tickets, at least one of its recommendations was relevant. Other techniques like TF-IDF, Word2vec, and LDA also performed well. The project also open-sourced its dataset and code, proposed a new evaluation metric, and implemented a functional prototype system to demonstrate the practical benefits of using AI to streamline IT help desk operations.
IT support teams handle a constant stream of service requests, usually called support tickets, and in doing so they accumulate a large body of knowledge about previously resolved issues. Searching that history for a past ticket that matches a new problem is slow and difficult, which delays resolutions and degrades the user experience.
Addressing the IT Support Challenge
A recent research paper examines this challenge, exploring how Information Retrieval (IR) techniques can help IT support analysts find solutions. The core idea is to retrieve past tickets that are similar to a new one and use them to guide analysts to an effective resolution, saving effort and improving service quality. The study, conducted at Skaylink, an IT services company, used a proprietary database of IT support tickets to test various IR methods.
Comparing Information Retrieval Techniques
The researchers compared eleven Information Retrieval techniques. These cover a broad spectrum of approaches, from traditional methods like Expert Systems and statistical ones like TF-IDF (Term Frequency-Inverse Document Frequency), to probabilistic models such as BM25 and LDA (Latent Dirichlet Allocation). The study also included neural network-based approaches, such as Word2vec, Doc2vec, and variations of BERT and Sentence-BERT.
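To make the simplest of these approaches concrete, here is a minimal sketch of TF-IDF retrieval with cosine similarity, written in plain Python. It is not the paper's implementation: the tokenizer, the smoothed IDF formula, the function names, and the sample tickets are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a real pipeline would clean far more)."""
    return text.lower().split()


def vectorize(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    return {term: count * idf[term] for term, count in counts.items() if term in idf}


def build_tfidf(docs):
    """Return one TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption): common terms keep a small positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    return [vectorize(tokens, idf) for tokens in tokenized], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, docs, k=5):
    """Indices of the k documents most similar to the query."""
    vectors, idf = build_tfidf(docs)
    query_vec = vectorize(tokenize(query), idf)
    scored = sorted(((cosine(query_vec, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical past tickets, invented for this example.
past_tickets = [
    "cannot connect to vpn from home office",
    "outlook crashes when opening attachments",
    "password reset required for locked account",
    "vpn connection drops every few minutes",
]
ranking = top_k("vpn will not connect", past_tickets, k=2)
print(ranking)  # indices of the two closest past tickets
```

Because this kind of matching depends on shared vocabulary, "connect" and "connection" count as different terms here. Embedding-based methods such as Sentence-BERT aim to capture similarity beyond exact word overlap.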
The dataset for this extensive comparison comprised 20,356 anonymized IT support tickets submitted between 2017 and 2022. These tickets, primarily in English but also including Portuguese, German, and Spanish, often contained grammatical errors and technical jargon, reflecting real-world conditions. For evaluation, 300 representative tickets were meticulously labeled by IT analysts, who manually identified the five most similar past tickets for each.
The Standout Performer: Sentence-BERT
The results of the comparison were clear: the Sentence-BERT technique, specifically its multi-language variation `distiluse-base-multilingual-cased-v1`, was the top performer. This model achieved a 78.7% “at least one accuracy,” meaning that for nearly four out of five test tickets, the system recommended at least one relevant past ticket. Its precision, the share of individual recommendations that were relevant, stood at 35.1%. Other techniques like TF-IDF (69.0% accuracy), Word2vec (68.7% accuracy), and LDA (66.3% accuracy) also showed consistent and strong results.
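The gap between a high “at least one accuracy” and a much lower precision is easier to see with a small computation. The sketch below assumes a straightforward reading of both metrics (precision is averaged per query); the paper's exact definitions may differ, and the ticket IDs and labels are invented.

```python
def at_least_one_accuracy(recommended, relevant):
    """Share of queries whose recommendation list contains at least one relevant ticket."""
    hits = sum(
        1 for query, recs in recommended.items()
        if set(recs) & relevant.get(query, set())
    )
    return hits / len(recommended)


def mean_precision(recommended, relevant):
    """Average, over queries, of the fraction of recommended tickets that are relevant."""
    per_query = [
        len(set(recs) & relevant.get(query, set())) / len(recs)
        for query, recs in recommended.items()
        if recs
    ]
    return sum(per_query) / len(per_query)


# Hypothetical system output: five recommendations for each of four test tickets.
recommended = {
    "Q1": ["A", "B", "C", "D", "E"],
    "Q2": ["F", "G", "H", "I", "J"],
    "Q3": ["K", "L", "M", "N", "O"],
    "Q4": ["P", "Q", "R", "S", "T"],
}
# Hypothetical analyst labels: the past tickets judged similar to each test ticket.
relevant = {
    "Q1": {"A", "C", "X1", "X2", "X3"},
    "Q2": {"Y1", "Y2", "Y3", "Y4", "Y5"},
    "Q3": {"M", "Z1", "Z2", "Z3", "Z4"},
    "Q4": {"T", "W1", "W2", "W3", "W4"},
}

accuracy = at_least_one_accuracy(recommended, relevant)
precision = mean_precision(recommended, relevant)
print(f"at least one accuracy: {accuracy:.2f}")
print(f"mean precision:        {precision:.2f}")
```

In this toy data, three of the four queries receive at least one useful suggestion, yet only a small fraction of all suggested tickets are relevant. An analyst who only needs one good match will rate such a system far higher than its precision suggests.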
The study highlighted that while Sentence-BERT, a more recent technique, delivered the best performance, even simpler methods like TF-IDF proved to be robust and computationally efficient, making them viable alternatives depending on specific implementation needs. Interestingly, retraining some neural networks with the specific dataset did not always yield superior results, suggesting that large, pre-trained models often retain their advantage.
A Practical Prototype and Future Directions
Beyond the comparison itself, the research team developed a minimum viable prototype using the best-performing Sentence-BERT technique. This prototype, designed to integrate into daily IT support workflows, allows analysts to input a new ticket and receive recommendations of similar, previously resolved issues. The system also includes a feedback mechanism for analysts to rate the usefulness of the recommendations, so that the system can be improved over time.
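The workflow described above (embed the new ticket, rank stored tickets by similarity, record analyst feedback) can be sketched roughly as follows. This is not the prototype's actual code: the `TicketRecommender` class, its method names, and the toy keyword embedder are hypothetical. In a real setup, the `embed` function could wrap a Sentence-BERT model, for example `SentenceTransformer("distiluse-base-multilingual-cased-v1").encode` from the `sentence-transformers` package; a toy embedder is used here so the example runs with the standard library alone.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class TicketRecommender:
    """Hypothetical recommender: stores embeddings of resolved tickets and ranks them."""
    embed: Callable[[str], Vector]
    vectors: Dict[str, Vector] = field(default_factory=dict)
    feedback: List[Tuple[str, str, bool]] = field(default_factory=list)

    def add_resolved_ticket(self, ticket_id: str, text: str) -> None:
        # Embed once at insertion time so queries only embed the new ticket.
        self.vectors[ticket_id] = self.embed(text)

    def recommend(self, text: str, k: int = 5) -> List[str]:
        query = self.embed(text)
        ranked = sorted(
            self.vectors,
            key=lambda tid: cosine(query, self.vectors[tid]),
            reverse=True,
        )
        return ranked[:k]

    def rate(self, query_text: str, ticket_id: str, useful: bool) -> None:
        # Feedback is only recorded here; how it is used later is left open.
        self.feedback.append((query_text, ticket_id, useful))


# Toy embedder (an assumption for illustration): keyword counts over a tiny vocabulary.
VOCAB = ("vpn", "password", "printer", "email")


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


recommender = TicketRecommender(embed=toy_embed)
recommender.add_resolved_ticket("T-1", "vpn disconnects on wifi")
recommender.add_resolved_ticket("T-2", "printer offline after update")
recommender.add_resolved_ticket("T-3", "password expired for email account")

new_ticket = "user cannot print and the printer shows offline"
suggestions = recommender.recommend(new_ticket, k=1)
recommender.rate(new_ticket, suggestions[0], useful=True)
print(suggestions)
```

Passing the embedding function in as a parameter keeps the ranking and feedback logic independent of the model, so the embedder can be swapped for a different technique without touching the rest of the code.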
This work not only provides valuable insights into the effectiveness of various IR techniques for IT support but also contributes significantly to the academic community by making the dataset and code open source. Furthermore, it introduces a novel metric, “at least one accuracy,” which more closely reflects an IT analyst’s real-world perception of a retrieval system’s quality. The full details of this research can be found in the paper: Comparison of Information Retrieval Techniques Applied to IT Support Tickets.
Future work could involve expanding the search capabilities to the entire database, integrating with advanced vector similarity search databases, and exploring other cutting-edge techniques to further enhance the efficiency and accuracy of IT support systems.


