TLDR: A new data-driven framework for bandwidth estimation, developed by Microsoft, improves the Quality of Experience (QoE) in real-time video communication. By training objective QoE reward models from subjective user evaluations and applying a novel distributional offline reinforcement learning algorithm to approximately 1 million real-world Microsoft Teams call traces, the system reduced the subjective poor call ratio by 11.41% and improved objective video quality scores. Because it learns from historical data rather than experimenting on live calls, the approach is safer to deploy, and it is now active in Microsoft Teams.
In today’s interconnected world, video conferencing has become an indispensable tool for work, education, and social interaction. However, the quality of these real-time video calls, often referred to as Quality of Experience (QoE), can be significantly impacted by how accurately the system estimates the available internet bandwidth between participants. This estimation is a complex challenge due to constantly changing network conditions, diverse device types, and the difficulty of truly understanding what makes a user’s experience good or bad.
Understanding the Challenge of Video Call Quality
When you’re on a video call, your device constantly tries to figure out how much data it can send without overwhelming the network. If it sends too much, you get congestion, leading to frustrating issues like video freezes, choppy audio, and dropped packets. Send too little, and you’re not using the network’s full potential, resulting in lower quality video and audio than what’s possible. The goal is to find that sweet spot for optimal QoE, which goes beyond simple technical metrics like speed and packet loss to truly capture user satisfaction.
A New Approach to Bandwidth Estimation
Researchers at Microsoft have developed a sophisticated, data-driven framework designed to tackle these challenges. Their approach integrates human feedback into the system, using advanced machine learning to predict and optimize the quality of experience. This framework is already deployed in Microsoft Teams, serving millions of users daily.
Measuring User Experience: QoE Reward Models
A core part of this system involves creating objective models that can predict audio and video quality. This starts with extensive subjective user evaluations, where real people rate the quality of audio and video samples according to international standards (ITU-T P.808 and P.910). These human ratings are then used to train AI models that can measure audio and video quality in real time. To keep these models efficient and privacy-preserving for deployment on user devices, they are ‘distilled’ into simpler versions that rely on key media metrics rather than raw audio or video signals: audio receive rate, jitter, and packet loss concealment for audio; resolution, frame rate, and freezes for video. The final QoE reward is a weighted combination of these predicted audio and video quality scores, so the system optimizes for what users actually perceive.
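The weighted combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `qoe_reward`, the metric keys, the equal default weighting, and the stand-in predictors are all hypothetical, and the actual distilled models and weights are not given in this article.

```python
from typing import Callable, Mapping

Metrics = Mapping[str, float]
Predictor = Callable[[Metrics], float]


def qoe_reward(
    audio_metrics: Metrics,
    video_metrics: Metrics,
    audio_model: Predictor,
    video_model: Predictor,
    audio_weight: float = 0.5,  # assumed value; the real weighting is not stated here
) -> float:
    """Combine predicted audio and video quality into one scalar reward."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    audio_score = audio_model(audio_metrics)
    video_score = video_model(video_metrics)
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


# Stand-in predictors (hypothetical): real ones would be the distilled models
# that map media metrics to a predicted quality score.
def toy_audio_model(m: Metrics) -> float:
    return 4.0


def toy_video_model(m: Metrics) -> float:
    return 3.0


reward = qoe_reward(
    {"receive_rate_kbps": 32.0, "jitter_ms": 12.0, "concealed_ratio": 0.01},
    {"height_px": 720.0, "frame_rate_fps": 30.0, "freeze_ratio": 0.0},
    toy_audio_model,
    toy_video_model,
)
```

Passing the predictors in as callables keeps the combination step separate from the models themselves, which matches the article's description of the reward as a layer on top of the predicted scores.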
Learning from Real-World Data with Offline Reinforcement Learning
To train the bandwidth estimator, the team collected a large dataset: approximately 1 million network traces from actual Microsoft Teams calls. Each trace includes network conditions and the QoE rewards predicted by the newly developed models. Traditional online reinforcement learning is risky in live production environments, because the agent has to try suboptimal actions on real calls in order to learn. The team instead employed a novel distributional offline reinforcement learning (RL) algorithm. This ‘offline’ approach lets the AI learn its strategy from historical data without experimenting on live calls, which makes deployment much safer. The algorithm, called DIQL (Distributional Implicit Q-learning), is designed to handle the complex, partially observable nature of network conditions and to predict the full range of possible QoE outcomes, not just an average.
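As the name suggests, DIQL builds on implicit Q-learning (IQL), whose published formulation fits a value function with expectile regression so that it can lean toward the better outcomes seen in the logged data without querying actions that never appear there. The sketch below illustrates only that expectile idea on a handful of made-up reward samples. It is not the DIQL algorithm itself, and the function names, sample values, and optimizer settings are assumptions chosen for illustration.

```python
def expectile_loss(diff: float, tau: float) -> float:
    """Asymmetric squared loss: positive errors are weighted tau, negative 1 - tau."""
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def fit_expectile(samples, tau, steps=2000, lr=0.05):
    """Find the scalar v minimizing the mean expectile loss of (sample - v)."""
    v = sum(samples) / len(samples)  # start from the plain average
    for _ in range(steps):
        # Gradient of mean(weight * (s - v)^2) with respect to v.
        grad = sum(
            -2.0 * (tau if s - v > 0 else 1.0 - tau) * (s - v) for s in samples
        ) / len(samples)
        v -= lr * grad
    return v


# Hypothetical logged QoE rewards observed from the same state.
logged_rewards = [1.0, 2.0, 3.0, 4.0, 5.0]

mean_like = fit_expectile(logged_rewards, tau=0.5)   # symmetric: recovers the mean
optimistic = fit_expectile(logged_rewards, tau=0.9)  # leans toward the best outcomes
```

With `tau=0.5` the loss is symmetric and the fit stays at the average; raising `tau` pulls the estimate toward the higher rewards in the data, which is how IQL approximates "the best action seen so far" purely from historical traces.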
Real-World Impact: Microsoft Teams Deployment
The true test of this framework came with a large-scale A/B test conducted within Microsoft Teams. The test ran for two weeks, covered more than 25 million calls globally, and compared the new bandwidth estimator against the existing baseline system. The results were encouraging: the proposed approach led to an 11.41% reduction in the subjective poor call ratio, meaning fewer users reported a bad call experience. There were also statistically significant improvements in objective video quality scores, while audio quality remained consistently high.
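To make the headline metric concrete, the short example below shows how a reduction in poor call ratio would be computed if the 11.41% figure is read as a relative reduction, which is an assumption since the article does not say whether the figure is relative or absolute. The call counts are invented purely for the arithmetic and are not results from the A/B test.

```python
def poor_call_ratio(poor_calls: int, rated_calls: int) -> float:
    """Fraction of rated calls that users marked as poor."""
    return poor_calls / rated_calls


def relative_reduction(baseline: float, treatment: float) -> float:
    """Relative drop of the treatment ratio compared with the baseline ratio."""
    return (baseline - treatment) / baseline


# Hypothetical counts, chosen only to illustrate the calculation.
baseline_ratio = poor_call_ratio(poor_calls=1000, rated_calls=20000)   # 5.00%
treatment_ratio = poor_call_ratio(poor_calls=886, rated_calls=20000)   # 4.43%

reduction = relative_reduction(baseline_ratio, treatment_ratio)
print(f"relative reduction: {reduction:.1%}")
```

In this made-up case the poor call ratio falls by 0.57 percentage points in absolute terms, which is an 11.4% drop relative to the baseline.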
Beyond the Lab: Robustness and Generalization
Further evaluations in controlled testbed environments demonstrated the algorithm’s robust performance across a wide variety of network conditions, including fluctuating bandwidth and different types of packet loss. It consistently outperformed other state-of-the-art offline reinforcement learning methods. To test its generality, the DIQL algorithm was also benchmarked on standard continuous control tasks from the D4RL suite, where it showed competitive performance outside the specific domain of bandwidth estimation.
This work represents a significant step forward in optimizing real-time video communication. By combining human-aligned QoE modeling with safe, data-driven offline reinforcement learning, Microsoft has deployed a system that improves user experience in a complex, latency-sensitive environment. For more in-depth technical details, see the full research paper.


