Video Quality of Experience and Rebuffers
In the early days of video streaming, viewers were willing to endure a frustrating playback experience to gain access to exclusive content. As the number of content providers sharing their content among multiple distributors has grown, Quality of Experience (QoE) has become vital to viewer retention.
Quality of Experience refers to the overall experience of a user watching a video stream. Unlike Quality of Service (QoS), QoE is more subjective, which makes it difficult to measure and difficult to guarantee at a given level. QoE is made up of many key performance indicators (KPIs) that video services track to gain a clear view of their platform’s performance. These quality metrics can be broken down into more specific areas of concern, such as rebuffering or excessive bitrate fluctuation.
Of the various metrics, rebuffering is the most noticeable and annoying fault for viewers. That little spinning wheel is the symbol for a bad viewer experience. Video industry research consistently shows that viewers quickly abandon a stream when they experience rebuffering. The blame for rebuffering and a degraded QoE can be difficult to pinpoint and could stem from any number of sources across the viewer’s Internet Service Provider (ISP), the content delivery network (CDN), the client’s browser/player app, or the original publisher’s video infrastructure.
While problems with the ISP or the publisher are largely out of our control, we are now able to capture actionable data that enables us to identify and resolve QoE issues stemming from the CDN. To do this, we’ve developed an algorithm we call “Estimate Rebuffer” to identify video QoE issues using web server logs. This real-time monitoring system uses granular data to identify a range of QoE issues and drill down to understand root causes and corresponding resolution actions. In this post, we’ll look at how this algorithm works to determine QoE problems and how we’re able to use it to improve QoE.
The Estimate Rebuffer algorithm overview
One way to track QoE is for the player to send QoE data to the CDN. This requires players and clients to adopt a software development kit (SDK), and given the broad diversity in playback devices, client-side QoE metrics are nearly impossible to capture consistently. The Estimate Rebuffer algorithm removes the need for player/client changes or SDK adoption. It is an estimate because it does not rely on information sent via beacons from the client side. However, given its breadth across the data center and delivery networks, it provides much sharper insight into the root cause of QoE issues than client-side data alone.
The Estimate Rebuffer tool identifies QoE issues using server-side client access logs from the video services on our platform. To evaluate QoE, it uses three pieces of information:
- A timestamp of when a client requested an asset/video-stream-chunk
- The filename of the asset/video-stream-chunk
- A session or client identifier
From this information, without the need for third-party tools, the Estimate Rebuffer algorithm can determine key elements that influence QoE, including the following:
- Rebuffering — The algorithm provides detail on the number of rebuffers a client has seen, the duration of the rebuffer events and the ratio of the time spent rebuffering to the time spent watching the video stream.
- Average bitrate — Video quality is a function of the video bitrate. A higher average bitrate means better video quality, clearer and crisper picture, richer colors, and a better experience.
- Rate of fluctuation — Viewers tend to respond negatively to fluctuations in bitrate, preferring a constant bitrate. This metric determines the number of times the video stream changes its quality.
- Quality distribution — This allows us to determine what fraction of the video was served at what quality to a given client. For example, 80% was served at high quality, 10% medium, 10% low.
How it works
How is the Estimate Rebuffer algorithm able to provide such a useful evaluation of QoE with just three pieces of information? Let’s take a look.
An adaptive bitrate (ABR) video stream is made up of many individual video chunks or assets. Each chunk has a fixed duration, typically 4 seconds. For instance, a 40-second ABR video stream has 10 video chunks (40/4 = 10 chunks).
Each chunk is named sequentially, for example, A1.ts, A2.ts, A3.ts, and so on through A10.ts. The first letter is the quality type. In our case, A is the lowest, B is higher than A, C is higher than B, and so on. With this knowledge, we look at the requests from each client and check them in sequence. If the quality changes between consecutive chunks, for example, A1.ts, B2.ts, A3.ts, we count each change toward the rate of fluctuation metric.
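The naming convention above can be turned into a fluctuation count and a quality distribution with a few lines of Python. This is only an illustrative sketch, not the production implementation: the regular expression and the function names (`parse_chunk`, `count_quality_switches`, `quality_distribution`) are assumptions made for the example, and it assumes filenames consist of exactly one uppercase quality letter, a sequence number, and a `.ts` extension.

```python
import re
from collections import Counter

# Assumed pattern for the convention described above:
# one quality letter, a sequence number, and a ".ts" extension.
CHUNK_NAME = re.compile(r"^([A-Z])(\d+)\.ts$")


def parse_chunk(filename):
    """Split a chunk filename into (quality_letter, sequence_number)."""
    match = CHUNK_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected chunk name: {filename}")
    return match.group(1), int(match.group(2))


def count_quality_switches(filenames):
    """Count how often consecutive requests differ in quality letter."""
    qualities = [parse_chunk(name)[0] for name in filenames]
    return sum(1 for prev, cur in zip(qualities, qualities[1:]) if prev != cur)


def quality_distribution(filenames):
    """Return the fraction of requested chunks served at each quality."""
    counts = Counter(parse_chunk(name)[0] for name in filenames)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {quality: n / total for quality, n in counts.items()}


# One client's requests, in the order they were logged.
requests = ["A1.ts", "B2.ts", "A3.ts"]
print(count_quality_switches(requests))  # 2 (A to B, then B to A)
```

Because every chunk has the same duration, the share of chunks per quality letter is also the share of playback time at that quality.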
Since we know when a client requested each chunk, and we know how long each chunk is (4 seconds), we can add up the duration of all the chunks requested. If the gap between two requests is longer than the video the player has already requested but not yet played, that is, the video in its buffer, we count it as a rebuffer. We also take into account how much of the buffered video a client would have watched by the time it makes a new chunk request to the CDN.
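A simplified model of this buffer accounting is sketched below. It is not the exact Estimate Rebuffer implementation; it assumes request timestamps are given in seconds, that each chunk becomes playable the moment it is requested (download time is ignored), and that the remaining buffer is played out at the end of the session. The function name `estimate_rebuffers` and the `tolerance` parameter, which absorbs small timing jitter, are hypothetical.

```python
CHUNK_SECONDS = 4.0  # chunk duration used in the text


def estimate_rebuffers(request_times, chunk_seconds=CHUNK_SECONDS, tolerance=0.5):
    """Estimate (rebuffer_count, rebuffer_seconds, rebuffer_ratio) for one session.

    request_times: chunk request timestamps in seconds for a single client.
    """
    times = sorted(request_times)
    if not times:
        return 0, 0.0, 0.0

    buffered = chunk_seconds  # first chunk is assumed to be playable at once
    count = 0
    stalled = 0.0
    watched = 0.0

    for prev, cur in zip(times, times[1:]):
        gap = cur - prev
        if gap > buffered + tolerance:
            # The buffer ran dry before the next request: count a rebuffer.
            count += 1
            stalled += gap - buffered
            watched += buffered
            buffered = 0.0
        else:
            # Part (or all) of the buffer was played during the gap.
            played = min(gap, buffered)
            watched += played
            buffered -= played
        buffered += chunk_seconds  # the newly requested chunk joins the buffer

    watched += buffered  # assume the remaining buffer is played out
    ratio = stalled / watched if watched else 0.0
    return count, stalled, ratio


# Five chunks (20 s of video); the last request arrives 12 s after the
# previous one, while only 8 s of video was buffered.
print(estimate_rebuffers([0, 2, 4, 8, 20]))  # (1, 4.0, 0.2)
```

In the example, the player holds 8 seconds of video when the 12-second gap begins, so the sketch reports one rebuffer lasting 4 seconds, which is 20% of the 20 seconds watched.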
This algorithm is not exclusive to Verizon Media and can be extended to other video services on the Verizon Media delivery network as long as they use a similar file naming convention.
With the QoE data in hand, we can focus on improving QoE in several ways, including debugging specific issues and identifying under-performing networks. Once we identify QoE issues, we can easily dig deeper to understand why they happened.
When we see poor QoE, we can look at per-data-center QoE metrics to identify which data center observed it. Once we identify the data center, we can drill down to identify which network caused it, isolate causes, and recommend fixes. For example, one fix could be to avoid a specific network during the next live video stream if that network has been shown to be prone to failures. Usually, when we have rebuffering issues, we manually move traffic before the event begins to make room for new traffic. Based on the data generated by the Estimate Rebuffer algorithm, our traffic management team can create a pre-game buffer, moving traffic before the game starts to preempt capacity issues.
And since the system can work in real time, we can potentially take proactive corrective action during a live video stream. This could entail, for example, moving traffic from a data center experiencing poor QoE to a healthier data center. Real-time error detection and resolution is a highly effective tool for reducing the number of clients experiencing rebuffering or other issues.
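The per-data-center comparison described above could look something like the following sketch. It assumes each session has already been reduced to a (data center, rebuffer ratio) pair; the data center labels, the threshold value, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def rebuffer_ratio_by_data_center(sessions):
    """Average the per-session rebuffer ratio for each data center.

    sessions: iterable of (data_center, rebuffer_ratio) pairs.
    """
    grouped = defaultdict(list)
    for data_center, ratio in sessions:
        grouped[data_center].append(ratio)
    return {dc: mean(ratios) for dc, ratios in grouped.items()}


def worst_data_centers(sessions, threshold):
    """Return data centers whose average ratio exceeds threshold, worst first."""
    averages = rebuffer_ratio_by_data_center(sessions)
    flagged = [(dc, avg) for dc, avg in averages.items() if avg > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical per-session results for two data centers.
sessions = [
    ("dc-east", 0.0),
    ("dc-east", 0.125),
    ("dc-west", 0.25),
    ("dc-west", 0.75),
]
print(worst_data_centers(sessions, threshold=0.2))  # [('dc-west', 0.5)]
```

A list like this only points to where to look; deciding whether to move traffic away from a flagged data center remains a judgment for the traffic management team.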
Although it may never be possible to eliminate the dreaded spinning wheel, server-side analysis tools like the Estimate Rebuffer algorithm go a long way toward making its appearance much less frequent.
Our prior work on this subject was published at the 2017 IEEE International Conference on Network Protocols (ICNP).