How do streaming giants deal with huge file sizes?

Case Study: How Netflix is improving video quality with neural networks

Netflix has long been at the forefront of innovation in online video streaming. In particular, it has had considerable success using machine learning and neural networks to raise the quality of its video output. One of the most recent developments in this area is the Netflix Deep Downscaler, a neural network-based algorithm that downscales high-resolution video to a lower resolution while preserving as much of the original quality as possible. In this case study, we'll take an in-depth look at the Deep Downscaler and examine how it helps Netflix deliver high-quality video to millions of people worldwide.

Before getting into the Deep Downscaler, it's worth remembering that using neural networks and machine learning for video compression and quality enhancement is not a new idea. Researchers have been studying the topic for a long time. However, advances in hardware and software have only recently made these techniques practical in real applications. Netflix, one of the industry leaders, has invested significantly in creating and improving its neural network-based video processing algorithms. The Deep Downscaler is one example of its continuing effort to give viewers the best possible experience.

What is video encoding?

Encoding is an important step in video streaming that makes it practical to transmit video data over the internet. In this process, the original video data is converted from one format to another, usually a compressed format that takes up less storage space and bandwidth. Video formats such as H.264, H.265/HEVC, and AV1 each have advantages and disadvantages in terms of quality, device compatibility, and compression efficiency. Compression is the crucial stage of encoding, and it is carried out by codecs. To reduce the amount of data in the video file, codecs employ techniques including predictive coding, transform coding, and entropy coding. Finding the right balance between quality and compression is difficult, though. Netflix is therefore experimenting with new approaches, such as neural networks, to enhance the quality of compressed video and improve the streaming experience for its customers.
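To make two of those techniques a little more concrete, here is a minimal sketch in Python. It is not how any real codec is implemented; it only illustrates why predictive coding helps entropy coding. The scanline values and the helper names (shannon_entropy, predict_previous, reconstruct) are made up for illustration, and transform coding is left out to keep the example short.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Average bits per symbol an ideal entropy coder would need."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def predict_previous(samples):
    """Predictive coding: keep the first sample, then store only the
    difference between each sample and the one before it."""
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)
    return residuals


def reconstruct(residuals):
    """Invert predict_previous by accumulating the differences."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples


# Hypothetical scanline: a smooth brightness ramp, like a soft gradient.
row = [100 + i for i in range(64)]
residuals = predict_previous(row)

raw_bits = shannon_entropy(row)              # all 64 values are distinct
residual_bits = shannon_entropy(residuals)   # almost every residual is 1

print(f"raw: {raw_bits:.2f} bits/sample, residuals: {residual_bits:.2f} bits/sample")
```

The ramp needs 6 bits per sample when coded directly, but well under 1 bit per sample once only the prediction errors are coded, and the original row can still be rebuilt exactly. Real codecs predict from neighbouring blocks and previous frames rather than from a single previous sample, but the idea is the same.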

AV1 video codec format

AV1 is a relatively new video codec format in the video streaming market. Unlike H.264 and H.265/HEVC, AV1 is royalty-free, which means it can be implemented without paying licensing fees and is not controlled by a single corporation. This has attracted interest and adoption from big tech companies like Google, Amazon, Netflix, Microsoft, and Apple. All five are members of the Alliance for Open Media (AOMedia), an organization dedicated to producing open, royalty-free video codecs and related technologies. AOMedia created AV1 to provide a next-generation video codec capable of delivering high-quality video at low bitrates, making it well suited to internet streaming.

AV1 offers notable advantages:

  1. AV1 uses advanced compression techniques, such as variable block sizes, dynamic motion vector prediction, and improved entropy coding, to achieve higher compression efficiency.

  2. AV1 also supports higher resolutions, greater color depth, and better HDR capabilities, making it a popular choice for streaming high-quality video content over the internet.

Despite its promising features, AV1 has faced challenges in terms of:

  1. High encoding complexity

  2. High computational requirements

  3. Compatibility issues with legacy devices

Deep Downscaler

Netflix has taken a creative approach to these challenges by developing a neural network-based solution to improve end-to-end video quality. The approach centers on a deep learning model that improves video stream quality by reducing the artifacts, noise, and other distortions that can arise during encoding and decoding.

Netflix's neural network-based technique for improving end-to-end video quality is known as the "deep downscaler," and it offers several major benefits. By using a learned approach to downscaling, video quality can be improved and tailored to Netflix content. One advantage of this technique is that it can be deployed as a drop-in replacement, without requiring any changes to the Netflix encoding process or to client devices. This means the approach immediately benefits the millions of devices that support Netflix streaming. Furthermore, the neural network-based video processing block can evolve independently, be used for purposes other than video downscaling, and be combined with various codecs. Overall, the deep downscaler is a promising way to improve the streaming experience for Netflix subscribers.

Deep Downscaler contains two blocks:

  1. Pre-processing block

  2. Resizing block

Pre-processing block: The pre-processing block is an essential component of Netflix's deep downscaler. It involves several steps, including color space conversion from RGB to YCbCr, a standard color space in video processing. This stage matters because it separates the luma (brightness) information from the chroma (color) information, allowing the network to handle them individually. The pre-processing block also includes a noise reduction step, which removes unwanted noise from the incoming video signal and improves overall image quality. In short, the pre-processing block prepares the incoming video signal for the resizing block that follows.
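The color space conversion step can be sketched in a few lines of Python. This is only an illustration: which conversion matrix Netflix uses is not stated here, so the sketch assumes the full-range BT.601 coefficients (the ones used by JPEG), and the function name rgb_to_ycbcr is made up for this example.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to YCbCr.

    Assumption: full-range BT.601 coefficients (as used by JPEG).
    Y carries brightness; Cb and Cr carry color, centered on 128.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b

    def clamp(value):
        # Round to the nearest integer and keep the result in 0..255.
        return max(0, min(255, round(value)))

    return clamp(y), clamp(cb), clamp(cr)


# Grey pixels have no color information, so both chroma channels sit at 128.
print(rgb_to_ycbcr(255, 255, 255))  # white
print(rgb_to_ycbcr(128, 128, 128))  # mid grey
print(rgb_to_ycbcr(255, 0, 0))      # pure red
```

Notice that for any grey pixel only the Y channel changes. That separation is what lets later stages treat brightness and color differently.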

Resizing block: The resizing block is the other critical component of Netflix's deep downscaler. It is made up of a convolutional layer that extracts features from the input video stream, followed by a deconvolutional layer that uses those features to generate the output at the target resolution. Because this is a downscaler, the target resolution is lower than the resolution of the input signal. The resizing block is trained as part of the deep neural network, learning to extract features from the input signal and generate an output signal with low loss.
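To give a feel for how a convolution can change resolution, here is a small pure-Python sketch of a strided 2D convolution that halves a tiny frame. It is not Netflix's architecture: the function name strided_conv2d, the 4x4 frame, and the fixed averaging weights are all assumptions for illustration. In a learned downscaler, the kernel weights would come out of training rather than being set by hand.

```python
def strided_conv2d(image, kernel, stride):
    """Valid 2D convolution (cross-correlation) with a stride, on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    output = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            acc = 0.0
            # Weighted sum of the input window that starts at (oy*stride, ox*stride).
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[oy * stride + ky][ox * stride + kx] * kernel[ky][kx]
            row.append(acc)
        output.append(row)
    return output


# Placeholder weights: a 2x2 averaging (box) filter, fixed by hand.
box_kernel = [[0.25, 0.25], [0.25, 0.25]]

# Hypothetical 4x4 single-channel frame (for example, the luma plane).
frame = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 30, 30],
    [90, 90, 30, 30],
]

half_res = strided_conv2d(frame, box_kernel, stride=2)
print(half_res)  # a 2x2 frame
```

With a stride of 2, each output pixel summarises a 2x2 window of the input, so the 4x4 frame becomes 2x2. Swapping the fixed averaging weights for trained ones is, loosely speaking, what makes a downscaler "learned".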

Improvements over conventional codecs

In both objective and subjective visual tests, the deep downscaler has been shown to improve quality across a wide range of standard video codecs and encoding configurations. For example, assuming a bicubic upscaler, the deep downscaler has shown an average VMAF Bjøntegaard-Delta (BD) rate improvement of 5.4% over conventional Lanczos downscaling for VP9 encoding. A 4.4% BD-rate gain has also been measured for VMAF-NEG. In an example result from one of Netflix's releases, the deep downscaler delivered higher VMAF at similar bitrates, or equivalent VMAF scores at lower bitrates.
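If BD-rate is new to you: it is the average bitrate difference between two rate-quality curves at equal quality, expressed as a percentage. The Python sketch below is a simplified approximation. The usual calculation fits a cubic curve through the measured points, whereas this version uses piecewise-linear interpolation to stay dependency-free. The function names and all the bitrate and VMAF numbers are invented for illustration and are not Netflix's data.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation of y at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x is outside the curve")


def bd_rate(anchor, test, steps=100):
    """Approximate BD-rate in percent.

    Each curve is a list of (bitrate_kbps, quality) points sorted by
    increasing quality. A negative result means the test curve needs
    less bitrate than the anchor for the same quality.
    """
    a_q = [q for _, q in anchor]
    a_log = [math.log10(r) for r, _ in anchor]
    t_q = [q for _, q in test]
    t_log = [math.log10(r) for r, _ in test]

    # Only compare over the quality range that both curves cover.
    lo = max(a_q[0], t_q[0])
    hi = min(a_q[-1], t_q[-1])

    total = 0.0
    for i in range(steps + 1):
        q = min(hi, lo + (hi - lo) * i / steps)
        total += interp(q, t_q, t_log) - interp(q, a_q, a_log)
    avg_log_diff = total / (steps + 1)
    return (10 ** avg_log_diff - 1) * 100


# Made-up curves: the test encoder reaches each quality level with 10% less bitrate.
anchor_curve = [(1000, 70), (2000, 80), (4000, 90)]
test_curve = [(900, 70), (1800, 80), (3600, 90)]

print(f"BD-rate: {bd_rate(anchor_curve, test_curve):.1f}%")
```

Read this way, a 5.4% BD-rate improvement means that, averaged over the tested quality range, the same VMAF score is reached with about 5.4% less bitrate.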

In addition to objective measurements and subjective tests, Netflix used A/B testing to assess the overall impact of the deep downscaler on streaming. The results showed that the downscaler improved Quality of Experience (QoE) for members without any negative impact on streaming. A/B testing also confirmed that there were no playback issues on any device, indicating that the deep downscaler can be deployed to all devices that stream Netflix without risk of playback problems or quality deterioration for members. This suggests that the deep downscaler is a dependable and efficient way to boost video quality for Netflix subscribers.

Conclusion

To summarise, traditional video compression technologies have been successful in reducing video file size, but they frequently do so at the cost of quality. Netflix's neural network-based deep downscaler offers a viable answer to this challenge. By using deep learning, it learns to preserve the most significant characteristics of a video when reducing its resolution, resulting in a better end-to-end viewing experience for Netflix users. Objective measurements, subjective visual tests, and A/B testing all indicate that the deep downscaler is effective. As the use of neural networks in video compression evolves, it will be interesting to see how this technology is applied in other areas of video streaming and distribution.

Thank you for reading this blog! I hope you found it informative and helpful. Until next time, happy learning! Cheers!

References:

For your eyes only: improving Netflix video quality with neural networks

Bringing AV1 Streaming to Netflix Members’ TVs

How Instagram Stores BILLIONS of Videos
