
A Possible AI Modern Video Codec


Introduction

Modern video codecs are all based on similar principles. Recently, these have been complemented by AI techniques, such as super-resolution, to form hybrid codecs. The state of the art is in transition, with the eventual goal of a codec based only on AI:

https://deeprender.ai/

Codecs using only AI are several years away. For the foreseeable future, we will use conventional codecs supplemented with AI.

This insights article will start by looking at conventional codecs and gradually build up a hybrid one.

H.264

The easiest way to understand modern codecs is to look at H.264, the most common codec in use today and the one on which most other codecs are based:

https://www.youtube.com/watch?v=ZXXDXZfEcAQ&t=17s

EVC Baseline

Conventional codecs have moved on. For our purposes, we will start with the EVC Baseline codec, which is simple but effective:

https://thebroadcastknowledge.com/2021/02/18/video-mpeg-5-essential-video-coding-evc-standard/

It can immediately be improved by AI:

https://www.mdpi.com/1424-8220/24/4/1336/pdf?version=1708338117

Scalenet

The first use of AI is a system proposed by Samsung called Scalenet:

https://research.samsung.com/blog/VIDEO-SCALENET-VSN-TOWARDS-THE-NEXT-GENERATION-VIDEO-STREAMING-SERVICE

Please watch the video linked at the end of that page and read the paper linked below it.

Note that EVC Baseline performs about the same as HEVC, but is royalty-free.

Invertible Image Rescaling

Scalenet uses convolutional neural networks (variations of TAD-TAU, the task-aware downscaling and upscaling networks):

https://openaccess.thecvf.com/content_ECCV_2018/papers/Heewon_Kim_Task-Aware_Image_Downscaling_ECCV_2018_paper.pdf
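As I understand the linked material, the idea is to downscale the video before it is encoded and to upscale it again after decoding. The sketch below only illustrates that round trip. It uses a 2x2 box average and nearest-neighbour upscaling as stand-ins for the learned networks, so the function names and the tiny test image are assumptions of mine, not anything from Scalenet or the TAD-TAU paper.

```python
def box_downscale_2x(image):
    """Average each 2x2 block (a crude stand-in for a learned downscaler)."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("image must have even width and height")
    return [
        [
            (image[y][x] + image[y][x + 1]
             + image[y + 1][x] + image[y + 1][x + 1]) / 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def nearest_upscale_2x(image):
    """Repeat every pixel 2x2 (a crude stand-in for a learned upscaler)."""
    upscaled = []
    for row in image:
        wide = [pixel for pixel in row for _ in range(2)]
        upscaled.append(wide)
        upscaled.append(list(wide))
    return upscaled


# A 4x4 image made of flat 2x2 blocks survives the round trip exactly.
image = [
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [90, 90, 30, 30],
    [90, 90, 30, 30],
]
small = box_downscale_2x(image)       # this is what the codec would encode
restored = nearest_upscale_2x(small)  # this is what the viewer would see
```

Real footage is not made of flat blocks, so these simple filters lose detail. Recovering more of that detail is the job of the trained downscaling and upscaling networks.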

But things move on, and a newer downscaling and restoration method called Invertible Image Rescaling is now the state of the art:

https://arxiv.org/abs/2210.04188

This is particularly useful for converting colour to greyscale while still encoding the colour information. To understand how colour is encoded, the concept of the Bayer Filter is needed.

The Bayer Filter

To be concrete, I will assume an 8K television system.

To understand how AI can help, we will start at the camera. It might be thought that each pixel contains a red, a green and a blue sensor element.

But that assumption is incorrect. What is used is called the Bayer Filter:

https://en.wikipedia.org/wiki/Bayer_filter

In each 2x2 block of sensor elements, one is red, two are green and one is blue. This means that an 8K camera produces four 4K streams. As we will see, these can be converted to a single 4K greyscale stream.
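As a minimal sketch of that split, assuming an RGGB layout (one of several Bayer arrangements) and a mosaic held as a plain list of rows, the four half-resolution planes can be separated as below. The function name and the tiny 4x4 mosaic are illustrative only. A real 8K sensor would give 7680x4320 samples and four 3840x2160 planes.

```python
def split_bayer_rggb(mosaic):
    """Split an RGGB Bayer mosaic into four half-resolution planes.

    Assumed layout of every 2x2 block (an assumption, not from the article):
        R  G1
        G2 B
    """
    if len(mosaic) % 2 or any(len(row) % 2 for row in mosaic):
        raise ValueError("mosaic must have even width and height")
    red = [row[0::2] for row in mosaic[0::2]]     # even rows, even columns
    green1 = [row[1::2] for row in mosaic[0::2]]  # even rows, odd columns
    green2 = [row[0::2] for row in mosaic[1::2]]  # odd rows, even columns
    blue = [row[1::2] for row in mosaic[1::2]]    # odd rows, odd columns
    return red, green1, green2, blue


# Toy 4x4 mosaic: values 1x are red, 2x and 3x are green, 4x are blue.
mosaic = [
    [10, 20, 11, 21],
    [30, 40, 31, 41],
    [12, 22, 13, 23],
    [32, 42, 33, 43],
]
red, green1, green2, blue = split_bayer_rggb(mosaic)
```

Each returned plane has half the width and half the height of the mosaic, which is why an 8K sensor yields four 4K planes.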

Converting Bayer Filter Output To Greyscale

Invertible image rescaling can convert the four streams into a single 4K greyscale stream with a close to unobservable degradation (about 40 dB PSNR). This stream could then be encoded as EVC video.
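To give a feel for what a figure of about 40 dB means, here is a small sketch of peak signal-to-noise ratio (PSNR), assuming that is the measure intended and assuming 8-bit samples with a peak value of 255. The helper name and the sample values are my own, for illustration only.

```python
import math


def psnr(original, restored, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    flat_original = [pixel for row in original for pixel in row]
    flat_restored = [pixel for row in restored for pixel in row]
    if len(flat_original) != len(flat_restored):
        raise ValueError("images must have the same number of pixels")
    mse = sum(
        (a - b) ** 2 for a, b in zip(flat_original, flat_restored)
    ) / len(flat_original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


original = [[100, 100], [100, 100]]
restored = [[100, 102], [100, 98]]
quality_db = psnr(original, restored)  # mean squared error is 2 here

# For 8-bit video, 40 dB corresponds to an RMS error of 255 / 100 = 2.55 levels.
rms_error_at_40_db = 255 / 10 ** (40 / 20)
```

An RMS error of about two and a half levels out of 255 is why degradation at that figure is hard to see.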

As detailed in the Invertible Image Rescaling paper, this can be combined with a convolutional neural network (e.g. Scalenet) for the best image reconstruction before inverting back to colour, creating a hybrid AI-based system.

Shot Based Encoding

Netflix has developed a major advance in encoding, called Shot Based Encoding, that makes efficient use of the available internet speed:

https://hackaday.com/2020/09/16/decoding-the-netflix-announcement-explaining-optimized-shot-based-encoding-for-4k/

Using that alone, Netflix has reduced 4K bitrates to as low as 2 Mbit/s. It is very effective. I rarely have problems with 4K content and internet speed on Netflix, whereas other video services such as Prime do have problems, particularly during internet dropouts. Netflix handles those using sophisticated buffering algorithms.
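The general idea of encoding shot by shot can be sketched as follows. This is not Netflix's actual algorithm. It is a toy illustration that assumes a shot boundary can be detected from a large jump in the mean absolute difference between consecutive frames, and that the motion within a shot is a usable proxy for how many bits the shot needs. The threshold, the bitrate ladder and all function names are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two frames given as flat pixel lists."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def split_into_shots(frames, cut_threshold=30.0):
    """Return (start, end) index pairs, end exclusive, one pair per shot."""
    if not frames:
        return []
    shots, start = [], 0
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > cut_threshold:
            shots.append((start, i))  # large jump: treat it as a cut
            start = i
    shots.append((start, len(frames)))
    return shots


def shot_complexity(frames, start, end):
    """Average frame-to-frame change inside one shot (a crude motion proxy)."""
    if end - start < 2:
        return 0.0
    diffs = [mean_abs_diff(frames[i - 1], frames[i]) for i in range(start + 1, end)]
    return sum(diffs) / len(diffs)


def choose_bitrate_kbps(complexity, ladder=((1.0, 2000), (5.0, 4000)), top=8000):
    """Pick the lowest rung whose (hypothetical) complexity limit is not exceeded."""
    for limit, kbps in ladder:
        if complexity <= limit:
            return kbps
    return top


# Five tiny 4-pixel frames: a calm dark shot, then a cut to a busier bright shot.
frames = [
    [10, 10, 10, 10],
    [10, 11, 10, 10],
    [11, 11, 10, 10],
    [200, 200, 200, 200],
    [190, 210, 200, 195],
]
shots = split_into_shots(frames)
bitrates = [choose_bitrate_kbps(shot_complexity(frames, s, e)) for s, e in shots]
```

The point is only that a calm shot can be given far fewer bits than a busy one, rather than using a single bitrate for the whole title.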

Conclusion

The above presents some state-of-the-art AI enhancements in streaming video.

Changes are coming thick and fast. Companies like Netflix will incorporate them into their streaming services. An overview of further possibilities can be found here:

https://arxiv.org/abs/2101.06341
