A Possible Modern AI Video Codec
Introduction
Modern video codecs are all based on similar principles. Recently, these have been complemented by AI techniques, such as super-resolution, to form hybrid codecs. The current state of the art is in transition, with the eventual goal being codecs based entirely on AI:
Codecs using only AI are several years away. For the foreseeable future, we will use conventional codecs supplemented with AI.
This insights article will start by looking at conventional codecs and gradually build up a hybrid one.
H.264
The easiest way to understand modern codecs is to look at the most common one in use today and the one most other codecs are based on, H.264:
https://www.youtube.com/watch?v=ZXXDXZfEcAQ&t=17s
EVC Baseline
Conventional codecs have moved on. For our purposes, we will start with the EVC Baseline profile – a simple but effective codec:
https://thebroadcastknowledge.com/2021/02/18/video-mpeg-5-essential-video-coding-evc-standard/
It can immediately be improved by AI:
https://www.mdpi.com/1424-8220/24/4/1336/pdf?version=1708338117
Scalenet
The first use of AI is a system proposed by Samsung called Scalenet:
Please view the video link at the end and read the paper below that link.
Note that EVC Baseline has performance about the same as HEVC, but is royalty free.
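To make the structure concrete, here is a minimal sketch of the downscale-before-encode, upscale-after-decode pipeline that a Scalenet-style system follows. Everything in it is a stand-in: the downscaler is simple average pooling, the "codec" is plain quantisation, and the upscaler is nearest-neighbour, whereas the real system uses an EVC encode/decode and a trained CNN upscaler.

```python
import numpy as np

def downscale_2x(frame: np.ndarray) -> np.ndarray:
    """Average-pool a (H, W) frame by 2x before encoding.
    In a Scalenet-style system this step may also be learned."""
    h, w = frame.shape
    return frame[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale_2x(frame: np.ndarray) -> np.ndarray:
    """Placeholder upscaler (nearest neighbour). A trained
    super-resolution CNN would replace this at the decoder."""
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

def fake_codec(frame: np.ndarray, step: float = 4.0) -> np.ndarray:
    """Stand-in for a conventional encode/decode cycle:
    uniform quantisation to mimic compression loss."""
    return np.round(frame / step) * step

# downscale -> "encode/decode" -> upscale back to the original size
original = np.random.rand(64, 64) * 255          # toy frame
reconstructed = upscale_2x(fake_codec(downscale_2x(original)))
print(original.shape, reconstructed.shape)
```

The point is structural: the conventional codec only has to carry a quarter of the pixels, and the decoder-side network is trained to restore the lost detail.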
Invertible Image Rescaling
Scalenet uses convolutional neural networks (variations of TAD-TAU). But things move on, and a newer downscaling and restoration method called Invertible Image Rescaling is now the state of the art:
https://arxiv.org/abs/2210.04188
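What separates invertible rescaling from an ordinary CNN is that the forward transform can be undone exactly. Below is a minimal sketch of one such building block, an additive coupling layer of the kind used in invertible networks; toy_transform is just a placeholder for the learned sub-network described in the paper.

```python
import numpy as np

def toy_transform(x: np.ndarray) -> np.ndarray:
    """Stand-in for the learned sub-network inside a coupling layer."""
    return np.tanh(x) * 0.5

def coupling_forward(x1: np.ndarray, x2: np.ndarray):
    """Additive coupling: x2 is shifted by a function of x1.
    Because x1 passes through unchanged, the step is exactly invertible."""
    return x1, x2 + toy_transform(x1)

def coupling_inverse(y1: np.ndarray, y2: np.ndarray):
    """Exact inverse of coupling_forward."""
    return y1, y2 - toy_transform(y1)

x1, x2 = np.random.rand(8, 8), np.random.rand(8, 8)
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
print(np.allclose(x1, r1), np.allclose(x2, r2))   # True True
```

Roughly speaking, in an invertible rescaling network one branch carries the low-resolution (or greyscale) image and the other carries the information needed to recover the original.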
Invertible image rescaling is particularly useful for converting colour to greyscale and encoding the colour information. To understand how colour is encoded, the concept of the Bayer Filter is needed.
The Bayer Filter
For concreteness, I will assume an 8K television system.
To understand how AI can help, we will start at the camera. It might be assumed that each pixel of the sensor captures separate red, green and blue values. But that assumption is incorrect. What is actually used is called the Bayer Filter:
https://en.wikipedia.org/wiki/Bayer_filter
This means that an 8K camera effectively produces four 4K streams (one red, two green and one blue). As we will see, this can be exploited to convert the output into a single 4K greyscale stream.
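As a minimal illustration (assuming the common RGGB layout), the four streams are simply the even/odd row and column sub-samples of the raw mosaic:

```python
import numpy as np

def split_bayer_rggb(mosaic: np.ndarray):
    """Split a raw Bayer mosaic (assumed RGGB layout) into its four
    half-resolution colour planes. An 8K mosaic yields four 4K planes."""
    r  = mosaic[0::2, 0::2]   # red sites
    g1 = mosaic[0::2, 1::2]   # green sites on red rows
    g2 = mosaic[1::2, 0::2]   # green sites on blue rows
    b  = mosaic[1::2, 1::2]   # blue sites
    return r, g1, g2, b

# toy mosaic (scaled down so the example runs instantly)
mosaic = np.random.randint(0, 1024, size=(16, 16), dtype=np.uint16)
planes = split_bayer_rggb(mosaic)
print([p.shape for p in planes])   # four planes at half resolution each
```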
Converting Bayer Filter Output To Greyscale
Invertible image rescaling can convert the four different streams into a single 4K greyscale stream with close to unnoticeable degradation (a PSNR of about 40 dB). This stream can then be encoded as an EVC video.
As detailed in the Invertible Image Rescaling paper, this can be combined with a convolutional neural network (e.g. Scalenet) for the best image reconstruction before inverting back to colour, creating a hybrid AI-based system.
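To make the "about 40 dB" figure concrete, the sketch below pairs a naive fixed-weight combination of the four Bayer planes (a crude stand-in for the learned invertible mapping) with the standard PSNR measure used to report such numbers; the weights and the noise level are illustrative only.

```python
import numpy as np

def bayer_to_grey(r, g1, g2, b):
    """Naive stand-in for the learned mapping: combine the four Bayer
    planes into one greyscale plane with fixed luma-style weights.
    The real system uses an invertible network so colour can be recovered."""
    return 0.25 * r + 0.35 * g1 + 0.35 * g2 + 0.05 * b

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# toy example: measure how far a reconstruction drifts from the original
original = np.random.rand(32, 32) * 255
degraded = original + np.random.normal(0, 2.5, original.shape)   # mild degradation
print(f"PSNR: {psnr(original, degraded):.1f} dB")                # roughly 40 dB
```

At around 40 dB PSNR, differences are generally imperceptible at normal viewing distances, which is what "close to non-observable" means here.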
Shot Based Encoding
Netflix has developed a major advance in encoding, called Shot Based Encoding, that makes efficient use of the available internet bandwidth:
Using that alone, Netflix has reduced 4K streams to as low as 2 Mb/s. It is very effective: I rarely have problems with 4K content and internet speed when using Netflix, whereas other video services such as Prime do have problems, even internet dropouts. Netflix also handles those situations with sophisticated buffering algorithms.
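A rough sketch of the idea: detect shot boundaries, then choose encoding parameters per shot rather than per title. The frame-difference threshold and the bitrate ladder below are made up for illustration and are not Netflix's actual per-shot optimisation.

```python
import numpy as np

def shot_boundaries(frames: np.ndarray, threshold: float = 30.0):
    """Return indices where a new shot starts, using mean absolute
    frame difference as a crude scene-cut detector (threshold is arbitrary)."""
    diffs = np.mean(np.abs(np.diff(frames.astype(np.float64), axis=0)), axis=(1, 2))
    return [0] + [i + 1 for i, d in enumerate(diffs) if d > threshold]

def bitrate_for_shot(shot: np.ndarray) -> int:
    """Made-up per-shot bitrate ladder: busier shots (more frame-to-frame
    change) get more bits, static shots get fewer."""
    activity = np.mean(np.abs(np.diff(shot.astype(np.float64), axis=0))) if len(shot) > 1 else 0.0
    if activity < 5:
        return 2_000_000     # ~2 Mb/s for near-static content
    if activity < 15:
        return 6_000_000
    return 12_000_000

# toy clip: 60 tiny frames with a hard cut at frame 30
frames = np.concatenate([np.full((30, 8, 8), 50.0), np.full((30, 8, 8), 200.0)])
starts = shot_boundaries(frames)
for s, e in zip(starts, starts[1:] + [len(frames)]):
    print(f"shot {s}-{e - 1}: {bitrate_for_shot(frames[s:e]) // 1_000_000} Mb/s")
```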
Conclusion
The above presents some state-of-the-art AI enhancements in streaming video.
Changes are coming thick and fast. Companies like Netflix will incorporate them into their streaming services. An overview of further possibilities can be found here: