Artificial Intelligence in Video

  • Thread starter bhobba
  • Tags
    Neural
In summary, Artificial Intelligence (AI) in video refers to the use of advanced algorithms and computer systems to analyze, understand, and manipulate video data. This technology allows for the automation of tasks such as video editing, object recognition, and scene detection, making it easier and more efficient for video creators to produce high-quality content. AI can also be used to personalize video recommendations and improve video search and discovery. With the continued development of AI, it is expected to have a significant impact on the video industry, enhancing the viewing experience for audiences and streamlining production processes for creators.
  • #1
Behind the scenes, artificial intelligence usually makes use of what is known as a Neural Network.
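To make the idea concrete, here is a minimal sketch of a forward pass through a tiny neural network with two inputs, one hidden layer, and one output. The weights in `PARAMS` are hypothetical values picked by hand for illustration; in a real network they would be learned from data.

```python
import math

def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def sigmoid(x):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i weights[j][i] * inputs[i] + biases[j]."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, params):
    """Input -> hidden layer (ReLU) -> single output (sigmoid)."""
    hidden = relu(dense(x, params["W1"], params["b1"]))
    out = dense(hidden, params["W2"], params["b2"])
    return sigmoid(out[0])

# Hypothetical hand-picked weights, for illustration only.
PARAMS = {
    "W1": [[1.0, -1.0], [-1.0, 1.0]],
    "b1": [0.0, 0.0],
    "W2": [[1.0, 1.0]],
    "b2": [-0.5],
}

if __name__ == "__main__":
    print(forward([1.0, 0.0], PARAMS))
```

Training consists of adjusting the numbers in `PARAMS` so that the outputs match known examples; the forward pass itself stays this simple.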



In image applications, a variant called a Convolutional Neural Network (CNN) is often used.
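The core operation of a CNN layer is sliding a small kernel over the image and taking a weighted sum at each position. The sketch below shows that operation on its own, without padding or stride. `EDGE_KERNEL` and `IMAGE` are made-up values for illustration; in a trained CNN the kernel values are learned rather than fixed.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

# Hypothetical fixed kernel that responds to dark-to-bright vertical edges.
EDGE_KERNEL = [[-1, 0, 1],
               [-1, 0, 1],
               [-1, 0, 1]]

# Hypothetical 3x5 image: dark on the left, bright on the right.
IMAGE = [[0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1],
         [0, 0, 0, 1, 1]]

if __name__ == "__main__":
    print(conv2d(IMAGE, EDGE_KERNEL))  # [[0, 3, 3]]
```

The output is zero over the flat region and non-zero wherever the 3x3 window spans the edge, which is the kind of local feature that the stacked layers of a CNN build upon.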



In particular, for image super-resolution a Generative Adversarial Network or GAN is often used.
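For reference, the original GAN formulation trains a generator G and a discriminator D against each other with the following minimax objective, where x is a real sample and z is the generator's input:

$$
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
$$

This is only the generic form. In super-resolution, the generator's input is typically the low-resolution image rather than random noise, and the adversarial term is usually combined with other losses; the exact combination varies from paper to paper.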



These form the basis of modern super-resolution.



For those who are interested in the details, see:
https://arxiv.org/abs/2204.13620

But things move on. Someone thought of using a CNN to down-scale the image first, then using super-resolution to recover the original image. One example is TAD-TAU:
https://openaccess.thecvf.com/conte...k-Aware_Image_Downscaling_ECCV_2018_paper.pdf

This is an example of an important AI concept, the Autoencoder.
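The structure of an autoencoder applied to images can be sketched without any learning at all: an encoder shrinks the image, a decoder expands it again, and the reconstruction error measures how much was lost. In the sketch below, block averaging and pixel replication are simple stand-ins that I am assuming purely for illustration; in the approach described above both halves would be neural networks trained together to minimise this error. The function names and the sample image are hypothetical.

```python
def encode(image, factor=2):
    """Downscale by averaging each factor x factor block.
    Stand-in for a learned down-scaling network; assumes dimensions divide evenly."""
    h, w = len(image), len(image[0])
    return [[sum(image[r + i][c + j]
                 for i in range(factor) for j in range(factor)) / factor ** 2
             for c in range(0, w, factor)]
            for r in range(0, h, factor)]

def decode(code, factor=2):
    """Upscale by pixel replication. Stand-in for a learned super-resolution network."""
    return [[code[r // factor][c // factor]
             for c in range(len(code[0]) * factor)]
            for r in range(len(code) * factor)]

def mse(a, b):
    """Mean squared error between two images of the same size."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n

# Hypothetical 4x4 grey-scale image.
IMAGE = [[10, 10, 50, 60],
         [10, 10, 70, 80],
         [0, 4, 100, 100],
         [8, 4, 100, 100]]

if __name__ == "__main__":
    code = encode(IMAGE)            # [[10.0, 65.0], [4.0, 100.0]]
    restored = decode(code)
    print(code, mse(IMAGE, restored))  # reconstruction error 33.25
```

The flat blocks are recovered exactly while the detailed blocks are not. A trained encoder and decoder pair aims to push that error down by choosing what to keep in the small image.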



Again, things move on. The approach has since been improved: it is simpler, performs better, allows down-scaling and super-resolution by arbitrary amounts, and can encode the colour in a resultant black and white image:
https://arxiv.org/pdf/2201.12576

So far, super-resolution has been done using a single lower-resolution image, but it can also be done using a sequence of images from a video:
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4088133

A measure is needed to quantify how close a super-resolution image is to the original as perceived by a human being, and SSIM was invented for this purpose. However, further work has been done on this, and a newer measure called VMAF, developed and used a lot by Netflix, has largely replaced it:
https://visionular.ai/vmaf-ssim-psnr-quality-metrics/
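As a rough illustration of how such metrics work, here is a sketch of PSNR and a simplified SSIM. Note the assumptions: this SSIM is computed once over the whole image, whereas the usual definition averages it over small local windows, and the constants 0.01 and 0.03 are the commonly used defaults. VMAF is not shown because it combines several features with a trained model and cannot be reduced to a short formula. The function names are my own.

```python
import math

def _flatten(image):
    return [p for row in image for p in row]

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    x, y = _flatten(ref), _flatten(test)
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)

def global_ssim(ref, test, peak=255.0):
    """SSIM over a single window covering the whole image (a simplification)."""
    x, y = _flatten(ref), _flatten(test)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2  # stabilising constants
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

if __name__ == "__main__":
    ref = [[10, 20], [30, 40]]
    test = [[12, 18], [33, 37]]
    print(psnr(ref, test), global_ssim(ref, test))
```

PSNR only looks at pixel-wise error, while SSIM also compares means, variances, and correlation between the two images, which is why it tracks human perception somewhat better.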

Image super-resolution is one of many proposals for reducing the bit rate of high-resolution images. iSIZE (recently acquired by Sony) preprocesses an image to make it more efficient to encode while still achieving a high VMAF score:
https://discovery.ucl.ac.uk/10152967/1/SMPTE_v9_RPS.pdf

It produces substantial reductions in the bit rate of 8K video:
https://8kassociation.com/industry-info/8k-news/pre-encoding-8k-with-isize-bitsave/

A lot of ideas and concepts have been introduced in this post. If the reader has not seen them before, like anything new, it may take a while to get up to speed. However, they form the basis of my proposed method of an all-AI video codec.

My next post will be an overview of current video codecs, including EVC baseline, which forms the basis of the AI codec.

Thanks
Bill
 
  • #2
very interesting
 