Quality Metrics Up and Down the Encoding Ladder.

When encoding an adaptive bitrate ladder, you often have to compare videos with different resolutions, which raises several questions. For example, when using peak signal-to-noise ratio (PSNR) or video multimethod assessment fusion (VMAF) to compare a 640x360 video against an 854x480 video, at what resolution do you compare them? How do you interpret the PSNR or VMAF scores, and which metric is best? In this column, I'll tackle all of these issues.

Regarding the first issue, there's a theoretically correct answer, and then there's how it's generally done, and they don't always correspond. The theoretically correct answer is to compare at the resolution at which the video will be viewed. For example, if you knew for certain that the video was going to be watched in a 480p window, you should scale the source and output files to 480p as needed and run your comparisons there. However, few publishers have that degree of certainty, so most scale the encoded files up to the resolution of the source video and compare there. This certainly makes sense for over-the-top (OTT) providers whose videos are almost always watched at full screen, and is a nice compromise position for other publishers.

Some programs handle this scaling behind the scenes; for most others, you have to scale in FFmpeg, which is a royal pain from a time and disk-space perspective. My one tip here is to convert your encoded files to the Y4M container format, rather than YUV, because the Y4M header contains the resolution, frame rate, and format information that your quality control tool needs, which simplifies comparisons. If you use the YUV container format, you'll have to supply the resolution, frame rate, and format data on your command line or enter it into the program itself, which can be time-consuming.
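As a rough sketch of that scaling step, the Python below builds an FFmpeg command that scales an encoded file to the source resolution and writes it as Y4M, and estimates how much disk space the uncompressed result will need. The function names, file names, bicubic scaler, and 8-bit 4:2:0 pixel format are assumptions made for illustration; use whatever scaling method and format match your own workflow.

```python
def upscale_to_y4m_cmd(src, dst, width, height):
    """Build an FFmpeg argument list that scales `src` and writes Y4M.

    Assumes bicubic scaling and 8-bit 4:2:0 output (illustrative choices).
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-vf", f"scale={width}:{height}:flags=bicubic",
        "-pix_fmt", "yuv420p",
        "-f", "yuv4mpegpipe",  # FFmpeg's muxer name for Y4M
        dst,
    ]


def raw_420_bytes(width, height, fps, seconds):
    """Approximate size of uncompressed 8-bit 4:2:0 video, ignoring headers."""
    # One luma byte per pixel plus two quarter-size chroma planes.
    frame_bytes = width * height * 3 // 2
    return frame_bytes * round(fps * seconds)


if __name__ == "__main__":
    # Hypothetical file names for a 360p rung compared at 1080p.
    print(" ".join(upscale_to_y4m_cmd("rung_360p.mp4", "rung_360p_1080.y4m", 1920, 1080)))
    print(raw_420_bytes(1920, 1080, 30, 60))
```

By this estimate, one minute of 1080p video at 30 frames per second comes to about 5.6 GB once uncompressed, and you need one such file for every rung you test.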

The second question is how to interpret the scores once you have them. If you're comparing cross-resolution files to the source, understand that scores will drop at lower resolutions because the smaller files lose more detail and pick up scaling artifacts when they are scaled back up for comparison. This means files encoded at the source resolution will have the highest scores, with each lower rung scoring progressively lower.

For example, in another article I wrote for this issue on per-title encoding, I compared technologies using an encoding ladder that started at 1080p and dropped to 180p. The typical PSNR scores were 45-50 dB for the 1080p rung, and dropped to around 30 dB for the lowest rung. That's not a lot of range. The rule of thumb for PSNR is that quality improvements above 45 dB are typically not perceivable by the viewer, while scores below 35 dB typically presage visible artifacts. But that applies only to the 1080p rung; the 180p rung will never get close to 45 dB, although the files might look good at 32 dB. So you can't predict how a human would perceive a 360p file with a PSNR score of 38 dB, although when you're comparing cross-resolution results, higher is always better.
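For readers who want to see where those decibel figures come from, here is a minimal sketch of the standard PSNR formula, 10 x log10(MAX^2 / MSE), applied to two equal-length lists of 8-bit sample values. This is not the tool used for the scores above; the function name and the flat-list input are assumptions for illustration, and real tools differ in details such as whether they measure luma only or average per frame.

```python
import math


def psnr(reference, distorted, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error across all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no noise to measure
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # Every sample off by one level gives an MSE of 1.
    print(round(psnr([100, 100], [101, 99]), 2))
```

Because the scale is logarithmic, every halving of the mean squared error adds only about 3 dB.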

What's great about VMAF is that it was designed for this type of cross-resolution analysis. Specifically, a score of 100 is mapped to a 1080p file encoded at a constant rate factor (CRF) of 22, while a score of 20 is mapped to a file encoded at 240p at a CRF value of 28. In the same per-title analysis, typical 1080p scores were in the mid- to upper 90s, while the 180p files often scored in the single digits.
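If you want to try VMAF on a lower rung without creating intermediate files, one option is FFmpeg's libvmaf filter, which takes the distorted file as its first input and the reference as its second. The sketch below builds such a command and scales the encoded file to the reference resolution inside the filter graph. It assumes an FFmpeg build with libvmaf enabled, files that share the same frame rate and frame count, and bicubic scaling; the function and file names are hypothetical.

```python
def vmaf_cmd(distorted, reference, ref_width, ref_height):
    """Build an FFmpeg argument list that scores `distorted` against `reference`.

    The distorted stream is scaled to the reference resolution first, then
    passed to libvmaf as the first (main) input, with the reference second.
    """
    graph = (
        f"[0:v]scale={ref_width}:{ref_height}:flags=bicubic[dist];"
        "[dist][1:v]libvmaf"
    )
    return [
        "ffmpeg",
        "-i", distorted,
        "-i", reference,
        "-lavfi", graph,
        "-f", "null", "-",  # discard the output; the score is printed in the log
    ]


if __name__ == "__main__":
    print(" ".join(vmaf_cmd("rung_360p.mp4", "source_1080p.mp4", 1920, 1080)))
```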

This range made VMAF scores much easier to interpret than PSNR, but you still can't predict how a viewer will perceive the quality of a clip in the middle, say a 480p clip with a VMAF score of 42. However, you do know that six VMAF points equal one just-noticeable difference (JND). Technically, this means that 75% of viewers would notice a six-point swing, while closer to 90% would notice a 12-point, two-JND swing.
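To make the arithmetic concrete, the short sketch below converts a VMAF gap into JNDs using the six-points-per-JND figure above, then flags adjacent ladder rungs that sit less than one JND apart. The helper names and the sample scores are made up for illustration.

```python
VMAF_POINTS_PER_JND = 6.0  # rule of thumb cited in the column


def jnd_gap(vmaf_a, vmaf_b):
    """Express the difference between two VMAF scores in JNDs."""
    return abs(vmaf_a - vmaf_b) / VMAF_POINTS_PER_JND


def close_rungs(ladder, threshold_jnd=1.0):
    """Return adjacent rung pairs whose VMAF gap is under the threshold.

    `ladder` is a list of (name, vmaf_score) tuples ordered top to bottom.
    """
    pairs = []
    for (name_a, score_a), (name_b, score_b) in zip(ladder, ladder[1:]):
        if jnd_gap(score_a, score_b) < threshold_jnd:
            pairs.append((name_a, name_b))
    return pairs


if __name__ == "__main__":
    # Hypothetical scores, not measurements.
    ladder = [("1080p", 95.0), ("720p", 91.0), ("480p", 78.0)]
    print(close_rungs(ladder))
```

With these made-up scores, only the 1080p and 720p rungs fall within one JND of each other, which suggests most viewers would not see the difference between them.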

The ability to identify a JND is exceptionally useful for a range of encoding decisions, from configuring your encoding ladder to choosing an encoder or a codec. If you haven't already started working with VMAF, it's time to try it.

By Jan Ozer

Jan Ozer (jan@streaminglearningcenter.com) is a streaming media producer and consultant, a frequent contributor to industry magazines and websites on streaming-related topics, and the author of Video Encoding by the Numbers. He blogs frequently at streaminglearningcenter.com.

Comments? Email us at letters@streamingmedia.com, or check the masthead for other ways to contact us.
COPYRIGHT 2017 Information Today, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2017 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:The Producer's View
Author:Ozer, Jan
Publication:Streaming Media
Date:Oct 1, 2017
Words:741