Video formats comparison

Introduction

This study compares four video encoders: AOM AV1 (libaom), Google VP9 (libvpx), x264 and x265. We use five metrics to compare them: Y-SSIM, RGB-SSIM, Y-MSSSIM, PSNR-HVS-M and VMAF.

Materials

Video set

The video set consists of 30 videos from objective-1-fast by Xiph. All videos are YCbCr 4:2:0 Y4M files.

Encoders

Metrics

Tools

Methods

Video conversion

Each Y4M video is converted to 4:2:0 10-bit Y4M with FFmpeg:

ffmpeg -y -i [input] -pix_fmt yuv420p10le -strict -1 [output]

Video compression

All videos are compressed with each codec over a range of quality settings, denoted $q in the commands below:

aomenc --cpu-used=4 --tile-columns=4 --passes=2 --pass=1 --bit-depth=10 --input-bit-depth=10 --end-usage=q --cq-level=$q --fpf=[output].log -o [output] [input(Y4M_10bits)]

aomenc --cpu-used=4 --tile-columns=4 --passes=2 --pass=2 --bit-depth=10 --input-bit-depth=10 --end-usage=q --cq-level=$q --fpf=[output].log -o [output] [input(Y4M_10bits)]

vpxenc --tile-columns=4 --row-mt=1 --passes=2 --cpu-used=2 --bit-depth=10 --input-bit-depth=10 --profile=2 --end-usage=q --cq-level=$q -o [output] [input(Y4M_10bits)]

x264 --profile high10 --preset slower --input-depth=10 --output-depth=10 --crf $q -o [output] [input(Y4M_10bits)]

x265 --profile main10 --preset slower --input-depth=10 --output-depth=10 --crf $q -o [output] [input(Y4M_10bits)]

The Python scripts used to generate the compressed videos are available in the Git repository.
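As an illustration of how the command lines above could be generated for each codec and quality value, here is a minimal sketch. It is not the script from the repository: the function name `encoder_commands` and the file names in the usage note are hypothetical, and only the flags listed above are used.

```python
def encoder_commands(codec, q, src, dst):
    """Return the argument lists to run for one encode at quality q."""
    if codec == "libaom":
        # aomenc is invoked once per pass; both passes share the same log file.
        return [
            ["aomenc", "--cpu-used=4", "--tile-columns=4", "--passes=2",
             f"--pass={p}", "--bit-depth=10", "--input-bit-depth=10",
             "--end-usage=q", f"--cq-level={q}", f"--fpf={dst}.log",
             "-o", dst, src]
            for p in (1, 2)
        ]
    if codec == "libvpx":
        # vpxenc runs both passes in a single invocation here.
        return [["vpxenc", "--tile-columns=4", "--row-mt=1", "--passes=2",
                 "--cpu-used=2", "--bit-depth=10", "--input-bit-depth=10",
                 "--profile=2", "--end-usage=q", f"--cq-level={q}",
                 "-o", dst, src]]
    if codec in ("x264", "x265"):
        # The two command lines differ only in binary name and profile.
        profile = "high10" if codec == "x264" else "main10"
        return [[codec, "--profile", profile, "--preset", "slower",
                 "--input-depth=10", "--output-depth=10",
                 "--crf", str(q), "-o", dst, src]]
    raise ValueError(f"unknown codec: {codec}")
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`, for example `encoder_commands("x264", 20, "in_10bit.y4m", "out.264")`.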

Encoding and decoding speeds:

For each codec and video, the encoding and decoding durations are measured using Python's timeit module, then converted to frames per minute.
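The timing step can be sketched as follows. This is only an assumption about how timeit might be used: the helper names `time_call` and `frames_per_minute` are hypothetical, and whether the study times a single run or the best of several is not stated, so the sketch defaults to a single run.

```python
import timeit

def time_call(func, repeat=1):
    """Wall-clock duration of func() in seconds (best of `repeat` runs)."""
    # number=1: each measurement executes func exactly once.
    return min(timeit.repeat(func, number=1, repeat=repeat))

def frames_per_minute(n_frames, seconds):
    """Convert a duration for n_frames frames into frames per minute."""
    return n_frames * 60.0 / seconds
```

In practice `func` would wrap the encoder or decoder invocation, for instance a `subprocess.run` call, and `n_frames` would be the frame count of the source video.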

Metrics

For each codec and video, we apply the following metrics over video samples of increasing quality: Y-SSIM, RGB-SSIM, Y-MSSSIM, PSNR-HVS-M and VMAF. For VMAF, we use the trained model vmaf_v0.6.1.pkl provided by Netflix.

For each sample, we first decode the compressed video to 4:2:0 10-bit Y4M, then export it to YUV format using FFmpeg (ffmpeg -y -i [input] -pix_fmt yuv420p10le -strict -1 [output]). Finally, we apply the metrics to each sample, comparing it to the original video.

For each codec, we calculate the arithmetic mean of each metric over the entire set of videos, weighted by the number of pixels of the corresponding video, for the samples of increasing quality:
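The pixel-weighted mean described above can be written as a short function. This is a sketch under assumptions: the name `pixel_weighted_mean` is hypothetical, and the exact definition of "number of pixels" (per frame, or width x height x frame count) is not given in the text, so the caller supplies whichever count applies.

```python
def pixel_weighted_mean(scores, pixel_counts):
    """Mean of per-video scores, each weighted by that video's pixel count."""
    if len(scores) != len(pixel_counts):
        raise ValueError("scores and pixel_counts must have the same length")
    total_pixels = sum(pixel_counts)
    # Sum of score * weight, divided by the sum of the weights.
    return sum(s * p for s, p in zip(scores, pixel_counts)) / total_pixels
```

With this weighting, a video with three times as many pixels as another contributes three times as much to the mean.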

We also determine the average bits per pixel for each quality sample:
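Bits per pixel for a single encoded file can be computed as below. This is a minimal sketch: the function name is hypothetical, and it assumes the pixel count is width x height x number of frames (luma samples only), which the text does not spell out.

```python
def bits_per_pixel(size_bytes, width, height, n_frames):
    """Compressed size in bits divided by the total number of pixels."""
    return size_bytes * 8 / (width * height * n_frames)
```

The file size would typically come from `os.path.getsize(path)`; how the per-video values are then averaged for each quality sample is not specified here.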

Results

Raw data

The following archives contain the raw data in CSV format for objective-1-fast:

Compression speed at 1080p:

Encoding speed as a function of bits per pixel

Metrics

For each metric, we plot the quality in dB as a function of the mean bits per pixel on a logarithmic scale. This shows which codec gives the best quality at a given number of bits per pixel (closer to the top left is better).
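The text does not say how scores are expressed in dB. A common convention for SSIM-type scores in the range [0, 1) is -10 * log10(1 - score), and the sketch below assumes that convention. It is an assumption, not a description of the study's tooling; PSNR-HVS-M is already a dB value, and VMAF is a 0 to 100 score that this conversion does not apply to.

```python
import math

def ssim_to_db(score):
    """Convert an SSIM-type score in [0, 1) to dB as -10*log10(1 - score)."""
    if not 0.0 <= score < 1.0:
        raise ValueError("score must be in [0, 1)")
    return -10.0 * math.log10(1.0 - score)
```

Under this convention a score of 0.99 corresponds to roughly 20 dB, and each additional 10 dB means the distance to a perfect score shrinks by a factor of ten.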

Bits per pixel at equivalent quality according to VMAF

[Figures for 360p, 720p and 1080p]

Bits per pixel at equivalent quality according to Y-PSNR-HVS-M

[Figures for 360p, 720p and 1080p]

Bits per pixel at equivalent quality according to Y-MSSSIM

[Figures for 360p, 720p and 1080p]

Bits per pixel at equivalent quality according to Y-SSIM

[Figures for 360p, 720p and 1080p]

Bits per pixel at equivalent quality according to RGB-SSIM

[Figures for 360p, 720p and 1080p]

CRF equivalences and bitrate reduction at 1080p

In the following table, we try to find equivalences between the CRF values of the various encoders according to a selected metric. We then calculate the expected bitrate reduction for this comparable CRF value.

For example, an x264 encode at CRF 20 gives a PSNR-HVS-M of 42.80. To obtain the same quality with VP9, look at the intersection of the CRF 20 row and the "libvpx crf according to psnr-hvs-m" column, which gives an equivalent CRF of 30.60. The column "libvpx % reduction according to psnr-hvs-m" then gives the expected bitrate reduction, -21.294%.

x264 crf | x264 bpp | x264 y-ssim | x264 rgb-ssim | x264 ms-ssim | x264 psnr-hvs-m | x264 vmaf
16 | 0.34558 | 19.90 | 18.92 | 25.30 | 46.84 | 98.52
17 | 0.28786 | 19.31 | 18.41 | 24.60 | 45.79 | 98.24
18 | 0.23879 | 18.76 | 17.92 | 23.92 | 44.77 | 97.90
19 | 0.19742 | 18.22 | 17.46 | 23.27 | 43.77 | 97.47
20 | 0.16285 | 17.71 | 17.01 | 22.64 | 42.80 | 96.93
21 | 0.13424 | 17.21 | 16.58 | 22.02 | 41.85 | 96.26
22 | 0.11082 | 16.72 | 16.16 | 21.42 | 40.92 | 95.44
23 | 0.09185 | 16.25 | 15.76 | 20.82 | 40.00 | 94.45
24 | 0.07667 | 15.79 | 15.36 | 20.23 | 39.09 | 93.28

x264 crf | x264 bpp | libvpx crf according to y-ssim | libvpx bpp according to y-ssim | libvpx % reduction according to y-ssim | libvpx crf according to rgb-ssim | libvpx bpp according to rgb-ssim | libvpx % reduction according to rgb-ssim | libvpx crf according to ms-ssim | libvpx bpp according to ms-ssim | libvpx % reduction according to ms-ssim | libvpx crf according to psnr-hvs-m | libvpx bpp according to psnr-hvs-m | libvpx % reduction according to psnr-hvs-m | libvpx crf according to vmaf | libvpx bpp according to vmaf | libvpx % reduction according to vmaf
16 | 0.34558 | 17.24 | 0.35864 | 3.781 | 19.86 | 0.28608 | -17.22 | 17.72 | 0.34382 | -0.5091 | 19.74 | 0.28910 | -16.342 | 21.09 | 0.25810 | -25.313
17 | 0.28786 | 20.23 | 0.27743 | -3.624 | 23.01 | 0.22063 | -23.36 | 20.56 | 0.26966 | -6.3231 | 22.46 | 0.23068 | -19.863 | 22.86 | 0.22338 | -22.399
18 | 0.23879 | 23.34 | 0.21501 | -9.959 | 26.16 | 0.17333 | -27.41 | 23.50 | 0.21215 | -11.1558 | 25.20 | 0.18615 | -22.044 | 24.87 | 0.19078 | -20.104
19 | 0.19742 | 26.46 | 0.16961 | -14.083 | 29.24 | 0.13995 | -29.11 | 26.47 | 0.16950 | -14.1388 | 27.92 | 0.15292 | -22.539 | 27.20 | 0.16086 | -18.518
20 | 0.16285 | 29.52 | 0.13732 | -15.674 | 32.20 | 0.11620 | -28.65 | 29.40 | 0.13848 | -14.9610 | 30.60 | 0.12817 | -21.294 | 29.81 | 0.13480 | -17.222
21 | 0.13424 | 32.50 | 0.11411 | -14.994 | 35.02 | 0.09860 | -26.55 | 32.26 | 0.11575 | -13.7742 | 33.22 | 0.10938 | -18.520 | 32.62 | 0.11328 | -15.612
22 | 0.11082 | 35.36 | 0.09672 | -12.720 | 37.70 | 0.08473 | -23.54 | 35.04 | 0.09850 | -11.1184 | 35.76 | 0.09454 | -14.691 | 35.48 | 0.09604 | -13.335
23 | 0.09185 | 38.09 | 0.08283 | -9.822 | 40.23 | 0.07305 | -20.47 | 37.72 | 0.08464 | -7.8535 | 38.23 | 0.08217 | -10.542 | 38.21 | 0.08225 | -10.452
24 | 0.07667 | 40.71 | 0.07096 | -7.441 | 42.63 | 0.06270 | -18.21 | 40.29 | 0.07278 | -5.0651 | 40.63 | 0.07128 | -7.030 | 40.66 | 0.07115 | -7.198

x264 crf | x264 bpp | x265 crf according to y-ssim | x265 bpp according to y-ssim | x265 % reduction according to y-ssim | x265 crf according to rgb-ssim | x265 bpp according to rgb-ssim | x265 % reduction according to rgb-ssim | x265 crf according to ms-ssim | x265 bpp according to ms-ssim | x265 % reduction according to ms-ssim | x265 crf according to psnr-hvs-m | x265 bpp according to psnr-hvs-m | x265 % reduction according to psnr-hvs-m | x265 crf according to vmaf | x265 bpp according to vmaf | x265 % reduction according to vmaf
16 | 0.34558 | 16.06 | 0.28855 | -16.50 | 16.05 | 0.28882 | -16.42 | 15.82 | 0.30066 | -13.00 | 15.86 | 0.29877 | -13.55 | 15.77 | 0.30335 | -12.2191
17 | 0.28786 | 17.19 | 0.23623 | -17.94 | 17.14 | 0.23831 | -17.21 | 16.95 | 0.24621 | -14.47 | 16.99 | 0.24444 | -15.08 | 16.39 | 0.27215 | -5.4580
18 | 0.23879 | 18.31 | 0.19272 | -19.29 | 18.21 | 0.19633 | -17.78 | 18.09 | 0.20075 | -15.93 | 18.13 | 0.19909 | -16.63 | 17.12 | 0.23895 | 0.0684
19 | 0.19742 | 19.43 | 0.15701 | -20.47 | 19.27 | 0.16172 | -18.08 | 19.22 | 0.16332 | -17.27 | 19.27 | 0.16174 | -18.07 | 18.00 | 0.20384 | 3.2567
20 | 0.16285 | 20.54 | 0.12802 | -21.39 | 20.31 | 0.13337 | -18.10 | 20.34 | 0.13284 | -18.43 | 20.40 | 0.13134 | -19.35 | 19.05 | 0.16826 | 3.3227
21 | 0.13424 | 21.63 | 0.10466 | -22.03 | 21.35 | 0.11027 | -17.86 | 21.45 | 0.10826 | -19.36 | 21.52 | 0.10685 | -20.40 | 20.27 | 0.13441 | 0.1243
22 | 0.11082 | 22.71 | 0.08599 | -22.41 | 22.36 | 0.09154 | -17.40 | 22.54 | 0.08861 | -20.05 | 22.63 | 0.08732 | -21.21 | 21.64 | 0.10453 | -5.6716
23 | 0.09185 | 23.78 | 0.07114 | -22.55 | 23.37 | 0.07642 | -16.80 | 23.64 | 0.07301 | -20.51 | 23.73 | 0.07185 | -21.77 | 23.11 | 0.08015 | -12.7386
24 | 0.07667 | 24.85 | 0.05939 | -22.54 | 24.38 | 0.06426 | -16.18 | 24.72 | 0.06071 | -20.82 | 24.82 | 0.05970 | -22.14 | 24.62 | 0.06167 | -19.5585

x264 crf | x264 bpp | libaom crf according to y-ssim | libaom bpp according to y-ssim | libaom % reduction according to y-ssim | libaom crf according to rgb-ssim | libaom bpp according to rgb-ssim | libaom % reduction according to rgb-ssim | libaom crf according to ms-ssim | libaom bpp according to ms-ssim | libaom % reduction according to ms-ssim | libaom crf according to psnr-hvs-m | libaom bpp according to psnr-hvs-m | libaom % reduction according to psnr-hvs-m | libaom crf according to vmaf | libaom bpp according to vmaf | libaom % reduction according to vmaf
16 | 0.34558 | 20.06 | 0.31722 | -8.206 | 23.42 | 0.24274 | -29.76 | 20.25 | 0.31217 | -9.666 | 22.47 | 0.26128 | -24.39 | 23.31 | 0.24488 | -29.14
17 | 0.28786 | 23.25 | 0.24592 | -14.571 | 26.80 | 0.18969 | -34.10 | 23.31 | 0.24484 | -14.947 | 25.41 | 0.20926 | -27.31 | 25.34 | 0.21043 | -26.90
18 | 0.23879 | 26.50 | 0.19369 | -18.886 | 30.10 | 0.15265 | -36.07 | 26.43 | 0.19469 | -18.466 | 28.36 | 0.17069 | -28.52 | 27.62 | 0.17935 | -24.89
19 | 0.19742 | 29.71 | 0.15640 | -20.776 | 33.27 | 0.12633 | -36.01 | 29.53 | 0.15824 | -19.842 | 31.27 | 0.14212 | -28.01 | 30.19 | 0.15180 | -23.11
20 | 0.16285 | 32.84 | 0.12947 | -20.497 | 36.28 | 0.10661 | -34.53 | 32.57 | 0.13154 | -19.226 | 34.11 | 0.12039 | -26.07 | 33.00 | 0.12827 | -21.23
21 | 0.13424 | 35.87 | 0.10909 | -18.736 | 39.12 | 0.09074 | -32.41 | 35.53 | 0.11118 | -17.182 | 36.89 | 0.10302 | -23.26 | 35.93 | 0.10871 | -19.02
22 | 0.11082 | 38.77 | 0.09259 | -16.450 | 41.78 | 0.07713 | -30.40 | 38.38 | 0.09466 | -14.578 | 39.58 | 0.08828 | -20.34 | 38.77 | 0.09255 | -16.49
23 | 0.09185 | 41.54 | 0.07832 | -14.731 | 44.28 | 0.06501 | -29.22 | 41.12 | 0.08040 | -12.465 | 42.20 | 0.07508 | -18.26 | 41.35 | 0.07926 | -13.71
24 | 0.07667 | 44.18 | 0.06546 | -14.613 | 46.62 | 0.05413 | -29.39 | 43.75 | 0.06755 | -11.897 | 44.73 | 0.06290 | -17.95 | 43.54 | 0.06852 | -10.62
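The two quantities in these tables, the equivalent CRF and the percentage bitrate reduction, can be sketched as follows. The function names are hypothetical, and linear interpolation between sampled CRF values is an assumption: the study does not state how the fractional CRF values were obtained.

```python
def bitrate_reduction_percent(ref_bpp, other_bpp):
    """Relative change in bits per pixel versus the reference, in percent."""
    return (other_bpp - ref_bpp) / ref_bpp * 100.0

def equivalent_crf(points, target_score):
    """Linearly interpolate the CRF that yields target_score.

    points: (crf, score) pairs sorted by increasing CRF, with the score
    decreasing as the CRF increases.
    """
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s1 <= target_score <= s0:
            if s0 == s1:
                return float(c0)
            # Fraction of the way from (c0, s0) to (c1, s1).
            t = (s0 - target_score) / (s0 - s1)
            return c0 + t * (c1 - c0)
    raise ValueError("target score outside the sampled range")
```

Using the x264 CRF 20 row, `bitrate_reduction_percent(0.16285, 0.12817)` gives about -21.3%, which agrees with the tabulated -21.294% up to rounding of the displayed bpp values. The points passed to `equivalent_crf` would be the measured (CRF, metric) pairs of the other encoder, which are in the raw data rather than in these tables.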