libavcodec/bitpacked_dec.c · abd7da0af95c9cbf077a28bdf8b4223c49db9654 · Stefan Westerfeld / ffmpeg

avcodec/bitpacked_dec: optimize bitpacked_decode_yuv422p10 · b2c82b23

Devin Heitmueller authored May 05, 2023

Rework the code a bit to speed up the 10-bit bitpacked decoding
routine.  This is probably about as fast as I can get it without
switching to assembly language.

Demonstratable with:

./ffmpeg -f lavfi -i "smptehdbars=size=3840x2160" -c bitpacked -f image2 -frames:v 1 source.yuv
./ffmpeg -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le out.yuv

On my development system, it went from 80ms for a 2160p frame
down to 20ms (i.e. a 4X speedup).  Good enough for now, I hope...

Comments from Marton:

Originally on my system better performance could be achieved by simply
switching to the cached bitstream reader, but for Devin it was slower than
his direct byte operations.

I changed the order of writing output from u/y/v/y to u/v/y/y, and that made
the code faster than the cached bitstream reader on my system as well.

TIMER measurement of the decode loop on Ryzen 5 3600 with command line:

./ffmpeg -stream_loop 256 -threads 1 -f bitpacked -pix_fmt yuv422p10le -s 3840x2160 -c:v bitpacked -i source.yuv -pix_fmt yuv422p10le -f null none -loglevel error

Before: 823204127 decicycles in YUV,     256 runs,      0 skips
After:  315070524 decicycles in YUV,     256 runs,      0 skips
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Marton Balint <cus@passwd.hu>

b2c82b23

bitpacked_dec.c 4.54 KB

Replace bitpacked_dec.c