    avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation · 716b3967
    Mark Reid authored
    I spotted a pattern I had not noticed before, and exploiting it makes the implementation faster.
    The bit-shifting table I was using before is no longer needed, which let me remove quite a few lines.
    I also added use of FMA in the AVX2 version.
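
For readers unfamiliar with the technique, here is a minimal scalar sketch of tetrahedral interpolation in a 3D LUT. It is not the code from this commit and does not claim to reproduce the pattern mentioned above. The LUT layout, the single-channel simplification, and the names `lut_at`, `order_desc`, and `tetra_interp` are assumptions made for illustration. The idea shown: sort the three fractional offsets in descending order, then walk from the cell's base corner to the opposite corner one axis at a time, weighting each visited corner by the difference between consecutive offsets.

```c
#include <assert.h>

/* Hypothetical layout: single-channel LUT with n points per axis,
 * stored so that the last index (blue) varies fastest. */
static float lut_at(const float *lut, int n, const int idx[3])
{
    return lut[(idx[0] * n + idx[1]) * n + idx[2]];
}

/* Compare-and-swap step so that d[order[i]] >= d[order[j]] afterwards. */
static void order_desc(const float d[3], int order[3], int i, int j)
{
    if (d[order[i]] < d[order[j]]) {
        int t = order[i];
        order[i] = order[j];
        order[j] = t;
    }
}

/* r, g, b are assumed to be pre-scaled to the range [0, n - 1]. */
static float tetra_interp(const float *lut, int n, float r, float g, float b)
{
    float in[3] = { r, g, b };
    float d[3];                   /* fractional offset inside the cell */
    int idx[3];                   /* base corner of the cell */
    int order[3] = { 0, 1, 2 };   /* axes, to be sorted by descending offset */
    float out;
    int k;

    assert(n >= 2);

    for (k = 0; k < 3; k++) {
        idx[k] = (int)in[k];
        if (idx[k] > n - 2) idx[k] = n - 2;  /* keep idx + 1 inside the LUT */
        if (idx[k] < 0) idx[k] = 0;
        d[k] = in[k] - (float)idx[k];
    }

    /* Three-element sorting network. */
    order_desc(d, order, 0, 1);
    order_desc(d, order, 1, 2);
    order_desc(d, order, 0, 1);

    /* Base corner gets weight (1 - largest offset). */
    out = (1.0f - d[order[0]]) * lut_at(lut, n, idx);

    /* Step along the axes in sorted order; the last corner visited is the
     * one opposite the base corner and gets the smallest offset as weight. */
    for (k = 0; k < 3; k++) {
        float next = (k < 2) ? d[order[k + 1]] : 0.0f;
        idx[order[k]] += 1;
        out += (d[order[k]] - next) * lut_at(lut, n, idx);
    }
    return out;
}
```

Each accumulation step has the multiply-add shape that an FMA instruction can fuse. A real filter would apply this per output channel and, in a SIMD version, to several pixels at once.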
    
    f32 1920x1080 1 thread with prelut
    c impl
    1434012700 UNITS in lut3d->interp,       1 runs,      0 skips
    1434035335 UNITS in lut3d->interp,       2 runs,      0 skips
    1423615347 UNITS in lut3d->interp,       4 runs,      0 skips
    1426268863 UNITS in lut3d->interp,       8 runs,      0 skips
    
    sse2
    905484420 UNITS in lut3d->interp,       1 runs,      0 skips
    905659010 UNITS in lut3d->interp,       2 runs,      0 skips
    915167140 UNITS in lut3d->interp,       4 runs,      0 skips
    915834222 UNITS in lut3d->interp,       8 runs,      0 skips
    
    avx
    574794860 UNITS in lut3d->interp,       1 runs,      0 skips
    581035090 UNITS in lut3d->interp,       2 runs,      0 skips
    584116720 UNITS in lut3d->interp,       4 runs,      0 skips
    581460290 UNITS in lut3d->interp,       8 runs,      0 skips
    
    avx2
    301698880 UNITS in lut3d->interp,       1 runs,      0 skips
    301982880 UNITS in lut3d->interp,       2 runs,      0 skips
    306962430 UNITS in lut3d->interp,       4 runs,      0 skips
    305472025 UNITS in lut3d->interp,       8 runs,      0 skips
    
    gbrap16 1920x1080 1 thread with prelut
    c impl
    1480894840 UNITS in lut3d->interp,       1 runs,      0 skips
    1502922990 UNITS in lut3d->interp,       2 runs,      0 skips
    1496114307 UNITS in lut3d->interp,       4 runs,      0 skips
    1492554551 UNITS in lut3d->interp,       8 runs,      0 skips
    
    sse2
    980777180 UNITS in lut3d->interp,       1 runs,      0 skips
    986121520 UNITS in lut3d->interp,       2 runs,      0 skips
    986489840 UNITS in lut3d->interp,       4 runs,      0 skips
    998832248 UNITS in lut3d->interp,       8 runs,      0 skips
    
    avx
    622212360 UNITS in lut3d->interp,       1 runs,      0 skips
    622981160 UNITS in lut3d->interp,       2 runs,      0 skips
    645396315 UNITS in lut3d->interp,       4 runs,      0 skips
    641057075 UNITS in lut3d->interp,       8 runs,      0 skips
    
    avx2
    321336400 UNITS in lut3d->interp,       1 runs,      0 skips
    321268920 UNITS in lut3d->interp,       2 runs,      0 skips
    323459895 UNITS in lut3d->interp,       4 runs,      0 skips
    324949967 UNITS in lut3d->interp,       8 runs,      0 skips
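
To compare the implementations, the timer readings above can be turned into speedup ratios. The helper below is a trivial sketch; the name `speedup` is hypothetical and it simply divides the baseline reading by the optimized one.

```c
/* Speedup of an optimized run relative to a baseline, computed from two
 * timer readings taken with the same number of runs. */
static double speedup(double baseline_units, double optimized_units)
{
    return baseline_units / optimized_units;
}
```

For example, `speedup(1426268863.0, 305472025.0)` uses the 8-run f32 readings for the C and AVX2 versions and gives a ratio a little over 4.6.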
Name
compat
doc
ffbuild
fftools
libavcodec
libavdevice
libavfilter
libavformat
libavutil
libpostproc
libswresample
libswscale
presets
tests
tools
.gitattributes
.gitignore
.mailmap
.travis.yml
CONTRIBUTING.md
COPYING.GPLv2
COPYING.GPLv3
COPYING.LGPLv2.1
COPYING.LGPLv3
CREDITS
Changelog
INSTALL.md
LICENSE.md
MAINTAINERS
Makefile
README.md
RELEASE
configure