    sw_scale: Add specializations for hscale 16 to 19 · 2537fdc5
    Hubert Mazur authored
    Provide arm64 NEON-optimized implementations of hscale16To19 for
    filter sizes 4, 8 and X4.
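
For orientation, the operation being specialized can be sketched in scalar C. This is a minimal illustration and not the code from this commit: the name `hscale16to19_ref`, its parameter list, and the explicit `shift` argument are assumptions made for the example. Each output sample is a filtered sum of 16-bit input samples, shifted down and clamped to 19 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical scalar reference for a 16-bit to 19-bit horizontal scale.
 * Names and signature are illustrative, not the FFmpeg API.
 *
 * dst        : dst_w output samples, each clamped to 19 bits
 * src        : 16-bit input samples
 * filter     : dst_w * filter_size coefficients, one row per output sample
 * filter_pos : index of the first input sample used for each output sample
 * shift      : right shift applied to the accumulated sum (assumed: with
 *              14-bit coefficients and 16-bit input, 11 yields 19 bits)
 */
static void hscale16to19_ref(int32_t *dst, int dst_w, const uint16_t *src,
                             const int16_t *filter, const int32_t *filter_pos,
                             int filter_size, int shift)
{
    const int64_t max19 = (1 << 19) - 1;

    assert(filter_size > 0);

    for (int i = 0; i < dst_w; i++) {
        /* 64-bit accumulator so the sketch never overflows. */
        int64_t acc = 0;

        for (int j = 0; j < filter_size; j++)
            acc += (int64_t)src[filter_pos[i] + j] * filter[filter_size * i + j];

        /* Right shift of a negative value is implementation-defined in C;
         * common compilers perform an arithmetic shift. */
        int64_t val = acc >> shift;
        dst[i] = (int32_t)(val < max19 ? val : max19);
    }
}
```

A vectorized version can unroll the inner loop when the filter size is a known constant such as 4 or 8, which is one reason to provide separate specializations per filter size.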
    
    The tests and benchmarks were run on AWS Graviton 2 instances.
    The results from the checkasm tool are shown below.
    
    hscale_16_to_19__fs_4_dstW_512_c: 6216.0
    hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
    hscale_16_to_19__fs_8_dstW_512_c: 10417.7
    hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
    hscale_16_to_19__fs_12_dstW_512_c: 14890.5
    hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
    hscale_16_to_19__fs_16_dstW_512_c: 19006.5
    hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
    hscale_16_to_19__fs_32_dstW_512_c: 36629.5
    hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
    hscale_16_to_19__fs_40_dstW_512_c: 45477.5
    hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
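
The figures above are easier to compare as ratios. The following sketch computes the C-to-NEON speedup for each filter size, using only the numbers listed in this message; the struct and function names are made up for the example. The ratios work out to roughly 2.8x at filter size 4 and roughly 3.9x at filter size 40.

```c
#include <stdio.h>

/* One checkasm measurement pair (lower is better), copied from the list above. */
struct bench {
    int filter_size;
    double c_time;
    double neon_time;
};

static const struct bench results[] = {
    {  4,  6216.0,  2257.0 },
    {  8, 10417.7,  3112.5 },
    { 12, 14890.5,  3899.0 },
    { 16, 19006.5,  5341.2 },
    { 32, 36629.5,  9502.7 },
    { 40, 45477.5, 11552.0 },
};

/* Speedup of the NEON version relative to the C version. */
static double speedup(double c_time, double neon_time)
{
    return c_time / neon_time;
}

/* Print one line per filter size with the computed ratio. */
static void print_speedups(void)
{
    size_t n = sizeof(results) / sizeof(results[0]);

    for (size_t i = 0; i < n; i++)
        printf("fs_%d: %.2fx\n", results[i].filter_size,
               speedup(results[i].c_time, results[i].neon_time));
}
```

Calling `print_speedups()` from a `main` function prints the ratio for each of the six filter sizes.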
    
    (Note, the checkasm tests for these functions haven't been
    merged since they fail on x86.)
    Signed-off-by: Hubert Mazur <hum@semihalf.com>
    Signed-off-by: Martin Storsjö <martin@martin.st>
swscale.c 9.52 KB