    aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm · de23b384
    Martin Storsjö authored
    Besides templating the functions, this makes one functional change:
    ff_hevc_put_hevc_epel_bi_hv32_8 now sets the w6 register,
    which ff_hevc_put_hevc_epel_h32_8_neon requires.
    
    AWS Graviton 3:
    put_hevc_epel_bi_hv4_8_c: 176.5
    put_hevc_epel_bi_hv4_8_neon: 62.0
    put_hevc_epel_bi_hv4_8_i8mm: 58.0
    put_hevc_epel_bi_hv6_8_c: 343.7
    put_hevc_epel_bi_hv6_8_neon: 109.7
    put_hevc_epel_bi_hv6_8_i8mm: 105.7
    put_hevc_epel_bi_hv8_8_c: 536.0
    put_hevc_epel_bi_hv8_8_neon: 112.7
    put_hevc_epel_bi_hv8_8_i8mm: 111.7
    put_hevc_epel_bi_hv12_8_c: 1107.7
    put_hevc_epel_bi_hv12_8_neon: 254.7
    put_hevc_epel_bi_hv12_8_i8mm: 239.0
    put_hevc_epel_bi_hv16_8_c: 1927.7
    put_hevc_epel_bi_hv16_8_neon: 356.2
    put_hevc_epel_bi_hv16_8_i8mm: 334.2
    put_hevc_epel_bi_hv24_8_c: 4195.2
    put_hevc_epel_bi_hv24_8_neon: 736.7
    put_hevc_epel_bi_hv24_8_i8mm: 715.5
    put_hevc_epel_bi_hv32_8_c: 7280.5
    put_hevc_epel_bi_hv32_8_neon: 1287.7
    put_hevc_epel_bi_hv32_8_i8mm: 1162.2
    put_hevc_epel_bi_hv48_8_c: 16857.7
    put_hevc_epel_bi_hv48_8_neon: 2836.2
    put_hevc_epel_bi_hv48_8_i8mm: 2908.5
    put_hevc_epel_bi_hv64_8_c: 29248.2
    put_hevc_epel_bi_hv64_8_neon: 5051.7
    put_hevc_epel_bi_hv64_8_i8mm: 4491.5
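
The timings above are easier to compare as ratios against the C reference. The following is a minimal sketch, not part of the commit: it copies the numbers from the listing and assumes that lower values mean faster code. The names `timings` and `speedups` are hypothetical and chosen only for this illustration.

```python
# Timings copied verbatim from the listing above (AWS Graviton 3).
# Key: block width; value: (c, neon, i8mm). Assumption: lower is faster.
timings = {
    4: (176.5, 62.0, 58.0),
    6: (343.7, 109.7, 105.7),
    8: (536.0, 112.7, 111.7),
    12: (1107.7, 254.7, 239.0),
    16: (1927.7, 356.2, 334.2),
    24: (4195.2, 736.7, 715.5),
    32: (7280.5, 1287.7, 1162.2),
    48: (16857.7, 2836.2, 2908.5),
    64: (29248.2, 5051.7, 4491.5),
}


def speedups(table):
    """Return {width: (c / neon, c / i8mm)} for each measured width."""
    return {w: (c / neon, c / i8mm) for w, (c, neon, i8mm) in table.items()}


if __name__ == "__main__":
    for width, (s_neon, s_i8mm) in speedups(timings).items():
        print(f"hv{width}: neon {s_neon:.2f}x, i8mm {s_i8mm:.2f}x over C")
```

In these numbers the i8mm variant is faster than the neon one at every width except 48, where it is slightly slower.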
    Signed-off-by: Martin Storsjö <martin@martin.st>
hevcdsp_epel_neon.S 175 KB