1. 04 Nov, 2022 5 commits
  2. 03 Nov, 2022 14 commits
  3. 02 Nov, 2022 2 commits
  4. 01 Nov, 2022 4 commits
    • Timo Rothenpieler's avatar
      12733c0c
    • sw_scale: Add specializations for hscale 16 to 19 · 2537fdc5
      Hubert Mazur authored
      Provide arm64 neon optimized implementations for hscale16To19 with
      filter sizes 4, 8 and X4.
      
      The tests and benchmarks were run on AWS Graviton 2 instances.
      The results from the checkasm tool are shown below.
      
      hscale_16_to_19__fs_4_dstW_512_c: 6216.0
      hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
      hscale_16_to_19__fs_8_dstW_512_c: 10417.7
      hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
      hscale_16_to_19__fs_12_dstW_512_c: 14890.5
      hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
      hscale_16_to_19__fs_16_dstW_512_c: 19006.5
      hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
      hscale_16_to_19__fs_32_dstW_512_c: 36629.5
      hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
      hscale_16_to_19__fs_40_dstW_512_c: 45477.5
      hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
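
The gain implied by each pair of lines above is the C figure divided by the NEON figure. A minimal sketch of that arithmetic follows, assuming the checkasm figures are timing counts where lower is better; the helper name `speedup` is hypothetical and not part of checkasm or FFmpeg.

```c
#include <stdio.h>

/* Hypothetical helper: ratio of the C reference timing to the NEON timing.
 * Assumes both values are checkasm timing counts (lower is better). */
static double speedup(double c_time, double neon_time)
{
    return c_time / neon_time;
}

/* Hypothetical helper: print one result row in a readable form. */
static void print_speedup(int filter_size, double c_time, double neon_time)
{
    printf("fs_%d: %.2fx\n", filter_size, speedup(c_time, neon_time));
}
```

Calling `print_speedup(4, 6216.0, 2257.0)` reports a gain of roughly 2.75x for filter size 4, and `print_speedup(40, 45477.5, 11552.0)` roughly 3.9x for filter size 40.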
      
      (Note, the checkasm tests for these functions haven't been
      merged since they fail on x86.)
      Signed-off-by: Hubert Mazur <hum@semihalf.com>
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • sw_scale: Add specializations for hscale 16 to 15 · 9ccf8c5b
      Hubert Mazur authored
      Add arm64 neon implementations for hscale 16 to 15 with filter
      sizes 4, 8 and X4.
      
      The tests and benchmarks were run on AWS Graviton 2 instances.
      The results from the checkasm tool are shown below.
      
      hscale_16_to_15__fs_4_dstW_512_c: 6703.5
      hscale_16_to_15__fs_4_dstW_512_neon: 2298.0
      hscale_16_to_15__fs_8_dstW_512_c: 10983.0
      hscale_16_to_15__fs_8_dstW_512_neon: 3216.5
      hscale_16_to_15__fs_12_dstW_512_c: 15526.0
      hscale_16_to_15__fs_12_dstW_512_neon: 3993.0
      hscale_16_to_15__fs_16_dstW_512_c: 20183.5
      hscale_16_to_15__fs_16_dstW_512_neon: 5369.7
      hscale_16_to_15__fs_32_dstW_512_c: 39315.2
      hscale_16_to_15__fs_32_dstW_512_neon: 9511.2
      hscale_16_to_15__fs_40_dstW_512_c: 48995.7
      hscale_16_to_15__fs_40_dstW_512_neon: 11570.0
      
      (Note, the checkasm tests for these functions haven't been
      merged since they fail on x86.)
      Signed-off-by: Hubert Mazur <hum@semihalf.com>
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • sw_scale: Add specializations for hscale 8 to 19 · 1e9cfa5b
      Hubert Mazur authored
      Add arm64 neon implementations for hscale 8 to 19 with filter
      sizes 4, X4 and 8. These implementations are based on the very
      similar ones for hscale 8 to 15. The main change is in how the
      result is stored: it is written as int32_t instead of int16_t.
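
The difference in how results are stored can be illustrated with a scalar sketch. This is not the NEON code from the commit; it is a simplified C model of a horizontal scaling loop, assuming 14-bit filter coefficients and the shift amounts shown (7 for a 15-bit result, 3 for a 19-bit result). The function names and the `clip_max` helper are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: clamp val to the largest value that fits in `bits` bits.
 * Only the upper bound is clamped in this sketch. */
static int clip_max(int val, int bits)
{
    int max = (1 << bits) - 1;
    return val < max ? val : max;
}

/* 8-bit input, 15-bit result stored as int16_t. */
static void hscale_8_to_15_sketch(int16_t *dst, int dstW, const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        /* Weighted sum of filterSize source pixels starting at filterPos[i]. */
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        /* Assumed: 8-bit input * 14-bit filter = 22 bits, >> 7 leaves 15 bits. */
        dst[i] = (int16_t)clip_max(val >> 7, 15);
    }
}

/* Same loop, but the 19-bit result no longer fits in int16_t,
 * so it is stored as int32_t. */
static void hscale_8_to_19_sketch(int32_t *dst, int dstW, const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int)src[filterPos[i] + j] * filter[filterSize * i + j];
        /* Assumed: 22-bit sum >> 3 leaves 19 bits. */
        dst[i] = clip_max(val >> 3, 19);
    }
}
```

In this model the accumulation loop is identical in both variants; only the final shift, the clamp limit and the width of the destination element differ, which matches the description of the change as mainly affecting how the data is saved.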
      
      These functions are heavily inspired by patches provided by J. Swinney
      and M. Storsjö for hscale8to15, which were slightly adapted for
      hscale8to19.
      
      The tests and benchmarks were run on AWS Graviton 2 instances. The results
      from the checkasm tool are shown below.
      
      hscale_8_to_19__fs_4_dstW_512_c: 5663.2
      hscale_8_to_19__fs_4_dstW_512_neon: 1259.7
      hscale_8_to_19__fs_8_dstW_512_c: 9306.0
      hscale_8_to_19__fs_8_dstW_512_neon: 2020.2
      hscale_8_to_19__fs_12_dstW_512_c: 12932.7
      hscale_8_to_19__fs_12_dstW_512_neon: 2462.5
      hscale_8_to_19__fs_16_dstW_512_c: 16844.2
      hscale_8_to_19__fs_16_dstW_512_neon: 4671.2
      hscale_8_to_19__fs_32_dstW_512_c: 32803.7
      hscale_8_to_19__fs_32_dstW_512_neon: 5474.2
      hscale_8_to_19__fs_40_dstW_512_c: 40948.0
      hscale_8_to_19__fs_40_dstW_512_neon: 6669.7
      Signed-off-by: Hubert Mazur <hum@semihalf.com>
      Signed-off-by: Martin Storsjö <martin@martin.st>
  5. 31 Oct, 2022 12 commits
  6. 30 Oct, 2022 3 commits