libavcodec/riscv/vc1dsp_rvv.S · d452db8410256550d500864ef7ba5917d6afc864 · Stefan Westerfeld / ffmpeg

lavc/vc1dsp: R-V V vc1_unescape_buffer · d452db84

Rémi Denis-Courmont authored May 12, 2024

Notes:
- The loop is biased toward no unescaped bytes as that should be most common.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
  as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of match, bytes are copied until the first match, and the loop is
  restarted after the escape byte. Vector compression (vcompress.vm) could
  discard all escape bytes but that is slower if escape bytes are rare.

Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of a 2 slides,
- taking a short cut if the input vector contains less than 2 zeroes.
But this is a good starting point:

T-Head C908:
vc1dsp.vc1_unescape_buffer_c:      12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0

SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c:      11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0

d452db84

vc1dsp_rvv.S 6.91 KB

Replace vc1dsp_rvv.S