    lavc/vc1dsp: R-V V vc1_unescape_buffer · d452db84
    Rémi Denis-Courmont authored
    Notes:
    - The loop is biased toward input without escape bytes, as that should be the most common case.
    - The input byte array is slid rather than the (8 times smaller) bit-mask,
      as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
    - There are two comparisons with 0 per iteration, for the same reason.
    - In case of match, bytes are copied until the first match, and the loop is
      restarted after the escape byte. Vector compression (vcompress.vm) could
      discard all escape bytes but that is slower if escape bytes are rare.
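
    The copy-then-restart strategy from the last note can be sketched in
    scalar C. This is only an illustration, not the RVV code from this commit:
    find_escape() and unescape_sketch() are hypothetical names, and the escape
    rule (a 0x03 byte after two zero bytes and before a byte below 4) is an
    assumption based on the usual VC-1 start code emulation prevention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: index of the first escape byte in src[0..size),
 * or size if there is none. Assumed rule: 00 00 03 xx with xx < 4. */
static size_t find_escape(const uint8_t *src, size_t size)
{
    for (size_t i = 2; i + 1 < size; i++)
        if (src[i] == 3 && src[i - 1] == 0 && src[i - 2] == 0 &&
            src[i + 1] < 4)
            return i;
    return size;
}

/* Copy src to dst without the escape bytes; returns the output size.
 * dst must have room for at least size bytes. */
static size_t unescape_sketch(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t out = 0;

    while (size > 0) {
        size_t n = find_escape(src, size);

        memcpy(dst + out, src, n); /* common case: copy up to the match */
        out += n;
        if (n == size)
            break;                 /* no escape byte left */
        src += n + 1;              /* skip the escape byte and restart */
        size -= n + 1;
    }
    return out;
}
```

    With no escape byte in the input, the loop body runs once and degenerates
    into a single copy, which is the case the notes above favour.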
    
    Further optimisations should be possible, e.g.:
    - processing 2 bytes fewer per iteration to get rid of 2 slides,
    - taking a shortcut if the input vector contains fewer than 2 zeroes.
    But this is a good starting point:
    
    T-Head C908:
    vc1dsp.vc1_unescape_buffer_c:      12749.5
    vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0
    
    SpacemiT X60:
    vc1dsp.vc1_unescape_buffer_c:      11038.0
    vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
vc1dsp_rvv.S 6.91 KB