libavcodec/bswapdsp.h · e6baf4f3841fe07d639e5b0fedefd06b9f994e6b · Stefan Westerfeld / ffmpeg

lavc/bswapdsp: RISC-V B bswap_buf · f0ef11ea

Rémi Denis-Courmont authored Oct 02, 2022

Simply using the Zbb REV8 instruction in a simple loop already gives
some significant savings:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0

But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0  (this patch)
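
For illustration, the pseudo-SIMD idea can be sketched in C. This is not
the assembly from this patch. It assumes a little-endian target and uses
the GCC/Clang builtin __builtin_bswap64 as a stand-in for the 64-bit
REV8; the function name bswap32_buf_pairs is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "one 64-bit word must hold exactly two 32-bit samples");

/* Hypothetical helper (not FFmpeg API): byte-swap n 32-bit words,
 * handling two words per 64-bit load. Assumes a little-endian host. */
static void bswap32_buf_pairs(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        uint64_t pair;

        memcpy(&pair, &src[i], sizeof(pair)); /* one 64-bit load */
        pair = __builtin_bswap64(pair);       /* stand-in for REV8 */

        /* Reversing all 8 bytes also exchanges the two words, so the
         * swapped src[i] ends up in the upper half: one shift fixes it. */
        dst[i]     = (uint32_t)(pair >> 32);
        dst[i + 1] = (uint32_t)pair;
    }

    if (i < n) /* odd element count: swap the last word on its own */
        dst[i] = __builtin_bswap32(src[i]);
}
```

Each iteration of the main loop performs one load, one byte reversal, one
shift and two stores for two samples, which is where the "one additional
shift, and one fewer load" comes from.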

On the other hand, this approach fails miserably for bswap16_buf as the
ratio of shifts and stores becomes unfavorable compared to naïve C:

bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7
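
A comparable C sketch for the 16-bit case shows where the extra work
comes from. As before, this is an illustration under assumptions
(little-endian target, GCC/Clang builtins, hypothetical function name
bswap16_buf_quads), not the code that was benchmarked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(uint64_t) == 4 * sizeof(uint16_t),
              "one 64-bit word must hold exactly four 16-bit samples");

/* Hypothetical helper (not FFmpeg API): byte-swap n 16-bit samples,
 * handling four samples per 64-bit load. Assumes a little-endian host. */
static void bswap16_buf_quads(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint64_t quad;

        memcpy(&quad, &src[i], sizeof(quad)); /* one 64-bit load */
        quad = __builtin_bswap64(quad);       /* stand-in for REV8 */

        /* The reversal leaves the samples in opposite order, so three
         * of the four need their own shift before being stored. */
        dst[i]     = (uint16_t)(quad >> 48);
        dst[i + 1] = (uint16_t)(quad >> 32);
        dst[i + 2] = (uint16_t)(quad >> 16);
        dst[i + 3] = (uint16_t)quad;
    }

    for (; i < n; i++) /* remaining samples, one at a time */
        dst[i] = __builtin_bswap16(src[i]);
}
```

In this sketch, four samples cost three shifts and four stores on top of
the load and the byte reversal, against one shift and two stores for two
32-bit samples.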

Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly relative to the 341.0 measured above:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
