    lavc/bswapdsp: RISC-V B bswap_buf · f0ef11ea
    Rémi Denis-Courmont authored
    Simply using the Zbb REV8 instruction in a plain loop already gives
    significant savings:
    
    bswap_buf_c: 1081.0
    bswap_buf_rvb_b: 771.0
    
    But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
    just one additional shift, and one fewer load, effectively doubling the
    bandwidth. Consequently, this patch is useful even if the compile-time
    target has Zbb enabled for C code:
    
    bswap_buf_c: 1081.0
    bswap_buf_rvb_b: 341.0  (this patch)
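
The pseudo-SIMD idea described above can be sketched in portable C. This is only an illustration under stated assumptions, not the assembly from this patch: `rev8_64` is a hypothetical stand-in for the 64-bit REV8 instruction, and `bswap_buf_pairs` is a hypothetical name. The 64-bit word is assembled from two 32-bit samples so that the sketch behaves the same on any host; on a little-endian RV64 target a single 64-bit load would yield the same value, which is where the saved load comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the 64-bit Zbb REV8 instruction:
 * reverses the order of all eight bytes. */
static uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: byte-swap n 32-bit words, two per iteration. */
static void bswap_buf_pairs(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        /* Same value that one 64-bit load gives on a little-endian target. */
        uint64_t pair = ((uint64_t)src[i + 1] << 32) | src[i];
        uint64_t r = rev8_64(pair);

        /* REV8 swaps the bytes of both samples but also exchanges the two
         * halves, so the first result sits in the upper 32 bits. */
        dst[i]     = (uint32_t)(r >> 32); /* the one additional shift */
        dst[i + 1] = (uint32_t)r;
    }
    if (i < n) /* odd trailing sample */
        dst[i] = (uint32_t)(rev8_64(src[i]) >> 32);
}
```

Per pair of samples this amounts to one load, one REV8, one shift and two stores, against two loads, two REV8 and two stores in the simple loop.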
    
    On the other hand, this approach fails miserably for bswap16_buf as the
    ratio of shifts and stores becomes unfavorable compared to naïve C:
    
    bswap16_buf_c: 1542.0
    bswap16_buf_rvb_b: 1803.7
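
To see why the ratio degrades, here is one possible C rendering of the same trick applied to 16-bit samples. It is a sketch under assumptions, not the code that was benchmarked: `rev8_64` again stands in for the 64-bit REV8 instruction and `bswap16_buf_quads` is a hypothetical name. Four samples share one 64-bit word, and pulling them back out takes three shifts and four narrow stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for the 64-bit Zbb REV8 instruction. */
static uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Hypothetical helper: byte-swap n 16-bit samples, four per iteration. */
static void bswap16_buf_quads(uint16_t *dst, const uint16_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 3 < n; i += 4) {
        /* Same value that one 64-bit load gives on a little-endian target. */
        uint64_t quad = ((uint64_t)src[i + 3] << 48) |
                        ((uint64_t)src[i + 2] << 32) |
                        ((uint64_t)src[i + 1] << 16) | src[i];
        uint64_t r = rev8_64(quad);

        /* Sample order is reversed inside the word: three shifts and
         * four 16-bit stores are needed to write the results back. */
        dst[i]     = (uint16_t)(r >> 48);
        dst[i + 1] = (uint16_t)(r >> 32);
        dst[i + 2] = (uint16_t)(r >> 16);
        dst[i + 3] = (uint16_t)r;
    }
    for (; i < n; i++) /* remaining samples, plain scalar swap */
        dst[i] = (uint16_t)((src[i] << 8) | (src[i] >> 8));
}
```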
    
    Unrolling to process 128 bits (4 samples) at a time actually worsens
    performance ever so slightly:
    
    bswap_buf_c: 1081.0
    bswap_buf_rvb_b: 408.5
bswapdsp.c 1.73 KB