-
Rémi Denis-Courmont authored
In this case, the inner loop computing the scalar product can be reduced to just one multiplication and one sum even with 128-bit vectors. The result is a lot simpler, but also brings more modest performance gains: flac_lpc_16_13_c: 15241.0 flac_lpc_16_13_rvv_i32: 11230.0 flac_lpc_16_16_c: 17884.0 flac_lpc_16_16_rvv_i32: 12125.7 flac_lpc_16_29_c: 27847.7 flac_lpc_16_29_rvv_i32: 10494.0 flac_lpc_16_32_c: 30051.5 flac_lpc_16_32_rvv_i32: 10355.0
ca664f22