    lavc/opusdsp: rewrite R-V V postfilter · adc87a5f
    Rémi Denis-Courmont authored
    This uses a more traditional approach, allowing processing of up to
    period minus two elements per iteration. This also allows the algorithm
    to work with any vector length.
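
The "period minus two" bound can be illustrated with a small scalar sketch in C. It is not the RISC-V V assembly from this commit: the function names, the `VL_MAX` limit of 16, the gain values, and the test signal are all assumptions made for illustration. The sketch assumes the usual 5-tap in-place comb filter, whose newest tap is `data[i - period + 2]`, so a block of at most `period - 2` outputs only reads samples written before the block started.

```c
#include <assert.h>

#define VL_MAX 16 /* assumed per-iteration element limit, for illustration only */

/* Scalar reference: in-place 5-tap comb filter centred on data[i - period].
 * data[-period - 2 .. -1] must hold valid history. */
static void postfilter_ref(float *data, int period, const float gains[3], int len)
{
    for (int i = 0; i < len; i++) {
        const float *x = data + i - period;
        data[i] += gains[0] * x[0]
                 + gains[1] * (x[1] + x[-1])
                 + gains[2] * (x[2] + x[-2]);
    }
}

/* Blocked variant: each block of n <= period - 2 outputs is computed in full
 * before anything is stored, standing in for one vector iteration. */
static void postfilter_blocked(float *data, int period, const float gains[3], int len)
{
    int step = period - 2 < VL_MAX ? period - 2 : VL_MAX;

    assert(step > 0);
    for (int base = 0; base < len; base += step) {
        int n = len - base < step ? len - base : step;
        float out[VL_MAX];

        for (int j = 0; j < n; j++) {
            /* Newest tap x[2] is data[base + j - period + 2], which lies
             * before data[base] because j <= period - 3. */
            const float *x = data + base + j - period;
            out[j] = data[base + j]
                   + (gains[0] * x[0]
                    + gains[1] * (x[1] + x[-1])
                    + gains[2] * (x[2] + x[-2]));
        }
        for (int j = 0; j < n; j++)
            data[base + j] = out[j];
    }
}

/* Hypothetical self-check: largest absolute difference between both variants. */
static float postfilter_max_diff(int period, int len)
{
    enum { HIST = 1024, LEN_MAX = 256 };
    static float a[HIST + LEN_MAX], b[HIST + LEN_MAX];
    const float gains[3] = { 0.3f, 0.2f, 0.1f }; /* arbitrary example gains */
    float worst = 0.0f;

    assert(period >= 3 && period + 2 <= HIST && len >= 0 && len <= LEN_MAX);
    for (int i = 0; i < HIST + LEN_MAX; i++)
        a[i] = b[i] = (float)((i * 37) % 101) / 101.0f - 0.5f;

    postfilter_ref(a + HIST, period, gains, len);
    postfilter_blocked(b + HIST, period, gains, len);

    for (int i = 0; i < len; i++) {
        float d = a[HIST + i] - b[HIST + i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

With a period of 15, `step` in this sketch becomes 13 even though `VL_MAX` is 16. For the larger periods the assumed limit of 16 is the binding constraint instead.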
    
    As the T-Head C908 device under test can load 16 elements per loop
    iteration, there is unsurprisingly a small performance drop when the
    period is minimal (15) and the parallelism is capped at 13 elements:
    
    Before:
    postfilter_15_c:         21222.2
    postfilter_15_rvv_f32:   22007.7
    postfilter_512_c:        20189.7
    postfilter_512_rvv_f32:  22004.2
    postfilter_1022_c:       20189.7
    postfilter_1022_rvv_f32: 22004.2
    
    After:
    postfilter_15_c:         20189.5
    postfilter_15_rvv_f32:    7057.2
    postfilter_512_c:        20189.5
    postfilter_512_rvv_f32:   5667.2
    postfilter_1022_c:       20192.7
    postfilter_1022_rvv_f32:  5667.2
opusdsp_init.c 1.27 KB