    lavc/h263dsp: R-V V {h,v}_loop_filter · 910d281b
    Rémi Denis-Courmont authored
    Since the horizontal and vertical filters are identical except for a
    transposition, this uses a common subprocedure with an ad-hoc ABI.
    To preserve return-address stack prediction, a link register has to be
    used (cf. the "Control Transfer Instructions" section of the
    RISC-V ISA Manual). The alternate/temporary link register T0 is used
    here, so that the normal RA is preserved (something Arm cannot do!).
    
    To load the strength value based on `qscale`, the shortest possible
    and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
    LLA; ADD; LBU sequence would add one more instruction since LLA is a
    convenience alias for AUIPC; ADDI. To ensure that this trick works,
    relocation relaxation is disabled.
    
    To implement the two signed divisions by a power of two toward zero:
     (x / (1 << SHIFT))
    the code relies on the small range of integers involved, computing
    (with a logical inner shift on 16-bit elements):
     (x + (x >> (16 - SHIFT))) >> SHIFT
    rather than the more general (with an arithmetic inner shift):
     (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
    Thus one ANDI instruction is avoided.
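
    As a rough illustration only (not the actual RVV code), the two forms
    can be modelled in scalar C. SHIFT = 3 and the function names below are
    assumptions made for this sketch. It also assumes that `>>` on a negative
    signed value is an arithmetic shift, which holds on mainstream compilers
    but is implementation-defined in C. The short form only matches for
    inputs in [-2^(16-SHIFT), 2^(16-SHIFT)), where the top SHIFT bits of the
    16-bit pattern are all copies of the sign bit.

```c
#include <assert.h>
#include <stdint.h>

#define SHIFT 3 /* assumed value for illustration: divide by 8 */

/* Reference: C integer division truncates toward zero. */
static int16_t div_ref(int16_t x)
{
    return (int16_t)(x / (1 << SHIFT));
}

/* General form: the arithmetic shift by 15 yields all ones for negative x,
 * and the mask turns that into the rounding bias (2^SHIFT - 1). */
static int16_t div_general(int16_t x)
{
    int bias = (x >> (16 - 1)) & ((1 << SHIFT) - 1);
    return (int16_t)((int16_t)(x + bias) >> SHIFT);
}

/* Short form: a logical shift of the 16-bit pattern leaves only its top
 * SHIFT bits. For small negative x those are all ones, i.e. the same bias,
 * and for small non-negative x they are zero, so no mask is needed. */
static int16_t div_short(int16_t x)
{
    int bias = (uint16_t)x >> (16 - SHIFT);
    return (int16_t)((int16_t)(x + bias) >> SHIFT);
}
```

    Checking every x in [-8192, 8191] against div_ref() confirms that both
    forms agree with truncating division on that range when SHIFT is 3.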
    
    T-Head C908:
    h263dsp.h_loop_filter_c:       228.2
    h263dsp.h_loop_filter_rvv_i32: 144.0
    h263dsp.v_loop_filter_c:       242.7
    h263dsp.v_loop_filter_rvv_i32: 114.0
    (C is probably worse in real use due to less predictable branches.)