    avcodec/hevc: Add asm opt for the following functions · 9239081d
    jinbo authored
    tests/checkasm/checkasm:           C       LSX     LASX
    put_hevc_qpel_uni_h4_8_c:          5.7     1.2
    put_hevc_qpel_uni_h6_8_c:          12.2    2.7
    put_hevc_qpel_uni_h8_8_c:          21.5    3.2
    put_hevc_qpel_uni_h12_8_c:         47.2    9.2     7.2
    put_hevc_qpel_uni_h16_8_c:         87.0    11.7    9.0
    put_hevc_qpel_uni_h24_8_c:         188.2   27.5    21.0
    put_hevc_qpel_uni_h32_8_c:         335.2   46.7    28.5
    put_hevc_qpel_uni_h48_8_c:         772.5   104.5   65.2
    put_hevc_qpel_uni_h64_8_c:         1383.2  142.2   109.0
    
    put_hevc_epel_uni_w_v4_8_c:        5.0     1.5
    put_hevc_epel_uni_w_v6_8_c:        10.7    3.5     2.5
    put_hevc_epel_uni_w_v8_8_c:        18.2    3.7     3.0
    put_hevc_epel_uni_w_v12_8_c:       40.2    10.7    7.5
    put_hevc_epel_uni_w_v16_8_c:       70.2    13.0    9.2
    put_hevc_epel_uni_w_v24_8_c:       158.2   30.2    22.5
    put_hevc_epel_uni_w_v32_8_c:       281.0   52.0    36.5
    put_hevc_epel_uni_w_v48_8_c:       631.7   116.7   82.7
    put_hevc_epel_uni_w_v64_8_c:       1108.2  207.5   142.2
    
    put_hevc_epel_uni_w_h4_8_c:        4.7     1.2
    put_hevc_epel_uni_w_h6_8_c:        9.7     3.5     2.7
    put_hevc_epel_uni_w_h8_8_c:        17.2    4.2     3.5
    put_hevc_epel_uni_w_h12_8_c:       38.0    11.5    7.2
    put_hevc_epel_uni_w_h16_8_c:       69.2    14.5    9.2
    put_hevc_epel_uni_w_h24_8_c:       152.0   34.7    22.5
    put_hevc_epel_uni_w_h32_8_c:       271.0   58.0    40.0
    put_hevc_epel_uni_w_h48_8_c:       597.5   136.7   95.0
    put_hevc_epel_uni_w_h64_8_c:       1074.0  252.2   168.0
    
    put_hevc_epel_bi_h4_8_c:           4.5     0.7
    put_hevc_epel_bi_h6_8_c:           9.0     1.5
    put_hevc_epel_bi_h8_8_c:           15.2    1.7
    put_hevc_epel_bi_h12_8_c:          33.5    4.2     3.7
    put_hevc_epel_bi_h16_8_c:          59.7    5.2     4.7
    put_hevc_epel_bi_h24_8_c:          132.2   11.0
    put_hevc_epel_bi_h32_8_c:          232.7   20.2    13.2
    put_hevc_epel_bi_h48_8_c:          521.7   45.2    31.2
    put_hevc_epel_bi_h64_8_c:          949.0   71.5    51.0
    
    After this patch, the performance of decoding H265 4K 30FPS
    30Mbps on 3A6000 with 8 threads improves by 1 fps (55 fps --> 56 fps).
    
    Change-Id: I8cc1e41daa63ca478039bc55d1ee8934a7423f51
    Reviewed-by: yinshiyou-hf@loongson.cn
    Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
hevcdsp_lsx.h 10.5 KB