1. 26 Mar, 2024 40 commits
    • avcodec/cbs_vp8: Improve the bitstream position check · 61afe4d9
      Dai, Jianhui J authored
      The VP8 compressed header may not be byte-aligned due to boolean
      coding. Round up byte count for accurate data positioning.
      Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
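The rounding described in the commit above can be sketched as follows. This is a minimal illustration, not the code from cbs_vp8; the helper name `bits_to_bytes_ceil` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper (not an FFmpeg function): number of whole bytes
 * needed to cover bit_pos bits. A position that is not byte-aligned
 * rounds up to the next byte boundary. */
static size_t bits_to_bytes_ceil(size_t bit_pos)
{
    return (bit_pos + 7) / 8;
}
```

With this rounding, a header that ends 9 bits in is treated as occupying 2 bytes, so the data that follows is located at a byte boundary past the partially used byte.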
    • avcodec/cbs_vp8: Use little endian in fixed() · 63dea3c1
      Dai, Jianhui J authored
      This commit adds value range checks to cbs_vp8_read_unsigned_le,
      migrates fixed() to use it, and enforces little-endian consistency for
      all read methods.
      Signed-off-by: Jianhui Dai <jianhui.j.dai@intel.com>
      Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
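A little-endian read with a value range check, as the commit above describes, might look roughly like this. It is a byte-granular sketch under assumptions: the function name `read_unsigned_le` and its signature are hypothetical and are not the cbs_vp8 API. A fixed-value check is the special case where the minimum and maximum are equal.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not the FFmpeg function: read nbytes (1..4) bytes
 * as a little-endian unsigned value and check it against [min, max].
 * Returns 0 on success, -1 on a bad argument or an out-of-range value. */
static int read_unsigned_le(const uint8_t *buf, size_t nbytes,
                            uint32_t min, uint32_t max, uint32_t *out)
{
    uint32_t value = 0;

    if (nbytes < 1 || nbytes > 4)
        return -1;

    /* The first byte in the stream is the least significant one. */
    for (size_t i = 0; i < nbytes; i++)
        value |= (uint32_t)buf[i] << (8 * i);

    if (value < min || value > max)
        return -1;

    *out = value;
    return 0;
}
```

For example, the VP8 key frame start code bytes 9d 01 2a read this way yield 0x2a019d, so a fixed-value check would pass 0x2a019d as both bounds.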
    • Wenbin Chen's avatar
      ea2e0e92
    • aarch64: hevc: Produce plain neon versions of qpel_bi_hv · f872b197
      Martin Storsjö authored
      As the plain neon qpel_h functions process two rows at a time,
      we need to allocate storage for h+8 rows instead of h+7.
      
      By allocating storage for h+8 rows, incrementing the stack
      pointer won't end up at the right spot in the end. Store the
      intended final stack pointer value in a register x14 which we
      store on the stack.
      
      AWS Graviton 3:
      put_hevc_qpel_bi_hv4_8_c: 385.7
      put_hevc_qpel_bi_hv4_8_neon: 131.0
      put_hevc_qpel_bi_hv4_8_i8mm: 92.2
      put_hevc_qpel_bi_hv6_8_c: 701.0
      put_hevc_qpel_bi_hv6_8_neon: 239.5
      put_hevc_qpel_bi_hv6_8_i8mm: 191.0
      put_hevc_qpel_bi_hv8_8_c: 1162.0
      put_hevc_qpel_bi_hv8_8_neon: 228.0
      put_hevc_qpel_bi_hv8_8_i8mm: 225.2
      put_hevc_qpel_bi_hv12_8_c: 2305.0
      put_hevc_qpel_bi_hv12_8_neon: 558.0
      put_hevc_qpel_bi_hv12_8_i8mm: 483.2
      put_hevc_qpel_bi_hv16_8_c: 3965.2
      put_hevc_qpel_bi_hv16_8_neon: 732.7
      put_hevc_qpel_bi_hv16_8_i8mm: 656.5
      put_hevc_qpel_bi_hv24_8_c: 8709.7
      put_hevc_qpel_bi_hv24_8_neon: 1555.2
      put_hevc_qpel_bi_hv24_8_i8mm: 1448.7
      put_hevc_qpel_bi_hv32_8_c: 14818.0
      put_hevc_qpel_bi_hv32_8_neon: 2763.7
      put_hevc_qpel_bi_hv32_8_i8mm: 2468.0
      put_hevc_qpel_bi_hv48_8_c: 32855.5
      put_hevc_qpel_bi_hv48_8_neon: 6107.2
      put_hevc_qpel_bi_hv48_8_i8mm: 5452.7
      put_hevc_qpel_bi_hv64_8_c: 57591.5
      put_hevc_qpel_bi_hv64_8_neon: 10660.2
      put_hevc_qpel_bi_hv64_8_i8mm: 9580.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
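The h+7 versus h+8 arithmetic in the commit above can be checked with a small sketch. The helper names are hypothetical, and the row width of 64 int16_t elements is an assumption made only for the size calculation.

```c
#include <stddef.h>
#include <stdint.h>

#define QPEL_EXTRA_ROWS  7   /* an 8-tap vertical filter needs h + 7 input rows */
#define ROW_STRIDE_ELEMS 64  /* assumed row width of the temporary buffer */

/* Rows actually written when the horizontal pass emits rows_per_iter rows
 * per iteration: the needed count rounded up to a multiple of it. */
static int rows_written(int h, int rows_per_iter)
{
    int needed = h + QPEL_EXTRA_ROWS;
    return (needed + rows_per_iter - 1) / rows_per_iter * rows_per_iter;
}

/* Bytes of temporary storage for that many rows of int16_t samples. */
static size_t tmp_bytes(int h, int rows_per_iter)
{
    return (size_t)rows_written(h, rows_per_iter)
           * ROW_STRIDE_ELEMS * sizeof(int16_t);
}
```

For an even block height, h + 7 is odd, so a pass that writes two rows per iteration ends up writing h + 8 rows, which is why the allocation has to grow by one row.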
    • aarch64: hevc: Produce plain neon versions of qpel_uni_w_hv · d21b9a04
      Martin Storsjö authored
      As the plain neon qpel_h functions process two rows at a time,
      we need to allocate storage for h+8 rows instead of h+7.
      
      AWS Graviton 3:
      put_hevc_qpel_uni_w_hv4_8_c: 422.2
      put_hevc_qpel_uni_w_hv4_8_neon: 140.7
      put_hevc_qpel_uni_w_hv4_8_i8mm: 100.7
      put_hevc_qpel_uni_w_hv8_8_c: 1208.0
      put_hevc_qpel_uni_w_hv8_8_neon: 268.2
      put_hevc_qpel_uni_w_hv8_8_i8mm: 261.5
      put_hevc_qpel_uni_w_hv16_8_c: 4297.2
      put_hevc_qpel_uni_w_hv16_8_neon: 802.2
      put_hevc_qpel_uni_w_hv16_8_i8mm: 731.2
      put_hevc_qpel_uni_w_hv32_8_c: 15518.5
      put_hevc_qpel_uni_w_hv32_8_neon: 3085.2
      put_hevc_qpel_uni_w_hv32_8_i8mm: 2783.2
      put_hevc_qpel_uni_w_hv64_8_c: 57254.5
      put_hevc_qpel_uni_w_hv64_8_neon: 11787.5
      put_hevc_qpel_uni_w_hv64_8_i8mm: 10659.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Produce plain neon versions of qpel_uni_hv · 5ab13867
      Martin Storsjö authored
      As the plain neon qpel_h functions process two rows at a time,
      we need to allocate storage for h+8 rows instead of h+7.
      
      By allocating storage for h+8 rows, incrementing the stack
      pointer won't end up at the right spot in the end. Store the
      intended final stack pointer value in a register x14 which we
      store on the stack.
      
      AWS Graviton 3:
      put_hevc_qpel_uni_hv4_8_c: 384.2
      put_hevc_qpel_uni_hv4_8_neon: 127.5
      put_hevc_qpel_uni_hv4_8_i8mm: 85.5
      put_hevc_qpel_uni_hv6_8_c: 705.5
      put_hevc_qpel_uni_hv6_8_neon: 224.5
      put_hevc_qpel_uni_hv6_8_i8mm: 176.2
      put_hevc_qpel_uni_hv8_8_c: 1136.5
      put_hevc_qpel_uni_hv8_8_neon: 216.5
      put_hevc_qpel_uni_hv8_8_i8mm: 214.0
      put_hevc_qpel_uni_hv12_8_c: 2259.5
      put_hevc_qpel_uni_hv12_8_neon: 498.5
      put_hevc_qpel_uni_hv12_8_i8mm: 410.7
      put_hevc_qpel_uni_hv16_8_c: 3824.7
      put_hevc_qpel_uni_hv16_8_neon: 670.0
      put_hevc_qpel_uni_hv16_8_i8mm: 603.7
      put_hevc_qpel_uni_hv24_8_c: 8113.5
      put_hevc_qpel_uni_hv24_8_neon: 1474.7
      put_hevc_qpel_uni_hv24_8_i8mm: 1351.5
      put_hevc_qpel_uni_hv32_8_c: 14744.5
      put_hevc_qpel_uni_hv32_8_neon: 2599.7
      put_hevc_qpel_uni_hv32_8_i8mm: 2266.0
      put_hevc_qpel_uni_hv48_8_c: 32800.0
      put_hevc_qpel_uni_hv48_8_neon: 5650.0
      put_hevc_qpel_uni_hv48_8_i8mm: 5011.7
      put_hevc_qpel_uni_hv64_8_c: 57856.2
      put_hevc_qpel_uni_hv64_8_neon: 9863.5
      put_hevc_qpel_uni_hv64_8_i8mm: 8767.7
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Produce plain neon versions of qpel_hv · 5cbeefc7
      Martin Storsjö authored
      As the plain neon qpel_h functions process two rows at a time,
      we need to allocate storage for h+8 rows instead of h+7.
      
      By allocating storage for h+8 rows, incrementing the stack
      pointer won't end up at the right spot in the end. Store the
      intended final stack pointer value in a register x14 which we
      store on the stack.
      
      AWS Graviton 3:
      put_hevc_qpel_hv4_8_c: 386.0
      put_hevc_qpel_hv4_8_neon: 125.7
      put_hevc_qpel_hv4_8_i8mm: 83.2
      put_hevc_qpel_hv6_8_c: 749.0
      put_hevc_qpel_hv6_8_neon: 207.0
      put_hevc_qpel_hv6_8_i8mm: 166.0
      put_hevc_qpel_hv8_8_c: 1305.2
      put_hevc_qpel_hv8_8_neon: 216.5
      put_hevc_qpel_hv8_8_i8mm: 213.0
      put_hevc_qpel_hv12_8_c: 2570.5
      put_hevc_qpel_hv12_8_neon: 480.0
      put_hevc_qpel_hv12_8_i8mm: 398.2
      put_hevc_qpel_hv16_8_c: 4158.7
      put_hevc_qpel_hv16_8_neon: 659.7
      put_hevc_qpel_hv16_8_i8mm: 593.5
      put_hevc_qpel_hv24_8_c: 8626.7
      put_hevc_qpel_hv24_8_neon: 1653.5
      put_hevc_qpel_hv24_8_i8mm: 1398.7
      put_hevc_qpel_hv32_8_c: 14646.0
      put_hevc_qpel_hv32_8_neon: 2566.2
      put_hevc_qpel_hv32_8_i8mm: 2287.5
      put_hevc_qpel_hv48_8_c: 31072.5
      put_hevc_qpel_hv48_8_neon: 6228.5
      put_hevc_qpel_hv48_8_i8mm: 5291.0
      put_hevc_qpel_hv64_8_c: 53847.2
      put_hevc_qpel_hv64_8_neon: 9856.7
      put_hevc_qpel_hv64_8_i8mm: 8831.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Reorder qpel_hv functions to prepare for templating · 20c38f4b
      Martin Storsjö authored
      This is a pure reordering of code without changing anything in
      the individual functions.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Deduplicate the hevc_put_hevc_qpel_uni_w_hv*_8_end_neon functions · 4f71e4eb
      Martin Storsjö authored
      The hv32 and hv64 functions were identical - both loop and
      process 16 pixels at a time.
      
      The hv16 function was near identical, except for the outer loop
      (and using sp instead of a separate register).
      
      Given the size of these functions, the extra cost of the outer
      loop is negligible, so use the same function for hv16 as well.
      
      This removes over 200 lines of duplicated assembly, and over 4 KB
      of binary size.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Split the qpel_*_hv functions into two parts · 4063e50e
      Martin Storsjö authored
      The first horizontal filter can use either i8mm or plain neon
      versions, while the second part is a pure neon implementation.
      Signed-off-by: Martin Storsjö <martin@martin.st>
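The two-part structure described above (a selectable first pass, a shared second pass) can be illustrated in C with a function pointer. This is a toy sketch under assumptions, not the assembly: it uses a 2-tap sum in each direction instead of the 8-tap HEVC filters, and all names are hypothetical.

```c
#include <stdint.h>

#define BLK_W      4            /* toy block width */
#define MAX_H      8            /* toy maximum block height */
#define SRC_STRIDE (BLK_W + 1)  /* a 2-tap filter reads 1 extra pixel per row */

typedef void (*h_pass_fn)(int16_t *tmp, const uint8_t *src, int rows);

/* First-pass variant A: indexed loads. */
static void h_pass_indexed(int16_t *tmp, const uint8_t *src, int rows)
{
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < BLK_W; x++)
            tmp[y * BLK_W + x] = (int16_t)(src[y * SRC_STRIDE + x] +
                                           src[y * SRC_STRIDE + x + 1]);
}

/* First-pass variant B: same result, computed by walking pointers. */
static void h_pass_pointer(int16_t *tmp, const uint8_t *src, int rows)
{
    for (int y = 0; y < rows; y++, src += SRC_STRIDE)
        for (int x = 0; x < BLK_W; x++)
            *tmp++ = (int16_t)(src[x] + src[x + 1]);
}

/* Second pass, shared by both variants: vertical 2-tap sum. */
static void v_pass(int16_t *dst, const int16_t *tmp, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < BLK_W; x++)
            dst[y * BLK_W + x] = (int16_t)(tmp[y * BLK_W + x] +
                                           tmp[(y + 1) * BLK_W + x]);
}

/* The caller picks the first pass; h must not exceed MAX_H. */
static void filter_hv(h_pass_fn h_pass, int16_t *dst, const uint8_t *src, int h)
{
    int16_t tmp[(MAX_H + 1) * BLK_W];

    h_pass(tmp, src, h + 1);  /* a 2-tap vertical filter needs h + 1 rows */
    v_pass(dst, tmp, h);
}
```

Only the first pass differs between variants, so the second pass exists once and both entry points produce identical output.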
    • aarch64: hevc: Implement a neon version of hevc_qpel_uni_w_h*_8 · ad01d06f
      Martin Storsjö authored
      AWS Graviton 3:
      put_hevc_qpel_uni_w_h4_8_c: 159.0
      put_hevc_qpel_uni_w_h4_8_neon: 64.2
      put_hevc_qpel_uni_w_h4_8_i8mm: 40.0
      put_hevc_qpel_uni_w_h6_8_c: 344.7
      put_hevc_qpel_uni_w_h6_8_neon: 114.5
      put_hevc_qpel_uni_w_h6_8_i8mm: 82.0
      put_hevc_qpel_uni_w_h8_8_c: 596.2
      put_hevc_qpel_uni_w_h8_8_neon: 132.2
      put_hevc_qpel_uni_w_h8_8_i8mm: 106.0
      put_hevc_qpel_uni_w_h12_8_c: 1325.0
      put_hevc_qpel_uni_w_h12_8_neon: 299.0
      put_hevc_qpel_uni_w_h12_8_i8mm: 211.5
      put_hevc_qpel_uni_w_h16_8_c: 2300.0
      put_hevc_qpel_uni_w_h16_8_neon: 422.0
      put_hevc_qpel_uni_w_h16_8_i8mm: 286.2
      put_hevc_qpel_uni_w_h24_8_c: 5059.0
      put_hevc_qpel_uni_w_h24_8_neon: 912.2
      put_hevc_qpel_uni_w_h24_8_i8mm: 664.2
      put_hevc_qpel_uni_w_h32_8_c: 9198.2
      put_hevc_qpel_uni_w_h32_8_neon: 1638.2
      put_hevc_qpel_uni_w_h32_8_i8mm: 1033.7
      put_hevc_qpel_uni_w_h48_8_c: 20754.7
      put_hevc_qpel_uni_w_h48_8_neon: 3633.7
      put_hevc_qpel_uni_w_h48_8_i8mm: 2300.7
      put_hevc_qpel_uni_w_h64_8_c: 36854.7
      put_hevc_qpel_uni_w_h64_8_neon: 6435.7
      put_hevc_qpel_uni_w_h64_8_i8mm: 4039.2
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Produce epel_bi_hv functions for both neon and i8mm · de23b384
      Martin Storsjö authored
      In addition to just templating, this contains one change to
      ff_hevc_put_hevc_epel_bi_hv32_8, by setting the w6 register
      which ff_hevc_put_hevc_epel_h32_8_neon requires.
      
      AWS Graviton 3:
      put_hevc_epel_bi_hv4_8_c: 176.5
      put_hevc_epel_bi_hv4_8_neon: 62.0
      put_hevc_epel_bi_hv4_8_i8mm: 58.0
      put_hevc_epel_bi_hv6_8_c: 343.7
      put_hevc_epel_bi_hv6_8_neon: 109.7
      put_hevc_epel_bi_hv6_8_i8mm: 105.7
      put_hevc_epel_bi_hv8_8_c: 536.0
      put_hevc_epel_bi_hv8_8_neon: 112.7
      put_hevc_epel_bi_hv8_8_i8mm: 111.7
      put_hevc_epel_bi_hv12_8_c: 1107.7
      put_hevc_epel_bi_hv12_8_neon: 254.7
      put_hevc_epel_bi_hv12_8_i8mm: 239.0
      put_hevc_epel_bi_hv16_8_c: 1927.7
      put_hevc_epel_bi_hv16_8_neon: 356.2
      put_hevc_epel_bi_hv16_8_i8mm: 334.2
      put_hevc_epel_bi_hv24_8_c: 4195.2
      put_hevc_epel_bi_hv24_8_neon: 736.7
      put_hevc_epel_bi_hv24_8_i8mm: 715.5
      put_hevc_epel_bi_hv32_8_c: 7280.5
      put_hevc_epel_bi_hv32_8_neon: 1287.7
      put_hevc_epel_bi_hv32_8_i8mm: 1162.2
      put_hevc_epel_bi_hv48_8_c: 16857.7
      put_hevc_epel_bi_hv48_8_neon: 2836.2
      put_hevc_epel_bi_hv48_8_i8mm: 2908.5
      put_hevc_epel_bi_hv64_8_c: 29248.2
      put_hevc_epel_bi_hv64_8_neon: 5051.7
      put_hevc_epel_bi_hv64_8_i8mm: 4491.5
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Produce epel_uni_w_hv functions for both neon and i8mm · 96e5adda
      Martin Storsjö authored
      AWS Graviton 3:
      put_hevc_epel_uni_w_hv4_8_c: 191.2
      put_hevc_epel_uni_w_hv4_8_neon: 87.7
      put_hevc_epel_uni_w_hv4_8_i8mm: 83.2
      put_hevc_epel_uni_w_hv6_8_c: 349.5
      put_hevc_epel_uni_w_hv6_8_neon: 153.0
      put_hevc_epel_uni_w_hv6_8_i8mm: 148.5
      put_hevc_epel_uni_w_hv8_8_c: 581.2
      put_hevc_epel_uni_w_hv8_8_neon: 166.7
      put_hevc_epel_uni_w_hv8_8_i8mm: 163.5
      put_hevc_epel_uni_w_hv12_8_c: 1230.0
      put_hevc_epel_uni_w_hv12_8_neon: 387.7
      put_hevc_epel_uni_w_hv12_8_i8mm: 370.2
      put_hevc_epel_uni_w_hv16_8_c: 2003.2
      put_hevc_epel_uni_w_hv16_8_neon: 501.5
      put_hevc_epel_uni_w_hv16_8_i8mm: 490.2
      put_hevc_epel_uni_w_hv24_8_c: 4448.7
      put_hevc_epel_uni_w_hv24_8_neon: 1092.2
      put_hevc_epel_uni_w_hv24_8_i8mm: 1069.7
      put_hevc_epel_uni_w_hv32_8_c: 7817.2
      put_hevc_epel_uni_w_hv32_8_neon: 1916.2
      put_hevc_epel_uni_w_hv32_8_i8mm: 1829.5
      put_hevc_epel_uni_w_hv48_8_c: 16728.2
      put_hevc_epel_uni_w_hv48_8_neon: 4263.7
      put_hevc_epel_uni_w_hv48_8_i8mm: 4342.7
      put_hevc_epel_uni_w_hv64_8_c: 29563.2
      put_hevc_epel_uni_w_hv64_8_neon: 7474.2
      put_hevc_epel_uni_w_hv64_8_i8mm: 7128.5
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Produce epel_uni_hv functions for both neon and i8mm · d7294199
      Martin Storsjö authored
      AWS Graviton 3:
      put_hevc_epel_uni_hv4_8_c: 163.5
      put_hevc_epel_uni_hv4_8_neon: 59.7
      put_hevc_epel_uni_hv4_8_i8mm: 57.5
      put_hevc_epel_uni_hv6_8_c: 344.7
      put_hevc_epel_uni_hv6_8_neon: 105.0
      put_hevc_epel_uni_hv6_8_i8mm: 102.7
      put_hevc_epel_uni_hv8_8_c: 552.2
      put_hevc_epel_uni_hv8_8_neon: 111.2
      put_hevc_epel_uni_hv8_8_i8mm: 104.0
      put_hevc_epel_uni_hv12_8_c: 1195.0
      put_hevc_epel_uni_hv12_8_neon: 248.7
      put_hevc_epel_uni_hv12_8_i8mm: 229.5
      put_hevc_epel_uni_hv16_8_c: 1910.2
      put_hevc_epel_uni_hv16_8_neon: 339.5
      put_hevc_epel_uni_hv16_8_i8mm: 323.2
      put_hevc_epel_uni_hv24_8_c: 4048.2
      put_hevc_epel_uni_hv24_8_neon: 737.7
      put_hevc_epel_uni_hv24_8_i8mm: 713.7
      put_hevc_epel_uni_hv32_8_c: 6865.7
      put_hevc_epel_uni_hv32_8_neon: 1285.0
      put_hevc_epel_uni_hv32_8_i8mm: 1206.0
      put_hevc_epel_uni_hv48_8_c: 15830.5
      put_hevc_epel_uni_hv48_8_neon: 2844.7
      put_hevc_epel_uni_hv48_8_i8mm: 2914.0
      put_hevc_epel_uni_hv64_8_c: 27912.7
      put_hevc_epel_uni_hv64_8_neon: 4970.5
      put_hevc_epel_uni_hv64_8_i8mm: 4653.7
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Produce epel_hv functions for both plain neon and i8mm · 7bf3d147
      Martin Storsjö authored
      AWS Graviton 3:
      put_hevc_epel_hv4_8_c: 163.7
      put_hevc_epel_hv4_8_neon: 52.5
      put_hevc_epel_hv4_8_i8mm: 49.5
      put_hevc_epel_hv6_8_c: 292.2
      put_hevc_epel_hv6_8_neon: 97.7
      put_hevc_epel_hv6_8_i8mm: 101.2
      put_hevc_epel_hv8_8_c: 471.0
      put_hevc_epel_hv8_8_neon: 106.7
      put_hevc_epel_hv8_8_i8mm: 102.5
      put_hevc_epel_hv12_8_c: 1030.2
      put_hevc_epel_hv12_8_neon: 240.5
      put_hevc_epel_hv12_8_i8mm: 215.0
      put_hevc_epel_hv16_8_c: 1711.5
      put_hevc_epel_hv16_8_neon: 340.2
      put_hevc_epel_hv16_8_i8mm: 319.2
      put_hevc_epel_hv24_8_c: 3670.0
      put_hevc_epel_hv24_8_neon: 702.0
      put_hevc_epel_hv24_8_i8mm: 666.5
      put_hevc_epel_hv32_8_c: 6785.5
      put_hevc_epel_hv32_8_neon: 1247.0
      put_hevc_epel_hv32_8_i8mm: 1169.0
      put_hevc_epel_hv48_8_c: 14689.7
      put_hevc_epel_hv48_8_neon: 2665.2
      put_hevc_epel_hv48_8_i8mm: 2740.0
      put_hevc_epel_hv64_8_c: 25899.2
      put_hevc_epel_hv64_8_neon: 4801.2
      put_hevc_epel_hv64_8_i8mm: 4487.7
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Reorder epel_hv functions to prepare for templating · 5b5666e5
      Martin Storsjö authored
      This is a pure reordering of code without changing anything in
      the individual functions.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Split the epel_*_hv functions into two parts · e6d4c0e1
      Martin Storsjö authored
      The first horizontal filter can use either i8mm or plain neon
      versions, while the second part is a pure neon implementation.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Implement a neon version of hevc_epel_uni_w_h*_8 · 54af555b
      Martin Storsjö authored
      AWS Graviton 3:
      put_hevc_epel_uni_w_h4_8_c: 97.2
      put_hevc_epel_uni_w_h4_8_neon: 41.2
      put_hevc_epel_uni_w_h4_8_i8mm: 35.2
      put_hevc_epel_uni_w_h6_8_c: 203.7
      put_hevc_epel_uni_w_h6_8_neon: 84.7
      put_hevc_epel_uni_w_h6_8_i8mm: 74.7
      put_hevc_epel_uni_w_h8_8_c: 345.7
      put_hevc_epel_uni_w_h8_8_neon: 94.0
      put_hevc_epel_uni_w_h8_8_i8mm: 80.7
      put_hevc_epel_uni_w_h12_8_c: 768.7
      put_hevc_epel_uni_w_h12_8_neon: 196.7
      put_hevc_epel_uni_w_h12_8_i8mm: 169.7
      put_hevc_epel_uni_w_h16_8_c: 1313.0
      put_hevc_epel_uni_w_h16_8_neon: 290.7
      put_hevc_epel_uni_w_h16_8_i8mm: 238.0
      put_hevc_epel_uni_w_h24_8_c: 2877.5
      put_hevc_epel_uni_w_h24_8_neon: 650.0
      put_hevc_epel_uni_w_h24_8_i8mm: 512.0
      put_hevc_epel_uni_w_h32_8_c: 5113.5
      put_hevc_epel_uni_w_h32_8_neon: 1129.5
      put_hevc_epel_uni_w_h32_8_i8mm: 739.2
      put_hevc_epel_uni_w_h48_8_c: 11757.0
      put_hevc_epel_uni_w_h48_8_neon: 2518.7
      put_hevc_epel_uni_w_h48_8_i8mm: 1688.5
      put_hevc_epel_uni_w_h64_8_c: 20478.0
      put_hevc_epel_uni_w_h64_8_neon: 4411.7
      put_hevc_epel_uni_w_h64_8_i8mm: 2884.0
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Implement a neon version of put_hevc_epel_h*_8 · 6d384298
      Martin Storsjö authored
      AWS Graviton 3:
      put_hevc_epel_h4_8_c: 64.7
      put_hevc_epel_h4_8_neon: 25.0
      put_hevc_epel_h4_8_i8mm: 21.2
      put_hevc_epel_h6_8_c: 130.0
      put_hevc_epel_h6_8_neon: 40.7
      put_hevc_epel_h6_8_i8mm: 36.5
      put_hevc_epel_h8_8_c: 209.0
      put_hevc_epel_h8_8_neon: 45.2
      put_hevc_epel_h8_8_i8mm: 41.2
      put_hevc_epel_h12_8_c: 465.5
      put_hevc_epel_h12_8_neon: 104.5
      put_hevc_epel_h12_8_i8mm: 86.5
      put_hevc_epel_h16_8_c: 830.7
      put_hevc_epel_h16_8_neon: 134.2
      put_hevc_epel_h16_8_i8mm: 114.0
      put_hevc_epel_h24_8_c: 1844.7
      put_hevc_epel_h24_8_neon: 282.2
      put_hevc_epel_h24_8_i8mm: 277.2
      put_hevc_epel_h32_8_c: 3227.5
      put_hevc_epel_h32_8_neon: 501.5
      put_hevc_epel_h32_8_i8mm: 396.0
      put_hevc_epel_h48_8_c: 7229.2
      put_hevc_epel_h48_8_neon: 1120.2
      put_hevc_epel_h48_8_i8mm: 901.2
      put_hevc_epel_h64_8_c: 12869.0
      put_hevc_epel_h64_8_neon: 1999.2
      put_hevc_epel_h64_8_i8mm: 1610.5
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • Martin Storsjö's avatar
    • aarch64: hevc: Specialize put_hevc_\type\()_h*_8_neon for horizontal looping · 717cc82d
      Martin Storsjö authored
      For widths of 32 pixels and more, loop first horizontally,
      then vertically.
      
      Previously, this function would process a 16 pixel wide slice
      of the block, looping vertically. After processing the whole
      height, it would backtrack and process the next 16 pixel wide
      slice.
      
      When doing 8tap filtering horizontally, the function must load
      7 more pixels (in practice, 8) following the actual inputs, and
      this was done for each slice.
      
      By iterating first horizontally throughout each line, then
      vertically, we access data in a more cache friendly order, and
      we don't need to reload data unnecessarily.
      
      Keep the original order in put_hevc_\type\()_h12_8_neon; the
      only suboptimal case there is for width=24. But specializing
      an optimal variant for that would require more code, which
      might not be worth it.
      
      For the h16 case, this implementation would give a slowdown,
      as it now loads the first 8 pixels separately from the rest, but
      for larger widths, it is a gain. Therefore, keep the h16 case
      as it was (but remove the outer loop), and create a new specialized
      version for horizontal looping with 16 pixels at a time.
      
      Before:                  Cortex A53      A72      A73  Graviton 3
      put_hevc_qpel_h16_8_neon:     710.5    667.7    692.5   211.0
      put_hevc_qpel_h32_8_neon:    2791.5   2643.5   2732.0   883.5
      put_hevc_qpel_h64_8_neon:   10954.0  10657.0  10874.2  3241.5
      After:
      put_hevc_qpel_h16_8_neon:     697.5    663.5    705.7   212.5
      put_hevc_qpel_h32_8_neon:    2767.2   2684.5   2791.2   920.5
      put_hevc_qpel_h64_8_neon:   10559.2  10471.5  10932.2  3051.7
      Signed-off-by: Martin Storsjö <martin@martin.st>
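The change in traversal order described in the commit above can be sketched in C. This is an illustration under assumptions, not the assembly: the filter is a placeholder (a plain sum of 8 neighbouring pixels rather than the HEVC taps), the width is assumed to be a multiple of 16, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TAPS  8   /* an 8-tap filter reads 7 pixels beyond each output span */
#define SLICE 16  /* pixels processed per inner step */

/* Placeholder filter: sum of 8 neighbouring pixels (not the HEVC taps). */
static int16_t filter8(const uint8_t *src)
{
    int sum = 0;
    for (int i = 0; i < TAPS; i++)
        sum += src[i];
    return (int16_t)sum;
}

/* Old order: finish one 16 pixel wide slice over the full height,
 * then backtrack and start the next slice from the top. */
static void filter_slices_first(int16_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *src, ptrdiff_t src_stride,
                                int w, int h)
{
    for (int x0 = 0; x0 < w; x0 += SLICE)
        for (int y = 0; y < h; y++)
            for (int x = x0; x < x0 + SLICE; x++)
                dst[y * dst_stride + x] = filter8(src + y * src_stride + x);
}

/* New order: walk each line left to right, then move to the next line. */
static void filter_rows_first(int16_t *dst, ptrdiff_t dst_stride,
                              const uint8_t *src, ptrdiff_t src_stride,
                              int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x0 = 0; x0 < w; x0 += SLICE)
            for (int x = x0; x < x0 + SLICE; x++)
                dst[y * dst_stride + x] = filter8(src + y * src_stride + x);
}
```

Both orders produce the same output. The difference is the memory access pattern: the row-first order touches each source line once, contiguously, which is what makes it possible for an optimized version to carry the overlapping pixels between adjacent 16 pixel steps instead of reloading them.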
    • aarch64: hevc: Merge consecutive stores in put_hevc_\type\()_h16_8_neon · e3a54cab
      Martin Storsjö authored
      This gets rid of a couple instructions, but the actual performance
      is almost identical on Cortex A72/A73. On Cortex A53, it is a
      handful of cycles faster.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Don't iterate with sp in ff_hevc_put_hevc_qpel_uni_w_hv32/64_8_neon_i8mm · 78db8405
      Martin Storsjö authored
      Many of the routines within hevcdsp_epel_neon and hevcdsp_qpel_neon
      store temporary buffers on the stack. When consuming it,
      many of these functions use the stack pointer as incremental pointer
      for reading the data (instead of storing it in another register),
      which is rather unusual.
      
      Technically, this is fine as long as the pointer remains properly
      aligned.
      
      However in the case of ff_hevc_put_hevc_qpel_uni_w_hv64_8_neon_i8mm,
      after incrementing sp when reading data (within each 16 pixel
      wide stripe) it would then reset the stack pointer back to a lower
      value, for reading the next 16 pixel wide stripe, expecting the
      data to remain untouched.
      
      This can't be assumed; data on the stack below the stack pointer
      can be clobbered (e.g. by a signal handler). Some OS ABIs
      allow for a little margin that won't be touched, aka a red zone,
      but not all do. The ones that do, guarantee 16 or 128 bytes, not
      9 KB.
      
      Convert this function to use a separate pointer register to
      iterate through the data, retaining the stack pointer to point
      at the bottom of the data we require to remain untouched.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • aarch64: hevc: Reorder a misplaced function init line · e66858fb
      Martin Storsjö authored
      Group the epel and qpel functions together.
      Signed-off-by: Martin Storsjö <martin@martin.st>
    • fftools/ffmpeg_mux_init: Fix double-free on error · ced5c5fd
      Andreas Rheinhardt authored
      MATCH_PER_STREAM_OPT iterates over all options of a given
      OptionDef and tests whether they apply to the current stream;
      if so, they are set to ost->apad, otherwise, the code errors
      out. If no error happens, ost->apad is av_strdup'ed in order
      to take ownership of this pointer.
      
      But this means that setting it originally was premature,
      as it leads to double-frees when an error happens later on.
      This can simply be reproduced with
      ffmpeg -filter_complex anullsrc  -apad bar -apad:n baz -f null -
      This is a regression since 83ace80b.
      
      Fix this by using a temporary variable instead of directly
      setting ost->apad. Also only strdup the string if it actually
      is != NULL.
      Reviewed-by: Marth64 <marth64@proxyid.net>
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
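The fix pattern from the commit above (hold the borrowed pointer in a temporary, and only store an owned copy once nothing can fail) can be sketched like this. The struct, the function names and the `lookup_failed` parameter are hypothetical stand-ins; this is not the ffmpeg_mux_init code.

```c
#include <stdlib.h>
#include <string.h>

struct stream {
    char *apad;  /* owned by the stream once set; freed on cleanup */
};

/* Portable string duplication, so the sketch does not depend on strdup(). */
static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy)
        memcpy(copy, s, len);
    return copy;
}

/* borrowed is owned by the option table, not by the stream.
 * lookup_failed simulates an error raised while matching options. */
static int set_apad(struct stream *st, const char *borrowed, int lookup_failed)
{
    const char *tmp = borrowed;  /* temporary: never stored in st as-is */

    if (lookup_failed)
        return -1;               /* st->apad untouched, so cleanup cannot
                                    free memory the stream does not own */
    if (tmp) {                   /* only duplicate an actual string */
        st->apad = dup_string(tmp);
        if (!st->apad)
            return -1;
    }
    return 0;
}
```

Because the stream field only ever holds a pointer the stream owns, an error path that frees `st->apad` can no longer free the option table's string a second time.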
    • avformat/internal: Move FF_FMT_INIT_CLEANUP to demux.h · 4a4dcde3
      Andreas Rheinhardt authored
      and rename it to FF_INFMT_INIT_CLEANUP. This flag is demuxer-only,
      so this is the more appropriate place for it.
      This does not preclude adding internal flags common to both
      demuxer and muxer in the future.
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • avformat/vqf: Return 0 on success in read_packet · 27af88fb
      Andreas Rheinhardt authored
      Demuxers are not supposed to return the size of the packet read.
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
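The return convention stated above can be shown with a toy reader. The struct and function here are hypothetical and do not use the libavformat API; the point is only that the size is reported through the packet while the return value signals success or failure.

```c
#include <stddef.h>
#include <string.h>

struct packet {
    unsigned char data[64];
    size_t size;
};

/* Hypothetical reader: copies want bytes from src into the packet.
 * Returns 0 on success and a negative value on error. */
static int read_packet(struct packet *pkt, const unsigned char *src,
                       size_t avail, size_t want)
{
    if (want > sizeof(pkt->data) || want > avail)
        return -1;        /* placeholder error code */

    memcpy(pkt->data, src, want);
    pkt->size = want;     /* the size travels in the packet ... */
    return 0;             /* ... the return value is not a byte count */
}
```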
    • Andreas Rheinhardt's avatar
    • Andreas Rheinhardt's avatar
      cee70b9f
    • avformat/argo_cvg: Avoid relocations for ArgoCVGOverride · aa8c7dc3
      Andreas Rheinhardt authored
      The average length of the strings used here does not differ much
      from the length of the longest string; therefore it makes sense
      to use an array big enough for the longest string and not
      a pointer to a string. This also moves this array into .rodata
      (from .data.rel.ro).
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
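The trade-off described above, a fixed-size character array inside the struct versus a pointer to a string, can be sketched as follows. The struct names, table entries and lookup helper are all hypothetical and are not the ArgoCVGOverride definition.

```c
#include <string.h>

/* Pointer member: in position-independent builds, every table entry
 * needs a load-time relocation to fix up the address of its string. */
struct override_ptr {
    const char *name;
    int         value;
};

/* Array member: the characters live inside the struct, so a const table
 * holds no addresses and needs no relocations. The array is sized for
 * the longest name, including the terminating NUL. */
struct override_arr {
    char name[12];
    int  value;
};

static const struct override_arr overrides[] = {
    { "ALPHA.CVG",   1 },
    { "BRAVO01.CVG", 2 },
};

/* Returns the value for name, or -1 if there is no matching entry. */
static int lookup_override(const char *name)
{
    for (size_t i = 0; i < sizeof(overrides) / sizeof(overrides[0]); i++)
        if (!strcmp(overrides[i].name, name))
            return overrides[i].value;
    return -1;
}
```

The array form wastes a few bytes per entry on padding for shorter names, which is acceptable when, as the commit notes, the name lengths are close to the longest one.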
    • Andreas Rheinhardt's avatar
      69b85a69
    • Andreas Rheinhardt's avatar
      cdff5a2c
    • avformat/fsb: Don't set data_offset manually · 56ba83ff
      Andreas Rheinhardt authored
      It is already set generically to the same value that was being set here.
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • Andreas Rheinhardt's avatar
    • avformat/g722: Inline constants · 87681885
      Andreas Rheinhardt authored
      Forgotten in 5f0e161d.
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • avformat/fitsdec: Don't use AVBPrint for temporary storage · b93ed5c2
      Andreas Rheinhardt authored
      Most of the data in the temporary storage ends up being
      returned to the user as AVPacket.data, so it makes sense
      to avoid using the AVBPrint for temporary storage altogether
      (in particular in light of the fact that the blocks read here
      are too big for the small-string optimization anyway) and
      read the data directly into AVPacket.data. This also avoids
      another memcpy() from a stack buffer to the AVBPrint in ts_image()
      (that could always have been avoided with av_bprint_get_buffer()).
      
      These changes also make it possible to use av_append_packet(), which
      greatly simplifies the code; furthermore, one can avoid cleanup
      code on error as the packet is already unreferenced generically
      on error.
      
      There are two user-visible changes from this patch:
      1. Truncated packets are now marked as corrupt.
      2. AVPacket.pos is set (it corresponds to the discarded header
      line, 80 bytes before the position corresponding to the
      actual packet data).
      
      Furthermore, this patch also removes code that triggered
      a -Wtautological-constant-out-of-range-compare warning
      from Clang (namely a comparison of an unsigned and INT64_MAX
      in an assert).
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • avformat/hls: Don't access FFInputFormat.raw_codec_id · 5144455c
      Andreas Rheinhardt authored
      It is an implementation detail of other input formats whether
      they use raw_codec_id or not. The HLS demuxer should not rely
      on this.
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • configure: Make hls demuxer select AAC, AC3 and EAC3 demuxers · 8d8b5947
      Andreas Rheinhardt authored
      The code relies on their presence and would presumably crash
      when retrieving in_fmt->name if an encrypted stream with a codec id
      without demuxer were encountered.
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • avformat/mux: Remove check for AVFMT_ALLOW_FLUSH · a990e6fa
      Andreas Rheinhardt authored
      Due to the bump it is now certain that all devices
      that support flushing have the proper internal flag set.
      (Notice that the check for LIBAVFORMAT_VERSION was wrong.)
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    • avformat/file: Combine all CONFIG_ANDROID_CONTENT_PROTOCOL blocks · e95dd6f5
      Andreas Rheinhardt authored
      Besides improving readability this also ensures that
      a developer who has the android content protocol enabled
      and works on the other parts of the file will not
      forget to add necessary inclusions just because of
      (indirect) inclusions from the files included only
      when said protocol is enabled.
      Reviewed-by: Matthieu Bouron <matthieu.bouron@gmail.com>
      Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>