1. 25 May, 2024 3 commits
  2. 24 May, 2024 5 commits
  3. 23 May, 2024 6 commits
  4. 22 May, 2024 8 commits
    • tests/checkasm: Add check_vvc_sad to vvc_mc.c · 2e877090
      Stone Chen authored
      Adds checkasm for DMVR SAD AVX2 implementation.
      
      Benchmarks ( AMD 7940HS )
      vvc_sad_8x8_c: 50.3
      vvc_sad_8x8_avx2: 0.3
      vvc_sad_16x16_c: 250.3
      vvc_sad_16x16_avx2: 10.3
      vvc_sad_32x32_c: 1020.3
      vvc_sad_32x32_avx2: 60.3
      vvc_sad_64x64_c: 3850.3
      vvc_sad_64x64_avx2: 220.3
      vvc_sad_128x128_c: 14100.3
      vvc_sad_128x128_avx2: 840.3
      Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
      Signed-off-by: James Almer <jamrial@gmail.com>
      2e877090
    • libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC · 0e52a4e4
      Stone Chen authored
      Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.
      
      Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.
      
      Benchmarks ( AMD 7940HS )
      Before:
      BQTerrace_1920x1080_60_10_420_22_RA.vvc | 106.0 |
      Chimera_8bit_1080P_1000_frames.vvc | 204.3 |
      NovosobornayaSquare_1920x1080.bin | 197.3 |
      RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |
      
      After:
      BQTerrace_1920x1080_60_10_420_22_RA.vvc | 109.3 |
      Chimera_8bit_1080P_1000_frames.vvc | 216.0 |
      NovosobornayaSquare_1920x1080.bin | 204.0 |
      RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |
      Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
      Signed-off-by: James Almer <jamrial@gmail.com>
      0e52a4e4
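As a rough illustration of the commit message above, the following scalar C sketch computes a SAD over even rows of 16-bit samples and checks the size rule (w >= 8, h >= 8, w * h > 128). The names `dmvr_sad_applies` and `sad_even_rows`, the argument layout, and the stride convention are hypothetical and are not taken from the FFmpeg sources; the absolute difference is written as max minus min to mirror the min/max/sub approach the message mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Size rule quoted in the commit message: w >= 8, h >= 8, w * h > 128. */
static int dmvr_sad_applies(int w, int h)
{
    return w >= 8 && h >= 8 && w * h > 128;
}

/*
 * Hypothetical scalar reference, not the FFmpeg function.
 * Samples are 16-bit regardless of the video bit depth;
 * stride is counted in int16_t elements.
 */
static int sad_even_rows(const int16_t *a, const int16_t *b,
                         ptrdiff_t stride, int w, int h)
{
    int sad = 0;

    for (int y = 0; y < h; y += 2) {            /* even rows only */
        const int16_t *ra = a + y * stride;
        const int16_t *rb = b + y * stride;

        for (int x = 0; x < w; x++) {
            int p  = ra[x], q = rb[x];
            int hi = p > q ? p : q;             /* max(p, q) */
            int lo = p > q ? q : p;             /* min(p, q) */
            sad += hi - lo;                     /* |p - q| without abs() */
        }
    }
    return sad;
}
```

Skipping odd rows halves the work per block, which matches the stated goal of reducing complexity.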
    • avformat/mov: store sample_sizes as unsigned ints · 3146b77a
      James Almer authored
      As defined in Section 8.7.3.2.1 of ISO 14496-12.
      Any unsupported value will be rejected in mov_build_index() without outright
      aborting demuxing.
      
      Fixes ticket #11005.
      Signed-off-by: James Almer <jamrial@gmail.com>
      3146b77a
    • avformat/vvc: fix parsing sps_subpic_id · 2d84ee37
      James Almer authored
      The length of the sps_subpic_id[i] syntax element is sps_subpic_id_len_minus1 + 1 bits.
      Signed-off-by: James Almer <jamrial@gmail.com>
      2d84ee37
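The length rule in the commit above can be sketched with a tiny MSB-first bit reader. `BitReader`, `read_bits`, and `read_sps_subpic_id` are hypothetical helpers written for this illustration (FFmpeg uses its own bitstream reader), and the sketch does no bounds checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; no bounds checking in this sketch. */
typedef struct {
    const uint8_t *buf;
    size_t bit_pos;     /* position of the next unread bit */
} BitReader;

/* Read n bits (1..32), most significant bit first. */
static uint32_t read_bits(BitReader *br, unsigned n)
{
    uint32_t v = 0;

    while (n--) {
        unsigned byte = br->buf[br->bit_pos >> 3];
        unsigned bit  = (byte >> (7 - (br->bit_pos & 7))) & 1;
        v = (v << 1) | bit;
        br->bit_pos++;
    }
    return v;
}

/* sps_subpic_id[i] occupies sps_subpic_id_len_minus1 + 1 bits. */
static uint32_t read_sps_subpic_id(BitReader *br,
                                   unsigned sps_subpic_id_len_minus1)
{
    return read_bits(br, sps_subpic_id_len_minus1 + 1);
}
```

Reading only `sps_subpic_id_len_minus1` bits would drop one bit from every identifier and misalign every field that follows.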
    • avformat/vvc: initialize some ptl flags · 3bd7e3a3
      James Almer authored
      Signed-off-by: James Almer <jamrial@gmail.com>
      3bd7e3a3
    • lavc/h263dsp: R-V V {h,v}_loop_filter · 910d281b
      Rémi Denis-Courmont authored
      Since the horizontal and vertical filters are identical except for a
      transposition, this uses a common subprocedure with an ad-hoc ABI.
      To preserve return-address stack prediction, a link register has to be
      used (cf. the "Control Transfer Instructions" from the
      RISC-V ISA Manual). The alternate/temporary link register T0 is used
      here, so that the normal RA is preserved (something Arm cannot do!).
      
      To load the strength value based on `qscale`, the shortest possible
      and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
      LLA; ADD; LBU sequence would add one more instruction since LLA is a
      convenience alias for AUIPC; ADDI. To ensure that this trick works,
      relocation relaxation is disabled.
      
      To implement the two signed divisions by a power of two toward zero:
       (x / (1 << SHIFT))
      the code relies on the small range of integers involved, computing:
       (x + (x >> (16 - SHIFT))) >> SHIFT
      rather than the more general:
       (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
      Thus one ANDI instruction is avoided.
      
      T-Head C908:
      h263dsp.h_loop_filter_c:       228.2
      h263dsp.h_loop_filter_rvv_i32: 144.0
      h263dsp.v_loop_filter_c:       242.7
      h263dsp.v_loop_filter_rvv_i32: 114.0
      (C is probably worse in real use due to less predictable branches.)
      910d281b
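The two rounding formulas in the commit above can be compared in scalar C. This is a sketch under stated assumptions: `SHIFT` is set to 3 purely for illustration, the function names are hypothetical, the first shift in the short form is read as a logical shift on a 16-bit lane, and `>>` on negative signed values is assumed to be arithmetic (as it is with GCC and Clang).

```c
#include <stdint.h>

#define SHIFT 3   /* illustrative value only */

/* General form: add (2^SHIFT - 1) to negative inputs, then shift. */
static int div_general(int16_t x)
{
    return (x + ((x >> 15) & ((1 << SHIFT) - 1))) >> SHIFT;
}

/*
 * Short form: for inputs in [-2^(16 - SHIFT), 2^(16 - SHIFT) - 1],
 * the top SHIFT bits of the 16-bit lane are all copies of the sign bit,
 * so a logical shift by (16 - SHIFT) already yields the bias
 * (2^SHIFT - 1 for negative x, 0 otherwise) and no AND is needed.
 */
static int div_small_range(int16_t x)
{
    int bias = (uint16_t)x >> (16 - SHIFT);
    return (x + bias) >> SHIFT;
}
```

For example, with `SHIFT` equal to 3, an input of -1 gets a bias of 7 in both forms, giving 6 >> 3 = 0, which matches -1 / 8 truncated toward zero. Outside the small range the short form can produce a wrong bias, which is why it depends on the limited range of values involved.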
    • x86/vvc_alf: use the x86inc instruction macros · 3d1597d3
      James Almer authored
      Let its magic figure out the correct mnemonic based on target instruction set.
      Signed-off-by: James Almer <jamrial@gmail.com>
      3d1597d3
    • avformat/mov: avoid seeking back to 0 on HEVC open GOP files · d1b96c38
      llyyr authored
      ab77b878 attempted to fix the issue of broken packets being sent to
      the decoder by implementing logic that kept attempting to PTS-step
      backwards until it reached a valid point, however applying this
      heuristic meant that in files that had no valid points (such as HEVC
      videos shot on iPhones), we'd seek back to sample 0 on every seek
      attempt. This meant that files that were previously seekable, albeit
      with some skipped frames, were not seekable at all now.
      
      Relax this heuristic a bit by giving up on seeking to a valid point if
      we've tried a different sample and we still don't have a valid point to
      seek to. This may cause some frames to be skipped on seeking but it's better
      than not being able to seek at all in such files.
      
      Fixes: ab77b878 ("avformat/mov: fix seeking with HEVC open GOP files")
      Fixes: #10585
      Signed-off-by: Philip Langdale <philipl@overt.org>
      d1b96c38
  5. 21 May, 2024 18 commits