Commits · 590fffe6adcfdc13e6f520a47be0ece72fe0707d · Stefan Westerfeld / ffmpeg

25 May, 2024 3 commits

avformat/gifdec: Check ffio_ensure_seekback() · 590fffe6
Andreas Rheinhardt authored May 22, 2024
```
Fixes Coverity issue #1598400.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
```
avformat/oggdec: Check ffio_ensure_seekback() · b47116be
Andreas Rheinhardt authored May 22, 2024
```
Fixes Coverity issue #1492327.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
```

lavc/flacdsp: do not assume maximum R-V VL · f8837465

Rémi Denis-Courmont authored May 24, 2024

This loop correctly assumes that VLMAX=16 (4x128-bit vectors
with 32-bit elements) and 32 >= pred_order > 16. We need to alternate
between VL=16 and VL=t2=pred_order-16 elements to add up to pred_order.

The current code requests AVL=a2=pred_order elements. In QEMU and on
the K230 hardware, this sets VL=16 as we need. But the specification
merely guarantees that we get: ceil(AVL / 2) <= VL <= VLMAX. For
instance, if pred_order equals 27, we could end up with VL=14 or VL=15
instead of VL=16. So instead, request literally VLMAX=16.
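
The range quoted above can be checked with a small sketch. It assumes the three `vsetvl` cases of the RISC-V V specification (AVL at most VLMAX, AVL between VLMAX and 2*VLMAX, AVL at least 2*VLMAX); `permitted_vl` and `struct vl_range` are hypothetical names used only for illustration, not FFmpeg code.

```c
#include <assert.h>

/* Inclusive range of vector lengths an implementation may return. */
struct vl_range {
    unsigned min;
    unsigned max;
};

/*
 * Permitted VL for a requested application vector length (AVL):
 *   AVL <= VLMAX            -> VL = AVL
 *   VLMAX < AVL < 2 * VLMAX -> ceil(AVL / 2) <= VL <= VLMAX
 *   AVL >= 2 * VLMAX        -> VL = VLMAX
 */
static struct vl_range permitted_vl(unsigned avl, unsigned vlmax)
{
    struct vl_range r;

    assert(vlmax > 0);
    if (avl <= vlmax) {
        r.min = r.max = avl;
    } else if (avl < 2 * vlmax) {
        r.min = (avl + 1) / 2; /* ceil(AVL / 2) */
        r.max = vlmax;
    } else {
        r.min = r.max = vlmax;
    }
    return r;
}
```

With `permitted_vl(27, 16)` the range is 14 to 16, which is why requesting AVL=pred_order is not enough to guarantee VL=16, whereas requesting AVL=16 leaves exactly one permitted value.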


24 May, 2024 5 commits
- avcodec/flacdec: Remove unused variable · aff24c16
  Andreas Rheinhardt authored May 24, 2024
```
Forgotten in 0380a03f.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
```
- lavc/pixblockdsp: add scalar get_pixels_unaligned · ba38d0e3
  Rémi Denis-Courmont authored May 21, 2024
```
The code is already there, we just need to use it.

get_pixels_unaligned_c: 2.2
get_pixels_unaligned_misaligned: 1.7
```
- checkasm/riscv: test misaligned before V · d03cdfa2
  Rémi Denis-Courmont authored May 21, 2024
```
Otherwise V functions mask scalar misaligned ones.
```
- checkasm/flacdsp: add a test for lpc33 · 0920f506
  James Almer authored May 12, 2024
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
- avcodec/flacdsp: split off lpc33 into a dsp function · 0380a03f
  James Almer authored May 12, 2024
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
23 May, 2024 6 commits

avformat/movenc: add support for writing SA3D boxes · 62397bcf
James Almer authored May 14, 2024
```
Signed-off-by: James Almer <jamrial@gmail.com>
```
avutil/channel_layout: add a helper function to get the ambisonic order of a layout · 8c974494
James Almer authored May 14, 2024
```
Signed-off-by: James Almer <jamrial@gmail.com>
```

libavcodec/x86/vvc/vvc_sad: fix assembler error · 8155808c

Haihao Xiang authored May 23, 2024

X86ASM    libavcodec/x86/vvc/vvc_sad.o
libavcodec/x86/vvc/vvc_sad.asm:85: error: invalid number of operands
libavcodec/x86/vvc/vvc_sad.asm:87: error: invalid number of operands
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>


avfilter/af_atempo: Fix indentation · ece95dc3

Andreas Rheinhardt authored May 22, 2024

Forgotten after b8f74ee5.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>


avfilter/af_atempo: Simplify resetting · 42e0e058

Andreas Rheinhardt authored May 22, 2024

The earlier code distinguished between a partial reset
(yae_clear()) and a complete reset (yae_release_buffers()
which also releases the buffers); this separation existed
to avoid allocations, as buffers were reallocated on reconfigs.

Yet it is pointless since a5704659,
so simply use yae_release_buffers() everywhere.
Reviewed-by: Pavel Koshevoy <pkoshevoy@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>


avfilter/af_atempo: Properly check av_tx_init() · 35e7fa0a
Andreas Rheinhardt authored May 22, 2024
```
Fixes Coverity issue #1516804.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
```

22 May, 2024 8 commits

tests/checkasm: Add check_vvc_sad to vvc_mc.c · 2e877090

Stone Chen authored May 22, 2024

Adds checkasm for DMVR SAD AVX2 implementation.

Benchmarks ( AMD 7940HS )
vvc_sad_8x8_c: 50.3
vvc_sad_8x8_avx2: 0.3
vvc_sad_16x16_c: 250.3
vvc_sad_16x16_avx2: 10.3
vvc_sad_32x32_c: 1020.3
vvc_sad_32x32_avx2: 60.3
vvc_sad_64x64_c: 3850.3
vvc_sad_64x64_avx2: 220.3
vvc_sad_128x128_c: 14100.3
vvc_sad_128x128_avx2: 840.3
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>


libavcodec/x86/vvc: Add AVX2 DMVR SAD functions for VVC · 0e52a4e4

Stone Chen authored May 22, 2024

Implements AVX2 DMVR (decoder-side motion vector refinement) SAD functions. DMVR SAD is only calculated if w >= 8, h >= 8, and w * h > 128. To reduce complexity, SAD is only calculated on even rows. This is calculated for all video bitdepths, but the values passed to the function are always 16bit (even if the original video bitdepth is 8). The AVX2 implementation uses min/max/sub.

Additionally this changes parameters dx and dy from int to intptr_t. This allows dx & dy to be used as pointer offsets without needing to use movsxd.
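
As a rough scalar illustration of the rules above (not the AVX2 code and not FFmpeg's C reference), the sketch below applies the eligibility test and sums absolute differences over even rows only, computing each |a - b| as max(a, b) - min(a, b). The function names and the `stride` parameter (in samples) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Eligibility rule quoted in the commit message. */
static int dmvr_sad_applies(int w, int h)
{
    return w >= 8 && h >= 8 && w * h > 128;
}

/*
 * SAD over even rows only. Samples are 16-bit regardless of the video
 * bit depth. |a - b| is computed as max(a, b) - min(a, b), mirroring the
 * min/max/sub idea mentioned for the AVX2 version.
 */
static int sad_even_rows(const int16_t *src0, const int16_t *src1,
                         ptrdiff_t stride, int w, int h)
{
    int sad = 0;

    for (int y = 0; y < h; y += 2) {
        const int16_t *r0 = src0 + y * stride;
        const int16_t *r1 = src1 + y * stride;

        for (int x = 0; x < w; x++) {
            int a = r0[x], b = r1[x];
            int hi = a > b ? a : b;
            int lo = a > b ? b : a;

            sad += hi - lo;
        }
    }
    return sad;
}
```

Using a pointer-sized type such as `ptrdiff_t` for the offset means it can be added to a pointer without a separate sign extension, which is the same motivation the commit gives for switching dx and dy to intptr_t.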

Benchmarks ( AMD 7940HS )
Before:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 106.0 |
Chimera_8bit_1080P_1000_frames.vvc | 204.3 |
NovosobornayaSquare_1920x1080.bin | 197.3 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 174.0 |

After:
BQTerrace_1920x1080_60_10_420_22_RA.vvc | 109.3 |
Chimera_8bit_1080P_1000_frames.vvc | 216.0 |
NovosobornayaSquare_1920x1080.bin | 204.0 |
RitualDance_1920x1080_60_10_420_37_RA.266 | 181.7 |
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>


avformat/mov: store sample_sizes as unsigned ints · 3146b77a

James Almer authored May 20, 2024

As defined in Section 8.7.3.2.1 of ISO 14496-12.
Any unsupported value will be rejected in mov_build_index() without outright
aborting demuxing.

Fixes ticket #11005.
Signed-off-by: James Almer <jamrial@gmail.com>


avformat/vvc: fix parsing sps_subpic_id · 2d84ee37

James Almer authored May 19, 2024

The length of the sps_subpic_id[i] syntax element is sps_subpic_id_len_minus1 + 1 bits.
Signed-off-by: James Almer <jamrial@gmail.com>


avformat/vvc: initialize some ptl flags · 3bd7e3a3
James Almer authored May 18, 2024
```
Signed-off-by: James Almer <jamrial@gmail.com>
```

lavc/h263dsp: R-V V {h,v}_loop_filter · 910d281b

Rémi Denis-Courmont authored May 19, 2024

Since the horizontal and vertical filters are identical except for a
transposition, this uses a common subprocedure with an ad-hoc ABI.
To preserve return-address stack prediction, a link register has to be
used (cf. the "Control Transfer Instructions" from the
RISC-V ISA Manual). The alternate/temporary link register T0 is used
here, so that the normal RA is preserved (something Arm cannot do!).

To load the strength value based on `qscale`, the shortest possible
and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
LLA; ADD; LBU sequence would add one more instruction since LLA is a
convenience alias for AUIPC; ADDI. To ensure that this trick works,
relocation relaxation is disabled.

To implement the two signed divisions by a power of two toward zero:
 (x / (1 << SHIFT))
the code relies on the small range of integers involved, computing:
 (x + (x >> (16 - SHIFT))) >> SHIFT
rather than the more general:
 (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
Thus one ANDI instruction is avoided.
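
The two expressions can be compared in scalar C. This is only a sketch: it assumes the short form uses a logical right shift on 16-bit elements (an arithmetic shift would not produce the required bias), that x stays within [-2^(16-SHIFT), 2^(16-SHIFT)), and that `>>` on a negative `int` is an arithmetic shift, as on mainstream compilers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* General form: sign mask, ANDed down to a bias of (1 << shift) - 1. */
static int16_t div_pow2_general(int16_t x, int shift)
{
    int bias;

    assert(shift > 0 && shift < 16);
    bias = (x >> 15) & ((1 << shift) - 1);
    return (int16_t)((x + bias) >> shift);
}

/*
 * Short form: a logical shift of the 16-bit pattern. For a small negative
 * x the top `shift` bits are all ones, so the result is already
 * (1 << shift) - 1; for a small non-negative x it is 0. No AND is needed.
 */
static int16_t div_pow2_small(int16_t x, int shift)
{
    int bias;

    assert(shift > 0 && shift < 16);
    bias = (uint16_t)x >> (16 - shift);
    return (int16_t)((x + bias) >> shift);
}
```

For example, with SHIFT=2 and x=-7 the bias is 3 in both forms and the result is -1, matching truncating division, whereas a plain arithmetic shift would give -2.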

T-Head C908:
h263dsp.h_loop_filter_c:       228.2
h263dsp.h_loop_filter_rvv_i32: 144.0
h263dsp.v_loop_filter_c:       242.7
h263dsp.v_loop_filter_rvv_i32: 114.0
(C is probably worse in real use due to less predictable branches.)


x86/vvc_alf: use the x86inc instruction macros · 3d1597d3

James Almer authored May 21, 2024

Let its magic figure out the correct mnemonic based on target instruction set.
Signed-off-by: James Almer <jamrial@gmail.com>


avformat/mov: avoid seeking back to 0 on HEVC open GOP files · d1b96c38

llyyr authored May 22, 2024

ab77b878 attempted to fix the issue of broken packets being sent to
the decoder by implementing logic that kept attempting to PTS-step
backwards until it reached a valid point, however applying this
heuristic meant that in files that had no valid points (such as HEVC
videos shot on iPhones), we'd seek back to sample 0 on every seek
attempt. This meant that files that were previously seekable, albeit
with some skipped frames, were not seekable at all now.

Relax this heuristic a bit by giving up on seeking to a valid point if
we've tried a different sample and we still don't have a valid point to
seek to. This may cause some frames to be skipped on seeking but it's better
than not being able to seek at all in such files.

Fixes: ab77b878 ("avformat/mov: fix seeking with HEVC open GOP files")
Fixes: #10585
Signed-off-by: Philip Langdale <philipl@overt.org>


21 May, 2024 18 commits

lavc/vp9dsp: R-V V mc avg · 0c1304ae

sunyuechi authored May 18, 2024

C908:
vp9_avg4_8bpp_c: 1.2
vp9_avg4_8bpp_rvv_i64: 1.0
vp9_avg8_8bpp_c: 3.7
vp9_avg8_8bpp_rvv_i64: 1.5
vp9_avg16_8bpp_c: 14.7
vp9_avg16_8bpp_rvv_i64: 3.5
vp9_avg32_8bpp_c: 57.7
vp9_avg32_8bpp_rvv_i64: 10.0
vp9_avg64_8bpp_c: 229.0
vp9_avg64_8bpp_rvv_i64: 31.7
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>


Revert "lavc/sbrdsp: R-V V neg_odd_64" · 7591eb40

Rémi Denis-Courmont authored May 13, 2024

While this function can easily be written with vectors, it just fails to
get any performance improvement.

For reference, this is a simpler loop-free implementation that does get
better performance than the current one depending on hardware, but still
more or less the same metrics as the C code:

 func ff_sbr_neg_odd_64_rvv, zve64x
         li      a1, 32
         addi    a0, a0, 7
         li      t0, 8
         vsetvli zero, a1, e8, m2, ta, ma
         li      t1, 0x80
         vlse8.v v8, (a0), t0
         vxor.vx v8, v8, t1
         vsse8.v v8, (a0), t0
         ret
 endfunc

This reverts commit d06fd18f.
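
For readers less familiar with RISC-V V, the assembly above XORs 0x80 into byte 7 of the buffer and every 8th byte after it, 32 times. On a little-endian target that is the sign byte of each odd-indexed 32-bit float. A scalar C sketch of the same effect, assuming IEEE 754 single-precision floats (the function name is hypothetical and this is not FFmpeg's C implementation):

```c
#include <stdint.h>
#include <string.h>

/* Flip the sign bit of x[1], x[3], ..., x[63]. */
static void neg_odd_64_sketch(float *x)
{
    for (int i = 1; i < 64; i += 2) {
        uint32_t bits;

        memcpy(&bits, &x[i], sizeof(bits));
        bits ^= UINT32_C(0x80000000);
        memcpy(&x[i], &bits, sizeof(bits));
    }
}
```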


lavc/vc1dsp: R-V V vc1_unescape_buffer · d452db84

Rémi Denis-Courmont authored May 12, 2024

Notes:
- The loop is biased toward no unescaped bytes as that should be most common.
- The input byte array is slid rather than the (8 times smaller) bit-mask,
  as RISC-V V does not provide a bit-mask (or bit-wise) slide instruction.
- There are two comparisons with 0 per iteration, for the same reason.
- In case of match, bytes are copied until the first match, and the loop is
  restarted after the escape byte. Vector compression (vcompress.vm) could
  discard all escape bytes but that is slower if escape bytes are rare.

Further optimisations should be possible, e.g.:
- processing 2 bytes fewer per iteration to get rid of 2 slides,
- taking a short cut if the input vector contains fewer than 2 zeroes.
But this is a good starting point:

T-Head C908:
vc1dsp.vc1_unescape_buffer_c:      12749.5
vc1dsp.vc1_unescape_buffer_rvv_i32: 6009.0

SpacemiT X60:
vc1dsp.vc1_unescape_buffer_c:      11038.0
vc1dsp.vc1_unescape_buffer_rvv_i32: 2061.0
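
For context, a scalar sketch of the unescaping rule that the vector code accelerates: a 0x03 byte is dropped when it follows two zero bytes and precedes a byte smaller than 4. This is an illustration of the rule as commonly described for VC-1, not FFmpeg's exact C code; the function name is hypothetical and `dst` is assumed to hold at least `size` bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy src to dst without escape bytes; returns the number of bytes written. */
static size_t unescape_scalar(const uint8_t *src, size_t size, uint8_t *dst)
{
    size_t dsize = 0;

    for (size_t i = 0; i < size; i++) {
        if (src[i] == 3 && i >= 2 && !src[i - 1] && !src[i - 2] &&
            i + 1 < size && src[i + 1] < 4) {
            /* Skip the escape byte and keep the byte it protects. */
            i++;
            dst[dsize++] = src[i];
        } else {
            dst[dsize++] = src[i];
        }
    }
    return dsize;
}
```

The two zero checks per byte in this loop correspond to the two comparisons with 0 per iteration mentioned in the notes.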


checkasm: h264dsp: Avoid out of buffer writes when benchmarking · 60933671

Martin Storsjö authored May 21, 2024

The loop filters can write before the pointer given to them;
the actual test invocations correctly used an offset, while
the benchmark calls were lacking an offset. Therefore, when
running with benchmarking, these tests could have spurious
failures.
Signed-off-by: Martin Storsjö <martin@martin.st>


checkasm: print bench runs when benchmarking · d43e1238
Lynne authored May 21, 2024
```
Helps make sense of the possible noise in the results.
```

checkasm: add runs argument to adjust during bench · b1adf6d1

J. Dekker authored May 13, 2024

Some timers on certain device and test combinations can produce noisy
results, affecting the reliability of performance measurements. One
notable example of this is the Canaan K230 RISC-V development board.

An option to adjust the number of samples by an exponent (--runs) has
been added, allowing developers to increase the sample count for more
reliable results.
Signed-off-by: J. Dekker <jdek@itanimul.li>


checkasm: vvc_alf: Limit benchmarking to a reasonable subset of functions · a9dc7dd7

Martin Storsjö authored May 21, 2024

Don't benchmark every single combination of widths and heights;
only benchmark cases which are squares (like in vvc_mc.c).

Contrary to vvc_mc, which increases sizes by doubling dimensions,
vvc_alf tests all sizes in increments of 4. Limit benchmarking to
the cases which are powers of two.

This reduces the number of benchmarked cases from 3072 down to 18.
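
The figures can be reproduced with a short counting sketch. The size range (4 to 128 in steps of 4) and the number of benchmarked function families (3) are assumptions chosen because they match the stated totals; they are not taken from the test source, and all names here are hypothetical.

```c
/* Assumed for illustration only: these values are not from vvc_alf.c. */
enum { MIN_SIZE = 4, MAX_SIZE = 128, STEP = 4, NB_FUNCS = 3 };

static int is_pow2(int v)
{
    return v > 0 && (v & (v - 1)) == 0;
}

/* Benchmark only square blocks whose side is a power of two. */
static int should_bench(int w, int h)
{
    return w == h && is_pow2(w);
}

/* Count every tested size combination and the subset that is benchmarked. */
static void count_cases(int *tested, int *benched)
{
    *tested = *benched = 0;
    for (int f = 0; f < NB_FUNCS; f++)
        for (int h = MIN_SIZE; h <= MAX_SIZE; h += STEP)
            for (int w = MIN_SIZE; w <= MAX_SIZE; w += STEP) {
                (*tested)++;
                if (should_bench(w, h))
                    (*benched)++;
            }
}
```

Under these assumptions there are 32 x 32 size pairs per function (3072 in total), of which the six squares 4, 8, 16, 32, 64 and 128 remain, giving 18.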


Changelog: add DVB compatible information for VVC decoder · b8eb8b4f
Nuo Mi authored May 19, 2024
```
see https://dvb.org/specifications/verification-validation/vvc-test-content/
```

avcodec/vvcdec: support Reference Picture Resampling · 1b33c9a5

Nuo Mi authored May 19, 2024

passed clips:
    RPR_A_Alibaba_4.bit
    RPR_B_Alibaba_3.bit
    RPR_C_Alibaba_3.bit
    RPR_D_Qualcomm_1.bit
    VVC_HDR_UHDTV1_OpenGOP_Max3840x2160_50fps_HLG10_res_change_with_RPR.ts


avcodec/vvcdec: increase edge_emu_buffer for RPR · cae0b012
Nuo Mi authored May 19, 2024

avcodec/vvcdec: refact, remove hf_idx and vf_idx from mc_xxx's param list · 7904ec2d
Nuo Mi authored May 19, 2024

avcodec/vvcdec: refact out luma_prof from luma_prof_bi · 77d971c3
Nuo Mi authored May 19, 2024

avcodec/vvcdec: fix dmvr, bdof, cb_prof for RPR · ac457559
Nuo Mi authored May 19, 2024


avcodec/vvcdec: inter, wait reference with a different resolution · 77acd0a0

Nuo Mi authored May 19, 2024

For RPR, the current frame may reference a frame with a different resolution.
Therefore, we need to consider frame scaling when we wait for reference pixels.


avcodec/vvcdec: add RPR dsp · deda59a9
Nuo Mi authored May 19, 2024

avcodec/vvcdec: emulated_edge, use reference frame's sps and pps · e70225e0
Nuo Mi authored May 19, 2024
```
a preparation for Reference Picture Resampling
```
avcodec/vvcdec: add vvc inter filters for RPR · aa8d5c6e
Nuo Mi authored May 19, 2024

avcodec/vvcdec: refact, pred_get_refs return VVCRefPic instead of VVCFrame · 08ad51ec
Nuo Mi authored May 19, 2024
