Christophe Gisquet authored
This is done by padding the coefficient buffer with 0s, because the order may only be a multiple of 4, while the DSP function requires batches of 8. However, no sample with such an order was found, so a sample is requested if a stream uses that kind of order.

Approximate relative speedup depending on instruction set:
plain C: -6%
mmxext: 51%
sse2: 54%

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
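A minimal C sketch of the padding idea follows. It rests on some assumptions: `dot_product_batch8`, `pad_to_8`, `filter_output` and `MAX_ORDER` are hypothetical names, not FFmpeg's actual DSP API, and the batch routine is a plain-C stand-in for a SIMD one. Zeroing the coefficients past the real order lets a routine that only accepts multiples of 8 run over an order-4 (or order-12) filter, because every extra product is multiplied by 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ORDER 32 /* hypothetical upper bound on the filter order */

/* Hypothetical stand-in for a SIMD DSP routine: it only accepts lengths
 * that are a multiple of 8 and consumes one batch of 8 products per
 * outer iteration. */
static int32_t dot_product_batch8(const int16_t *v1, const int16_t *v2, int len)
{
    int32_t sum = 0;
    assert(len % 8 == 0);
    for (int i = 0; i < len; i += 8)
        for (int j = 0; j < 8; j++)
            sum += v1[i + j] * v2[i + j];
    return sum;
}

/* Round a filter order up to the next multiple of 8. */
static int pad_to_8(int order)
{
    return (order + 7) & ~7;
}

/* Compute the filter output for an order that may only be a multiple of 4.
 * The coefficients are copied into a zero-filled buffer, so the entries
 * beyond `order` contribute nothing to the sum. `hist` must still hold at
 * least pad_to_8(order) readable samples; the values past `order` are
 * ignored because they are multiplied by 0. */
static int32_t filter_output(const int16_t *coefs, const int16_t *hist, int order)
{
    int16_t padded[MAX_ORDER] = {0};
    assert(order > 0 && order <= MAX_ORDER);
    memcpy(padded, coefs, (size_t)order * sizeof(*coefs));
    return dot_product_batch8(padded, hist, pad_to_8(order));
}
```

For example, with `coefs = {1, -2, 3, 4}` and order 4, the padded call covers 8 entries but returns the same value as a plain 4-term dot product. Padding only the coefficient buffer keeps the history buffer untouched, at the cost of requiring it to be readable up to the padded length.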
adf4ee40