    x86/tx_float: add 15xN PFA FFT AVX SIMD · ace42cf5
    Lynne authored
    ~4x faster than the C version.
    The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
    but I'm content.
    
    Can be easily converted to pure AVX by removing all vpermpd/vpermps
    instructions.
tx_template.c 62.7 KB