Rémi Denis-Courmont authored
This adds runtime support for using Zbb REV8 in 32- and 64-bit byte-wise swaps. The result is about five times slower than targeting Zbb statically, but still a lot faster than the default bespoke C code or a call to the GCC run-time functions.

For 16-bit swaps, however, this is unsurprisingly a lot worse, so this sticks to the baseline. In fact, even using REV8 statically does not seem to be beneficial in that case.

             Zbb static    Zbb dynamic   I baseline
    bswap16: 0.668184765   3.340764069    0.668029012
    bswap32: 0.668174014   3.340763319    9.353855435
    bswap64: 0.668221765   3.340496313   14.698672283

(seconds for 1 billion iterations on a SiFive-U74 core)
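As a rough illustration of the trade-off described above, the following is a minimal sketch of a dynamically dispatched byte swap, not the code from this commit. The names `bswap32_c`, `bswap64_c`, `rev8_xlen`, `bswap32_dyn`, `bswap64_dyn` and the `has_zbb` flag are hypothetical; how the flag is obtained (CPU feature detection) is left out. The REV8 path assumes a 64-bit RISC-V target and an assembler that accepts `.option arch, +zbb`; on any other target only the portable fallback is compiled.

```c
#include <stdint.h>

/* Portable baseline: shifts and masks only, no Zbb needed. */
static uint32_t bswap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

static uint64_t bswap64_c(uint64_t x)
{
    /* Swap each half, then exchange the halves. */
    return ((uint64_t)bswap32_c((uint32_t)x) << 32) |
           bswap32_c((uint32_t)(x >> 32));
}

#if defined(__riscv) && __riscv_xlen == 64
/* Assumption: the assembler supports ".option arch". REV8 reverses
 * all eight bytes of a 64-bit register on RV64. */
static uint64_t rev8_xlen(uint64_t x)
{
    uint64_t y;

    __asm__ (".option push\n"
             ".option arch, +zbb\n"
             "rev8 %0, %1\n"
             ".option pop" : "=r" (y) : "r" (x));
    return y;
}
#define HAVE_REV8 1
#else
#define HAVE_REV8 0
#endif

/* has_zbb is a hypothetical flag, set once from run-time CPU detection. */
static uint32_t bswap32_dyn(uint32_t x, int has_zbb)
{
    (void)has_zbb;
#if HAVE_REV8
    if (has_zbb) /* full-register swap, then keep the upper 32 bits */
        return (uint32_t)(rev8_xlen(x) >> 32);
#endif
    return bswap32_c(x);
}

static uint64_t bswap64_dyn(uint64_t x, int has_zbb)
{
    (void)has_zbb;
#if HAVE_REV8
    if (has_zbb)
        return rev8_xlen(x);
#endif
    return bswap64_c(x);
}
```

The run-time check costs a load and a branch on every call, which is one plausible source of the gap between the "Zbb static" and "Zbb dynamic" columns. A 16-bit swap is already cheap in plain C, so the same check would likely cost more than it saves there.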
324899b7