I am trying to compile FFTW3 to run on ARM NEON (more precisely, on a Cortex-A53). The build environment is x86_64-pokysdk-linux and the host environment is aarch64-poky-linux. I am using the aarch64-poky-linux-gcc compiler.
At first, I used the following command:
./configure --prefix=/build_env/neon/neon_install_8 --host=aarch64-poky-linux --enable-shared --enable-single --enable-neon --with-sysroot=/opt/poky/2.5.3/sysroots/aarch64-poky-linux "CC=/opt/poky/2.5.3/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gcc -march=armv8-a+simd -mcpu=cortex-a53 -mfloat-abi=softfp -mfpu=neon"
The compiler did not accept -mfloat-abi=softfp or -mfpu=neon, and it also did not let me set the sysroot path this way (with --with-sysroot).
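For context on the rejected flags: -mfloat-abi and -mfpu are options of GCC's 32-bit ARM port, and the AArch64 port (which aarch64-poky-linux-gcc is) does not have them. On AArch64, Advanced SIMD (NEON) is on by default for every -march/-mcpu value, so +simd is already implied, and GCC predefines __ARM_NEON when it is enabled. The predefined macros can be listed with aarch64-poky-linux-gcc -mcpu=cortex-a53 -dM -E - < /dev/null. A minimal sketch of the same check in code follows; the helper names neon_enabled_at_compile_time and target_summary are hypothetical.

```c
/* Compile-time check: was this file built for AArch64 with Advanced SIMD
 * (NEON) enabled? AArch64 GCC predefines __aarch64__, and __ARM_NEON when
 * Advanced SIMD is enabled (the default unless +nosimd is given). */
static int neon_enabled_at_compile_time(void)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return 1;
#else
    return 0;
#endif
}

/* Short description of the target this file was compiled for. */
static const char *target_summary(void)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return "aarch64 with NEON (Advanced SIMD)";
#elif defined(__aarch64__)
    return "aarch64 without NEON (e.g. built with +nosimd)";
#else
    return "not an aarch64 build";
#endif
}
```

This only tells you what the compiler enables for code built with those flags. Whether libfftw3f itself uses its NEON codelets depends on FFTW's own configure result, which is recorded in its config.log and config.h.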
I then used the following command:
./configure --prefix=/build_env/neon/neon_install_8 --host=aarch64-poky-linux --enable-shared --enable-single --enable-neon "CC=/opt/poky/2.5.3/sysroots/x86_64-pokysdk-linux/usr/bin/aarch64-poky-linux/aarch64-poky-linux-gcc" "CFLAGS=--sysroot=/opt/poky/2.5.3/sysroots/aarch64-poky-linux -mcpu=cortex-a53 -march=armv8-a+simd"
This command succeeded with this config.log and this config.h. Then I ran make install, copied the resulting shared library into my host environment, and used fftwf_ instead of fftw_ in my code base. The final step was to recompile the program.
I ran a test and compared the run times of both versions using <sys/resource.h>. I also called fftw[f]_forget_wisdom() in both versions so that the comparison is fair. However, I am not getting a speedup. I expected that using a SIMD architecture (NEON in our case) would accelerate the FFTW library.
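One thing worth ruling out is the measurement itself. A single FFT of modest size finishes in microseconds, while getrusage() CPU times can be coarse. If the timed region also includes plan creation, planning (especially with FFTW_MEASURE) can dominate and hide any NEON gain. Below is a minimal sketch, assuming the timing is done with getrusage() as in the post. It keeps plan creation (and the forget_wisdom call) outside the timed region, repeats the execution many times, and reports the median of several runs. The helper names cpu_seconds, cpu_seconds_per_call and median_seconds_per_call are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>

/* CPU time (user + system) consumed so far by this process, in seconds.
 * Returns -1.0 if getrusage() fails. */
static double cpu_seconds(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec * 1e-6 +
           (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec * 1e-6;
}

/* Average CPU seconds per call of fn(ctx) over reps calls. Only the loop
 * is timed, so create the plan (and forget wisdom) before calling this. */
static double cpu_seconds_per_call(void (*fn)(void *), void *ctx, int reps)
{
    assert(fn != NULL && reps > 0);
    double start = cpu_seconds();
    for (int i = 0; i < reps; ++i)
        fn(ctx);
    double stop = cpu_seconds();
    if (start < 0.0 || stop < 0.0)
        return -1.0; /* timer failure */
    return (stop - start) / reps;
}

/* qsort comparator for doubles (ascending). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of `runs` independent per-call measurements; less sensitive to
 * one-off disturbances (other processes, frequency changes) than one run. */
static double median_seconds_per_call(void (*fn)(void *), void *ctx,
                                      int reps, int runs)
{
    assert(runs > 0);
    double *samples = malloc((size_t)runs * sizeof *samples);
    if (samples == NULL)
        return -1.0;
    for (int r = 0; r < runs; ++r)
        samples[r] = cpu_seconds_per_call(fn, ctx, reps);
    qsort(samples, (size_t)runs, sizeof *samples, cmp_double);
    double median = (runs % 2) ? samples[runs / 2]
                               : 0.5 * (samples[runs / 2 - 1] + samples[runs / 2]);
    free(samples);
    return median;
}
```

For FFTW, fn can be a small wrapper that casts ctx back to an fftwf_plan and calls fftwf_execute() on it. Allocating the input and output arrays with fftwf_malloc() keeps them aligned the way FFTW's SIMD code expects.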
I would really appreciate it if anyone could point out what I am doing wrong, so that I can try a fix and see whether I get the performance boost I am looking for.