I am currently trying to improve the performance of my multithreaded FFTW implementation. The FFTW3 documentation says that, for the best possible performance, the `fftw_malloc` function should be used to allocate the input and output arrays of the DFT.
Since I am dealing with large 3D arrays of size 256*256*256, I have to create them on the heap with:

```cpp
const unsigned int RES = 256;
std::complex<double> (*V)[RES][RES];
V = new std::complex<double>[RES][RES][RES];
```
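For reference, an array allocated with `new[]` as above is a single contiguous block in row-major order (last index varying fastest), which is the layout `fftw_plan_dft_3d` expects. The sketch below checks this with a hypothetical helper `is_row_major_contiguous`, templated on the grid size so it can be run with a small value:

```cpp
#include <complex>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: checks that V[i][j][k] sits at flat index (i*R + j)*R + k,
// i.e. that a new[]-allocated 3D array is one contiguous row-major block.
template <std::size_t R>
bool is_row_major_contiguous() {
    using cplx = std::complex<double>;
    // Same allocation pattern as in the question, with a template size.
    cplx (*V)[R][R] = new cplx[R][R][R];

    const auto base = reinterpret_cast<std::uintptr_t>(&V[0][0][0]);
    bool ok = true;
    for (std::size_t i = 0; i < R; ++i)
        for (std::size_t j = 0; j < R; ++j)
            for (std::size_t k = 0; k < R; ++k) {
                const auto addr = reinterpret_cast<std::uintptr_t>(&V[i][j][k]);
                // Row-major: the byte offset grows fastest with the last index k.
                if (addr - base != ((i * R + j) * R + k) * sizeof(cplx))
                    ok = false;
            }
    delete[] V;
    return ok;
}
```

For example, `is_row_major_contiguous<8>()` returns `true`; the same holds for `RES = 256`, only with a much larger allocation.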
After initialization, I create multithreaded, in-place `fftw_plan`s for the forward and backward 3D DFTs as follows:

```cpp
int N_Threads = omp_get_max_threads();
fftw_init_threads();                  // call once, before any other FFTW routine
fftw_plan_with_nthreads(N_Threads);   // applies to plans created after this call

fftw_complex *input_V = reinterpret_cast<fftw_complex*>(opr.V);
// Note: FFTW_MEASURE overwrites the contents of input_V while planning.
fftw_plan FORWARD_V  = fftw_plan_dft_3d(RES, RES, RES, input_V, input_V, FFTW_FORWARD,  FFTW_MEASURE);
fftw_plan BACKWARD_V = fftw_plan_dft_3d(RES, RES, RES, input_V, input_V, FFTW_BACKWARD, FFTW_MEASURE);
```
My question now is: how do I create these plans using `fftw_malloc` instead?
In the FFTW3 documentation I can only find:

```c
fftw_complex *in;
in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
```
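Applied to the 3D grid, the `N` in that snippet is just the total number of elements, RES·RES·RES. A compile-time sketch of the arithmetic for RES = 256 (assuming 8-byte `double`, as on common platforms):

```cpp
#include <complex>
#include <cstddef>

// For the 3D case, the N of the 1D snippet is the total number of elements.
constexpr std::size_t RES = 256;
constexpr std::size_t N = RES * RES * RES;  // 16,777,216 complex values

// One complex value is two doubles (fftw_complex is double[2], and
// std::complex<double> has the same layout), so this is the byte count
// for one array: 268,435,456 bytes (256 MiB) with 8-byte doubles.
constexpr std::size_t bytes = sizeof(std::complex<double>) * N;
```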
which I understand to be a 1D example. Do I have to flatten my 3D array into a 1D one, or is `fftw_malloc` not possible/advisable in this case?
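One possible approach, sketched under assumptions: allocate all RES·RES·RES elements as one flat aligned block, then cast the pointer to the same `std::complex<double> (*)[RES][RES]` type used in the question so `V[i][j][k]` indexing keeps working. The helpers `alloc_cube` and `free_cube` are hypothetical, and `std::aligned_alloc` (C++17) is only a stand-in for `fftw_malloc` so the sketch builds without FFTW:

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <memory>
#include <new>

using cplx = std::complex<double>;

// Pointer to R x R planes of complex values, the same type as `V` in the question.
template <std::size_t R>
using cube_ptr = cplx (*)[R][R];

// Hypothetical helper (not part of FFTW): one aligned block of R*R*R values.
// With FFTW, the std::aligned_alloc call would be replaced by fftw_malloc(bytes).
template <std::size_t R>
cube_ptr<R> alloc_cube(std::size_t alignment) {
    const std::size_t count = R * R * R;
    const std::size_t bytes = sizeof(cplx) * count;
    // std::aligned_alloc wants a size that is a multiple of the alignment.
    const std::size_t padded = (bytes + alignment - 1) / alignment * alignment;
    void* raw = std::aligned_alloc(alignment, padded);
    if (raw == nullptr) throw std::bad_alloc();
    assert(reinterpret_cast<std::uintptr_t>(raw) % alignment == 0);
    // Construct zero-initialized elements in the raw storage.
    cplx* flat = static_cast<cplx*>(raw);
    std::uninitialized_fill_n(flat, count, cplx{});
    // View the flat block as R x R planes; the row-major layout is identical.
    return reinterpret_cast<cube_ptr<R>>(flat);
}

// Hypothetical helper: release a block from alloc_cube.
// Memory obtained from fftw_malloc must be released with fftw_free instead.
template <std::size_t R>
void free_cube(cube_ptr<R> V) {
    // std::complex<double> needs no destructor call in practice.
    std::free(V);
}
```

With FFTW, the planning code from the question would stay the same: `fftw_plan_dft_3d` only needs the base address of a contiguous row-major block, so `reinterpret_cast<fftw_complex*>(V)` can be passed as before. FFTW 3.3 also provides `fftw_alloc_complex(n)`, which returns an `fftw_complex*` for `n` elements.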