Here, it is assumed that the reader is already familiar with the usage of the uniprocessor FFTW routines, described elsewhere in this manual. We only describe what one has to change in order to use the multi-threaded routines.
First, programs using the parallel complex transforms should be linked with -lfftw3_threads -lfftw3 -lm on Unix, or -lfftw3_omp -lfftw3 -lm if you compiled with OpenMP. You will also need to link with whatever library is responsible for threads on your system (e.g. -lpthread on GNU/Linux) or include whatever compiler flag enables OpenMP (e.g. -fopenmp with gcc).
Second, before calling any FFTW routines, you should call the function:
int fftw_init_threads(void);
This function, which need only be called once, performs any one-time initialization required to use threads on your system. It returns zero if there was some error (which should not happen under normal circumstances) and a non-zero value otherwise.
Third, before creating a plan that you want to parallelize, you should call:
void fftw_plan_with_nthreads(int nthreads);
The nthreads argument indicates the number of threads you want FFTW to use (or actually, the maximum number). All plans subsequently created with any planner routine will use that many threads. You can call fftw_plan_with_nthreads, create some plans, call fftw_plan_with_nthreads again with a different argument, and create some more plans for a new number of threads. Plans already created before a call to fftw_plan_with_nthreads are unaffected. If you pass an nthreads argument of 1 (the default), threads are disabled for subsequent plans.
You can determine the current number of threads that the planner can use by calling:
int fftw_planner_nthreads(void);
With OpenMP, to configure FFTW to use all of the threads currently available to OpenMP (set by omp_set_num_threads(nthreads) or by the OMP_NUM_THREADS environment variable), you can do: fftw_plan_with_nthreads(omp_get_max_threads()). (The ‘omp_’ OpenMP functions are declared via #include <omp.h>.)
Given a plan, you then execute it as usual with fftw_execute(plan), and the execution will use the number of threads specified when the plan was created. When done, you destroy it as usual with fftw_destroy_plan. As described in Thread safety, plan execution is thread-safe, but plan creation and destruction are not: you should create/destroy plans only from a single thread, but can safely execute multiple plans in parallel.
There is one additional routine: if you want to get rid of all memory and other resources allocated internally by FFTW, you can call:
void fftw_cleanup_threads(void);
which is much like the fftw_cleanup() function except that it also gets rid of threads-related data. You must not execute any previously created plans after calling this function.
We should also mention one other restriction: if you save wisdom from a program using the multi-threaded FFTW, that wisdom cannot be used by a program using only the single-threaded FFTW (i.e. not calling fftw_init_threads). See Words of Wisdom—Saving Plans.
Finally, FFTW provides an optional callback interface that allows you to replace its parallel threading backend at runtime:
void fftw_threads_set_callback(
    void (*parallel_loop)(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data),
    void *data);
This routine (which is not threadsafe and should generally be called before creating any FFTW plans) allows you to provide a function parallel_loop that executes parallel work for FFTW: it should call the function work(jobdata + elsize*i) for i from 0 to njobs-1, possibly in parallel. (The ‘data’ pointer supplied to fftw_threads_set_callback is passed through to your parallel_loop function.) For example, if you link to an FFTW threads library built to use POSIX threads, but you want it to use OpenMP instead (because you are using OpenMP elsewhere in your program and want to avoid competing threads), you can call fftw_threads_set_callback with the callback function:
void parallel_loop(void *(*work)(char *), char *jobdata, size_t elsize, int njobs, void *data)
{
  #pragma omp parallel for
  for (int i = 0; i < njobs; ++i)
    work(jobdata + elsize * i);
}
The same mechanism could be used in order to make FFTW use a threading backend implemented via Intel TBB, Apple GCD, or Cilk, for example.
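As a further illustration of the callback contract, the sketch below implements a parallel_loop that runs each job on its own POSIX thread. It is not taken from FFTW: the names work_fn, pthread_parallel_loop, square_job, square and demo_sum_of_squares are hypothetical, and the callback signature is assumed to match the OpenMP example above. The demo function exercises the loop with a stand-in work function, so the block compiles without FFTW.

```c
#include <pthread.h>
#include <stdlib.h>

typedef void *(*work_fn)(char *);

/* Per-thread argument: which work function to call on which job record. */
typedef struct {
    work_fn work;
    char *job;
} job_arg;

static void *run_job(void *p)
{
    job_arg *a = (job_arg *)p;
    return a->work(a->job);
}

/* Sketch of a parallel_loop callback: one thread per job. A job runs on the
   calling thread if its thread cannot be created or allocation fails. */
void pthread_parallel_loop(work_fn work, char *jobdata, size_t elsize,
                           int njobs, void *data)
{
    (void)data;                          /* unused in this sketch */
    if (njobs <= 0)
        return;
    pthread_t *tids = malloc((size_t)njobs * sizeof *tids);
    job_arg *args = malloc((size_t)njobs * sizeof *args);
    char *started = calloc((size_t)njobs, 1);
    if (!tids || !args || !started) {
        for (int i = 0; i < njobs; ++i)  /* serial fallback */
            work(jobdata + elsize * (size_t)i);
    } else {
        for (int i = 0; i < njobs; ++i) {
            args[i].work = work;
            args[i].job = jobdata + elsize * (size_t)i;
            if (pthread_create(&tids[i], NULL, run_job, &args[i]) == 0)
                started[i] = 1;
            else
                work(args[i].job);
        }
        for (int i = 0; i < njobs; ++i)  /* wait for every job to finish */
            if (started[i])
                pthread_join(tids[i], NULL);
    }
    free(tids);
    free(args);
    free(started);
}

/* Stand-in job record and work function, used only to exercise the loop. */
typedef struct { int in; int out; } square_job;

static void *square(char *job)
{
    square_job *j = (square_job *)job;
    j->out = j->in * j->in;
    return NULL;
}

/* Sums the squares of 0..njobs-1 via the loop; returns -1 if allocation fails. */
int demo_sum_of_squares(int njobs)
{
    square_job *jobs = malloc((size_t)njobs * sizeof *jobs);
    if (!jobs)
        return -1;
    for (int i = 0; i < njobs; ++i) {
        jobs[i].in = i;
        jobs[i].out = 0;
    }
    pthread_parallel_loop(square, (char *)jobs, sizeof *jobs, njobs, NULL);
    int sum = 0;
    for (int i = 0; i < njobs; ++i)
        sum += jobs[i].out;
    free(jobs);
    return sum;
}
```

In a program that uses FFTW, such a function would be installed with fftw_threads_set_callback(pthread_parallel_loop, NULL) before any plans are created. The loop returns only after every job has completed, which this sketch assumes is what FFTW expects from the callback.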