Speed of repeated division method
In my previous post in this series, I described a method for choosing the FFT transform length when computing a full convolution. The method is based on the idea of choosing a length whose prime factors are no greater than 7.
To find such a length, I wrote some code that used repeated division to determine whether $N$ has factors greater than 7. The code would then simply increment $N$ until it found a suitable length.
function Np = fftTransformLength_repeated_division(N)
Np = N;
while true
% Divide n evenly, as many times as possible, by the factors 2, 3,
% 5, and 7.
r = Np;
for p = [2 3 5 7]
while (r > 1) && (mod(r, p) == 0)
r = r / p;
end
end
if r == 1
break
else
Np = Np + 1;
end
end
end
Because I was interested in 2-D applications with relatively small transform sizes (in the thousands), I didn’t think too much about the efficiency of this function. For 1-D applications with very long vectors, though, the time required to compute the transform size could be significant. For example, $N=8505001$ is prime, and the nearest larger integer with prime factors no larger than 7 is $N_p =8573040$ , which is 68039 integers away. Let’s see how much time that would take.
N = 8505001;
Np = fftTransformLength_repeated_division(N)
Np = 8573040
f = @() fftTransformLength_repeated_division(N);
timeit(f)
ans = 0.0322
Is 32 milliseconds a long time?
It is. It’s about a third of the time required for the corresponding FFT computation:
x = rand(1,N);
g = @() fft(x,Np);
timeit(g)
ans = 0.0946
Alternative computation methods
I’d like to thank Chris Turnes (MATLAB Math team developer at MathWorks; fellow Georgia Tech grad) and Cris Luengo (principal data scientist at Deepcell; creator of DIPlib) for sharing their thoughts with me about alternative computation methods. Chris taught me how to construct a lookup table containing all integers (up to a specified maximum value) whose prime factors are no larger than 7. He suggested that such a table would have only a modest size, even for very large maximum transform sizes. Cris pointed out that OpenCV uses the lookup table approach with a binary search and that PocketFFT uses only multiplications and divisions by 2. Chris and Cris both did experiments that suggest there is value in using these other methods.
Constructing the lookup table
Here is a function that lists all the integers, up to some maximum value, whose prime factors are no greater than 7. The implementation concept here is from Chris Turnes.
function P = generateLookupTable(N)
P = 2.^(0:ceil(log2(N)));
P = P(:) * 3.^(0:ceil(log(N)/log(3)));
P = P(:) * 5.^(0:ceil(log(N)/log(5)));
P = P(:) * 7.^(0:ceil(log(N)/log(7)));
% Sort and trim P so that it contains only values up to N. Add 0 to the
% set of values.
P = sort(P(:));
i = find(P >= N, 1, "first");
P = [0 ; P(1:i)];
end
Let’s find all of the desirable transform lengths up to approximately a billion.
P = generateLookupTable(2^30);
size(P)
ans = 1x2
5261 1
How are those values spread out?
plot(diff(P))
title("Distance between adjacent values of P")
Performance of the lookup-table method
Now let’s try using a simple search of P
to implement an alternative method to find a good transform length.
function Np = fftTransformLength_lookup_table(N,P)
Np = P(find(P >= N, 1, "first"));
end
How long does this take for the example above, $N=8505001$ ?
N = 8505001;
P = generateLookupTable(1e9);
h = @() fftTransformLength_lookup_table(N,P);
timeit(h)
ans = 7.1154e-06
Further optimizations could be explored, but already it takes only a few microseconds, about 4,500 times faster than the repeated division method for this particular case.
Let’s check performance for a large power of two, which I would expect to be a much better case for the repeated division method.
N = 2^30;
timeit(@() fftTransformLength_repeated_division(N))
Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring something that takes longer.
ans = 0
timeit(@() fftTransformLength_lookup_table(N,P))
ans = 7.4774e-06
The repeated division method is so fast, in this case, that the time required is not measurable.
However, I am inclined to always use the lookup table method, and I have updated my fftTransformLength
function (GitHub link, File Exchange link) to do that.
MathWorks enhancement request: Add an FFT transform length function
I have submitted an enhancement request to MathWorks to add a function for computing good FFT transform lengths, as described in this series.