Using the Apple FFT and Accelerate Framework

By | November 23, 2017

Has anybody used the Apple FFT for an iPhone app yet or know where I might find a sample application as to how to use it? I know that Apple has some sample code posted, but I’m not really sure how to implement it into an actual project.


I just got the FFT code working for an iPhone project:

  • create a new project
  • delete all the files except for main.m and xxx_info.plist
  • going to project settings and search for pch and stop it from trying to load a .pch (seeing as we have just deleted it)
  • copy paste the code example over whatever you have in main.m
  • remove the line that #include’s Carbon. Carbon is for OSX.
  • delete all the frameworks, and add accelerate framework

You might also need to remove an entry from info.plist that tells the project to load a xib, but I’m 90% sure you don’t need to bother with that.

NOTE: Program outputs to console, results come out as 0.000 that’s not an error –- it’s just very very fast

This code is really stupidly obscure; it is generously commented, but the comments don’t actually make life any easier.

Basically at the heart of it is:

vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);
vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_INVERSE);

FFT on n real floats, and then reverse to get back to where we started.
ip stands for in-place, which means &A gets overwritten
That’s the reason for all this special packing malarkey — so that we can squash the return value into the same space as the send value.

To give some perspective (like, as in: why would we be using this function in the first place?), Let’s say we want to perform pitch detection on microphone input, and we have set it up so that some callback gets triggered every time the microphone gets in 1024 floats. Supposing the microphone sampling rate was 44.1kHz, so that’s ~44 frames / sec.

So, our time-window is whatever the time duration of 1024 samples is, ie 1/44 s.

So we would pack A with 1024 floats from the mic, set log2n=10 (2^10=1024), precalculate some bobbins (setupReal) and:

vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);

Now A will contain n/2 complex numbers. These represent n/2 frequency bins:

  • bin[1].idealFreq = 44Hz — ie The lowest frequency we can reliably detect is ONE complete wave within that window, ie a 44Hz wave.

  • bin[2].idealFreq = 2 * 44Hz

  • etc.

  • bin[512].idealFreq = 512 * 44Hz — The highest frequency we can detect (known as the Nyquist frequency) is where every pair of points represents a wave, ie 512 complete waves within the window, ie 512 * 44Hz, or: n/2 * bin[1].idealFreq

  • Actually there is an extra Bin, Bin[0] which is often referred to as ‘DC Offset’. It so happens that Bin[0] and Bin[n/2] will always have complex component 0, so A[0].realp is used to store Bin[0] and A[0].imagp is used to store Bin[n/2]

And the magnitude of each complex number is the amount of energy vibrating around that frequency.

So, as you can see, it wouldn’t be a very great pitch detector as it doesn’t have nearly fine enough granularity. There is a cunning trick Extracting precise frequencies from FFT Bins using phase change between frames to get the precise frequency for a given bin.

Ok, Now onto the code:

Note the ‘ip’ in vDSP_fft_zrip, = ‘in place’ ie output overwrites A (‘r’ means it takes real inputs)

Look at the documentation on vDSP_fft_zrip,

Real data is stored in split complex
form, with odd reals stored on the
imaginary side of the split complex
form and even reals in stored on the
real side.

this is probably the hardest thing to understand. We are using the same container (&A) all the way through the process. so in the beginning we want to fill it with n real numbers. after the FFT it is going to be holding n/2 complex numbers. we then throw that into the inverse transform, and hopefully get out our original n real numbers.

now the structure of A its setup for complex values. So vDSP needs to standardise how to pack real numbers into it.

so first we generate n real numbers: 1, 2, …, n

for (i = 0; i < n; i++)
    originalReal[i] = (float) (i + 1);

Next we pack them into A as n/2 complex #s:

// 1. masquerades n real #s as n/2 complex #s = {1+2i, 3+4i, ...}
// 2. splits to 
//   A.realP = {1,3,...} (n/2 elts)
//   A.compP = {2,4,...} (n/2 elts)
          (COMPLEX *) originalReal, 
          2,                            // stride 2, as each complex # is 2 floats
          1,                            // stride 1 in A.realP & .compP
          nOver2);                      // n/2 elts

You would really need to look at how A is allocated to get this, maybe look up COMPLEX_SPLIT in the documentation.

A.realp = (float *) malloc(nOver2 * sizeof(float));
A.imagp = (float *) malloc(nOver2 * sizeof(float));

Next we do a pre-calculation.

Quick DSP class for maths bods:
Fourier theory takes a long time to get your head around (I’ve been looking at it on and off for several years now)

A cisoid is:

z = exp(i.theta) = cos(theta) + i.sin(theta)

i.e. a point on the unit circle in the complex plane.

When you multiply complex numbers, the angles add. So z^k will keep hopping around the unit circle; z^k can be found at an angle k.theta

  • Choose z1 = 0+1i, i.e. a quarter turn from the real axis, and notice that z1^2 z1^3 z1^4 each give another quarter turn so that z1^4 = 1

  • Choose z2 = -1, i.e. a half-turn. also z2^4 = 1 but z2 has completed 2 cycles at this point (z2^2 is also = 1). So you could think of z1 as the fundamental frequency and z2 as the first harmonic

  • Similarly z3 = the ‘three-quarters of a revolution’ point i.e. -i completes exactly 3 cycles, but actually going forwards 3/4 each time is the same as going backwards 1/4 each time

i.e. z3 is just z1 but in the opposite direction — It’s called aliasing

z2 is the highest meaningful frequency, as we chose 4 samples to hold a full wave.

  • z0 = 1+0i, z0^(anything)=1, this is DC offset

You can express any 4-point signal as a linear combination of z0 z1 and z2
i.e. You’re projecting it onto these basis vectors

but I hear you asking “what does it mean to project a signal onto a cisoid?”

You can think of it this way: The needle spins round the cisoid, so at sample k, the needle is pointing in direction k.theta, and the length is signal[k]. A signal that matches the frequency of the cisoid exactly will bulge the resulting shape in some direction. So if you add up all the contributions, you’ll get a strong resultant vector.
If the frequency nearly matches, than the bulge will be smaller and will move slowly around the circle.
For a signal that doesn’t match frequency, the contributions will cancel one another out. will help you get an intuitive understanding.

But the gist is; if we have chosen to project 1024 samples onto {z0,…,z512} we would have precalculate z0 thru z512, and that’s what this precalculation step is.

Note that if you are doing this in real code you probably want to do this once when the app loads and call the complementary release function once when it quits. DON’T do it lots of times — it is expensive.

// let's say log2n = 8, so n=2^8=256 samples, or 'harmonics' or 'terms'
// if we pre-calculate the 256th roots of unity (of which there are 256) 
// that will save us time later.
// Note that this call creates an array which will need to be released 
// later to avoid leaking
setupReal = vDSP_create_fftsetup(log2n, FFT_RADIX2);

It’s worth noting that if we set log2n to eg 8, you can throw these precalculated values into any fft function that uses resolution <= 2^8. So (unless you want ultimate memory optimisation) just create one set for the highest resolution you’re going to need, and use it for everything.

Now the actual transforms, making use of the stuff we just precalculated:

vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_FORWARD);

At this point A will be containing n/2 complex numbers, only the first one is actually two real numbers (DC offset, Nyquist #) masquerading as a complex number. The documentation overview explains this packing. It is quite neat — basically it allows the (complex) results of the transform to be packed into the same memory footprint as the (real, but weirdly packaged) inputs.

vDSP_fft_zrip(setupReal, &A, stride, log2n, FFT_INVERSE);

and back again… we will still need to unpack our original array from A. then we compare just to check that we have got back exactly what we started out with, release our precalculated bobbins and done!

But wait! before you unpack, there is one final thing that needs to be done:

// Need to see the documentation for this one...
// in order to optimise, different routines return values 
// that need to be scaled by different amounts in order to 
// be correct as per the math
// In this case...
scale = (float) 1.0 / (2 * n);

vDSP_vsmul(A.realp, 1, &scale, A.realp, 1, nOver2);
vDSP_vsmul(A.imagp, 1, &scale, A.imagp, 1, nOver2);


Here’s a real-world example: A snippet of c++ that uses Accelerate’s vDSP fft routines to do autocorrelation on the Remote IO audio unit’s input. Using this framework is pretty complicated, but the documentation isn’t too bad.

OSStatus DSPCore::initialize (double _sampleRate, uint16_t _bufferSize) {
    sampleRate = _sampleRate;
    bufferSize = _bufferSize;
    peakIndex = 0;
    frequency = 0.f;
    uint32_t maxFrames = getMaxFramesPerSlice();
    displayData = (float*)malloc(maxFrames*sizeof(float));
    bzero(displayData, maxFrames*sizeof(float));
    log2n = log2f(maxFrames);
    n = 1 << log2n;
    assert(n == maxFrames);
    nOver2 = maxFrames/2;
    A.realp = (float*)malloc(nOver2 * sizeof(float));
    A.imagp = (float*)malloc(nOver2 * sizeof(float));
    FFTSetup fftSetup = vDSP_create_fftsetup(log2n, FFT_RADIX2);

    return noErr;

void DSPCore::Render(uint32_t numFrames, AudioBufferList *ioData) {

    bufferSize = numFrames;
    float ln = log2f(numFrames);

    //vDSP autocorrelation

    //convert real input to even-odd
    vDSP_ctoz((COMPLEX*)ioData->mBuffers[0].mData, 2, &A, 1, numFrames/2);
    memset(ioData->mBuffers[0].mData, 0, ioData->mBuffers[0].mDataByteSize);
    vDSP_fft_zrip(fftSetup, &A, 1, ln, FFT_FORWARD);

    // Absolute square (equivalent to mag^2)
    vDSP_zvmags(&A, 1, A.realp, 1, numFrames/2);
    bzero(A.imagp, (numFrames/2) * sizeof(float));    

    // Inverse FFT
    vDSP_fft_zrip(fftSetup, &A, 1, ln, FFT_INVERSE);

    //convert complex split to real
    vDSP_ztoc(&A, 1, (COMPLEX*)displayData, 2, numFrames/2);

    // Normalize
    float scale = 1.f/displayData[0];
    vDSP_vsmul(displayData, 1, &scale, displayData, 1, numFrames);

    // Naive peak-pick: find the first local maximum
    peakIndex = 0;
    for (size_t ii=1; ii < numFrames-1; ++ii) {
        if ((displayData[ii] > displayData[ii-1]) && (displayData[ii] > displayData[ii+1])) {
            peakIndex = ii;

    // Calculate frequency
    frequency = sampleRate / peakIndex + quadInterpolate(&displayData[peakIndex-1]);

    bufferSize = numFrames;

    for (int ii=0; ii<ioData->mNumberBuffers; ++ii) {
        bzero(ioData->mBuffers[ii].mData, ioData->mBuffers[ii].mDataByteSize);


While I will say Apple’s FFT Framework is fast… You need to know how an FFT works in order to get accurate pitch detection (i.e. calculating the phase difference on each successive FFT in order to find the exact pitch, not the pitch of the most dominate bin).

I don’t know if it’s of any help, but I uploaded my Pitch Detector object from my tuner app ( There is an example xCode 4 project for download also (so you can see how the implementation works).

I’m working on uploading an example FFT implementation — so stay tuned and I’ll update this once that happens.

Happy coding!


Here is an another real-world example:

Leave a Reply

Your email address will not be published. Required fields are marked *