Here’s a fairly fast algorithm for detecting beats, and finding the tempo (the beats-per-minute, or “BPM”) of music.

From top to bottom:

1). Rainbow: Spectral analysis of the track. I’m using Balkansky & Loop Stepwalker – Cicatriz for this example, because it’s noisy and highly compressed, presenting a good challenge to the algorithm.

2). Red: The differences between adjacent spectral frames. This should help isolate the moments when drum sounds (transients) pop. Pseudo-code: *abs( frame[x][y] - frame[x-1][y] )*
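The per-bin difference above is often called "spectral flux." A minimal sketch, assuming the spectra are stored as a 2-D array of shape (frames, bins) — the function name and layout are illustrative, not from the post:

```python
import numpy as np

def spectral_flux(frames):
    """Per-bin absolute difference between adjacent spectral frames.

    `frames` is assumed to be a 2-D array of magnitude spectra,
    shape (num_frames, num_bins).
    """
    # abs(frame[x][y] - frame[x-1][y]) for every bin y, every frame x > 0
    return np.abs(np.diff(frames, axis=0))
```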

3). Cyan: The total energy of each frame in (2), with heavier weights applied to higher frequencies. This attempts to compensate for equal-loudness effects: at the same amplitude, higher frequencies are typically perceived as “louder” than lower ones, so most music is mixed with less spectral energy in the highs. (You can see this in layer (1): the bass range, near the top, is much brighter than the treble.)

4). Blurry cyan: Layer (3) is smeared using convolution. This smooths away spectral artifacts and transients, leaving a local average of the energy.
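The smearing could look like this. The post doesn't name the kernel, so a normalized box window (a moving average) is assumed here; its width is also a guess:

```python
import numpy as np

def smooth(energy, width=9):
    """Blur the 1-D energy curve with a normalized box window.

    `width` (in frames) is an assumed parameter; mode="same" keeps the
    output the same length as the input.
    """
    kernel = np.ones(width) / width
    return np.convolve(energy, kernel, mode="same")
```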

5). Yellow: Layer (4) is subtracted from layer (3). Only positive values are lit. Energetic spikes stand out amidst the surrounding noise of their neighbors.
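The subtraction with the "only positive values" rule is a one-liner; the function name is illustrative:

```python
import numpy as np

def onset_curve(energy, smoothed):
    """Subtract the blurred copy (layer 4) from the raw energy
    (layer 3) and clamp at zero, so only spikes that rise above
    their local average survive."""
    return np.maximum(energy - smoothed, 0.0)
```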

6). Magenta: A search for the greatest energies in layer (5). These are the beats! The frames surrounding each beat are zeroed out, making them ineligible for further detection. The search repeats until every beat above 10% of the strongest spike’s energy has been identified. Congratulations! You have found the beats.
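The iterative peak-picking step, sketched under assumptions: the post gives the 10% threshold but not the size of the zeroed-out neighborhood, so `neighborhood` is a made-up parameter:

```python
import numpy as np

def pick_beats(novelty, neighborhood=8, threshold=0.10):
    """Repeatedly take the strongest remaining spike in the onset
    curve, zero out the frames around it, and stop once the next
    spike falls below 10% of the strongest one."""
    work = novelty.copy()
    floor = threshold * work.max()
    beats = []
    while True:
        i = int(np.argmax(work))
        if work[i] <= 0.0 or work[i] < floor:
            break
        beats.append(i)
        lo = max(0, i - neighborhood)          # zero the surrounding frames,
        work[lo:i + neighborhood + 1] = 0.0    # making them ineligible
    return sorted(beats)
```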

Now the trick is to determine the tempo and rhythm. My solution was to begin at the strongest beat, then measure the distance forward to the next beat. Using this distance, repeatedly step forward through the landscape of beats, and compare the two “measures” to see if the rhythms correlate.

Then run this again with a distance of 2 beats forward; then 3 beats; etc.
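One way to sketch that search, with a simplification clearly flagged: instead of comparing whole measures, this stand-in just sums the onset energy found at each landing point while stepping forward by the candidate period, which is one crude way to "correlate" — the scoring function and names are assumptions, not the post's code:

```python
import numpy as np

def rhythm_score(novelty, start, period):
    """Step forward from `start` in hops of `period` frames and sum
    the onset energy at each landing point. A simplified stand-in for
    the post's measure-by-measure correlation."""
    score, pos = 0.0, start
    while pos < len(novelty):
        score += novelty[pos]
        pos += period
    return score

def best_period(novelty, beats):
    """Try the distance from the strongest beat to the 1st, 2nd,
    3rd... following beat as candidate periods; keep the best scorer."""
    start = max(beats, key=lambda b: novelty[b])
    candidates = [b - start for b in beats if b > start]
    if not candidates:
        return None
    return max(candidates, key=lambda p: rhythm_score(novelty, start, p))
```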

In this enthusiastic screenshot, a very strong rhythm has emerged when the program steps a distance of 9 beats forward (or backward) from the strongest beat. The rhythm of “Cicatriz” is largely composed of 8 quick hi-hats (sixteenth notes), followed by 1 big half-note snare, so this result seems to make sense.

And there it is! You have found the rhythm. Divide the sampling rate (44100) by this distance to get the tempo. The resulting number will be very small, since the distance spans several beats; multiply by 16, 32, 64… until a reasonable BPM emerges. (“Cicatriz” is 70 BPM.)
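That conversion can be sketched as follows, assuming the distance is measured in samples. Note one adjustment: seconds per step is distance / 44100, so the raw tempo in BPM is 60 × 44100 / distance; the repeated doubling then folds it into a plausible range, playing the role of the post's "multiply by 16, 32, 64…". The [60, 180) window is an assumed choice:

```python
def distance_to_bpm(distance, sample_rate=44100, lo=60.0, hi=180.0):
    """Turn a beat-step distance (in samples) into a tempo estimate.

    Because the winning step may span several beats (9 in the post's
    example), the raw value is folded by octaves until it lands in a
    plausible BPM range.
    """
    bpm = 60.0 * sample_rate / distance  # raw beats per minute
    while bpm < lo:
        bpm *= 2.0
    while bpm >= hi:
        bpm /= 2.0
    return bpm
```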

Source code is here. This proof of concept was developed in Flash, which is a decent tool for rapid prototyping and comes with a huge library of classes out of the box. It comes with absolutely no warranty! I recommend not using it in mission-critical situations. The Queen Mary plug-in set may be a more robust choice.

Wow, so it’s basically just data conversion/cleaning on an FFT? Or do you need wavelets/a Gabor transform?

I just used an FFT for the spectral analysis.