Shazam
The Algorithm behind Shazam: A Nerdy Look at Music Recognition
Time, as it grows old, teaches all things. -Aeschylus
Shazam is an application that identifies music, including the music used in movies, advertising, and television shows, from a short sample played to it. In this article, I explain the technology Shazam uses to make its audio discovery software work.
How does Shazam Work?
Shazam identifies songs using two things: an audio/acoustic fingerprint and a spectrogram. Let me explain each term.
What is an audio/acoustic fingerprint?
An audio/acoustic fingerprint is a condensed digital summary generated from an audio signal. An audio signal is a representation of sound, typically either a changing electrical voltage (for analog signals) or a series of binary numbers (for digital signals).
In the case of Shazam, the audio signals are digital, so each one is a series of binary numbers. The fingerprint derived from them can be used to identify an audio sample or to quickly locate similar items in an audio database (in case you aren’t aware, an audio database is a database for audio).
What is a Spectrogram?
A spectrogram is a graphical representation of audio. Each piece of audio is split into short segments over time, and from these segments a graph is generated that plots three dimensions of the audio: frequency, intensity, and time.
To search for a sound efficiently you need to describe it efficiently, and a spectrogram is the way to do this.
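To make the idea concrete, here is a minimal sketch of how a spectrogram could be computed. It is not Shazam's actual code: the function name, the frame size, the hop size, and the Hann window are all assumptions chosen for illustration, and a real system would use a fast Fourier transform instead of the slow, naive transform shown here.

```python
import cmath
import math

def spectrogram(samples, frame_size=64, hop=32):
    """Split samples into overlapping frames and return, for each frame,
    the magnitude (intensity) of every frequency bin.
    frame_size and hop are illustrative values, not Shazam's."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # Hann window: fades the frame edges to reduce spectral leakage.
        windowed = [
            s * 0.5 * (1 - math.cos(2 * math.pi * n / frame_size))
            for n, s in enumerate(frame)
        ]
        # Naive discrete Fourier transform; keep only non-negative frequencies.
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(windowed)
            )
            magnitudes.append(abs(acc))
        frames.append(magnitudes)
    return frames

# Demo: a pure 1000 Hz tone sampled at 8000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)

# The loudest bin in the first frame, converted back to a frequency in Hz.
loudest_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
loudest_hz = loudest_bin * sample_rate / 64
```

Each inner list is one time segment, each position in it is a frequency, and each value is an intensity, which gives the three dimensions described above. For the demo tone, the loudest bin maps back to 1000 Hz.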
How does all this work in Shazam?
We’ve explained the technologies used in Shazam as separate concepts. Now let us see how they work together to make up Shazam.
When you ask Shazam for information about a song, such as its name or artist, you give it an audio stream of the song via a microphone or some other audio input device. Shazam represents the audio stream as a spectrogram, and the Shazam algorithm then picks out the peak points in the spectrogram. Peak points are the points of highest intensity, which makes them the points least likely to be drowned out by background noise.
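As a rough illustration of peak picking, the sketch below treats a spectrogram as a grid of intensities (rows are time segments, columns are frequency bins) and keeps every point that is louder than all of its immediate neighbours. The function name, the neighbourhood radius, and the loudness threshold are assumptions made for this example, and the article does not say how Shazam itself selects peaks.

```python
def find_peaks(spec, radius=1, min_magnitude=1.0):
    """Return (time_index, frequency_bin) pairs that are strict local maxima.
    Assumes every row of spec has the same length.
    radius and min_magnitude are illustrative values."""
    peaks = []
    for t, frame in enumerate(spec):
        for f, value in enumerate(frame):
            if value < min_magnitude:
                continue  # too quiet to be a useful landmark
            neighbours = [
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(len(spec), t + radius + 1))
                for ff in range(max(0, f - radius), min(len(frame), f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            if all(value > n for n in neighbours):
                peaks.append((t, f))
    return peaks

# A toy 4x4 spectrogram with two loud points standing out from quiet noise.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 5.0, 0.3, 0.1],
    [0.1, 0.3, 0.2, 4.0],
    [0.0, 0.1, 0.3, 0.2],
]
peaks = find_peaks(toy)
```

In the toy grid only the two loud points survive, at (time 1, bin 1) and (time 2, bin 3). Everything else is discarded, which is what makes the resulting description of the sound so compact.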
The Shazam algorithm then creates an audio fingerprint from the peak points and searches the audio database for a song with a similar audio fingerprint. When it finds a match, it returns the result to the user.
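The fingerprint and lookup steps can be sketched as follows. This is one possible approach, assumed here for illustration and not confirmed by anything in this article: pairs of nearby peaks are turned into small keys, the keys are stored in a dictionary that plays the role of the audio database, and a sample is matched by counting how many of its keys line up with a song at a consistent time offset. All function names, the fan_out parameter, and the toy peak lists are hypothetical.

```python
from collections import Counter, defaultdict

def fingerprint(peaks, fan_out=3):
    """Pair each peak with the next few peaks in time order.
    Returns ((freq1, freq2, time_gap), time_of_first_peak) entries."""
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes

def build_index(songs):
    """Map every fingerprint key to the songs and times where it occurs."""
    index = defaultdict(list)
    for name, peaks in songs.items():
        for key, t1 in fingerprint(peaks):
            index[key].append((name, t1))
    return index

def best_match(index, sample_peaks):
    """Vote for (song, time offset) pairs; the most consistent one wins."""
    votes = Counter()
    for key, t_sample in fingerprint(sample_peaks):
        for name, t_song in index.get(key, []):
            votes[(name, t_song - t_sample)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name

# Toy database: each song is a list of (time_index, frequency_bin) peaks.
songs = {
    "song_a": [(0, 10), (1, 20), (2, 15), (3, 30), (4, 12), (5, 25)],
    "song_b": [(0, 11), (1, 18), (2, 27), (3, 14), (4, 22), (5, 9)],
}
index = build_index(songs)

# A sample taken from the middle of song_a, with its own clock starting at 0.
sample = [(0, 15), (1, 30), (2, 12), (3, 25)]
match = best_match(index, sample)
```

Because the keys store only frequencies and the gap between peaks, the sample can start anywhere in the song and still match. Here the sample is identified as song_a even though it begins two time steps into the track.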
How Shazam updates its audio database
Going through this article, you might have inferred that a core piece of technology behind the success of Shazam is its extensive audio database. To put it simply, without an up-to-date audio database Shazam can’t meet the demands of its users, and this will lead to a loss in revenue. So how does Shazam keep its audio database updated?
They do this through industry partnerships with companies that document music. Shazam gets these companies to document music for it and then uses the data they supply to improve its audio database.