A new entry in this series I love, explaining mathematical and engineering concepts without equations. This time we're looking at the Nyquist–Shannon sampling theorem.
For those of you who don't know it: the sampling theorem was formulated by Nyquist in 1928, proved by Shannon in 1949, and is one of the cornerstones of digital signal processing.
The best-known formulation of the theorem is that, in order to reconstruct a sampled signal exactly, the sampling frequency must be greater than twice the signal's bandwidth.
Normally, a mathematical explanation would now follow to prove this. Which is precisely NOT the purpose of this post; quite the opposite.
Don't get me wrong, I love equations. But if explaining something takes two pages of math, maybe your teacher didn't quite understand it either (and you run the risk of forgetting it in a flash).
So, what's with all the mystique and the endless articles about the "sampliiiing theoooreeem"? Before we get to that, we need to start by remembering what a sampled signal is.
Sampled signals
Consider a "real" signal from the physical world, such as an audio or electromagnetic signal. In general, these "real" signals are variations in the measurement of some magnitude. With rare exceptions, physical signals are always analog.
In this context, far from the chalkboard and the math book, by "real" and analog we mean continuous, non-periodic signals, with infinite detail and even a bit of noise. A true signal party, let's say.
For example, the following is an audio signal captured with a microphone.
If I want to digitally store this “real” analog signal, it doesn’t fit. I would need infinite memory to store its infinite level of detail. So I have to somehow simplify it. And simplifying, almost always, means losing information.
The simplest and most intuitive way to digitize our “real” signal is to sample it, that is, take measurements at regular intervals and store them. This list of numbers is our digital signal.
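A minimal Python sketch of the idea (the signal, duration, and sampling rate here are invented for illustration, since the post's actual audio isn't available):

```python
import numpy as np

# A stand-in for the "real" analog signal: any function of continuous time.
# (Hypothetical example, not the author's actual recording.)
def analog_signal(t):
    return np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 7 * t)

fs = 50.0                             # sampling frequency in Hz (assumed)
t_samples = np.arange(0, 1, 1 / fs)   # regular sampling instants
samples = analog_signal(t_samples)    # the digital signal: just a list of numbers
```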
Then, if I have to reconstruct the analog signal from the data, I have various options, like "connecting the dots," or more or less sophisticated interpolations, depending on how much I want to rack my brain and how much computing power I have.
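"Connecting the dots" is nothing more than linear interpolation; here is a self-contained sketch with invented sample data, plus one of the fancier options:

```python
import numpy as np
from scipy.signal import resample

t_samples = np.arange(0, 1, 0.02)             # a 50 Hz sampling grid (assumed)
samples = np.sin(2 * np.pi * 3 * t_samples)   # stand-in sampled data

# "Connecting the dots": linear interpolation between the stored samples
t_fine = np.linspace(0, t_samples[-1], 500)   # dense time axis
reconstructed = np.interp(t_fine, t_samples, samples)

# A more sophisticated option: band-limited (FFT-based) resampling
reconstructed_smooth = resample(samples, 500)
```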
An obvious characteristic of the sampling process is that the closer the points are, that is, the higher my sampling frequency, the more details of the signal I will capture.
And conversely, if I separate the points more and more, I will end up with a potato of a digital signal, because if my sampling frequency is much lower than that of the signal, I lose a lot of information. Between two points there could be a peak, a smooth curve, 20 oscillations… a camel, an elephant, anything (which, on the other hand, may or may not interest me).
This is what worried Nyquist, Shannon (and others) a century ago, and that's what the sampling theorem is all about: how the quality of the sampled signal is affected by the separation between the points.
Signal spectrum
Our "real" signal is continuous, infinitely detailed and, on top of that, made up: I'll draw one, and you'll draw a different one. So we won't be able to draw many conclusions this way.
To draw relevant theoretical conclusions, it is convenient to move to the frequency domain, that is, to view the signal in an alternative representation as a sum of sine waves.
The frequency domain sounds like something difficult to understand… yet everyone understands a car radio equalizer without any problem. Well, it's the same thing: thinking of the signal as made up of waves of different frequencies, lows, highs…
To move the signal from the time domain to the frequency domain, we use the Fourier transform. In practice, we will use some of its versions, like the DFT (Discrete Fourier Transform) and the FFT algorithm (Fast Fourier Transform). But for our explanation, the distinction doesn't matter.
This transform allows us to treat (or see, or conceive) our signal in another way, but without changing it (that’s why it’s a transform, not a change). Simply, instead of treating it as a function of the magnitude of the signal at each instant of time, now we treat it as a function that encodes the amplitude of its frequency components.
By taking our signal to the frequency domain, we obtain a new function, which we will call the spectrum of the signal. If our "real" signal was continuous, non-periodic, blah blah blah…, its spectrum will also be continuous, non-periodic, with noise, blah blah blah.
Not very funny, right? Then why have we done all this? Because now, if we can know what happens to a sinusoidal signal of a certain frequency when we sample it, we can deduce what will happen to the “real” signal (or at least part of it) when we sample it.
So come on, let's take our signal from the earlier section, do a Fourier transform, and see… that I've fooled you. In reality, I generated it as the sum of four sine signals, with invented frequencies and amplitudes.
Well, what did you expect? I already told you that you can't store a "real" signal. And doing it this way has allowed us to visualize its frequency components. But keep in mind that for a "real" signal the spectrum would be continuous, and instead of four points we would have a continuous curve.
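If you want to reproduce the trick, here is a sketch along those lines (the four frequencies and amplitudes below are my own inventions, not necessarily the ones in the figure):

```python
import numpy as np

fs = 1000                             # sampling frequency in Hz (assumed)
t = np.arange(0, 1, 1 / fs)

# Sum of four sine waves with invented frequencies and amplitudes
freqs = [50, 120, 230, 400]           # Hz (hypothetical values)
amps  = [1.0, 0.7, 0.4, 0.2]
signal = sum(a * np.sin(2 * np.pi * f * t) for a, f in zip(amps, freqs))

# Its spectrum: four clean peaks, one per component
spectrum = np.abs(np.fft.rfft(signal)) * 2 / len(t)
freq_axis = np.fft.rfftfreq(len(t), d=1 / fs)
```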
Representation using phasors
The previous section served to conclude that we are going to work with the signal in the frequency domain. This way, we only need to study sine signals to draw valid conclusions about how sampling affects "real" signals.
Now we need one more tool: the treatment of sine signals through their phasor representation. And what the heck is this now? Just a ball that spins, although the names don't help much to make it seem simple.
You may remember from high school that a sine wave (sine, cosine) is actually the projection of a point that is rotating around a center. In fact, every time a phenomenon appears in nature that follows a sinusoidal pattern, it is because it is related to “something” that is rotating, and you are looking at it wrong. But that’s another debate.
This is the phasor representation, and it greatly simplifies working with sine signals. The radius of the phasor's rotation about the center is the amplitude of the wave, and the frequency of the sine wave is the rate at which the phasor rotates, i.e., its rotational speed.
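In code, a phasor is just a complex exponential; a small sketch (amplitude and frequency invented):

```python
import numpy as np

A, f = 1.5, 5.0                          # radius (amplitude) and rotation rate in Hz (assumed)
t = np.linspace(0, 1, 1000)

phasor = A * np.exp(2j * np.pi * f * t)  # the rotating "ball" in the complex plane

# Its projections onto the two axes are the familiar waves
cosine_wave = phasor.real
sine_wave   = phasor.imag
```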
With this in mind, we have enough tools to understand the sampling theorem.
The sampling problem
Refocusing the issue: the sampling problem can be roughly summarized as knowing whether we can reconstruct a signal from the sampled "points," or at least what consequences the process has on the sampled signal compared to the original.
By moving to the frequency domain, the problem has become: given this set of points, can you tell me which sine signals best fit it? Darn… It's even worse than before! How am I supposed to know that?
No, it's not that difficult. If we work with phasors, the question ends up being: for each of the following "balls," can you tell me its frequency and its radius of rotation? And that's very easy, because each "ball" rotates at a different speed.
Since I can easily discriminate each “ball” (frequency component) simply because they rotate at different speeds, I can focus the analysis on only one frequency, and the rest of the analysis will be similar for all components.
So now I only care about, for example, the red ball. After sampling, I have “static images” (samples) of the ball in its movement. We just have to calculate its angular velocity to determine the frequency component.
And that’s very easy! Or not? Hint, if it were that easy, we wouldn’t be here. This is about to get interesting.
Fundamental harmonics
First difficulty, but not too problematic. You see the following “ball” (let’s call it a phasor from now on, okay?) rotating, and calculating the rotational speed is trivial. Right?
Well, yes and no. Because you only have two still images, and you don't know whether, between them, the phasor has made 10, 20, or three million turns (remember? a camel, an elephant…). In all cases, the phasor would be in the same place in the second photo.
That's a first problem of sampling: there is an uncertainty about which harmonic you are actually representing. In general, it's not a big problem, because you know you are sampling at a frequency Wm, so it's "reasonable" to assume that any higher frequency will simply be lost.
But keep in mind, for the following sections, that when we do the reconstruction, the frequency (speed) I'm going to assume you want is the lowest of all the possible ones.
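You can check this ambiguity numerically. In the sketch below (frequencies invented), a tone at f and a tone at f plus a whole multiple of the sampling frequency produce exactly the same samples:

```python
import numpy as np

fs = 100.0                     # sampling frequency (assumed)
n = np.arange(32)              # sample indices
f = 10.0                       # is it 10 Hz...

slow = np.sin(2 * np.pi * f * n / fs)
fast = np.sin(2 * np.pi * (f + 3 * fs) * n / fs)   # ...or 310 Hz?

print(np.allclose(slow, fast))  # True: the samples can't tell them apart
```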
Explanation of the sampling theorem
We have finally reached the “heart” of the entry. What happens when we increase the frequency of the component in relation to the sampling frequency? (either because we increase the frequency of the component, or because we reduce the sampling frequency).
Well, for now no problems. Let’s increase the rotational speed of the phasor a bit more, until it passes the x-axis. What will happen then?
Wow! You expected the green arrow to get a little bigger, but instead a blue one has appeared. Oh, I didn’t mention that phasors can rotate in both directions? Well, yes, they rotate in both directions.
And remember, from the previous section, that I told you I was going to give you the slowest speed of all the possible ones. So as soon as it crosses the x-axis, I assume it's rotating in the opposite direction.
Finally, we have arrived at our demonstration without equations. If you want me to reconstruct a frequency, the phasor must advance less than half a turn between consecutive samples, so that two consecutive samples can catch it on the same side of the circle. That is, its frequency must be less than half of my sampling frequency.
Equivalently, this means that I cannot reconstruct frequency components higher than half the sampling frequency, which is exactly what we wanted to demonstrate.
Corollary: an even simpler way to explain the theorem is that the highest frequency I can reconstruct is half the sampling frequency, because beyond that I don't know in which direction the phasor rotates. There you have it.
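The phasor version of this is easy to verify numerically: sampled at fs, a phasor rotating forward at a frequency above fs/2 is indistinguishable from one rotating backward below fs/2 (values invented for the sketch):

```python
import numpy as np

fs = 100.0                      # sampling frequency (assumed)
n = np.arange(32)               # sample indices

# A phasor rotating FORWARD at 60 Hz (above fs/2 = 50 Hz)...
forward = np.exp(2j * np.pi * 60.0 * n / fs)
# ...gives exactly the same samples as one rotating BACKWARD at 40 Hz
backward = np.exp(-2j * np.pi * 40.0 * n / fs)

print(np.allclose(forward, backward))  # True
```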
Aliasing
Did you think this was over? There's still a little bit left. Now it's time to talk about another consequence of the sampling theorem, less known and much less understood: the terrible and horrible aliasing.
I'm sorry to say I've been lying to you for half of this entry. Do you remember that I told you I was only going to give you the lowest frequency component? Well, that's a lie: I'm always going to give you both directions of rotation.
Remember the first example of the phasor, where it was very easy to determine the frequency. Let’s draw it together with its spectrum.
In reality, I'm not only going to get the green peak; I'm also going to get a blue one, which corresponds to the frequency rotating in the other direction. This "mirror" signal is called an "alias," and it's what gives rise to the phenomenon of aliasing.
You may have heard the term from the “anti-aliasing” filter, which frequently appears in 3D graphics (for example, in video games), and which tries to solve a problem related to the sampling theorem.
This peak doesn’t seem very problematic, because it’s above half the sampling frequency. And, as we are smart people, and thanks to the genius of Nyquist and Shannon, I know that everything above that is “garbage” and I won’t even look at it when reconstructing the signal.
I wish it were that easy. Look at what happens with noise "slightly above" half the sampling frequency: it appears as a high frequency in my sampled digital signal, and I can't distinguish it from the "real" data!
It's even worse if the noise approaches the sampling frequency, because then it appears as low-frequency components in my digital signal. And that's worse because, usually, the low frequencies carry a lot of the "good stuff" in my signal.
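A quick numerical sketch of that folding (values invented): a tone just below the sampling frequency lands right on top of the low frequencies.

```python
import numpy as np

fs = 1000                        # sampling frequency in Hz (assumed)
t = np.arange(0, 1, 1 / fs)

# "Noise" at 990 Hz, just below the sampling frequency...
noise = np.sin(2 * np.pi * 990 * t)

spectrum = np.abs(np.fft.rfft(noise))
freq_axis = np.fft.rfftfreq(len(t), d=1 / fs)
print(freq_axis[np.argmax(spectrum)])   # 10.0: it masquerades as a 10 Hz tone!
```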
But not only that: all the harmonics of the signal that lie above half the sampling frequency will generate aliasing too.
In general, when I sample a signal, I'm also sampling the "mirror" of its entire spectrum and all its harmonics. And I'm "swallowing" all of that signal with no way of distinguishing it from the "real" data.
That’s why, always always always, before sampling a signal you have to filter out the frequencies higher than half the sampling frequency, to prevent all that information from sneaking into the recorded spectrum.
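In real systems this is an analog filter sitting in front of the ADC; the digital sketch below only illustrates the idea, low-passing an already densely sampled signal before throwing away samples (the cutoff and rates are assumptions):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs_dense = 10000.0               # rate of the incoming dense signal (assumed)
fs_target = 1000.0               # rate we actually want to store (assumed)

# Low-pass below half the target sampling frequency
# (0.45 * fs_target leaves a little margin below fs_target / 2)
sos = butter(8, 0.45 * fs_target, btype="low", output="sos", fs=fs_dense)

def anti_alias_and_decimate(dense_signal):
    """Remove everything above fs_target / 2, then keep every 10th sample."""
    filtered = sosfiltfilt(sos, dense_signal)
    return filtered[::int(fs_dense / fs_target)]
```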
Conclusion
A bit long? I hope it didn't get too boring, and that you've seen a different way to understand the sampling theorem, without the need for equations (which are very beautiful, by the way).
The sampling theorem is applicable in multiple fields, typically in telecommunications and audio, but also in many other less obvious areas such as image processing (yes, there are 2D and 3D FFTs), oscilloscopes, measuring instruments, and in any application that involves digital manipulation of signals.
So let's thank the geniuses Nyquist, Shannon, and the other contributors to the theory of digital signal processing for their work, and see you in the next entry!