Passing planes and other whoosh sounds

I always assumed that the recognisable 'whoosh' sound a plane or helicopter makes when passing overhead simply comes from the famous Doppler effect. But when you listen closely, this explanation doesn't make complete sense.

(Audio clipped from freesound – here and here)

A classic example of the Doppler effect is the sound of a passing ambulance constantly descending in pitch. When a plane flies overhead the roar of the engine sometimes does that as well. But you can also hear a wider, breathier noise that does something different: it's like the pitch goes down at first, but when the plane has passed us, the pitch goes up again. That's not how Doppler works! What's going on there?

Comb filtering.

Let's shed light on the mystery by taking a look at the sound in a time-frequency spectrogram. Here, time runs from top to bottom, frequencies from left (low) to right (high).

We can clearly see one part of the sound sweeping from right to left, or from high to low frequencies; this should be the Doppler effect. But there's something else happening on the left side.

The sound's frequency distribution seems to form a series of moving peaks and valleys. This resembles what audio engineers would call 'comb filtering', due to its appearance in the spectrogram. When the peaks and valleys move about, it causes a 'whoosh' sound; this is the same principle as in the flanger effect used in music production. But those are just terms for the electronically created version. We can call the acoustic phenomenon the whoosh.

The comb pattern is caused by two copies of the same exact sound arriving at slightly different times, close enough that they form an interference pattern. It's closely related to what happens to light in the double slit experiment. In recordings this often means that the sound was captured by two microphones and then mixed together; you can sometimes hear this happen unintentionally in podcasts and radio shows. So my thought process is, are we hearing two copies of the plane's sound? How much later is the other one arriving, and why? And why does the 'whoosh' appear to go down in pitch at first, then up again?
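To make the mechanism concrete, here's a minimal numpy sketch (not the code behind these plots; the sample rate and delay are arbitrary) that mixes noise with a delayed copy of itself:

```python
import numpy as np

# Comb filtering in a nutshell: a sound plus a delayed copy of itself.
fs = 44100                      # sample rate, Hz
delay_ms = 5.0                  # echo delay; sweeping this makes the whoosh
noise = np.random.randn(fs)     # one second of white noise

d = int(fs * delay_ms / 1000)   # delay in samples
mixed = noise[d:] + noise[:-d]  # direct sound plus delayed copy

mag = np.abs(np.fft.rfft(mixed))
freqs = np.fft.rfftfreq(len(mixed), 1 / fs)
# The spectrum now has notches at odd multiples of 1/(2 * 5 ms) = 100 Hz
# and peaks at multiples of 200 Hz: the teeth of the comb.
```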

Into the cepstral domain.

The cepstrum, which is the inverse Fourier transform of the estimated log spectrum, is a fascinating plot for looking at delays and echoes in complex (as in complicated) signals. While the spectrum separates frequencies, the cepstrum measures time, or quefrency – see what they did there? It reveals cyclicities in a sound's structure even when it interferes with itself, like in our case. In that respect it's similar to autocorrelation.

It's also useful for looking at sounds that, experientially, have a 'pitch' to them but that don't show any clear spectral peak in the Fourier transform. Just like the sound we're interested in.
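Computing one is simple enough; here's a sketch that follows the definition above (the test constants are made up for illustration):

```python
import numpy as np

def real_cepstrum(x):
    """Inverse FFT of the log magnitude spectrum. A peak at quefrency q
    (in samples) reveals an echo arriving q samples late."""
    log_mag = np.log(np.abs(np.fft.fft(x)) + 1e-12)   # avoid log(0)
    return np.fft.ifft(log_mag).real

# The 5 ms echo from the comb-filter sketch shows up as a clear peak:
fs = 44100
noise = np.random.randn(fs)
d = int(0.005 * fs)
cep = real_cepstrum(noise[d:] + noise[:-d])
print(np.argmax(cep[50:2000]) + 50)   # ~220 samples = 5 ms at 44.1 kHz
```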

Here's a time-quefrency cepstrogram of the same sound (to be accurate, I used the autocepstrum here for better clarity):

The Doppler effect is less prominent here. Instead, the plot shows a sweeping peak that seems to agree with the pitch change we hear. This delay time sweeps from around 4 milliseconds to 9 ms and back. Note the scale: higher frequencies (shorter times) are on the left side this time.

Now why would the sound be so correlated with itself with this sweeping delay time?

Ground echo?

Here's my hypothesis. We are hearing not only the direct sound from the plane but also a delayed echo from a nearby flat surface. These two sounds get superimposed and interfere before they reach our ears. The effect would be especially prominent with planes and helicopters because there is little to obstruct either the sound coming from above or its reflection from a large surface. And what could be a large reflective surface outdoors? Well, the ground below!

Let's think about the numbers. The ground is around one-and-a-half metres below our ears. When a plane is directly overhead, the reflected sound needs to take a path that's three metres longer (two-way) than the direct path. Since sound travels 343 metres per second, this translates to a difference of about 9 milliseconds – just what we saw in the cepstrogram!

Below, I used GeoGebra to calculate the time difference (between the yellow and green paths) in milliseconds.

When the plane is far away the angle is shallower, the two paths are more similar in distance, and the time difference is shorter.
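The same mirror-image geometry is easy to play with in code; a little sketch, with the ear height from above and some arbitrary elevation angles:

```python
import numpy as np

# Extra path length of the ground echo for a distant source at elevation
# angle theta, heard by ears at height h above flat ground. The mirror
# image argument gives: extra path = 2 * h * sin(theta).
h = 1.5      # ear height, metres
c = 343.0    # speed of sound, m/s

for theta_deg in (10, 30, 60, 90):
    delay_ms = 1000 * 2 * h * np.sin(np.radians(theta_deg)) / c
    print(f"{theta_deg:2d} deg: {delay_ms:.1f} ms")
# 90 deg (overhead) gives 3 m / 343 m/s = 8.7 ms, shrinking
# towards zero as the plane approaches the horizon.
```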

It would follow that a taller person hears the sound differently than a shorter one, or someone in a tenth-floor window! If the ground is very soft, maybe in a mossy grove, you probably wouldn't hear the effect at all; just the Doppler effect. But this prediction needs to be tested out in a real forest.

Here's what a minimal acoustic simulation model renders. We'll just put a flying white noise source in the sky and a reflective surface as the ground. Let's only update the impulse response (IR) at 15 fps to prevent the Doppler phenomenon from emerging.
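In case you want to try this at home, the whole model fits in a few lines of numpy. This is only a sketch of the idea: the altitude, speed, and ground reflection coefficient are invented, and the delay is simply rounded to whole samples.

```python
import numpy as np

fs, c, h = 44100, 343.0, 1.5     # sample rate, speed of sound, ear height
alt, v = 300.0, 70.0             # plane altitude (m) and speed (m/s), made up
block = fs // 15                 # hold the delay constant for 1/15 s: no Doppler

noise = np.random.randn(18 * fs) # the source, with a lead-in for the echo lag
out = []
for i, t in enumerate(np.arange(-8, 8, 1 / 15)):  # plane overhead at t = 0
    x = v * t                                     # horizontal position, m
    d_direct = np.hypot(x, alt - h)               # source to ear
    d_ground = np.hypot(x, alt + h)               # source to ear's mirror image
    lag = int(round(fs * (d_ground - d_direct) / c))
    p = fs + i * block
    out.append(noise[p : p + block] + 0.8 * noise[p - lag : p + block - lag])
whoosh = 0.5 * np.concatenate(out)                # direct sound + ground echo
```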

Whoosh!

Some everyday whooshes.

The whoosh isn't only associated with planes. When it occurs naturally it usually needs three things:

  • a sound with a lot of structure (preferably a hissy or breathy noise)
  • an unobstructed echo from a nearby surface
  • and some kind of physical movement.

I've heard this outdoors when the sound of a waterfall was reflecting off a brick wall (video); and next to a motorway when the sound barrier provided the reflection. You can hear it in some films – for instance, in the original Home Alone when Kevin puts down the pizza box after taking a whiff (video)!

You can even hear it in the sound of thunder when lightning hits quite close. Nothing is physically moving in this case; but it might be caused by the fact that the 'bang' is created simultaneously along a very long path, while sound only travels so fast – so echoes from different parts of the path arrive with continuously varying delays.

Try it yourself: move your head towards a wall – or a laptop screen – and back away from it, while making a continuous 'hhhh' or 'shhh' noise. Listen closely but don't close your eyes, you might bump your nose.

Where have you encountered the whoosh?

A simple little plot.

Finally, if you have JavaScript turned on you'll see (and hear) some more stuff in this blog post. In the interactive graph below you can move the aeroplane and listener around and see how the numbers change. The 'lag' or time difference we hear (orange arrow) comes from how much farther away the reflected virtual image is compared to the real aeroplane. For instance, when it's right above, the copied sound travels three metres longer. In the lower right corner, the 'filter' spectrum up to 4.5 kHz is also drawn. The circles are there to visualise the direct distance.

FAQ

I get many questions pointing out that planes have two of something: two engines, two ends in one engine, etc. This is a red herring for two reasons: 1) the sound is not specific to jet engines, or even to planes at all; 2) the sounds would have to be nearly identical for interference to happen, and random wind noise can't create phase-coherent sounds from two independent sources. Some discussion in the comments below.

Ultrasonic investigations in shopping centres

I can't remember how I first came across these near-ultrasonic 'beacons' ubiquitous in PA systems. I might have been scrolling through the audio spectrum while waiting for the underground train; or it might have been the screeching 'tinnitus-like' sensation I would often get near the loudspeakers at a local shopping centre.

[Image: Graph with frequencies from 16 to 24 kHz and decibels from -30 to -110. Two peaks are shown, the lower one marked REDI around 19600 Hz and a higher one marked Forum at 20000 Hz.]

Whatever the case, I learned that they are called pilot tones. Many multi-loudspeaker PA systems (like the Zenitel VPA and Axys End of Line detection unit) employ these roughly 20-kilohertz tones to continuously measure the system's health status: no pilot tone means no connection to a loudspeaker. It's usually set to a very high frequency, inaudible to humans, to avoid disturbing customers.

However, these tones are powerful and some people will still hear them, especially if the frequency drops below 20 kHz. There is one such system at an uncomfortable 19.595 kHz in my city; it's marked green in the graph above. I've heard of several other people who also hear the sound. I don't believe it to be a sonic weapon like The Mosquito; those use even lower frequencies, down to 17 kHz. It's probably just a misconfiguration that was never fixed because the people working on it couldn't experientially confirm any issue with it.

Hidden modulation.

Pretty quickly it became apparent that this sound is almost never a pure tone. Some kind of modulation can always be seen wiggling around it in the spectrogram. Is it caused by the background music being played through the PA system? Is it carrying some information? Or is it something else altogether?

I've found at least one place where the tone appears to be amplitude modulated by the lowest frequencies in the music or commercials playing. It's probably a side effect of some kind of distortion.

Here's a spectrogram plot of this amplitude modulation around the strong pilot tone. It's colour-coded so that the purple colour is coming from the right microphone channel and green from the left. I'm not quite sure what the other purple horizontal stripes are here.

But this kind of modulation is rare. It's more common to see the tone change in response to things happening around you, like people moving about. More on that below.

Doppler-shifted backscatter.

Look what happens to the pilot tone when a train arrives at an underground station:

The wideband screech in the beginning is followed by this interesting tornado-shaped pattern that seems to have a lot of structure inside of it. It lasts for 15 seconds, until the train comes to a stop.

It's my belief that this is backscatter, also known as reverb, from the pilot tone reflecting off the slowing train and getting Doppler-shifted on the way. The pilot tone works as a continuous-wave bistatic sonar. Here, the left microphone (green) hears a mostly negative Doppler shift whereas the right channel (purple) hears a positive one, as the train is passing us from right to left. An anti-aliasing filter has darkened the higher frequencies as I wasn't yet aware I would find something interesting up here when recording.

A zoomed-in view of this cone reveals these repeating sweeps from positive to negative and red to green. Are they reflections off of some repeating structure in the passing train? The short horizontal bursts of constant tone could then be surfaces that are angled in a different direction than the rest of the train. Or perhaps this repetition reflects the regular placement of loudspeakers around the station?

Moving the microphone.

Another interesting experiment: I took the lift to another floor and recorded the ride from inside the lift. It wasn't the metal box type – the walls were made of glass – so I thought the pilot tone should be at least somewhat audible inside. Here's what I got during the 10-second ride. It's a little buried in noise.

[Image: A spectrogram zoomed into 19500 Hz. There's a pure tone at 19500 Hz in the beginning. Soon it starts to 'disintegrate' into wideband noise and then comes back together, forming a spindle-like pattern.]

Skater calculation.

For the next experiment I went into the underground car park of a shopping centre. I stood right under a PA loudspeaker and recorded a skateboarder passing by. A lot of interesting stuff is happening in this stereo spectrogram!

[Image: A pure tone near 19500 Hz, superimposed with an S-shaped pattern going from higher to lower frequencies and from red to green colour.]

First of all, there seem to be two pilot tones: one at 19,595 Hz and a much quieter one at 19,500 Hz. Are there two different PA systems in the car park?

Second, there's a clear Doppler shift in the reverb. The frequency shift goes from positive to negative at the same moment that the skater passes us, seen as the wideband wheel noise changing colour. It looks like the pattern is also 'filled in' with noise under this Doppler curve. What information can we find out just by looking at this image?

If we ignore the fact that this is actually a bistatic Doppler shift, we could try to estimate the speed using a formula on Wikipedia. It was pretty chilly in the car park, I would say 15 °C; the speed of sound at 15 °C is 340 m/s. The maximum Doppler shift here seems to be 350 Hz. Plugging all these into the equation we get 11 km/h, which sounds like a realistic speed for a skater.
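In code form, with the same numbers (using the monostatic approximation, i.e. pretending the transmitter and receiver are in the same spot):

```python
c = 340.0       # speed of sound at ~15 degrees C, m/s
f0 = 19595.0    # pilot tone frequency, Hz
shift = 350.0   # maximum observed Doppler shift, Hz

v = shift * c / (2 * f0)                     # reflector speed, m/s
print(f"{v:.1f} m/s = {v * 3.6:.0f} km/h")   # 3.0 m/s = 11 km/h
```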

Why is it filled in? My thought is that these are reflections off different points on our test subject. There's variation in the reflection angles and, consequently, in the magnitude of the velocity component that causes the frequency shift – all the way down to nearly zero Hz.

What now?

What would you do with this ultrasonic beep all around you? I have some free ideas:

  • Automated speed trap in the car park
  • Detect when the escalators stop working
  • Modulate it with a positioning code to prevent people getting lost in the maze of commerce
  • Use it to deliver ads somehow
  • Use it to find your way to the quietest spots in a shopping centre

Smoother sailing: Studying audio imperfections in Steamboat Willie

[Image: Mickey Mouse whistling on the bridge of a steamboat.]

Steamboat Willie (1928) was one of the earliest cartoons with synchronized sound. That is, it had post-production sound effects; this was something new and exciting. Now that the cartoon has recently entered the public domain[bbc24] we can safely delve into its famous soundtrack. See, there's something interesting about how it sounds...

If you listen closely to the soundtrack on YouTube, it sounds somehow distorted. You might be tempted to point out that it's 96 years old, yes. But you might also recognize that it is suffering from flutter, i.e. an unstable playback or recording speed.

In the spirit of this blog let's geek out for a bit and study this flutter distortion further. Can we learn something interesting? Could we perhaps learn enough to be able to reduce it?

Of course the flutter might be 100% authentic to how it sounded in theatres in the 1920s; we don't know when and why it appeared in the audio (more on that later!). It might have sounded even worse. But we can still hope to enjoy the sound effects in their original recorded form.

Prior work

I'm not the first one to notice this clip is 'fluttering' and to try and do something about it. I found videos of people's attempts to un-flutter it using Celemony Capstan, a professional tool made just for this purpose, with varying results. Capstan uses Melodyne's famous note detection engine to detect musical features and then controls a varispeed effect to cancel out any flutter.

But Capstan is expensive, and it's more fun to come up with a home-made solution anyway. And what about non-musical sounds? Besides, I had some code lying around in a forgotten desk drawer that just might fit the purpose.

Finding a high-quality source

Why would I need a high-quality digital file of a poor-quality soundtrack from the 1920s? I guess it's the archivist in me hoping that it has been preserved with a high level of detail. But also, if you're going to try and dig up some hidden details in the sound, you'd want minimal interference from any lossy psychoacoustic compression, right? These artifacts might become audible after varispeed effects and could also hinder frequency detection.

[Image: Two spectrograms labeled 'random Youtube video' and '4K version', the former showing compression artifacts.]

The high-quality source I found is in the Internet Archive. It might originally come from the 4K Blu-ray release called Celebrating Mickey. The spectrogram shows hardly any compression artifacts that I can see, even in the quietest frequency ranges! Perfect!

[Image: A single film frame.]

But the Internet Archive delivers something even better. There's a (visually) lossless 4K scan of the movie with the optical soundtrack partially included (above)! The high-quality version is 34 GB, but there's a downscaled 480p MP4 one thousandth of the size.

I listened to the optical soundtrack from this low-resolution version with a little pixel-reader script. Turns out the flutter is already present on the film! (Edit: Note that we don't know where this particular film print came from. When was it created? Is there an original somewhere, without flutter?)
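The script itself isn't anything fancy. A minimal pixel-reader might look something like the sketch below; it assumes a variable-area track, and the column positions are hypothetical – they depend on the framing of the scan.

```python
import numpy as np
from PIL import Image

def read_soundtrack(frame_png, left=20, right=60):
    """Read the optical soundtrack strip from one scanned film frame.
    The column range is hypothetical; find it by looking at the scan."""
    img = np.asarray(Image.open(frame_png).convert("L"), dtype=float)
    audio = img[:, left:right].mean(axis=1)   # variable-area: width ~ amplitude
    return audio - audio.mean()               # remove the DC offset

# Concatenating consecutive frames gives a continuous signal at
# (scan height x 24) samples per second.
```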

Hand-guiding a frequency tracker

Looking at the above spectrogram, we can see that the frequency of everything is zig-zagging as a function of time – that's flutter all right. But how to quantify these variations? We could zoom in on one of the frequency peaks and follow the course of its frequency in time. I'm using FFT peak interpolation to find more accurate frequency estimates[gasior04].
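For the curious, the interpolation step looks roughly like this (a sketch following the parabolic method in [gasior04], with an arbitrary test tone):

```python
import numpy as np

def interpolated_peak(mag, k):
    """Refine an FFT peak position with parabolic interpolation of the
    log magnitudes. k is the index of the highest bin near the peak.
    Returns the peak position in fractional bins."""
    a, b, c = np.log(mag[k - 1]), np.log(mag[k]), np.log(mag[k + 1])
    return k + 0.5 * (a - c) / (a - 2 * b + c)

# A 1000.3 Hz test tone lands between the 1 Hz bins of a 1-second FFT:
fs = 44100
x = np.sin(2 * np.pi * 1000.3 * np.arange(fs) / fs) * np.hanning(fs)
mag = np.abs(np.fft.rfft(x))
print(interpolated_peak(mag, int(np.argmax(mag))))   # ~1000.3
```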

Take the sound of Pete's tobacco hitting the ship's bell around the 01'45'' mark. You'd think a bell is supposed to have a constant frequency, yet this one sounds quite unstable. We can follow any one of the harmonics and see how the playback speed (bell frequency) varies over the period of one second:

[Image: Spectrogram with fluctuating tones.]

To my eye, this oscillation looks periodic and not random at all. We can run another round of FFT on a longer stretch of samples to find the strongest period of these fluctuations: it turns out to be 15 Hz. (Why 15? I had so hoped it would be 24 Hz – it would have made a more interesting story! More on that later...)

[Image: Spectrum plot showing a peak at 15.0 Hz about 15 dB higher than background.]

Okay, so can we repeat this process for the whole movie? I don't think we can just automatically follow the frequency of every peak, since some sounds will naturally contain vibration and rises and drops in frequency. Not all of it is due to flutter. Some sort of a vetting process is needed. We could try a tedious manual route...

[Image: GUI of a software with spectrograms and oscillogram plots.]

I made a little software tool (above) where I could click and drag little boxes onto a spectrogram to search for peaks in. This wobbly line is then simply taken to be the speed variation (red graph in the top picture).

It became quite a chore to annotate longer sounds as this software didn't come with undo, edit, or save features for the longest time!

Now let's think about what to do with this speed information...

Desk drawer deep dive

Some time ago I had made a tool that could well come in handy now. It was for correcting wobbly wideband radio recordings stored on VHS tapes. These recordings contained some empty carriers that happened to work like seismographs, accurately recording the tape speed variations. The tool then used a Lagrange polynomial to interpolate new samples at a steady interval – so-called 'digital varispeed'.
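A sketch of that idea (not the actual tool): read samples at fractional positions given by the integrated speed curve, interpolating each one from its four neighbours.

```python
import numpy as np

def lagrange_varispeed(x, positions):
    """'Digital varispeed': 4-point (cubic) Lagrange interpolation at
    fractional read positions. Needs 1 <= position < len(x) - 2."""
    out = np.empty(len(positions))
    for n, p in enumerate(positions):
        i, f = int(p), p - int(p)       # integer and fractional parts
        y = x[i - 1 : i + 3]            # four neighbouring samples
        out[n] = (y[0] * -f * (f - 1) * (f - 2) / 6
                  + y[1] * (f + 1) * (f - 1) * (f - 2) / 2
                  + y[2] * (f + 1) * -f * (f - 2) / 2
                  + y[3] * (f + 1) * f * (f - 1) / 6)
    return out

# positions = np.cumsum(speed), where speed is the measured playback
# speed (1.0 = nominal); the output then has the wobble cancelled out.
```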

It was ultimately based on an interesting paper on de-fluttering magnetic tapes using the tape bias signal as reference[howarth04].

[Image: Buttons of an old device, one of them Varispeed, labeled 1981. Below, part of a GUI with the text Varispeed, labeled 2023.]

By the way, I keep mentioning varispeed and never explained it. This was a feature of old studio-grade reel-to-reel tape recorders where the playback speed could be freely varied by the operator; hence vari+speed. Audio people still use this word in the digital world to essentially refer to variable-rate resampling, which has the same effect, so I'm using them interchangeably. (Topmost photo: Ferdinando Traversa, CC BY, cropped to detail)

Here's what this digital varispeed sounds like when exaggerated. In the example below I'm doing it in a simpler way: instead of the Lagrange method, I first upsampled some music by 10x in an audio editor, hand-drew a speed curve in Audacity, and then used that curve to pick samples out of the oversampled music:

[Image: A waveform in Audacity.]
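In code, that simpler method could look like this (a sketch; the 10x factor is from above, the wobble parameters are invented):

```python
import numpy as np
from scipy.signal import resample_poly

def crude_varispeed(x, speed):
    """x: input audio; speed: one factor per output sample, 1.0 = nominal."""
    up = resample_poly(x, 10, 1)       # 10x oversampled input
    pos = np.cumsum(10.0 * speed)      # running fractional read position
    idx = np.round(pos).astype(int)    # nearest oversampled sample
    return up[idx[idx < len(up)]]

# An exaggerated 15 Hz wobble, for example:
# speed = 1.0 + 0.1 * np.sin(2 * np.pi * 15 * np.arange(len(x)) / 44100)
```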

Carefully controlled, this effect can be used to cancel out flutter. Here's how: If we knew exactly how the playback speed was fluctuating we could instantly vary the speed of our resampler in the opposite direction, thus canceling the variations. And with the above research we now have that knowledge!

Well, almost. I couldn't always see a clear frequency peak to follow, so the graph is patchy. But... maybe it could help to band-pass the speed signal around 15 Hz? This would help fill out small gaps and also preserve vibrato and other fluctuations that aren't part of the flutter distortion. We can at least try!

[Image: Two waveforms, one of them piecewise and noisy, the other one smooth and continuous.]

In the example above, I replaced empty parts with a constant value of 100% and then filtered the whole thing. This sews the disjointed parts together in a smooth way.
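A sketch of that filtering step with scipy (the 240 Hz sampling rate of the speed curve is a placeholder; in practice it depends on how densely the tracker produces estimates):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def flutter_only(speed, fs_curve=240.0, lo=10.0, hi=20.0):
    """speed: patchy speed curve in percent, NaN in the gaps;
    fs_curve: sampling rate of the curve itself."""
    filled = np.where(np.isnan(speed), 100.0, speed)  # plug gaps with nominal
    b, a = butter(2, [lo / (fs_curve / 2), hi / (fs_curve / 2)], "bandpass")
    return 100.0 + filtfilt(b, a, filled)             # smooth and zero-phase
```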

Can we hear some examples already?

This clip is from when the goat ate Minnie's sheet music and guitar – the apparent catalyst event that sent Mickey Mouse to seek revenge on the entire animal kingdom.

Before [Image: Movie screenshot]
After

You can definitely hear the difference in the bell-like sounds coming from the goat's insides. It even sounds like the little flute notes in the beginning are easier to tell apart in the corrected version.

Here's another musical example, with strings.

Before [Image: Movie screenshot]
After

The cow's moo. That's a hard one because it's so rich in harmonics; in the spectrogram it looks almost like spaghetti bolognese. My algorithm is constrained to a box and can't stay with one harmonic when the 'moo' slides in frequency. You can hear some artifacts because of this, but the result still sounds less sheep-like than the original.

Before [Image: Movie screenshot]
After

But Mickey whistling "Steamboat Bill" in the beginning of the film actually doesn't sound better when corrected... I preferred a bit of vibrato!

Before [Image: Movie screenshot]
After

Sidetrack 1: Anything else we can find?

Glad you're still reading! Let's step away from flutter for a while and take the raw audio track itself under the Fourier microscope. Zooming closer, is there anything interesting in the lower end?

[Image: Spectrogram showing a frequency range from 0 to 180 Hz.]

We can faintly see peaks at multiples of both 24 and 60 Hz. No surprises there, really... 24 Hz being the film framerate and 60 Hz the North American mains frequency. Was there a projector running in the recording studio? Or maybe it's an artifact of scanning the soundtrack one frame at a time? In any case, these sounds are pretty weak.

[Image: Spectrogram showing tones with apparent sidebands.]

In some places you can see some sort of modulation that seems to be generating sidebands, just like in radio signals. It's especially visible in Mickey's whistle when it's flutter-corrected, here at the 5-second mark. The sideband peaks are 107 and 196 Hz away from the 'carrier', if you will. I'm not sure what this could be. Fluctuating amplitude?

Sidetrack 2: Playing sound-on-film frame by frame?

This is an experiment I did some time ago. It's just a silly thought – what would happen if the soundtrack was read in the same way as the picture is, stopped 24 times per second? Would this be the ultimate flutter distortion?

In the olden days, sound was stored on the film next to the picture frames as analog information. Unlike the picture frames that had to be stopped momentarily for projection, the sound had to be played at a constant speed. There was a complicated mechanism in the projector to make this possible.

I found some speed curves for old-school movie projectors in [bickford72]. They describe the film's deceleration and acceleration during these stops. Let's emulate these speed curves in audio with the oversampling varispeed method.

The video below is a 3D animation where this same speed curve controls an animation of a moving film in an imaginary machine. The clip is from another 1920s animation, Alice in the Wooly West (1926).

~~ Now we know ~~

Conclusions

  • We found a 15 Hz speed fluctuation that was, to some extent, reversible.
  • This flutter signal is already present in the optical soundtrack of a film scan (of unknown origin).
  • With enough manual work, much of the soundtrack could probably be 'corrected'.
  • 'Hmm, that sounds odd' are sometimes the words of a white rabbit.

References

Using HDMI EMI for fast wireless data transfer

This story, too, begins with noise. I was browsing the radio waves with a software radio, looking for mysteries to accompany my ginger tea. I had started to notice a wide-band spiky signal on a number of frequencies that only seemed to appear indoors. Some sort of interference from electronic devices, probably. Spoiler alert, it eventually led me to broadcast a webcam picture over the radio waves... but how?

It sounds like video

The mystery deepened when I listened to what this interference sounded like as an AM signal. It reminded me of the time I mistakenly plugged our home stereo system into the Nintendo console's video output and heard a very similar buzz.

Am I possibly listening to video? Why would there be analog video transmitting on any frequency, let alone inside my home?

[Image: Oscillogram of a noisy waveform that seems to have a pulse every 10 microseconds or so.]

If we plot the signal's amplitude against time we can see that there is a strong pulse exactly 60 times per second. This could be the vertical synchronisation signal of 60 Hz video. A shorter pulse (pictured above) can be seen repeating more frequently; it could be the horizontal one. Between these pulses there is what appears to be noise. Maybe, if we use the strong pulses for synchronisation and plot the amplitude of that noise as a two-dimensional picture, we could see something?
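A sketch of that rendering step (not the actual code; it assumes the AM amplitude has already been demodulated and the line rate measured, and the 806 total lines correspond to a 1024x768@60 video mode):

```python
import numpy as np

def render_raster(amplitude, samples_per_line, lines=806):
    """Stack the demodulated amplitude into an image, one sync period per
    row. samples_per_line may be fractional and must match the monitor's
    timing almost exactly, or the picture shears sideways."""
    starts = np.round(np.arange(lines) * samples_per_line).astype(int)
    idx = starts[:, None] + np.arange(int(samples_per_line))[None, :]
    return amplitude[idx.clip(0, len(amplitude) - 1)]   # 2D greyscale frame
```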

And sure enough, when main screen turn on, we get signal:

[Image: A grainy greyscale image of what appears to be a computer desktop.]

(I've hidden the bright synchronisation signal from this picture.)

It seems to be my Raspberry Pi's desktop with weirdly distorted greyscale colours! Somehow, some part of the monitor setup is radiating it quite loudly into the aether. The frequency I'm listening to is a multiple of the monitor's pixel clock frequency.

As it turns out, this vulnerability of some monitors has been known for a long time. In 1985, van Eck demonstrated how CRT monitors can be spied on from a distance[1]; and in 2004, Markus Kuhn showed that the same still works on flat-screen monitors[2]. The image is heavily distorted, but some shapes and even bigger text can be recognisable. Sometimes this kind of eavesdropping is called "van Eck phreaking", in reference to phone phreaking.

The next thought was, could we get any more information out of these images? Is there any information about colour?

Mapping all the colours

HDMI is fully digital; there is no linear dependency between pixel values and greyscale brightness in this amplitude image. I believe the brightness in the above image is related to the number of bit transitions over my radio's sampling time (which is around 8 bit-lengths); and in HDMI, this is dependent on many things, not just the actual RGB value of the pixel. HDMI also uses multiple differential wires that all are transmitting their own picture channels side by side.

This is why I don't think it's easy to reconstruct a clear picture of what's being shown on the screen, let alone decode any colours.

But could the reverse be possible? Could we control this phenomenon to draw the greyscale pictures of our choice on the receiver's screen? How about sending binary data by displaying alternating pixel values on the monitor?

[Image: On the left, gradients of red, green, and blue; on the right, greyscale lines of seemingly unrelated brightness.]

My monitor uses 16-bit colours. There are "only" 65,536 different colours, so it's possible to go through all of them and see how each appears in the receiver. But it's not that simple; the bit-pattern of an HDMI pixel can actually get modified based on what came before it. And my radio isn't fast enough to even tell the bits apart anyway. What we could do is fill entire lines with one colour and average the received signal strength. We would then get a mapping for single-colour horizontal streaks (above). Assuming a long run of the same colour always produces the same bitstream, this could be good enough.

[Image: An XY plot where x goes from 0 to 65536 and Y from 0 to 1.2. A pattern seems to repeat itself every 256 values of x. Values from 16128 to 16384 are markedly higher.]

Here's the map of all the colours and their intensity in the radio receiver. (Whatever happens between 16,128 and 16,384? I don't know.)

Now, we can resample a greyscale image so that its pixels become short horizontal lines. Then, for every greyscale value find the closest matching RGB565 color in the above map. When we display this psychedelic hodge-podge of colour on the screen (on the right), enough of the above mapping seems to be preserved to produce a recognizable picture of a movie[3] on the receiver side (on the left):

[Image: On the right, a monitor shows a noisy green and blue image. On the left, another monitor shows a grainy picture of a man and the text 'Hackerman'.]
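The mapping step in the middle is simple: for each grey level, look up the colour whose measured intensity comes closest. A sketch, assuming the colour map from the plot above is a 65,536-entry array normalised to 0..1:

```python
import numpy as np

def build_palette(colour_map):
    """For each of 256 target grey levels, pick the RGB565 value whose
    measured receiver intensity is nearest to it."""
    levels = np.linspace(0.0, 1.0, 256)
    diffs = np.abs(colour_map[None, :] - levels[:, None])
    return diffs.argmin(axis=1)          # palette: grey level -> RGB565

# palette[gray] is the colour to draw on the monitor so that the
# receiver sees (roughly) that brightness.
```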

These colours are not constant in any way. If I move the antenna around, even if I turn it from vertical to horizontal, the greyscales will shift or even get inverted. If I tune the radio to another harmonic of the pixel clock frequency, the image seems to break down completely. (Are there more secrets to be unfolded in these variations?)

The binary exfiltration protocol

Now we should have enough information to be able to transmit bits. Maybe even big files and streaming data, depending on the bitrate we can achieve.

First of all, how should one bit be encoded? The absolute brightness will fluctuate depending on radio conditions. So I decided to encode bits as the brightness difference between two short horizontal lines: a positive difference means 1 and a negative one means 0. This should stay fairly constant – unless the colours completely flip around, that is.

[Image: When a bit is 0, the leftmost line is darker than the rightmost line, and vice versa. These lines are used to form 768-bit packets.]

The monitor has 768 pixels vertically. This is a nice number so I designed a packet that runs vertically across the display. (This proved to be a bad decision, as we will later see.) We can stack as many packets side-by-side as the monitor width allows. A new batch of packets can be displayed in each frame, or we can repeat them over multiple frames to improve reliability.

These packets should have some metadata, at least a sequence number. Our medium is also quite noisy, so we need some kind of forward error correction. I'm using a Hamming(12,8) code which adds 4 error correction bits for every 8 bits of data. Finally, we need to add a CRC to each packet so we can make sure it arrived intact; I chose CRC16 with the polynomial 0x8005 (just because liquid-dsp provided it by default).
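For illustration, here's a sketch of the two coding building blocks (mirroring what liquid-dsp provides, not its actual internals; bit and byte order conventions may differ):

```python
def hamming128_encode(byte):
    """Hamming(12,8): 8 data bits spread over a 12-bit codeword,
    with even parity bits at positions 1, 2, 4 and 8."""
    cw = [0] * 13                            # 1-indexed positions 1..12
    for i, pos in enumerate([3, 5, 6, 7, 9, 10, 11, 12]):
        cw[pos] = (byte >> (7 - i)) & 1
    for p in (1, 2, 4, 8):                   # parity over covered positions
        cw[p] = sum(cw[q] for q in range(1, 13) if q & p) & 1
    return cw[1:]                            # any single bit error is fixable

def crc16_8005(data):
    """Bitwise CRC-16 with the polynomial 0x8005."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc
```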

First results!

It was quite unbelievable: I was able to transmit a looping 64 kbps audio stream almost without any glitches, with the monitor and the receiver in the same room, approximately 2 meters from each other.

Quick tip. Raw 8-bit PCM audio is a nice test format for these kinds of streaming experiments. It's straightforward to set an arbitrary bitrate by resampling the sound (with SoX for instance); there's no structure, headers, or byte order to deal with; and any packet loss, misorder, or buffer underrun is instantly audible. You can use a headerless companding algorithm like A-law to fit more dynamic range in 8 bits. Even stereo works; if you start from the wrong byte the channels will just get swapped. SoX can also play back the stream.

But can we get more? Slowly I added more samples per second, and a second audio channel. Suddenly we were at 256 kbps and still running smoothly. 200 kbps was even possible from the adjacent room, with a directional antenna 5 meters away, and with the door closed! In the same room, it worked up to around 512 kilobits per second but then hit a wall.

[Image: Info window that says HasPreamble: 1. Total: 928.5 kbps, Fresh: 853.6 kbps, Fresh (payload): 515.7 kbps.]

A tearful performance

The heavy error correction and framing add around 60% of overhead, and we're left with 480 bits of 'payload' per packet. If we have 39 packets per frame at 60 frames per second we should get more than a megabit per second, right? But for some reason it always caps at half a megabit.

The reason revealed itself when I noticed that every other frame was often completely discarded at the CRC check. Of course: I should have thought of properly synchronising the screen update to the graphics adapter's frame update cycle (or VSYNC). This would prevent the picture information changing mid-frame, also known as tearing. But whatever options I tried with the SDL library, I couldn't get the Raspberry Pi 4 to not introduce tearing.

Screen tearing appears to be an unsolved problem plaguing the Raspberry Pi 4 specifically (see this Google search). I tried another mini computer, the Asus Tinker Board R2.0, but I couldn't get the graphics drivers to work properly. I then realised it was a mistake to have the packets run from top to bottom; any horizontal tearing will cut every single packet in half! With a horizontal design only one packet per frame would suffer this fate.

A new design enables video-over-video

Packets that run horizontally across the screen indeed fix most of the packet loss. It may also help with CPU load as it improves memory access locality. I'm now able to get 1000 kbps from the monitor! What could this be used for? A live video stream, perhaps?

But the clock was ticking. I had a presentation coming up and I really wanted to amaze everyone with a video transfer demo. I quite literally got it working on the morning of the event. For simplicity, I decided to go with MJPEG, even though fancier schemes could compress way more efficiently. The packet loss issues are mostly kept at bay by repeating frames.

The data stream is "hidden" in a Windows desktop screenshot; I'm changing the colours in a way that both creates a readable bit and also looks inconspicuous when you look from far away.

Mitigations

This was a fun project but this kind of a vulnerability could, in the tinfoiliest of situations, be used for exfiltrating information out of a supposedly airgapped computer.

The issue has been alleviated in some modern display protocols. DisplayPort[4] makes use of scrambling: a pseudorandom sequence of bits is mixed with the bitstream to remove the strong clock oscillations that are so easily radiated out. This also randomises the bitstream-to-amplitude correlation. I haven't personally tested whether it still has some kind of video in its radio interference, though. (Edit: Scrambling seems to be optionally supported by later versions of HDMI, too – but it might depend on which features exactly the two devices negotiate. How could you know if it's turned on?)

[Image: A monitor completely wrapped in tinfoil, with the text IMPRACTICAL written over it.]

I've also tried wrapping the monitor in tinfoil (very impractical) and putting it inside a cage made out of chicken wire (it had no effect – perhaps I should have grounded it?). I can't recommend either of these.

Software considerations

This project was made possible by at least C++, Perl, SoX, ImageMagick, liquid-dsp, Dear Imgui, GLFW, turbojpeg, and v4l2! If you're a library that feels left out, please leave a comment.

If you wish to play around with video emanations, I heard there is a project called TempestSDR. For generic analog video decoding via a software radio, there is TVSharp.

References

  1. Van Eck, Wim (1985): Electromagnetic radiation from video display units: An eavesdropping risk?
  2. Kuhn, Markus (2004): Electromagnetic Eavesdropping Risks of Flat-Panel Displays
  3. KUNG FURY Official Movie [HD] (2015)
  4. Video Electronics Standards Association (2006): DisplayPort Standard, version 1.