Spectrograms#

  • Rich time series like sound recordings can be converted to spectrograms, e.g., ussing FFT methods.

    • Spectrograms show the distribution in the frequency domain as a function of time.

    • In practice this is implemented by a moving window technique with local FFT computations (overlapping windows) called Short-time Fourier transform.

  • Some limitations:

    • The decomposition is dependent on the sampling frequency, \(f_s\) (samples per second), and time window length \(T\) seconds.

    • The highest frequency that can be resolved is called the Nyquist frequency \(= f_s/2\) Hz, i.e., 22050 Hz for a 44.1 kHz sampling (due to the Fourier transform).

    • The lowest frequency that can be resolved is the Rayleigh frequency \(= 1/T\) Hz, i.e., for a window of length 1152 samples (MP3) and 44.1 kHz, this results in a minimum frequency of \(\frac{1}{\frac{1152}{44100 Hz}} = \frac{1}{26.1 ms} = 38.3 Hz\).

    • Too short windows will give a “smearing effect” on the extracted frequencies (see Wikipedia article) where precision is low on the extracted frequencies.

    • For time series, this means we have an upper bound (often of less interest) and a lower bound of the frequencies that can be resolved. The lower bound being “waves in the signal” \(\leq\) “length of the window”.

Spectrograms#

  • Rich time series like sound recordings can be converted to spectrograms, e.g., ussing FFT methods.

    • Spectrograms show the distribution in the frequency domain as a function of time.

    • In practice this is implemented by a moving window technique with local FFT computations (overlapping windows) called Short-time Fourier transform.

  • Some limitations:

    • The decomposition is dependent on the sampling frequency, \(f_s\) (samples per second), and time window length \(T\) seconds.

    • The highest frequency that can be resolved is called the Nyquist frequency \(= f_s/2\) Hz, i.e., 22050 Hz for a 44.1 kHz sampling (due to the Fourier transform).

    • The lowest frequency that can be resolved is the Rayleigh frequency \(= 1/T\) Hz, i.e., for a window of length 1152 samples (MP3) and 44.1 kHz, this results in a minimum frequency of \(\frac{1}{\frac{1152}{44100 Hz}} = \frac{1}{26.1 ms} = 38.3 Hz\).

    • Too short windows will give a “smearing effect” on the extracted frequencies (see Wikipedia article) where precision is low on the extracted frequencies.

    • For time series, this means we have an upper bound (often of less interest) and a lower bound of the frequencies that can be resolved. The lower bound being “waves in the signal” \(\leq\) “length of the window”.

# Create a sine wave that starts low and increases during 4 seconds with a sampling rate of 44.1 kHz.
import numpy as np
fs = 44100 # Sampling rate
sec = 4   # Duration of the tone in seconds
t = np.linspace(0, 4, sec*fs)      # Time axis
f = np.linspace(200, 5000, sec*fs) # Frequency axis
y_approx = np.sin(2*np.pi*np.cumsum(f)/fs) # Approximation of the sine wave

# Plot the sine wave at the beginning, middle and end
import matplotlib.pyplot as plt
plt.figure(figsize=(12,3))
plt.plot(range(600), np.hstack([y_approx[:200],y_approx[int(sec/2*fs):int(sec/2*fs)+200], y_approx[-200:]]))
plt.xlabel('Selected time (samples)')
plt.ylabel('Amplitude')
plt.show()
../../_images/04a410b8b1b9afe4209a021be3b5baee8531ebc3168a0e16fd74eb1f3ef4e089.png
# Play the song using sounddevice
import sounddevice as sd
sd.play(y_approx, fs)

Spectrogram computation#

# Produce a spectrogram using scipy.signal.stft (short-time Fourier transform)
from scipy.signal import stft # Legacy function stft. Anyone want to update this to ShortTimeFFT?
f_stft, t_stft, Zxx = stft(y_approx, fs, nperseg=1152)
# Plot the spectrogram
import matplotlib.pyplot as plt
plt.pcolormesh(t_stft, f_stft, np.abs(Zxx), vmin=0, vmax=max(y_approx), shading='gouraud')
plt.title('STFT Magnitude')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
../../_images/f25841c9a6b408add5acc79734fd586f2406420eac258b3656364922ecaba2b3.png
# Plot the spectrogram without pre-computed frequency and time axes
import matplotlib.pyplot as plt
plt.specgram(y_approx, Fs=fs, NFFT=1152, noverlap=400, scale='linear')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
plt.colorbar()
plt.show()
../../_images/0bf705e17e23265933c2670ae2122acb01ab8100206f4f62ed26738649eb4f63.png
# Music: https://www.chosic.com/free-music/all/ 
# Load MP3 file Bird.mp3 and plot the spectrogram
# NOTE: Needs `ffprobe` or `avprobe` installed on the computer.
from pydub import AudioSegment
song = AudioSegment.from_mp3("../../data/Bird.mp3")

# Convert song to numpy array
samples = np.array(song.get_array_of_samples())
# Play the song using pydub
sd.play(samples, 44100)
# Produce a spectrogram using scipy.signal.stft (short-time Fourier transform)
f, t, Zxx = stft(samples, song.frame_rate, nperseg=1152)
# Plot the spectrogram
plt.pcolormesh(t, f, np.abs(Zxx), vmin=0, vmax=max(samples)/100, shading='gouraud')
plt.title('STFT Magnitude')
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()
../../_images/03a5db8560ea5fddbf0cb4b00df629d1f685b6130ae0ccc0159808270d94c781.png

Spectrogram on time series#

  • Here, we show a random series of data and its spectrogram.

  • An abrupt change in the frequency domain will show up in the spectrogram, maybe indicating a change in the underlying process.

# Our friend, the random series, but now a little longer
rng = np.random.default_rng(0)
n = 2001
y = rng.standard_normal(n).cumsum()
# Compute the spectrogram
f, t, Zxx = stft(y, 2000, nperseg=100)

# Plot the series y and the spectrogram above each other
import matplotlib.pyplot as plt
fig, axs = plt.subplots(2, 1)
axs[0].plot(y)
axs[0].set_xlim(0, n-1)
axs[0].set_ylabel('Amplitude')
axs[1].pcolormesh(t, f, np.abs(Zxx), vmin=0, shading='gouraud', vmax=1) # Play with the vmax parameter to enhance the contrast
axs[1].set_ylabel('Frequency [Hz]')
axs[1].set_xlabel('Intervals')
plt.show()
../../_images/fbbd321ab575e3ac5221b74a7ba3dbf6f1f12a39aa5c276db55ee9824e553143.png