Last modified: Apr 16, 2026 By Alexander Williams
Librosa Audio Analysis in Python
Audio data is complex. Music contains rhythm, pitch, and timbre. Analyzing it requires specialized tools. Librosa is a Python library built for this task.
It simplifies music and audio analysis. You can load files, extract features, and visualize results. This guide will show you how to get started.
We will cover core functions and practical examples. You will learn to extract meaningful insights from sound.
What is Librosa?
Librosa is a powerful Python package. It is designed for music and audio analysis. It provides the building blocks to understand audio signals.
The library handles common audio file formats. It can compute many audio features. These include tempo, beat tracks, and spectral characteristics.
It integrates well with NumPy, SciPy, and Matplotlib. This makes it a key tool for any audio processing pipeline. For a broader context, see our Python Audio Libraries: Play, Record, Process guide.
Installing Librosa
First, you need to install the library. Use pip, the Python package installer. Run the following command in your terminal.
pip install librosa
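This command downloads and installs Librosa. It also installs core dependencies like NumPy and SciPy. Matplotlib, which the plotting examples later in this guide rely on, may need to be installed separately with pip install matplotlib. You are now ready to start analyzing audio.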
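To confirm that the installation worked, you can check which packages Python can actually find. The following is a minimal sketch using only the standard library; the helper name package_version is made up for this example and is not part of Librosa.

```python
from importlib import metadata, util

def package_version(name):
    """Return the installed version of `name`, or None if Python cannot find it."""
    if util.find_spec(name) is None:
        return None
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata (e.g. a standard-library module)
        return "unknown"

# Report the packages this guide relies on
for pkg in ("librosa", "numpy", "scipy", "matplotlib"):
    version = package_version(pkg)
    print(f"{pkg}: {version if version else 'not installed'}")
```

If any line reports "not installed", run pip install for that package before continuing.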
Loading an Audio File
The first step is to load an audio file. Use the librosa.load() function. It returns two important things.
You get the audio time series as a NumPy array. You also get the sample rate. The sample rate is the number of samples per second.
import librosa
# Load an audio file
audio_path = 'your_song.wav'
y, sr = librosa.load(audio_path)
print(f"Audio shape: {y.shape}")
print(f"Sample rate: {sr} Hz")
print(f"Duration: {librosa.get_duration(y=y, sr=sr):.2f} seconds")
Audio shape: (1323000,)
Sample rate: 22050 Hz
Duration: 60.00 seconds
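The y variable holds the audio waveform as a one-dimensional array of floating-point samples. The sr variable holds the sample rate. By default, librosa.load() converts the audio to mono and resamples it to 22050 Hz. Pass sr=None to keep the file's native sample rate.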
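The relationship between array length, sample rate, and duration is simple arithmetic. The sketch below uses plain Python and the numbers from the example output above; the two helper functions are hypothetical names chosen for illustration, not Librosa functions.

```python
def duration_seconds(n_samples, sr):
    """Duration of a mono signal: number of samples divided by samples per second."""
    return n_samples / sr

def time_to_sample_index(t_seconds, sr):
    """Index of the sample closest to a given time in seconds."""
    return int(round(t_seconds * sr))

n_samples = 1323000  # length of y in the example above
sr = 22050           # sample rate returned by librosa.load()

print(f"Duration: {duration_seconds(n_samples, sr):.2f} seconds")
print(f"Sample index at 1.5 s: {time_to_sample_index(1.5, sr)}")
```

The same arithmetic lets you slice y by time, for example y[:time_to_sample_index(10, sr)] for the first ten seconds.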
Extracting Basic Features
Librosa can compute many audio features. Let's start with some fundamental ones. These features describe the audio's properties.
Beat and Tempo
Finding the beat and tempo is common. Use the librosa.beat.beat_track() function. It estimates the tempo and frame indices of beats.
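import numpy as np
# Estimate tempo and beat frames
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
# Newer Librosa releases return tempo as a NumPy array, so convert it to a float before formatting
tempo = float(np.atleast_1d(tempo)[0])
print(f"Estimated tempo: {tempo:.2f} BPM")
print(f"Number of beat frames: {len(beat_frames)}")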
Estimated tempo: 120.12 BPM
Number of beat frames: 289
Tempo and beat positions are widely used in music information retrieval. They help when segmenting a track or synchronizing visuals to the music.
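Beat frames are indices, not timestamps. As a rough sanity check, you can convert them to seconds and derive a tempo from the spacing between beats. The sketch below uses NumPy only and assumes a hop length of 512 samples at 22050 Hz; the beat frames are made-up values, and this is not the algorithm beat_track() uses internally.

```python
import numpy as np

def frames_to_seconds(frames, sr=22050, hop_length=512):
    """Convert frame indices to seconds, assuming each frame advances by hop_length samples."""
    return np.asarray(frames) * hop_length / sr

def tempo_from_beats(beat_times):
    """Estimate BPM as 60 divided by the median gap between consecutive beats."""
    intervals = np.diff(beat_times)
    return 60.0 / np.median(intervals)

# Hypothetical beat frames, evenly spaced 22 frames apart (roughly half a second)
example_beat_frames = np.arange(10, 450, 22)
example_beat_times = frames_to_seconds(example_beat_frames)

print(f"First beat times (s): {np.round(example_beat_times[:3], 2)}")
print(f"Tempo from beat spacing: {tempo_from_beats(example_beat_times):.1f} BPM")
```

If the tempo derived from your own beat times is far from the reported estimate, the tracker may have locked onto half or double the true tempo.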
Mel-Frequency Cepstral Coefficients (MFCCs)
MFCCs are widely used in speech and music analysis. They summarize the shape of the short-term power spectrum on the perceptually motivated mel scale. Use librosa.feature.mfcc() to compute them.
# Extract MFCC features
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
print(f"MFCCs shape: {mfccs.shape}")
print(f"Number of MFCCs: {mfccs.shape[0]}")
print(f"Number of frames: {mfccs.shape[1]}")
MFCCs shape: (13, 2584)
Number of MFCCs: 13
Number of frames: 2584
The output is a matrix. Rows are the coefficients (e.g., 13). Columns are time frames. This is a compact representation of timbre.
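The number of columns follows from the signal length and the hop length, the number of samples between consecutive frames. The sketch below assumes a hop length of 512 samples and centered framing, which are believed to be Librosa's defaults; the helper names are hypothetical.

```python
def frame_count(n_samples, hop_length=512):
    """Number of frames when the signal is padded so that frames are centered."""
    return 1 + n_samples // hop_length

def frame_time(frame_index, sr=22050, hop_length=512):
    """Time in seconds at which a given frame is centered."""
    return frame_index * hop_length / sr

n_samples = 22050 * 10  # a hypothetical 10-second clip at 22050 Hz

print(f"Frames: {frame_count(n_samples)}")
print(f"Time between frames: {frame_time(1) * 1000:.1f} ms")
```

A smaller hop length gives more columns and finer time resolution, at the cost of more computation.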
Spectral Centroid
The spectral centroid indicates the "brightness" of a sound. It is the center of mass of the spectrum. Compute it with librosa.feature.spectral_centroid().
# Compute the spectral centroid
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
print(f"Spectral Centroid shape: {centroid.shape}")
print(f"First 5 values: {centroid[0, :5]}")
Spectral Centroid shape: (1, 2584)
First 5 values: [1853.76 1921.32 1988.88 2056.44 2124.00]
Higher values mean a brighter, sharper sound. Lower values suggest a darker, bass-heavy sound.
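To make "center of mass" concrete, here is a NumPy-only sketch that computes the centroid of a single frame as the magnitude-weighted mean frequency. It illustrates the definition rather than Librosa's implementation, and the test tones are synthetic.

```python
import numpy as np

def spectral_centroid_of_frame(frame, sr):
    """Magnitude-weighted mean frequency (Hz) of one windowed frame."""
    windowed = frame * np.hanning(len(frame))        # reduce spectral leakage
    magnitude = np.abs(np.fft.rfft(windowed))        # magnitude spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)  # frequency of each bin in Hz
    return float(np.sum(freqs * magnitude) / np.sum(magnitude))

sr = 22050
t = np.arange(2048) / sr
low_tone = np.sin(2 * np.pi * 440 * t)    # pure 440 Hz sine
high_tone = np.sin(2 * np.pi * 2000 * t)  # pure 2000 Hz sine

print(f"Centroid of 440 Hz tone: {spectral_centroid_of_frame(low_tone, sr):.0f} Hz")
print(f"Centroid of 2000 Hz tone: {spectral_centroid_of_frame(high_tone, sr):.0f} Hz")
```

For a pure tone the centroid sits close to the tone's frequency. Real music spreads energy across many frequencies, so the centroid reflects the overall balance between low and high content.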
Visualizing Audio Features
Visualization helps in understanding the data. Librosa works with Matplotlib for plotting. Let's create a waveform and spectrogram plot.
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
# Create a figure with subplots
plt.figure(figsize=(14, 8))
# Plot the waveform
plt.subplot(3, 1, 1)
librosa.display.waveshow(y, sr=sr, alpha=0.5)
plt.title('Audio Waveform')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
# Plot the spectrogram
plt.subplot(3, 1, 2)
# stft() returns complex values, so take the magnitude before converting to dB
D = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Log-Frequency Power Spectrogram')
# Plot the MFCCs
plt.subplot(3, 1, 3)
librosa.display.specshow(mfccs, sr=sr, x_axis='time')
plt.colorbar()
plt.title('MFCCs')
plt.tight_layout()
plt.show()
This code generates three plots. The first is the raw waveform. The second is a log-frequency spectrogram. The third shows the MFCCs over time.
Visuals make patterns obvious. You can see beats, harmonics, and feature changes.
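The spectrogram's color scale is in decibels relative to the loudest value, which is what ref=np.max requests. The sketch below shows that conversion in NumPy. The floor values amin and top_db are chosen to mirror what are believed to be Librosa's defaults, and the function name is hypothetical.

```python
import numpy as np

def to_decibels(magnitude, amin=1e-5, top_db=80.0):
    """Convert magnitudes to dB relative to the peak, floored at top_db below the peak."""
    magnitude = np.maximum(np.abs(magnitude), amin)    # avoid log of zero
    db = 20.0 * np.log10(magnitude / magnitude.max())  # the peak maps to 0 dB
    return np.maximum(db, -top_db)                     # limit the dynamic range

# Made-up magnitudes: the peak, half the peak, a tenth of the peak, and silence
example = np.array([1.0, 0.5, 0.1, 0.0])
print(np.round(to_decibels(example), 2))
```

Halving the amplitude lowers the level by about 6 dB, and anything more than 80 dB below the peak is clipped to the floor. This is why quiet regions of the plot share one flat color.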
Practical Application: Beat-Synced Feature Analysis
A powerful technique is beat-synchronous analysis. You align features with the detected beats. This reduces variability and captures musical structure.
Librosa's librosa.util.sync() function helps here. It aggregates feature columns within each beat.
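import numpy as np
# Convert beat frames to times (seconds) and sample indices for reference;
# the sync step below works directly on the frame indices
beat_times = librosa.frames_to_time(beat_frames, sr=sr)
beat_samples = librosa.time_to_samples(beat_times, sr=sr)
# Compute chroma features (energy in each of the 12 pitch classes)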
chroma = librosa.feature.chroma_stft(y=y, sr=sr)
# Sync chroma features to beats
chroma_sync = librosa.util.sync(chroma, beat_frames, aggregate=np.median)
print(f"Original chroma shape: {chroma.shape}")
print(f"Beat-synced chroma shape: {chroma_sync.shape}")
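Original chroma shape: (12, 2584)
Beat-synced chroma shape: (12, 290)
The synced features have one column per beat-to-beat segment, not per time frame. By default, the stretch before the first beat and the stretch after the last beat each form a segment too, so the column count is typically one more than the number of beats. This representation is more musically meaningful. It's useful for tasks like genre classification or music segmentation.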
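The aggregation itself is easy to picture: split the feature columns at each beat and reduce every segment to a single column. The NumPy-only sketch below mimics that idea on made-up data. It is not Librosa's implementation, and the function name and inputs are hypothetical.

```python
import numpy as np

def sync_to_boundaries(features, boundaries, aggregate=np.median):
    """Aggregate feature columns between consecutive boundaries.

    Frame 0 and the final frame count are added as outer edges, so the
    stretches before the first and after the last boundary are kept.
    """
    n_frames = features.shape[1]
    edges = np.unique(np.concatenate(([0], boundaries, [n_frames])))
    segments = [aggregate(features[:, start:end], axis=1)
                for start, end in zip(edges[:-1], edges[1:])]
    return np.stack(segments, axis=1)

# Hypothetical input: 12 rows, 100 frames, each column filled with its frame index
features = np.tile(np.arange(100, dtype=float), (12, 1))
beats = np.array([10, 30, 60])  # hypothetical beat frame indices

synced = sync_to_boundaries(features, beats)
print(f"Original shape: {features.shape}")
print(f"Synced shape: {synced.shape}")
print(f"Median frame index per segment: {synced[0]}")
```

Three beats produce four segments here because the frames before the first beat and after the last beat are kept as their own segments.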
Conclusion
Librosa is an essential library for audio analysis in Python. It simplifies loading files, extracting features, and creating visualizations.
We covered loading audio, estimating tempo, and computing MFCCs. We also looked at spectral centroids and beat-synchronous analysis.
These tools form the foundation for more advanced work. You can build music recommendation systems, automatic tagging, or transcription tools.
To dive deeper into the fundamentals, check out our Python Audio Processing Guide for Beginners. Start experimenting with your own audio files. The world of music signal analysis is now at your fingertips.