Does Audio Format Affect Stem Separation Quality?

Key finding: MP3 128kbps input reduces mean SDR by 0.24 dB compared to WAV 24-bit (7.80 dB vs 8.04 dB).

Methodology

Twenty test tracks from MUSDB18-7s served as the WAV 24-bit reference audio. Each mixture was re-encoded to the target format with ffmpeg and then decoded back to WAV before being separated by HTDemucs. Because every format reaches the model as WAV, this isolates the effect of format-induced degradation on separation quality.

All six formats were applied to the same 20 tracks, so comparisons are apples-to-apples. SDR was computed against the ground-truth stems using mir_eval BSSEval v4.
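For readers who want to sanity-check scores without the full BSSEval machinery, here is a minimal NumPy sketch of a simplified, global SDR. It is a swapped-in definition, not BSSEval v4: it applies no allowed-distortion filter and no framewise aggregation, so its numbers will not match the table exactly. The function names `simple_sdr` and `mean_sdr_over_stems` are hypothetical; averaging with equal weight per stem is an assumption, though it does match the Mean SDR column.

```python
import numpy as np


def simple_sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-10) -> float:
    """Global SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Simplified definition for sanity checks only: unlike BSSEval v4 it has
    no allowed-distortion filter and no framewise windowing or aggregation.
    """
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps keeps the ratio finite for silent stems or perfect estimates
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


def mean_sdr_over_stems(references: dict[str, np.ndarray],
                        estimates: dict[str, np.ndarray]) -> float:
    """Unweighted average of per-stem SDRs (e.g. vocals, drums, bass, other)."""
    scores = [simple_sdr(references[name], estimates[name]) for name in references]
    return float(np.mean(scores))
```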

Formats tested: WAV 24-bit, WAV 16-bit, MP3 320kbps, MP3 192kbps, MP3 128kbps, AAC 256kbps.
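The re-encode and decode step can be sketched as two ffmpeg calls per mixture. The codec flags below (`libmp3lame`, ffmpeg's native `aac` encoder, `pcm_s16le`/`pcm_s24le`) and decoding to 32-bit float WAV are assumptions, since the exact settings are not stated, and the helper names `roundtrip_commands` and `roundtrip` are hypothetical. Lossy codecs can add encoder delay or padding, so it is worth confirming that each decoded file still lines up sample-for-sample with the ground-truth stems before scoring.

```python
import subprocess
from pathlib import Path

# Hypothetical ffmpeg settings for the six formats tested; the exact
# encoder flags used for the benchmark are not stated.
FORMATS: dict[str, tuple[str, list[str]]] = {
    "wav24": ("wav", ["-c:a", "pcm_s24le"]),
    "wav16": ("wav", ["-c:a", "pcm_s16le"]),
    "mp3_320": ("mp3", ["-c:a", "libmp3lame", "-b:a", "320k"]),
    "mp3_192": ("mp3", ["-c:a", "libmp3lame", "-b:a", "192k"]),
    "mp3_128": ("mp3", ["-c:a", "libmp3lame", "-b:a", "128k"]),
    "aac_256": ("m4a", ["-c:a", "aac", "-b:a", "256k"]),
}


def roundtrip_commands(src: Path, fmt: str, workdir: Path) -> list[list[str]]:
    """Build the encode and decode ffmpeg commands for one mixture."""
    ext, codec_args = FORMATS[fmt]
    encoded = workdir / f"{src.stem}.{fmt}.{ext}"
    # Distinct name so WAV formats never read and write the same file
    decoded = workdir / f"{src.stem}.{fmt}.decoded.wav"
    encode = ["ffmpeg", "-y", "-i", str(src), *codec_args, str(encoded)]
    # Decode to 32-bit float WAV (assumption) so lossy-decoder overshoot
    # above full scale is preserved rather than clipped.
    decode = ["ffmpeg", "-y", "-i", str(encoded), "-c:a", "pcm_f32le", str(decoded)]
    return [encode, decode]


def roundtrip(src: Path, fmt: str, workdir: Path) -> Path:
    """Run both commands (requires ffmpeg on PATH) and return the decoded WAV."""
    workdir.mkdir(parents=True, exist_ok=True)
    encode, decode = roundtrip_commands(src, fmt, workdir)
    for cmd in (encode, decode):
        subprocess.run(cmd, check=True, capture_output=True)
    return Path(decode[-1])
```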

Model: htdemucs_ft (fine-tuned, 4-stem). Device: Apple M4 MPS.
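A minimal sketch of the separation step using the Demucs command-line tool, assuming the `demucs` CLI is installed. `-n` selects the model, `-d` the device and `-o` the output directory; `--float32` is an optional choice (an assumption, not stated above) that avoids re-quantizing the estimated stems to 16-bit. The helper names are hypothetical, and the path layout assumes Demucs's default `{track}/{stem}.{ext}` filename template under `<out>/<model>/`.

```python
from pathlib import Path

# The four stems produced by a 4-stem model such as htdemucs_ft
STEMS = ("vocals", "drums", "bass", "other")


def demucs_command(wav: Path, out_dir: Path, model: str = "htdemucs_ft",
                   device: str = "mps") -> list[str]:
    """Build a Demucs CLI call for one decoded mixture."""
    return ["demucs", "-n", model, "-d", device, "--float32",
            "-o", str(out_dir), str(wav)]


def expected_stem_paths(wav: Path, out_dir: Path,
                        model: str = "htdemucs_ft") -> dict[str, Path]:
    """Where the default filename template should place each estimated stem."""
    return {stem: out_dir / model / wav.stem / f"{stem}.wav" for stem in STEMS}
```

To run it, pass the list to `subprocess.run(..., check=True)`, then score each path from `expected_stem_paths` against the matching ground-truth stem.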

Notes on the data

Seven-second clips introduce more per-track variance than full tracks would, so the SDR numbers here should be read as indicative rather than as definitive absolute benchmarks. The relative differences between formats are more reliable than the absolute values because every format is evaluated on the same set of clips.
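The argument for trusting relative differences rests on pairing: every format is scored on the same clips, so taking per-clip differences cancels most of the clip-to-clip variance. A minimal sketch of that paired comparison follows; `paired_delta` is a hypothetical helper, and it assumes you have per-clip SDR lists for two formats in the same clip order (only averages are reported here).

```python
import math
from statistics import mean, stdev


def paired_delta(baseline: list[float], other: list[float]) -> tuple[float, float]:
    """Mean per-clip SDR change (other - baseline) and its standard error.

    Both lists must hold scores for the same clips in the same order.
    """
    if len(baseline) != len(other):
        raise ValueError("score lists must cover the same clips")
    if len(baseline) < 2:
        raise ValueError("need at least two clips for a standard error")
    # Differencing per clip removes variance shared by both formats
    diffs = [o - b for b, o in zip(baseline, other)]
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))
```

As a rough rule of thumb, a mean delta smaller than about twice its standard error is hard to distinguish from clip-to-clip noise.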

SDR by Format and Stem

All values are in dB; higher is better. Delta is the change in mean SDR relative to the WAV 24-bit baseline.

Format	Vocals	Drums	Bass	Other	Mean SDR	Delta
WAV 24-bit	8.86	9.95	9.52	3.85	8.04	baseline
WAV 16-bit	8.86	10.01	9.48	3.82	8.04	0.00 dB
MP3 320kbps	8.82	9.93	9.52	3.69	7.99	-0.05 dB
MP3 192kbps	8.78	9.92	9.42	3.82	7.98	-0.06 dB
MP3 128kbps	8.49	9.69	9.46	3.57	7.80	-0.24 dB
AAC 256kbps	8.78	9.91	9.57	3.69	7.99	-0.05 dB
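The Mean SDR and Delta columns can be re-derived from the per-stem values with a few lines of Python. This sketch assumes the mean is an unweighted average over the four stems, which matches every row of the table. Because the published per-stem values are rounded, recomputed deltas can differ in the last digit (AAC 256kbps comes out at -0.0575 dB rather than -0.05 dB).

```python
# Per-stem SDR (dB) copied from the table above,
# in the order vocals, drums, bass, other.
TABLE: dict[str, tuple[float, float, float, float]] = {
    "WAV 24-bit": (8.86, 9.95, 9.52, 3.85),
    "WAV 16-bit": (8.86, 10.01, 9.48, 3.82),
    "MP3 320kbps": (8.82, 9.93, 9.52, 3.69),
    "MP3 192kbps": (8.78, 9.92, 9.42, 3.82),
    "MP3 128kbps": (8.49, 9.69, 9.46, 3.57),
    "AAC 256kbps": (8.78, 9.91, 9.57, 3.69),
}
BASELINE = "WAV 24-bit"


def mean_sdr(stems: tuple[float, ...]) -> float:
    """Unweighted mean over the stems, as in the Mean SDR column."""
    return sum(stems) / len(stems)


def summarize(table: dict[str, tuple[float, ...]],
              baseline: str = BASELINE) -> dict[str, tuple[float, float]]:
    """Return {format: (mean SDR, delta vs baseline)}, both in dB."""
    base = mean_sdr(table[baseline])
    return {fmt: (mean_sdr(s), mean_sdr(s) - base) for fmt, s in table.items()}


if __name__ == "__main__":
    for fmt, (m, delta) in summarize(TABLE).items():
        print(f"{fmt:<12} mean {m:.3f} dB  delta {delta:+.3f} dB")
```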