
A practical workflow for building 'everything except your instrument' backing tracks — covers model choice (4-stem vs 6-stem), per-instrument steps for voice, guitar, bass, drums, the songs that don't separate well, and how to slow them down.
Most musicians' practice routine has one obvious gap: there's no band.
You can metronome through a song for an hour, but you won't learn how to land the chorus until you're playing against actual drums, actual bass, actual vocals. The classic answer was buying backing tracks one at a time on iTunes — a few hundred songs, $1.99 each, mostly bad mixes of songs you don't want to play.
AI source separation killed that market. You can now take any song you own (or any YouTube link) and remove your instrument in a few minutes. The result is a backing track that fits the original record exactly, because it is the original record minus you.
This post walks through the practical workflow for the four common cases — voice, guitar, bass, drums — plus the songs where the trick doesn't work, and what to do when you need to slow them down.
A single audio file per song that contains the full original recording minus your instrument. Drop it into Spotify on your phone, Anytune, a portable looper, or any DAW. Play along.
For singers, that's the karaoke instrumental. For guitarists, the full band without guitar. For drummers, the song with a hole where your kit goes. Same idea, different stems removed.
This is the one decision most people get wrong, and it costs you a full re-render.
| Your instrument | Use this model | Why |
|---|---|---|
| Voice (singing) | 4-stem (default) | Vocals separate cleanest in the 4-stem model |
| Bass | 4-stem (default) | Bass has its own dedicated stem |
| Drums | 4-stem (default) | Drums have their own dedicated stem |
| Guitar | 6-stem | Without 6-stem, guitar gets dumped into "other" with synths and strings |
| Piano | 6-stem | Same reason — piano needs its own dedicated stem |
| Sax, violin, brass | 4-stem (and accept it) | No dedicated stem exists; they live in "other" |
Skipping the 6-stem model is the mistake we see most often. Guitarists default to 4-stem out of habit, then wonder why their "instrumental" backing track still has guitar in it. It's not a model bug: the 4-stem model has no dedicated guitar stem, so the guitar lands in "other" alongside synths and strings and can't be removed on its own. Pick 6-stem if you play guitar or piano. Otherwise pick 4-stem; it's faster and slightly cleaner per stem.
The cost is the same either way, so don't optimize for it. (We wrote up the per-call cost math in a separate post.)
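The table above boils down to a small lookup. Here is a minimal sketch in Python; the function name `choose_model` and the string labels are made up for illustration and are not the API of any particular tool.

```python
from typing import Optional, Tuple

# Instruments that only get a dedicated stem in the 6-stem model (per the table above).
SIX_STEM_ONLY = {"guitar", "piano"}
# Instruments that already have a dedicated stem in the 4-stem model.
FOUR_STEM = {"voice": "vocals", "bass": "bass", "drums": "drums"}


def choose_model(instrument: str) -> Tuple[str, Optional[str]]:
    """Return (model, stem_to_remove) for the instrument you play.

    stem_to_remove is None when no dedicated stem exists (sax, violin, brass):
    the instrument stays mixed into "other" and cannot be removed on its own.
    """
    name = instrument.strip().lower()
    if name in SIX_STEM_ONLY:
        return "6-stem", name
    if name in FOUR_STEM:
        return "4-stem", FOUR_STEM[name]
    return "4-stem", None
```

For example, `choose_model("Guitar")` gives `("6-stem", "guitar")`, while `choose_model("sax")` gives `("4-stem", None)`.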
This is the simplest case because "everything except vocals" is one click.
The one trick: if the song has prominent backing vocals you also want to remove (Beatles-style stacked harmonies), the karaoke maker leaves those in. Splitting lead vocals from backing vocals is far harder than splitting vocals from instruments, because the two share so much frequency content, and the results are rarely clean. Pick a different recording, or accept the backing vocals in your instrumental.
This is where the 6-stem decision matters.
The result is the full band minus guitar. Loop the solo section in any audio player that supports A-B repeat and practice the lick fifty times.
The timing trap: it's tempting to also mute drums for a "cleaner" practice mix. Don't. Most musicians lose timing without the drums as reference, and the whole point of playing along with the record is to learn how the part sits against the groove.
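Under the hood, "the full band minus guitar" is just a sum of the stems you keep. A rough sketch of that idea in plain Python, assuming each stem is already decoded into a list of float samples of the same sample rate; the function name `mix_minus_one` and the toy data are illustrative only.

```python
from typing import Dict, List


def mix_minus_one(stems: Dict[str, List[float]], remove: str) -> List[float]:
    """Sum every stem except `remove`, sample by sample, at unity gain."""
    if remove not in stems:
        raise KeyError(f"no stem named {remove!r}; available: {sorted(stems)}")
    kept = [samples for name, samples in stems.items() if name != remove]
    if not kept:
        return []
    # Stems from one song should be equal length; trim to the shortest to be safe.
    length = min(len(samples) for samples in kept)
    return [sum(samples[i] for samples in kept) for i in range(length)]


# Toy two-sample "song": drums and bass stay in, guitar is dropped.
toy_stems = {
    "drums": [0.10, 0.20],
    "bass": [0.10, 0.00],
    "guitar": [0.50, 0.50],
}
backing = mix_minus_one(toy_stems, remove="guitar")
```

Note that the drums stay in the sum, in line with the timing trap above: only your own instrument is dropped.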
Almost identical to guitar, but use 4-stem.
Bass-specific gotcha: songs with synth bass or heavy sub-bass often get split awkwardly between the "bass" stem and "other". If your bassline disappears from the bass file and shows up faintly in "other", the original mix routed the bass through a synth or used heavy sidechaining. There's no fix at the model level — pick a different song, or layer the two stems back together and accept that the "backing" track will have ghost bass in it.
Same flow, different stem to drop.
Drum-specific gotcha: the "vocals" stem will have faint cymbal hash bleeding through (cymbals share a lot of upper-frequency content with sibilant vocals), and the "other" stem will sometimes have ghost-snare artifacts. For practice, this doesn't matter: you'll be playing loud enough that nobody hears the bleed. For recording your kit over the backing track, high-pass the "vocals" and "other" stems at ~80 Hz to clear out low-end kick rumble. A high-pass leaves the cymbal hash untouched, since it sits far above 80 Hz, but your own cymbals will usually mask it.
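For the recording case, a high-pass filter is easy to sketch. The version below is a first-order (6 dB per octave) filter in plain Python, which is gentler than the EQ in most DAWs; the function name and defaults are assumptions for illustration. It attenuates content below the cutoff and leaves higher frequencies essentially alone.

```python
import math
from typing import List, Sequence


def high_pass(samples: Sequence[float], sample_rate: int = 44100,
              cutoff_hz: float = 80.0) -> List[float]:
    """First-order RC-style high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out: List[float] = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out


def sine(freq_hz: float, seconds: float = 1.0, sample_rate: int = 44100) -> List[float]:
    """Helper for trying the filter on a pure tone."""
    n = int(seconds * sample_rate)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
```

Feeding it `sine(20.0)` should come back much quieter than `sine(1000.0)`, which passes almost unchanged.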
This is the half of the equation nobody talks about. A perfect model can't separate audio that wasn't recorded with separation in mind.
Works well:
Works poorly:
The earbud test: if you can clearly hear and name each instrument when listening on cheap earbuds, the model can probably separate them. If the mix sounds like a wall of sound on cheap earbuds, the model will give you a wall of stems.
A backing track at original tempo is rarely useful when you're still learning. Two ways to handle it.
Slow down after separation. Run the song through stem splitter normally, mix your backing track, then drop it into the slowed + reverb maker. Works fine for tempo drops up to about 15%. Beyond that you start hearing time-stretch artifacts on the cymbals.
Slow down before separation. Counterintuitively, this can produce better stems on some songs. Slowing the audio stretches each transient across more samples, which may give the model more to work with on fast or dense passages. Try this for songs where the default separation comes out muddy.
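Whichever order you pick, the tempo arithmetic is the same. A small sketch, with a made-up function name, that turns a percentage drop into the new BPM and track length:

```python
from typing import Tuple


def slowed(bpm: float, duration_s: float, drop_pct: float) -> Tuple[float, float]:
    """Return (new_bpm, new_duration_s) after slowing playback by drop_pct percent."""
    if not 0.0 <= drop_pct < 100.0:
        raise ValueError("drop_pct must be in [0, 100)")
    rate = 1.0 - drop_pct / 100.0  # playback rate relative to the original
    return bpm * rate, duration_s / rate


# A 4-minute song at 120 BPM, slowed by 15% (the rough artifact threshold mentioned above).
new_bpm, new_duration = slowed(bpm=120.0, duration_s=240.0, drop_pct=15.0)
```

A 15% drop takes a 120 BPM song to about 102 BPM and stretches four minutes to roughly 4:42.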
For key changes, use the pitch changer on your final backing track. Avoid changing key before separation — the pitch-shift artifacts confuse the model and you end up with worse stems.
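For reference, a key change in equal temperament is a frequency ratio of 2^(n/12) for n semitones. The helper below is a sketch with an invented name, handy for sanity-checking what a pitch changer setting actually does:

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shifted_frequency(freq_hz: float, semitones: float) -> float:
    """Where a given pitch lands after the shift."""
    return freq_hz * semitone_ratio(semitones)


# Dropping a song by a whole step (2 semitones) moves A4 = 440 Hz down to about 392 Hz (G4).
g4 = shifted_frequency(440.0, -2)
```

Negative values shift down, positive values shift up, and 12 semitones is exactly one octave.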
1. Don't normalize each stem before mixing. Stem separation already preserves relative volumes from the original mix. If you peak-normalize each stem to 0 dBFS before combining, every stem ends up equally loud at its peak, so a quiet part like the bass suddenly sits far too high against the rest of the band. Import the raw stems, leave every track at 0 dB (unity) gain, export.
2. Don't bother with stems for a one-off. Stem separation makes sense for songs you'll practice 50 times. For a song you'll play through twice, just play along with the original record at a volume that lets you hear yourself. The math of "5 minutes of processing + 30 seconds of mixing" only pays back across many practice sessions.
3. Don't trust the first separation if the source audio sounds bad. Bitrate matters. A 128 kbps YouTube rip will separate noticeably worse than a 320 kbps MP3 or a lossless file. If the result sounds off, check the source first — there's a real ceiling on quality you can extract from a low-bitrate source.
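To see why the first point matters, compare stem levels before and after per-stem normalization. This sketch uses peak level as a crude stand-in for loudness, with toy sample values and invented function names:

```python
import math
from typing import List, Sequence


def peak_db(samples: Sequence[float]) -> float:
    """Peak level in dBFS (0 dBFS = full scale)."""
    peak = max(abs(s) for s in samples)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def normalize(samples: Sequence[float], target_db: float = 0.0) -> List[float]:
    """Scale samples so the peak lands at target_db dBFS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    gain = (10.0 ** (target_db / 20.0)) / peak
    return [s * gain for s in samples]


# Toy stems: in the original mix, the bass peaks well below the drums.
drums = [0.8, -0.4, 0.2]
bass = [0.2, -0.1, 0.05]

gap_before = peak_db(drums) - peak_db(bass)
gap_after = peak_db(normalize(drums)) - peak_db(normalize(bass))
```

Here the drums start about 12 dB above the bass, and after normalizing each stem that gap collapses to zero, which is exactly the balance shift described above.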
A typical workflow takes about three minutes of active time:
Total: under 5 minutes from "I want to practice this song" to "the backing track is on my phone."
If you only need vocals removed, the karaoke maker skips the manual mix step entirely. For everything else, one drag-and-drop into Audacity is the whole job.
The takeaway: the model is the easy part. Picking the right model for your instrument and picking a song that was recorded with clean separation are the two decisions that determine whether you spend the next hour practicing or troubleshooting.
If you want to try it on a song without setting up a local toolchain, AI Stem Splitter is free for the first few minutes of audio.


