Twelve Million Tracks, Zero Permission Slips

The Atlantic made the music industry's AI problem searchable. Now artists can see exactly what was taken.

By Chasing Seconds · JUNE 20, 2026 · 2 minute read

There's a version of this story where the industry debates consent in the abstract — philosophers and lawyers trading hypotheticals while the machine keeps eating. That version ended when a reporter at The Atlantic built a searchable database.

According to coverage at The Verge, Atlantic reporter Alex Reisner surfaced four datasets of music being used to train AI models and made them publicly searchable. Two of those datasets run at 12 million and 9 million tracks respectively. The other two are smaller — over 100,000 songs each — but "smaller" is doing a lot of work in that sentence. We're still talking about a staggering volume of recorded human creative work, assembled and deployed without a single artist being asked.

Google and Stability AI have both confirmed in research papers that they used the data. That confirmation matters. It moves this out of the realm of allegation.

The Gap Between Legal and Asked

Here's where I keep landing: the consent problem in AI training was never really a technical one. Nobody was confused about how to send an email, how to post a form, how to run a licensing negotiation. The industry knew where the music lived. It chose not to knock.

Some of these sources, per the reporting, are things like the Free Music Archive — material that's free to stream for personal use. Free to stream is not the same as free to industrialize. That distinction is obvious to anyone operating in good faith, which is precisely why the gap between those two things feels less like an oversight and more like a decision.

What the Atlantic database does is collapse the comfortable distance between "AI was trained on data" and "your specific work was in the pile." Twelve million tracks is an abstraction. Your track, searchable by name, is not.

Receipts Have a Way of Changing Conversations

The datasets have been downloaded thousands of times, and while it's impossible to know the full scope of who's used them, the confirmation from Google and Stability is a signal that this wasn't fringe behavior. This was infrastructure. The people building the models knew what they were pulling from.

I've watched this cycle play out in other corners of tech — the quiet accumulation of data at scale, the retroactive justification, the moment when the receipts go public and suddenly everyone is very interested in talking about licensing frameworks. The music industry is now at the receipt stage.

What changes from here is genuinely unclear. The datasets exist. The models trained on them exist. You can't un-train a model the way you can pull a sample from a record. But you can decide what happens next — who gets compensated, what gets licensed going forward, whether "we already did it" becomes a permanent defense or a one-time window that just closed.

The Atlantic made the problem searchable. What the industry does with that search bar is the only question left.

End — Filed from the desk
