Twelve Million Tracks, Zero Permission Slips
The Atlantic made the music industry's AI problem searchable. Now artists can see exactly what was taken.

There's a version of this story where the industry debates consent in the abstract — philosophers and lawyers trading hypotheticals while the machine keeps eating. That version ended when a reporter at The Atlantic built a searchable database.
According to coverage at The Verge, Atlantic reporter Alex Reisner surfaced four datasets of music being used to train AI models and made them publicly searchable. Two of those datasets run at 12 million and 9 million tracks respectively. The other two are smaller — over 100,000 songs each — but "smaller" is doing a lot of work in that sentence. We're still talking about a staggering volume of recorded human creative work, assembled and deployed without a single artist being asked.
Google and Stability AI have both confirmed in research papers that they used the data. That confirmation matters. It moves this out of the realm of allegation.
The Gap Between Legal and Asked
Here's where I keep landing: the consent problem in AI training was never really a technical one. Nobody was confused about how to send an email, how to post a form, how to run a licensing negotiation. The industry knew where the music lived. It chose not to knock.
Some of these sources, per the reporting, are sites like the Free Music Archive, whose material is free to stream for personal use. Free to stream is not the same as free to industrialize. That distinction is obvious to anyone operating in good faith, which is precisely why the gap between the two feels less like an oversight and more like a decision.
What the Atlantic database does is collapse the comfortable distance between "AI was trained on data" and "your specific work was in the pile." Twelve million tracks is an abstraction. Your track, searchable by name, is not.
Receipts Have a Way of Changing Conversations
The datasets have been downloaded thousands of times, and while it's impossible to know the full scope of who has used them, the confirmation from Google and Stability AI signals that this wasn't fringe behavior. This was infrastructure. The people building the models knew what they were pulling from.
I've watched this cycle play out in other corners of tech — the quiet accumulation of data at scale, the retroactive justification, the moment when the receipts go public and suddenly everyone is very interested in talking about licensing frameworks. The music industry is now at the receipt stage.
What changes from here is genuinely unclear. The datasets exist. The models trained on them exist. You can't un-train a model the way you can pull a sample from a record. But you can decide what happens next — who gets compensated, what gets licensed going forward, whether "we already did it" becomes a permanent defense or a one-time window that just closed.
The Atlantic made the problem searchable. What the industry does with that search bar is the only question left.