AI's Underpants Gnomes Problem

Back in February, I picked up a flyer at an anti-AI march in London. I can’t say for sure whether the writers meant to riff on South Park’s underpants gnomes. But if they did, they nailed it: “Step 1: Grow a digital super mind,” it read. “Step 2: ? Step 3: ?”

Produced by Pause AI, the activist group that co-organized the protest, it ended with this plea: “Pause AI until we know what the hell Step 2 is.”

For those who missed the 1998 episode: Kenny, Kyle, Cartman, and Stan discover gnomes stealing underpants. The gnomes’ business plan? Phase 1: Collect underpants. Phase 2: ? Phase 3: Profit. It’s become one of the great internet memes, used to satirize everything from startup strategies to Elon Musk’s Mars mission funding plan.

Right now, it perfectly captures the state of AI. Companies have built the tech (Step 1) and promised transformation (Step 3). How they get there is still a giant question mark.

Pause AI thinks Step 2 must involve regulation. But what kind, and who enforces it, remains fuzzy. AI boosters, meanwhile, are convinced Step 3 is salvation and tend to gloss over the middle bit. OpenAI’s chief scientist Jakub Pachocki recently told me we’re racing toward sunny uplands on the back of an “economically transformative technology.” They know where they want to go, more or less. That destination is hazy and still some way off. But everyone’s taking a different route.

For every big claim about the future, there’s a more sober assessment. Consider two recent studies. One from Anthropic predicted which jobs LLMs will affect most. Managers, architects, and media folks should prepare for change; groundskeepers, construction workers, and hospitality workers, not so much. But these predictions are really just guesses based on what tasks LLMs seem good at, not how they actually perform in the workplace.

Another study from February, by researchers at the AI hiring startup Mercor, tested several AI agents powered by top-tier models from OpenAI, Anthropic, and Google DeepMind on 480 workplace tasks of the kind done by human bankers, consultants, and lawyers. Every agent failed to complete most of the tasks it was given.

Why such wide disagreement? First, consider who’s making the claims and why. Anthropic has skin in the game. Most people telling us something big is about to happen have reached that conclusion largely based on how quickly AI coding tools are improving. But not every task can be cracked with code. Other studies have found LLMs are bad at strategic judgment calls.

Also, when deployed, these tools aren’t dropped into a cleanroom. They need to work in places contaminated with people and existing workflows. Sometimes adding AI makes things worse. Sure, maybe those workflows need to be torn up and refashioned around the new technology for it to achieve transformative status, but that takes time and guts.

That big hole is right where Step 2 should be. The lack of agreement on exactly what’s about to happen—and how—creates an information vacuum filled by the latest wild claim of the week, evidence be damned. We’re so unmoored from any real understanding that a single social media post can (and does) shake markets.

We need fewer guesses and more evidence. That requires transparency from model makers, coordination between researchers and businesses, and new ways to evaluate this technology that tell us what really happens when it’s rolled out in the real world.

The tech industry rests on a promise it keeps holding out: that AI really will be transformative. But that’s not yet a sure bet. Next time you hear bold claims about the future, remember that most businesses are still figuring out what to do with their underpants.
