Google’s Vantage Experiment: Using AI to Grade Real-World Skills

Google Research just dropped something interesting: Vantage, a research experiment that uses generative AI to assess the squishy, hard-to-measure skills everyone keeps talking about — critical thinking, collaboration, creative problem-solving. The kind of stuff that supposedly makes us future-proof against automation.

They partnered with New York University, ran a study, and claim the AI scoring is on par with human experts. The experiment is now available on Google Labs for sign-up, targeting high school and college students.

Let me be upfront: I’ve seen this movie before. Every few years, some ed-tech company promises to finally measure “21st century skills” (or whatever we’re calling them this decade) at scale. It usually ends with a rubric so generic it could apply to a bake sale or a board meeting. But Vantage is different enough to pay attention to.

The problem with measuring what matters

Standardized tests are great for measuring how well you memorized the quadratic formula. They’re terrible at measuring whether you can actually work with a team to solve a messy problem. Real-world skills like conflict resolution or building on someone else’s ideas don’t happen in a vacuum — they happen in conversation, with all the awkward pauses, misunderstandings, and power dynamics that entails.

The traditional approach is to have humans observe and rate these interactions. That works for a classroom of 30 students. It falls apart when you’re trying to assess thousands of students across a district: trained raters are expensive, and two raters don’t always score the same conversation the same way.

Vantage tries to solve this by putting learners in a simulated conversation with AI avatars. You’re not filling out a multiple-choice test. You’re talking to animated characters who are programmed to push back on your ideas, introduce conflict, or go silent — all designed to give you opportunities to demonstrate your skills.

How it actually works

The setup is straightforward: you join a multi-party conversation with AI avatars and work on a task together, such as preparing for a debate or pitching a creative vision, something that requires actual back-and-forth. Behind the scenes, an “Executive LLM” watches the whole thing and uses a predefined rubric to steer the avatars, so the conversation keeps producing moments where you can show the skills being assessed.

Here’s the clever part: the system dynamically introduces challenges based on what’s happening in the conversation. If everyone agrees too quickly, an avatar might push back. If someone dominates the discussion, another avatar might try to redirect. It’s adaptive in a way a static test never could be.
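To make that adaptive behavior concrete, here is a toy sketch of a steering rule in Python. To be clear, this is not how Vantage is implemented: by the description above, an LLM makes these calls using a rubric, not hand-written thresholds. The `Turn` record, its `agrees` flag, the `choose_intervention` function, and the `window` and `dominance_share` parameters are all hypothetical names made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    agrees: bool  # hypothetical flag: does this turn simply agree with the previous one?


def choose_intervention(turns, window=4, dominance_share=0.6):
    """Pick a steering move from the most recent turns.

    Toy heuristic for illustration only, not Vantage's actual logic.
    """
    recent = turns[-window:]
    if len(recent) < window:
        return "none"  # too early in the conversation to judge

    # Everyone agreeing too quickly: have an avatar push back.
    if all(t.agrees for t in recent):
        return "push_back"

    # One speaker holding most of the recent turns: have an avatar redirect.
    _, count = Counter(t.speaker for t in recent).most_common(1)[0]
    if count / len(recent) >= dominance_share:
        return "redirect"

    return "none"
```

Fed four agreeing turns in a row, this returns "push_back"; fed a stretch where one speaker takes three of the last four turns, it returns "redirect". The real system presumably weighs far more than turn counts, which is exactly why an LLM sits in that seat.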

By the end of the conversation, the system has gathered enough data to score the user across the targeted skills. The researchers claim this approach produces scores that align with human expert ratings. That’s promising, but I’d want to see the full methodology — especially around edge cases where the AI misreads sarcasm or cultural differences.
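What would “align with human expert ratings” look like in numbers? The announcement as summarized here doesn’t say which statistic the study used, so treat the following as an assumption: one common choice for comparing two raters is Cohen’s kappa, which measures agreement beyond what chance alone would produce. A minimal sketch, with a hypothetical `cohens_kappa` helper written from the textbook definition:

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty score lists of equal length")
    n = len(rater_a)

    # Observed agreement: share of items where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected chance agreement from each rater's marginal score distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)
```

With made-up scores for eight students, say a human giving [1, 2, 3, 2, 1, 3, 2, 2] and the AI giving [1, 2, 3, 2, 1, 2, 2, 3], the raters match on six of eight (0.75), chance agreement is 0.375, and kappa works out to 0.6. A value of 1.0 means perfect agreement and 0 means no better than chance. Those numbers are invented for the example and say nothing about Vantage’s actual results.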

The elephant in the room

Vantage is still a research experiment. It’s not replacing teachers anytime soon. But it raises an uncomfortable question: if an AI can assess these skills as well as humans, what does that say about how we define them?

The skills frameworks from the OECD and the World Economic Forum (WEF) are useful, but they’re also reductive. Critical thinking isn’t just “analyzing arguments.” Collaboration isn’t just “building on others’ ideas.” These skills are messy, context-dependent, and often involve unspoken social cues that even humans struggle to articulate.

I’m not saying AI can’t help. I’m saying we should be careful about what we optimize for. If Vantage becomes widely adopted, schools might start teaching to the test — but the test is a conversation with AI avatars. That’s a weird feedback loop.

Still, credit where it’s due: Google Research is tackling a genuinely hard problem. Most AI in education is either flashcard apps or essay graders that miss the forest for the trees. Vantage at least tries to capture the forest.

Whether it succeeds depends on how well the Executive LLM handles the chaos of real human conversation. I’ve seen enough AI chatbots go off the rails to be skeptical. But I’ve also seen enough progress in language models to believe this could work — at least for certain skill domains.

Vantage is available now for sign-up on Google Labs. It’s English-only, aimed at high school and college students. I’ll be curious to see how it handles non-native speakers or students from different cultural backgrounds. Those are the stress tests that will tell us whether this is a genuine breakthrough or just another demo.
