Google's Vantage Experiment Uses GenAI to Grade Soft Skills

Google Research just dropped something interesting: Vantage, a research experiment that uses generative AI to assess “future-ready” skills like critical thinking, collaboration, and creative thinking. It’s now available for sign-up on Google Labs, and the results from a study with New York University show AI scoring is on par with human experts.

Let’s be real for a second. The term “future-ready skills” gets thrown around a lot, but the OECD Learning Compass 2030 and WEF’s Future of Jobs report both point to the same core competencies: critical thinking, collaboration, and creative thinking. These aren’t new; they were essential long before anyone had heard of ChatGPT. But with AI automating more technical tasks, these human skills are suddenly the hot commodity.

The problem is measuring them. Traditional tests are rigid, multiple-choice garbage that can’t capture how someone actually thinks or interacts in a real-world scenario. You can’t grade conflict resolution with a bubble sheet. And human assessment? Too expensive, too inconsistent, and too hard to scale. How do you fairly assess conflict resolution in a group that never disagrees, or someone’s ability to build on others’ ideas when the team settles on the first suggestion?

Vantage’s approach is clever. Learners jump into multi-party conversations with AI avatars. These aren’t simple chatbots—they’re dynamic, open-ended scenarios like preparing for a debate or pitching a creative vision. An “Executive LLM” constantly analyzes the conversation and steers the AI avatars to introduce specific challenges. Pushback on an idea, a manufactured conflict, a sudden constraint. It’s a next-gen adaptive assessment engine that gathers enough data to score the user by the end of the conversation.
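To make the steering idea concrete, here is a minimal sketch of how a controller like this might decide what to throw at the learner next. This is not Google's implementation, which I haven't seen. The `Executive` class, the `needed` threshold, and the mapping of each challenge to a skill are all my own assumptions for illustration, and a plain counter stands in for the LLM.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from skill to challenge type. The challenge names come
# from the description above; pairing them with skills is an assumption.
CHALLENGES = {
    "critical_thinking": "pushback on an idea",
    "collaboration": "a manufactured conflict",
    "creative_thinking": "a sudden constraint",
}


@dataclass
class Executive:
    """Toy stand-in for the "Executive LLM": counts evidence seen per skill."""

    needed: int = 2  # assumed number of observations wanted per skill
    evidence: dict = field(default_factory=lambda: {s: 0 for s in CHALLENGES})

    def observe(self, skill: str) -> None:
        """Record that the learner's last turn gave evidence for a skill."""
        self.evidence[skill] += 1

    def next_challenge(self):
        """Pick the challenge for the least-evidenced skill, or None when done."""
        skill = min(self.evidence, key=self.evidence.get)
        if self.evidence[skill] >= self.needed:
            return None  # enough data on every skill; the scenario can wrap up
        return skill, CHALLENGES[skill]


executive = Executive()
executive.observe("critical_thinking")
print(executive.next_challenge())
```

The point of the sketch is the loop shape: observe, find the gap, inject a challenge that targets it. A real system would have to infer the evidence from the conversation itself, which is the hard part.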

Here’s where it gets interesting. The research claims the AI scoring matches human experts. I’ve seen this claim before in other domains, and it usually comes with caveats. The study was with NYU, but I don’t yet know the sample size or the diversity of scenarios. The tech report is available, and I’d want to dig into those numbers and the rubric details before fully buying in.
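It helps to be precise about what "matches human experts" could mean. I don't know which agreement statistic the study reports, so treat this as one common option rather than their method: Cohen's kappa, which corrects raw agreement between two raters for the agreement you'd expect by chance. The scores below are made-up toy data, not results from the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    # Fraction of items where the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater assigned scores independently,
    # at the rates seen in their own ratings.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical score throughout
    return (observed - expected) / (1 - expected)


# Toy rubric scores (1 to 4) for five hypothetical learners.
human_scores = [3, 2, 4, 3, 1]
ai_scores = [3, 2, 3, 3, 1]
print(round(cohens_kappa(human_scores, ai_scores), 2))
```

Raw agreement here is 80%, but kappa comes out lower because some of that agreement would happen by chance. That gap is exactly the kind of caveat I'd look for in the tech report.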

But the bigger question is philosophical: should we be automating the assessment of human skills? I get the scalability argument—teachers are overworked, and standardized tests are broken. But there’s something deeply weird about having an AI judge your ability to collaborate or think critically. The very act of performing for an AI might change what’s being measured. Are you demonstrating genuine collaboration, or are you just gaming the system that’s trying to game you?

Vantage is positioned as a “sandbox environment for practice and validated assessment.” That’s smart—it lowers the stakes. Students can screw up, learn, and try again without a human teacher’s judgment hanging over them. But Google is also pitching this as a tool for educators to “align lessons with these skills.” That’s where I get nervous. Once AI scoring becomes part of the curriculum, it changes the incentive structure. Students will optimize for what the AI rewards, not necessarily for genuine skill development.

I’ve been in this field long enough to remember when automated essay scoring was going to revolutionize writing assessment. It didn’t. It rewarded formulaic, keyword-stuffed garbage and taught students to write for machines. Vantage is more sophisticated, but the same dynamics apply.

The experiment is worth watching. The methodology looks sound on paper, and the partnership with NYU gives it academic credibility. But I’d like to see independent replication, longer-term studies on learning outcomes, and some honest discussion about the limitations. Google’s blog post is predictably optimistic; they’re not going to highlight the failure modes in a launch announcement.

For now, Vantage is an interesting research experiment. If you’re an educator or a student, sign up and play with it. See how it feels to be assessed by AI avatars. My bet is that the novelty wears off fast, and the real value might be in the practice environment rather than the assessment itself. But that’s a harder sell than “AI matches human experts.”

I’ll be watching this space, and I’ll be skeptical. Because the last thing we need is another layer of automated judgment on top of an already broken education system.
