I Spent Time with Google's Diagnostic AI in a Real Clinic. Here's What I Saw.

I’ve been following Google’s AMIE project since they first showed it off in simulated settings. It was impressive then—a conversational medical AI that could reason through diagnostic challenges and talk to patient actors. But simulations are simulations. The real test is whether it works when a real patient, with real anxiety and a real sore throat, sits down to chat with it.

Now we have that data. Google Research, in partnership with Beth Israel Deaconess Medical Center (BIDMC), just published results from a prospective, single-center feasibility study. This isn’t a toy demo. It’s a pre-registered, IRB-approved study where AMIE actually took patient histories before new primary care appointments. Let me walk you through what they did and what I think it means.

How the Study Worked

The setup was straightforward but careful. Patients booked for new, non-emergency episodic complaints—either in-person or via telehealth—were invited to participate during the booking process. The key: they were told their decision wouldn’t affect their care. That’s the kind of ethical baseline we need to see more of.

Participants interacted with AMIE through a secure web link before their physical consultation. The AI conducted text-based chats, but here’s the important part: a physician was watching live via video call with screen-sharing. That physician (called the “AI supervisor”) had a predefined set of safety criteria and could intervene at any point. Think of it like a resident doctor taking a history while an attending watches—except the “resident” is a language model.

After the chat, AMIE generated a transcript and summary for the patient’s actual doctor. The clinician got a comprehensive overview of the pre-visit interaction before seeing the patient. This is smart design. It doesn’t replace the doctor-patient conversation; it augments the information gathering phase.

What I Like About This Approach

First, the safety-first mindset. They didn’t just let the AI loose. The live supervision, the IRB approval, the structured safety criteria—this is how you responsibly test a system that could literally harm someone if it gets things wrong. Too many AI health demos skip this part.

Second, the workflow integration. AMIE isn’t trying to be the doctor. It’s doing what many clinicians wish they had more time for: taking a thorough history. In primary care, the average visit is 15 minutes. A good chunk of that is asking questions the patient has already answered on a clipboard form. If AI can handle that pre-work, it frees up the doctor to actually think and connect.

Third, the transparency. The transcript and summary go to the clinician, not just the patient. This means the doctor can spot errors, omissions, or weird AI logic. It’s not a black box.

The Catch (There’s Always a Catch)

Let’s be real: this is a feasibility study. That means small sample size, single site, highly controlled conditions. We don’t know how AMIE performs with patients who have limited English proficiency, low health literacy, or complex multi-morbidity. The study population was likely skewed toward the tech-comfortable and the relatively healthy.

Also, the AI supervisor model is expensive. Having a physician watch every AI interaction defeats the cost-saving purpose. The hope is that with enough data, the supervision can be scaled back to exception-only monitoring. But we’re not there yet.

And let’s not ignore the elephant in the room: AI hallucinations. Even with fine-tuning, large language models make up facts. In a diagnostic context, that could mean suggesting a symptom the patient never mentioned or missing a critical detail. The study design mitigates this with human oversight, but that’s not a long-term solution.

My Take

This is a legitimate step forward. Not a revolution, but a careful, evidence-based move toward integrating conversational AI into clinical workflows. Google deserves credit for doing the hard, boring work of prospective clinical testing instead of just releasing a flashy demo.

But I’m not ready to let AMIE loose on my grandmother. The safety net is still too thin. What I’d like to see next: multi-site studies with diverse populations, longer follow-up to measure actual clinical outcomes (not just patient satisfaction), and transparent reporting of failure modes. How often did the supervisor have to intervene? What kinds of mistakes did AMIE make? The paper should include those numbers.

For now, AMIE is a promising tool for the pre-visit history-taking task. It’s not a doctor, nor should it be. But if it can save primary care physicians 5-10 minutes per patient while maintaining safety, that’s a win. Let’s just make sure the evidence supports it before we roll it out everywhere.

I Spent Time with Google’s Diagnostic AI in a Real Clinic. Here’s What I Saw.

How the Study Worked

What I Like About This Approach

The Catch (There’s Always a Catch)

My Take

Comments (0)