
I spent the last few weeks doing something I suspect a lot of people are quietly curious about but haven't fully committed to: I sat down and properly tested the conversational voice AI offered by Claude, Gemini, and ChatGPT. Not a quick demo, not a canned use case — actual conversations, back and forth, the kind you'd normally have with a person. I wanted to know which one, right now in 2026, is actually usable as a conversational partner.

The verdict is unambiguous. ChatGPT wins by a landslide. But the more interesting story isn't who won — it's what the winner is quietly about to do to an entire industry that most people haven't thought about yet.

The Claude Problem: Brilliant AI, Broken Conversation

I'll start with Claude, because I use it every day as my primary text-based tool and I genuinely wanted it to be good here. It isn't. At least not yet.

The core problem is a UX failure that makes fluid conversation nearly impossible: while Claude is speaking out loud — while the audio is playing and the model is mid-response — it interprets any ambient noise, any word you start to say, as a new input. It doesn't know the difference between "I'm still talking, wait for me" and "the user has started a new turn." The result is a loop of broken responses and misread interruptions that makes the experience genuinely uncomfortable to use.

You try to jump in with a quick "yes, go on" or a natural interjection the way you would with a human, and the model either stops dead and starts a new response, or produces something incoherent because it was already mid-sentence when your word got processed. After a few minutes, you stop speaking naturally and start waiting in complete silence until the model fully stops — which is not how humans talk, and defeats the entire point of conversational AI.
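What I was running into reads like a classic barge-in problem: without echo suppression or a confidence threshold on detected speech, any audio picked up during playback is treated as a new user turn. A toy sketch of the two policies, purely hypothetical and not any vendor's actual implementation, makes the difference concrete:

```python
# Toy model of voice-assistant turn-taking during audio playback.
# Illustrative logic only -- not Anthropic's (or anyone's) real code.

def naive_policy(event: str, assistant_speaking: bool) -> str:
    """Treat ANY detected audio as a new user turn (the failure mode)."""
    if event == "audio_detected":
        return "interrupt_and_start_new_turn"  # fires even for ambient noise
    return "keep_speaking" if assistant_speaking else "idle"

def guarded_policy(event: str, assistant_speaking: bool,
                   speech_confidence: float, is_own_echo: bool) -> str:
    """Only yield the turn for confident, non-echo user speech."""
    if event != "audio_detected":
        return "keep_speaking" if assistant_speaking else "idle"
    if is_own_echo:                # echo cancellation: ignore our own playback
        return "keep_speaking"
    if speech_confidence < 0.8:    # ambient noise, coughs, half-words
        return "keep_speaking"
    return "interrupt_and_start_new_turn"

# The naive policy stops dead for a cough; the guarded one keeps talking
# and only yields for clear, deliberate user speech.
print(naive_policy("audio_detected", assistant_speaking=True))
print(guarded_policy("audio_detected", True, speech_confidence=0.2, is_own_echo=False))
```

The 0.8 threshold is an arbitrary placeholder; the point is that fluid turn-taking needs both an echo check and a confidence gate, and the frustrating behaviour above is exactly what you get when neither exists.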

The technology underneath Claude's voice mode is excellent. The experience of using it is not. These are two different problems, and the second one is the only one that matters to a real user.

I want to be clear that I fully expect this to get fixed. Anthropic is moving fast, and the underlying model quality is not in question. But right now, as of today, the voice UX is not ready for the kind of fluid, natural back-and-forth that makes conversational AI genuinely useful. If you try it expecting something close to a phone call with a very smart person, you'll be frustrated.

Gemini: Smoother, But Still Not Quite There

Google's conversational AI is a noticeably more polished product than Claude's. The turn-taking is better handled, the voice quality is pleasant, and there's clearly more engineering attention paid to the UX layer. You can feel the difference within the first few exchanges.

But more polished is not the same as good. What I kept running into with Gemini was a subtle but persistent sense of friction in the conversation flow. There's a slight hesitancy in how it hands the turn back to you, a moment where you're not quite sure if it's finished or if there's more coming. In text, that ambiguity doesn't matter — you see the response and know it's done. In voice, that fraction-of-a-second uncertainty accumulates over a conversation and produces something that feels slightly off-rhythm, like talking to someone with a minor lag on a video call.

It's workable. I wouldn't say the experience is bad — it's genuinely better than Claude's current state. But "workable" and "natural" are different thresholds, and Gemini sits somewhere in between. You can have a conversation, but you're aware you're having a conversation with a machine in a way that prevents you from fully relaxing into it.

ChatGPT: The One That Actually Works

Then there's ChatGPT, and the difference is immediately apparent. Within the first sixty seconds of using it I stopped thinking about the mechanics of the interaction and started just talking. That transition — from awareness of the tool to unselfconscious use of the tool — is the only benchmark that matters for conversational AI, and ChatGPT is the only one of the three that currently clears it.

The turn-taking feels natural. Interruptions are handled gracefully. The voice quality is good. The response latency is low enough that you don't sit in silence wondering if it heard you. All of these technical elements combine to produce something that, in practice, feels less like interacting with a piece of software and more like talking to someone who happens to know everything and has infinite patience.

The experience is fluid, pleasant, and — importantly — reliable. I didn't have to adjust my speaking style, speak more slowly, wait longer between turns, or develop any compensatory habits. I just talked, and it talked back. That sounds like a low bar. It isn't, based on what I found with the other two.

The Feature That Changes Everything: Speak Any Language, Get Answered in Kind

Here's where the story gets genuinely interesting. ChatGPT's voice mode has a capability that I don't think gets nearly enough attention: you can start the conversation in any language, and the model will respond in that same language, naturally, without any setup, without switching modes, without telling it anything.

Start in French, get French. Switch mid-conversation to German, get German. Open with Mandarin, get Mandarin back. The model tracks the language of the conversation and mirrors it automatically. And because the underlying language model is genuinely multilingual at a deep level, the quality of the conversation doesn't degrade when you move away from English. You're not talking to a translation layer. You're talking to a model that actually operates in the language you're using.

You can speak for an hour in Italian about whatever you're actually interested in — architecture, cinema, football, local politics — and get thoughtful, contextually appropriate responses the entire time. No course material required. No curriculum. Just conversation.

The implications of this feature are more significant than most people have processed yet.

Who's Getting Disrupted: Duolingo, Babbel, Preply, and Language Conversation Programs

Language learning as an industry is built on a few distinct product categories. There are gamified apps like Duolingo and Babbel that teach vocabulary and grammar through short exercises. There are marketplace platforms like Preply and iTalki that connect students with human tutors for conversation practice. There are formal exchange programs and immersion courses. And there are the thousands of independent conversation tutors who charge by the hour to practice speaking with you.

ChatGPT's voice mode, at current quality levels, is a direct substitute for the most valuable part of all of them: conversational practice.

Duolingo and Babbel have always known that their real vulnerability is that apps are bad at conversation. You can memorise vocabulary through a phone game, but you can't learn to actually speak a language that way. They've built features to address this — Duolingo has a speaking practice mode, Babbel has live classes — but neither delivers the thing a learner actually needs: a patient, available-anytime native-level conversation partner who will talk about whatever you want to talk about. ChatGPT does exactly that. At a price point that makes the competition look absurd.

Preply and iTalki are more directly threatened. Their core product is human conversation tutors — real people who you book for 30-minute or 60-minute sessions at anywhere from $10 to $80 per hour depending on the teacher. Those sessions are valuable precisely because they're conversational. You're not doing grammar exercises; you're practising speaking with a human who can correct you, react to you, and push back. ChatGPT voice does all of that, except it's available at 2am, never needs to be booked in advance, never runs out of patience, and costs a few dollars a month for unlimited access.

Language exchange programs and immersion experiences serve a real purpose that AI can't replace — the cultural immersion, the social relationships, the experience of being in a country and navigating it in real time. But for the vast majority of people who enrol in these programs primarily for the speaking practice, the calculation is changing fast.

The economics are brutal. For the price of one or two hours with a Preply tutor, you can have an AI conversation partner available every hour of every day for an entire month, in any language you choose, on any topic that interests you.
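Using the $10 to $80 hourly range quoted above, and assuming roughly $20 a month for the subscription (an assumption based on current Plus pricing, not a figure from this comparison), the back-of-envelope maths looks like this:

```python
# Back-of-envelope cost comparison. Tutor rates from the range cited above;
# the $20/month subscription figure is an assumption, not a quoted price.

tutor_rate_low, tutor_rate_high = 10.0, 80.0   # USD per hour (marketplace range)
subscription = 20.0                            # USD per month, flat, unlimited use

# A modest practice habit: three one-hour sessions a week, averaged monthly.
hours_per_month = 3 * 52 / 12                  # = 13 hours

tutor_monthly_low = tutor_rate_low * hours_per_month
tutor_monthly_high = tutor_rate_high * hours_per_month

print(f"Tutor, 3 hrs/week: ${tutor_monthly_low:.0f}-${tutor_monthly_high:.0f}/month")
print(f"AI partner, unlimited hours: ${subscription:.0f}/month")
print(f"Break-even vs cheapest tutor: {subscription / tutor_rate_low:.1f} hours")
```

Even at the cheapest end of the tutor range, the subscription pays for itself in two hours of practice; against a mid-priced tutor it pays for itself in the first half-hour.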

What This Means for Language Learning Costs

Speaking practice has always been the bottleneck in language learning, and it's always been expensive. Hiring a native speaker costs real money. Finding conversation exchange partners takes time and scheduling effort. Immersion programs cost thousands. The resources have always been there for people willing to pay for them, but most people learning a language are doing it on the side — a few hours a week, fitting it around work and life — and paying $40 an hour for a tutor three times a week is not a realistic option for most of them.

ChatGPT changes this completely. For a few dollars a month — less than a single Preply session — you get unlimited conversational practice in any language, at any hour, at a quality level that is genuinely good enough to improve your speaking ability. You can talk for three hours on a Sunday about the topics that actually interest you, in the language you're learning, without booking anything in advance or paying per minute.

This is not a marginal cost reduction. It's a structural collapse of the price floor for one of the most valuable services in language education. What would have cost you $500 a month five years ago now costs less than a streaming subscription.

This Is Happening Now, Not Eventually

I want to resist the temptation to frame this as a future prediction, because it isn't one. The capability exists today, it works well today, and anyone with a ChatGPT subscription can test it themselves in the next five minutes. The disruption isn't coming — it's already here. The lag is just in awareness.

Duolingo, Babbel, and Preply will not disappear overnight. There are real switching costs, real habits, real social dimensions to their platforms that don't vanish just because a better tool exists. But the value proposition of paid conversational practice — the premium, highest-value part of the language learning market — has just been fundamentally undercut by a tool that costs almost nothing and is available to anyone.

The companies that survive this will be the ones that offer something AI genuinely can't: human presence, cultural authenticity, social accountability, the real-world stakes of a face-to-face conversation with a person who has their own life and perspective. The companies that don't find that differentiation will look, within a few years, very much like the homework-help platforms looked after ChatGPT launched.

The next question — one I've been sitting with — is what happens when this same conversational capability gets a physical body. A voice that works this well in an app is one thing. A voice in a robot is something else entirely.

As for Claude and Gemini — I'll be testing them again in six months. The gap with ChatGPT in voice today is real, but this space is moving fast. What's true now may not be true by the time you read this. But today, if you want to practise your French over breakfast or work on your Mandarin during a commute, there's only one answer worth recommending.

Jaime Delgado

Product Analyst & AI early adopter

Jaime has been tracking the AI landscape since the GPT-3 era. He writes about AI capabilities, model comparisons, and practical applications for builders and founders. His daily driver is Claude inside Visual Studio Code — though he also reaches for Grok, Gemini, and ChatGPT when the question is quick and the context is light. He stays genuinely open to every AI that comes along: the landscape moves fast, and so does he. Based in Spain.
