For a long time, InShot was enough. It's a solid mobile app, the editing workflow is fast, and if you needed a quick voiceover for a clip you didn't want to narrate yourself, it was right there, built in. No extra apps, no accounts, no friction. I used it for months on my YouTube and TikTok content without thinking too hard about it.

Then I tried ElevenLabs, and I understood immediately that I had been settling.

What InShot Gets Right — and Where It Falls Short

To be fair to InShot: it does what it says on the tin. The video editing tools are genuinely good, the interface is clean, and the text-to-speech feature works. If you type a script and need a voice to read it out, InShot will produce audio that is functional and adequate.

But adequate is the ceiling. The voice selection is thin — a handful of generic options that sound exactly like what they are: AI voices that nobody has put much time into. There's no control over pacing, no control over emotion, no way to make the output sound like it was made for your brand rather than for any random piece of content on the internet. You choose a voice, you generate the audio, and you accept what you get.

For casual content, that's fine. For content you actually care about, the kind you want to build an audience with, the limits start to show.

The voice is the first thing a viewer hears. If it sounds generic, everything else you've built around it sounds generic too.

Discovering ElevenLabs

I came across ElevenLabs the same way I find most tools: someone mentioned it in passing, I looked it up out of curiosity, and twenty minutes later I was testing it on a script I had been planning to record with InShot that same afternoon.

The first thing that hit me was the voice library. Not a handful of options — hundreds of voices, sorted by accent, age, style, tone, and use case. Narration voices, conversational voices, authoritative voices, warm voices, neutral voices. Voices that sound American, British, Australian, Irish, Latin American. Voices for news, for storytelling, for corporate explainers, for social media. You can preview any of them in seconds with your own text, right inside the platform.

I spent twenty minutes just browsing. That's not wasted time — that's twenty minutes of realising how much more is possible.

The Voice Library: Breadth That Actually Matters

The variety in ElevenLabs isn't just cosmetic. Different voices serve different content in meaningfully different ways, and having the right one changes how your video lands with a viewer.

I produce content on different topics — some more analytical, some more casual. With InShot, I used the same voice for everything because there weren't enough options to justify switching. With ElevenLabs, I have a shortlist of three or four voices that I rotate depending on what the video needs. A more measured, authoritative voice for explainer content. A faster, more energetic one for TikTok clips. A warmer tone when the content calls for something personal.

These are not small differences. The voice shapes how the content feels — how credible it sounds, how watchable it is, how much the viewer trusts what they're hearing. Getting that right is not a detail. It's part of the product.

Customization: The Part InShot Doesn't Have

Here's the feature gap that really ended the InShot comparison for me: ElevenLabs lets you adjust how any voice sounds, down to a granular level.

Stability controls how consistent the voice stays across a long output — higher stability means more uniform delivery, lower stability allows more natural variation. Clarity affects how clean and crisp the speech sounds. Style exaggeration pushes or pulls back on the expressive qualities of the voice. Speaker boost sharpens the presence of the voice in the mix.

You can run the same script with the same voice at five different settings and get five meaningfully different outputs. One sounds almost robotic in its consistency. Another sounds like an actual person who got slightly tired at the end of a long paragraph. You choose what fits.

InShot gives you a voice. ElevenLabs gives you a voice you can actually shape into something that sounds like yours.

For YouTube, I tend to push stability higher and keep style exaggeration low — I want consistent, clean audio that doesn't distract from the visuals. For TikTok, I dial back stability slightly and let the voice breathe more. The platform is different, the pacing is different, and the voice should feel different too.
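If you generate audio through the ElevenLabs API instead of the web interface, the two presets described above could be written down roughly like this. Treat it as a minimal sketch under assumptions: the numbers are illustrative guesses, not the exact values I use, and the field names (`stability`, `similarity_boost` for the clarity slider, `style`, `use_speaker_boost`) reflect my understanding of the API's `voice_settings` object, so check them against the current documentation before relying on them.

```python
import json

# Hypothetical presets. The numbers are illustrative, not recommendations.
PRESETS = {
    "youtube": {  # steadier delivery, little style exaggeration
        "stability": 0.75,
        "similarity_boost": 0.75,
        "style": 0.10,
        "use_speaker_boost": True,
    },
    "tiktok": {  # lower stability so the voice varies more
        "stability": 0.45,
        "similarity_boost": 0.75,
        "style": 0.35,
        "use_speaker_boost": True,
    },
}


def build_payload(text: str, platform: str) -> dict:
    """Return a request body for one script using the preset for `platform`."""
    if platform not in PRESETS:
        raise ValueError(f"unknown platform: {platform!r}")
    settings = PRESETS[platform]
    # Slider-style settings are assumed to live in the 0.0 to 1.0 range.
    for name, value in settings.items():
        if isinstance(value, float) and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return {"text": text, "voice_settings": dict(settings)}


if __name__ == "__main__":
    payload = build_payload("Welcome back to the channel.", "youtube")
    print(json.dumps(payload, indent=2))
```

Keeping the presets in one place makes it easy to run the same script through both and compare the results by ear.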

Voice Cloning: The Feature That Changes Everything

The capability that made the biggest difference for me is voice cloning: you upload a sample of your own voice and the platform generates audio that sounds like you.

I used this for the first time on a video where I wanted my own voice narrating the script but didn't have time to record it properly. I uploaded a two-minute audio sample — nothing fancy, just me talking clearly into my phone — and within minutes I had a voice model that could read any script I gave it and sound like I had recorded it myself.

The quality is not perfect. A listener who knows you well will notice the difference. But a viewer who has never heard your voice before is unlikely to notice anything. And for content creators who want to build a consistent voice identity without being chained to a recording setup, it changes the workflow entirely.

You can batch generate narration for five videos in the time it would take to record one. You can produce content at the same quality standard when you're traveling, when you're sick, when you simply don't have the time to sit down at a microphone. The voice is always there, always consistent, always ready.
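Before queuing up a batch, it helps to check how many characters the scripts add up to. The sketch below plans a batch against the 10,000-character monthly free-tier figure quoted at the end of this post. It assumes usage is counted as plain characters including spaces (which is what `len()` measures here); the actual accounting may differ by plan or model. The function name `plan_batch` is made up for this example.

```python
FREE_TIER_CHARS = 10_000  # monthly figure quoted in this post; plans may change


def plan_batch(scripts: dict[str, str], budget: int = FREE_TIER_CHARS):
    """Split scripts into those that fit the budget and those to defer.

    Returns (fits, deferred, used), where `used` is the character total
    of the scripts that fit.
    """
    fits, deferred, used = [], [], 0
    # Shortest first, so as many videos as possible get narrated.
    for name, text in sorted(scripts.items(), key=lambda item: len(item[1])):
        if used + len(text) <= budget:
            fits.append(name)
            used += len(text)
        else:
            deferred.append(name)
    return fits, deferred, used


if __name__ == "__main__":
    demo = {
        "explainer": "x" * 4000,
        "tiktok_clip": "y" * 900,
        "long_review": "z" * 6500,
    }
    fits, deferred, used = plan_batch(demo)
    print("generate now:", fits)
    print("defer:", deferred)
    print("characters used:", used)
```

Sorting shortest-first is a deliberate choice that maximizes the number of finished videos. If one long script matters more than several short ones, sort by priority instead.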

The Workflow Now

My current process is simple. I write the script — usually in a notes app, sometimes directly in the ElevenLabs interface. I choose my voice (or use my cloned voice for content where that fits). I generate the audio, download it, and import it into my editing software alongside the video. The whole step takes maybe three minutes for a typical short-form video.
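I do this step by hand in the web interface, but the generate-and-download part can in principle be scripted. Here is a rough sketch using only Python's standard library. The endpoint path, the `xi-api-key` header, and the assumption that the response body is the audio file itself are my understanding of the ElevenLabs text-to-speech API, so verify them against the official reference. The `ELEVENLABS_API_KEY` variable name and the voice ID are placeholders.

```python
import json
import os
import urllib.request

# Assumed endpoint; confirm against the official API reference.
API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def make_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a text-to-speech request for one script."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/{voice_id}",
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def generate(text: str, voice_id: str, api_key: str, out_path: str) -> str:
    """Send the request and save the returned audio bytes to `out_path`."""
    request = make_request(text, voice_id, api_key)
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path


if __name__ == "__main__":
    # Only prepares the request, so running this file makes no network call.
    key = os.environ.get("ELEVENLABS_API_KEY", "placeholder-key")
    prepared = make_request("Hello from my script.", "YOUR_VOICE_ID", key)
    print(prepared.get_method(), prepared.full_url)
```

Calling `generate(...)` with a real key and voice ID would write the audio file, ready to import into your editor alongside the video.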

What I gave up: nothing. InShot's voice feature was fast, but ElevenLabs is not meaningfully slower. The extra two minutes of choosing settings and generating a preview is the difference between a generic output and one I'm actually happy with.

What I gained: control, quality, and the ability to build a consistent audio identity across my content. These are not small things if you're serious about what you're producing.

If you're still using InShot's built-in voices — or any built-in voices — I'd genuinely encourage you to try ElevenLabs for one video and see whether you can go back. In my experience, most people can't.

Try it yourself

ElevenLabs — AI Voice That Sounds Like It Was Made for Your Content

Hundreds of voices, deep customization, and voice cloning. The free tier lets you generate up to 10,000 characters per month — more than enough to test it properly on your own videos.

Try ElevenLabs free →

Two Alternatives Worth Knowing

ElevenLabs is my recommendation without hesitation — but it's not the only serious player in AI voice. Here are two alternatives that are worth a look depending on what you need.

Alternative 1

Murf AI

Murf is the alternative I recommend to people who want a more guided, professional-production feel. It comes with a built-in studio interface — you paste your script, assign voices to different parts, adjust timing, and export a finished voiceover with background music baked in if you want it. The voice quality is excellent and the voice library covers all the major accents and styles. Where ElevenLabs is a power tool, Murf is a complete production environment. Slightly less raw customization than ElevenLabs, but the workflow is more structured and the learning curve is gentler.

Visit Murf AI →

Alternative 2

Play.ht

Play.ht sits closest to ElevenLabs in terms of raw voice generation capability. The library is large — over 900 voices across 142 languages — and the output quality is genuinely competitive. Where it differentiates is in the API: Play.ht is built with developers in mind, and if you're building a product or pipeline that needs voice generation at scale, the API is mature and well-documented. For individual content creators, the platform interface is good but slightly less polished than ElevenLabs. Worth trying if you find ElevenLabs outside your budget or you need multilingual output at volume.

Visit Play.ht →

Jaime Delgado

Product Analyst & AI early adopter

Jaime has been tracking the AI landscape since the GPT-3 era. He writes about AI capabilities, model comparisons, and practical applications for builders and founders. His daily driver is Claude inside Visual Studio Code — though he also reaches for Grok, Gemini, and ChatGPT when the question is quick and the context is light. He stays genuinely open to every AI that comes along: the landscape moves fast, and so does he. Based in Spain.