CapCut Text to Speech: How to Use It

A few months back I was editing a gaming montage with narration across 3 separate segments. Recorded one take, hated it. Recorded again, still hated it. On the third attempt I switched to CapCut's text to speech instead and had all three clips done in under 4 minutes. For short-form content, it's one of the faster voiceover options you have inside the app.

Here's exactly how it works on mobile, desktop, and in the browser, what settings are worth touching, and where it genuinely falls short so you're not surprised mid-project.

How to Use CapCut Text to Speech on Mobile

I've run this workflow on 23 different projects across iOS and Android. The steps are identical on both platforms.

Open your project and tap Text in the bottom toolbar.
Tap Add text and type or paste the words you want spoken. Keep each text layer under roughly 500 characters. That's the per-layer limit, and anything past it gets cut off.
Tap the text layer on the timeline to select it.
Scroll the bottom toolbar to the right until you see the speaker icon labeled Text to Speech. It sits past Text Style and Animation, so first-timers scroll right past it.
Pick a voice from the menu. Tap the play icon next to any voice to preview it before applying.
Tap the checkmark. CapCut generates the audio and places it as a separate track on your timeline.

One thing that catches people: the TTS audio and the text layer are not linked. Edit the text after generating the voiceover and the audio does not update automatically. You have to delete the audio track and regenerate it from scratch. Write your final script before you hit generate.

CapCut Text to Speech on Desktop: Step-by-Step

The desktop version runs the same TTS engine, but the interface layout is different enough that the first session costs around 9 extra minutes of clicking around to find things.

Open your project in the CapCut desktop app (Windows or macOS).
Click Text in the top-left menu.
Drag Add Text onto the timeline.
Type your script into the text field on the right panel. Font and position settings here don't affect the audio output.
Click the text layer on the timeline to select it.
Click Text to Speech in the top-right corner of the panel.
Select a voice, then click Start reading in the bottom right.

CapCut drops the audio clip onto the timeline as a separate track. From there you can trim it, adjust volume, or move it to sync with any visual element. For longer scripts, split them across multiple text layers and chain the audio clips together. Anything over 500 characters per layer gets cut off without warning.
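If you draft scripts outside CapCut, splitting them ahead of time saves some trial and error. Here's a minimal Python sketch that groups whole sentences into chunks that fit one text layer each. The function name `split_script` and the `MAX_CHARS` constant are made up for this example, and the 500 figure is just the limit quoted in this article, so adjust it if your version of the app behaves differently.

```python
import re

MAX_CHARS = 500  # per-layer limit quoted in this article; treat it as approximate


def split_script(script: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Group whole sentences into chunks no longer than max_chars."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", script.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-wrapped at a space
        # when one is available, otherwise at exactly max_chars.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            piece, sentence = sentence[:cut], sentence[cut:].lstrip()
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece)
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    demo = "This is one sentence. " * 40
    for number, chunk in enumerate(split_script(demo), start=1):
        print(f"Layer {number}: {len(chunk)} characters")
```

Paste each chunk into its own text layer, generate the audio for each, and line the clips up on the timeline in order.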

Using CapCut Text to Speech Online

The web editor at capcut.com/tools/text-to-speech is the fastest option when you only need an audio file without building out a full video project in the app.

Go to the CapCut online TTS tool and sign in.
Paste your script into the text field.
Select your language and voice from the library.
Click Preview 5s to check how it sounds before committing.
Click Generate, then either download the audio or click Edit More to bring it into the full video editor.

The online tool works in any browser with no install required. Useful if someone sends you a CapCut project link to review and you need to add narration quickly without opening the desktop app.

CapCut Text to Speech Voices, Languages, and Free vs Pro

The basic TTS voices in CapCut are free. You don't need a CapCut subscription to generate AI voiceovers in English or most major languages. Some premium and regional voices are locked behind CapCut Pro, but the core library works on the free plan.

The voice options you see depend on your region. A US account sees a different set than a Southeast Asian account. CapCut updates the library without announcements, so the voice picker inside the app is the only reliable way to check what's currently available to you.

CapCut supports 20+ languages, including English, Spanish, French, German, Mandarin, Hindi, Japanese, Korean, Portuguese, Vietnamese, Thai, and Malay. Each language comes with multiple voices.

Voice categories available across most regions:

Male voices across multiple accents and age styles
Female voices across multiple accents and age styles
Character-style voices useful for gaming content, animation, and comedy clips

Voice cloning (training the model on your own voice) is a separate Pro feature and not part of the standard TTS workflow. You can record your real voice as a manual CapCut voiceover track, but that's a different tool entirely.

For an honest comparison of CapCut's voice quality against dedicated tools, AI Dictation's breakdown from April 2026 covers the gap well. The short version: CapCut TTS works well for Reels, Shorts, and TikTok. For a 10-minute YouTube video where narration carries the whole thing, ElevenLabs or Murf sound noticeably more natural.

CapCut Text to Speech Speed, Pitch, and Volume Settings

After generating TTS audio, click the audio track on the timeline to access adjustment options. The settings worth actually changing:

Speed: Ranges from 0.5x to 2x. I usually run mine at 1.1x for tutorial content. The default rate sounds slightly slow against fast-cut short-form editing.
Keep Pitch: Turn this on when increasing speed. Without it, the voice's pitch rises along with the speed, and at 1.5x or higher the chipmunk effect is very obvious.
Volume: Set narration around 4 to 6 dB above any background music track. TTS buried under a full-volume music bed is the most common mix mistake in TikTok content.
Fade in / Fade out: Worth using if your narration starts or ends mid-sentence for stylistic cuts.
Noise reduction: Leave it off for TTS audio. It's built for microphone recordings. Applying it to AI-generated voice tends to degrade clarity rather than improve it.
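To put numbers on the speed and volume settings, here's a small Python sketch of the arithmetic. It uses the standard decibel-to-amplitude conversion (ratio = 10^(dB/20)) and assumes a clip's length scales as original length divided by speed. The function names are invented for this example and nothing here calls CapCut itself.

```python
def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (gain_db / 20)


def duration_at_speed(original_seconds: float, speed: float) -> float:
    """Length of a clip after a speed change, within the 0.5x to 2x range."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    return original_seconds / speed


if __name__ == "__main__":
    print(round(db_to_amplitude_ratio(4), 2))    # 1.58
    print(round(db_to_amplitude_ratio(6), 2))    # 2.0
    print(round(duration_at_speed(30, 1.1), 2))  # 27.27
```

So narration sitting 4 to 6 dB above the music has roughly 1.6 to 2 times the amplitude, and a 30-second clip at 1.1x runs a little over 27 seconds.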

CapCut doesn't have a built-in pause button inside the TTS generator. To add a natural pause mid-sentence, split the script across two separate text layers and leave a small gap between the two audio clips on the timeline. Clunky, but it works.
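If you plan pauses while writing, you can mark them in the draft and work out the clip placement ahead of time. The sketch below is a planning aid only. The `[pause]` marker is a convention made up for this example (CapCut doesn't recognize it), and the clip durations are numbers you read off the timeline after generating each clip.

```python
PAUSE_MARKER = "[pause]"  # hypothetical marker for drafting; not a CapCut feature


def split_on_pauses(script: str, marker: str = PAUSE_MARKER) -> list[str]:
    """Split a draft into one text-layer segment per pause marker."""
    return [part.strip() for part in script.split(marker) if part.strip()]


def clip_start_times(durations: list[float], gap_seconds: float = 0.4) -> list[float]:
    """Start time of each audio clip when a fixed gap separates the clips."""
    starts = []
    position = 0.0
    for duration in durations:
        starts.append(round(position, 3))
        position += duration + gap_seconds
    return starts


if __name__ == "__main__":
    print(split_on_pauses("Wait for it. [pause] There it is."))
    print(clip_start_times([2.0, 3.5, 1.0], gap_seconds=0.5))  # [0.0, 2.5, 6.5]
```

Each segment goes into its own text layer, and the start times tell you where to drop each generated clip.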

For a look at how the audio tools fit into a wider editing session, see the guide to CapCut auto-captions. The two features work well together for subtitled narration.

CapCut Text to Speech Not Working: Common Fixes

A few problems come up repeatedly across support threads and Reddit:

TTS button not appearing: Almost always an outdated app version. Update CapCut to the latest release. If the button still doesn't appear, check device storage. In most cases the feature needs at least 500 MB of free space to run.

Mispronunciation: CapCut reads text literally and struggles with proper nouns and brand names. "Porsche" comes out wrong every time. Try spelling it phonetically in the text layer: "Por-shuh." Keep a notes file of phonetic fixes for names you use regularly. Rebuilding that list every project takes longer than writing it down once.
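That notes file can double as a lookup table. Here's a minimal Python sketch that applies a dictionary of phonetic spellings to a script before you paste it into CapCut. The `PHONETIC_FIXES` dictionary and the function name are invented for this example, and the only entry is the one mentioned above, so add your own as you find them.

```python
import re

# Hypothetical notes file kept as a dictionary: written form -> phonetic spelling.
PHONETIC_FIXES = {
    "Porsche": "Por-shuh",
}


def apply_phonetic_fixes(script: str, fixes: dict[str, str] = PHONETIC_FIXES) -> str:
    """Replace whole-word, case-sensitive matches with their phonetic spelling."""
    if not fixes:
        return script
    # Longest keys first so a longer name wins over a shorter one it contains.
    keys = sorted(fixes, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(key) for key in keys) + r")\b")
    return pattern.sub(lambda match: fixes[match.group(1)], script)


if __name__ == "__main__":
    print(apply_phonetic_fixes("My Porsche is fast."))  # My Por-shuh is fast.
```

Keep the original script as your source of truth and treat the substituted text as TTS input only, so your on-screen captions still show the correct spelling.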

Robotic or choppy audio: Usually an internet connection issue. TTS generation runs on CapCut's servers, not locally on your device. A weak connection produces artifacts. Switch to a stronger connection and regenerate.

Long scripts producing flat output: Break paragraphs into chunks of under 100 words per text layer. Shorter inputs produce more natural-sounding pacing than one long block of text dumped into a single layer.

Audio out of sync after a text edit: The text and TTS audio tracks are decoupled. Any text change requires deleting the audio track and regenerating. Finalize your script first.

CapCut Text to Speech FAQ

Is CapCut text to speech free?

Yes. The core TTS voices are free across mobile, desktop, and the web tool. Some premium and regional voices require a paid subscription, but the main English options and most language voices work on the free plan without any hidden requirements.

Does CapCut text to speech work offline?

No. CapCut TTS generates audio on remote servers and requires an active internet connection. There's no offline mode for this feature, so editing on a plane or in a low-signal area means TTS won't generate until you're back online.

Can I use my own voice with CapCut text to speech?

Not through the standard TTS tool. Voice cloning is a separate Pro feature that uses a short recording of your voice as a base. The built-in TTS tool uses CapCut's preset AI voices only. To use your real voice, you need the manual voiceover recording option instead.

What is the character limit for CapCut text to speech?

Each text layer accepts around 500 characters. For longer scripts, split your content across multiple text layers and connect the audio clips on the timeline. This also tends to produce better pacing than one large block of text.

Why does CapCut text to speech cut off my audio?

Usually a character limit issue. Check that your text layer is under 500 characters. If the audio cuts off mid-word on the timeline, the text layer clip may be too short. Drag its right edge to extend it, then regenerate the TTS audio.

Is CapCut text to speech good enough for YouTube?

For YouTube Shorts, yes. For longer videos where narration carries the full runtime, the voice quality starts to feel flat around the 3 to 4 minute mark. Tools like ElevenLabs handle long-form narration better. Use CapCut TTS for quick social content and switch tools when the format demands higher quality audio.

Can I use CapCut text to speech audio in commercial videos?

According to CapCut's official documentation, audio generated through the TTS tool can be used in commercial projects including ads, YouTube videos, and brand content. Check CapCut's current terms of service to confirm the specific usage rights for your account tier.