How to Add Captions in CapCut: A Complete Guide
Captions can make or break a video. They boost accessibility, improve watch time on muted feeds, and give your content a polished, professional feel. CapCut — one of the most popular free video editors available on mobile and desktop — offers several ways to add captions, from manual text overlays to AI-powered auto-captions. Here's how each method works and what shapes the experience for different users.
What "Captions" Actually Means in CapCut
In CapCut, the word "captions" covers two distinct things:
- Manual text overlays — text you type and position yourself, synced manually to your video timeline
- Auto captions — speech-to-text captions generated automatically by CapCut's AI based on the audio in your clip
Both produce on-screen text, but they work differently, look different by default, and suit different use cases. Decide which one fits your video before you start.
Method 1: Using CapCut's Auto Captions Feature 🎙️
Auto captions are the fastest route if your video has clear spoken audio. CapCut's built-in speech recognition transcribes your dialogue and places synced caption blocks directly on the timeline.
How it works on mobile (iOS and Android):
- Open your project in CapCut and tap Text in the bottom toolbar
- Select Auto captions
- Choose the language of your audio from the dropdown menu
- Tap Start — CapCut processes the audio and generates caption segments automatically
- Review the generated text in the timeline; tap any segment to edit individual words or correct errors
How it works on desktop (CapCut PC/Mac):
- Open your project and click the Text panel in the left sidebar
- Select Auto captions
- Choose your spoken language and click Generate
- Edit individual caption blocks in the timeline as needed
Once generated, you can select all caption segments at once and apply uniform styling — font, size, color, background, and animation — through the text settings panel.
What affects accuracy: Auto caption quality depends heavily on audio clarity, the speaker's accent, background noise levels, and the language selected. CapCut's recognition performs well with standard accents and clean audio; noisier recordings or strong regional accents may require more manual correction.
Method 2: Adding Captions Manually with Text Overlays
Manual text gives you full control over placement, timing, and styling — at the cost of more time investment.
On mobile:
- Tap Text in the editing toolbar
- Select Add text
- Type your caption text, then tap Done
- In the timeline, drag the text clip to align it with the correct moment in your video
- Drag the edges of the text clip to set how long it stays on screen
- Repeat for each caption segment
On desktop:
- Click Text in the left panel
- Click Add text and type your caption
- Drag and position the text box on the canvas
- Adjust start time and duration in the timeline below
Manual captions are time-intensive for longer videos but give you precise control over every element — particularly useful when syncing text to music or sound effects rather than speech.
Styling and Customizing Your Captions ✏️
Whether you used auto captions or added text manually, CapCut's styling tools are the same. When a caption is selected, you can adjust:
| Setting | Options Available |
|---|---|
| Font | CapCut's built-in font library (hundreds of options) |
| Size | Free scaling via slider or numeric input |
| Color | Solid colors, gradients |
| Background | Colored box, bubble, or none |
| Stroke | Outline thickness and color |
| Shadow | Drop shadow with adjustable offset |
| Animation | Entrance, exit, and loop animations |
| Alignment | Left, center, right |
On mobile, batch styling is available when using auto captions — select all segments and apply changes globally rather than one by one. On desktop, multi-select allows similar batch edits.
CapCut also includes preset caption styles under the Text tab, which apply a complete visual treatment in one tap — useful if you want a consistent look without manually tuning every setting.
Exporting Video With Captions
Captions in CapCut are burned into the video on export by default, meaning they become part of the video image rather than a separate subtitle track. This differs from the .srt or .vtt subtitle files that some platforms accept.
If you need captions as a separate file for upload to YouTube, TikTok, or other platforms with native subtitle support, don't expect the standard video export to produce one: it outputs a flat video with the captions baked in visually. Whether a separate caption-file export is offered can depend on the CapCut version you're running, so check the export dialog before planning around it.
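For readers who haven't seen the .srt format mentioned above, here is a minimal Python sketch of what a separate subtitle file contains: numbered cues, each with a start and end timestamp and a line of text. This is not a CapCut feature or API. The `Caption` class, the function names, and the sample timings and text are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Caption:
    """One caption cue (hypothetical structure, not a CapCut type)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used in .srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(captions: list[Caption]) -> str:
    """Render captions as numbered .srt cues separated by blank lines."""
    cues = []
    for index, cap in enumerate(captions, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(cap.start)} --> {srt_timestamp(cap.end)}\n"
            f"{cap.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up captions, e.g. timings noted by hand from an editing timeline.
    example = [
        Caption(0.0, 2.5, "Welcome back to the channel."),
        Caption(2.5, 5.25, "Today we're adding captions."),
    ]
    print(to_srt(example))
```

Because the text lives in its own file rather than in the video image, a platform that accepts subtitle uploads can show or hide it on demand, which burned-in captions cannot do.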
Variables That Shape Your Experience
Several factors influence how smoothly caption-adding goes in CapCut:
Platform version: Features differ between the mobile app (iOS/Android), the desktop app, and the browser-based version. Auto captions, for example, may have slightly different language support or UI placement depending on which version you're running. CapCut updates frequently, so interface details shift between releases.
Audio quality: Auto captions rely entirely on what the AI can hear. Even small improvements to recording conditions, such as a quieter room or a closer microphone, can produce noticeably better transcription results.
Video length: Long-form content with auto captions generates many segments, which can be slow to review and edit. Shorter content is considerably more manageable through the same workflow.
Language and accent: CapCut supports a growing list of languages for auto captions, but accuracy is uneven across them. English (particularly US/UK accents) tends to get the most reliable results.
Skill level with the timeline: Manual captioning requires comfort with CapCut's timeline editor — dragging clips, setting in/out points, and layering text over video. Users new to timeline-based editors often find auto captions a more accessible entry point.
The right captioning approach ultimately comes down to your video's audio characteristics, how much editing time you have, the level of visual customization you want, and which version of CapCut you're working in. Clear speech and a tight schedule favor auto captions; text timed to music or sound effects, or a need for exact placement, favors manual overlays.