If you’ve ever delivered subtitles for OTT platforms or YouTube, you’ve likely faced two painful problems:
- The platform asks for a subtitle format you don’t have (TTML, WebVTT, etc.).
- The subtitles “work” but fail in real viewing because of safe area, timing, or readability issues.
Quick Answer
- Use SRT when you need a simple, widely compatible subtitle file with basic timing and text.
- Use VTT (WebVTT) when you need subtitles for web players and want more styling/positioning support than SRT.
- Use TTML when you are delivering to OTT/broadcast pipelines that require structured, styled, XML-based subtitles (often with strict spec compliance).
Most teams succeed with this approach: author in SRT (fast) → convert to VTT/TTML (as required) → QC again after conversion (mandatory).
1) What Subtitle Formats Are
Subtitles are basically timed text + rules.
Each format is a container that defines:
- How timecodes are written,
- How text is stored,
- How styling/positioning can be expressed.
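To make the differences concrete, here is a minimal sketch that writes the same cue in all three formats. The helper names (`clock`, `as_srt`, `as_vtt`, `as_ttml_p`) are illustrative and not part of any library. The TTML output is only a single `<p>` fragment; a real TTML file also needs the surrounding `tt`/`body`/`div` elements and whatever profile the platform requires.

```python
from xml.sax.saxutils import escape


def clock(ms: int, sep: str) -> str:
    """Format milliseconds as HH:MM:SS<sep>mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"


def as_srt(index: int, start_ms: int, end_ms: int, text: str) -> str:
    # SRT: numeric index, comma before the milliseconds.
    return f"{index}\n{clock(start_ms, ',')} --> {clock(end_ms, ',')}\n{text}\n"


def as_vtt(start_ms: int, end_ms: int, text: str) -> str:
    # WebVTT: period before the milliseconds; the file itself
    # must start with a "WEBVTT" header line (not added here).
    return f"{clock(start_ms, '.')} --> {clock(end_ms, '.')}\n{text}\n"


def as_ttml_p(start_ms: int, end_ms: int, text: str) -> str:
    # TTML: timing lives in XML attributes, so the text must be XML-escaped.
    return f'<p begin="{clock(start_ms, ".")}" end="{clock(end_ms, ".")}">{escape(text)}</p>'
```

Calling `as_srt(1, 1500, 4000, "Hello, world.")` and `as_vtt(1500, 4000, "Hello, world.")` shows that the only timecode difference between SRT and WebVTT is the comma versus the period, while TTML moves the timing into attributes.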
Your “subtitle quality” depends on more than translation:
- Timing,
- Segmentation (line breaking),
- Reading speed,
- Safe area placement,
- Consistency rules,
- Platform expectations.
2) SRT vs VTT vs TTML: Comparison Table
| Feature | SRT | WebVTT (VTT) | TTML |
| --- | --- | --- | --- |
| Structure | Plain text | Plain text with cues | XML-based (structured) |
| Styling support | Minimal | Moderate (supports more styling/positioning than SRT) | Strong (built for styling, regions, layout) |
| Best for | Simple workflows, broad compatibility | Web players, online video workflows | OTT/broadcast delivery pipelines, strict specs |
| Complexity | Low | Medium | High |
| Common problems | Limited styling; inconsistent handling across players | Some features not consistently supported across players | Easy to “break” with invalid XML/spec mismatches |
| Conversion friendliness | Great source format | Great for web delivery | Requires careful conversion + strict QC |
If your client is an OTT platform, assume you will eventually need TTML (or a TTML-compatible variant), even if you start with SRT.
3) When to Use Each Format
When SRT is best
Use SRT when:
- Speed matters,
- You need maximum compatibility,
- You’re building a working master subtitle file,
- Styling requirements are minimal.
Best for: initial translation workflows, internal reviews, quick pilots, creator content.
When VTT is best
Use WebVTT when:
- Delivery is primarily web-player based,
- You need better handling of cues, positioning, or certain styling features,
- You are working in web-first pipelines.
Best for: web video platforms, certain creator pipelines, some internal players.
When TTML is best
Use TTML when:
- The delivery requirement is OTT/broadcast-grade,
- The platform enforces formatting and layout rules,
- You need structured styling/regions to comply with spec expectations.
Best for: enterprise OTT pipelines, distribution partners, spec-driven deliveries.
Important note: Different platforms use different TTML profiles/requirements. Treat TTML as “spec-sensitive.” Always validate against the platform’s delivery sheet.
4) Subtitle Safe Area
Safe area is one of the most common failure points—especially for vertical video and mobile viewers.
What “safe area” means
Safe area is the region where text remains readable and not blocked by:
- UI overlays (play/pause bars, progress bars, captions toggles),
- Platform controls,
- Mobile gestures and cutouts,
- Other on-screen elements.
Why safe area is not “one universal box”
A common misconception is “just keep subtitles 10% above the bottom.”
In reality, safe area is:
- Platform dependent (different players overlay UI differently),
- Format dependent (16:9 vs vertical),
- Context dependent (controls appear/disappear).
Practical safe area rules that work in real deliveries
- Avoid placing subtitles too low; keep enough bottom margin for player UI.
- For vertical video, assume the bottom UI overlays are heavier.
- Avoid placing subtitles over important on-screen text (names, labels, chat overlays).
- Test subtitles on mobile (not only desktop preview).
- If the platform supports positioning, use it carefully and re-QC.
If you are delivering to multiple platforms, safe area should be treated like a spec item, not an afterthought.
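Where the target player honors WebVTT cue settings, vertical placement can be expressed with the `line` setting. The sketch below is an assumption-laden illustration: `vtt_cue` is a hypothetical helper, and the 80% value in the usage note is a placeholder, not a recommended safe area. Player support for cue settings varies, so any positioned cue still needs a viewing check on the actual platform.

```python
from typing import Optional


def vtt_cue(start: str, end: str, text: str,
            line_percent: Optional[int] = None) -> str:
    """Build one WebVTT cue, optionally with a vertical 'line' setting.

    line_percent is the cue's vertical position as a percentage of the
    video height (0 = top, 100 = bottom). None leaves placement to the player.
    """
    if line_percent is not None and not 0 <= line_percent <= 100:
        raise ValueError("line_percent must be between 0 and 100")
    settings = f" line:{line_percent}%" if line_percent is not None else ""
    return f"{start} --> {end}{settings}\n{text}\n"
```

For example, `vtt_cue("00:00:01.000", "00:00:03.000", "Hi", 80)` produces a cue whose timing line ends in `line:80%`, which asks the player to lift the cue above the default bottom position.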
5) Timing and Spotting Rules
Subtitles fail most often because of timing—not translation.
A) Timecode alignment
- Subtitles should appear when speech starts (not late).
- Subtitles should disappear when speech ends (not abruptly early).
- Avoid extremely short “flash” cues that can’t be read.
B) Reading speed (CPS) discipline
Reading speed is typically measured as characters per second (CPS).
If CPS is too high, viewers miss text and engagement drops.
Practical rule: keep subtitles readable for the average viewer, especially on mobile. If your content has fast dialogue, you must:
- Compress lines (without losing meaning),
- Improve segmentation,
- Avoid over-long cues.
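CPS is simple arithmetic: characters in the cue divided by the cue's duration in seconds. The sketch below assumes spaces count as characters and line breaks do not; style guides differ on this. The `MAX_CPS` value is a placeholder for illustration only, so substitute the limit from your platform's delivery sheet.

```python
MAX_CPS = 17.0  # placeholder limit; take the real value from the platform spec


def cps(text: str, start_ms: int, end_ms: int) -> float:
    """Characters per second for one cue (line breaks are not counted)."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("cue must have a positive duration")
    return len(text.replace("\n", "")) / duration_s


def too_fast(text: str, start_ms: int, end_ms: int,
             max_cps: float = MAX_CPS) -> bool:
    """True when the cue exceeds the configured reading speed."""
    return cps(text, start_ms, end_ms) > max_cps
```

A 13-character cue such as "Hello, world." shown for one second has a CPS of 13.0; the same text shown for half a second has a CPS of 26.0 and would be flagged under the placeholder limit.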
C) Segmentation (line breaking) matters more than people think
Bad segmentation makes even accurate translation hard to read.
Good segmentation:
- Breaks on natural phrase boundaries,
- Keeps names and verbs together,
- Avoids splitting articles/prepositions awkwardly,
- Avoids breaking numbers and units.
D) Overlap and continuity
- Avoid overlaps that cause flicker or hidden cues.
- Maintain consistency in timing rhythm across episodes.
6) Formatting Rules
Line length and layout
A common best practice is:
- Keep lines short enough to read comfortably,
- Avoid cramming too much into one cue,
- Use 1–2 lines where possible,
- Avoid three-line subtitles unless the platform explicitly allows it.
Speaker identification
If two speakers share a cue:
- Use hyphens or speaker labels (based on your style guide),
- Ensure clarity without overloading text.
SDH and closed captions
SDH includes non-dialogue cues like:
- [door slams]
- [music]
- Speaker IDs when needed
This is required for accessibility in many workflows, but it must follow the platform’s style rules.
Consistency rules
- Names spelled consistently across episodes
- Terminology consistent (use a glossary)
- Punctuation style consistent
- Numerals style consistent (e.g., “10” vs “ten” based on guide)
7) Conversion Workflows
Conversion is not a mechanical step. It’s a quality risk.
SRT → VTT conversion
Usually straightforward, but you must QC:
- Timing precision differences,
- Cue formatting changes,
- Any styling cues you rely on.
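The mechanical part of this conversion can be sketched as follows. This is a minimal illustration under several assumptions: the input is well-formed SRT, numeric cue indexes are dropped rather than kept as WebVTT cue identifiers, and inline tags or special characters in the text are passed through untouched. `srt_to_vtt` is a hypothetical helper, not a library function, and its output still needs the QC checks listed above.

```python
import re

# Matches an SRT timestamp so the comma can be swapped for a period.
TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT text (timing syntax only)."""
    # Drop a UTF-8 byte order mark and normalize Windows line endings.
    srt_text = srt_text.lstrip("\ufeff").replace("\r\n", "\n")
    out = ["WEBVTT", ""]
    for block in re.split(r"\n{2,}", srt_text.strip()):
        lines = block.split("\n")
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]  # drop the SRT cue number
        if not lines or "-->" not in lines[0]:
            continue  # skip blocks without a timing line
        lines[0] = TIMESTAMP.sub(r"\1.\2", lines[0])
        out.extend(lines + [""])
    return "\n".join(out)
```

The timing line is the only line rewritten; cue text is copied as-is, which is exactly why styling cues and line breaks need a separate look after conversion.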
SRT → TTML conversion
TTML conversion is where teams get burned.
You must QC:
- XML validity
- Timecode format compatibility
- Whether special characters and punctuation are preserved correctly
- Styling/region compliance (if used)
- Line break behavior (often changes during conversion)
Best practice: always perform a post-conversion QC pass before delivery.
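Two of these risks, XML validity and special characters, are easier to manage if the TTML is built with an XML library rather than by string concatenation. The sketch below uses Python's standard `xml.etree.ElementTree` and produces a bare `tt`/`body`/`div`/`p` skeleton. It is an assumption-heavy illustration: `cues_to_ttml` is a hypothetical helper, and the output carries no styling, regions, or profile metadata, so it will not satisfy a platform's TTML profile on its own.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def cues_to_ttml(cues, lang: str = "en") -> str:
    """Build a minimal TTML document from (begin, end, text) tuples.

    begin/end are clock-time strings such as "00:00:01.500".
    """
    # Serialize TTML elements without a prefix (default namespace).
    ET.register_namespace("", TTML_NS)
    tt = ET.Element(f"{{{TTML_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for begin, end, text in cues:
        p = ET.SubElement(div, f"{{{TTML_NS}}}p", {"begin": begin, "end": end})
        lines = text.split("\n")
        p.text = lines[0]
        # Line breaks become explicit <br/> elements instead of raw newlines.
        for line in lines[1:]:
            br = ET.SubElement(p, f"{{{TTML_NS}}}br")
            br.tail = line
    # ElementTree escapes &, < and > in text when serializing.
    return ET.tostring(tt, encoding="unicode")
```

Parsing the result back with `ET.fromstring` is a cheap well-formedness check, but it is not a substitute for validating against the platform's delivery spec.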
8) Subtitle QC Checklist
Use this checklist before sending files to an OTT platform or client.
A) Technical QC
- Correct format requested (SRT / VTT / TTML)
- Correct encoding (no garbled characters)
- Timecodes valid, consistent, no negative durations
- No overlaps that break playback
- No cues too short to read (“flash” cues)
- Correct frame rate/timebase assumptions (where relevant)
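Several of the timing checks in this list are easy to automate. The sketch below flags non-positive durations, flash cues, and overlaps with the previous cue. The `Cue` type, the function name, and the 700 ms minimum are assumptions made for illustration; use the minimum duration from your own style guide or platform spec. Encoding and frame rate checks are not covered.

```python
from typing import NamedTuple


class Cue(NamedTuple):
    start_ms: int
    end_ms: int
    text: str


MIN_DURATION_MS = 700  # placeholder threshold for "flash" cues


def technical_issues(cues: list[Cue],
                     min_duration_ms: int = MIN_DURATION_MS) -> list[str]:
    """Return human-readable timing problems, in cue order."""
    issues = []
    for number, cue in enumerate(cues, start=1):
        duration = cue.end_ms - cue.start_ms
        if duration <= 0:
            issues.append(f"cue {number}: non-positive duration")
        elif duration < min_duration_ms:
            issues.append(f"cue {number}: flash cue ({duration} ms)")
        # Compare against the cue immediately before this one.
        if number > 1 and cue.start_ms < cues[number - 2].end_ms:
            issues.append(f"cue {number}: overlaps previous cue")
    return issues
```

An empty result means only that these three checks passed; it says nothing about readability, language, or safe area.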
B) Readability QC
- Reading speed reasonable
- Line breaks are natural
- No overcrowded cues
- Punctuation supports readability
- Italics used consistently (if required)
C) Language QC
- Meaning accurate (not literal-only)
- Names/terms consistent (glossary applied)
- Tone and register consistent
- Spelling and grammar clean (proofread)
D) Safe area and viewing QC
- Check on mobile playback
- Ensure subtitles are not covered by UI overlays
- Ensure subtitles do not block critical on-screen text
9) Common Mistakes That Cause Rejections
- Delivering the wrong format (or wrong TTML variant)
- Skipping post-conversion QC (conversion can break layout and timing)
- Subtitles placed too low (covered by platform UI)
- High reading speed (viewers can’t keep up)
- Bad segmentation (hard to read even if accurate)
- Inconsistent terminology and names (breaks continuity in series)
- No style guide (different episodes feel “different”)
- No version control (v1/v2 confusion and wrong files go live)
If you’re delivering subtitles at scale, the fastest way to reduce rejections is a standard format + QC workflow.
If you share the following, we can propose a delivery-ready subtitle workflow covering translation, proofreading, and QC:
- a sample episode
- target platform requirements (format + any style constraints)
- target languages
Contact Sukudo Studios today!
FAQ: SRT vs VTT vs TTML
Which subtitle format is best: SRT, VTT, or TTML?
There is no universal best. SRT is the simplest and most widely supported. VTT suits web workflows that need more cue features. TTML is best for spec-driven OTT/broadcast deliveries.
Why do OTT platforms often require TTML?
Because TTML supports structured styling, layout rules, and spec compliance in enterprise pipelines. It is more predictable for platform ingestion when done correctly.
Can I convert SRT to TTML?
You can convert it, but you must QC after conversion. TTML is spec-sensitive, and conversion can break timing, line breaks, or formatting.
Why do subtitles get hidden during playback?
Safe area issues: subtitles are positioned too close to the bottom, and UI overlays cover them. This is common on mobile.
What is spotting?
Spotting is the timing and segmentation process: deciding when subtitles appear and disappear and how text is split into readable lines.
What does CPS mean?
CPS means characters per second. If CPS is too high, viewers can’t read subtitles comfortably, especially on mobile.
Which subtitle formats does YouTube support?
YouTube supports common formats including SRT and VTT. What matters most is timing accuracy and readability.
What is SDH?
SDH (Subtitles for the Deaf and Hard-of-hearing) includes sound cues and sometimes speaker IDs. It’s required for accessibility in many distribution workflows.
What is the most common mistake in subtitle delivery?
Skipping post-conversion QC. Conversion can change line breaks and timing and introduce errors that weren’t present in the source file.
What do you need to get started?
A sample video/episode, transcript (if available), target languages, platform delivery specs (format + style requirements), and any glossary/style rules.

