
Commit d311fae

improve comments on TTML → SRT conversion
- Update the class header with proper technical references and remove the author tag.
- Update the comments on replacing NBSP ('\u00A0'), especially by adding examples of incorrect rendering.
1 parent 71aa6d5 commit d311fae


1 file changed (+30, -9 lines)


app/src/main/java/org/schabi/newpipe/streams/SrtFromTtmlWriter.java

Lines changed: 30 additions & 9 deletions
@@ -15,7 +15,11 @@
 import java.nio.charset.StandardCharsets;
 
 /**
- * @author kapodamy
+ * Converts TTML subtitles to SRT format.
+ *
+ * References:
+ * - TTML 2.0 (W3C): https://www.w3.org/TR/ttml2/
+ * - SRT format: https://en.wikipedia.org/wiki/SubRip
  */
 public class SrtFromTtmlWriter {
     private static final String NEW_LINE = "\r\n";
@@ -135,20 +139,37 @@ private String normalizeLineBreakForSrt(final String text) {
     private String normalizeForSrt(final String actualText) {
         String cleaned = actualText;
 
-        // Replace non-breaking space (\u00A0) with regular space ' '(\u0020).
+        // Replace NBSP "non-breaking space" (\u00A0) with regular space ' '(\u0020).
+        //
+        // Why:
+        // - Some viewers render NBSP(\u00A0) incorrectly:
+        //   * MPlayer 1.5: shown as “??”
+        //   * Linux command `cat -A`: displayed as control-like markers
+        //     (M-BM-)
+        //   * Acode (Android editor): displayed as visible replacement
+        //     glyphs (red dots)
+        // - Other viewers show it as a normal space (e.g., VS Code 1.104.0,
+        //   vlc 3.0.20, mpv 0.37.0, Totem 43.0)
+        //   → Mixed rendering creates inconsistency and may confuse users.
+        //
+        // Details:
         // - YouTube TTML subtitles use both regular spaces (\u0020)
         //   and non-breaking spaces (\u00A0).
         // - SRT subtitles only support regular spaces (\u0020),
         //   so \u00A0 may cause display issues.
         // - \u00A0 and \u0020 are visually identical (i.e., they both
         //   appear as spaces ' '), but they differ in Unicode encoding,
-        //   leading to test failures (e.g., ComparisonFailure).
-        // - Convert \u00A0 to \u0020 to ensure consistency in subtitle
-        //   formatting.
-        // - References:
-        //   - Unicode General Punctuation: https://unicode.org/charts/PDF/U2000.pdf
-        //   - TTML Spec: https://www.w3.org/TR/ttml2/
-        //   - SRT Format: https://en.wikipedia.org/wiki/SubRip
+        //   and NBSP (\u00A0) renders differently in different viewers.
+        // - SRT is a plain-text format and does not interpret
+        //   "non-breaking" behavior.
+        //
+        // Conclusion:
+        // - Ensure uniform behavior, so replace it to a regular space
+        //   without "non-breaking" behavior.
+        //
+        // References:
+        // - Unicode U+00A0 NBSP (Latin-1 Supplement):
+        //   https://unicode.org/charts/PDF/U0080.pdf
         cleaned = cleaned.replace('\u00A0', ' ') // Non-breaking space
                 .replace('\u202F', ' ') // Narrow no-break space
                 .replace('\u205F', ' ') // Medium mathematical space
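The effect of the replacement can be illustrated with a minimal, standalone Java sketch. It assumes only the three `replace` calls visible in the hunk (the rest of `normalizeForSrt` is not shown here), and the names `NbspNormalizeDemo`, `normalizeSpaces` and `hex` are hypothetical. It shows why a string containing U+00A0 fails an equality check against one with U+0020 even though both look the same, and why `cat -A` prints `M-BM- `: U+00A0 is encoded in UTF-8 as the two bytes C2 A0, which `cat -A` renders as `M-B` and `M- ` (a meta-prefixed 'B' and a meta-prefixed space).

```java
import java.nio.charset.StandardCharsets;

final class NbspNormalizeDemo {

    // Hypothetical helper: repeats only the three replace() calls visible in
    // the diff; it is not the full normalizeForSrt() from SrtFromTtmlWriter.
    static String normalizeSpaces(final String text) {
        return text.replace('\u00A0', ' ')  // NBSP (U+00A0) to regular space
                .replace('\u202F', ' ')     // narrow no-break space (U+202F)
                .replace('\u205F', ' ');    // medium mathematical space (U+205F)
    }

    // Formats bytes as space-separated uppercase hex pairs, e.g. "C2 A0".
    static String hex(final byte[] bytes) {
        final StringBuilder sb = new StringBuilder();
        for (final byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(final String[] args) {
        final String fromTtml = "Hello\u00A0world"; // cue text containing an NBSP
        final String expected = "Hello world";      // cue text with a regular space

        // Looks identical on screen, but the code points differ.
        System.out.println(fromTtml.equals(expected));                  // false
        // After normalization the strings compare equal.
        System.out.println(normalizeSpaces(fromTtml).equals(expected)); // true
        // UTF-8 bytes of U+00A0; cat -A shows 0xC2 as "M-B" and 0xA0 as "M- ".
        System.out.println(hex("\u00A0".getBytes(StandardCharsets.UTF_8))); // C2 A0
    }
}
```

Running `main` prints `false`, `true`, then `C2 A0`. Because `String.replace(char, char)` swaps one character for one character, the normalized text keeps its original length, so any character offsets computed earlier remain valid.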
