README.md: 9 additions & 9 deletions
@@ -1,5 +1,5 @@
# subtotxt
-
Quickly convert a [SubRip](https://en.wikipedia.org/wiki/SubRip) .srt or [WEBVTT](https://en.wikipedia.org/wiki/WebVTT) .vtt subtitle file to plain text. Removes timestamps and .srt subtitle line numbers.
+
Quickly convert a [SubRip](https://en.wikipedia.org/wiki/SubRip) .srt or [WEBVTT](https://en.wikipedia.org/wiki/WebVTT) .vtt subtitle file to plain text. Removes timestamps and .srt/.vtt subtitle line numbers.
This was a quick project thrown together for my girlfriend. She's still learning English and wanted to be able to read subtitles more like a transcript, to work through some trickier language issues (and to understand the jokes in Friends by discussing them with me).
With a spot of feature creep, it evolved to detect character encoding and to understand both .srt and .vtt formats, which saves some pre-processing work.
The script checks which format the subtitle file is in (in case of an incorrect file extension), detects the character encoding used, then writes out a .txt file with the same name as your input. If the output file already exists, it asks for permission to delete it and create a new one.
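The format check could be approximated by looking at the file's content rather than its extension. The following is a minimal sketch, not the script's actual code: `sniff_subtitle_format` is a hypothetical helper, and the rules it applies (a `WEBVTT` signature on the first line, or a numeric index followed by a `-->` timing line for SubRip) are assumptions based on the two formats.

```python
import re

# SubRip timestamps use a comma before the milliseconds: 00:00:01,000 --> 00:00:04,000
SRT_TIMING = re.compile(r"^\d{2,}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2,}:\d{2}:\d{2},\d{3}")


def sniff_subtitle_format(text: str) -> str:
    """Return 'vtt', 'srt' or 'unknown' based on content, ignoring the file extension."""
    # Drop a leading byte order mark and blank lines before inspecting the content.
    lines = [line.strip() for line in text.lstrip("\ufeff").splitlines() if line.strip()]
    if not lines:
        return "unknown"
    # A WebVTT file starts with the signature "WEBVTT", optionally followed by header text.
    if lines[0] == "WEBVTT" or lines[0].startswith(("WEBVTT ", "WEBVTT\t")):
        return "vtt"
    # A SubRip file starts with a numeric index followed by a timing line.
    if len(lines) >= 2 and lines[0].isdigit() and SRT_TIMING.match(lines[1]):
        return "srt"
    return "unknown"
```

Checking content this way means a `.srt` file that actually contains WebVTT data would still be classified as `vtt`.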
## Advanced Usage:
-
The script has six more arguments you can parse:
+
The script has more advanced arguments you can pass:
-*--utf8* or *-8*
Forces the output file to use [UTF-8](https://en.wikipedia.org/wiki/UTF-8) encoding. This may eliminate character encoding issues if you cannot view the output file. In practice, if you can read the contents of the input subtitle file successfully the output should work without the need to change the encoding.
-*--pause* or *-p*
@@ -20,26 +20,27 @@ Prints the output to the console while writing to the file, may help with debugg
-*--copy* or *-c*
Copies the input to the output without change and appends *-copy* to the filename (*e.g. subtitle-copy.srt*). Handy to use with *--utf8* to quickly change encoding, which might be useful if your video player app cannot understand your original subtitle file's encoding.
-*--overwrite* or *-o*
-
Skips asking ```Output file already exists, delete and make a new one? [y/n]``` and simply deletes the existing output file to create a new one. Ideal for batch processing.
+
Skips asking `Output file already exists, delete and make a new one? [y/n]` and simply deletes the existing output file to create a new one. Ideal for batch processing.
-*--oneliners* or *-1*
Writes all sentences in one line, even if the original file divides some sentences into many lines or subtitles.
-*--help* or *-h*
Shows above information.
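For reference, the flags described in this section could be declared with Python's standard `argparse` module roughly as follows. This is an illustrative sketch only: the `build_parser` name, the `file` positional argument and the help strings are assumptions, not taken from the script, and flags whose descriptions are not shown in this excerpt are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags; argparse adds --help / -h automatically."""
    parser = argparse.ArgumentParser(prog="subtotxt")
    # Hypothetical positional argument for the input subtitle file.
    parser.add_argument("file", help="path to the .srt or .vtt file")
    parser.add_argument("-8", "--utf8", action="store_true",
                        help="force UTF-8 encoding for the output file")
    parser.add_argument("-c", "--copy", action="store_true",
                        help="copy input to output unchanged, appending -copy to the name")
    parser.add_argument("-o", "--overwrite", action="store_true",
                        help="replace an existing output file without asking")
    parser.add_argument("-1", "--oneliners", action="store_true",
                        help="write each sentence on a single line")
    return parser


# Example: combine --copy with -8 to re-encode a subtitle file as UTF-8.
args = build_parser().parse_args(["--copy", "-8", "subtitle.srt"])
```

Because `-8` and `-1` are registered as options, `argparse` treats them as flags rather than as negative numbers.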
## Required External Modules:
-[Send2Trash](https://pypi.org/project/Send2Trash/) Python module to safely delete the old output file on both Win and \*nix based systems.
-
-~~[cchardet](https://pypi.org/project/cchardet/) Python module to detect your subtitle file encoding~~ (Removed for v2.0 release due to issues with Python 3.10.x installs, still used in v1.0 and will work on Python 3.9.x installs).
-
-[charset_normalizer](https://github.com/Ousret/charset_normalizer) Python module to detect your subtitle file encoding (v2.0+ supports Python 3.9.x and 3.10.x).
+
-~~[cchardet](https://pypi.org/project/cchardet/) Python module to detect your subtitle file encoding~~ (Removed for v2.0+ release due to issues with Python 3.10.x installs, still used in v1.0 and will work on Python 3.9.x installs).
+
-[charset_normalizer](https://github.com/Ousret/charset_normalizer) Python module to detect your subtitle file encoding (v2.0 and YYYY-MM-DD versions, supports Python 3.9.x and above).
-
If your system does not these installed, it will auto install them on first use.
+
If your system does not have these installed, it will auto-install them on first use (or if you install a new version of Python later). If you prefer, you can install them manually or by using the `requirements.txt` file.
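The auto-install behaviour could look something like the sketch below. It is an assumption about the approach rather than the script's real code: `REQUIRED`, `missing_modules` and `install_missing` are hypothetical names, and the install step simply calls `pip` through the running interpreter.

```python
import importlib.util
import subprocess
import sys

# Import name -> PyPI package name for the two required modules.
REQUIRED = {
    "send2trash": "Send2Trash",
    "charset_normalizer": "charset-normalizer",
}


def missing_modules(required: dict[str, str]) -> list[str]:
    """Return the PyPI names of modules the current interpreter cannot find."""
    return [pkg for name, pkg in required.items()
            if importlib.util.find_spec(name) is None]


def install_missing(required: dict[str, str]) -> None:
    """Install anything missing with pip, using the interpreter running the script."""
    packages = missing_modules(required)
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```

Calling `install_missing(REQUIRED)` before importing the modules would cover a fresh Python install. The manual route would be `pip install -r requirements.txt`.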
## Features:
- Fast (aside from initial missing modules install on slow net connections)
-
- Input files character encoding formats are autodetected (if supported by [cchardet](https://pypi.org/project/cchardet/)[v1.0] or [charset_normalizer](https://github.com/Ousret/charset_normalizer)[v2.0+])
+
- Input file character encodings are autodetected (if supported by [cchardet](https://pypi.org/project/cchardet/)[v1.0] or [charset_normalizer](https://github.com/Ousret/charset_normalizer)[v2.0+]). For most languages this should be fine. Chinese and neighbouring languages can be tricky: a subtitle may contain valid characters for Mandarin or Cantonese (or other dialects) and still be detected as the wrong encoding. This can result in some wonky detection, but it should not affect the overall output.
- Output files are written in the same encoding as the input, or can be forced to UTF-8
- Should be cross-platform friendly thanks to pathlib and Send2Trash
- Handles UNC style ```\\myserver\myshare\mysub.srt``` paths thanks to pathlib
- Handles SRT to TXT or WEBVTT to TXT
- Handles multi line subtitles and subtitle lines with just numbers (does not confuse them with SRT line numbers)
-
- WEBVTT: Removes 'WEBVTT', 'Kind: xxxx', 'Language: xxx' headers and Timestamps from output
+
- Strips formatting tags, and rogue `{\an8}` tags you sometimes find in poorly converted subtitles
+
- WEBVTT: Removes 'WEBVTT', headers, metadata, notes, styles and timestamps from output
- SRT: Removes subtitle line numbers and timestamps. It will not work if the first subtitle is not numbered 1 or if duplicated line numbers are present (rare cases but possible); if this happens, use [SubtitleEdit](https://github.com/SubtitleEdit/subtitleedit) to renumber the lines for now.
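To illustrate the SRT handling and tag stripping described in this list, here is a minimal sketch. It is not the script's implementation: `srt_to_text`, the regular expressions, and the rule that a number only counts as a line number when a timestamp follows it directly are all assumptions made for the example.

```python
import re

TIMING = re.compile(r"^\d{2,}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2,}:\d{2}:\d{2},\d{3}")
# Matches HTML-style tags such as <i> or </b> and override tags such as {\an8}.
TAGS = re.compile(r"<[^>]+>|\{\\[^}]*\}")


def srt_to_text(srt: str) -> list[str]:
    """Return the subtitle text lines with indexes, timestamps and tags removed."""
    lines = srt.splitlines()
    out = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or TIMING.match(stripped):
            continue
        # A digits-only line is an index only if a timestamp follows it directly;
        # otherwise it is subtitle text that happens to be a number.
        nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if stripped.isdigit() and TIMING.match(nxt):
            continue
        cleaned = TAGS.sub("", stripped).strip()
        if cleaned:
            out.append(cleaned)
    return out


sample = r"""1
00:00:01,000 --> 00:00:02,000
{\an8}<i>How many?</i>

2
00:00:02,500 --> 00:00:03,000
42
"""
print(srt_to_text(sample))  # ['How many?', '42']
```

The second subtitle consists only of the number 42, yet it is kept because no timestamp follows it.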
## Examples:
WEBVTT Input:
@@ -154,6 +155,5 @@ Output:
- Possibly handle more formats (.ssa Sub Station Alpha would be the other major one I could think of), for now you can use something like [SubtitleEdit](https://github.com/SubtitleEdit/subtitleedit) to convert most other formats to .srt or .vtt. If you have a format you would like to convert to txt, contact me or raise an issue to see if I can add support.
- GUI option for simple drag and drop usage.
- Figure out a checking method for misnumbered or duplicate numbered SRT line numbers.
-
- Handle stripping out SRT formatting tags for bold, italic etc...
## License:
Released as CC0, use it how you wish. If you do use it elsewhere, please be awesome and tag me as the original author. 🙂