# subtotxt

Quickly convert a [SubRip](https://en.wikipedia.org/wiki/SubRip) .srt, [SubStation Alpha](https://wiki.multimedia.cx/index.php?title=SubStation_Alpha) .ssa/.ass or [WEBVTT](https://en.wikipedia.org/wiki/WebVTT) .vtt subtitle file to plain text. Removes timestamps and .srt/.vtt subtitle line numbers.

This was a quick project thrown together for my girlfriend. She's still learning English and wanted to be able to read subtitles more like a transcript, to work through some trickier language issues (and to understand the jokes in Friends by discussing them with me).

With a spot of feature creep and some encoding detection needs, it evolved to detect character encoding and to understand both .srt and .vtt formats, saving some pre-processing work.

…

The script checks which format the subtitle file is in (in case the file extension is wrong), detects the character encoding used, then writes out a .txt file with the same name as your input. If the output file already exists, it will ask for permission to delete it and create a new one.
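The format check can be done by looking at the file contents rather than trusting the extension. Below is a minimal sketch of that idea, assuming that a `WEBVTT` header marks WebVTT, a `[Script Info]` or `[Events]` section marks SubStation Alpha, and a `-->` timing line otherwise marks SRT. The helpers `detect_format` and `output_path` are hypothetical names for illustration, not the script's own functions.

```python
import re
from pathlib import Path

# Hypothetical helpers for illustration; not subtotxt's own code.
# Cue timing line, e.g. "00:01:02,500 --> 00:01:04,000". SRT uses "," and
# WebVTT uses "." before the milliseconds; WebVTT may also omit the hours.
TIMING = re.compile(
    r"^(?:\d{2}:)?\d{2}:\d{2}[,.]\d{3}\s*-->\s*(?:\d{2}:)?\d{2}:\d{2}[,.]\d{3}"
)


def detect_format(text: str) -> str:
    """Guess the subtitle format from the contents, ignoring the file extension."""
    text = text.lstrip("\ufeff")  # drop a byte order mark if one survived decoding
    if text.startswith("WEBVTT"):
        return "vtt"
    if "[Script Info]" in text or "[Events]" in text:
        return "ssa"  # SubStation Alpha (.ssa) and Advanced SSA (.ass)
    if any(TIMING.match(line.strip()) for line in text.splitlines()):
        return "srt"
    return "unknown"


def output_path(input_path: Path) -> Path:
    """The .txt output sits next to the input and shares its name."""
    return input_path.with_suffix(".txt")
```

Checking for the `WEBVTT` header before looking for timing lines matters here, because WebVTT cues would otherwise match the SRT pattern too.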
## Advanced Usage:
The script has more advanced arguments you can pass:
- **--dir** or **-d**: Multiple file mode. Use this **instead** of `-f` and point it at a folder containing your subtitles; it will run through and process them all. The files must have `.srt`, `.vtt`, `.ssa` or `.ass` extensions. The path can be a full path, e.g. `C:\mysubs`, or a relative path such as `.\`.
- **--noname** or **-nn**: For SubStation Alpha, this stops the character name given in the file (if present) from being prepended to the subtitle line. With names kept, a line might appear as `Blackadder: Your name is Bob?`. I highly recommend this setting if using `--oneliners` below. For other formats, we attempt to remove `NAME:` from the beginning of the subtitle line.
- **--nosort** or **-ns**: Specific to SubStation Alpha files. In these files the subtitles can be placed in any order, because each line's timing determines when it appears. I imagine the main reason for this is that you could keep the dialogue in one block and labels for signs, books, etc. in another. By default we sort the lines; use this flag to skip sorting. Most examples I've seen have everything in one large block anyway.
- **--utf8** or **-8**: Forces the output file to use [UTF-8](https://en.wikipedia.org/wiki/UTF-8) encoding. This may eliminate character encoding issues if you cannot view the output file. In practice, if you can read the contents of the input subtitle file successfully, the output should work without the need to change the encoding.
- **--pause** or **-p**: Pauses the script at the sanity check stage so you can check some stats before continuing; handy if the output is not working.
- **--screen** or **-s**: Prints the output to the console while writing to the file; may help with debugging failed outputs.
- **--copy** or **-c**: Copies the input to the output without changes, appending *-copy* to the filename (e.g. *subtitle-copy.srt*). Handy with *--utf8* to quickly change the encoding, which might be useful if your video player app cannot understand your original subtitle file's encoding.
- **--overwrite** or **-o**: Skips asking `Output file already exists, delete and make a new one? [y/n]` and simply deletes the existing output file to create a new one. Ideal for batch processing.
- **--oneliners** or **-1**: Writes each sentence on a single line, even if the original file splits some sentences across several lines or subtitles.
- **--help** or **-h**: Shows the above information.
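To make the `--noname` and `--nosort` options more concrete, here is a minimal sketch of pulling dialogue out of an SSA/ASS script. It assumes the usual `[Events]` section with a `Format:` line naming the columns, and it assumes "sorting" means ordering lines by their `Start` time. The function `ssa_dialogue` and its `keep_names`/`sort` parameters are hypothetical, not the script's actual code.

```python
import re

# Hypothetical sketch; not subtotxt's own implementation.
OVERRIDE_TAGS = re.compile(r"\{[^}]*\}")  # e.g. {\pos(320,50)} or {\i1}


def to_seconds(timestamp: str) -> float:
    """Convert an SSA/ASS time such as '0:01:02.50' (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def ssa_dialogue(text: str, keep_names: bool = True, sort: bool = True) -> list[str]:
    """Return cleaned dialogue lines from the [Events] section of an SSA/ASS script."""
    fields: list[str] = []
    events: list[tuple[float, str]] = []
    in_events = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):  # a section header such as [Events]
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Text is the last column and may itself contain commas.
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            row = dict(zip(fields, (v.strip() for v in values)))
            body = OVERRIDE_TAGS.sub("", row.get("Text", ""))
            body = body.replace("\\N", " ").replace("\\n", " ").strip()
            name = row.get("Name", "")
            if keep_names and name:
                body = f"{name}: {body}"  # what --noname would switch off
            events.append((to_seconds(row.get("Start", "0:00:00.00")), body))
    if sort:
        events.sort(key=lambda event: event[0])  # stable: ties keep file order
    return [body for _, body in events if body]
```

Splitting each `Dialogue:` line with a maximum split count equal to the number of columns minus one keeps commas inside the spoken text intact.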
## Required External Modules:

- [Send2Trash](https://pypi.org/project/Send2Trash/) Python module to safely delete the old output file on both Windows and \*nix based systems.
- ~~[cchardet](https://pypi.org/project/cchardet/) Python module to detect your subtitle file encoding~~ (removed in the v2.0+ release due to installation issues with Python 3.10.x; still used in v1.0, which works on Python 3.9.x).

…

If your system does not have these installed, the script will auto-install them on first use (or after you install a new version of Python). If you prefer, you can install them manually or with the `requirements.txt` file, e.g. `pip install -r requirements.txt`.
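The auto-install step can be pictured as "find which modules are missing, then install them with the current interpreter's pip". A minimal sketch, assuming the package list below (Send2Trash, plus charset-normalizer as the v2.0+ detector named under Features); `REQUIRED`, `missing_packages` and `install` are hypothetical names, not the script's own code.

```python
import importlib.util
import subprocess
import sys

# Hypothetical sketch; not subtotxt's own implementation.
# PyPI package name -> importable module name (they differ for Send2Trash).
REQUIRED = {"Send2Trash": "send2trash", "charset-normalizer": "charset_normalizer"}


def missing_packages(required: dict[str, str]) -> list[str]:
    """Return the PyPI names of packages whose modules cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


def install(packages: list[str]) -> None:
    """Install packages with the pip that belongs to the running interpreter."""
    if packages:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *packages])
```

Calling `install(missing_packages(REQUIRED))` once at start-up would cover the "new version of Python" case: running pip via `sys.executable -m pip` installs into whichever Python is running the script, not whichever `pip` happens to be first on the PATH.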
## Features:
- Fast (aside from the initial install of missing modules on slow connections)
- Process a single file, or point it at a folder to process all supported files.
- Input file character encodings are autodetected (if supported by [cchardet](https://pypi.org/project/cchardet/) in v1.0 or [charset_normalizer](https://github.com/Ousret/charset_normalizer) in v2.0+). For most languages this should be fine, but Chinese and neighbouring languages can be tricky: a subtitle may contain characters that are valid for Mandarin or Cantonese (or other dialects) while potentially being in the wrong encoding. This can result in some wonky detection, but it should not affect the overall output.
- Output files are written in the same encoding as the input, or can be forced to UTF-8
- Should be cross-platform friendly thanks to pathlib and Send2Trash
- Handles UNC style `\\myserver\myshare\mysub.srt` paths thanks to pathlib
- Handles SRT, WEBVTT and SSA/ASS to TXT
- Handles multi-line subtitles and subtitle lines containing just numbers (does not confuse them with SRT line numbers)
- Strips formatting tags, and rogue `{\an8}` tags you sometimes find in poorly converted subtitles
- WEBVTT: Removes 'WEBVTT', headers, metadata, notes, styles and timestamps from the output
- SRT: Removes subtitle line numbers and timestamps. This will not work if the first subtitle is not numbered 1 or if duplicated line numbers are present (rare but possible); for now, use [SubtitleEdit](https://github.com/SubtitleEdit/subtitleedit) to renumber the lines if this happens.
- SSA/ASS: Removes all non-dialogue lines, detects the script version, and removes positional `{xxx}` tags from the text.
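A minimal sketch of the SRT clean-up and tag stripping described above. Note that it uses a different technique from the one the list describes: a line of digits is only treated as a cue number when a timing line follows it, so it does not rely on numbering starting at 1, and number-only dialogue lines survive. `srt_to_lines` and its regular expressions are hypothetical, not taken from the script.

```python
import re

# Hypothetical sketch; not subtotxt's own implementation.
TIMING = re.compile(r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{1,2}:\d{2}:\d{2}[,.]\d{3}")
TAGS = re.compile(r"<[^>]+>|\{[^}]*\}")  # <i>, </b>, <font ...>, {\an8}


def srt_to_lines(text: str) -> list[str]:
    """Drop cue numbers, timings and formatting tags, keeping the spoken text."""
    lines = text.lstrip("\ufeff").splitlines()
    out: list[str] = []
    for i, raw in enumerate(lines):
        line = raw.strip()
        following = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if line.isdigit() and TIMING.match(following):
            continue  # a cue number: only counts if a timing line comes next
        if not line or TIMING.match(line):
            continue  # blank separators and timing lines
        cleaned = TAGS.sub("", line).strip()
        if cleaned:
            out.append(cleaned)
    return out
```

In the test input, the final `42` is kept because no timing line follows it, which is the "subtitle lines with just numbers" case from the list.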
## Examples:
WEBVTT Input:

```
…
Fue estupendo.
```
## Future plans:
- Possibly handle more formats. For now, you can use something like [SubtitleEdit](https://github.com/SubtitleEdit/subtitleedit) to convert most other formats to .srt or .vtt. If you have a format you would like to convert to .txt, contact me or raise an issue to see if I can add support.
- GUI option for simple drag and drop usage.
- Figure out a checking method for misnumbered or duplicate-numbered SRT line numbers.
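One possible checking method for the last item, sketched under the assumption that a cue number is any all-digit line immediately followed by a timing line. `numbering_problems` is a hypothetical helper for illustration, not part of the script.

```python
import re

# Hypothetical checker sketch; not part of subtotxt.
# A cue number is taken to be an all-digit line directly followed by a timing line.
TIMING = re.compile(r"^\d{1,2}:\d{2}:\d{2}[,.]\d{3}\s*-->")


def numbering_problems(text: str) -> list[str]:
    """Report SRT cue numbers that do not run 1, 2, 3, ... without repeats."""
    lines = [line.strip() for line in text.lstrip("\ufeff").splitlines()]
    numbers = [
        int(line)
        for line, following in zip(lines, lines[1:])
        if line.isdigit() and TIMING.match(following)
    ]
    problems: list[str] = []
    if numbers and numbers[0] != 1:
        problems.append(f"first cue is numbered {numbers[0]}, expected 1")
    for previous, current in zip(numbers, numbers[1:]):
        if current == previous:
            problems.append(f"duplicate cue number {current}")
        elif current != previous + 1:
            problems.append(f"cue {current} follows cue {previous}")
    return problems
```

An empty list means the numbering is clean; anything else flags a file that, for now, would need renumbering in SubtitleEdit.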