GlotScript/GlotScript.py: 3 additions & 4 deletions
@@ -2,13 +2,12 @@
 Author: Amir Hossein Kargaran
 Date: August, 2023

-Description: This code detects the script of the given texts.
+Description: This code detects the script (writing system) of the given text.

 MIT License

-Original code is from Meta Platforms, Inc. and affiliates and is based on the MIT license, with permission for distribution and modification.
-The original code is capable of detecting less than 40 scripts.
-Original code repository: https://github.com/facebookresearch/stopes/blob/main/stopes/pipelines/monolingual/utils/predict_script.py
+Original code is from Meta and is based on the MIT license, with permission for distribution and modification.
+The original code is capable of detecting less than 40 scripts: https://github.com/facebookresearch/stopes/blob/main/stopes/pipelines/monolingual/utils/predict_script.py
README.md: 23 additions & 13 deletions
@@ -1,18 +1,21 @@
-# LangScriptID
-Detect the script of text based on ISO 15924.
+# GlotScript
+Detect the script (writing system) of text based on ISO 15924.
 - The codes were sourced from [Wikipedia ISO_15924](https://en.wikipedia.org/wiki/ISO_15924).
 - Unicode ranges were extracted from [Unicode Character Database](https://www.unicode.org/Public/15.0.0/ucd/Scripts.txt).

+## Special codes
+-`Zinh` code is the Unicode script property value of characters that may be used with multiple scripts, and that inherit their script from a preceding base character. In some cases, we opted to integrate parts of the Zinh code (e.g. ARABIC FATHATAN..ARABIC HAMZA BELOW, ARABIC LETTER SUPERSCRIPT ALEF) into a different block.
+-`Zyyy` code is the Unicode script for "Common" characters.
 -[List of Unicode characters - Wikipedia](https://en.wikipedia.org/wiki/List_of_Unicode_characters)
 -[Lightweight Plain-Text Editor for macOS - CotEditor](https://github.com/coteditor/CotEditor/blob/main/CotEditor/Sources/Unicode.UTF32.CodeUnit%2BBlockName.swift)
 -[The Cygwin Terminal – terminal emulator for Cygwin, MSYS, and WSL - mintty](https://github.com/mintty/mintty/blob/master/src/scripts.t)
@@ -70,3 +76,7 @@ If you use any part of this library in your research, please cite it using the f
 -[Gradient Boosting on Decision Trees - catboost](https://github.com/catboost/catboost/blob/master/contrib/python/fonttools/fontTools/unicodedata/Blocks.py)