Commit 657f349

update name.
change name.
1 parent 76b7266 commit 657f349

5 files changed: 35 additions and 25 deletions

LangScriptID/LangScriptID.py renamed to GlotScript/GlotScript.py

Lines changed: 3 additions & 4 deletions
@@ -2,13 +2,12 @@
 Author: Amir Hossein Kargaran
 Date: August, 2023

-Description: This code detects the script of the given texts.
+Description: This code detects the script (writing system) of the given text.

 MIT License

-Original code is from Meta Platforms, Inc. and affiliates and is based on the MIT license, with permission for distribution and modification.
-The original code is capable of detecting less than 40 scripts.
-Original code repository: https://github.com/facebookresearch/stopes/blob/main/stopes/pipelines/monolingual/utils/predict_script.py
+Original code is from Meta and is based on the MIT license, with permission for distribution and modification.
+The original code is capable of detecting less than 40 scripts: https://github.com/facebookresearch/stopes/blob/main/stopes/pipelines/monolingual/utils/predict_script.py
 """

 import string

GlotScript/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from .GlotScript import get_script_predictor
+
+__version__ = '1.0'

LangScriptID/__init__.py

Lines changed: 0 additions & 3 deletions
This file was deleted.

README.md

Lines changed: 23 additions & 13 deletions
@@ -1,18 +1,21 @@
-# LangScriptID
-Detect the script of text based on ISO 15924.
+# GlotScript
+Detect the script (writing system) of text based on ISO 15924.
 - The codes were sourced from [Wikipedia ISO_15924](https://en.wikipedia.org/wiki/ISO_15924).
 - Unicode ranges were extracted from [Unicode Character Database](https://www.unicode.org/Public/15.0.0/ucd/Scripts.txt).

+## Special codes
+- `Zinh` code is the Unicode script property value of characters that may be used with multiple scripts, and that inherit their script from a preceding base character. In some cases, we opted to integrate parts of the Zinh code (e.g. ARABIC FATHATAN..ARABIC HAMZA BELOW, ARABIC LETTER SUPERSCRIPT ALEF) into a different block.
+- `Zyyy` code is the Unicode script for "Common" characters.

 ## Install
 ```bash
-pip3 install LangScriptID@git+https://github.com/kargaranamir/LangScriptID
+pip3 install GlotScript@git+https://github.com/cisnlp/GlotScript
 ```

 ## Usage

 ```python
-from LangScriptID import get_script_predictor
+from GlotScript import get_script_predictor
 sp = get_script_predictor()
 ```

@@ -22,13 +25,13 @@ sp('これは日本人です')
 ```

 ```python
-sp('This is Latin')
->> ('Latn', 1.0, {'details': {'Latn': 1.0}, 'tie': False, 'interval': 1})
+sp('This is Latin')[:2]
+>> ('Latn', 1.0)
 ```

 ```python
-sp('මේක සිංහල')
->> ('Sinh', 1.0, {'details': {'Sinh': 1.0}, 'tie': False, 'interval': 1})
+sp('මේක සිංහල')[0]
+>> 'Sinh'
 ```

 ```python
@@ -41,18 +44,21 @@ sp('𝄞𝄫 𒊕𒀸')
 If you use any part of this library in your research, please cite it using the following BibTex entry.

 ```
-@misc{langscriptid,
-author = {Kargaran, Amir Hossein},
-title = {LangScriptID Python Library},
+@misc{glotscript,
+author = {Kargaran, Amir Hossein and Yvon, Fran{\c{c}}ois and Sch{\"u}tze, Hinrich},
+title = {GlotScript},
 year = {2023},
 publisher = {GitHub},
 journal = {GitHub Repository},
-howpublished = {\url{https://github.com/kargaranamir/LangScriptID}},
+howpublished = {\url{https://github.com/cisnlp/GlotScript}},
 }
 ```


-## Related Sources
+## Exploring Unicode Blocks: Related Sources
+<details>
+<summary>Click to Expand</summary>
+
 - [List of Unicode characters - Wikipedia](https://en.wikipedia.org/wiki/List_of_Unicode_characters)
 - [Lightweight Plain-Text Editor for macOS - CotEditor](https://github.com/coteditor/CotEditor/blob/main/CotEditor/Sources/Unicode.UTF32.CodeUnit%2BBlockName.swift)
 - [The Cygwin Terminal – terminal emulator for Cygwin, MSYS, and WSL - mintty](https://github.com/mintty/mintty/blob/master/src/scripts.t)
@@ -70,3 +76,7 @@ If you use any part of this library in your research, please cite it using the f
 - [Gradient Boosting on Decision Trees - catboost](https://github.com/catboost/catboost/blob/master/contrib/python/fonttools/fontTools/unicodedata/Blocks.py)
 - [Blender](https://github.com/blender/blender/blob/main/source/blender/blenfont/intern/blf_glyph.cc)
 - [Unicode Wikipedia](https://en.wikipedia.org/wiki/Unicode_block)
+
+</details>
+
+
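The README in this commit describes detection based on ISO 15924 codes and Unicode ranges, with `Zinh` (inherited) and `Zyyy` (common) as special codes. The following is a minimal sketch of that idea, not the library's implementation. The range table `TOY_RANGES`, the helper `lookup`, and the function `toy_script_predictor` are hypothetical names introduced here. The sketch assumes that common characters are skipped when scoring (the README example scores `'This is Latin'` as 1.0 despite its spaces) and that an inherited mark counts toward the script of the preceding base character. It mirrors only the `details` and `tie` fields of the result shown in the README.

```python
import unicodedata
from collections import Counter

# Hypothetical, heavily abridged table of inclusive code point ranges.
# The real project extracts its ranges from Unicode's Scripts.txt.
TOY_RANGES = [
    (0x0041, 0x005A, "Latn"),  # A-Z
    (0x0061, 0x007A, "Latn"),  # a-z
    (0x0D80, 0x0DFF, "Sinh"),  # Sinhala block
    (0x3041, 0x3096, "Hira"),  # Hiragana letters
    (0x30A1, 0x30FA, "Kana"),  # Katakana letters
    (0x4E00, 0x9FFF, "Hani"),  # CJK Unified Ideographs
]


def lookup(ch):
    """Return a toy ISO 15924 code for a single character."""
    cp = ord(ch)
    for low, high, code in TOY_RANGES:
        if low <= cp <= high:
            return code
    # Assumption: any unlisted combining mark is treated as Zinh (inherited).
    if unicodedata.category(ch).startswith("M"):
        return "Zinh"
    # Assumption: everything else falls back to Zyyy (common).
    return "Zyyy"


def toy_script_predictor(text):
    """Return (main_script, share, info) for text, using TOY_RANGES only."""
    counts = Counter()
    base = None  # script of the most recent base character, if any
    for ch in text:
        code = lookup(ch)
        if code == "Zyyy":
            base = None  # common characters are skipped and break inheritance
            continue
        if code == "Zinh":
            if base is None:
                continue  # nothing to inherit from
            code = base
        else:
            base = code
        counts[code] += 1
    if not counts:
        return None, 0.0, {"details": {}, "tie": False}
    total = sum(counts.values())
    details = {code: n / total for code, n in counts.items()}
    top = max(counts.values())
    winners = [code for code, n in counts.items() if n == top]
    info = {"details": details, "tie": len(winners) > 1}
    return winners[0], details[winners[0]], info


print(toy_script_predictor("This is Latin"))
# ('Latn', 1.0, {'details': {'Latn': 1.0}, 'tie': False})
```

Because the table is abridged, characters from any script it does not list fall into the `Zyyy` fallback, so the sketch is only meaningful for the few scripts it covers.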

setup.py

Lines changed: 6 additions & 5 deletions
@@ -4,14 +4,14 @@
     long_description = fh.read()

 setup(
-    name="LangScriptID",
-    version="0.1",
+    name="GlotScript",
+    version="1.0",
     author="Amir Hossein Kargaran",
     author_email="[email protected]",
-    description="A package for detecting the script and language of given texts.",
+    description="A package for detecting the script (writing system) of given text.",
     long_description=long_description,
     long_description_content_type="text/markdown",
-    url="https://github.com/kargaranamir/LangScriptID",
+    url="https://github.com/cisnlp/GlotScript",
     packages=find_packages(),
     classifiers=[
         "License :: OSI Approved :: MIT License",
@@ -20,5 +20,6 @@
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: 3.10",
     ],
-)
+)
