README.md: 4 additions & 3 deletions
@@ -3,7 +3,7 @@
greCy is a set of spaCy models for Ancient Greek, together with their installer. The models were trained using the [Perseus](https://universaldependencies.org/treebanks/grc_perseus/index.html) and [Proiel UD](https://universaldependencies.org/treebanks/grc_proiel/index.html) corpora. Prior to installation, the models can be tested on my [Ancient Greek Syntax Analyzer](https://huggingface.co/spaces/Jacobo/syntax) on the [Hugging Face Hub](https://huggingface.co/), where you can also check the performance metrics of each model.
- In general, models trained with the Proiel corpus perform better in POS Tagging and Dependency Parsing, while Perseus models are better at sentence segmentation using punctuation, and Morphological Analysis. Lemmatization is similar across models because they share the same neural lemmatizer in two variants: the most accurate lemmatizer was trained with word vectors, and the other was not The best models for lemmatization are the large (_lg) models.
+ In general, models trained with the Proiel corpus perform better at POS tagging and dependency parsing, while Perseus models are better at punctuation-based sentence segmentation and at morphological analysis. Lemmatization is similar across models because they share the same neural lemmatizer in two variants: the most accurate lemmatizer was trained with word vectors, and the other was not. The best models for lemmatization are the large models.
### Installation
@@ -13,7 +13,7 @@ First install the python package as usual:
pip install -U grecy
```
- Once the package is successfully installed, you can proceed any of the followings models:
+ Once the package is successfully installed, you can proceed to download and install any of the following models:
* grc_perseus_sm
* grc_proiel_sm
@@ -28,7 +28,7 @@ The models can be installed from the terminal with the commands below:
```
python -m grecy install MODEL
```
- where you replace MODEL by any of the model names listed above. The suffixes after the corpus name _sm, _lg, and _trf indicate the size of the model which directly depend on the word embedding used to train the models. The smallest models end in _sm (small) and the less accurate ones: they are good for testing and building lightweight apps. The _lg and _trf are the large and transformers models which are more accurate. The _lg were trained using fasttext word vectors in the spaCy floret version, and the _trf models were using a special version of BERT, pertained by ourselves with the largest Ancient Greek corpus we could find (see more below). If you would like to work with word similarity, choose the _lg models. The vectors for large models were trained with the TLG corpus using [floret](https://github.com/explosion/floret), a fork of [fastText](https://fasttext.cc/).
+ where you replace MODEL with any of the model names listed above. The suffixes after the corpus name (_sm, _lg, and _trf) indicate the size of the model, which depends directly on the word embeddings used for training. The smallest models end in _sm (small) and are the least accurate: they are good for testing and for building lightweight apps. The _lg and _trf models are the large and transformer models, which are more accurate. The _lg models were trained using fastText word vectors in spaCy's floret version, and the _trf models were trained using a special version of BERT, pretrained by ourselves on the largest available Ancient Greek corpus, namely the TLG. The vectors for the large models were also trained on the TLG corpus.
### Loading
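Loading presumably follows spaCy's standard packaged-model API. Below is a minimal, hedged sketch, not the README's own example: the model name `grc_proiel_sm` comes from the list above, the Greek sentence is an illustrative choice of mine, and the script guards itself so it degrades gracefully when spaCy or the model is not present.

```python
# Minimal loading sketch (assumption: the model was installed with
# `python -m grecy install grc_proiel_sm` as described in the Installation section).
import importlib.util

MODEL = "grc_proiel_sm"  # any installed greCy model name from the list above

# Guard: only load if both spaCy and the model package are importable.
if importlib.util.find_spec("spacy") and importlib.util.find_spec(MODEL):
    import spacy

    nlp = spacy.load(MODEL)
    doc = nlp("ἐν ἀρχῇ ἦν ὁ λόγος")  # "In the beginning was the word"
    for token in doc:
        print(token.text, token.lemma_, token.pos_)
else:
    print(f"spaCy or {MODEL} is not installed; see the Installation section above.")
```

The same pattern works for any of the other models; only the `MODEL` string changes.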
@@ -89,6 +89,7 @@ For a general comparison, I share here the metrics of the Proiel transformer grc