GitHub - GraphGrail/polyglot: Multilingual text (NLP) processing toolkit

GraphGrail Ai is the world’s first Artificial Intelligence platform for Blockchain, built on top of Natural Language Understanding technology, with a DApps marketplace.

Algorithms:

Word Embeddings

A word embedding is a mapping of a word to a d-dimensional vector space. This real-valued vector representation captures semantic and syntactic features. Polyglot offers a simple interface to load several formats of word embeddings.

Formats

The Embedding class can read word embeddings from different sources:

Gensim word2vec objects (from_gensim method)
Word2vec binary/text models (from_word2vec method)
GloVe models (from_glove method)
polyglot pickle files (load method)
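To make the idea of a word-to-vector mapping concrete, here is a minimal sketch in plain Python that does not use polyglot itself. It parses a few lines in a GloVe-style text layout (a word followed by space-separated floats) and compares words by cosine similarity. The three-dimensional vectors and the helper names `parse_text_embeddings` and `cosine_similarity` are invented for illustration; real embeddings have far more dimensions and come from a trained model.

```python
import math

# Toy vectors in a GloVe-style text layout: "<word> <v1> <v2> ... <vd>".
# The values are invented for illustration, not taken from a trained model.
SAMPLE = """\
king 0.9 0.8 0.1
queen 0.85 0.82 0.15
apple 0.1 0.2 0.95
"""


def parse_text_embeddings(text):
    """Return a dict mapping each word to its vector (a list of floats)."""
    vectors = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


embeddings = parse_text_embeddings(SAMPLE)
sim_royal = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(f"king~queen: {sim_royal:.3f}, king~apple: {sim_fruit:.3f}")
```

With these toy values, the two related words score higher than the unrelated pair, which is the property the text above refers to as capturing semantic features.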
Part of Speech Tagging

The part of speech tagging task aims to assign every word/token in plain text a category that identifies the syntactic function of that word occurrence. Polyglot recognizes 17 parts of speech; this set is called the universal part of speech tag set:

ADJ: adjective
ADP: adposition
ADV: adverb
AUX: auxiliary verb
CONJ: coordinating conjunction
DET: determiner
INTJ: interjection
NOUN: noun
NUM: numeral
PART: particle
PRON: pronoun
PROPN: proper noun
PUNCT: punctuation
SCONJ: subordinating conjunction
SYM: symbol
VERB: verb
X: other
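As a small illustration of working with this tag set, the sketch below stores the 17 tags in a dictionary and counts how often each tag occurs in a list of (word, tag) pairs. It does not call polyglot; the example sentence is tagged by hand, and `tag_histogram` is a hypothetical helper name.

```python
from collections import Counter

# The 17 universal part-of-speech tags listed above.
UNIVERSAL_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}

# Hand-tagged example sentence (illustrative, not produced by a tagger).
tagged = [
    ("The", "DET"), ("quick", "ADJ"), ("fox", "NOUN"), ("jumps", "VERB"),
    ("over", "ADP"), ("the", "DET"), ("dog", "NOUN"), (".", "PUNCT"),
]


def tag_histogram(pairs):
    """Count tags in (word, tag) pairs, rejecting tags outside the set."""
    unknown = sorted({tag for _, tag in pairs if tag not in UNIVERSAL_TAGS})
    if unknown:
        raise ValueError(f"tags outside the universal set: {unknown}")
    return Counter(tag for _, tag in pairs)


histogram = tag_histogram(tagged)
for tag, count in histogram.most_common():
    print(f"{tag:5} {UNIVERSAL_TAGS[tag]:12} {count}")
```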
Named Entity Extraction

The named entity extraction task aims to extract phrases from plain text that correspond to entities. Polyglot recognizes 3 categories of entities:

Locations (Tag: I-LOC): cities, countries, regions, continents, neighborhoods, administrative divisions …
Organizations (Tag: I-ORG): sports teams, newspapers, banks, universities, schools, non-profits, companies, …
Persons (Tag: I-PER): politicians, scientists, artists, athletes …
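The tags above are assigned per token, so extracting phrases means merging neighbouring tokens that carry the same tag. The following is a minimal sketch of that grouping step in plain Python, not polyglot's own implementation. It assumes non-entity tokens carry the tag "O" (a common convention that the text above does not state), and the tagged sentence is written by hand.

```python
def group_entities(tagged_tokens):
    """Merge consecutive tokens sharing an I-* tag into (label, phrase) chunks."""
    entities = []
    current_label, current_words = None, []
    for word, tag in tagged_tokens:
        # "I-LOC" -> "LOC"; anything else (e.g. the assumed "O") is no entity.
        label = tag[2:] if tag.startswith("I-") else None
        if label != current_label and current_words:
            # The tag changed, so the chunk collected so far is complete.
            entities.append((current_label, " ".join(current_words)))
            current_words = []
        current_label = label
        if label is not None:
            current_words.append(word)
    if current_words:
        entities.append((current_label, " ".join(current_words)))
    return entities


# Hand-tagged example (illustrative, not produced by a model).
sentence = [
    ("Barack", "I-PER"), ("Obama", "I-PER"), ("visited", "O"),
    ("Harvard", "I-ORG"), ("University", "I-ORG"), ("in", "O"),
    ("Cambridge", "I-LOC"), (".", "O"),
]

entities = group_entities(sentence)
for label, phrase in entities:
    print(label, phrase)
```

One limitation of this simple grouping: two distinct entities of the same category that appear directly next to each other are merged into a single chunk, because the I- tags alone do not mark where one entity ends and the next begins.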
Morphological Analysis

Languages Coverage

Using polyglot vocabulary dictionaries, we trained Morfessor models on the 50,000 most frequent words of each language.

    from polyglot.downloader import downloader
    print(downloader.supported_languages_table("morph2"))

1. Piedmontese language  2. Lombard language  3. Gan Chinese
4. Sicilian  5. Scots  6. Kirghiz, Kyrgyz
7. Pashto, Pushto  8. Kurdish  9. Portuguese
10. Kannada  11. Korean  12. Khmer
13. Kazakh  14. Ilokano  15. Polish
16. Panjabi, Punjabi  17. Georgian  18. Chuvash
19. Alemannic  20. Czech  21. Welsh
22. Chechen  23. Catalan; Valencian  24. Northern Sami
25. Sanskrit (Saṁskṛta)  26. Slovene  27. Javanese
28. Slovak  29. Bosnian-Croatian-Serbian  30. Bavarian
31. Swedish  32. Swahili  33. Sundanese
34. Serbian  35. Albanian  36. Japanese
37. Western Frisian  38. French  39. Finnish
40. Upper Sorbian  41. Faroese  42. Persian
43. Sinhala, Sinhalese  44. Italian  45. Amharic
46. Aragonese  47. Volapük  48. Icelandic
49. Sakha  50. Afrikaans  51. Indonesian
52. Interlingua  53. Azerbaijani  54. Ido
55. Arabic  56. Assamese  57. Yoruba
58. Yiddish  59. Waray-Waray  60. Croatian
61. Hungarian  62. Haitian; Haitian Creole  63. Quechua
64. Armenian  65. Hebrew (modern)  66. Silesian
67. Hindi  68. Divehi; Dhivehi; Mald...  69. German
70. Danish  71. Occitan  72. Tagalog
73. Turkmen  74. Thai  75. Tajik
76. Greek, Modern  77. Telugu  78. Tamil
79. Oriya  80. Ossetian, Ossetic  81. Tatar
82. Turkish  83. Kapampangan  84. Venetian
85. Manx  86. Gujarati  87. Galician
88. Irish  89. Scottish Gaelic; Gaelic  90. Nepali
91. Cebuano  92. Zazaki  93. Walloon
94. Dutch  95. Norwegian  96. Norwegian Nynorsk
97. West Flemish  98. Chinese  99. Bosnian
100. Breton  101. Belarusian  102. Bulgarian
103. Bashkir  104. Egyptian Arabic  105. Tibetan Standard, Tib...
106. Bengali  107. Burmese  108. Romansh
109. Marathi (Marāṭhī)  110. Malay  111. Maltese
112. Russian  113. Macedonian  114. Malayalam
115. Mongolian  116. Malagasy  117. Vietnamese
118. Spanish; Castilian  119. Estonian  120. Basque
121. Bishnupriya Manipuri  122. Asturian  123. English
124. Esperanto  125. Luxembourgish, Letzeb...  126. Latin
127. Uighur, Uyghur  128. Ukrainian  129. Limburgish, Limburgan...
130. Latvian  131. Urdu  132. Lithuanian
133. Fiji Hindi  134. Uzbek  135. Romanian, Moldavian, ...
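The vocabulary selection step mentioned above (keeping only the most frequent words of a language before training) can be sketched with the standard library alone. This is an illustration of the idea, not the script used to build the polyglot models; `most_frequent_words` is a hypothetical helper, the lowercasing is an assumption, and the tiny corpus and the limit of 2 stand in for a real corpus and the 50,000-word cutoff.

```python
from collections import Counter


def most_frequent_words(tokens, limit):
    """Return up to `limit` distinct words, most frequent first."""
    # Lowercasing is an assumption made for this sketch.
    counts = Counter(token.lower() for token in tokens)
    return [word for word, _ in counts.most_common(limit)]


# Tiny stand-in corpus; a real vocabulary would come from a large text dump.
corpus = "the cat sat on the mat and the dog sat too".split()
top_words = most_frequent_words(corpus, limit=2)
print(top_words)
```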

This module does not belong to Graph Grail. It will be used to integrate with the microservices provided by Graph Grail.

