GraphGrail Ai is the world’s first Artificial Intelligence platform for Blockchain, built on top of Natural Language Understanding technology and offering a DApps marketplace.
Algorithms:
- Word Embeddings
  A word embedding maps a word to a d-dimensional vector space. The resulting real-valued vector captures semantic and syntactic features of the word. Polyglot offers a simple interface for loading several word-embedding formats.
  Formats: the Embedding class can read word embeddings from different sources:
  - Gensim word2vec objects (from_gensim method)
  - Word2vec binary/text models (from_word2vec method)
  - GloVe models (from_glove method)
  - polyglot pickle files (load method)
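To make the idea of a word-to-vector mapping concrete, here is a minimal sketch in plain Python. It does not use polyglot: the three-dimensional vectors, the vocabulary, and the helper names `cosine_similarity` and `nearest_neighbors` are all made up for illustration. Real embeddings have many more dimensions and are learned from text.

```python
import math

# Toy 3-dimensional embeddings; words and values are invented for illustration.
EMBEDDINGS = {
    "king": [0.9, 0.1, 0.4],
    "queen": [0.85, 0.15, 0.45],
    "apple": [0.1, 0.9, 0.2],
}


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(word, embeddings, top_k=2):
    """Rank all other words by cosine similarity to `word`, most similar first."""
    target = embeddings[word]
    scored = [
        (other, cosine_similarity(target, vector))
        for other, vector in embeddings.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    for neighbor, score in nearest_neighbors("king", EMBEDDINGS):
        print(f"{neighbor}: {score:.3f}")
```

Words whose vectors point in similar directions score close to 1, which is how an embedding can encode that two words are used in similar contexts.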
- Part of Speech Tagging
  Part-of-speech tagging assigns every word/token in plain text a category that identifies the syntactic function of that word occurrence. Polyglot recognizes 17 parts of speech; this set is called the universal part-of-speech tag set:
  - ADJ: adjective
  - ADP: adposition
  - ADV: adverb
  - AUX: auxiliary verb
  - CONJ: coordinating conjunction
  - DET: determiner
  - INTJ: interjection
  - NOUN: noun
  - NUM: numeral
  - PART: particle
  - PRON: pronoun
  - PROPN: proper noun
  - PUNCT: punctuation
  - SCONJ: subordinating conjunction
  - SYM: symbol
  - VERB: verb
  - X: other
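As a small illustration of working with the tag set above, the following sketch stores the 17 tags in a lookup table and counts tags over a list of (word, tag) pairs. It is plain Python and does not call polyglot; the sample sentence is tagged by hand, and the names `UNIVERSAL_POS_TAGS` and `tag_histogram` are hypothetical helpers, not part of any library.

```python
from collections import Counter

# The 17 universal part-of-speech tags listed above.
UNIVERSAL_POS_TAGS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary verb",
    "CONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def tag_histogram(tagged_tokens):
    """Count how often each universal tag occurs in (word, tag) pairs."""
    counts = Counter()
    for word, tag in tagged_tokens:
        if tag not in UNIVERSAL_POS_TAGS:
            raise ValueError(f"unknown tag {tag!r} for word {word!r}")
        counts[tag] += 1
    return counts


# Hand-tagged sample sentence (an assumption, not tagger output).
SAMPLE = [
    ("The", "DET"), ("cat", "NOUN"), ("sat", "VERB"), ("on", "ADP"),
    ("the", "DET"), ("mat", "NOUN"), (".", "PUNCT"),
]

if __name__ == "__main__":
    for tag, count in tag_histogram(SAMPLE).most_common():
        print(f"{tag} ({UNIVERSAL_POS_TAGS[tag]}): {count}")
```

Rejecting unknown tags early makes it easier to notice when a tagger with a different tag set has been plugged in by mistake.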
- Named Entity Extraction
  Named entity extraction aims to extract phrases from plain text that correspond to entities. Polyglot recognizes 3 categories of entities:
  - Locations (Tag: I-LOC): cities, countries, regions, continents, neighborhoods, administrative divisions, …
  - Organizations (Tag: I-ORG): sports teams, newspapers, banks, universities, schools, non-profits, companies, …
  - Persons (Tag: I-PER): politicians, scientists, artists, athletes, …
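One way to picture how token-level tags become entity phrases is to merge consecutive tokens that carry the same tag. The sketch below is plain Python under stated assumptions: it uses the three tags named above plus an "O" tag for tokens outside any entity (the "O" label is an assumption, it is not mentioned in the text), and `group_entities` is a hypothetical helper rather than a polyglot function.

```python
def group_entities(tagged_tokens):
    """Merge runs of consecutive tokens that share an entity tag into phrases.

    `tagged_tokens` is a list of (token, tag) pairs where tag is "I-LOC",
    "I-ORG", "I-PER", or "O" (assumed label for non-entity tokens).
    Returns a list of (tag, phrase) pairs in order of appearance.
    """
    entities = []
    current_tag, current_words = None, []
    for token, tag in tagged_tokens:
        if tag == current_tag and tag != "O":
            # Same entity type as the previous token: extend the phrase.
            current_words.append(token)
            continue
        if current_words:
            # The tag changed, so the phrase collected so far is complete.
            entities.append((current_tag, " ".join(current_words)))
        if tag == "O":
            current_tag, current_words = None, []
        else:
            current_tag, current_words = tag, [token]
    if current_words:
        entities.append((current_tag, " ".join(current_words)))
    return entities


# Hand-tagged sample (an assumption, not model output).
SAMPLE = [
    ("Angela", "I-PER"), ("Merkel", "I-PER"), ("visited", "O"),
    ("Paris", "I-LOC"), (".", "O"),
]

if __name__ == "__main__":
    print(group_entities(SAMPLE))
```

A limitation of this simple grouping is that two adjacent entities of the same type are merged into one phrase, because the I- tags alone do not mark where one entity ends and the next begins.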
- Morphological Analysis
  Languages Coverage: using polyglot vocabulary dictionaries, we trained morfessor models on the 50,000 most frequent words of each language. The supported languages can be listed with:

      from polyglot.downloader import downloader
      print(downloader.supported_languages_table("morph2"))

  1. Piedmontese language   2. Lombard language   3. Gan Chinese
  4. Sicilian   5. Scots   6. Kirghiz, Kyrgyz
  7. Pashto, Pushto   8. Kurdish   9. Portuguese
  10. Kannada   11. Korean   12. Khmer
  13. Kazakh   14. Ilokano   15. Polish
  16. Panjabi, Punjabi   17. Georgian   18. Chuvash
  19. Alemannic   20. Czech   21. Welsh
  22. Chechen   23. Catalan; Valencian   24. Northern Sami
  25. Sanskrit (Saṁskṛta)   26. Slovene   27. Javanese
  28. Slovak   29. Bosnian-Croatian-Serbian   30. Bavarian
  31. Swedish   32. Swahili   33. Sundanese
  34. Serbian   35. Albanian   36. Japanese
  37. Western Frisian   38. French   39. Finnish
  40. Upper Sorbian   41. Faroese   42. Persian
  43. Sinhala, Sinhalese   44. Italian   45. Amharic
  46. Aragonese   47. Volapük   48. Icelandic
  49. Sakha   50. Afrikaans   51. Indonesian
  52. Interlingua   53. Azerbaijani   54. Ido
  55. Arabic   56. Assamese   57. Yoruba
  58. Yiddish   59. Waray-Waray   60. Croatian
  61. Hungarian   62. Haitian; Haitian Creole   63. Quechua
  64. Armenian   65. Hebrew (modern)   66. Silesian
  67. Hindi   68. Divehi; Dhivehi; Mald...   69. German
  70. Danish   71. Occitan   72. Tagalog
  73. Turkmen   74. Thai   75. Tajik
  76. Greek, Modern   77. Telugu   78. Tamil
  79. Oriya   80. Ossetian, Ossetic   81. Tatar
  82. Turkish   83. Kapampangan   84. Venetian
  85. Manx   86. Gujarati   87. Galician
  88. Irish   89. Scottish Gaelic; Gaelic   90. Nepali
  91. Cebuano   92. Zazaki   93. Walloon
  94. Dutch   95. Norwegian   96. Norwegian Nynorsk
  97. West Flemish   98. Chinese   99. Bosnian
  100. Breton   101. Belarusian   102. Bulgarian
  103. Bashkir   104. Egyptian Arabic   105. Tibetan Standard, Tib...
  106. Bengali   107. Burmese   108. Romansh
  109. Marathi (Marāṭhī)   110. Malay   111. Maltese
  112. Russian   113. Macedonian   114. Malayalam
  115. Mongolian   116. Malagasy   117. Vietnamese
  118. Spanish; Castilian   119. Estonian   120. Basque
  121. Bishnupriya Manipuri   122. Asturian   123. English
  124. Esperanto   125. Luxembourgish, Letzeb...   126. Latin
  127. Uighur, Uyghur   128. Ukrainian   129. Limburgish, Limburgan...
  130. Latvian   131. Urdu   132. Lithuanian
  133. Fiji Hindi   134. Uzbek   135. Romanian, Moldavian, ...
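The Languages Coverage note says the morphological models were trained on the 50,000 most frequent words of each language. A minimal sketch of that selection step, assuming the input is simply a list of tokens, could look like the following. The function name `most_frequent_words` and the tiny sample corpus are hypothetical, and this is not the actual training pipeline.

```python
from collections import Counter


def most_frequent_words(tokens, limit=50_000):
    """Return up to `limit` distinct words, ordered from most to least frequent."""
    counts = Counter(tokens)
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    # Tiny made-up corpus; a real vocabulary would come from a large text collection.
    corpus = "the cat sat on the mat and the dog sat too".split()
    print(most_frequent_words(corpus, limit=3))
```

Capping the vocabulary keeps training focused on words that occur often enough to give reliable statistics.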
This module does not belong to Graph Grail. It will be used to integrate with the microservices provided by Graph Grail.