Description
Following on from PR #13, there appear to be other types of phrase we could cover, e.g. pronouns, dates, organisations, etc. (the details can be discussed). Some of these have been achieved so far, and there are a number of others to cover:
Named entity recognition features:
- PERSON | People, including fictional.
- NORP | Nationalities or religious or political groups.
- FAC | Buildings, airports, highways, bridges, etc.
- ORG | Companies, agencies, institutions, etc.
- GPE | Countries, cities, states.
- LOC | Non-GPE locations, mountain ranges, bodies of water.
- PRODUCT | Objects, vehicles, foods, etc. (Not services.)
- EVENT | Named hurricanes, battles, wars, sports events, etc.
- WORK_OF_ART | Titles of books, songs, etc.
- LAW | Named documents made into laws.
- LANGUAGE | Any named language. (Related to the Language Detection Feature request, #4.)
- DATE | Absolute or relative dates or periods.
- TIME | Times smaller than a day.
- PERCENT | Percentage, including “%”.
- MONEY | Monetary values, including unit.
- QUANTITY | Measurements, as of weight or distance.
- ORDINAL | “first”, “second”, etc.
- CARDINAL | Numerals that do not fall under another type.
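As a rough illustration of how the entity types above could become features, the sketch below turns a sequence of entity labels into per-label counts. The function name `count_entity_labels` and the `<label>_count` key format are hypothetical and not part of the existing library. With spaCy, the labels would come from `ent.label_` for each `ent` in `doc.ents`, assuming a pipeline with an NER component is loaded.

```python
from collections import Counter
from typing import Dict, Iterable

# Entity labels taken from the list above.
NER_LABELS = (
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT",
    "WORK_OF_ART", "LAW", "LANGUAGE", "DATE", "TIME", "PERCENT", "MONEY",
    "QUANTITY", "ORDINAL", "CARDINAL",
)


def count_entity_labels(labels: Iterable[str]) -> Dict[str, int]:
    """Return one count per known entity label (hypothetical helper)."""
    counts = Counter(labels)
    # Counter returns 0 for labels that never occurred.
    return {f"{label.lower()}_count": counts[label] for label in NER_LABELS}


# Labels as an NER pipeline might report them (made up for illustration).
example = count_entity_labels(["PERSON", "GPE", "PERSON", "DATE"])
print(example["person_count"], example["gpe_count"], example["org_count"])
```

Returning a key for every label, including zero counts, keeps the feature columns stable across documents.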
Parts of speech features:
- NOUN | noun | girl, cat, tree, air, beauty (noun phrase count done via #13 "Added Noun phrase count" by @ritikjain51 and #47 "Add noun phrase count to the granular features functionality")
- ADJ | adjective | big, old, green, incomprehensible, first
- ADP | adposition | in, to, during
- ADV | adverb | very, tomorrow, down, where, there
- AUX | auxiliary | is, has (done), will (do), should (do)
- CONJ | conjunction | and, or, but
- CCONJ | coordinating conjunction | and, or, but
- DET | determiner | a, an, the
- INTJ | interjection | psst, ouch, bravo, hello
- NUM | numeral | 1, 2017, one, seventy-seven, IV, MMXIV
- PART | particle | ’s, not
- PRON | pronoun | I, you, he, she, myself, themselves, somebody
- PROPN | proper noun | Mary, John, London, NATO, HBO
- PUNCT | punctuation | ., (, ), ?
- SCONJ | subordinating conjunction | if, while, that
- SYM | symbol | $, %, §, ©, +, −, ×, ÷, =, :), 😝
- VERB | verb | run, runs, running, eat, ate, eating
- SPACE | space
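A possible shape for the part-of-speech counts is sketched below. The helper `count_pos_tags`, the key names, and the choice to pool unlisted tags into `other_pos_count` are all assumptions made for illustration. With spaCy, the tags would be `token.pos_` for each token in a processed `doc`.

```python
from collections import Counter
from typing import Dict, Iterable

# Coarse-grained tags taken from the list above.
POS_TAGS = (
    "NOUN", "ADJ", "ADP", "ADV", "AUX", "CONJ", "CCONJ", "DET", "INTJ",
    "NUM", "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "SPACE",
)


def count_pos_tags(tags: Iterable[str]) -> Dict[str, int]:
    """Return one count per listed POS tag (hypothetical helper)."""
    counts = Counter(tags)
    features = {f"{tag.lower()}_count": counts[tag] for tag in POS_TAGS}
    # Assumption: tags outside the list (e.g. "X") are pooled into one bucket.
    features["other_pos_count"] = sum(
        n for tag, n in counts.items() if tag not in POS_TAGS
    )
    return features


# Tags as a tagger might emit them (made up for illustration).
pos_example = count_pos_tags(["DET", "ADJ", "NOUN", "VERB", "ADV", "PUNCT", "X"])
print(pos_example["noun_count"], pos_example["other_pos_count"])
```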
See https://spacy.io/api/annotation#section-named-entities and http://www.nltk.org/book/ for details on the above items.
We will replace one or more existing functionalities in the libraries with the above on a case-by-case basis. It would be best to group them, give each group a unique name such as name-entity-recognition-features and parts-of-speech-features respectively, and club them with the granular features.
Both NLTK and spaCy would be used to implement these features.
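One way the grouping could look is a small registry keyed by the group names proposed above. This is only a sketch: `FEATURE_GROUPS`, `granular_features`, the `Annotations` layout, and the key prefixes are hypothetical, and the real granular features API may be organised differently. The input is assumed to be labels already extracted by NLTK or spaCy.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Assumed input layout: pre-extracted labels, keyed by annotation kind.
Annotations = Dict[str, List[str]]
FeatureFn = Callable[[Annotations], Dict[str, int]]


def ner_group(annotations: Annotations) -> Dict[str, int]:
    """Count entity labels, e.g. {"ner_org_count": 2}."""
    counts = Counter(annotations.get("ent_labels", []))
    return {f"ner_{label.lower()}_count": n for label, n in counts.items()}


def pos_group(annotations: Annotations) -> Dict[str, int]:
    """Count part-of-speech tags, e.g. {"pos_noun_count": 1}."""
    counts = Counter(annotations.get("pos_tags", []))
    return {f"pos_{tag.lower()}_count": n for tag, n in counts.items()}


# Group names follow the ones suggested in this issue.
FEATURE_GROUPS: Dict[str, FeatureFn] = {
    "name-entity-recognition-features": ner_group,
    "parts-of-speech-features": pos_group,
}


def granular_features(
    annotations: Annotations, groups: Optional[List[str]] = None
) -> Dict[str, Dict[str, int]]:
    """Run the selected groups (all of them by default)."""
    names = list(FEATURE_GROUPS) if groups is None else groups
    return {name: FEATURE_GROUPS[name](annotations) for name in names}


annotations = {"ent_labels": ["ORG", "ORG"], "pos_tags": ["NOUN", "VERB"]}
all_groups = granular_features(annotations)
pos_only = granular_features(annotations, ["parts-of-speech-features"])
```

Keeping each group behind its own name means a caller can request one group without paying for the other.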