explosion spaCy Language Support · Discussions · GitHub

Language Support Discussions

Discuss the language data and training models for new languages

Pinned to Language Support

Adding models for new languages master thread
enhancement Feature requests and improvements · lang / all Global language data · new language Adding support for new languages to spaCy.
ines started Dec 16, 2018 in Language Support

141

Discussions

Japanese Training data (as used in the model ja_core_news_lg for example)
lang / ja Japanese language data and models
xwd started Jun 1, 2021 in Language Support

1

Create new pipeline for Persian
lang / fa Persian language data and models
aliebrahiiimi started May 29, 2021 in Language Support

5

Characterization of PoS accuracy
feat / tagger Feature: Part-of-speech tagger · perf / accuracy Performance: accuracy
dandiep started May 27, 2021 in Language Support

2

Using spaCy v2 en_core_web_lg-2.3.1 model in spaCy v3
feat / tagger Feature: Part-of-speech tagger · perf / accuracy Performance: accuracy
udaypk started May 21, 2021 in Language Support

4

Where do the zh_core_web_lg static embeddings come from?
lang / zh Chinese language data and models · feat / vectors Feature: Word vectors and similarity
rgib37190 started May 24, 2021 in Language Support

4

Datasets used for the French pretrained pipelines
lang / fr French language data and models
michelemarzollo started May 20, 2021 in Language Support

2

Improved Italian lemmatizer: ongoing work or plans?
enhancement Feature requests and improvements · lang / it Italian language data and models · feat / lemmatizer Feature: Rule-based and lookup lemmatization
gtoffoli started Apr 19, 2021 in Language Support

10

With which corpora is the French accurate pipeline (fr_dep_new_trf) trained?
lang / fr French language data and models · feat / training Feature: Training utils, Example, Corpus and converters
XavBeckers started May 7, 2021 in Language Support

1

Incorrect lemmas for Italian language
lang / it Italian language data and models · feat / lemmatizer Feature: Rule-based and lookup lemmatization
acazzaro started Apr 28, 2021 in Language Support

1

Training a lemmatizer on Universal Dependencies
feat / lemmatizer Feature: Rule-based and lookup lemmatization
thiippal started Feb 3, 2021 in Language Support

4

Bug: Different punctuation at the end of a sentence leads to wrong analysis results
feat / tagger Feature: Part-of-speech tagger · feat / parser Feature: Dependency Parser
qingyun1988 started Apr 14, 2021 in Language Support

4

Part of Speech Tagger: "this/that/these/those" Pronouns / Determiners distinction not made
feat / tagger Feature: Part-of-speech tagger · feat / training Feature: Training utils, Example, Corpus and converters
JosephPotashnik started Apr 13, 2021 in Language Support

4

Sesotho Model development
enhancement Feature requests and improvements
AtomLaw started Dec 23, 2020 in Language Support

3

GPT-Neo with spaCy?
feat / transformer Feature: Transformer
Huehnerbrust started Apr 5, 2021 in Language Support

1

Logging scores on the training set
lang / hu Hungarian language data and models · feat / scorer Feature: Scorer
oroszgy started Mar 29, 2021 in Language Support

2

Term extraction of medical guidelines in German?
lang / de German language data and models · models Issues related to the statistical models
zopyx started Mar 30, 2021 in Language Support

0

Adding lemmatizer and ner to pipeline
lang / sv Swedish language data and models · feat / config Feature: Training config
Nuccy90 started Feb 16, 2021 in Language Support

6

spaCy 3.0: specify my own candidate generator to use custom UMLS path
third-party Third-party packages and services
phil-oxenberg started Mar 16, 2021 in Language Support

2

Tokenizer exceptions for Sentencizer
feat / sentencizer Feature: Sentencizer (rule-based sentence segmenter)
bittlingmayer started Feb 26, 2021 in Language Support

2

Performance of transformer model with and without NER
feat / ner Feature: Named Entity Recognizer · perf / speed Performance: speed
rykcode started Feb 22, 2021 in Language Support

0

'en_core_web_trf' optimal optimizer's learning rate and number of training epochs?
feat / config Feature: Training config
traceymai started Feb 15, 2021 in Language Support

5

Running a language model for spaCy 0.101
v1 spaCy v1.x
rkrovetz started Feb 11, 2021 in Language Support

1

Amharic: What do I need to do to create am_core_web_sm
lang / am Amharic language data and models
yosiasz started Feb 4, 2021 in Language Support · Closed

1

Using the ICU library and CLDR data for tokenization?
enhancement Feature requests and improvements
jack-pappas started Feb 3, 2021 in Language Support

0

Chinese POS tag incorrect?
lang / zh Chinese language data and models · feat / tagger Feature: Part-of-speech tagger
bokailong started Feb 2, 2021 in Language Support

1