Named Entity Recognition in Turkish news texts using CRF

A Conditional Random Fields (CRF) model for named entity recognition in Turkish news texts, implemented in Python.

The sample input format (tab-separated) is shown below:

Word	POS	Annotation
Tek	Adj	O
çatı	Noun	O
altında	Noun	O
dokuz	Num	O
ayrı	Adj	O
salonda	Noun	O
gerçekleştirilecek	Verb	O
Şenlik	Noun	O
kapsamında	Noun	O
doksanın	Noun	O
üzerinde	Noun	O
etkinlik	Noun	O
yer	Noun	O
alacak	Verb	O
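
As an illustration of the input format above, here is a minimal sketch for reading such a tab-separated file into (word, POS, annotation) tuples. The function name `read_rows` and the `has_header` parameter are hypothetical and not part of this repository; the sketch also assumes every non-empty line has exactly three fields.

```python
import csv
import io


def read_rows(handle, has_header=True):
    """Parse tab-separated lines into (word, pos, annotation) tuples.

    Hypothetical helper: assumes three fields per non-empty line.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = []
    for index, fields in enumerate(reader):
        # Skip blank lines and, optionally, the header line.
        if not fields or (has_header and index == 0):
            continue
        if len(fields) != 3:
            raise ValueError(f"line {index + 1}: expected 3 fields, got {len(fields)}")
        rows.append(tuple(fields))
    return rows


# A short in-memory sample taken from the input shown above.
sample = "Word\tPOS\tAnnotation\nTek\tAdj\tO\nçatı\tNoun\tO\n"
rows = read_rows(io.StringIO(sample))
print(rows)
```

For a real file, pass a handle opened with `open(path, encoding="utf-8", newline="")` so that Turkish characters are read correctly.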

You can also use the trained model ("crf_v2.joblib") to label your own test dataset. The model output consists of "word - predicted annotation - POS" triples, with the items separated by tabs.

Sample output of the model is given below:

Word	Predicted_Annotation	POS
Istanbul	LOCATION	Noun
yüzde	PERCENT	Noun
2013	DATE	Num
Meclis	ORGANIZATION	Noun
lira	MONEY	Noun
şimdi	TIME	Adv
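
The output layout above could be produced with a small helper like the following sketch. The name `format_predictions` is hypothetical. The labels are hard-coded here for illustration; in practice they would come from the trained model (which can presumably be loaded with `joblib.load("crf_v2.joblib")`), but the feature format the model expects is not described in this README, so that step is left out.

```python
def format_predictions(words, pos_tags, labels):
    """Return tab-separated "word, predicted annotation, POS" lines.

    Hypothetical helper; `labels` stands in for the model's predictions.
    """
    if not (len(words) == len(pos_tags) == len(labels)):
        raise ValueError("words, pos_tags and labels must have the same length")
    lines = ["Word\tPredicted_Annotation\tPOS"]
    for word, pos, label in zip(words, pos_tags, labels):
        lines.append(f"{word}\t{label}\t{pos}")
    return "\n".join(lines)


# Illustrative values copied from the sample output, not real model predictions.
words = ["Istanbul", "yüzde", "2013"]
pos_tags = ["Noun", "Noun", "Num"]
labels = ["LOCATION", "PERCENT", "DATE"]

output = format_predictions(words, pos_tags, labels)
print(output)
```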

To evaluate the performance of the model, run "CRF_Eval.java". It calculates the CoNLL F1-score, precision, and recall for each annotation type using a sequence alignment algorithm.
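
For a quick sanity check in Python, per-type precision, recall, and F1 can be approximated at the token level, as in the sketch below. This is a simplification and not the method used in "CRF_Eval.java": it performs no sequence alignment, it assumes the gold and predicted label lists already have the same length and order, and the function name `per_type_scores` is hypothetical.

```python
from collections import Counter


def per_type_scores(gold, predicted, outside="O"):
    """Token-level precision, recall and F1 for each annotation type.

    Simplified sketch: assumes gold and predicted are already aligned.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            if g != outside:
                tp[g] += 1  # correct entity label
        else:
            if p != outside:
                fp[p] += 1  # predicted a type that is not in the gold data
            if g != outside:
                fn[g] += 1  # missed a gold type
    scores = {}
    for label in sorted(set(tp) | set(fp) | set(fn)):
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Made-up labels for illustration only.
gold = ["LOCATION", "O", "DATE", "O", "MONEY"]
predicted = ["LOCATION", "O", "O", "DATE", "MONEY"]
scores = per_type_scores(gold, predicted)
for label, (precision, recall, f1) in scores.items():
    print(f"{label}\tP={precision:.2f}\tR={recall:.2f}\tF1={f1:.2f}")
```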

Citing

If you use this model in an academic publication, please refer to: https://ieeexplore.ieee.org/document/8806523
