Skip to content

obtic-sorbonne/arabic-ner

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ArabicLitTAL

This repository contains:

  • a literary corpus in Arabic annotated in spatial named entities
  • named entity recognition models for Spacy created from these data

Authors

Motasem ALRAHABI, Carmen BRANDO, Muhamed ALKHALIL, Joseph DICHY

Sorbonne Université, France, motasem.alrahabi@paris-sorbonne.fr

École des Hautes Études en Sciences Sociales, France, carmen.brando@ehess.fr

Université de New York, Abou Dhabi, EAU, muhamed.alkhalil@nyu.edu

Université Canadienne de Doubaï (CUD), EAU, joseph.dichy@yahoo.fr

The corpus

  • A Paris Profile (1834) by Rifa'a al-Tahtawi
  • The Uncovering of Europe (1857) by A. Fares Al-Shidyaq
  • The World in Paris (1900) by Ahmad Zaki
  • A Trip to Europe (1912) by Jurji Zaydan
  • The Paris Days (1931) by Zaki Mubarak
  • Memoirs of a Witness to the Century (1965) by Malek Bin Nabi

These texts were annotated in spatial named entites: administrative places, buildings & natural places.

Anotations are standoff at the phrase level, the format is Spacy's JSON (https://spacy.io/api/annotation).

The NER models

There are ten models created for a 10-fold cross validation. Performances for each models is provided as well as errors found in terms of false positives and false negatives.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages