FirstDraftGIS/genesis

genesis

Scripts that create training and testing data for geo-parsing from Wikipedia

structure

Genesis has two parts: (1) a genesis.tsv file and (2) a folder of raw text with maps. The genesis.tsv file is useful for machine learning and is easy to import into databases. The folder contains one subfolder for each English Wikipedia page that has enough text mentioning places to make a map. Each subfolder includes two files: a .txt file with the text of the page, and a GeoJSON file that maps the places mentioned in the article.
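
The folder layout described above can be read with a few lines of Python. This is a minimal sketch, not part of the genesis scripts: the function names `load_page` and `iter_pages` are hypothetical, and it assumes the map file uses a `.geojson` extension and holds a GeoJSON FeatureCollection. Adjust both if the downloaded data differs.

```python
import json
from pathlib import Path


def load_page(page_dir):
    """Return (text, features) for one page subfolder, or None if incomplete.

    Assumption: the subfolder holds one .txt file and one .geojson file.
    """
    page_dir = Path(page_dir)
    txt_files = sorted(page_dir.glob("*.txt"))
    map_files = sorted(page_dir.glob("*.geojson"))
    if not txt_files or not map_files:
        return None
    text = txt_files[0].read_text(encoding="utf-8")
    geo = json.loads(map_files[0].read_text(encoding="utf-8"))
    # A FeatureCollection keeps its places under the "features" key.
    return text, geo.get("features", [])


def iter_pages(root):
    """Yield (folder_name, text, features) for every complete page subfolder."""
    for sub in sorted(Path(root).iterdir()):
        if sub.is_dir():
            page = load_page(sub)
            if page is not None:
                yield sub.name, page[0], page[1]
```

Calling `iter_pages` on the unpacked folder yields each page's text together with the list of GeoJSON features for the places it mentions; subfolders missing either file are skipped.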

download links

license

The training data is released under the Creative Commons Attribution-ShareAlike 3.0 license, the same license as much of Wikipedia's content: https://en.wikipedia.org/wiki/Wikipedia:Text_of_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License
