Utilities to help me with Chinese-language work and other NLP tasks

- json_texts: Contains research files in progress, in JSON format. The format specification is in json_format_for_prosody.txt. The program handle_files.py allows an encrypted version of the data files to be pushed to the repo while keeping their contents private.
- character_count.py: Counts the Chinese characters (only) in a file and returns their overall percentages. The file to be opened must be in the directory DATA.
- separate_pinyin/: Takes a string of Pīnyīn as input and returns a list of the discrete component syllables. A second program, count_syllables.py, counts the number of syllables found.
- convert_pinyin/: Converts files in Pages (v. 3, "Pages '08") format so that their non-standard tonal diacritics are normalized to Unicode. Does not work with later versions of Pages. A sample font ("shyrbaw" 時報, based on Times) is included in the directory.
- statistics/: Small programs to calculate statistical tests.
- poetry_flask/: The beginnings of a web application to assist the study of medieval Chinese prosody.
- hanamin_fonts/: Copy of the HANAMIN fonts for use with this project.
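
As a rough illustration of the kind of counting character_count.py is described as doing, here is a minimal sketch. It is not the repository's code. The function names are hypothetical, "Chinese character" is assumed to mean a code point in the CJK Unified Ideographs block or Extension A, and "percentage" is assumed to mean each character's share of all Chinese characters found.

```python
from collections import Counter


def is_cjk(ch):
    """Assumption: treat only these two Unicode blocks as Chinese characters."""
    cp = ord(ch)
    return 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF


def character_percentages(text):
    """Map each Chinese character to its percentage of all Chinese characters.

    Everything else (Latin letters, digits, punctuation, whitespace) is ignored.
    Characters are returned in order of descending frequency.
    """
    counts = Counter(ch for ch in text if is_cjk(ch))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: 100 * n / total for ch, n in counts.most_common()}


if __name__ == "__main__":
    for ch, pct in character_percentages("天天地 hello").items():
        print(f"{ch}\t{pct:.1f}%")
```

Reading the text from a file in DATA would only need an extra `open(..., encoding="utf-8")` call around this.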
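
The syllable splitting described for separate_pinyin/ could be approached in several ways. The sketch below shows one of them, longest-match segmentation with backtracking against a syllable inventory. It is not taken from the repository: the names `SYLLABLES`, `strip_tone` and `segment` are hypothetical, and the inventory is a tiny sample chosen only so that the example runs, where a real one would list every valid syllable.

```python
import unicodedata

# Assumption: a tiny sample inventory of toneless syllables, for illustration only.
SYLLABLES = {"ni", "hao", "zhong", "guo", "xi", "an", "xian", "shi", "wen"}
MAX_LEN = max(len(s) for s in SYLLABLES)

# Combining macron, acute, caron and grave: the four Pinyin tone marks.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone(ch):
    """Remove a tone mark from one character, leaving the diaeresis of ü intact."""
    decomposed = unicodedata.normalize("NFD", ch)
    kept = "".join(c for c in decomposed if c not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def segment(text):
    """Split unspaced Pinyin into syllables, or return None if no split exists.

    Assumes every toned letter has a precomposed Unicode form, so the toneless
    copy has the same length as the input and indices line up.
    """
    text = unicodedata.normalize("NFC", text)
    plain = "".join(strip_tone(c) for c in text).lower()

    def helper(start):
        if start == len(plain):
            return []
        # Try the longest candidate first, backing off to shorter ones.
        for end in range(min(len(plain), start + MAX_LEN), start, -1):
            if plain[start:end] in SYLLABLES:
                rest = helper(end)
                if rest is not None:
                    return [text[start:end]] + rest
        return None

    return helper(0)


if __name__ == "__main__":
    print(segment("nǐhǎo"))     # ['nǐ', 'hǎo']
    print(segment("zhōngguó"))  # ['zhōng', 'guó']
```

Because the longest match is tried first, an ambiguous string such as "xian" comes back as one syllable rather than as "xi" plus "an". Apostrophes and spaces used to mark syllable boundaries are not handled in this sketch.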