Either create a repo from the tree-TM codebase and use that as a dependency, or update the mallet dependencies to use the latest version of mallet.
mallet is used to train the topic model in the preprocessing step, and the app reads the mallet output files, both the text files and the Java-serialized mallet training files. The formats of these (at least the model.docs output file) have changed in the latest mallet version.
Two ways to go here:
- create a repo from the tree-TM codebase here and update the mallet dependency there to the latest version. We would still need to update the code in alto that reads the mallet output files.
- just use the latest mallet directly and make the necessary updates in alto alone.
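
Either way, the reader for model.docs has to cope with the new layout. Below is a rough Python sketch of a format-tolerant parser, not the alto code itself. It rests on assumptions that should be checked against real files: that the older layout is `doc name` followed by alternating `topic proportion` pairs, that the newer layout is `doc name` followed by one proportion per topic in topic order, and that document names contain no whitespace. The function names are hypothetical.

```python
def looks_like_pairs(values):
    """Heuristic: pair layout has an even field count and integer topic ids."""
    return len(values) % 2 == 0 and all(v.isdigit() for v in values[0::2])


def parse_doc_topics_line(line, num_topics):
    """Return (doc_name, proportions), with proportions indexed by topic id."""
    fields = line.split()  # assumes names without whitespace
    name, values = fields[1], fields[2:]
    if looks_like_pairs(values):
        # Assumed older layout: alternating "topic proportion" pairs.
        proportions = [0.0] * num_topics
        for topic, weight in zip(values[0::2], values[1::2]):
            proportions[int(topic)] = float(weight)
        return name, proportions
    # Assumed newer layout: one proportion per topic, in topic order.
    if len(values) != num_topics:
        raise ValueError(f"unexpected field count for {name}: {len(values)}")
    return name, [float(v) for v in values]


def read_doc_topics(lines, num_topics):
    """Parse all non-empty, non-comment lines into {doc_name: proportions}."""
    return dict(
        parse_doc_topics_line(line, num_topics)
        for line in lines
        if line.strip() and not line.startswith("#")
    )
```

Detecting the layout per line keeps the reader working with files from either mallet version, at the cost of a heuristic that could misfire if a dense file ever printed bare integers as proportions.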
I'd like to get tree-TM up to date, as I'm planning to use models with different structured priors for different annotation use cases, but that's a whole other level of effort. We could also try to help move along @Foroughp's PR that merges tree-TM into mallet proper: mimno/Mallet#74