This repository was archived by the owner on May 23, 2024. It is now read-only.

add getopts to preprocessing bash scripts (new_dataset.sh and train_mallet.sh, etc) #12

@robbymeals

This is the script that builds a new dataset, specifically from Snagajob postings. It's still pretty quick and dirty: I need to add getopts, a shebang header, a help string, etc. I'll also add documentation to the README.

It requires the unzipped tree-TM codebase somewhere on the machine (its path is passed to the script as MALLET_HOME), a Python joblib-serialized list of Mongo posting ids, and access to our Mongo cluster.

It's called like:

bash scripts/new_dataset.sh \
    postings_samp . en_lang_postings_samp_small.pkl 20 ~/tree-TM/bin/ 8 \
    $MONGO_USER $MONGO_PASSWORD $MONGO_HOST $MONGO_PORT $MONGO_DB
It creates all the necessary files on disk, builds the expected directory trees, and trains the topic model.
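
For the getopts, shebang, and help string work, something along these lines could be a starting point. This is only a rough sketch: the option letters (-m, -i, -h), the function names, and the IDS_PKL, SHOW_HELP, and NUM_PARSED variables are hypothetical and not the script's current interface. Only MALLET_HOME and the joblib ids file come from the description above; the other positional arguments are left untouched because their meaning isn't spelled out here.

```shell
#!/usr/bin/env bash
# Sketch only: option letters and variable names are assumptions,
# not the real interface of scripts/new_dataset.sh.

usage() {
  cat <<'EOF'
Usage: new_dataset.sh -m MALLET_HOME -i IDS_PKL [-h] [ARGS...]
  -m DIR   path to the unzipped tree-TM codebase (MALLET_HOME)
  -i FILE  joblib-serialized list of Mongo posting ids
  -h       show this help and exit
EOF
}

parse_args() {
  OPTIND=1          # reset so getopts starts from the first argument
  MALLET_HOME=""
  IDS_PKL=""
  SHOW_HELP=0
  # Leading ':' puts getopts in silent mode so we print our own errors.
  while getopts ":m:i:h" opt; do
    case "$opt" in
      m) MALLET_HOME="$OPTARG" ;;
      i) IDS_PKL="$OPTARG" ;;
      h) usage; SHOW_HELP=1; return 0 ;;
      :) echo "Option -$OPTARG requires an argument" >&2; return 1 ;;
      \?) echo "Unknown option -$OPTARG" >&2; return 1 ;;
    esac
  done
  # Number of arguments consumed as options; the caller can shift by this.
  NUM_PARSED=$((OPTIND - 1))
  if [ -z "$MALLET_HOME" ] || [ -z "$IDS_PKL" ]; then
    usage >&2
    return 1
  fi
}

# In the real script this would be wired up roughly as:
#   parse_args "$@" || exit 1
#   [ "$SHOW_HELP" -eq 1 ] && exit 0
#   shift "$NUM_PARSED"   # remaining positional args are now in "$@"
```

Keeping the parsing in a function means the script can be sourced without side effects, and the same pattern should carry over to train_mallet.sh with its own option list.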
