Description
This proposal outlines a workflow where users never leave the Kaiaulu R Notebook. Instead, a standalone Python script (for both training and prediction) is exposed to Kaiaulu via tools.yml and invoked from R using system calls.
This shifts the entire analysis workflow (data download, parsing, training, prediction, and results) into a single Kaiaulu Notebook.
Left side: Kaiaulu
R Scripts (R/sentiment.R)
pysenti_train_model(pysenti_path, reply_dt, model_save_path, model): Calls the Python script (train_or_predict.py), passing in a data table to train a sentiment model.
pysenti_predict(pysenti_path, reply_dt, model_save_path, model): Calls the same Python script, passing in a data table to predict sentiment on new data.
get_pysenti_path("pysenti", "../conf/tools.yml"): Fetches the path to the Python script based on a configuration (tools.yml).
Configuration (tools.yml)
Defines the path to the Python script for training/prediction:
pysenti: ~/path/to/train_or_predict.py
Vignette (vignettes/sentiment_analysis.Rmd)
Download data
Parse
Train or predict using pysenti_train_model() or pysenti_predict()
Load table with results
Right side: pysenti
Script (exec/train_or_predict.py)
Handles train and predict commands.
Receives the model path and the parsed data table from R, then executes the corresponding Python functions in:
API functions (api/model.py)
This function already exists:
train_model(parsed_dt, model_saved_path, model_select=0): Takes a data table with columns "Text" and "Polarity" (Polarity is assumed to already hold correct labels for training), trains a model, saves it to the path "model_saved_path", and returns that path.
We will add this function:
predict_sentiment() is similar to the existing test_model(). test_model() predicts sentiment, compares it to the flat true labels, then combines both sets of labels into a data table and returns it. The new predict_sentiment() only predicts sentiment and returns a data table with updated sentiment values.
predict_sentiment(parsed_dt, model_saved_path, model_select=0): Takes a data table with columns "Text" and "Polarity", fills in the predicted sentiment values, and returns that table.
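To make the right-hand side more concrete, here is a rough sketch of how exec/train_or_predict.py could dispatch the train and predict commands. Several things in it are assumptions, not part of the proposal: that R hands over the data table as a CSV file, the argument names (command, input_csv, model_path, --model, --output-csv), the read_table() and write_table() helpers, and that predictions overwrite the "Polarity" column. The two API functions are stand-ins with the signatures listed above so the sketch runs on its own; the real script would import them from api/model.py instead.

```python
import argparse
import csv


def train_model(parsed_dt, model_saved_path, model_select=0):
    """Stand-in for the existing train_model(); the real one fits and saves a model."""
    return model_saved_path


def predict_sentiment(parsed_dt, model_saved_path, model_select=0):
    """Stand-in for the proposed predict_sentiment().

    A real version would load the model at model_saved_path and classify each
    row's "Text". Here every row is labelled "neutral" so the sketch runs.
    Assumption: the prediction overwrites the "Polarity" column.
    """
    return [{**row, "Polarity": "neutral"} for row in parsed_dt]


def read_table(path):
    """Hypothetical helper: load the table written by R as a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def write_table(rows, path):
    """Hypothetical helper: write the rows back so R can load the results."""
    fieldnames = list(rows[0].keys()) if rows else ["Text", "Polarity"]
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def build_parser():
    # Argument names are illustrative, not part of the proposal.
    parser = argparse.ArgumentParser(prog="train_or_predict.py")
    parser.add_argument("command", choices=["train", "predict"])
    parser.add_argument("input_csv", help="table with Text and Polarity columns")
    parser.add_argument("model_path", help="where the model is saved or loaded")
    parser.add_argument("--model", type=int, default=0, help="model_select value")
    parser.add_argument("--output-csv", default=None, help="where predictions go")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    rows = read_table(args.input_csv)
    if args.command == "train":
        # Print the saved model path so the calling R function can capture it.
        print(train_model(rows, args.model_path, args.model))
        return 0
    predicted = predict_sentiment(rows, args.model_path, args.model)
    # Without --output-csv, the input file is overwritten with the predictions.
    write_table(predicted, args.output_csv or args.input_csv)
    return 0

# A real script would finish with: raise SystemExit(main())
```

With this layout, pysenti_train_model() and pysenti_predict() on the R side would only need to write reply_dt to a file, build the command line, and read the result back, which keeps all model logic inside pysenti.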