Add R-to-Python syscall workflow for sentiment analysis training and prediction in a Kaiaulu Notebook #3

@geraldmjhuff

This proposal outlines a workflow where users never leave the Kaiaulu R Notebook. Instead, a standalone Python script (for both training and prediction) is exposed to Kaiaulu via tools.yml and invoked from R using system calls.

This shifts the entire analysis workflow (data download, parsing, training, prediction, and results) into a single Kaiaulu Notebook.

Image

Left side: Kaiaulu

R Scripts (R/sentiment.R)
pysenti_train_model(pysenti_path, reply_dt, model_save_path, model): Calls a Python script (train_or_predict.py), passing in a data table to train a sentiment model.
pysenti_predict(pysenti_path, reply_dt, model_save_path, model): Calls the same Python script, passing in a data table to predict sentiment on new data.
get_pysenti_path("pysenti", "../conf/tools.yml"): Fetches the path to the Python script from the configuration file (tools.yml).

Configuration (tools.yml)
Defines the path to the Python script for training/prediction:

pysenti: ~/path/to/train_or_predict.py

Vignette (vignettes/sentiment_analysis.Rmd)
Download data
Parse
Train or predict using pysenti_train_model() or pysenti_predict()
Load table with results

Right side: pysenti

Script (exec/train_or_predict.py)
Handles train and predict commands.
Receives the model path and the parsed data table from R, then executes the corresponding Python functions in:

API functions (api/model.py)
train_model() (this function already exists):
train_model(parsed_dt, model_saved_path, model_select=0): Takes in a data table with columns "Text" and "Polarity", where Polarity is assumed to already hold correct values for training. Trains a model, saves it to the path "model_saved_path", and returns that path.

We will add this function:
predict_sentiment(): This is similar to the existing test_model(). test_model() predicts sentiment, compares the predictions to the flat true labels, then combines both sets of labels into a data table and returns it. The new predict_sentiment() only predicts sentiment, then returns a data table with updated sentiment values.
predict_sentiment(parsed_dt, model_saved_path, model_select=0): Takes in a data table with columns "Text" and "Polarity", applies the predicted sentiment values, and returns that table.
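
To make the script side more concrete, here is a minimal sketch of how exec/train_or_predict.py could dispatch the train and predict commands. Several things in it are assumptions, not part of the proposal: the data table is exchanged as a CSV file, the flag names (--data, --model-path, --model, --out) are hypothetical, predicted values overwrite the "Polarity" column, and the bodies of train_model() and predict_sentiment() are toy word-count stand-ins so the example runs on its own. The real functions live in api/model.py and are not reproduced here.

```python
import argparse
import csv
import json


def train_model(parsed_dt, model_saved_path, model_select=0):
    """Toy stand-in for api/model.py train_model(); NOT the real implementation."""
    # Hypothetical "model": how often each word appears under each label.
    counts = {}
    for row in parsed_dt:
        for word in row["Text"].lower().split():
            by_label = counts.setdefault(word, {})
            by_label[row["Polarity"]] = by_label.get(row["Polarity"], 0) + 1
    with open(model_saved_path, "w", encoding="utf-8") as f:
        json.dump(counts, f)
    return model_saved_path


def predict_sentiment(parsed_dt, model_saved_path, model_select=0):
    """Toy stand-in: predicts only, with no comparison against true labels."""
    with open(model_saved_path, encoding="utf-8") as f:
        counts = json.load(f)
    predicted = []
    for row in parsed_dt:
        votes = {}
        for word in row["Text"].lower().split():
            for label, n in counts.get(word, {}).items():
                votes[label] = votes.get(label, 0) + n
        # Assumption: keep the existing value when no known word is found.
        label = max(votes, key=votes.get) if votes else row.get("Polarity", "")
        predicted.append({**row, "Polarity": label})
    return predicted


def read_table(path):
    """Read a CSV file into a list of row dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def write_table(rows, path):
    """Write row dicts to a CSV file with a header line."""
    fieldnames = list(rows[0].keys()) if rows else ["Text", "Polarity"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="train_or_predict.py")
    parser.add_argument("command", choices=["train", "predict"])
    parser.add_argument("--data", required=True, help="CSV with Text and Polarity")
    parser.add_argument("--model-path", required=True)
    parser.add_argument("--model", type=int, default=0, help="maps to model_select")
    parser.add_argument("--out", help="CSV to write predictions to (predict only)")
    args = parser.parse_args(argv)

    rows = read_table(args.data)
    if args.command == "train":
        # Print the saved model path so the caller can capture it from stdout.
        print(train_model(rows, args.model_path, args.model))
    else:
        if not args.out:
            parser.error("--out is required for the predict command")
        write_table(predict_sentiment(rows, args.model_path, args.model), args.out)
        print(args.out)
    return 0

# In the actual script, main() would be called under the usual
# `if __name__ == "__main__":` guard.
```

Under these assumptions, pysenti_train_model() and pysenti_predict() on the R side would only need to write reply_dt to a file, build the command line, and read back either the printed model path or the output table.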
