Documentation

This document provides an overview of the codebase for generating synthetic MWOs by LLMs and humanising them using rule-based approaches.

MWO Sentence Generation via LLM

The following functionalities are implemented:

llm_generate.py: code to prepare few-shot examples, generate synthetic MWO sentences using GP-4o mini, processing of LM outputs
- get_all_paths(): get all stored paths from json files in path_patterns directory
- get_generate_prompt(): prepare prompt for LLM to generate synthetic MWO sentences
- get_generate_fewshot(): prepare few-shot examples for LLM
- generate_mwo(): generate synthetic MWO sentences using LLM (simple)
- generate_diverse_mwo(): generate diverse synthetic MWO sentences using LLM
- process_mwo_response(): process LLM outputs of synthetic MWO sentences
- get_samples(): samples paths from each path type
llm_prompt.py: code to get list of prompt variations, processing of LLM outputs, and paraphrasing the prompts
- initialise_prompts(): get list of prompt variations for LLM
- check_similarity(): check similarity between prompt variations
- process_prompt_response(): process LLM outputs of prompt variations
- paraphrase_prompt(): paraphrase the prompts for LLM
diversity_experiment.ipynb: experiments for increasing the diversity of the LLM-generated MWO sentences per path
- Same prompt VS Variations of prompt
- Single generation VS Batch generation
- Generation VS Paraphrasing
You can find the few-shot examples used in the fewshot_messages directory.
Some logs of the LLM-generated MWO sentences can be found in the mwo_sentences directory.

The following functionalities are implemented: