This repository contains the code we used for the experiments in our ACL 2023 (main conference) paper:
Python Code Generation by Asking Clarification Questions
Haau-Sing Li, Mohsen Mesgar, André F. T. Martins, Iryna Gurevych
Contact person: Haau-Sing Li
https://www.ukp.tu-darmstadt.de/
Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.
⚠️ This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.
- Install the packages from requirements.txt. For the ranking model, please refer to our fork of transformer_rankers. (We use Python 3.9.12 and CUDA 11.6.)
- Download our dataset.
- (Optional) To generate the dataset files for training the different modules, use the following script:
python3 gen_dataset.py --/path/to/data/
- Training
- Clarification Need Prediction
python3 classifier.py --model_name $MODEL \
--data_dir /path/to/data \
--model_dir /path/to/saved/models \
--seed $SEED
- CQ Ranking
python3 ranker.py --model_name $MODEL --seed $SEED --num_epochs $NUM_EPOCHS \
--negative_sampling_strategy $SAMPLING_STRATEGY \
--train_batch_size 32 --eval_batch_size 1024 \
--learning_rate 5e-5 --max_seq_len 192 \
--save_dir /path/to/dir
- Code Generation
python3 {t5|plbart|causal_lm}.py --data_dir /path/to/data \
--model_dir /path/to/saved/models \
--model_name $MODEL \
--data_affix $DATA_AFF \
--seed $SD \
--num_train_epochs 40  # 40 in our experiments, since training only converges after that many epochs
- Run the code in ./evaluate_module to evaluate the models, at least for the rankers and for code generators that are causal LMs (since they take more time).
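The training commands above all take a seed, so repeating a run over several seeds is mostly a matter of building the same command with a different value. The following is a minimal sketch of that idea for the classifier.py command, using only the flags shown above. The helper names `classifier_command` and `seed_sweep`, the model name `your-model-name`, and the seed values are hypothetical placeholders, not part of this repository.

```python
import shlex


def classifier_command(model_name, seed,
                       data_dir="/path/to/data",
                       model_dir="/path/to/saved/models"):
    """Build the classifier.py training command as an argument list.

    Only the flags listed in this README are used; the default paths are
    the same placeholders as above and must be replaced with real ones.
    """
    return [
        "python3", "classifier.py",
        "--model_name", model_name,
        "--data_dir", data_dir,
        "--model_dir", model_dir,
        "--seed", str(seed),
    ]


def seed_sweep(model_name, seeds):
    """Return one training command per seed (hypothetical helper)."""
    return [classifier_command(model_name, seed) for seed in seeds]


if __name__ == "__main__":
    # "your-model-name" and the seeds are placeholders for illustration.
    for cmd in seed_sweep("your-model-name", [0, 1, 2]):
        # Print the commands instead of running them; pass each list to
        # subprocess.run(cmd, check=True) to actually launch training.
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check the paths and model name before starting long training runs.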
- Inference on the whole pipeline: run the files in ./evaluate_pipeline in the following order:
  - pred_ranker.py
  - gen_data_preds.py
  - pred_plbart/t5.py
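The pipeline order above can be sketched as a small driver script. This is only an illustration under stated assumptions: it reads "pred_plbart/t5.py" as a choice between pred_plbart.py and pred_t5.py, it assumes the scripts are launched from the repository root, and it passes no arguments because the scripts' flags are not listed here. The function names `pipeline_steps` and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: the scripts are launched from the repository root.
PIPELINE_DIR = Path("evaluate_pipeline")


def pipeline_steps(generator="plbart"):
    """Return the pipeline scripts in the order given in this README.

    Assumption: the last step is pred_plbart.py or pred_t5.py,
    depending on which code generator is evaluated.
    """
    if generator not in ("plbart", "t5"):
        raise ValueError(f"unknown generator: {generator!r}")
    return ["pred_ranker.py", "gen_data_preds.py", f"pred_{generator}.py"]


def run_pipeline(generator="plbart", extra_args=None, dry_run=True):
    """Build (and optionally run) one command per pipeline step.

    extra_args maps a script name to a list of command-line arguments;
    it is empty by default because the flags are not documented here.
    """
    extra_args = extra_args or {}
    commands = []
    for script in pipeline_steps(generator):
        cmd = ["python3", (PIPELINE_DIR / script).as_posix(),
               *extra_args.get(script, [])]
        commands.append(cmd)
        if not dry_run:
            # check=True stops the pipeline as soon as one step fails,
            # so later steps never run on missing predictions.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in run_pipeline("plbart", dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the script only prints the commands; set it to `False` once the required arguments for each step have been filled in via `extra_args`.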