Hi @VictorSanh,

Thanks for releasing the code and data. I am trying to retrain it in PyTorch.

Some questions: in your paper you report results for p=1 vs. p=5.7. For p=1, as I understand it, we take one random prompt per example of a dataset; that part is perfectly clear.

I have some doubts about the following:
1) Sampling strategy: proportional to the number of examples in each dataset (any dataset with over 500'000 examples is treated as having 500'000/num_templates examples). Does this mean that for a big dataset like gigaword you include 422661 examples instead of 3803957? A sketch of the two readings I am unsure between is below.
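To make the question concrete, here is a minimal sketch of the two readings (function names and the template count are mine, not from the repo):

```python
def capped_size_paper_reading(dataset_size, num_templates, cap=500_000):
    # Literal reading of the paper: datasets above the cap contribute cap / num_templates examples.
    return cap // num_templates if dataset_size > cap else dataset_size

def capped_size_my_reading(dataset_size, num_templates, cap=500_000):
    # My current interpretation: divide the dataset's own size by its number of templates.
    return dataset_size // num_templates if dataset_size > cap else dataset_size

# gigaword has 3'803'957 training examples; assuming 9 promptsource templates,
# my reading gives 3803957 // 9 = 422661, which is where my number above comes from.
```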
2) The T0 model card on Hugging Face says "Fine-tuning steps: 12'200", but your script sets `export TRAIN_STEPS=1112200`. Do you have an idea of how many epochs you trained for? Below is the back-of-the-envelope calculation I am trying to do.
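For context, this is the rough arithmetic I am attempting; the batch size is my reading of the model card and the example count is my own p=1 estimate from question 3, so both may well be wrong:

```python
steps = 12_200            # fine-tuning steps from the model card
batch_size = 1_024        # sequences per step, as I read the model card; with packing one sequence holds several examples
num_examples = 3_068_602  # my own p=1 count from question 3 (probably not the right denominator for T0 itself)
print(steps * batch_size / num_examples)  # ~4.1 "epochs", ignoring packing
```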
3) Can you tell me the total number of samples included for p=1 given the tasks ['commonsense_qa', 'dream', 'quail', 'quartz', 'social_i_qa', 'wiqa', 'cosmos_qa', 'qasc', 'quarel', 'sciq', 'wiki_hop', 'adversarial_qa_dbert', 'adversarial_qa_dbidaf', 'adversarial_qa_droberta', 'quoref', 'duorc_ParaphraseRC', 'duorc_SelfRC', 'ropes', 'wiki_qa', 'common_gen', 'wiki_bio', 'app_reviews', 'amazon_polarity', 'imdb', 'rotten_tomatoes', 'gigaword', 'cnn_dailymail', 'multi_news', 'samsum', 'xsum', 'ag_news', 'dbpedia_14', 'trec', 'paws_labeled_final', 'glue_mrpc', 'glue_qqp', 'yelp_review_full', 'kilt_tasks_hotpotqa']?
I get Num examples = 3068602, obtained by taking p=1 from each individual dataset and, for datasets bigger than 500k, dividing the number of samples by the number of prompts (sketch below). If you have the data files for T0 (p=1) or (p=5.7), would you mind sharing them?
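This is roughly how I compute the 3068602 figure; the capping rule is my own interpretation from question 1, and I pass `dataset/subset`-style names to promptsource, so please correct me if either assumption is off:

```python
from datasets import load_dataset_builder
from promptsource.templates import DatasetTemplates

def capped_train_size(name, subset=None, cap=500_000):
    # Number of training examples a dataset contributes under my reading of the cap rule.
    n = load_dataset_builder(name, subset).info.splits["train"].num_examples
    if n <= cap:
        return n
    key = name if subset is None else f"{name}/{subset}"
    num_templates = len(DatasetTemplates(key).all_template_names)
    return n // num_templates  # my reading; maybe it should be cap // num_templates

# Summing over the task list above, e.g.:
# total = capped_train_size("gigaword") + capped_train_size("duorc", "ParaphraseRC") + ...
```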
4) Example grouping: "We use packing to combine multiple training examples into a single sequence to reach the maximum sequence length." I am not sure what this is. Is it necessary, and how can we do it? My rough understanding is sketched below.
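For what it's worth, here is my naive picture of packing, just to check I understand the idea; I assume the real implementation lives in the seqio/T5X data pipeline and also tracks segment IDs so attention does not cross example boundaries:

```python
def pack_examples(tokenized_examples, max_seq_len=1024, eos_id=1):
    """Greedily concatenate tokenized examples into sequences of at most max_seq_len tokens."""
    packed, current = [], []
    for ids in tokenized_examples:
        ids = (ids + [eos_id])[:max_seq_len]       # keep an EOS separator, truncate over-long examples
        if current and len(current) + len(ids) > max_seq_len:
            packed.append(current)                 # flush the current sequence and start a new one
            current = []
        current = current + ids
    if current:
        packed.append(current)
    return packed
```

I also assume that for an encoder-decoder setup the inputs and targets are packed jointly, but I am not sure about that either.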