
cs230-robot-apocalypse

Preprocessing:

  • Normalize punctuation
  • Tokenize (word and character level)
  • Learn and apply Byte Pair Encoding (BPE)
  • Generate Python code corpus
  • Generate annotation corpus
  • Shuffle data
  • Split data into train/dev/test sets (a minimal shuffle/split sketch follows this list)
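The shuffle and split steps are the simplest part of the pipeline; the sketch below is a minimal, hypothetical version of them, not the project's actual script. The input file names (all.code, all.anno) and the 90/5/5 ratios are assumptions; only the split_data output directory and the *.code/dev naming come from this README.

# Minimal sketch of the shuffle + train/dev/test split step.
# File names all.code/all.anno and the 90/5/5 ratios are assumptions.
import random

def shuffle_and_split(code_path="all.code", anno_path="all.anno",
                      out_dir="split_data", dev_frac=0.05, test_frac=0.05,
                      seed=230):
    with open(code_path) as f:
        code = f.read().splitlines()
    with open(anno_path) as f:
        anno = f.read().splitlines()
    assert len(code) == len(anno), "parallel corpora must be the same length"

    pairs = list(zip(code, anno))
    random.Random(seed).shuffle(pairs)          # deterministic shuffle

    n = len(pairs)
    n_dev, n_test = int(n * dev_frac), int(n * test_frac)
    splits = {
        "test": pairs[:n_test],
        "dev": pairs[n_test:n_test + n_dev],
        "train": pairs[n_test + n_dev:],
    }
    for name, subset in splits.items():
        # write parallel files, e.g. split_data/dev.code and split_data/dev.anno
        with open(f"{out_dir}/{name}.code", "w") as fc, \
             open(f"{out_dir}/{name}.anno", "w") as fa:
            for c, a in subset:
                fc.write(c + "\n")
                fa.write(a + "\n")

if __name__ == "__main__":
    shuffle_and_split()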

Running:

A shell script for hyperparameter tuning is generated by generate_sweep_commands.py; the data used for the coarse sweep can be found in the split_data directory.
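generate_sweep_commands.py itself is not reproduced here. A hypothetical generator that writes one train.py invocation per hyperparameter combination might look like the sketch below; the flag names are taken from the training command shown later, but the grid values and the sweep.sh output name are assumptions.

# Hypothetical sketch of a sweep-command generator. Only the flag names come
# from the train.py example below; grid values and sweep.sh are assumptions.
import itertools

GRID = {
    "cell_type": ["lstm"],
    "attention_type": ["luong"],
    "hidden_units": [512, 1024],
    "depth": [1, 2],
    "embedding_size": [300, 500],
}

def main(out_path="sweep.sh"):
    keys = list(GRID)
    with open(out_path, "w") as f:
        f.write("#!/bin/bash\n")
        # one line per point in the Cartesian product of the grid
        for values in itertools.product(*(GRID[k] for k in keys)):
            flags = " ".join(f"--{k} {v}" for k, v in zip(keys, values))
            f.write(f"python train.py {flags}\n")

if __name__ == "__main__":
    main()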

To train manually:

python train.py --cell_type 'lstm' \
                --attention_type 'luong' \
                --hidden_units 1024 \
                --depth 2 \
                --embedding_size 500 \
                --num_encoder_symbols 30000 \
                --num_decoder_symbols 30000 ...
                    

To decode manually:

python decode.py --beam_width 5 \
                 --decode_batch_size 30 \
                 --model_path $PATH_TO_A_MODEL_CHECKPOINT (e.g. model/translate.ckpt-100) \
                 --max_decode_step 300 \
                 --write_n_best False \
                 --decode_input $PATH_TO_DECODE_INPUT \
                 --decode_output $PATH_TO_DECODE_OUTPUT
                    

The seq2seq model is modified from JayParks's tf-seq2seq (https://github.com/JayParks/tf-seq2seq).

BLEU score calculation:

perl multi-bleu-detok.perl data/dev.code < output${MODEL_NUM}_dev/output_train_$BEAM_WIDTH
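As a rough sanity check on the Perl script, BLEU can also be computed in Python with NLTK. This is not the project's scoring method, and whitespace tokenization here will not match multi-bleu-detok.perl exactly; the hypothesis file name is a placeholder for the decoder output.

# Rough BLEU cross-check with NLTK; not the project's scoring script, and
# whitespace tokenization will not match multi-bleu-detok.perl exactly.
from nltk.translate.bleu_score import corpus_bleu

def bleu_from_files(ref_path, hyp_path):
    with open(ref_path) as f:
        refs = [[line.split()] for line in f]   # one reference per sentence
    with open(hyp_path) as f:
        hyps = [line.split() for line in f]
    return corpus_bleu(refs, hyps)

if __name__ == "__main__":
    # 'decode_output.txt' is a placeholder for the file written by decode.py
    print(f"BLEU: {100 * bleu_from_files('data/dev.code', 'decode_output.txt'):.2f}")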
