Here is the list of all arguments available when launching a training.
If an argument expects specific values, you can list them with the
--help flag.
- --chunk_n_jobs: sliding window size
- --duration_type: either stochastic or deterministic
- --first_machine_id_is_one: in pure Taillard format, machine numbering starts at 1
- --generate_duration_bounds X Y: real durations are sampled within the bounds ($-X\%$, $+Y\%$)
- --load_from_job: index of the first job to use
- --load_max_jobs: max number of jobs for sampling
- --load_problem: force reading a problem definition instead of generating one
- --max_duration: max duration of tasks for deterministic problem generation
- --max_n_j: max number of jobs (default is n_j)
- --max_n_m: max number of machines (default is n_m)
- --n_j: number of jobs
- --n_m: number of machines
- --sample_n_jobs: number of jobs to sample
- --validate_on_total_data: force evaluation on the global problem
- --train_dir: the directory containing all problems you want to train on
- --test_dir: the directory containing all test problems
- --train_test_split: if no --test_dir is provided, the train instances will be split according to this ratio
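For example, a minimal sketch of a problem-definition invocation (the entry point name `train.py` is an assumption; check your checkout for the actual script):

```sh
# Hypothetical example: generate stochastic 6x6 instances whose real
# durations are sampled within -5% / +10% of the nominal durations.
python train.py \
    --n_j 6 --n_m 6 \
    --duration_type stochastic \
    --generate_duration_bounds 5 10
```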
- --batch_size: batch size for PPO
- --clip_range: clip range for PPO
- --dont_normalize_advantage: do not normalize the advantage function
- --ent_coef: entropy coefficient in the PPO loss
- --fe_lr: learning rate of the feature extractor, if different from the global learning rate
- --fixed_problem: force use of the same problem throughout training
- --freeze_graph: freeze the graph during learning (for debugging purposes)
- --gae_lambda: lambda parameter of Generalized Advantage Estimation
- --gamma: discount factor, default 1 for finite horizon
- --lr: learning rate
- --n_epochs: number of times a given replay buffer is used during training
- --n_steps_episode: number of actions per sequence (generally $k \times n_j \times n_m$)
- --optimizer: optimizer to use
- --target_kl: target KL divergence for PPO
- --total_timesteps: total number of training timesteps
- --vf_coef: value function coefficient in the PPO loss
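A hypothetical set of PPO hyperparameters (values are illustrative, not recommendations; `train.py` is assumed as above):

```sh
# n_steps_episode follows the k * n_j * n_m rule of thumb (here k = 2).
python train.py \
    --n_j 6 --n_m 6 \
    --total_timesteps 1000000 \
    --n_steps_episode 72 \
    --batch_size 64 \
    --n_epochs 3 \
    --lr 2e-4 \
    --gamma 1.0 \
    --gae_lambda 0.95 \
    --ent_coef 0.005 \
    --vf_coef 0.5
```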
- --device: device id (cpu, cuda:0, ...)
- --n_workers: number of data collecting threads (the size of the data buffer is n_steps_episode $\times$ n_workers)
- --vecenv_type: type of threading for data collection
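For instance, to collect data with several workers on a GPU (again assuming the hypothetical `train.py` entry point):

```sh
# 4 data-collecting workers; the rollout buffer then holds
# n_steps_episode * 4 transitions.
python train.py --device cuda:0 --n_workers 4
```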
- --fixed_random_validation: number of fixed problems to generate for validation
- --fixed_validation: fix and use the same problems for agent evaluation and OR-Tools. When used, the validation instances are solved once for all baselines (OR-Tools and custom heuristics) and their values are reused for the rest of the training; only the trained model is evaluated every time the validation evaluation is triggered
- --max_time_ortools: OR-Tools timeout
- --n_test_problems: number of problems to generate for validation (in case they are not pre-generated with fixed_validation and fixed_random_validation)
- --n_validation_env: number of validation environments for model evaluation
- --test_print_every: print frequency of evaluations
- --validation_freq: number of steps between evaluations
- --custom_heuristic_names: heuristics to use as a comparison (SPT, MWKR, MOPNR, FDD/MWKR), only available for JSSP problems
- --ortools_strategy: any number of strategies to use for the OR-Tools solver (realistic, averagistic)
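A sketch of a validation setup (values are illustrative; the multi-value flags are assumed to take space-separated lists, which may differ in practice):

```sh
# Solve 20 fixed validation instances once with OR-Tools (30 s timeout)
# and two heuristics, then evaluate the agent every 1000 steps.
python train.py \
    --fixed_validation \
    --fixed_random_validation 20 \
    --n_validation_env 20 \
    --validation_freq 1000 \
    --max_time_ortools 30 \
    --ortools_strategy realistic averagistic \
    --custom_heuristic_names SPT MWKR
```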
- --conflicts: conflict encoding in the GNN
- --graph_has_relu: add (r)elu in the GNN
- --graph_pooling: global pooling mode, default is learn (i.e. pool node)
- --hidden_dim_actor: latent dim of the actor
- --hidden_dim_critic: latent dim of the critic
- --hidden_dim_features_extractor: latent dimension in the GNN
- --mlp_act: activation function of the MLPs in the GNN (if any, defaults to gelu)
- --n_attention_heads: number of attention heads in the GNN (if any)
- --n_layers_features_extractor: number of layers in the GNN
- --n_mlp_layers_actor: number of layers of the actor
- --n_mlp_layers_critic: number of layers of the critic
- --n_mlp_layers_features_extractor: number of layers of the MLPs in the GNN (if any)
- --normalize_gnn: add normalization layers in the GNN
- --residual_gnn: add residual connections in the GNN
- --reverse_adj_in_gnn: invert adjacency direction in the PyG feature extractor (for debugging, deprecated)
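As an illustration, a hypothetical deeper architecture (all sizes are arbitrary):

```sh
# 8-layer GNN with residual connections and normalization,
# plus 3-layer MLP heads for the actor and the critic.
python train.py \
    --n_layers_features_extractor 8 \
    --hidden_dim_features_extractor 64 \
    --residual_gnn \
    --normalize_gnn \
    --n_mlp_layers_actor 3 --hidden_dim_actor 32 \
    --n_mlp_layers_critic 3 --hidden_dim_critic 32
```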
- --do_not_observe_updated_bounds: task completion time (tct) bounds are computed on the fly during a trial (necessary for the L2D reward model); with this option, updated tct bounds are not given to the agent (not observed)
- --dont_normalize_input: do not normalize state data
- --insertion_mode: allow insertion
- --observe_duration_when_affect: with this option, real durations are observed at affectation time and used to tighten task completion time bounds
- --reward_model_config: reward model
- --transition_model_config: transition type
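For example, a sketch of an observation setup for stochastic durations (the flag combination is illustrative):

```sh
# Observe real durations once tasks are affected, but hide the
# updated tct bounds from the agent.
python train.py \
    --observe_duration_when_affect \
    --do_not_observe_updated_bounds
```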
- --reinit_head_before_ppo: replace the actor and policy heads by newly initialized weights just before starting PPO (and after loading all the model's weights); useful after a pretraining, for example
- --resume: use the experiment's exact name to load the weights of the previously saved model
- --retrain PATH: load the model pointed to by PATH
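A hypothetical warm-start workflow (the path is a placeholder):

```sh
# Fine-tune from a pretrained model, reinitializing the heads first.
python train.py --retrain /path/to/pretrained/model --reinit_head_before_ppo

# Or resume an interrupted run under the same experiment name.
python train.py --resume
```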
- --path: directory where training logs are saved; a subdirectory is created for each new training
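Putting the pieces together, one hypothetical end-to-end command (all names and values are illustrative):

```sh
python train.py \
    --n_j 6 --n_m 6 --duration_type deterministic --max_duration 99 \
    --total_timesteps 1000000 --lr 2e-4 \
    --device cuda:0 --n_workers 4 \
    --fixed_validation --fixed_random_validation 20 \
    --path /tmp/training_logs
```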