Description
The gist of experiments is to compare them. We need a diff facility to compare inputs and results across runs.
We don't have to limit this comparison to defined outputs in the pipeline. Any file changed between two runs can be diffed.
There can be three types of diffs:
- Unstructured diffs: This is for binary files that we don't recognize. Only the content digest is reported.
- Structured diffs: For a file format that we can parse, we can report the individual differences across runs. JSON, YAML or any other format that we can parse for results can be reported as a structured diff.
- Text diffs: This is for the source code files that may have led to changes in other files.
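As a rough illustration of the three diff types, the sketch below picks a diff type per file and returns a JSON-serializable result. Everything in it is an assumption made for illustration, not Xvc's design: the function names, choosing the type from the file extension, SHA-256 as the digest, and handling only JSON as a structured format.

```python
import difflib
import hashlib
import json


def flatten(value, prefix=""):
    """Flatten nested dicts/lists into {"a.b.0": leaf} pairs."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return {prefix: value}
    flat = {}
    for key, child in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(child, path))
    return flat


def unstructured_diff(old: bytes, new: bytes) -> dict:
    """Report only the content digests (SHA-256 is an assumption)."""
    return {
        "type": "unstructured",
        "old_digest": hashlib.sha256(old).hexdigest(),
        "new_digest": hashlib.sha256(new).hexdigest(),
    }


def structured_diff(old: bytes, new: bytes) -> dict:
    """Report changed values by key path; only JSON is parsed in this sketch.

    A missing key and an explicit null both show up as None here.
    """
    a, b = flatten(json.loads(old)), flatten(json.loads(new))
    changes = {
        path: {"old": a.get(path), "new": b.get(path)}
        for path in sorted(a.keys() | b.keys())
        if a.get(path) != b.get(path)
    }
    return {"type": "structured", "changes": changes}


def text_diff(old: bytes, new: bytes) -> dict:
    """Report a Git-like unified diff as a list of lines."""
    lines = difflib.unified_diff(
        old.decode().splitlines(),
        new.decode().splitlines(),
        fromfile="original",
        tofile="experiment",
        lineterm="",
    )
    return {"type": "text", "diff": list(lines)}


def diff_file(name: str, old: bytes, new: bytes) -> dict:
    """Pick a diff type from the file extension (a hypothetical rule)."""
    if name.endswith(".json"):
        return structured_diff(old, new)
    if name.endswith((".py", ".rs", ".txt", ".md")):
        return text_diff(old, new)
    return unstructured_diff(old, new)
```

For example, `diff_file("metrics.json", old_bytes, new_bytes)` would list only the key paths whose values differ, while an unrecognized binary file would yield just the two digests.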
The workflow is as follows:
- User has a bunch of files, source, params, data, model, etc.
- User modifies some of these manually, e.g. by updating the source code.
- User modifies some of these with the `xvc exp run --input-param` command.
- User runs a command (or pipeline) on the files.
- Xvc clones/rechecks/copies files from the original to a `.xvc/exp/KEYWORD-RANDOMSTRING-TIMESTAMP` directory.
- Xvc links the original cache.
- Xvc creates a `.xvc-exp` directory to store experiment-specific data.
- Xvc modifies the files with the given modification option.
  - `--input-param params.yaml params.my-param 123,124,135` creates 3 experiments, each changing `params.yaml::params.my-param` to one of the given values.
- Xvc runs the given command (or pipeline) in the experiment directory.
- Xvc stores the updated artifacts in the common cache, symlinking the results.
- User asks for results diffed from the original.
- Xvc compares each of the directories for the changed files.
- Xvc shows digest strings for unstructured files.
- Xvc shows changed values for structured files.
- Xvc shows text file diffs similar to Git.
All results must be reported in JSON. Tables may be built from this JSON.
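To make the JSON-then-table idea concrete, here is a minimal sketch that renders a diff report as a plain-text table. The report shape (`{file: {key: {"old": ..., "new": ...}}}`) and the function name `report_to_table` are assumptions for illustration; the issue does not define the JSON schema.

```python
import json


def report_to_table(report_json: str) -> str:
    """Render a hypothetical JSON diff report as an aligned text table."""
    report = json.loads(report_json)
    rows = [("file", "key", "old", "new")]
    for file_name, changes in report.items():
        for key, change in changes.items():
            rows.append((file_name, key, str(change["old"]), str(change["new"])))
    # Pad every column to its widest cell so the columns line up.
    widths = [max(len(row[i]) for row in rows) for i in range(4)]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )
```

Keeping JSON as the only primary output means other renderers (Markdown, CSV, HTML) can be added later without touching the diff logic.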
The second facility xvc exp provides is to modify structured files quickly for each experiment.
xvc exp run --input-param file.yaml dict.key value1,value2,value3 will parse file.yaml, update dict.key with value1 and run an experiment, update with value2 and run another, update with value3 and run another.
xvc exp run --input-param file.json dict.key '0;5;100' will run experiments with 0,5,10,15,20,...,100 (inclusive).
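The two value syntaxes above (a comma-separated list and an inclusive `start;step;end` range) could be expanded along these lines. This is a sketch only: `expand_values` is a hypothetical name, and using `Decimal` to avoid floating-point drift with steps such as 0.2 is a choice made here, not something the proposal specifies.

```python
from decimal import Decimal


def expand_values(spec: str) -> list[str]:
    """Expand 'a,b,c' into a list, or 'start;step;end' into an inclusive range."""
    if ";" in spec:
        parts = spec.split(";")
        if len(parts) != 3:
            raise ValueError(f"expected start;step;end, got {spec!r}")
        start, step, end = (Decimal(part) for part in parts)
        if step <= 0:
            raise ValueError("step must be positive")
        values = []
        current = start
        while current <= end:  # the end value is included
            values.append(str(current))
            current += step
        return values
    return spec.split(",")
```

With this sketch, `expand_values("0;5;100")` yields 21 values from `"0"` to `"100"`, and `expand_values("value1,value2,value3")` yields the three names unchanged.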
Files to be modified are JSON, YAML 1.2 and TOML files. (Anything serde can read/write is possible in theory.)
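Updating a dotted key such as `dict.key` in a parsed file might look like the sketch below. It uses JSON from the Python standard library purely for illustration; the helper names are hypothetical, and the assumption that every intermediate key already exists is a simplification.

```python
import json


def set_by_path(document: dict, dotted_key: str, value) -> None:
    """Set document["dict"]["key"] given "dict.key"; parent keys must exist."""
    *parents, leaf = dotted_key.split(".")
    node = document
    for part in parents:
        node = node[part]
    node[leaf] = value


def variants(text: str, dotted_key: str, values: list) -> list[str]:
    """Return one updated JSON document per value; the input text is untouched."""
    results = []
    for value in values:
        document = json.loads(text)  # re-parse so every variant starts fresh
        set_by_path(document, dotted_key, value)
        results.append(json.dumps(document, indent=2))
    return results
```

Each returned string would be written into its own experiment directory before the command runs there.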
We can extend this functionality to regex. --input-regex file.txt 'my_var = (.*)' 0;0.1;1 updates $1 in regex with the values.
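The regex variant could be sketched as follows: find the first match and replace only capture group 1 with the experiment value. The function name `replace_group` and the choice to touch only the first match are assumptions for illustration.

```python
import re


def replace_group(text: str, pattern: str, value: str) -> str:
    """Replace capture group 1 of the first match of pattern with value."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found")
    start, end = match.span(1)  # character offsets of group 1 only
    return text[:start] + value + text[end:]
```

Because only the span of group 1 is rewritten, the rest of the matched line (here `my_var = `) is preserved exactly.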
We can also use --command-template for this. xvc exp run --command-template 'python train.py ${{EXP_VALUE}}' 0;0.2;10 will run python train.py with parameters 0, 0.2, 0.4, .... in different experiments.
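A minimal sketch of the template substitution, assuming placeholders are written literally as `${{NAME}}` and values are inserted verbatim (so any shell quoting is left to the caller). `render_command` is a hypothetical helper, not an Xvc function.

```python
def render_command(template: str, values: dict[str, str]) -> str:
    """Substitute each ${{NAME}} placeholder with its value, verbatim."""
    for name, value in values.items():
        template = template.replace("${{" + name + "}}", value)
    return template


def render_all(template: str, name: str, values: list[str]) -> list[str]:
    """Build one command line per experiment value."""
    return [render_command(template, {name: value}) for value in values]
```

For instance, `render_all("python train.py ${{EXP_VALUE}}", "EXP_VALUE", ["0", "0.2"])` would produce one command line per value, each to be run as a separate experiment.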
If there is more than one --input-param, --input-regex or --command-template parameter, we build all combinations (the Cartesian product) of their values. xvc exp run --input-param file.yaml dict.key 1,2,3 --input-param another.yaml another.key 5,6,7 will run 3 × 3 = 9 experiments.
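Combining several parameters amounts to taking the Cartesian product of their value lists, which `itertools.product` provides directly. In this sketch the `file::key` labels and the function name `experiment_grid` are illustrative assumptions.

```python
from itertools import product


def experiment_grid(params: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one {parameter: value} assignment per experiment."""
    names = list(params)
    return [dict(zip(names, combo)) for combo in product(*params.values())]


grid = experiment_grid(
    {
        "file.yaml::dict.key": ["1", "2", "3"],
        "another.yaml::another.key": ["5", "6", "7"],
    }
)
print(len(grid))  # number of experiments to run
```

The grid grows multiplicatively with each added parameter, so a dry-run option that only prints the planned experiments may be worth considering.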
There may be three subcommands for xvc exp run.
- `xvc exp run pipeline --name`: (xvcerp) Runs a pipeline command with the given parameters. (`xvc pipeline run --name`)
- `xvc exp run command 'cmd'`: Runs a generic command as an experiment.
- `xvc exp run template 'cmd ${{EXP_VALUE_1}} ${{EXP_VALUE_2}}' 1,2,3 4,5,6`: Runs a command by substituting values into the command string.
--input-param and --input-regex options are available to all three of these. Maybe instead of --input-param, it's better to use --update-param and --update-regex. Maybe we can merge these, but I don't like to have corner cases.
--keyword will set the KEYWORD portion of experiment names. By default, this is exp. Users may want to set it to a searchable name.
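A directory name of the form KEYWORD-RANDOMSTRING-TIMESTAMP could be generated as in the sketch below. The length of the random part (8 hex characters) and the timestamp format (UTC, `YYYYMMDDHHMMSS`) are assumptions; the proposal does not fix either.

```python
import secrets
import time


def experiment_name(keyword: str = "exp") -> str:
    """Build KEYWORD-RANDOMSTRING-TIMESTAMP (formats are assumptions)."""
    random_part = secrets.token_hex(4)  # 8 hexadecimal characters
    timestamp = time.strftime("%Y%m%d%H%M%S", time.gmtime())  # UTC
    return f"{keyword}-{random_part}-{timestamp}"
```

Putting the keyword first keeps experiments with the same keyword adjacent in a sorted directory listing and easy to find with a prefix search.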
The updated params and the commands that were run are stored in the .xvc-exp directory. It may also contain the exact script that was run.