`xvc experiment`

The gist of experiments is to compare them, so we need a `diff` facility to compare inputs and results across runs.

We don't have to limit this comparison to defined outputs in the pipeline. Any file changed between two runs can be diffed. 

There can be three types of diffs: 
- Unstructured diffs: For binary files that we don't recognize. Only the content digest is reported.
- Structured diffs: For file formats that we can parse, such as JSON or YAML, we can report the individual values that differ across runs.
- Text diffs: For source code files that may have led to changes in other files.
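
The three diff types above can be sketched as follows. This is only an illustration, not Xvc's implementation: the function names are hypothetical, SHA-256 stands in for whatever digest Xvc actually uses, and the structured diff assumes the file has already been parsed into nested dictionaries.

```python
import difflib
import hashlib


def unstructured_diff(old_bytes, new_bytes):
    """Report only content digests (SHA-256 here is a stand-in digest)."""
    return {
        "old": hashlib.sha256(old_bytes).hexdigest(),
        "new": hashlib.sha256(new_bytes).hexdigest(),
    }


def flatten(value, prefix=""):
    """Flatten nested dicts into {"a.b.c": leaf_value} pairs."""
    if isinstance(value, dict):
        flat = {}
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else str(key)
            flat.update(flatten(child, child_prefix))
        return flat
    return {prefix: value}


def structured_diff(old, new):
    """Report leaf values that differ; a key missing on one side shows as None."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = {}
    for key in sorted(old_flat.keys() | new_flat.keys()):
        before, after = old_flat.get(key), new_flat.get(key)
        if before != after:
            changes[key] = {"old": before, "new": after}
    return changes


def text_diff(old_text, new_text, name="file"):
    """Return unified diff lines, similar to what Git prints."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
            lineterm="",
        )
    )
```

For example, `structured_diff({"params": {"lr": 0.1}}, {"params": {"lr": 0.2}})` reports a single change under the key `params.lr`.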

The workflow is as follows: 

- User has a bunch of files, source, params, data, model, etc. 
- User modifies some of these manually, e.g., by updating the source code. 
- User modifies some of these with the `xvc exp run --input-param` command. 
- User runs a command (or pipeline) on the files. 
- Xvc clones/rechecks/copies files from the original location to a `.xvc/exp/KEYWORD-RANDOMSTRING-TIMESTAMP` directory. 
- Xvc links the original cache.
- Xvc creates a `.xvc-exp` directory to store experiment specific data.
- Xvc modifies the files with the given modification option.
    - `--input-param params.yaml params.my-param 123,124,135` creates 3 experiments, each changing `params.yaml::params.my-param` to a given value.
- Xvc runs the given command (or pipeline) in the directory.
- Xvc stores the updated artifacts in the common cache, symlinking the results.
- User asks for results diffed from the original.
- Xvc compares each of the directories for the changed files.
- Xvc shows the digest strings of unstructured files.
- Xvc shows the changed values of structured files.
- Xvc shows text file diffs similar to Git.

All results must be reported in JSON. Tables may be built from this JSON.  
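
As a rough sketch of how a table could be derived from a JSON report: the report layout below (`experiments`, `kind`, `changes`, `old`, `new`) is a hypothetical shape chosen for illustration, not a defined Xvc schema.

```python
import json


def build_report(experiments):
    """Serialize per-experiment changes to JSON (hypothetical layout)."""
    return json.dumps({"experiments": experiments}, indent=2, sort_keys=True)


def report_to_rows(report_json):
    """Turn the JSON report into flat table rows:
    (experiment, file, key, old value, new value)."""
    report = json.loads(report_json)
    rows = []
    for exp_name, files in sorted(report["experiments"].items()):
        for path, entry in sorted(files.items()):
            for key, change in sorted(entry["changes"].items()):
                rows.append((exp_name, path, key, change["old"], change["new"]))
    return rows
```

Because the table is built only from the JSON, any other consumer (a script, a notebook, a dashboard) can read the same report without going through the table renderer.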

The second facility `xvc exp` provides is to modify structured files quickly for each experiment. 

`xvc exp run --input-param file.yaml dict.key value1,value2,value3` will parse `file.yaml`, update `dict.key` with `value1` and run an experiment, update with `value2` and run another, update with `value3` and run another. 
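
A minimal sketch of the update step, assuming the file has already been parsed into nested dictionaries. The helper names are hypothetical, and how values should be typed (here: try JSON, fall back to a string) is an assumption the text above does not settle.

```python
import copy
import json


def parse_value(raw):
    """Interpret a value as JSON if possible, else keep it as a string (assumption)."""
    try:
        return json.loads(raw)
    except ValueError:
        return raw


def set_dotted(document, dotted_key, value):
    """Return a copy of document with the leaf at dotted_key replaced."""
    updated = copy.deepcopy(document)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node[part]  # raises KeyError if the path does not exist
    node[leaf] = value
    return updated


def variants(document, dotted_key, spec):
    """Build one modified document per comma-separated value."""
    return [
        set_dotted(document, dotted_key, parse_value(raw))
        for raw in spec.split(",")
    ]
```

Each returned document would correspond to one experiment, and the original document is left untouched.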

`xvc exp run --input-param file.json dict.key '0;5;100'` treats the value as `start;step;end` and will run experiments with `0,5,10,15,20,...,100` (inclusive).
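
The range form could be expanded as in the sketch below. It reads the three fields as `start;step;end`, which is inferred from the example, and uses `Decimal` so that fractional steps such as `0.1` do not accumulate floating-point drift. The function name is hypothetical.

```python
from decimal import Decimal


def expand_range(spec):
    """Expand 'start;step;end' into an inclusive list of Decimal values."""
    start, step, end = (Decimal(part) for part in spec.split(";"))
    if step <= 0:
        raise ValueError("step must be positive")
    values = []
    current = start
    while current <= end:
        values.append(current)
        current += step
    return values
```

With this reading, `expand_range("0;5;100")` yields 21 values, from 0 to 100 in steps of 5.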

Files to be modified are JSON, YAML 1.2 and TOML files. (Anything serde can read/write is possible in theory.)

We can extend this functionality to regex. `--input-regex file.txt 'my_var = (.*)' '0;0.1;1'` updates `$1` in the regex match with each of the values. 
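
One way to replace only the captured group while keeping the rest of the match is sketched below. The function name is hypothetical, and the sketch assumes the group takes part in every match.

```python
import re


def replace_group(text, pattern, value, group=1):
    """Replace the given capture group in every match of pattern with value."""

    def substitute(match):
        whole = match.group(0)
        offset = match.start()
        start, end = match.span(group)
        # Keep everything in the match except the captured span.
        return whole[: start - offset] + value + whole[end - offset:]

    return re.sub(pattern, substitute, text)
```

For instance, `replace_group("my_var = 0.5", r"my_var = (.*)", "0.1")` returns `"my_var = 0.1"`.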

We can also use `--command-template` for this. `xvc exp run --command-template 'python train.py ${{EXP_VALUE}}' '0;0.2;10'` will run `python train.py` with the parameters 0, 0.2, 0.4, ..., 10 in separate experiments. 
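
Template substitution could look like the sketch below. The `${{NAME}}` placeholder syntax comes from the text above; the function name and the choice to shell-quote each value are assumptions.

```python
import shlex


def render_command(template, values):
    """Replace each ${{NAME}} placeholder with its shell-quoted value."""
    command = template
    for name, value in values.items():
        placeholder = "${{" + name + "}}"
        command = command.replace(placeholder, shlex.quote(str(value)))
    return command
```

Rendering `'python train.py ${{EXP_VALUE}}'` with `{"EXP_VALUE": "0.2"}` gives `python train.py 0.2`; one command is rendered per experiment value.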

If there is more than one `--input-param`, `--input-regex`, or `--command-template` option, we build all combinations (the Cartesian product) of their values. `xvc exp run --input-param file.yaml dict.key 1,2,3 --input-param another.yaml another.key 5,6,7` will run 9 experiments. 
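
Combining several value lists amounts to a Cartesian product, which can be sketched with `itertools.product`. The `file::key` labels and the function name are illustrative only.

```python
from itertools import product


def experiment_grid(axes):
    """Given {target: [values]}, return one {target: value} dict per experiment."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in product(*(axes[name] for name in names))
    ]


grid = experiment_grid(
    {
        "file.yaml::dict.key": ["1", "2", "3"],
        "another.yaml::another.key": ["5", "6", "7"],
    }
)
```

Here `grid` holds 3 × 3 = 9 entries, one per experiment to run.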

There may be three subcommands for `xvc exp run`. 

- `xvc exp run pipeline --name`: (`xvcerp`) Runs a pipeline command with the given parameters. (`xvc pipeline run --name`)
- `xvc exp run command 'cmd'`: Runs a generic command as an experiment.
- `xvc exp run template 'cmd ${{EXP_VALUE_1}} ${{EXP_VALUE_2}}' 1,2,3 4,5,6`: Runs a command by substituting values into the command string.

The `--input-param` and `--input-regex` options are available to all three of these. Maybe `--update-param` and `--update-regex` are better names for them. Maybe we can merge the two options, but I don't like to have corner cases. 

`--keyword` will set the `KEYWORD` portion of experiment names. By default, this is `exp`. Users may want to set it to a searchable name. 
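
A sketch of how a `KEYWORD-RANDOMSTRING-TIMESTAMP` name might be generated. The format of the random string (8 hex characters) and of the timestamp (Unix seconds) are assumptions; the proposal does not fix either.

```python
import secrets
import time


def experiment_dir_name(keyword="exp", now=None):
    """Build KEYWORD-RANDOMSTRING-TIMESTAMP (random and time formats are assumed)."""
    timestamp = int(time.time() if now is None else now)
    random_part = secrets.token_hex(4)  # 8 hex characters
    return f"{keyword}-{random_part}-{timestamp}"
```

The random part keeps names unique when several experiments start within the same second, while the keyword keeps them searchable.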

The updated params and the commands that were run are stored in the `.xvc-exp` directory. It may also contain the exact script that was run. 
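
Recording this data could be as simple as the sketch below. The file name `run.json` and its fields are hypothetical; only the `.xvc-exp` directory name comes from the proposal.

```python
import json
from pathlib import Path


def write_experiment_record(exp_dir, updates, command):
    """Store the applied updates and the command under <exp_dir>/.xvc-exp/."""
    meta_dir = Path(exp_dir) / ".xvc-exp"
    meta_dir.mkdir(parents=True, exist_ok=True)
    record_path = meta_dir / "run.json"  # hypothetical file name
    record = {"updates": updates, "command": command}
    record_path.write_text(json.dumps(record, indent=2))
    return record_path
```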

