Experiments proliferate quickly in ML projects where there are many
parameters to tune or other permutations of the code. DVC 2.0 introduces
experiments, a new way to organize such projects and keep only what we
ultimately need. DVC can track experiments for you so there's no need to commit
each one to Git. This way your repo doesn't become polluted with all of them.
You can discard experiments once they're no longer needed.
📖 See Experiment Management for more information on DVC's approach.
If you prefer to work locally, you can also run the commands in this scenario in a container:
docker run -it dvcorg/doc-katacoda:start-experiments
In the parameters and metrics scenario, we learned how to tune pipelines and compare their performance. However, as the number of parameters increases, it becomes infeasible to keep track of the changes through Git commits. In version 2.0, DVC introduced a new way of running and comparing experiments in repositories without checking them in to Git.
All the commands we'll see in this scenario are subcommands of
dvc exp. Let's
see the help text first:
dvc exp --help
The first command we'll use is
dvc exp run. It's like
dvc repro with added
features for experiments, like changing the hyperparameters with command-line options:
dvc exp run --set-param featurize.max_features=1500 \
            -S featurize.ngrams=2
The --set-param (-S) flag sets the values for parameters as a shortcut for editing params.yaml.
Check that the
featurize.max_features value has been updated in params.yaml:
git diff params.yaml
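To make the -S shortcut concrete, here is a minimal sketch of the manual file edit it stands in for, done on a throwaway file in a scratch directory. The starting values (100 and 1) are made up for illustration and may not match the example project, and set_param is a hypothetical helper, not part of DVC. Unlike dvc exp run, this only rewrites the file; it does not run the pipeline.

```shell
#!/bin/sh
# Work in a scratch directory so no real project files are touched.
workdir=$(mktemp -d)

# Stand-in params file: the keys mirror the ones used above, the values are made up.
cat > "$workdir/params.yaml" <<'EOF'
featurize:
  max_features: 100
  ngrams: 1
EOF

# Hypothetical helper: set the value of a key in a YAML file.
# This is a plain text substitution, not a YAML parser, so it only
# suits simple files where the key name appears once.
set_param() {
  file=$1 key=$2 value=$3
  sed "s/^\( *$key:\).*/\1 $value/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Keep a copy of the original so the change can be inspected afterwards.
cp "$workdir/params.yaml" "$workdir/params.orig.yaml"
set_param "$workdir/params.yaml" max_features 1500
set_param "$workdir/params.yaml" ngrams 2

# Show what changed; diff exits non-zero when the files differ, which is expected here.
diff "$workdir/params.orig.yaml" "$workdir/params.yaml" || true
```

With -S, the edit and the pipeline run happen in a single command, which is what makes it practical to try many values quickly.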
We can compare the experiment results with:
dvc exp diff
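When there are several parameter combinations to try, the dvc exp run calls can be scripted. The sketch below only prints the commands for a small grid instead of executing them, so it is safe to run anywhere. The grid values are arbitrary and print_grid is a hypothetical helper name.

```shell
#!/bin/sh
# Print one `dvc exp run` command per parameter combination (dry run only).
print_grid() {
  for max_features in 500 1000 1500; do
    for ngrams in 1 2; do
      echo "dvc exp run -S featurize.max_features=$max_features -S featurize.ngrams=$ngrams"
    done
  done
}

print_grid
```

Inside a DVC project, piping the output to sh (print_grid | sh) would run the experiments one after another.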