Experiments proliferate quickly in ML projects with many parameters to tune or other permutations of the code. DVC 2.0 introduces a new way to organize such projects and keep only the experiments we ultimately need. DVC can track experiments for you, so there's no need to commit each one to Git and your repo doesn't become polluted with all of them. You can discard experiments once they're no longer needed.
For this scenario we have a new project that uses TensorFlow and the venerable MNIST dataset. The project has two artificial neural networks with several hyperparameters.
📖 See Experiment Management for more information on DVC's approach.
If you prefer to run locally, you can also run the commands in this scenario in a container:
docker run -it dvcorg/doc-katacoda:start-experiments
In the parameters and metrics scenario, we learned how to tune pipelines and compare their performance. However, as the number of parameters grows, it becomes infeasible to track the changes through Git commits. In version 2.0, DVC introduced a new way of running and comparing experiments in repositories without checking them in to Git.
All the commands we'll see in this scenario are subcommands of dvc exp. Let's see the help text first:
dvc exp --help
The first command we'll use is dvc exp run. It's like dvc repro with added features for experiments, such as changing the hyperparameters from the command line:
dvc exp run --set-param model.name=mlp
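For orientation, a params.yaml in a project like this might look roughly as follows; the keys and values here are illustrative guesses, not the actual file from this scenario:

```yaml
# Hypothetical params.yaml sketch (only model.name is taken from the
# command above; the other keys are made up for illustration).
model:
  name: mlp     # set by --set-param model.name=mlp
train:
  epochs: 10
```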
The --set-param (-S) flag sets the values for parameters as a shortcut to editing params.yaml by hand. We can see that the model.name parameter has been updated in params.yaml:
git diff params.yaml
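To see the mechanics of that diff in isolation, here is a self-contained sketch using a throwaway repository; the file contents and parameter values are hypothetical, but the git diff output has the same shape you'd see in the scenario:

```shell
# Hypothetical sketch: a throwaway Git repo with a params.yaml, to show
# how git diff reports a parameter change (values are illustrative).
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
printf 'model:\n  name: cnn\n' > params.yaml
git add params.yaml
git -c user.email=you@example.com -c user.name=you commit -qm 'initial params'
# change the parameter, as an experiment run would
printf 'model:\n  name: mlp\n' > params.yaml
git diff params.yaml
```

The diff shows the old value on a line starting with "-" and the new value on a line starting with "+".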
We can compare the experiment results with:
dvc exp diff