Experiments proliferate quickly in ML projects where there are many
parameters to tune or other permutations of the code. DVC 2.0 introduces a new
way to organize such projects and only keep what we ultimately need with dvc
experiments. DVC can track experiments for you so there's no need to commit
each one to Git. This way your repo doesn't become polluted with all of them.
You can discard experiments once they're no longer needed.
📖 See Experiment Management for more information on DVC's approach.
If you prefer to work locally, you can also run the commands in this scenario in a container:
docker run -it dvcorg/doc-katacoda:start-experiments

Step 1
Running experiments
In the parameters and metrics scenario, we learned how to tune pipelines and compare their performance. However, as the number of parameters increases, it becomes infeasible to keep track of the changes through Git commits. In version 2.0, DVC introduced a new way of running and comparing experiments in repositories without checking them in to Git.
All the commands we'll see in this scenario are subcommands of dvc exp. Let's
see the help text first:
dvc exp --help
The first command we'll use is dvc exp run. It's like dvc repro with added
features for experiments, like changing the hyperparameters with command line
options:
dvc exp run --set-param featurize.max_features=1500 \
-S featurize.ngrams=2
The --set-param (or -S) flag sets the values for parameters as a shortcut
to editing params.yaml.
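To make the shortcut concrete, the sketch below performs the same kind of edit by hand on a throwaway parameters file. The file contents and the starting values (500 and 1) are assumptions made for illustration, not the tutorial project's actual params.yaml, and the sed-based edit only mimics the end result; it is not how DVC applies the change internally.

```shell
# Work in a scratch directory so the real project is untouched.
workdir=$(mktemp -d)

# Hypothetical params.yaml with assumed starting values.
cat > "$workdir/params.yaml" <<'EOF'
featurize:
  max_features: 500
  ngrams: 1
EOF

# Manual equivalent of: -S featurize.max_features=1500 -S featurize.ngrams=2
sed -e 's/^  max_features: .*/  max_features: 1500/' \
    -e 's/^  ngrams: .*/  ngrams: 2/' \
    "$workdir/params.yaml" > "$workdir/params.yaml.new"
mv "$workdir/params.yaml.new" "$workdir/params.yaml"

# Show the edited file.
cat "$workdir/params.yaml"
```

With the flag, the edit and the pipeline run happen in a single dvc exp run call instead of two separate steps.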
Check that the featurize.max_features value has been updated in params.yaml:
git diff params.yaml
We can compare the experiment results with:
dvc exp diff
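Because dvc exp run takes parameter values on the command line, a small shell loop can generate one call per combination when several values are worth trying. The sketch below is only an illustration: the value lists are arbitrary picks, and the loop prints each command rather than running it. To execute the commands, replace the echo with the command itself and run the loop inside a DVC project.

```shell
# Arbitrary example values for the two parameters used above.
commands=$(
  for max_features in 500 1000 1500; do
    for ngrams in 1 2; do
      # echo prints the command instead of running it.
      echo "dvc exp run -S featurize.max_features=$max_features -S featurize.ngrams=$ngrams"
    done
  done
)

# One line per parameter combination.
printf '%s\n' "$commands"
```

Even this small grid of two parameters yields six runs, which is the kind of growth that makes one Git commit per experiment impractical.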