Difficulty: Beginner
Estimated Time: 15-20 min

Experiments proliferate quickly in ML projects, where there are many parameters to tune and many possible variations of the code. DVC 2.0 introduces dvc experiments, a way to organize such projects and keep only what we ultimately need. DVC tracks experiments for you, so there's no need to commit each one to Git and your repo doesn't become polluted with them. You can discard experiments once they're no longer needed.

📖 See Experiment Management for more information on DVC's approach.

If you prefer to run locally, you can also run the commands in this scenario in a container:

docker run -it dvcorg/doc-katacoda:start-experiments

Experiments

Step 1

Running experiments

In the parameters and metrics scenario, we learned how to tune pipelines and compare their performance. However, as the number of parameters grows, it becomes infeasible to keep track of the changes through Git commits. In version 2.0, DVC introduced a new way of running and comparing experiments in a repository without checking them in to Git.

All the commands we'll see in this scenario are subcommands of dvc exp. Let's see the help text first:

dvc exp --help

The first command we'll use is dvc exp run. It's like dvc repro with added features for experiments, like changing the hyperparameters with command line options:

dvc exp run --set-param featurize.max_features=1500 \
            -S featurize.ngrams=2

The --set-param (or -S) flag sets parameter values from the command line, as a shortcut to editing params.yaml by hand.
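
To make the "shortcut" concrete, here is a rough sketch of the manual edit that the two -S options above stand in for. It works on a throwaway copy of params.yaml in a temporary directory; the file contents and starting values (500 and 1) are made up for illustration and are not taken from this scenario's project.

```shell
# Sketch only: a throwaway params.yaml with hypothetical starting values,
# not the real file from this scenario.
workdir="$(mktemp -d)"
cat > "$workdir/params.yaml" <<'EOF'
featurize:
  max_features: 500
  ngrams: 1
EOF

# Manual equivalent of
#   -S featurize.max_features=1500 -S featurize.ngrams=2
# Rewrite both values in place, keeping the original as params.yaml.bak.
sed -i.bak \
    -e 's/^\(  max_features:\) .*/\1 1500/' \
    -e 's/^\(  ngrams:\) .*/\1 2/' \
    "$workdir/params.yaml"

# Show the edited file.
cat "$workdir/params.yaml"
```

In a real project you would make the same edit to params.yaml in the repository and then run dvc exp run with no -S options. Passing the values with -S should give the same result in one step and avoids hand-written sed patterns that depend on the file's exact layout.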

Check that the featurize.max_features value has been updated in params.yaml:

git diff params.yaml

We can compare the experiment's parameters and metrics with those of the last commit using:

dvc exp diff