Okay, now that we've learned how to track data and models with DVC and how to version them with Git, the next question is:
- How can we use these artifacts outside of the project?
- How do I download a model to deploy it?
- How do I download a specific version of a model?
- How do I reuse datasets across different projects?
These questions tend to come up when you browse the files that DVC saves to remote storage and see names like
s3://dvc-public/remote/get-started/fb/89904ef053f04d64eafcc3d70db673 😱 instead of the original file names.
Let's learn how any DVC-tracked ML model, dataset, or file can be accessed:
- From the CLI
- From the Python API
- From another repository
If you prefer to work locally, you can also run the commands in this scenario in a container:
docker run -it dvcorg/doc-katacoda:start-accessing
We can download any file from a DVC repository:
dvc get https://github.com/iterative/dataset-registry get-started/data.xml
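To download a specific version rather than the latest one, dvc get can be pointed at a Git revision. Here is a minimal sketch, assuming the --rev and -o options of dvc get. The helper name get_version and its argument order are hypothetical, and the revision you pass must be a tag, branch, or commit that actually exists in the repository.

```shell
# Hypothetical helper (not part of DVC): download one tracked file
# at a given Git revision and save it under a chosen name.
# Arguments: $1 = Git repo URL, $2 = path inside the repo,
#            $3 = Git revision (tag, branch, or commit), $4 = output file
get_version() {
    dvc get "$1" "$2" --rev "$3" -o "$4"
}
```

For example, get_version https://github.com/iterative/dataset-registry get-started/data.xml REV data.xml.old, where REV stands for a revision you have looked up in that repository's Git history.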
Just for fun, let's try to download it with wget, straight from the remote storage:
storage="https://remote.dvc.org/dataset-registry"
path="a3/04afb96060aad90176268345e10355"
wget -O data.xml.1 "$storage/$path"
Check whether they are the same file:
diff data.xml data.xml.1
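The storage path used with wget looks like an MD5 checksum split after its first two characters. The sketch below rebuilds such a path from a local file so you can compare the two. It assumes that layout (inferred from the URLs in this scenario, and possibly different in other DVC versions) and that md5sum is available. The helper name cache_path is hypothetical.

```shell
# Hypothetical helper (not part of DVC): print "<first 2 hex chars>/<remaining hex chars>"
# of a file's MD5 checksum, mirroring the layout seen in the storage URLs above.
cache_path() {
    # md5sum prints "<checksum>  <filename>"; keep only the checksum,
    # then insert a slash after its first two characters.
    md5sum "$1" | cut -d' ' -f1 | sed 's|^\(..\)|\1/|'
}
```

If the layout assumption holds, cache_path data.xml should print the same value that was assigned to path for the wget download.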
Instead of downloading the data file directly, e.g., with aws s3 cp or wget, we are accessing it using a Git repo URL as an entry point or as a data/model registry.
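To see which storage location sits behind a file in the registry without downloading it, dvc get can print the URL instead. This is a small sketch that assumes your DVC version supports the --show-url option. The helper name show_storage_url is hypothetical.

```shell
# Hypothetical helper (not part of DVC): print the remote storage URL
# of a tracked file instead of downloading it.
# Arguments: $1 = Git repo URL, $2 = path inside the repo
show_storage_url() {
    dvc get --show-url "$1" "$2"
}
```

For example, show_storage_url https://github.com/iterative/dataset-registry get-started/data.xml resolves the Git repo URL and file path to the location that a direct download would have to target.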
By the way, we haven't initialized DVC in the current directory yet. dvc get doesn't need an initialized project.
Let's initialize DVC now.