Difficulty: Advanced
Estimated Time: 45-60 minutes

Prerequisites: If you haven't done so already, first study these closely related examples:

Projects usually have a central data storage that all the parties involved in the project can access. It lets them share the project's data with the commands dvc push and dvc pull (which are similar to git push and git pull).

However, the commands dvc push and dvc pull support only a few storage types, such as SSH, S3, GCS, and HDFS, while there are many other cloud storage services out there (see, for example, the ones supported by rclone). Does this mean that we cannot use these storage types for sharing data with DVC? Not at all! With a couple of tricks we can still share DVC data through them.

In this example we will see how to achieve this with the help of an SSH storage and rsync. Yes, SSH is one of the storage types that DVC already supports, so normally we would not need to do this. We use it only as an example, because SSH is easy to use in an interactive tutorial. Once you understand how it works, it should be easy to apply the same approach to other storage types.
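As a rough sketch of the idea (not necessarily the exact commands used in the later steps), the sharing can be split in two: DVC pushes to and pulls from a plain local directory, and rsync mirrors that directory to and from the server. The directory path, the host name dvc-server.example, the function names, and the RUN variable below are all hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical locations; adjust them to your own setup.
LOCAL_STORAGE="${LOCAL_STORAGE:-$HOME/dvc-storage}"
REMOTE_STORAGE="${REMOTE_STORAGE:-first-user@dvc-server.example:/srv/dvc-storage}"

# Set RUN=echo to print the rsync command instead of executing it.
RUN="${RUN:-}"

# Copy the local DVC storage directory to the server (run after `dvc push`).
storage_upload() {
    $RUN rsync -az "$LOCAL_STORAGE/" "$REMOTE_STORAGE/"
}

# Copy the server's storage to the local directory (run before `dvc pull`).
storage_download() {
    $RUN rsync -az "$REMOTE_STORAGE/" "$LOCAL_STORAGE/"
}
```

Assuming the local directory has been registered as the default DVC remote (for example with dvc remote add -d), sharing data would then be dvc push followed by storage_upload, and fetching data would be storage_download followed by dvc pull. For a storage type that DVC does not support, rsync would be replaced by whatever tool can copy a directory to that storage.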

Synchronized DVC Storage

Step 1

Prepare

The setup of this example is similar to that of using an SSH server for data sharing.

  1. Set up the server:

    play setup-server.sh

  2. Click on this command to switch to the first user in another terminal tab: su - first-user

    Then run the setup commands:

    play setup-first-user.sh

  3. Click on this command to switch to the second user in another terminal tab: su - second-user

    Then run the setup commands:

    play setup-second-user.sh