Difficulty: Intermediate
Estimated Time: 30-45 minutes

Projects usually have a central data storage, located somewhere in the cloud, that all parties involved in the project can access. It lets project members share the project's data, intermediate results, models, and so on. Sharing is done with the commands dvc push and dvc pull, which are analogous to git push and git pull.

In this example we assume a central data storage server that two different users can access through SSH. For the sake of the example, the central Git repository is located on this server too, but in general it can be anywhere; it doesn't have to be on the same server as the DVC data storage.

We will look at configuration and setup that enable and facilitate data sharing in such a scenario. Keep in mind, however, that this is just an example; other configurations are possible, depending on your situation.

SSH Remote DVC Storage

Step 1 of 4

Set up the central server

  1. Create user accounts:

    useradd -m -s /bin/bash user1

    echo user1:pass1 | chpasswd

    ls -al /home/user1/

    useradd -m -s /bin/bash user2

    echo user2:pass2 | chpasswd

    ls -al /home/user2/

  2. Create groups for Git and DVC:

    addgroup git-group

    adduser user1 git-group

    adduser user2 git-group

    addgroup dvc-group

    adduser user1 dvc-group

    adduser user2 dvc-group

  3. Create a bare Git repository for the project:

    git init --bare --shared /srv/project.git

    cd /srv/project.git

    ls -al

    chgrp -R git-group .

    chmod -R g+rws .

    ls -al

    cd -

  4. Create a directory for the DVC remote cache:

    mkdir /srv/project.cache

    cd /srv/project.cache/

    chgrp -R dvc-group .

    chmod -R g+rws .

    cd -
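
The g+rws mode used in steps 3 and 4 sets the setgid bit on the directories, so files created inside them are assigned the directory's group instead of the creating user's primary group. The following is a minimal sketch for checking that bit. The helper name check_shared_dir and the scratch directory are assumptions made for illustration; they are not part of Git or DVC, and the example deliberately works on a temporary directory rather than on /srv.

```shell
# check_shared_dir is a hypothetical helper, not part of Git or DVC.
# It reports whether the given path is a directory with the setgid bit set.
check_shared_dir() {
    dir="$1"
    # -d: path is a directory; -g: the setgid bit is set on it
    if [ -d "$dir" ] && [ -g "$dir" ]; then
        echo "ok: $dir"
    else
        echo "not shared: $dir"
        return 1
    fi
}

# Try it on a throwaway directory, using the same mode change as above
scratch="$(mktemp -d)"
chmod g+rws "$scratch"
check_shared_dir "$scratch"
```

On the server, the same helper could be pointed at /srv/project.git and /srv/project.cache after completing the steps above.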