Difficulty: Advanced
Estimated Time: 40-50 minutes

Some teams prefer to run their experiments on a single shared machine. This gives them better resource utilization: for example, they can use multiple GPUs and keep all data storage in one place.

With DVC, you can easily set up shared data storage on the server. This lets your team store and share project data efficiently and restore a workspace almost instantaneously, much like git checkout does for your code.

In this example we will see how two different users on the same host can share data through a local data storage. Both users and the data storage are located on the same machine; no remote server or storage is involved.

Shared Server

Step 1

Create a directory for the data

Since our filesystem (ext4) does not support reflinks, let's create and mount an XFS filesystem. We will see later why a filesystem that supports reflinks is important for DVC to operate efficiently.

Note: If your root filesystem already supports reflinks, you can skip steps 1-3 below.
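
If you are not sure whether a filesystem supports reflinks, one way to find out is to attempt a reflink copy and see whether it succeeds. The following is a minimal sketch, assuming GNU coreutils `cp` (which provides the `--reflink=always` option) and write access to the directory being tested. The function name `supports_reflink` is hypothetical and not part of DVC.

```shell
# Hypothetical helper (not part of DVC): returns 0 if the filesystem that
# holds directory $1 supports reflinks, 1 if it does not, and 2 if a test
# file could not be created there. Assumes GNU coreutils cp.
supports_reflink() {
    dir="${1:-.}"
    src=$(mktemp "$dir/reflink-test.XXXXXX" 2>/dev/null) || return 2
    echo test > "$src"
    # --reflink=always makes cp fail instead of falling back to a full copy
    if cp --reflink=always "$src" "$src.copy" 2>/dev/null; then
        result=0
    else
        result=1
    fi
    rm -f "$src" "$src.copy"
    return "$result"
}

status=0
supports_reflink /var/local || status=$?
case "$status" in
    0) echo "reflinks supported" ;;
    1) echo "reflinks not supported" ;;
    *) echo "could not create a test file" ;;
esac
```

Using `--reflink=always` rather than `--reflink=auto` matters here: with `auto`, `cp` silently falls back to a regular copy, so the command would succeed on any filesystem.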

  1. Create an image file big enough to hold the data and all the caches:

    cd /var/local/

    df -h /

    fallocate -l 20G data.img

    ls -lh

    df -h /

  2. Format it as an XFS filesystem with reflink enabled, and mount it:

    mkfs.xfs \
        -m reflink=1 \
        -L data \
        data.img
    

    mkdir data

    ls -lh

    mount -o loop data.img data

  3. Make sure that it is mounted automatically on reboot:

    cat <<EOF >> /etc/fstab
    /var/local/data.img  /var/local/data  auto  loop  0 0
    EOF
    

    grep data /etc/fstab

    mount -a

  4. Declare the environment variable DATA that contains the path to the data directory:

    echo 'export DATA=/var/local/data' \
        >> ~/.bashrc
    

    source ~/.bashrc

    echo $DATA

    Declare it for all users created from now on as well (files in /etc/skel are copied into each new user's home directory):

    echo 'export DATA=/var/local/data' \
        >> /etc/skel/.bashrc
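
Once the steps above are done, it can be useful to confirm that the data directory really is a separate mount with the expected filesystem type. The following is a rough sketch, assuming GNU `stat` and the `mountpoint` utility from util-linux are available. The function name `describe_data_dir` is hypothetical, and the fallback path is the one used in this example.

```shell
# Hypothetical helper: print the filesystem type of a directory and
# whether it is a mount point. Returns 1 if the directory does not exist.
describe_data_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "$dir: not a directory"
        return 1
    fi
    # stat -f -c %T prints the filesystem type in human-readable form
    fstype=$(stat -f -c %T "$dir")
    if mountpoint -q "$dir"; then
        mounted=yes
    else
        mounted=no
    fi
    echo "$dir: fstype=$fstype mountpoint=$mounted"
}

# Use $DATA if it is set, otherwise fall back to the path from this example
describe_data_dir "${DATA:-/var/local/data}" || true
```

If the setup worked, the reported type should be xfs and the directory should be a mount point. Anything else suggests that the image was not mounted or that `$DATA` points somewhere unexpected.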