Some teams prefer to run their experiments on a single shared machine. This improves resource utilization, for example by giving everyone access to multiple GPUs and by centralizing all data storage.
With DVC, you can easily set up shared data storage on the server. This allows your team to store and share data for your projects efficiently, and to restore workspaces almost instantaneously, similar to what git checkout does for your code.
In this example we will see how two different users on the same host can share data with the help of local data storage. Both users and the data storage are located on the same machine, and no remote server or storage is involved.
Create a directory for the data
Since our filesystem (ext4) does not support reflinks, let's create and mount an XFS filesystem. We will see later why a filesystem that supports reflinks is important for DVC to operate efficiently.
Note: If your root filesystem already supports reflinks, you can skip steps 1-3.
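Whether a given directory sits on a reflink-capable filesystem can be probed before deciding to skip those steps. The following is a minimal sketch, assuming GNU coreutils (`cp --reflink=always` and `mktemp`); the function name `supports_reflink` is hypothetical and not part of DVC.

```shell
# Hypothetical helper (not a DVC command): probe reflink support by
# attempting a copy-on-write clone inside the given directory.
# Returns 0 if the clone succeeds, 1 if it fails, 2 if no probe file
# could be created (for example, the directory is not writable).
supports_reflink() {
  probe_dir="${1:-.}"
  probe_src="$(mktemp "$probe_dir/.reflink-probe.XXXXXX")" || return 2
  probe_dst="$probe_src.clone"
  echo "probe" > "$probe_src"
  if cp --reflink=always "$probe_src" "$probe_dst" 2>/dev/null; then
    rm -f "$probe_src" "$probe_dst"
    return 0
  fi
  rm -f "$probe_src" "$probe_dst"
  return 1
}

# Probe the current directory and report the result.
rc=0
supports_reflink "$PWD" || rc=$?
case "$rc" in
  0) echo "reflinks supported" ;;
  1) echo "reflinks not supported" ;;
  *) echo "could not create a probe file" ;;
esac
```

To test another location, pass it as the argument, for example `supports_reflink /var/local`. The probe only creates and removes two small temporary files.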
1. Create an image file big enough to hold the data and all the caches:

    cd /var/local/
    fallocate -l 20G data.img
    df -h /
2. Format it as an XFS filesystem with reflink enabled, and mount it:

    mkfs.xfs \
      -m reflink=1 \
      -L data \
      data.img
    mkdir data
    mount -o loop data.img data
3. (Optional) Make sure that it is mounted automatically on reboot:

    cat <<EOF >> /etc/fstab
    /var/local/data.img /var/local/data auto loop 0 0
    EOF
Declare the environment variable DATA that contains the path to the data directory:
    echo 'export DATA=/var/local/data' \
      >> ~/.bashrc
    source ~/.bashrc
    echo $DATA
Declare it for all the new users as well:
    echo 'export DATA=/var/local/data' \
      >> /etc/skel/.bashrc
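Once the variable is declared, it can be useful to confirm that it resolves to a real directory and to see which filesystem backs it. This is a minimal sketch, assuming GNU coreutils `stat`; the function name `check_data_dir` is hypothetical and only wraps standard shell tests.

```shell
# Hypothetical helper (not a DVC command): verify that DATA is set and
# points at a directory, then print the type of its backing filesystem.
check_data_dir() {
  if [ -z "${DATA:-}" ]; then
    echo "DATA is not set" >&2
    return 1
  fi
  if [ ! -d "$DATA" ]; then
    echo "DATA=$DATA is not a directory" >&2
    return 1
  fi
  # GNU stat: -f queries the filesystem, %T prints its type name.
  echo "DATA=$DATA fstype=$(stat -f -c %T "$DATA")"
}
```

Calling `check_data_dir` in a new login shell shows whether the `.bashrc` entry took effect, and the reported type can be compared against the filesystem you formatted.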