Difficulty: Intermediate
Estimated Time: 45-60 min

This tutorial builds a model that classifies images of handwritten digits (0 to 9), using MNIST as the dataset. The focus is on using DVC to version the data pipeline and on the benefits this brings to the workflow.

MNIST

Step 1

Setup

  1. Create a directory for the project and initialize it:

    mkdir mnist

    cd mnist

    git init

    dvc init

    tree -a -I .git

  2. Create and configure local data storage:

    mkdir /root/dvc-storage

    dvc remote add --default \
        storage /root/dvc-storage

    dvc remote list

    cat .dvc/config
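As a quick sanity check on step 2, the sketch below looks for the remote in a DVC config file instead of reading the `cat` output by eye. The helper name `has_remote` is hypothetical. It assumes the config stores each remote under a section header containing `remote "<name>"`, and it only greps for that header and for the storage path, so treat it as a rough check, not a parser.

```shell
# Hypothetical helper: succeed if the config file has a section for the
# named remote and mentions the expected storage path.
# Assumption: remotes appear under a header containing: remote "<name>"
has_remote() {
    config_file=$1
    remote_name=$2
    remote_path=$3
    # A missing config file means no remote is configured.
    [ -f "$config_file" ] || return 1
    grep -qF "remote \"$remote_name\"" "$config_file" &&
        grep -qF "$remote_path" "$config_file"
}

# Usage from the project root, with the names used in step 2.
if has_remote .dvc/config storage /root/dvc-storage; then
    echo "remote is configured"
else
    echo "remote is missing from .dvc/config"
fi
```

The check is deliberately loose: it does not confirm that the remote is also the default one, which `dvc remote list` and the `[core]` section of the config show.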

  3. Add the code and create other directories:

    mkdir code data metrics

    ls /usr/local/share/

    tar -C /tmp -xzvf \
        /usr/local/share/mnist-example-code.tgz

    tree /tmp/mnist-example-code/

    cp /tmp/mnist-example-code/SVM/*.py code/

    tree

  4. Create a virtualenv and install the requirements:

    virtualenv -p python3 .env

    echo .env/ >> .gitignore

    echo __pycache__/ >> .gitignore

    source .env/bin/activate

    cat << EOF > requirements.txt
    numpy==1.15.4
    pandas==0.23.4
    python-dateutil==2.7.5
    pytz==2018.7
    scikit-learn==0.20.2
    scipy==1.2.0
    six==1.12.0
    sklearn==0.0
    EOF

    cat requirements.txt

    pip install -r requirements.txt
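The `echo ... >> .gitignore` commands in step 4 append a new line every time they run, so repeating the setup leaves duplicate entries. A minimal sketch of an idempotent alternative follows; `ignore_once` is a hypothetical helper name, and the demo writes to a scratch file so it does not touch the project's real `.gitignore`.

```shell
# Hypothetical helper: append a pattern to an ignore file only if that
# exact line is not already present.
ignore_once() {
    pattern=$1
    file=${2:-.gitignore}   # defaults to .gitignore in the current directory
    touch "$file"
    # -x matches whole lines, -F treats the pattern as a literal string.
    grep -qxF -- "$pattern" "$file" || printf '%s\n' "$pattern" >> "$file"
}

# Demo on a scratch file.
demo_file=$(mktemp)
ignore_once .env/ "$demo_file"
ignore_once __pycache__/ "$demo_file"
ignore_once .env/ "$demo_file"   # already present, nothing is appended
cat "$demo_file"
rm -f "$demo_file"
```

In the project itself, calling `ignore_once .env/` and `ignore_once __pycache__/` from the project root would have the same effect as the two `echo` commands on a first run.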

  5. Commit progress to Git:

    git status -s

    git add .

    git commit -m "Initialized project"
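Because `git add .` stages everything that is not ignored, it can be worth confirming that the virtualenv stayed out of the repository. The sketch below uses `git ls-files` for that; `is_tracked` is a hypothetical helper name, and the check assumes it is run against the project directory created above.

```shell
# Hypothetical helper: succeed if Git tracks at least one file under
# the given path inside the given repository directory.
is_tracked() {
    repo_dir=$1
    path=$2
    # ls-files prints tracked files only; grep -q . succeeds on any output.
    git -C "$repo_dir" ls-files -- "$path" | grep -q .
}

# Usage from the project root; skipped when not inside a Git work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    if is_tracked . .env; then
        echo ".env is tracked; check .gitignore"
    else
        echo ".env is not tracked"
    fi
fi
```

If `.env` turns out to be tracked, `git rm -r --cached .env` removes it from the index without deleting the files on disk.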