Prepare the dataset

The smile dataset we will use, SMILEsmileD, is available on GitHub.

We will use the dataset in repos/SMILEsmileD/SMILEs/, which contains two folders:

  • negatives
  • positives

Check the number of images in each set:

ls -1 repos/SMILEsmileD/SMILEs/negatives/negatives7/ | wc -l
ls -1 repos/SMILEsmileD/SMILEs/positives/positives7/ | wc -l
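The same count can be done from Python if you prefer to script the whole preparation. This is a minimal sketch. The folder paths come from the commands above. The list of image extensions is an assumption: unlike `ls | wc -l`, it skips non-image files and subfolders.

```python
import os

# Paths taken from the ls commands above; adjust them to your checkout.
NEGATIVES = "repos/SMILEsmileD/SMILEs/negatives/negatives7"
POSITIVES = "repos/SMILEsmileD/SMILEs/positives/positives7"

# Assumed set of image extensions; other files are not counted.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp", ".pgm", ".ppm")


def count_images(directory):
    """Count regular files in `directory` whose names end in an image extension."""
    with os.scandir(directory) as entries:
        return sum(
            1
            for entry in entries
            if entry.is_file() and entry.name.lower().endswith(IMAGE_EXTENSIONS)
        )


if __name__ == "__main__":
    n_neg = count_images(NEGATIVES)
    n_pos = count_images(POSITIVES)
    print(f"negatives: {n_neg}, positives: {n_pos}")
    if n_pos:
        print(f"negative/positive ratio: {n_neg / n_pos:.2f}")
```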

The dataset contains ~3000 positive images and ~9000 negative images. To avoid training a model biased toward the majority (negative) class, we want roughly the same number of positive and negative images.
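The balancing arithmetic is simple. The positive set has to grow by a factor k = ⌈n_negative / n_positive⌉, which gives 3 for the approximate counts above. The sketch below computes k and adds an `is_balanced` check for use after augmentation. Both function names and the 10% default tolerance are illustrative assumptions, not values from the original tutorial. Use the real counts from the `ls` commands rather than the rounded figures.

```python
def augmentation_factor(n_positive, n_negative):
    """Smallest integer k >= 1 such that k * n_positive >= n_negative."""
    if n_positive <= 0:
        raise ValueError("need at least one positive image")
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, (n_negative + n_positive - 1) // n_positive)


def is_balanced(n_a, n_b, tolerance=0.1):
    """True if the two class sizes differ by at most `tolerance` of the larger one."""
    larger = max(n_a, n_b)
    if larger == 0:
        return True
    return (larger - min(n_a, n_b)) / larger <= tolerance


print(augmentation_factor(3000, 9000))  # 3
print(is_balanced(3000, 9000))          # False
print(is_balanced(3 * 3000, 9000))      # True
```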

To balance the classes, we augment the positive images with the OpenMV augmentation script in repos/openmv/tools/. Run with --count 3, it roughly triples the number of positive examples, bringing them close to the number of negatives:

Create the output folder if it does not already exist:

mkdir -p repos/SMILEsmileD/SMILEs/positives/positives_aug/
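If you drive the preparation from Python instead of the shell, `os.makedirs(..., exist_ok=True)` behaves like `mkdir -p`. It creates any missing parent folders and does not fail when the folder already exists. The helper name `ensure_dir` is hypothetical.

```python
import os

# Same output folder as the mkdir command above.
AUG_DIR = "repos/SMILEsmileD/SMILEs/positives/positives_aug"


def ensure_dir(path):
    """Create `path` and any missing parents; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    ensure_dir(AUG_DIR)
```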

Augment the dataset:

python2 repos/openmv/tools/ --input repos/SMILEsmileD/SMILEs/positives/positives7/ --output repos/SMILEsmileD/SMILEs/positives/positives_aug/ --count 3
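This section does not say which transformations the OpenMV script applies. The following is a purely illustrative sketch, not that script's actual method. It shows two common augmentations that plausibly preserve the smile/no-smile label: a horizontal flip and a small translation. The image is a grayscale list of pixel rows, which keeps the example dependency-free; a real pipeline would load and save files with an imaging library. The names `hflip`, `shift`, and `augment` and the `max_shift=2` default are hypothetical.

```python
import random


def hflip(img):
    """Mirror a grayscale image (list of rows) left to right."""
    return [row[::-1] for row in img]


def shift(img, dx, dy):
    """Translate by (dx, dy) pixels; uncovered pixels copy the nearest edge pixel."""
    h, w = len(img), len(img[0])
    return [
        [img[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)] for x in range(w)]
        for y in range(h)
    ]


def augment(img, count, max_shift=2, rng=None):
    """Return `count` variants of `img`, each randomly flipped and shifted."""
    if rng is None:
        rng = random.Random()
    variants = []
    for _ in range(count):
        out = hflip(img) if rng.random() < 0.5 else img
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        variants.append(shift(out, dx, dy))
    return variants


if __name__ == "__main__":
    image = [[0, 50, 100], [150, 200, 250]]
    # count=3 mirrors the --count 3 used in the command above.
    for variant in augment(image, count=3, rng=random.Random(0)):
        print(variant)
```

Seeding the generator with `random.Random(0)` makes the variants reproducible between runs, which helps when comparing models trained on the augmented set.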
