# Training new neural network models

Note: Kenneth Jung noticed that the model definitions are slightly
different from the pre-trained models.
For more information, see issues
[#351](https://github.com/cmusatyalab/openface/issues/351) and
[#349](https://github.com/cmusatyalab/openface/issues/349).

---

We have also released our deep neural network (DNN)
training infrastructure to promote an open ecosystem and enable quicker
bootstrapping for new research and development.

There is a distinction between training the DNN model for feature representation
and training a model for classifying people with the DNN model.
If you're interested in creating a new classifier,
see [Demo 3](http://cmusatyalab.github.io/openface/demo-3-classifier/).

This page is for advanced users interested in training a new DNN model,
which should only be done with large datasets (>500k images) to improve
the feature representation.

*Warning:* Training is expensive in both computation and memory,
and takes about a day on our Tesla K40 GPU.

A rough overview of training is:

## 1. Create a raw image directory

Create a directory for your raw images so that images from different
people are in different subdirectories. The names of the labels or
images do not matter, and each person can have a different number of images.
The images should be formatted as `jpg` or `png` and have
a lowercase extension.

```
$ tree data/mydataset/raw
person-1
├── image-1.jpg
├── image-2.png
...
└── image-p.png

...

person-m
├── image-1.png
├── image-2.jpg
...
└── image-q.png
```

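As a quick sanity check before preprocessing, the layout above can be validated with a short script. The helper below is a hypothetical illustration (not part of OpenFace) that counts each person's images and flags files without a lowercase `jpg` or `png` extension:

```python
import os

def check_raw_dataset(root):
    """Walk root (one subdirectory per person) and return a
    {person: image_count} dict, warning about any file that is
    not a lowercase .jpg or .png."""
    counts = {}
    for person in sorted(os.listdir(root)):
        person_dir = os.path.join(root, person)
        if not os.path.isdir(person_dir):
            continue
        counts[person] = 0
        for fname in os.listdir(person_dir):
            ext = os.path.splitext(fname)[1]
            if ext in ('.jpg', '.png'):
                counts[person] += 1
            else:
                print('warning: unexpected file', os.path.join(person, fname))
    return counts
```

For example, `check_raw_dataset('data/mydataset/raw')` returns the per-person image counts for the tree shown above.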
## 2. Preprocess the raw images

If you plan to compute LFW accuracies, remove all LFW identities from your dataset.
We provide an example script that does this with string matching in
[remove-lfw-names.py](https://github.com/cmusatyalab/openface/blob/master/data/casia-facescrub/remove-lfw-names.py).

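Conceptually, that script matches directory names against the list of LFW identities and deletes the matches. A minimal sketch of the idea, assuming a plain-text file of LFW names (one per line) rather than the linked script's actual input format:

```python
import os
import shutil

def remove_lfw_identities(dataset_dir, lfw_names_file):
    """Delete any person directory whose name appears in the LFW
    name list, so LFW evaluation is not contaminated by identities
    seen during training."""
    with open(lfw_names_file) as f:
        lfw_names = {line.strip() for line in f if line.strip()}
    removed = []
    for person in os.listdir(dataset_dir):
        path = os.path.join(dataset_dir, person)
        if person in lfw_names and os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(person)
    return sorted(removed)
```
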
Align the raw images, changing `8` to however many
separate processes you want to run:
`for N in {1..8}; do ./util/align-dlib.py <path-to-raw-data> align outerEyesAndNose <path-to-aligned-data> --size 96 & done`.

Prune out directories with fewer than 3 images per class with
`./util/prune-dataset.py <path-to-aligned-data> --numImagesThreshold 3`.

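`prune-dataset.py` is the tool to use here; the pruning it performs amounts to deleting class directories below the threshold, roughly as in this sketch (an illustration, not the actual script):

```python
import os
import shutil

def prune_dataset(aligned_dir, num_images_threshold=3):
    """Remove class directories containing fewer than
    num_images_threshold aligned images."""
    pruned = []
    for person in os.listdir(aligned_dir):
        person_dir = os.path.join(aligned_dir, person)
        if not os.path.isdir(person_dir):
            continue
        if len(os.listdir(person_dir)) < num_images_threshold:
            shutil.rmtree(person_dir)
            pruned.append(person)
    return sorted(pruned)
```

Pruning matters because the triplet loss needs multiple images per identity to form positive pairs; classes with one or two images contribute little.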
<!-- Split the dataset into `train` and `val` subdirectories -->
<!-- with `./util/create-train-val-split.py <path-to-aligned-data> <validation-ratio>`. -->
<!-- One option could be to have all of your data in `train` and -->
<!-- then validate the model with the LFW experiment. -->

## 3. Train the model

Run [training/main.lua](https://github.com/cmusatyalab/openface/blob/master/training/main.lua) to start training the model.
Edit the dataset options in [training/opts.lua](https://github.com/cmusatyalab/openface/blob/master/training/opts.lua) or
pass them as command-line parameters.
This will output the loss and in-progress models to `training/work`.

The GPU memory usage is determined by the `-peoplePerBatch` and
`-imagesPerPerson` parameters, which default to 15 and 20 respectively
and consume about 12GB of memory.
These determine an upper bound on the mini-batch size and
should be reduced for lower GPU memory consumption.

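As a concrete example of that bound, using only the defaults quoted above: the largest possible mini-batch is the product of the two parameters, so halving either one halves the bound.

```python
def max_batch_size(people_per_batch=15, images_per_person=20):
    """Upper bound on the mini-batch size implied by the
    -peoplePerBatch and -imagesPerPerson options: at most
    people_per_batch * images_per_person images are sampled."""
    return people_per_batch * images_per_person

# Defaults: 15 people x 20 images = at most 300 images per mini-batch.
```

Reducing either parameter is the first knob to turn when the defaults exceed your GPU's memory.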
Warning: Metadata about the on-disk data is cached in
`training/work/trainCache.t7` and assumes
the data directory does not change.
If your data directory changes, delete this
file so it will be regenerated.

### Stopping and starting training

Models are saved in the `work` directory after every epoch.
If the training process is killed, it can be resumed from
the last saved model with the `-retrain` option.
Also pass a different `-manualSeed` so a different image
sequence is sampled, and set `-epochNumber` correctly.

## 4. Analyze training

Visualize the loss with [training/plot-loss.py](https://github.com/cmusatyalab/openface/blob/master/training/plot-loss.py).
Install the Python dependencies from
[training/requirements.txt](https://github.com/cmusatyalab/openface/blob/master/training/requirements.txt)
with `pip2 install -r requirements.txt`.