
# Training new neural network models

Note: Kenneth Jung noticed that the model definitions differ slightly
from the pre-trained models.
For more information, see issues
[#351](https://github.com/cmusatyalab/openface/issues/351) and
[#349](https://github.com/cmusatyalab/openface/issues/349).

---

We have also released our deep neural network (DNN)
training infrastructure to promote an open ecosystem and enable quicker
bootstrapping for new research and development.

There is a distinction between training the DNN model for feature representation
and training a model for classifying people with the DNN model.
If you're interested in creating a new classifier,
see [Demo 3](http://cmusatyalab.github.io/openface/demo-3-classifier/).
This page is for advanced users interested in training a new DNN model.
Training should be done with large datasets (>500k images) to improve the
feature representation.

*Warning:* Training is computationally expensive and memory intensive,
and takes a day on our Tesla K40 GPU.

A rough overview of training is:

## 1. Create raw image directory
Create a directory for your raw images so that images from different
people are in different subdirectories. The names of the labels or
images do not matter, and each person can have a different number of images.
The images should be formatted as `jpg` or `png` and have
a lowercase extension.

```
$ tree data/mydataset/raw
person-1
├── image-1.jpg
├── image-2.png
...
└── image-p.png

...

person-m
├── image-1.png
├── image-2.jpg
...
└── image-q.png
```
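The layout rules above can be checked before preprocessing starts. The following is a minimal sketch and not part of OpenFace: the helper name `summarize_raw_dir` is hypothetical, and it assumes only the rules stated above (one subdirectory per person, `jpg` or `png` files with a lowercase extension).

```python
from pathlib import Path

# Extensions accepted by the layout described above (lowercase only).
ALLOWED_EXTENSIONS = {".jpg", ".png"}


def summarize_raw_dir(raw_dir):
    """Return (counts, rejected) for a raw image directory.

    counts maps each person subdirectory name to its number of accepted
    images; rejected lists files whose extension is not a lowercase
    jpg or png. Hypothetical helper, not part of OpenFace.
    """
    counts = {}
    rejected = []
    for person_dir in sorted(Path(raw_dir).iterdir()):
        if not person_dir.is_dir():
            continue  # stray files at the top level are ignored
        counts[person_dir.name] = 0
        for image in sorted(person_dir.iterdir()):
            if not image.is_file():
                continue
            if image.suffix in ALLOWED_EXTENSIONS:
                counts[person_dir.name] += 1
            else:
                rejected.append(image)
    return counts, rejected
```

Calling `summarize_raw_dir("data/mydataset/raw")` returns the per-person image counts together with any files that would need to be renamed or converted.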


## 2. Preprocess the raw images
If you plan to compute LFW accuracies, remove all LFW identities from your dataset.
We provide an example script doing this with string matching in
[remove-lfw-names.py](https://github.com/cmusatyalab/openface/blob/master/data/casia-facescrub/remove-lfw-names.py).
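To illustrate the idea of removing identities by string matching, here is a minimal sketch. It is not the linked script: the function names are hypothetical, and the normalization rule (lowercase, keep only letters and digits) is an assumption made for illustration.

```python
import re


def normalize_name(name):
    """Lowercase a name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_lfw_overlap(dataset_names, lfw_names):
    """Return dataset identities whose normalized name matches an LFW identity.

    Hypothetical helper; the matching rule is an assumption, not the
    behavior of remove-lfw-names.py.
    """
    lfw_keys = {normalize_name(name) for name in lfw_names}
    return [name for name in dataset_names if normalize_name(name) in lfw_keys]
```

For example, `find_lfw_overlap(["jane-q-example", "John Sample"], ["Jane_Q_Example"])` flags only `jane-q-example`. String matching can miss spelling variants, so the flagged list is worth reviewing by hand.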

Align the images with `util/align-dlib.py`, changing `8` to however many
separate processes you want to run:
`for N in {1..8}; do ./util/align-dlib.py <path-to-raw-data> align outerEyesAndNose <path-to-aligned-data> --size 96 & done`.

Prune out directories with fewer than 3 images per class with
`./util/prune-dataset.py <path-to-aligned-data> --numImagesThreshold 3`.
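Before pruning, it can help to preview which classes fall below the threshold. The sketch below is a dry run that deletes nothing. It is not the implementation of `prune-dataset.py`: the helper name is hypothetical, and it counts every regular file, assuming the aligned directory holds only images.

```python
from pathlib import Path


def dirs_below_threshold(aligned_dir, num_images_threshold=3):
    """List class directories holding fewer than num_images_threshold files.

    Hypothetical dry-run helper; nothing is removed from disk.
    """
    too_small = []
    for class_dir in sorted(Path(aligned_dir).iterdir()):
        if not class_dir.is_dir():
            continue
        num_images = sum(1 for path in class_dir.iterdir() if path.is_file())
        if num_images < num_images_threshold:
            too_small.append(class_dir.name)
    return too_small
```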

<!-- Split the dataset into `train` and `val` subdirectories -->
<!-- with `./util/create-train-val-split.py <path-to-aligned-data> <validation-ratio>`. -->
<!-- One option could be to have all of your data in `train` and -->
<!-- then validate the model with the LFW experiment. -->

## 3. Train the model
Run [training/main.lua](https://github.com/cmusatyalab/openface/blob/master/training/main.lua) to start training the model.
Edit the dataset options in [training/opts.lua](https://github.com/cmusatyalab/openface/blob/master/training/opts.lua) or
pass them as command-line parameters.
This will output the loss and in-progress models to `training/work`.
The GPU memory usage is determined by the `-peoplePerBatch` and
`-imagesPerPerson` parameters, which default to 15 and 20, respectively,
and consume about 12 GB of memory.
These determine an upper bound on the mini-batch size and
should be reduced to lower GPU memory consumption.
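As a worked example of this bound, the sketch below assumes it is simply the product of the two parameters. That is an assumption: the text above only states that the parameters determine an upper bound. The function name is hypothetical.

```python
def max_batch_size(people_per_batch, images_per_person):
    """Upper bound on the number of images in one mini-batch.

    Assumption: the bound is the product of the two parameters.
    """
    return people_per_batch * images_per_person


print(max_batch_size(15, 20))  # 300 with the default values
print(max_batch_size(10, 15))  # 150 after reducing both parameters
```

Under this assumption, lowering either parameter shrinks the largest possible mini-batch proportionally.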

*Warning:* Metadata about the on-disk data is cached in
`training/work/trainCache.t7` and assumes
the data directory does not change.
If your data directory changes, delete this
file so it will be regenerated.
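Clearing the cache can be scripted. This is a minimal sketch: the helper name is hypothetical, and the default path is the one given above, relative to the repository root.

```python
from pathlib import Path


def clear_train_cache(work_dir="training/work"):
    """Delete trainCache.t7 from work_dir; return True if a file was removed."""
    cache = Path(work_dir) / "trainCache.t7"
    if cache.exists():
        cache.unlink()
        return True
    return False
```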

### Stopping and starting training
Models are saved in the `work` directory after every epoch.
If the training process is killed, it can be resumed from
the last saved model with the `-retrain` option.
Also pass a different `-manualSeed` so that a different image
sequence is sampled, and set `-epochNumber` to the correct epoch.
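The resume options can be assembled as in the sketch below. The helper name is hypothetical. Incrementing the seed is just one way to choose a different value, and setting `-epochNumber` to the epoch after the last saved one is an assumption about what "correct" means here.

```python
def resume_args(model_path, last_epoch, previous_seed):
    """Build command-line options for resuming training.

    model_path: path to the last model saved in the work directory.
    last_epoch: number of the last completed epoch.
    previous_seed: the -manualSeed value used by the interrupted run.
    """
    return [
        "-retrain", str(model_path),
        # Any seed other than the previous one gives a different image sequence.
        "-manualSeed", str(previous_seed + 1),
        # Assumption: training continues with the epoch after the last saved one.
        "-epochNumber", str(last_epoch + 1),
    ]
```

For a hypothetical model file `work/model_12.t7` saved after epoch 12 with seed 2, this yields `-retrain work/model_12.t7 -manualSeed 3 -epochNumber 13`, to be appended to the `training/main.lua` invocation.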

## 4. Analyze training
Visualize the loss with [training/plot-loss.py](https://github.com/cmusatyalab/openface/blob/master/training/plot-loss.py).
Install the Python dependencies from
[training/requirements.txt](https://github.com/cmusatyalab/openface/blob/master/training/requirements.txt)
with `pip2 install -r requirements.txt`.