# Training new neural network models

We have also released our deep neural network (DNN)
training infrastructure to promote an open ecosystem and enable quicker
bootstrapping for new research and development.

There is a distinction between training the DNN model for feature representation
and training a model for classifying people with the DNN model.
If you're interested in creating a new classifier,
see [Demo 3](http://cmusatyalab.github.io/openface/demo-3-classifier/).
This page is for advanced users interested in training a new DNN model.
Training should be done with large datasets (>500k images) to improve the
feature representation.

*Warning:* Training is computationally expensive, uses a lot of memory,
and takes a day on our Tesla K40 GPU.

A rough overview of training is:

## 1. Create raw image directory
Create a directory for your raw images so that images from different
people are in different subdirectories. The names of the labels or
images do not matter, and each person can have a different number of images.
The images should be formatted as `jpg` or `png` and have
a lowercase extension.

```
$ tree data/mydataset/raw
person-1
├── image-1.jpg
├── image-2.png
...
└── image-p.png

...

person-m
├── image-1.png
├── image-2.jpg
...
└── image-q.png
```
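
As a quick sanity check on this layout, the following is a minimal sketch that lists files whose extension is not a lowercase `jpg` or `png`. The function name `find_badly_named_images` is hypothetical and is not part of OpenFace; it only assumes the one-subdirectory-per-person structure shown above.

```python
from pathlib import Path

# Lowercase extensions only, matching the requirement stated above.
ALLOWED_EXTENSIONS = {".jpg", ".png"}


def find_badly_named_images(raw_dir):
    """Return files in raw_dir/<person>/ whose extension is not lowercase jpg/png."""
    bad = []
    for person_dir in sorted(Path(raw_dir).iterdir()):
        if not person_dir.is_dir():
            continue  # only per-person subdirectories are inspected
        for image in sorted(person_dir.iterdir()):
            # Path.suffix is case-sensitive, so ".PNG" and ".jpeg" are reported.
            if image.is_file() and image.suffix not in ALLOWED_EXTENSIONS:
                bad.append(image)
    return bad
```

Calling `find_badly_named_images("data/mydataset/raw")` returns an empty list when every image already has a supported extension.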


## 2. Preprocess the raw images
If you plan to compute LFW accuracies, remove all LFW identities from your dataset.
We provide an example script doing this with string matching in
[remove-lfw-names.py](https://github.com/cmusatyalab/openface/blob/master/data/casia-facescrub/remove-lfw-names.py).
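
To illustrate the idea of removing identities by string matching, here is a minimal sketch. It is not the logic of `remove-lfw-names.py`; the helper names `normalize` and `find_lfw_overlap` and the normalization rule (lowercase, letters only) are assumptions made for this example.

```python
import re
from pathlib import Path


def normalize(name):
    """Lowercase a name and keep only letters, so 'George_W_Bush' matches 'george w. bush'."""
    return re.sub(r"[^a-z]", "", name.lower())


def find_lfw_overlap(dataset_dir, lfw_names):
    """Return person directories in dataset_dir whose normalized name is an LFW identity."""
    lfw = {normalize(n) for n in lfw_names}
    return [
        person_dir
        for person_dir in sorted(Path(dataset_dir).iterdir())
        if person_dir.is_dir() and normalize(person_dir.name) in lfw
    ]
```

The sketch only reports overlapping directories and leaves deletion to you. Plain string matching misses people who are listed under different spellings in the two datasets.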

Align the images with `util/align-dlib.py`, running several processes in parallel.
Change `8` to however many
separate processes you want to run:
`for N in {1..8}; do ./util/align-dlib.py <path-to-raw-data> align outerEyesAndNose <path-to-aligned-data> --size 96 & done`.

Prune out directories with fewer than 3 images per class with
`./util/prune-dataset.py <path-to-aligned-data> --numImagesThreshold 3`.
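
Before pruning, it can help to preview which classes would be affected. The sketch below is a dry run under the assumption that a class is pruned when it has fewer images than the threshold, as described above. The function `classes_below_threshold` is hypothetical and does not reproduce the implementation of `prune-dataset.py`.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".png"}


def classes_below_threshold(aligned_dir, num_images_threshold=3):
    """Map each person directory with fewer than the threshold images to its image count."""
    small = {}
    for person_dir in sorted(Path(aligned_dir).iterdir()):
        if not person_dir.is_dir():
            continue
        count = sum(1 for f in person_dir.iterdir() if f.suffix in IMAGE_EXTENSIONS)
        if count < num_images_threshold:
            small[person_dir.name] = count
    return small
```

Nothing is deleted here; the returned dictionary only shows which directories fall under the threshold.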

<!-- Split the dataset into `train` and `val` subdirectories -->
<!-- with `./util/create-train-val-split.py <path-to-aligned-data> <validation-ratio>`. -->
<!-- One option could be to have all of your data in `train` and -->
<!-- then validate the model with the LFW experiment. -->

## 3. Train the model
Run [training/main.lua](https://github.com/cmusatyalab/openface/blob/master/training/main.lua) to start training the model.
Edit the dataset options in [training/opts.lua](https://github.com/cmusatyalab/openface/blob/master/training/opts.lua) or
pass them as command-line parameters.
This will output the loss and in-progress models to `training/work`.
The GPU memory usage is determined by the `-peoplePerBatch` and
`-imagesPerPerson` parameters, which default to 15 and 20, respectively,
and together consume about 12GB of memory.
These set an upper bound on the mini-batch size;
reduce them to lower GPU memory consumption.
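
The upper bound can be worked out with simple arithmetic. The sketch below assumes the bound is the product of the two parameters, since each sampled person contributes at most `-imagesPerPerson` images. The function name is hypothetical.

```python
def max_minibatch_size(people_per_batch=15, images_per_person=20):
    """Upper bound on images per mini-batch (assumed to be the product of both options)."""
    return people_per_batch * images_per_person


print(max_minibatch_size())        # 300 with the defaults
print(max_minibatch_size(10, 10))  # 100 with both options reduced
```

People with fewer images than `-imagesPerPerson` contribute less, so actual mini-batches can be smaller than this bound.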

*Warning:* Metadata about the on-disk data is cached in
`training/work/trainCache.t7`, and the cache assumes
the data directory does not change.
If your data directory changes, delete this
file so it will be regenerated.
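
A minimal sketch for clearing the cache after changing the data directory is shown below. The helper `clear_train_cache` is hypothetical; only the cache path comes from the text above.

```python
from pathlib import Path


def clear_train_cache(work_dir="training/work"):
    """Delete trainCache.t7 in work_dir if present; return True if a file was removed."""
    cache = Path(work_dir) / "trainCache.t7"
    if cache.exists():
        cache.unlink()
        return True
    return False
```

Run it from the repository root, or pass the path to your own work directory.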

### Stopping and starting training
Models are saved in the `work` directory after every epoch.
If the training process is killed, it can be resumed from
the last saved model with the `-retrain` option.
Also pass a different `-manualSeed` so that a different image
sequence is sampled, and set `-epochNumber` to the epoch you are resuming at.
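
The options above can be assembled into a command line as in the following sketch. It assumes that training is launched with Torch's `th` from the `training` directory and that the epoch to resume at is one past the last saved epoch. The helper `resume_command` and the model path placeholder are hypothetical.

```python
import random


def resume_command(model_path, last_epoch, seed=None):
    """Build an argument list for resuming training from a saved model."""
    if seed is None:
        # Draw a fresh seed so a different image sequence is sampled.
        seed = random.randint(1, 2**31 - 1)
    return [
        "th", "main.lua",                      # assumption: Torch's `th` launcher
        "-retrain", str(model_path),           # last saved model in the work directory
        "-manualSeed", str(seed),
        "-epochNumber", str(last_epoch + 1),   # assumption: continue with the next epoch
    ]
```

For example, `subprocess.call(resume_command("<path-to-saved-model>", last_epoch=12), cwd="training")` would start the resumed run under these assumptions.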

## 4. Analyze training
Visualize the loss with [training/plot-loss.py](https://github.com/cmusatyalab/openface/blob/master/training/plot-loss.py).
Install the Python dependencies from
[training/requirements.txt](https://github.com/cmusatyalab/openface/blob/master/training/requirements.txt)
with `pip2 install -r requirements.txt`.