From 4eb080dce25178b1d812368645454ba828b42f78 Mon Sep 17 00:00:00 2001 From: Brandon Amos Date: Thu, 24 Sep 2015 14:47:17 -0400 Subject: [PATCH] Initial commit of stubbed README. --- README.md | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 154 insertions(+) create mode 100644 README.md diff --git a/README.md b/README.md new file mode 100644 index 0000000..15ca7ee --- /dev/null +++ b/README.md @@ -0,0 +1,154 @@ +# FaceNet + +This is a Python and Torch implementation of the CVPR 2015 paper +[FaceNet: A Unified Embedding for Face Recognition and Clustering](http://www.cv-foundation.org/openaccess/content_cvpr_2015/app/1A_089.pdf) +by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google +using publicly available libraries and datasets. +Torch allows CPU and CUDA + +**Crafted by [Brandon Amos](http://bamos.github.io) in the +[Elijah](http://elijah.cs.cmu.edu) research group at +Carnegie Mellon University.** + +--- + +The following example shows the workflow for a single input +image of Sylvestor Stallone from the publicly available +[LFW dataset](http://vis-www.cs.umass.edu/lfw/person/Sylvester_Stallone.html). + +1. Detect faces with a pre-trained models from + [dlib](http://blog.dlib.net/2014/02/dlib-186-released-make-your-own-object.html) + [OpenCV](http://docs.opencv.org/master/d7/d8b/tutorial_py_face_detection.html). +2. Transform the face for the neural network. + This repository uses dlib's + [real-time pose estimation](http://blog.dlib.net/2014/08/real-time-face-pose-estimation.html) + with OpenCV's + [affine transformation](http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/warp_affine/warp_affine.html) + to try to make the eyes and nose appear in + the same location on each image. +3. Use a deep neural network to represent (or embed) the face on + a 128-dimensional hypersphere. + The embedding is a generic representation for anybody's face. + Unlike other face representations, this embedding has the nice property + that a larger distance between two face embeddings means + that the faces are likely not of the same person. + This trivializes clustering, similarity detection, + and classification tasks. + +![](./images/summary.jpg) + +# Help Wanted! + +As the following table shows, the forefront of deep learning research +is driven by large private datasets. +In face recognition, there are no open source implementations or +models trained on these datasets. +If you have access to a large dataset, we are very interested +in training a new FaceNet model with it. +Please contact Brandon Amos at [bamos@cs.cmu.edu](mailto:bamos@cs.cmu.edu). + +| Dataset | Public | #Photos | #People | +|---|---|---|---| +| [DeepFace](https://research.facebook.com/publications/480567225376225/deepface-closing-the-gap-to-human-level-performance-in-face-verification/) (Facebook) | No | 4.4 Million | 4k | +| [Web-Scale Training...](http://arxiv.org/abs/1406.5266) (Facebook) | No | 500 Million | 10 Million | +| FaceNet (Google) | No | 100-200 Million | 8 Million | +| [FaceScrub](http://vintage.winklerbros.net/facescrub.html) | Yes | 100k | 500 | +| [CASIA-WebFace](http://arxiv.org/abs/1411.7923) | Yes | 500k | 10k | + +# Real-Time Web Demo +See [our YouTube video](TODO) of using this in a real-time web application +for face recognition. +The source is available in [demos/www](/demos/www). + +TODO: Screenshot + +# Cool demo, but I want numbers. What's the accuracy? +Even though the public datasets we trained on have orders of magnitude less data +than private industry datasets, the accuracy is remarkably high and +outperforms all other open-source face recognition implementations we +are aware of on the standard +[LFW](http://vis-www.cs.umass.edu/lfw/results.html) +benchmark. +We had to fallback to using the deep funneled versions for +152 of 13233 images because dlib failed to detect a face or landmarks. + +TODO: ROC Curve + +This can be generated with the following commands from the root `facenet` +directory, assuming you have downloaded and placed the raw and +deep funneled lfw data from [here](http://http://vis-www.cs.umass.edu/lfw/) +in `./data/lfw/raw` and `./data/lfw/deepfunneled`. + +1. Install prerequisites as below. +2. Preprocess the raw `lfw` images, change `8` to however many + separate processes you want to run: + `for N in {1..8}; do ./util/align-dlib.py data/lfw/raw align affine data/lfw/dlib-affine-sz:96 --size 96 &; done`. + Fallback to deep funneled versions for images that dlib failed + to align: + `./util/align-dlib.py data/lfw/raw align affine data/lfw/dlib-affine-sz:96 --size 96 --fallbackLfw data/lfw/deepfunneled` +3. Generate representations with `./batch-represent/main.lua -outDir evaluation/lfw.nn4.v1.reps -model models/facenet/nn4.v1.t7 -data data/lfw/dlib-affine-sz:96` +4. Generate the ROC curve from the `evaluation` directory with `./lfw-roc.py --workDir lfw.nn4.v1.reps`. + This creates `roc.pdf` in the `lfw.nn4.v1.reps` directory. + +# Setup + +## Check out git submodules +Clone with `--recursive` or run `git submodule init && git submodule update` +after checking out. + +## Download the models +Run `./models/download_models.sh` to download pre-trained FaceNet +models on the combined CASIA-WebFace and FaceScrub database. +This also downloads dlib's pre-trained model for face landmark detection. + +## With Docker +TODO + +This repo can be deployed as a container with [Docker](https://www.docker.com/) +for CPU mode: + +``` +./models/download_models.sh +sudo docker build -t facenet . +sudo docker run -t -i -v $PWD:/facenet facenet /bin/bash +cd /facenet +TODO +``` + +To use, place your images in `facenet` on your host and +access them from the shared Docker directory. + +## By hand +TODO + +Dependencies: ++ [torch7](https://github.com/torch/torch7) ++ [dpnn](https://github.com/nicholas-leonard/dpnn) ++ TODO + +Optional dependencies: ++ CUDA 6.5+ ++ [cudnn.torch](https://github.com/soumith/cudnn.torch) + + +# Usage +## Existing Models +TODO + +# Training new models +TODO +[Atcold/torch-TripletEmbedding](https://github.com/Atcold/torch-TripletEmbedding) + +# Licensing +This source is copyright Carnegie Mellon University +and licensed under the [Apache 2.0 License](./LICENSE). +Portions from the following third party sources have +been modified and are included in this repository. +These portions are noted in the source files and are +copyright their respective authors with +the licenses listed. + +Project | Modified | License +---|---|---| +[Atcold/torch-TripletEmbedding](https://github.com/Atcold/torch-TripletEmbedding) | No | MIT +[facebook/fbnn](https://github.com/facebook/fbnn) | Yes | BSD