docs: Add OpenBLAS execution times.

Brandon Amos 2016-03-10 14:13:10 -05:00
parent 3a66bc2345
commit 338516a153
2 changed files with 9 additions and 6 deletions

@@ -32,6 +32,10 @@ script.
 Use the `-DUSE_AVX_INSTRUCTIONS=ON` in the first `cmake` command.
 If your architecture does not support AVX, try SSE4 or SSE2.
+3. Make sure Torch is linking with [OpenBLAS](http://www.openblas.net/),
+instead of netlib for BLAS and LAPACK.
+From our experiments, a single neural network forward pass that
+executes in 460ms with netlib executes in 59ms with OpenBLAS.
 ## I'm getting an illegal instruction error in the pre-built Docker container.
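The 460 ms (netlib) and 59 ms (OpenBLAS) forward-pass figures in the FAQ entry imply a speedup of roughly 7.8x. The minimal Python sketch below shows that arithmetic. The helper name `speedup` is hypothetical and is not part of OpenFace or Torch.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Ratio of an old runtime to a new one; values above 1 mean the new run is faster."""
    if after_ms <= 0:
        raise ValueError("after_ms must be positive")
    return before_ms / after_ms


if __name__ == "__main__":
    # Forward-pass times quoted in the FAQ entry: netlib vs. OpenBLAS.
    print(f"{speedup(460, 59):.1f}x faster with OpenBLAS")  # prints "7.8x faster with OpenBLAS"
```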

@@ -44,16 +44,15 @@ API differences between the models are:
 ## Performance
 The performance is measured by averaging 500 forward passes with
 [util/profile-network.lua](https://github.com/cmusatyalab/openface/blob/master/util/profile-network.lua)
-and the following results are from an 8 core 3.70 GHz CPU
+and the following results use OpenBLAS on an 8 core 3.70 GHz CPU
 and a Tesla K40 GPU.
 | Model | Runtime (CPU) | Runtime (GPU) |
 |---|---|---|
-| nn4.v1 | 679.75 ms ± 114.22 ms | 21.96 ms ± 6.71 ms |
-| nn4.v2 | 687.27 ms ± 119.50 ms | 20.82 ms ± 6.03 ms |
-| nn4.small1.v1 | 528.33 ms ± 109.31 ms | 15.90 ms ± 5.18 ms |
-| nn4.small2.v1 | 460.89 ms ± 85.74 ms | 13.72 ms ± 4.64 ms |
+| nn4.v1 | 75.67 ms ± 19.97 ms | 21.96 ms ± 6.71 ms |
+| nn4.v2 | 82.74 ms ± 19.96 ms | 20.82 ms ± 6.03 ms |
+| nn4.small1.v1 | 69.58 ms ± 16.17 ms | 15.90 ms ± 5.18 ms |
+| nn4.small2.v1 | 58.9 ms ± 15.36 ms | 13.72 ms ± 4.64 ms |
 ## Accuracy on the LFW Benchmark
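The Performance section's method, averaging many timed forward passes and reporting a value with a spread, can be sketched in Python as below. This is not a port of `util/profile-network.lua`. `forward` is a hypothetical zero-argument callable that runs one forward pass. Reading the table's `±` as the sample standard deviation is an assumption. For GPU timings, asynchronous device work would also need to be synchronized before each timer stop.

```python
import statistics
import time
from typing import Callable, Tuple


def profile(forward: Callable[[], object], n_passes: int = 500) -> Tuple[float, float]:
    """Time `forward` n_passes times; return (mean, sample std dev) in milliseconds."""
    if n_passes < 2:
        raise ValueError("need at least two passes for a standard deviation")
    times_ms = []
    for _ in range(n_passes):
        start = time.perf_counter()
        forward()  # one forward pass; any GPU work must finish inside this call
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(times_ms), statistics.stdev(times_ms)


def format_result(mean_ms: float, std_ms: float) -> str:
    """Render a timing in the same 'X ms ± Y ms' style as the table."""
    return f"{mean_ms:.2f} ms ± {std_ms:.2f} ms"


if __name__ == "__main__":
    # Stand-in workload (hypothetical); replace with a real model's forward pass.
    mean_ms, std_ms = profile(lambda: sum(i * i for i in range(10_000)))
    print(format_result(mean_ms, std_ms))
```

`time.perf_counter` is used because it is a monotonic, high-resolution clock, which suits short intervals like a single forward pass.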