docs: Add OpenBLAS execution times.
parent 3a66bc2345
commit 338516a153
@@ -32,6 +32,10 @@ script.
 Use the `-DUSE_AVX_INSTRUCTIONS=ON` in the first `cmake` command.
 If your architecture does not support AVX, try SSE4 or SSE2.

+3. Make sure Torch is linking with [OpenBLAS](http://www.openblas.net/),
+instead of netlib for BLAS and LAPACK.
+From our experiments, a single neural network forward pass that
+executes in 460ms with netlib executes in 59ms with OpenBLAS.

 ## I'm getting an illegal instruction error in the pre-built Docker container.

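For a sense of scale, the two timings quoted in the added paragraph imply a speedup of a little under 8x. A minimal sketch of that arithmetic follows; the helper name `speedup` is hypothetical and not part of the repository, and the inputs are only the 460 ms and 59 ms figures from the text above.

```python
def speedup(before_ms: float, after_ms: float) -> float:
    """Return how many times faster the 'after' timing is than 'before'."""
    if after_ms <= 0:
        raise ValueError("after_ms must be positive")
    return before_ms / after_ms


# Figures quoted in the added documentation text.
netlib_ms = 460.0
openblas_ms = 59.0

ratio = speedup(netlib_ms, openblas_ms)
print(f"OpenBLAS is about {ratio:.1f}x faster than netlib")  # about 7.8x
```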
@@ -44,16 +44,15 @@ API differences between the models are:
 ## Performance
 The performance is measured by averaging 500 forward passes with
 [util/profile-network.lua](https://github.com/cmusatyalab/openface/blob/master/util/profile-network.lua)
-and the following results are from an 8 core 3.70 GHz CPU
+and the following results use OpenBLAS on an 8 core 3.70 GHz CPU
 and a Tesla K40 GPU.

 | Model | Runtime (CPU) | Runtime (GPU) |
 |---|---|---|
-| nn4.v1 | 679.75 ms ± 114.22 ms | 21.96 ms ± 6.71 ms |
-| nn4.v2 |687.27 ms ± 119.50 ms | 20.82 ms ± 6.03 ms |
-| nn4.small1.v1 | 528.33 ms ± 109.31 ms | 15.90 ms ± 5.18 ms |
-| nn4.small2.v1 | 460.89 ms ± 85.74 ms | 13.72 ms ± 4.64 ms |
-
+| nn4.v1 | 75.67 ms ± 19.97 ms | 21.96 ms ± 6.71 ms |
+| nn4.v2 | 82.74 ms ± 19.96 ms | 20.82 ms ± 6.03 ms |
+| nn4.small1.v1 | 69.58 ms ± 16.17 ms | 15.90 ms ± 5.18 ms |
+| nn4.small2.v1 | 58.9 ms ± 15.36 ms | 13.72 ms ± 4.64 ms |

 ## Accuracy on the LFW Benchmark

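The "mean ± spread over 500 forward passes" figures in the table can be reproduced in spirit with a small timing harness. The sketch below is in Python rather than Lua and is not a port of `util/profile-network.lua`: the names `profile` and `dummy_forward` are hypothetical, the workload is a stand-in for a real forward pass, and the use of the sample standard deviation is an assumption about what the ± values represent.

```python
import statistics
import time
from typing import Callable, Tuple


def profile(forward: Callable[[], object], n_passes: int = 500) -> Tuple[float, float]:
    """Time n_passes calls of forward and return (mean_ms, std_ms)."""
    if n_passes < 2:
        raise ValueError("need at least two passes to compute a standard deviation")
    timings_ms = []
    for _ in range(n_passes):
        start = time.perf_counter()
        forward()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    # Assumption: sample standard deviation; the original script may differ.
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


def dummy_forward() -> int:
    # Hypothetical stand-in workload; replace with a real forward pass.
    return sum(i * i for i in range(10_000))


mean_ms, std_ms = profile(dummy_forward, n_passes=50)
print(f"{mean_ms:.2f} ms ± {std_ms:.2f} ms")
```

Running the same harness once against a netlib build and once against an OpenBLAS build would give a like-for-like comparison in the table's format.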