Adds compatibility with NVIDIA's Apex library, which enables mixed-precision (FP16) training for a significant speedup. This code has been tested on a single RTX 2070. If the Apex library is not found, the code should run as normal in FP32.
To install Apex: https://github.com/NVIDIA/apex#quick-start
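A minimal sketch of how the optional Apex integration typically looks: Apex is imported if available, and the backward pass routes through `amp.scale_loss` only when it is. The model and optimizer here are placeholders, not this repo's actual objects.

```python
# Sketch: optional Apex mixed-precision setup with a plain-FP32 fallback.
import torch

try:
    from apex import amp
    APEX_AVAILABLE = True
except ImportError:
    APEX_AVAILABLE = False

model = torch.nn.Linear(10, 2).cuda()                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # placeholder optimizer

if APEX_AVAILABLE:
    # "O1" patches common ops to FP16 while keeping FP32 master weights.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

def backward(loss):
    # Scale the loss through amp when available; otherwise do a normal backward.
    if APEX_AVAILABLE:
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
    else:
        loss.backward()
```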
Known bugs:
- Does not work with the `adam` parameter
- Gradient overflow warnings appear at the start of training; Apex's dynamic loss scaling automatically reduces the loss scale to 8192, after which the warnings stop (see the sketch below)
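The overflow messages come from Apex's dynamic loss scaling probing for a workable scale. If they are bothersome, a static loss scale can be passed at initialization instead; the 8192.0 value below is taken from the observation above and is an assumption about your setup, not a required fix.

```python
# Sketch: pin a static loss scale so dynamic scaling never overshoots at startup.
# 8192.0 matches the scale Apex settles on per the note above (assumption).
model, optimizer = amp.initialize(model, optimizer, opt_level="O1", loss_scale=8192.0)
```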
Example screenshots:
- Loading: https://i.imgur.com/3nZROJz.png
- Training: https://i.imgur.com/Q2w52m7.png