Optimization methods in deep learning (SGD, Adagrad, Adadelta, Adam, Adamax, Nadam, RAdam)
SGD, Adagrad, Adadelta, Adam, Adamax, Nadam: https://zhuanlan.zhihu.com/p/22252270
RAdam, which provides a dynamic warmup with no tunable warmup hyperparameters: https://zhuanlan.zhihu.com/p/85911013
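To make the "parameter-free warmup" idea concrete, here is a minimal pure-Python sketch of the RAdam update (following Liu et al.'s rectification rule): while the variance of the adaptive learning rate is intractable in early steps, it falls back to SGD with momentum, then switches to a rectified Adam step automatically. The toy objective f(w) = (w - 3)^2 and the function name are illustrative, not from the linked posts.

```python
import math

def radam_minimize(grad, w0, steps=500, lr=0.1,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """Sketch of RAdam on a scalar parameter (illustrative, not a library API)."""
    w, m, v = w0, 0.0, 0.0
    rho_inf = 2.0 / (1.0 - beta2) - 1.0          # maximum length of the SMA
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g          # first moment (momentum)
        v = beta2 * v + (1 - beta2) * g * g      # second moment
        m_hat = m / (1 - beta1 ** t)             # bias-corrected momentum
        rho_t = rho_inf - 2.0 * t * beta2 ** t / (1 - beta2 ** t)
        if rho_t > 4.0:
            # Variance is tractable: take a rectified adaptive step.
            v_hat = math.sqrt(v / (1 - beta2 ** t))
            r = math.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf)
                          / ((rho_inf - 4) * (rho_inf - 2) * rho_t))
            w -= lr * r * m_hat / (v_hat + eps)
        else:
            # Early steps: plain SGD with momentum (the implicit warmup).
            w -= lr * m_hat
    return w

# Minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
w_star = radam_minimize(lambda w: 2 * (w - 3.0), w0=0.0)
```

With beta2 = 0.999 the rectification condition rho_t > 4 first holds around step t = 5, so the first few updates are momentum-SGD steps and no warmup schedule needs to be hand-tuned.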