The most widely used optimization method in machine learning is undoubtedly stochastic gradient descent (SGD). It is straightforward to implement and easy to apply to training neural networks. However, the choice of its hyperparameters, such as the learning rate, can have a large impact on its convergence rate. To make SGD more robust and efficient, researchers have proposed various extensions such as momentum and weight decay.
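As a minimal sketch of the update rule with the two extensions mentioned above, the following implements one SGD step with momentum and L2-style weight decay (the function name, hyperparameter values, and the toy quadratic objective are illustrative choices, not from the original text):

```python
import numpy as np

def sgd_update(w, grad, velocity, lr=0.1, momentum=0.9, weight_decay=1e-4):
    """One SGD step with momentum and L2 weight decay.

    weight decay is folded into the gradient as an L2 penalty term;
    momentum accumulates an exponentially decaying average of past steps.
    """
    grad = grad + weight_decay * w
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = sgd_update(w, w, v)
```

With momentum, the iterate oscillates slightly around the minimum before settling, but the decaying velocity drives it toward zero far faster than plain gradient descent with the same learning rate would on an ill-conditioned objective.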
Sequential Quadratic Programming