Module: pennylane

class AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-08)[source]

Gradient-descent optimizer with adaptive learning rate, first and second moment.

Adaptive Moment Estimation uses a step-dependent learning rate, a first moment \(a\) and a second moment \(b\), reminiscent of the momentum and velocity of a particle:

\[x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },\]

where the update rules for the three values are given by

\[\begin{split}a^{(t+1)} &= \frac{\beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)})}{(1- \beta_1^{t+1})},\\ b^{(t+1)} &= \frac{\beta_2 b^{(t)} + (1-\beta_2) ( \nabla f(x^{(t)}))^{\odot 2} }{(1- \beta_2^{t+1})},\\ \eta^{(t+1)} &= \eta^{(t)} \frac{\sqrt{1-\beta_2^{t+1}}}{1-\beta_1^{t+1}}.\end{split}\]

Above, \(( \nabla f(x^{(t)}))^{\odot 2}\) denotes the element-wise square operation, meaning that each element of the gradient is multiplied by itself. The hyperparameters \(\beta_1\) and \(\beta_2\) can also be step-dependent. Initially, the first and second moment are zero.

The shift \(\epsilon\) avoids division by zero.

For more details, see Kingma and Ba, "Adam: A Method for Stochastic Optimization", arXiv:1412.6980.
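For reference, a single Adam step can be sketched in plain NumPy. This is an illustration only, not PennyLane's implementation; it uses the equivalent formulation in which the bias correction is folded into the step-dependent learning rate rather than into the moments (the two differ only in how \(\epsilon\) is scaled):

    import numpy as np

    def adam_step(x, grad, m, v, t, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
        """Illustrative single Adam step.

        ``m`` and ``v`` are the running first and second moments (initially zero)
        and ``t`` is the step count, starting at 1.
        """
        m = beta1 * m + (1 - beta1) * grad          # first moment (momentum-like)
        v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (element-wise square)
        eta_t = stepsize * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)  # step-dependent learning rate
        x_new = x - eta_t * m / (np.sqrt(v) + eps)  # Adam update
        return x_new, m, v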

Parameters:

  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • beta1 (float) – hyperparameter governing the update of the first moment
  • beta2 (float) – hyperparameter governing the update of the second moment
  • eps (float) – offset \(\epsilon\) added for numerical stability
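A minimal usage sketch follows; the device, circuit, and cost function are illustrative placeholders, not part of this class:

    import pennylane as qml
    from pennylane import numpy as np

    dev = qml.device("default.qubit", wires=1)

    @qml.qnode(dev)
    def cost(params):
        # toy cost function: expectation value of PauliZ after two rotations
        qml.RX(params[0], wires=0)
        qml.RY(params[1], wires=0)
        return qml.expval(qml.PauliZ(0))

    opt = qml.AdamOptimizer(stepsize=0.1, beta1=0.9, beta2=0.99)
    params = np.array([0.1, 0.2], requires_grad=True)

    for _ in range(100):
        params = opt.step(cost, params)   # one Adam update per call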
apply_grad(grad, x)[source]

Update the variables x by taking a single optimization step. The inputs are flattened before the update and unflattened afterwards, so that nested iterables can be used as the optimization parameters.

Parameters:

  • grad (array) – the gradient of the objective function at the point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)

Returns:

the new values \(x^{(t+1)}\)

Return type:

array
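For illustration, a direct call to apply_grad following the signature documented here might look as follows; the gradient values are made up, and in typical use the step() method computes and applies the gradient for you:

    import pennylane as qml
    from pennylane import numpy as np

    opt = qml.AdamOptimizer(stepsize=0.01)

    x = np.array([0.54, 0.12])        # current variables x^(t)
    grad = np.array([0.20, -0.45])    # hypothetical gradient of the objective at x^(t)

    x_new = opt.apply_grad(grad, x)   # the new values x^(t+1)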

reset()[source]

Reset the optimizer by erasing the memory of past steps.