Module: pennylane

Adaptive Moment Estimation uses a step-dependent learning rate, a first moment $$a$$ and a second moment $$b$$, reminiscent of the momentum and velocity of a particle:

$x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },$

where the update rules for the three values are given by

$\begin{split}a^{(t+1)} &= \beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)}),\\ b^{(t+1)} &= \beta_2 b^{(t)} + (1-\beta_2) ( \nabla f(x^{(t)}))^{\odot 2},\\ \eta^{(t+1)} &= \eta \frac{\sqrt{1-\beta_2^{t+1}}}{1-\beta_1^{t+1}}.\end{split}$
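A single update can be sketched in NumPy as follows. This is a minimal sketch, not PennyLane's source: the function name `adam_update` and its argument order are hypothetical, the default hyperparameter values are illustrative assumptions, and the step index `t` is assumed to start at 0, so the first call produces $$x^{(1)}$$. It uses the bias-corrected recursion $$a^{(t+1)} = \beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)})$$, $$b^{(t+1)} = \beta_2 b^{(t)} + (1-\beta_2)(\nabla f(x^{(t)}))^{\odot 2}$$ and $$\eta^{(t+1)} = \eta\sqrt{1-\beta_2^{t+1}}/(1-\beta_1^{t+1})$$.

```python
import numpy as np


def adam_update(x, grad, a, b, t, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """Return (x_new, a_new, b_new) after one Adam step (hypothetical helper).

    x, grad, a, b are arrays of the same shape; t is the step index, starting at 0.
    """
    # First moment: exponential moving average of the gradient.
    a_new = beta1 * a + (1 - beta1) * grad
    # Second moment: exponential moving average of the element-wise squared gradient.
    b_new = beta2 * b + (1 - beta2) * grad**2
    # Step-dependent learning rate eta^{(t+1)}, including the bias correction.
    eta = stepsize * np.sqrt(1 - beta2 ** (t + 1)) / (1 - beta1 ** (t + 1))
    # Parameter update; eps keeps the denominator away from zero.
    x_new = x - eta * a_new / (np.sqrt(b_new) + eps)
    return x_new, a_new, b_new


# Usage: one step on f(x) = sum(x**2), whose gradient is 2 * x.
x = np.array([1.0, -2.0])
a = np.zeros_like(x)  # moments start at zero
b = np.zeros_like(x)
x, a, b = adam_update(x, 2 * x, a, b, t=0, stepsize=0.1)
```

Passing the moments in and out explicitly keeps the function stateless; a stateful optimizer would store `a`, `b` and `t` between calls instead.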

Above, $$( \nabla f(x^{(t)}))^{\odot 2}$$ denotes the element-wise square, which means that each component of the gradient is multiplied by itself. The hyperparameters $$\beta_1$$ and $$\beta_2$$ can also be step-dependent. Initially, the first and second moments are zero: $$a^{(0)} = b^{(0)} = 0$$.
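Because both moments start at zero, the first step can be worked out by hand. Assuming the bias-corrected recursion $$a^{(t+1)} = \beta_1 a^{(t)} + (1-\beta_1) g$$, $$b^{(t+1)} = \beta_2 b^{(t)} + (1-\beta_2) g^{\odot 2}$$, $$\eta^{(t+1)} = \eta\sqrt{1-\beta_2^{t+1}}/(1-\beta_1^{t+1})$$, and writing $$g = \nabla f(x^{(0)})$$ with all operations element-wise:

```latex
\begin{aligned}
a^{(1)} &= (1-\beta_1)\, g, \qquad
b^{(1)} = (1-\beta_2)\, g^{\odot 2}, \qquad
\eta^{(1)} = \eta \frac{\sqrt{1-\beta_2}}{1-\beta_1},\\
x^{(1)} - x^{(0)}
  &= -\eta \frac{\sqrt{1-\beta_2}}{1-\beta_1}
     \cdot \frac{(1-\beta_1)\, g}{\sqrt{1-\beta_2}\,|g| + \epsilon}
   = -\eta \frac{g}{|g| + \epsilon/\sqrt{1-\beta_2}}
   \approx -\eta\, \operatorname{sign}(g).
\end{aligned}
```

So, for every component with $$|g| \gg \epsilon/\sqrt{1-\beta_2}$$, the first step moves the parameter by roughly $$\eta$$, independent of the gradient's magnitude.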

The shift $$\epsilon$$ avoids division by zero.
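A quick numerical illustration: for a component whose gradient has been exactly zero so far, both moments are zero and the ratio $$a/\sqrt{b}$$ becomes $$0/0$$. The values below are arbitrary and chosen only for illustration.

```python
import numpy as np

# The first component has only seen zero gradients; the second has not.
a = np.array([0.0, 0.5])   # first moment
b = np.array([0.0, 0.25])  # second moment
eps = 1e-8

with np.errstate(invalid="ignore"):
    unshifted = a / np.sqrt(b)  # 0/0 yields nan in the first component
shifted = a / (np.sqrt(b) + eps)  # 0/eps yields 0 instead, so the parameter stays put
```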

Parameters:
- stepsize (float) – the user-defined hyperparameter $$\eta$$
- beta1 (float) – hyperparameter $$\beta_1$$ governing the update of the first moment
- beta2 (float) – hyperparameter $$\beta_2$$ governing the update of the second moment
- eps (float) – offset $$\epsilon$$ added for numerical stability
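apply_grad(grad, x): update the variables to take a single optimization step.

Parameters:
- grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
- x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array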
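The single-step interface (take the gradient at $$x^{(t)}$$, return $$x^{(t+1)}$$) can be sketched as a small stateful class that keeps the moments and the step counter between calls. The class name `AdamSketch` is hypothetical and this is not PennyLane's implementation; the default hyperparameters are illustrative assumptions. It uses zero initial moments and the learning rate $$\eta^{(t+1)} = \eta\sqrt{1-\beta_2^{t+1}}/(1-\beta_1^{t+1})$$.

```python
import numpy as np


class AdamSketch:
    """Hypothetical stateful Adam optimizer (illustration only, not PennyLane's class)."""

    def __init__(self, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
        self.stepsize = stepsize
        self.beta1 = beta1
        self.beta2 = beta2
        self.eps = eps
        self.t = 0     # number of steps taken so far
        self.a = None  # first moment, created with the shape of the first gradient
        self.b = None  # second moment

    def apply_grad(self, grad, x):
        """Return the new values of x after one step using the gradient grad at x."""
        grad = np.asarray(grad, dtype=float)
        if self.a is None:
            # The first and second moments start at zero.
            self.a = np.zeros_like(grad)
            self.b = np.zeros_like(grad)
        self.a = self.beta1 * self.a + (1 - self.beta1) * grad
        self.b = self.beta2 * self.b + (1 - self.beta2) * grad**2
        self.t += 1
        # Step-dependent learning rate with bias correction.
        eta = self.stepsize * np.sqrt(1 - self.beta2**self.t) / (1 - self.beta1**self.t)
        return x - eta * self.a / (np.sqrt(self.b) + self.eps)


# Usage: minimize f(x) = sum((x - 3)**2), whose gradient is 2 * (x - 3).
opt = AdamSketch(stepsize=0.1)
x = np.array([0.0, 10.0])
for _ in range(500):
    x = opt.apply_grad(2 * (x - 3), x)
```

Storing the moments on the instance is what makes repeated calls continue the same trajectory; creating a new optimizer resets them to zero.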