class AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-08)[source]

Bases: pennylane.optimize.gradient_descent.GradientDescentOptimizer

Adaptive Moment Estimation uses a step-dependent learning rate, a first moment $$a$$ and a second moment $$b$$, reminiscent of the momentum and velocity of a particle:

$x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },$

where the update rules for the three values are given by

$\begin{split}a^{(t+1)} &= \beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)}),\\ b^{(t+1)} &= \beta_2 b^{(t)} + (1-\beta_2) ( \nabla f(x^{(t)}))^{\odot 2},\\ \eta^{(t+1)} &= \eta \frac{\sqrt{(1-\beta_2^{t+1})}}{(1-\beta_1^{t+1})}.\end{split}$

Above, $$( \nabla f(x^{(t)}))^{\odot 2}$$ denotes the element-wise square operation, which means that each element in the gradient is multiplied by itself. The hyperparameters $$\beta_1$$ and $$\beta_2$$ can also be step-dependent. Initially, the first and second moment are zero.

The shift $$\epsilon$$ avoids division by zero.

For more details, see arXiv:1412.6980.
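As an illustration, a single update can be sketched in plain NumPy. This is a minimal sketch that follows the algorithm in arXiv:1412.6980 with the bias correction folded into a step-dependent learning rate; the function name `adam_step` and its explicit state arguments are assumptions made for this example and are not part of the class.

```python
import numpy as np

def adam_step(x, grad, a, b, t, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One Adam update (hypothetical helper, not the library's code).

    x, grad, a, b are arrays of equal shape; t is the 1-based step count.
    Returns the new point together with the updated moments.
    """
    a = beta1 * a + (1 - beta1) * grad        # first moment
    b = beta2 * b + (1 - beta2) * grad ** 2   # second moment (element-wise square)
    # Step-dependent learning rate that absorbs the bias correction.
    eta = stepsize * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    x_new = x - eta * a / (np.sqrt(b) + eps)  # eps avoids division by zero
    return x_new, a, b

# Minimise f(x) = sum(x**2), whose gradient is 2 * x.
x = np.array([1.0, -2.0])
a = np.zeros_like(x)  # both moments start at zero
b = np.zeros_like(x)
for t in range(1, 501):
    x, a, b = adam_step(x, 2 * x, a, b, t, stepsize=0.05)
```

With zero initial moments, the first update moves every element by roughly `stepsize` in the direction opposite to the sign of its gradient, independent of the gradient's magnitude.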

Parameters
• stepsize (float) – the user-defined hyperparameter $$\eta$$

• beta1 (float) – hyperparameter $$\beta_1$$ governing the update of the first moment

• beta2 (float) – hyperparameter $$\beta_2$$ governing the update of the second moment

• eps (float) – offset $$\epsilon$$ added for numerical stability

Methods
• apply_grad(grad, x) – Update the variables x to take a single optimization step.

• compute_grad(objective_fn, x[, grad_fn]) – Compute gradient of the objective_fn at the point x.

• reset() – Reset optimizer by erasing memory of past steps.

• step(objective_fn, x[, grad_fn]) – Update x with one step of the optimizer.

• update_stepsize(stepsize) – Update the initialized stepsize value $$\eta$$.

apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. The inputs are flattened before the update and unflattened afterwards, so that nested iterables can be used as the parameters of the optimization.

Parameters
• grad (array) – the gradient of the objective function at the point $$x^{(t)}$$, i.e. $$\nabla f(x^{(t)})$$

• x (array) – the current value of the variables $$x^{(t)}$$

Returns

the new values $$x^{(t+1)}$$

Return type

array
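
The flatten and unflatten round trip mentioned above can be pictured as follows. This is a rough sketch only: the helpers `flatten` and `unflatten` are written here for illustration and are assumptions, not the library's own utilities, and a plain gradient step stands in for the Adam update to keep the example short.

```python
def flatten(nested):
    """Yield the leaves of arbitrarily nested lists/tuples in order."""
    for item in nested:
        if isinstance(item, (list, tuple)):
            yield from flatten(item)
        else:
            yield item

def unflatten(flat, model):
    """Rebuild the nesting of `model` from the flat iterable `flat`."""
    leaves = iter(flat)

    def build(node):
        if isinstance(node, (list, tuple)):
            return type(node)(build(child) for child in node)
        return next(leaves)

    return build(model)

x = [1.0, [2.0, 3.0], (4.0,)]
grad = [0.1, [0.2, 0.3], (0.4,)]

# Update element by element on the flat views (placeholder update rule).
x_new_flat = [xi - 0.5 * gi for xi, gi in zip(flatten(x), flatten(grad))]

# Restore the original nesting so the caller gets back the same structure.
x_new = unflatten(x_new_flat, x)
```

The returned `x_new` has the same nesting as `x`, which is what lets a caller pass structured parameters and receive them back in the same shape.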

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters
• objective_fn (function) – the objective function for optimization

• x (array) – NumPy array containing the current values of the variables to be updated

• grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.

Returns

NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type

array
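
The role of the optional grad_fn argument can be sketched as below. Note the swap: where the method computes the gradient automatically, this sketch substitutes a central finite difference purely so that it runs with NumPy alone. The names `numerical_grad` and `compute_grad_sketch` are hypothetical.

```python
import numpy as np

def numerical_grad(objective_fn, x, h=1e-6):
    """Central finite-difference gradient (stand-in for automatic differentiation)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        shift = np.zeros_like(x)
        shift.flat[i] = h
        grad.flat[i] = (objective_fn(x + shift) - objective_fn(x - shift)) / (2 * h)
    return grad

def compute_grad_sketch(objective_fn, x, grad_fn=None):
    """Use the user-supplied gradient if given, otherwise fall back to the stand-in."""
    if grad_fn is not None:
        return grad_fn(x)
    return numerical_grad(objective_fn, x)

def objective(x):
    return np.sum(np.sin(x))

x = np.array([0.0, 1.0])
g_default = compute_grad_sketch(objective, x)               # no grad_fn supplied
g_user = compute_grad_sketch(objective, x, grad_fn=np.cos)  # analytic gradient
```

Both calls return an array with the same shape as x; supplying grad_fn is useful when an analytic or cheaper gradient is available.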

reset()[source]

Reset optimizer by erasing memory of past steps.
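
To make "memory of past steps" concrete, the sketch below keeps the two moments and a step counter as state and clears them in reset(). The class `AdamSketch` is a hypothetical stand-in, and the assumption that exactly these three quantities form the memory is ours, not something the reference states.

```python
import numpy as np

class AdamSketch:
    """Hypothetical stand-in that keeps Adam's running state between calls."""

    def __init__(self, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
        self.stepsize, self.beta1, self.beta2, self.eps = stepsize, beta1, beta2, eps
        self.reset()

    def reset(self):
        # Assumed memory: both moments and the step counter.
        self.a = 0.0
        self.b = 0.0
        self.t = 0

    def apply_grad(self, grad, x):
        self.t += 1
        self.a = self.beta1 * self.a + (1 - self.beta1) * grad
        self.b = self.beta2 * self.b + (1 - self.beta2) * grad ** 2
        eta = self.stepsize * np.sqrt(1 - self.beta2 ** self.t) / (1 - self.beta1 ** self.t)
        return x - eta * self.a / (np.sqrt(self.b) + self.eps)

def grad(x):
    return 2 * x  # gradient of f(x) = sum(x**2)

opt = AdamSketch(stepsize=0.1)
x0 = np.array([1.0, -1.0])

first = opt.apply_grad(grad(x0), x0)        # step taken by a fresh optimizer

x = x0
for _ in range(10):                         # accumulate history along a trajectory
    x = opt.apply_grad(grad(x), x)
with_memory = opt.apply_grad(grad(x0), x0)  # same input, but influenced by history

opt.reset()
after_reset = opt.apply_grad(grad(x0), x0)  # behaves like a fresh optimizer again
```

Resetting is therefore the natural thing to do before reusing one optimizer instance for an unrelated objective.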

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters
• objective_fn (function) – the objective function for optimization

• x (array) – NumPy array containing the current values of the variables to be updated

• grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.

Returns

the new variable values $$x^{(t+1)}$$

Return type

array

update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters

stepsize (float) – the user-defined hyperparameter $$\eta$$
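
As one possible scheduling pattern, the step size can be recomputed and passed to update_stepsize before every iteration. The sketch below uses a hypothetical stand-in class with the same method name so that it runs without the library; the exponential decay schedule and the plain gradient step inside `step` are assumptions chosen for brevity.

```python
import numpy as np

class StandInOptimizer:
    """Hypothetical stand-in exposing update_stepsize(stepsize) and step(...)."""

    def __init__(self, stepsize=0.01):
        self.stepsize = stepsize

    def update_stepsize(self, stepsize):
        self.stepsize = stepsize

    def step(self, objective_fn, x, grad_fn=None):
        # Plain gradient step; this sketch requires grad_fn to be supplied.
        return x - self.stepsize * grad_fn(x)

def exponential_decay(initial, decay_rate, iteration):
    """Step size after `iteration` rounds of multiplicative decay."""
    return initial * decay_rate ** iteration

def objective(x):
    return np.sum(x ** 2)

opt = StandInOptimizer(stepsize=0.4)
x = np.array([1.0, -2.0])
history = []
for it in range(20):
    opt.update_stepsize(exponential_decay(0.4, 0.9, it))  # schedule the step size
    history.append(opt.stepsize)
    x = opt.step(objective, x, grad_fn=lambda v: 2 * v)
```

Larger early steps make fast initial progress, while the shrinking step size damps the updates as the iterate approaches the minimum.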