AdagradOptimizer

Module: pennylane

class AdagradOptimizer(stepsize=0.01, eps=1e-08)[source]

Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.

Adagrad adjusts the learning rate for each parameter \(x_i\) in \(x\) based on past gradients. We therefore have to consider each parameter update individually,

\[x^{(t+1)}_i = x^{(t)}_i - \eta_i^{(t+1)} \partial_{x_i} f(x^{(t)}),\]

where the gradient is replaced by a (scalar) partial derivative.

The learning rate in step \(t\) is given by

\[\eta_i^{(t+1)} = \frac{ \eta_{\mathrm{init}} }{ \sqrt{a_i^{(t+1)} + \epsilon } }, ~~~ a_i^{(t+1)} = \sum_{k=1}^t (\partial_{x_i} f(x^{(k)}))^2.\]

The offset \(\epsilon\) avoids division by zero.

\(\eta\) is the step size, a user-defined hyperparameter; it enters the update as the initial rate \(\eta_{\mathrm{init}}\).
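
To make the update rule concrete, here is a minimal NumPy sketch of a single Adagrad step, written outside of PennyLane; the names adagrad_step and accumulation, and the quadratic cost used below, are purely illustrative.

    import numpy as np

    def adagrad_step(x, grad, accumulation, stepsize=0.01, eps=1e-8):
        # a_i^{(t+1)} = a_i^{(t)} + (df/dx_i)^2
        accumulation = accumulation + grad ** 2
        # per-parameter learning rate: eta_i^{(t+1)} = stepsize / sqrt(a_i^{(t+1)} + eps)
        eta = stepsize / np.sqrt(accumulation + eps)
        return x - eta * grad, accumulation

    # illustrative use on f(x) = sum(x**2), whose gradient is 2*x
    x = np.array([1.0, -2.0])
    a = np.zeros_like(x)
    for _ in range(3):
        x, a = adagrad_step(x, 2 * x, a, stepsize=0.1)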

Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • eps (float) – offset \(\epsilon\) added for numerical stability
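
With these hyperparameters set, a typical optimization loop looks like the sketch below. The step() method and the QNode used as a cost function are not documented in this section; they are assumptions based on PennyLane's usual optimizer interface, and the circuit itself is purely illustrative.

    import pennylane as qml
    from pennylane import numpy as np  # autograd-enabled NumPy shipped with PennyLane

    dev = qml.device("default.qubit", wires=1)

    @qml.qnode(dev)
    def cost(params):
        qml.RX(params[0], wires=0)
        qml.RY(params[1], wires=0)
        return qml.expval(qml.PauliZ(0))

    opt = qml.AdagradOptimizer(stepsize=0.1, eps=1e-8)
    params = np.array([0.5, 0.1], requires_grad=True)

    for _ in range(100):
        # assumed step() interface: one Adagrad update of params using the gradient of cost
        params = opt.step(cost, params)
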
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:
  the new values \(x^{(t+1)}\)
Return type:
  array
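
The following sketch calls apply_grad directly with a hand-computed gradient, following the signature documented above; the quadratic cost and its analytic gradient are purely illustrative.

    import pennylane as qml
    import numpy as np

    opt = qml.AdagradOptimizer(stepsize=0.1)

    def grad_cost(x):
        return 2 * x  # analytic gradient of the illustrative cost f(x) = sum(x**2)

    x = np.array([1.0, -0.5])
    for _ in range(5):
        # returns x^{(t+1)}; the squared gradients are accumulated inside the optimizer
        x = opt.apply_grad(grad_cost(x), x)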

reset()[source]

Reset optimizer by erasing memory of past steps.
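
As a small sketch of reuse (the arrays and the quadratic gradient are illustrative): after stepping on one set of variables, reset() clears the accumulated squared gradients \(a_i\) so the next problem starts from a fresh learning-rate history.

    import pennylane as qml
    import numpy as np

    opt = qml.AdagradOptimizer(stepsize=0.1)
    x = np.array([1.0, -0.5])
    x = opt.apply_grad(2 * x, x)  # one step; squared gradients are now stored internally

    opt.reset()                   # erase the accumulated a_i before reusing the optimizer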