Module: pennylane

class AdagradOptimizer(stepsize=0.01, eps=1e-08)

Adagrad adjusts the learning rate for each parameter $$x_i$$ in $$x$$ based on past gradients. We therefore have to consider each parameter update individually,

$x^{(t+1)}_i = x^{(t)}_i - \eta_i^{(t+1)} \partial_{x_i} f(x^{(t)}),$

where the gradient is replaced by a (scalar) partial derivative.

The learning rate in step $$t$$ is given by

$\eta_i^{(t+1)} = \frac{ \eta_{\mathrm{init}} }{ \sqrt{a_i^{(t+1)} + \epsilon } }, ~~~ a_i^{(t+1)} = \sum_{k=1}^t (\partial_{x_i} f(x^{(k)}))^2.$

The offset $$\epsilon$$ avoids division by zero.

$$\eta_{\mathrm{init}}$$ is the initial step size $$\eta$$, a user-defined parameter.
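As a rough illustration of the update rule above, the following pure-Python sketch applies it to flat lists of parameters. The names `adagrad_step` and `grad_f`, and the quadratic test function, are hypothetical and chosen for this example only; this is not PennyLane's implementation.

```python
import math


def adagrad_step(x, grad, accum, stepsize=0.01, eps=1e-8):
    """One Adagrad update on flat lists; returns (new_x, new_accum).

    Hypothetical helper for illustration, not part of PennyLane.
    """
    # a_i <- a_i + (df/dx_i)^2: running sum of squared partial derivatives,
    # including the gradient of the current step
    new_accum = [a + g * g for a, g in zip(accum, grad)]
    # eta_i = stepsize / sqrt(a_i + eps), then x_i <- x_i - eta_i * df/dx_i
    new_x = [
        xi - stepsize / math.sqrt(a + eps) * g
        for xi, g, a in zip(x, grad, new_accum)
    ]
    return new_x, new_accum


def grad_f(x):
    """Gradient of the assumed test function f(x) = x_0^2 + 10 x_1^2."""
    return [2.0 * x[0], 20.0 * x[1]]


x = [1.0, 1.0]
accum = [0.0, 0.0]  # no gradient history before the first step
for _ in range(3):
    x, accum = adagrad_step(x, grad_f(x), accum, stepsize=0.1)
```

On the first step the accumulator holds only the current squared gradient, so each coordinate moves by roughly `stepsize` whatever the size of its gradient. Later steps shrink as the accumulated sum grows.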

Parameters:
stepsize (float) – the user-defined hyperparameter $$\eta$$
eps (float) – offset $$\epsilon$$ added for numerical stability

apply_grad(grad, x)

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$
Return type: array

reset()

Reset optimizer by erasing memory of past steps.
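
To make the roles of apply_grad and reset more concrete, here is a minimal pure-Python sketch of a stateful optimizer that flattens nested lists, applies the Adagrad update, restores the nesting, and can forget its gradient history. The class `AdagradSketch`, its `accumulation` attribute, and the helpers `_flatten` and `_unflatten` are assumptions made for this illustration; PennyLane's actual internals may differ.

```python
import math


class AdagradSketch:
    """Illustrative stand-in for an Adagrad optimizer (not PennyLane's code)."""

    def __init__(self, stepsize=0.01, eps=1e-8):
        self.stepsize = stepsize
        self.eps = eps
        self.accumulation = None  # running sums of squared gradients

    @staticmethod
    def _flatten(nested):
        # Depth-first walk over nested lists, yielding scalars in order
        for item in nested:
            if isinstance(item, list):
                yield from AdagradSketch._flatten(item)
            else:
                yield item

    @staticmethod
    def _unflatten(flat_iter, template):
        # Rebuild the nesting of `template` from an iterator of scalars
        return [
            AdagradSketch._unflatten(flat_iter, item)
            if isinstance(item, list)
            else next(flat_iter)
            for item in template
        ]

    def apply_grad(self, grad, x):
        """Take one step and return new values with the same nesting as x."""
        x_flat = list(self._flatten(x))
        g_flat = list(self._flatten(grad))
        if self.accumulation is None:
            self.accumulation = [0.0] * len(x_flat)
        # Add the current squared gradient to the history
        self.accumulation = [
            a + g * g for a, g in zip(self.accumulation, g_flat)
        ]
        new_flat = [
            xi - self.stepsize / math.sqrt(a + self.eps) * g
            for xi, g, a in zip(x_flat, g_flat, self.accumulation)
        ]
        return self._unflatten(iter(new_flat), x)

    def reset(self):
        """Erase the memory of past steps."""
        self.accumulation = None


opt = AdagradSketch(stepsize=0.1)
x = [[1.0, 2.0], [3.0]]
grad = [[2.0, 4.0], [6.0]]
x = opt.apply_grad(grad, x)  # result keeps the [[., .], [.]] nesting
opt.reset()                  # the next step starts with no history
```

Because the per-parameter learning rates only ever shrink as squared gradients accumulate, resetting is the way to restore the initial step size, for example before reusing the same optimizer object on a new problem.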