# Optimization methods¶

Module name: pennylane.optimize

Submodule containing PennyLane optimizers.

Warning

The built-in optimizers only support the default NumPy-interfacing QNode.

If using the PennyLane PyTorch or PennyLane TensorFlow interfaces, the native PyTorch optimizers or TensorFlow optimizers (available in tf.train) should be used, respectively.

In PennyLane, an optimizer is a procedure that executes one weight update step along (some function of) the negative gradient of the cost. This update depends in general on:

• the function $$f(x)$$, from which we calculate a gradient $$\nabla f(x)$$; if $$x$$ is a vector, the gradient is also a vector whose entries are the partial derivatives of $$f$$ with respect to the elements of $$x$$,
• the current weights $$x$$,
• the (initial) step size $$\eta$$.

The different optimizers can also depend on additional hyperparameters.

In the following, recursive definitions assume that $$x^{(0)}$$ is some initial value in the optimization landscape, and all other step-dependent values are initialized to zero at $$t=0$$.

## Available optimizers¶

• AdagradOptimizer([stepsize, eps]) – Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.
• AdamOptimizer([stepsize, beta1, beta2, eps]) – Gradient-descent optimizer with adaptive learning rate, first and second moment.
• GradientDescentOptimizer([stepsize]) – Basic gradient-descent optimizer.
• MomentumOptimizer([stepsize, momentum]) – Gradient-descent optimizer with momentum.
• NesterovMomentumOptimizer([stepsize, momentum]) – Gradient-descent optimizer with Nesterov momentum.
• RMSPropOptimizer([stepsize, decay, eps]) – Root mean squared propagation optimizer.

### Code details¶

class AdagradOptimizer(stepsize=0.01, eps=1e-08)[source]

Adagrad adjusts the learning rate for each parameter $$x_i$$ in $$x$$ based on past gradients. We therefore have to consider each parameter update individually,

$x^{(t+1)}_i = x^{(t)}_i - \eta_i^{(t+1)} \partial_{x_i} f(x^{(t)}),$

where the gradient is replaced by a (scalar) partial derivative.

The learning rate for step $$t+1$$ is given by

$\eta_i^{(t+1)} = \frac{ \eta_{\mathrm{init}} }{ \sqrt{a_i^{(t+1)} + \epsilon } }, ~~~ a_i^{(t+1)} = \sum_{k=1}^t (\partial_{x_i} f(x^{(k)}))^2.$

The offset $$\epsilon$$ avoids division by zero.

Here, $$\eta_{\mathrm{init}}$$ is the initial step size, a user-defined parameter.
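The per-parameter update above can be sketched in plain Python. This is a minimal illustrative sketch of the Adagrad rule, not the actual PennyLane implementation; the quadratic cost and all names are hypothetical:

```python
import math

# Sketch of the Adagrad update: each parameter gets its own learning
# rate eta_init / sqrt(a_i + eps), where a_i accumulates the squared
# past partial derivatives.

def adagrad_step(x, accum, grad_fn, stepsize=0.01, eps=1e-8):
    g = grad_fn(x)
    accum = [a + gi ** 2 for a, gi in zip(accum, g)]       # a_i <- a_i + (d_i f)^2
    x = [xi - stepsize / math.sqrt(a + eps) * gi           # x_i <- x_i - eta_i * d_i f
         for xi, a, gi in zip(x, accum, g)]
    return x, accum

# Hypothetical cost f(x) = x0^2 + x1^2, with gradient (2*x0, 2*x1)
grad_f = lambda x: [2 * xi for xi in x]

x, accum = [1.0, -2.0], [0.0, 0.0]
for _ in range(500):
    x, accum = adagrad_step(x, accum, grad_f, stepsize=0.1)
# x drifts toward the minimum at (0, 0), with steps shrinking as the
# accumulated squared gradients grow
```

Note how the growing accumulator makes the effective learning rate ever smaller, which motivates the decayed variant used by RMSProp below.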

Parameters:

• stepsize (float) – the user-defined hyperparameter $$\eta$$
• eps (float) – offset $$\epsilon$$ added for numerical stability
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:

• grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
• x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array
reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type: array
step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: the new variable values $$x^{(t+1)}$$

Return type: array
update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$
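The kind of schedule that update_stepsize enables can be sketched with a plain gradient-descent rule. This is an illustrative sketch only; the exponential decay factor and the quadratic cost are hypothetical:

```python
# Sketch of learning-rate scheduling: the stepsize is decayed after
# every step, mimicking what repeated calls to update_stepsize allow.

# Hypothetical cost f(x) = x0^2, with gradient 2*x0
grad_f = lambda x: [2 * xi for xi in x]

x = [1.0]
stepsize = 0.4
for _ in range(50):
    x = [xi - stepsize * gi for xi, gi in zip(x, grad_f(x))]
    stepsize *= 0.95   # exponential decay schedule
# x approaches the minimum while the stepsize shrinks each iteration
```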
class AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-08)[source]

Adaptive Moment Estimation uses a step-dependent learning rate, a first moment $$a$$ and a second moment $$b$$, reminiscent of the momentum and velocity of a particle:

$x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },$

where the update rules for the three values are given by

$\begin{split}a^{(t+1)} &= \frac{\beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)})}{(1- \beta_1^t)},\\ b^{(t+1)} &= \frac{\beta_2 b^{(t)} + (1-\beta_2) ( \nabla f(x^{(t)}))^{\odot 2} }{(1- \beta_2^t)},\\ \eta^{(t+1)} &= \eta^{(t)} \frac{\sqrt{(1-\beta_2^t)}}{(1-\beta_1^t)}.\end{split}$

Above, $$( \nabla f(x^{(t)}))^{\odot 2}$$ denotes the element-wise square operation, which means that each element in the gradient is multiplied by itself. The hyperparameters $$\beta_1$$ and $$\beta_2$$ can also be step-dependent. Initially, the first and second moments are zero.

The shift $$\epsilon$$ avoids division by zero.

For more details, see [R1].
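The moment updates can be sketched in plain Python using the standard bias-corrected form of Adam. This is an illustrative sketch, not the actual PennyLane implementation; the quadratic cost and all names are hypothetical:

```python
import math

# Sketch of an Adam-style update: exponentially averaged first and
# second moments, with a bias-corrected effective step size.

def adam_step(x, fm, sm, t, grad_fn,
              stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    g = grad_fn(x)
    fm = [beta1 * a + (1 - beta1) * gi for a, gi in zip(fm, g)]         # first moment
    sm = [beta2 * b + (1 - beta2) * gi ** 2 for b, gi in zip(sm, g)]    # second moment
    eta = stepsize * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)       # bias correction
    x = [xi - eta * a / (math.sqrt(b) + eps) for xi, a, b in zip(x, fm, sm)]
    return x, fm, sm

# Hypothetical cost f(x) = x0^2 + x1^2, with gradient (2*x0, 2*x1)
grad_f = lambda x: [2 * xi for xi in x]

x, fm, sm = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 1001):
    x, fm, sm = adam_step(x, fm, sm, t, grad_f, stepsize=0.05)
# x settles near the minimum at (0, 0)
```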

Parameters:

• stepsize (float) – the user-defined hyperparameter $$\eta$$
• beta1 (float) – hyperparameter governing the update of the first moment
• beta2 (float) – hyperparameter governing the update of the second moment
• eps (float) – offset $$\epsilon$$ added for numerical stability
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:

• grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
• x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array
reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type: array
step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: the new variable values $$x^{(t+1)}$$

Return type: array
update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$
class GradientDescentOptimizer(stepsize=0.01)[source]

Base class for other gradient-descent-based optimizers.

A step of the gradient descent optimizer computes the new values via the rule

$x^{(t+1)} = x^{(t)} - \eta \nabla f(x^{(t)}),$

where $$\eta$$ is a user-defined hyperparameter corresponding to the step size.
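The rule above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the actual PennyLane implementation; the quadratic cost is hypothetical:

```python
# Sketch of the gradient-descent rule: x <- x - eta * grad f(x),
# applied elementwise to a list of parameters.

def gradient_descent_step(x, grad_fn, stepsize=0.01):
    return [xi - stepsize * gi for xi, gi in zip(x, grad_fn(x))]

# Hypothetical cost f(x) = x0^2 + x1^2, with gradient (2*x0, 2*x1)
grad_f = lambda x: [2 * xi for xi in x]

x = [1.0, -2.0]
for _ in range(100):
    x = gradient_descent_step(x, grad_f, stepsize=0.1)
# x approaches the minimum at (0, 0)
```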

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$
update_stepsize(stepsize)[source]

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$
step(objective_fn, x, grad_fn=None)[source]

Update x with one step of the optimizer.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: the new variable values $$x^{(t+1)}$$

Return type: array
static compute_grad(objective_fn, x, grad_fn=None)[source]

Compute gradient of the objective_fn at the point x.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type: array
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:

• grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
• x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array
class MomentumOptimizer(stepsize=0.01, momentum=0.9)[source]

The momentum optimizer updates the weights using an accumulator term $$a$$ that retains a memory of past gradients:

$x^{(t+1)} = x^{(t)} - a^{(t+1)}.$

The accumulator term $$a$$ is updated as follows:

$a^{(t+1)} = m a^{(t)} + \eta \nabla f(x^{(t)}),$

with the user-defined parameters:

• $$\eta$$: the step size
• $$m$$: the momentum
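The two rules above can be sketched in plain Python. This is an illustrative sketch, not the actual PennyLane implementation; the quadratic cost is hypothetical:

```python
# Sketch of the momentum update: a <- m*a + eta*grad f(x), x <- x - a.

def momentum_step(x, accum, grad_fn, stepsize=0.01, momentum=0.9):
    g = grad_fn(x)
    accum = [momentum * a + stepsize * gi for a, gi in zip(accum, g)]
    x = [xi - a for xi, a in zip(x, accum)]
    return x, accum

# Hypothetical cost f(x) = x0^2 + x1^2, with gradient (2*x0, 2*x1)
grad_f = lambda x: [2 * xi for xi in x]

x, accum = [1.0, -2.0], [0.0, 0.0]
for _ in range(300):
    x, accum = momentum_step(x, accum, grad_f)
# x spirals into the minimum at (0, 0) with damped oscillations
```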
Parameters:

• stepsize (float) – user-defined hyperparameter $$\eta$$
• momentum (float) – user-defined hyperparameter $$m$$
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:

• grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
• x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array
reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type: array
step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: the new variable values $$x^{(t+1)}$$

Return type: array
update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$
class NesterovMomentumOptimizer(stepsize=0.01, momentum=0.9)[source]

Nesterov Momentum works like the Momentum optimizer, but shifts the current input by the momentum term when computing the gradient of the objective function:

$a^{(t+1)} = m a^{(t)} + \eta \nabla f(x^{(t)} - m a^{(t)}).$

The user-defined parameters are:

• $$\eta$$: the step size
• $$m$$: the momentum
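The look-ahead evaluation that distinguishes Nesterov momentum can be sketched in plain Python. This is an illustrative sketch, not the actual PennyLane implementation; the quadratic cost is hypothetical:

```python
# Sketch of the Nesterov update: the gradient is evaluated at the
# look-ahead point x - m*a rather than at x itself.

def nesterov_step(x, accum, grad_fn, stepsize=0.01, momentum=0.9):
    shifted = [xi - momentum * a for xi, a in zip(x, accum)]   # look-ahead point
    g = grad_fn(shifted)
    accum = [momentum * a + stepsize * gi for a, gi in zip(accum, g)]
    x = [xi - a for xi, a in zip(x, accum)]
    return x, accum

# Hypothetical cost f(x) = x0^2 + x1^2, with gradient (2*x0, 2*x1)
grad_f = lambda x: [2 * xi for xi in x]

x, accum = [1.0, -2.0], [0.0, 0.0]
for _ in range(300):
    x, accum = nesterov_step(x, accum, grad_f)
# x converges to the minimum at (0, 0)
```

The look-ahead shift tends to damp the oscillations of plain momentum, since the gradient anticipates where the accumulator is about to carry the parameters.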
Parameters:

• stepsize (float) – user-defined hyperparameter $$\eta$$
• momentum (float) – user-defined hyperparameter $$m$$
compute_grad(objective_fn, x, grad_fn=None)[source]

Compute gradient of the objective_fn at the shifted point $$(x - m\times\text{accumulation})$$.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type: array
apply_grad(grad, x)

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:

• grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
• x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array
reset()

Reset optimizer by erasing memory of past steps.

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: the new variable values $$x^{(t+1)}$$

Return type: array
update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$
class RMSPropOptimizer(stepsize=0.01, decay=0.9, eps=1e-08)[source]

Root mean squared propagation optimizer.

The root mean square propagation optimizer is a modified Adagrad optimizer, with a decay applied to the accumulated past gradients that adapt the learning rate.

Extensions of the Adagrad optimization method generally start the sum $$a$$ over past gradients in the denominator of the learning rate at a finite $$t'$$ with $$0 < t' < t$$, or decay past gradients to avoid an ever-decreasing learning rate.

Root Mean Square propagation is such an adaptation, where

$a_i^{(t+1)} = \gamma a_i^{(t)} + (1-\gamma) (\partial_{x_i} f(x^{(t)}))^2.$
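The decayed accumulator can be sketched in plain Python alongside the Adagrad-style learning rate. This is an illustrative sketch, not the actual PennyLane implementation; the quadratic cost is hypothetical:

```python
import math

# Sketch of the RMSProp update: an exponentially decayed sum of squared
# gradients replaces Adagrad's ever-growing accumulator.

def rmsprop_step(x, accum, grad_fn, stepsize=0.01, decay=0.9, eps=1e-8):
    g = grad_fn(x)
    accum = [decay * a + (1 - decay) * gi ** 2            # a_i <- gamma*a_i + (1-gamma)(d_i f)^2
             for a, gi in zip(accum, g)]
    x = [xi - stepsize / math.sqrt(a + eps) * gi          # Adagrad-style per-parameter rate
         for xi, a, gi in zip(x, accum, g)]
    return x, accum

# Hypothetical cost f(x) = x0^2 + x1^2, with gradient (2*x0, 2*x1)
grad_f = lambda x: [2 * xi for xi in x]

x, accum = [1.0, -2.0], [0.0, 0.0]
for _ in range(1000):
    x, accum = rmsprop_step(x, accum, grad_f)
# x approaches the minimum at (0, 0); unlike Adagrad, the effective
# learning rate does not decay to zero, because old gradients fade
```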
Parameters:

• stepsize (float) – the user-defined hyperparameter $$\eta$$ used in the Adagrad optimization
• decay (float) – the learning rate decay $$\gamma$$
• eps (float) – offset $$\epsilon$$ added for numerical stability (see Adagrad)
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:

• grad (array) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$
• x (array) – the current value of the variables $$x^{(t)}$$

Returns: the new values $$x^{(t+1)}$$

Return type: array
static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: NumPy array containing the gradient $$\nabla f(x^{(t)})$$

Return type: array
reset()

Reset optimizer by erasing memory of past steps.

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:

• objective_fn (function) – the objective function for optimization
• x (array) – NumPy array containing the current values of the variables to be updated
• grad_fn (function) – optional gradient function of the objective function with respect to the variables x; if None, the gradient function is computed automatically

Returns: the new variable values $$x^{(t+1)}$$

Return type: array
update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter $$\eta$$