Optimization methods

Module name: pennylane.optimize

Submodule containing PennyLane optimizers.

Warning

The built-in optimizers only support the default NumPy-interfacing QNode.

When using the PennyLane PyTorch or PennyLane TensorFlow interfaces, the native PyTorch optimizers and TensorFlow optimizers (available in tf.train) should be used instead, respectively.

In PennyLane, an optimizer is a procedure that executes one weight update step along (some function of) the negative gradient of the cost. This update depends in general on:

  • The function \(f(x)\), from which we calculate a gradient \(\nabla f(x)\). If \(x\) is a vector, the gradient is also a vector whose entries are the partial derivatives of \(f\) with respect to the elements of \(x\).
  • The current weights \(x\)
  • The (initial) step size \(\eta\)

The different optimizers can also depend on additional hyperparameters.

In the following, recursive definitions assume that \(x^{(0)}\) is some initial value in the optimization landscape, and all other step-dependent values are initialized to zero at \(t=0\).

Available optimizers

AdagradOptimizer([stepsize, eps]) Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.
AdamOptimizer([stepsize, beta1, beta2, eps]) Gradient-descent optimizer with adaptive learning rate, first and second moment.
GradientDescentOptimizer([stepsize]) Basic gradient-descent optimizer.
MomentumOptimizer([stepsize, momentum]) Gradient-descent optimizer with momentum.
NesterovMomentumOptimizer([stepsize, momentum]) Gradient-descent optimizer with Nesterov momentum.
RMSPropOptimizer([stepsize, decay, eps]) Root mean squared propagation optimizer.
QGTOptimizer([stepsize, diag_approx]) Optimizer with adaptive learning rate, via calculation of the quantum geometric tensor.

Code details

class AdagradOptimizer(stepsize=0.01, eps=1e-08)[source]

Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.

Adagrad adjusts the learning rate for each parameter \(x_i\) in \(x\) based on past gradients. We therefore have to consider each parameter update individually,

\[x^{(t+1)}_i = x^{(t)}_i - \eta_i^{(t+1)} \partial_{w_i} f(x^{(t)}),\]

where the gradient is replaced by a (scalar) partial derivative.

The learning rate in step \(t\) is given by

\[\eta_i^{(t+1)} = \frac{ \eta_{\mathrm{init}} }{ \sqrt{a_i^{(t+1)} + \epsilon } }, ~~~ a_i^{(t+1)} = \sum_{k=1}^t (\partial_{x_i} f(x^{(k)}))^2.\]

The offset \(\epsilon\) avoids division by zero.

\(\eta\) is the step size, a user-defined parameter.
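The per-parameter rescaling above can be sketched in plain NumPy (an illustrative sketch, not PennyLane's implementation; the gradient values are made up):

```python
import numpy as np

def adagrad_step(grad, x, accum, stepsize=0.01, eps=1e-8):
    """One Adagrad update; accum holds the running sum of squared gradients."""
    accum = accum + grad ** 2
    x_new = x - stepsize * grad / np.sqrt(accum + eps)
    return x_new, accum

# Example values (made up): a small and a large gradient component
grad = np.array([0.5, -2.0])
x = np.zeros(2)
accum = np.zeros(2)
x, accum = adagrad_step(grad, x, accum, stepsize=0.1)
# Both components move by roughly the same amount despite the
# differing gradient magnitudes, since each is rescaled by its own history.
```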

Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • eps (float) – offset \(\epsilon\) added for numerical stability
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
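A learning-rate schedule can be driven by calling update_stepsize before each optimization step; a minimal sketch (the exponential decay factor is a made-up choice for illustration):

```python
# Exponential decay schedule eta_t = eta_0 * gamma**t; the decay factor
# gamma = 0.95 is a made-up choice for illustration.
eta0, gamma = 0.1, 0.95
schedule = [eta0 * gamma ** t for t in range(3)]
# In a training loop, one would call opt.update_stepsize(schedule[t])
# before each opt.step(...) call (opt being any of the optimizers above).
```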
class AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-08)[source]

Gradient-descent optimizer with adaptive learning rate, first and second moment.

Adaptive Moment Estimation uses a step-dependent learning rate, a first moment \(a\) and a second moment \(b\), reminiscent of the momentum and velocity of a particle:

\[x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },\]

where the update rules for the three values are given by

\[\begin{split}a^{(t+1)} &= \frac{\beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)})}{(1- \beta_1)},\\ b^{(t+1)} &= \frac{\beta_2 b^{(t)} + (1-\beta_2) ( \nabla f(x^{(t)}))^{\odot 2} }{(1- \beta_2)},\\ \eta^{(t+1)} &= \eta^{(t)} \frac{\sqrt{(1-\beta_2)}}{(1-\beta_1)}.\end{split}\]

Above, \(( \nabla f(x^{(t)}))^{\odot 2}\) denotes the element-wise square operation, which means that each element in the gradient is multiplied by itself. The hyperparameters \(\beta_1\) and \(\beta_2\) can also be step-dependent. Initially, the first and second moment are zero.

The shift \(\epsilon\) avoids division by zero.

For more details, see [R1].
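The update rules above can be sketched in plain NumPy (an illustrative sketch, not PennyLane's implementation; here \(\eta\) is computed directly from the passed stepsize rather than carried recursively, and the example gradient is made up):

```python
import numpy as np

def adam_step(grad, x, a, b, stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-8):
    """One update following the Adam rules stated above."""
    a = (beta1 * a + (1 - beta1) * grad) / (1 - beta1)          # first moment
    b = (beta2 * b + (1 - beta2) * grad ** 2) / (1 - beta2)     # second moment
    eta = stepsize * np.sqrt(1 - beta2) / (1 - beta1)           # adapted step size
    return x - eta * a / (np.sqrt(b) + eps), a, b

# Example values (made up); both moments start at zero
grad = np.array([2.0, -1.0])
x, a, b = np.zeros(2), np.zeros(2), np.zeros(2)
x, a, b = adam_step(grad, x, a, b)
```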

Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • beta1 (float) – hyperparameter governing the update of the first and second moment
  • beta2 (float) – hyperparameter governing the update of the first and second moment
  • eps (float) – offset \(\epsilon\) added for numerical stability
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
class GradientDescentOptimizer(stepsize=0.01)[source]

Basic gradient-descent optimizer.

Base class for other gradient-descent-based optimizers.

A step of the gradient descent optimizer computes the new values via the rule

\[x^{(t+1)} = x^{(t)} - \eta \nabla f(x^{(t)}),\]

where \(\eta\) is a user-defined hyperparameter corresponding to the step size.
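The rule can be sketched in plain NumPy (an illustrative sketch, not PennyLane's implementation; the quadratic objective is a made-up example):

```python
import numpy as np

def gradient_descent_step(grad_fn, x, stepsize=0.01):
    """One gradient-descent update: x <- x - eta * grad_f(x)."""
    return x - stepsize * grad_fn(x)

# Example: minimize f(x) = sum(x**2), whose gradient is 2x (made-up objective)
grad_fn = lambda v: 2 * v
x = np.array([1.0, -2.0])
for _ in range(100):
    x = gradient_descent_step(grad_fn, x, stepsize=0.1)
# x converges toward the minimum at the origin
```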

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
update_stepsize(stepsize)[source]

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
step(objective_fn, x, grad_fn=None)[source]

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

static compute_grad(objective_fn, x, grad_fn=None)[source]

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

class MomentumOptimizer(stepsize=0.01, momentum=0.9)[source]

Gradient-descent optimizer with momentum.

The momentum optimizer adds a “momentum” term to gradient descent, which accumulates contributions from past gradients:

\[x^{(t+1)} = x^{(t)} - a^{(t+1)}.\]

The accumulator term \(a\) is updated as follows:

\[a^{(t+1)} = m a^{(t)} + \eta \nabla f(x^{(t)}),\]

with user-defined parameters:

  • \(\eta\): the step size
  • \(m\): the momentum
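The two rules above can be sketched in plain NumPy (an illustrative sketch, not PennyLane's implementation; the gradient values are made up):

```python
import numpy as np

def momentum_step(grad, x, accum, stepsize=0.01, momentum=0.9):
    """a <- m * a + eta * grad;  x <- x - a."""
    accum = momentum * accum + stepsize * grad
    return x - accum, accum

# Two steps with a constant made-up gradient: the second step is larger
# because the accumulator remembers the first.
grad = np.array([1.0, -1.0])
x, accum = np.zeros(2), np.zeros(2)
x, accum = momentum_step(grad, x, accum, stepsize=0.1, momentum=0.5)
x, accum = momentum_step(grad, x, accum, stepsize=0.1, momentum=0.5)
```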
Parameters:
  • stepsize (float) – user-defined hyperparameter \(\eta\)
  • momentum (float) – user-defined hyperparameter \(m\)
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
class NesterovMomentumOptimizer(stepsize=0.01, momentum=0.9)[source]

Gradient-descent optimizer with Nesterov momentum.

Nesterov Momentum works like the Momentum optimizer, but shifts the current input by the momentum term when computing the gradient of the objective function:

\[a^{(t+1)} = m a^{(t)} + \eta \nabla f(x^{(t)} - m a^{(t)}).\]

The user-defined parameters are:

  • \(\eta\): the step size
  • \(m\): the momentum
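The look-ahead evaluation can be sketched in plain NumPy (an illustrative sketch, not PennyLane's implementation; the quadratic objective is a made-up example):

```python
import numpy as np

def nesterov_step(grad_fn, x, accum, stepsize=0.01, momentum=0.9):
    """Evaluate the gradient at the look-ahead point x - m * a before updating."""
    shifted = x - momentum * accum
    accum = momentum * accum + stepsize * grad_fn(shifted)
    return x - accum, accum

# f(x) = sum(x**2) with gradient 2x (a made-up example objective)
grad_fn = lambda v: 2 * v
x, accum = np.array([1.0]), np.zeros(1)
x, accum = nesterov_step(grad_fn, x, accum, stepsize=0.1, momentum=0.5)
```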
Parameters:
  • stepsize (float) – user-defined hyperparameter \(\eta\)
  • momentum (float) – user-defined hyperparameter \(m\)
compute_grad(objective_fn, x, grad_fn=None)[source]

Compute the gradient of the objective_fn at the shifted point \((x - m\times\text{accumulation})\).

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

apply_grad(grad, x)

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()

Reset optimizer by erasing memory of past steps.

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
class RMSPropOptimizer(stepsize=0.01, decay=0.9, eps=1e-08)[source]

Root mean squared propagation optimizer.

The root mean square propagation optimizer is a modified Adagrad optimizer, with a decay in the learning rate adaptation.

Extensions of the Adagrad optimization method generally start the sum \(a\) over past gradients in the denominator of the learning rate at a finite \(t'\) with \(0 < t' < t\), or decay past gradients to avoid an ever-decreasing learning rate.

Root Mean Square propagation is such an adaptation, where

\[a_i^{(t+1)} = \gamma a_i^{(t)} + (1-\gamma) (\partial_{x_i} f(x^{(t)}))^2.\]
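The decayed accumulator can be sketched in plain NumPy (an illustrative sketch, not PennyLane's implementation; the gradient value is made up):

```python
import numpy as np

def rmsprop_step(grad, x, accum, stepsize=0.01, decay=0.9, eps=1e-8):
    """Exponentially decay the squared-gradient average, then rescale the step."""
    accum = decay * accum + (1 - decay) * grad ** 2
    return x - stepsize * grad / np.sqrt(accum + eps), accum

grad = np.array([3.0])  # made-up example gradient
x, accum = np.zeros(1), np.zeros(1)
x, accum = rmsprop_step(grad, x, accum, stepsize=0.1)
```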
Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\) used in the Adagrad optimization
  • decay (float) – the learning rate decay \(\gamma\)
  • eps (float) – offset \(\epsilon\) added for numerical stability (see Adagrad)
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

reset()

Reset optimizer by erasing memory of past steps.

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)
class QGTOptimizer(stepsize=0.01, diag_approx=False)[source]

Optimizer with adaptive learning rate, via calculation of the quantum geometric tensor.

The QGT optimizer uses a step- and parameter-dependent learning rate, determined by the pseudo-inverse of the quantum geometric tensor \(G\):

\[x^{(t+1)} = x^{(t)} - \eta G(x^{(t)})^{-1} \nabla f(x^{(t)}),\]

where \(f(x^{(t)}) = \langle 0 | U(x^{(t)})^\dagger \hat{B} U(x^{(t)}) | 0 \rangle\) is an expectation value of some observable measured on the variational quantum circuit \(U(x^{(t)})\).

Consider a quantum node represented by the variational quantum circuit

\[U(\mathbf{\theta}) = W(\theta_{i+1}, \dots, \theta_{N})X(\theta_{i}) V(\theta_1, \dots, \theta_{i-1}),\]

where \(X(\theta_{i}) = e^{i\theta_i K_i}\) (i.e., \(K_i\) is the generator of the parametrized operation \(X(\theta_i)\) corresponding to the \(i\)-th parameter).

The quantum geometric tensor element is thus given by:

\[G_{ij} = \langle 0 | V^{-1} K_i K_j V | 0\rangle - \langle 0 | V^{-1} K_i V | 0\rangle \langle 0 | V^{-1} K_j V | 0\rangle.\]

For parametric layer \(\ell\) in the variational quantum circuit containing \(n\) parameters, an \(n\times n\) block-diagonal submatrix of the quantum geometric tensor \(G_{ij}^{(\ell)}\) is computed by directly querying the quantum device.
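Given a metric tensor, the update itself reduces to ordinary linear algebra. A minimal sketch with made-up illustrative values (in practice PennyLane builds \(G\) from measurements on the device):

```python
import numpy as np

# Made-up 2x2 metric tensor and gradient, for illustration only
G = np.array([[0.25, 0.0], [0.0, 0.0625]])
grad = np.array([0.5, 0.125])
x = np.array([0.1, 0.2])
stepsize = 0.01

# x <- x - eta * G^+ grad; the pseudo-inverse handles singular tensors
x_new = x - stepsize * np.linalg.pinv(G) @ grad
```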

Note

The QGT optimizer only supports single QNodes as objective functions.

In particular:

  • For hybrid classical-quantum models, the “mixed geometry” of the model makes it unclear which metric should be used for which parameter. For example, parameters of quantum nodes are better suited to one metric (such as the QGT), whereas others (e.g., parameters of classical nodes) are likely better suited to another metric.
  • For multi-QNode models, we don’t know what geometry is appropriate if a parameter is shared amongst several QNodes.
Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • diag_approx (bool) – If True, forces a diagonal approximation where the calculated metric tensor only contains diagonal elements \(G_{ii}\). In some cases, this may reduce the time taken per optimization step.
  • tol (float) – tolerance used when finding the inverse of the quantum gradient tensor
step(qnode, x, recompute_tensor=True)[source]

Update x with one step of the optimizer.

Parameters:
  • qnode (QNode) – the QNode for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • recompute_tensor (bool) – Whether or not the metric tensor should be recomputed. If not, the metric tensor from the previous optimization step is used.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters: stepsize (float) – the user-defined hyperparameter \(\eta\)