Optimization methods

Module name: pennylane.optimize

Submodule containing PennyLane optimizers.

Warning

The built-in optimizers only support QNodes that use the default NumPy interface.

If you are using the PennyLane PyTorch or TensorFlow interfaces, use the native PyTorch optimizers or the TensorFlow optimizers (available in tf.train), respectively.

In PennyLane, an optimizer is a procedure that executes one weight update step along (some function of) the negative gradient of the cost. This update depends in general on:

  • the function \(f(x)\), from which we calculate the gradient \(\nabla f(x)\); if \(x\) is a vector, the gradient is also a vector whose entries are the partial derivatives of \(f\) with respect to the elements of \(x\)
  • the current weights \(x\)
  • the (initial) step size \(\eta\)

The different optimizers can also depend on additional hyperparameters.

In the following, recursive definitions assume that \(x^{(0)}\) is some initial value in the optimization landscape, and all other step-dependent values are initialized to zero at \(t=0\).
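
All optimizers share the same basic workflow: construct the optimizer with its hyperparameters, then repeatedly call step() with the objective function and the current weights. A minimal sketch (the quadratic cost below is illustrative; in practice the objective is typically a QNode evaluating a variational circuit):

    from pennylane import numpy as np
    from pennylane.optimize import GradientDescentOptimizer

    # Illustrative cost function; in practice this is usually a QNode.
    def cost(x):
        return np.sum(x ** 2)

    opt = GradientDescentOptimizer(stepsize=0.1)
    x = np.array([1.0, -0.5])

    for t in range(100):
        x = opt.step(cost, x)  # one update: x <- x - eta * grad f(x)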

Available optimizers

AdagradOptimizer([stepsize, eps]) Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.
AdamOptimizer([stepsize, beta1, beta2, eps]) Gradient-descent optimizer with adaptive learning rate, first and second moment.
GradientDescentOptimizer([stepsize]) Basic gradient-descent optimizer.
MomentumOptimizer([stepsize, momentum]) Gradient-descent optimizer with momentum.
NesterovMomentumOptimizer([stepsize, momentum]) Gradient-descent optimizer with Nesterov momentum.
RMSPropOptimizer([stepsize, decay, eps]) Root mean squared propagation optimizer.

Code details

class AdagradOptimizer(stepsize=0.01, eps=1e-08)[source]

Gradient-descent optimizer with past-gradient-dependent learning rate in each dimension.

Adagrad adjusts the learning rate for each parameter \(x_i\) in \(x\) based on past gradients. We therefore have to consider each parameter update individually,

\[x^{(t+1)}_i = x^{(t)}_i - \eta_i^{(t+1)} \partial_{x_i} f(x^{(t)}),\]

where the gradient is replaced by a (scalar) partial derivative.

The learning rate for step \(t+1\) is given by

\[\eta_i^{(t+1)} = \frac{\eta_{\mathrm{init}}}{\sqrt{a_i^{(t+1)} + \epsilon}}, \qquad a_i^{(t+1)} = \sum_{k=0}^{t} (\partial_{x_i} f(x^{(k)}))^2.\]

The offset \(\epsilon\) avoids division by zero.

Here \(\eta_{\mathrm{init}}\) is the initial step size, a user-defined parameter.

Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • eps (float) – offset \(\epsilon\) added for numerical stability
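
A minimal usage sketch (the quadratic cost below is illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import AdagradOptimizer

    cost = lambda x: np.sum(x ** 2)  # illustrative objective

    opt = AdagradOptimizer(stepsize=0.5)
    x = np.array([2.0, -1.0])
    for _ in range(50):
        # the effective per-parameter learning rate shrinks as the
        # squared gradients accumulate in a_i
        x = opt.step(cost, x)

    opt.reset()  # erase the accumulated squared gradients
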
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)
class AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99, eps=1e-08)[source]

Gradient-descent optimizer with adaptive learning rate, first and second moment.

Adaptive Moment Estimation uses a step-dependent learning rate, a first moment \(a\) and a second moment \(b\), reminiscent of the momentum and velocity of a particle:

\[x^{(t+1)} = x^{(t)} - \eta^{(t+1)} \frac{a^{(t+1)}}{\sqrt{b^{(t+1)}} + \epsilon },\]

where the update rules for the three values are given by

\[\begin{split}a^{(t+1)} &= \beta_1 a^{(t)} + (1-\beta_1)\nabla f(x^{(t)}),\\ b^{(t+1)} &= \beta_2 b^{(t)} + (1-\beta_2) (\nabla f(x^{(t)}))^{\odot 2},\\ \eta^{(t+1)} &= \eta \frac{\sqrt{1-\beta_2^{\,t+1}}}{1-\beta_1^{\,t+1}}.\end{split}\]

Above, \((\nabla f(x^{(t)}))^{\odot 2}\) denotes the element-wise square operation, which means that each element in the gradient is multiplied by itself. The factors \(\sqrt{1-\beta_2^{\,t+1}}\) and \(1-\beta_1^{\,t+1}\) in the step size correct the bias of the moment estimates towards their zero initialization. The hyperparameters \(\beta_1\) and \(\beta_2\) can also be step-dependent. Initially, the first and second moment are zero.

The shift \(\epsilon\) avoids division by zero.

For more details, see [R1].

Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\)
  • beta1 (float) – hyperparameter governing the update of the first moment
  • beta2 (float) – hyperparameter governing the update of the second moment
  • eps (float) – offset \(\epsilon\) added for numerical stability
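
A minimal usage sketch with the hyperparameters written out explicitly (the cost below is illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import AdamOptimizer

    cost = lambda x: np.sum((x - 1.0) ** 2)  # illustrative objective

    opt = AdamOptimizer(stepsize=0.01, beta1=0.9, beta2=0.99)
    x = np.array([0.0, 2.0])
    for _ in range(200):
        # the moments a, b and the step size are updated internally
        x = opt.step(cost, x)
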
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)
class GradientDescentOptimizer(stepsize=0.01)[source]

Basic gradient-descent optimizer.

Base class for other gradient-descent-based optimizers.

A step of the gradient descent optimizer computes the new values via the rule

\[x^{(t+1)} = x^{(t)} - \eta \nabla f(x^{(t)}),\]

where \(\eta\) is a user-defined hyperparameter corresponding to the step size.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)
update_stepsize(stepsize)[source]

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)
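
A sketch of a simple exponential learning-rate schedule built on update_stepsize (the decay factor 0.99 and the cost are illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import GradientDescentOptimizer

    cost = lambda x: np.sum(x ** 2)  # illustrative objective
    x = np.array([1.0, -1.0])

    eta = 0.1
    opt = GradientDescentOptimizer(stepsize=eta)
    for t in range(100):
        x = opt.step(cost, x)
        eta *= 0.99               # decay the step size each iteration
        opt.update_stepsize(eta)
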
step(objective_fn, x, grad_fn=None)[source]

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array
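
A sketch of supplying an analytic gradient via grad_fn, bypassing automatic differentiation (the cost and its gradient are illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import GradientDescentOptimizer

    cost = lambda x: np.sum(x ** 2)  # illustrative objective
    dcost = lambda x: 2 * x          # its analytic gradient

    opt = GradientDescentOptimizer(stepsize=0.1)
    x = np.array([1.0, -0.5])
    x = opt.step(cost, x, grad_fn=dcost)  # uses dcost instead of autodiff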

static compute_grad(objective_fn, x, grad_fn=None)[source]

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

class MomentumOptimizer(stepsize=0.01, momentum=0.9)[source]

Gradient-descent optimizer with momentum.

The momentum optimizer adds a “momentum” term to gradient descent, which takes past gradients into account:

\[x^{(t+1)} = x^{(t)} - a^{(t+1)}.\]

The accumulator term \(a\) is updated as follows:

\[a^{(t+1)} = m a^{(t)} + \eta \nabla f(x^{(t)}),\]

with the user-defined parameters:

  • \(\eta\): the step size
  • \(m\): the momentum
Parameters:
  • stepsize (float) – user-defined hyperparameter \(\eta\)
  • momentum (float) – user-defined hyperparameter \(m\)
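
A minimal usage sketch (the cost below is illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import MomentumOptimizer

    cost = lambda x: np.sum(x ** 2)  # illustrative objective

    opt = MomentumOptimizer(stepsize=0.05, momentum=0.9)
    x = np.array([1.5, -0.5])
    for _ in range(100):
        # the accumulator a retains a fraction m of past update steps
        x = opt.step(cost, x)

    opt.reset()  # clear the accumulator
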
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()[source]

Reset optimizer by erasing memory of past steps.

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)
class NesterovMomentumOptimizer(stepsize=0.01, momentum=0.9)[source]

Gradient-descent optimizer with Nesterov momentum.

Nesterov Momentum works like the Momentum optimizer, but shifts the current input by the momentum term when computing the gradient of the objective function:

\[a^{(t+1)} = m a^{(t)} + \eta \nabla f(x^{(t)} - m a^{(t)}).\]

The user-defined parameters are:

  • \(\eta\): the step size
  • \(m\): the momentum
Parameters:
  • stepsize (float) – user-defined hyperparameter \(\eta\)
  • momentum (float) – user-defined hyperparameter \(m\)
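
A minimal usage sketch (the cost below is illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import NesterovMomentumOptimizer

    cost = lambda x: np.sum(x ** 2)  # illustrative objective

    opt = NesterovMomentumOptimizer(stepsize=0.05, momentum=0.9)
    x = np.array([1.0, 2.0])
    for _ in range(100):
        # the gradient is evaluated at the look-ahead point x - m*a
        x = opt.step(cost, x)
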
compute_grad(objective_fn, x, grad_fn=None)[source]

Compute the gradient of the objective_fn at the shifted point \((x - m\times\text{accumulation})\).

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient evaluated at the shifted point, \(\nabla f(x^{(t)} - m a^{(t)})\)

Return type:

array

apply_grad(grad, x)

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

reset()

Reset optimizer by erasing memory of past steps.

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)
class RMSPropOptimizer(stepsize=0.01, decay=0.9, eps=1e-08)[source]

Root mean squared propagation optimizer.

The root mean square propagation optimizer is a modified Adagrad optimizer, with a decay in the learning-rate adaptation.

Extensions of the Adagrad optimization method generally either start the sum \(a\) over past gradients in the denominator of the learning rate at a finite \(t'\) with \(0 < t' < t\), or decay past gradients, in order to avoid an ever-decreasing learning rate.

Root Mean Square propagation is such an adaptation, where

\[a_i^{(t+1)} = \gamma a_i^{(t)} + (1-\gamma) (\partial_{x_i} f(x^{(t)}))^2.\]
Parameters:
  • stepsize (float) – the user-defined hyperparameter \(\eta\) used in the Adagrad optimization
  • decay (float) – the learning rate decay \(\gamma\)
  • eps (float) – offset \(\epsilon\) added for numerical stability (see Adagrad)
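
A minimal usage sketch (the cost below is illustrative):

    from pennylane import numpy as np
    from pennylane.optimize import RMSPropOptimizer

    cost = lambda x: np.sum(x ** 2)  # illustrative objective

    opt = RMSPropOptimizer(stepsize=0.01, decay=0.9)
    x = np.array([0.5, -1.5])
    for _ in range(100):
        # a decaying running average of squared gradients sets the
        # per-parameter learning rates
        x = opt.step(cost, x)
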
apply_grad(grad, x)[source]

Update the variables x to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters:
  • grad (array) – The gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
  • x (array) – the current value of the variables \(x^{(t)}\)
Returns:

the new values \(x^{(t+1)}\)

Return type:

array

static compute_grad(objective_fn, x, grad_fn=None)

Compute gradient of the objective_fn at the point x.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

NumPy array containing the gradient \(\nabla f(x^{(t)})\)

Return type:

array

reset()

Reset optimizer by erasing memory of past steps.

step(objective_fn, x, grad_fn=None)

Update x with one step of the optimizer.

Parameters:
  • objective_fn (function) – the objective function for optimization
  • x (array) – NumPy array containing the current values of the variables to be updated
  • grad_fn (function) – Optional gradient function of the objective function with respect to the variables x. If None, the gradient function is computed automatically.
Returns:

the new variable values \(x^{(t+1)}\)

Return type:

array

update_stepsize(stepsize)

Update the initialized stepsize value \(\eta\).

This allows for techniques such as learning rate scheduling.

Parameters:stepsize (float) – the user-defined hyperparameter \(\eta\)