# qml.ShotAdaptiveOptimizer¶

class ShotAdaptiveOptimizer(min_shots, term_sampling='weighted_random_sampling', mu=0.99, b=1e-06, stepsize=0.07)[source]

Bases: pennylane.optimize.gradient_descent.GradientDescentOptimizer

Optimizer where the shot rate is adaptively calculated using the variances of the parameter-shift gradient.

By keeping a running average of the parameter-shift gradient and the variance of the parameter-shift gradient, this optimizer frugally distributes a shot budget across the partial derivatives of each parameter.

In addition, if computing the expectation value of a Hamiltonian using ExpvalCost, weighted random sampling can be used to further distribute the shot budget across the local terms from which the Hamiltonian is constructed.

Note

The shot adaptive optimizer only supports single QNodes or ExpvalCost objects as objective functions. The bound device must also be instantiated with a finite number of shots.

Parameters
• min_shots (int) – The minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.

• mu (float) – The running average constant $$\mu \in [0, 1]$$. Used to control how quickly the number of shots recommended for each gradient component changes.

• b (float) – Regularization bias. The bias should be kept small, but non-zero.

• term_sampling (str) – The random sampling algorithm to multinomially distribute the shot budget across terms in the Hamiltonian expectation value. Currently, only "weighted_random_sampling" is supported. Only takes effect if the objective function provided is an instance of ExpvalCost. Set this argument to None to turn off random sampling of Hamiltonian terms.

• stepsize (float) –

The learning rate $$\eta$$. The learning rate must be such that $$\eta < 2/L = 2/\sum_i|c_i|$$, where:

• $$L \leq \sum_i|c_i|$$ is the bound on the Lipschitz constant of the variational quantum algorithm objective function, and

• $$c_i$$ are the coefficients of the Hamiltonian used in the objective function.
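As an illustrative check (not part of the library), the Lipschitz bound and the resulting constraint on the learning rate can be computed directly from the Hamiltonian coefficients:

```python
# Illustrative sketch: bound the Lipschitz constant by sum_i |c_i| and
# derive the largest admissible learning rate eta < 2 / L.
coeffs = [2, 4, -1, 5, 2]                        # example Hamiltonian coefficients
lipschitz_bound = sum(abs(c) for c in coeffs)    # L <= 2 + 4 + 1 + 5 + 2 = 14
max_stepsize = 2 / lipschitz_bound               # eta must satisfy eta < 2/14

print(lipschitz_bound, max_stepsize)
```

For these example coefficients, the default stepsize of 0.07 satisfies the bound, since 0.07 < 2/14 ≈ 0.143.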

Example

For VQE/VQE-like problems, the objective function for the optimizer can be realized as an ExpvalCost object, constructed using a Hamiltonian.

>>> coeffs = [2, 4, -1, 5, 2]
>>> obs = [
...   qml.PauliX(1),
...   qml.PauliZ(1),
...   qml.PauliX(0) @ qml.PauliX(1),
...   qml.PauliY(0) @ qml.PauliY(1),
...   qml.PauliZ(0) @ qml.PauliZ(1)
... ]
>>> H = qml.Hamiltonian(coeffs, obs)
>>> dev = qml.device("default.qubit", wires=2, shots=100)
>>> cost = qml.ExpvalCost(qml.templates.StronglyEntanglingLayers, H, dev)


Once constructed, the cost function can be passed directly to the optimizer’s step method. The attributes opt.shots_used and opt.total_shots_used can be used to track the number of shots per iteration, and across the life of the optimizer, respectively.

>>> params = qml.init.strong_ent_layers_uniform(n_layers=2, n_wires=2)
>>> opt = qml.ShotAdaptiveOptimizer(min_shots=10)
>>> for i in range(60):
...    params = opt.step(cost, params)
...    print(f"Step {i}: cost = {cost(params):.2f}, shots_used = {opt.total_shots_used}")
Step 0: cost = -5.69, shots_used = 240
Step 1: cost = -2.98, shots_used = 336
Step 2: cost = -4.97, shots_used = 624
Step 3: cost = -5.53, shots_used = 1054
Step 4: cost = -6.50, shots_used = 1798
Step 5: cost = -6.68, shots_used = 2942
Step 6: cost = -6.99, shots_used = 4350
Step 7: cost = -6.97, shots_used = 5814
Step 8: cost = -7.00, shots_used = 7230
Step 9: cost = -6.69, shots_used = 9006
Step 10: cost = -6.85, shots_used = 11286
Step 11: cost = -6.63, shots_used = 14934
Step 12: cost = -6.86, shots_used = 17934
Step 13: cost = -7.19, shots_used = 22950
Step 14: cost = -6.99, shots_used = 28302
Step 15: cost = -7.38, shots_used = 34134
Step 16: cost = -7.66, shots_used = 41022
Step 17: cost = -7.21, shots_used = 48918
Step 18: cost = -7.53, shots_used = 56286
Step 19: cost = -7.46, shots_used = 63822
Step 20: cost = -7.31, shots_used = 72534
Step 21: cost = -7.23, shots_used = 82014
Step 22: cost = -7.31, shots_used = 92838


The shot adaptive optimizer is based on the iCANS1 optimizer by Kübler et al. (2020), and works as follows:

1. The initial step of the optimizer is performed with some specified minimum number of shots, $$s_{min}$$, for all partial derivatives.

2. The parameter-shift rule is then used to estimate the gradient $$g_i$$ with $$s_i$$ shots for each parameter $$\theta_i$$, as well as the variances $$v_i$$ of the estimated gradients.

3. Gradient descent is performed for each parameter $$\theta_i$$, using the pre-defined learning rate $$\eta$$ and the gradient information $$g_i$$: $$\theta_i \rightarrow \theta_i - \eta g_i$$.

4. The number of shots is then determined by maximizing the expected gain per shot. For each parameter $$\theta_i$$, the expected gain per shot is calculated via

$\gamma_i = \frac{1}{s_i} \left[ \left(\eta - \frac{1}{2} L\eta^2\right) g_i^2 - \frac{L\eta^2}{2s_i}v_i \right],$

where:

• $$L \leq \sum_i|c_i|$$ is the bound on the Lipschitz constant of the variational quantum algorithm objective function,

• $$c_i$$ are the coefficients of the Hamiltonian, and

• $$\eta$$ is the learning rate, and must be bound such that $$\eta < 2/L$$ for the above expression to hold.
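The expected gain per shot above can be sketched as a plain function (an illustration of the formula, not the library's internal code):

```python
def gain_per_shot(g, v, s, eta, L):
    """Expected gain per shot gamma_i for a gradient estimate g with
    variance v obtained from s shots, given learning rate eta and
    Lipschitz bound L (all floats)."""
    return (1 / s) * ((eta - 0.5 * L * eta**2) * g**2 - (L * eta**2) / (2 * s) * v)
```

Note that with a noiseless estimate ($$v_i = 0$$) the gain reduces to $$(\eta - L\eta^2/2)\, g_i^2 / s_i$$, so additional shots only reduce the gain per shot; the variance term is what rewards spending more shots on noisy components.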

5. Finally, the new number of shots $$s_{i+1}$$ (the shots used for the partial derivative with respect to parameter $$\theta_i$$) is given by:

$s_{i+1} = \frac{2L\eta}{2-L\eta}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.$

In addition to the above, to counteract the presence of noise in the system, running averages of $$g_i$$ and $$v_i$$ ($$\chi_i$$ and $$\xi_i$$ respectively) are used when computing $$\gamma_i$$ and $$s_{i+1}$$.
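Putting the last two steps together, one iteration of the shot-update rule might be sketched as follows (a simplified illustration, with exponentially-decaying running averages standing in for $$\chi_i$$ and $$\xi_i$$; the bias-correction details of the real optimizer are omitted):

```python
def update_shot_count(chi, xi, g, v, mu, eta, L, b):
    """One sketch iteration of the shot-update rule s_{i+1} ~ v_i / g_i^2.

    chi, xi: running averages of the gradient and its variance
    g, v:    latest gradient estimate and its variance
    mu:      running-average constant; b: small regularization bias
    """
    chi = mu * chi + (1 - mu) * g   # running average of the gradient
    xi = mu * xi + (1 - mu) * v     # running average of the variance
    # b keeps the denominator non-zero when the averaged gradient vanishes
    s_new = (2 * L * eta) / (2 - L * eta) * xi / (chi**2 + b)
    return chi, xi, s_new
```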

For more details, see:

• Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. “Operator Sampling for Shot-frugal Optimization in Variational Algorithms.” arXiv:2004.06252 (2020).

• Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. “An Adaptive Optimizer for Measurement-Frugal Variational Algorithms.” Quantum 4, 263 (2020).

• apply_grad(grad, args) – Update the variables to take a single optimization step.

• check_device(dev) – Verifies that the device used by the objective function is non-analytic.

• check_learning_rate(coeffs) – Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by $$\sum_i |c_i|$$ for Hamiltonian coefficients $$c_i$$.

• compute_grad(objective_fn, args, kwargs) – Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.

• step(objective_fn, *args, **kwargs) – Update trainable arguments with one step of the optimizer.

• step_and_cost(objective_fn, *args, **kwargs) – Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.

• update_stepsize(stepsize) – Update the initialized stepsize value $$\eta$$.

• weighted_random_sampling(qnodes, coeffs, …) – Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient.
apply_grad(grad, args)

Update the variables to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.

Parameters
• grad (tuple [array]) – the gradient of the objective function at point $$x^{(t)}$$: $$\nabla f(x^{(t)})$$

• args (tuple) – the current value of the variables $$x^{(t)}$$

Returns

the new values $$x^{(t+1)}$$

Return type

list[array]

static check_device(dev)[source]

Verifies that the device used by the objective function is non-analytic.

Parameters

dev (Device) – the device to verify

Raises

ValueError – if the device is analytic

check_learning_rate(coeffs)[source]

Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by $$\sum |c_i|$$ for Hamiltonian coefficients $$c_i$$.

Parameters

coeffs (Sequence[float]) – the coefficients of the terms in the Hamiltonian

Raises

ValueError – if the learning rate is larger than $$2/\sum_i |c_i|$$

compute_grad(objective_fn, args, kwargs)[source]

Compute gradient of the objective function, as well as the variance of the gradient, at the given point.

Parameters
• objective_fn (function) – the objective function for optimization

• args – arguments to the objective function

• kwargs – keyword arguments to the objective function

Returns

a tuple of NumPy arrays containing the gradient $$\nabla f(x^{(t)})$$ and the variance of the gradient

Return type

tuple[array[float], array[float]]

step(objective_fn, *args, **kwargs)[source]

Update trainable arguments with one step of the optimizer.

Parameters
• objective_fn (function) – the objective function for optimization

• *args – variable length argument list for objective function

• **kwargs – variable length of keyword arguments for the objective function

Returns

The new variable values $$x^{(t+1)}$$. If a single argument is provided, list[array] is replaced by array.

Return type

list[array]

step_and_cost(objective_fn, *args, **kwargs)[source]

Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.

The objective function will be evaluated using the maximum number of shots across all parameters as determined by the optimizer during the optimization step.

Warning

Unlike other gradient descent optimizers, the objective function will be evaluated separately from the gradient computation, resulting in extra device evaluations.

Parameters
• objective_fn (function) – the objective function for optimization

• *args – variable length argument list for objective function

• **kwargs – variable length of keyword arguments for the objective function

Returns

the new variable values $$x^{(t+1)}$$ and the objective function output prior to the step. If a single argument is provided, list[array] is replaced by array.

Return type

tuple[list[array], float]

update_stepsize(stepsize)

Update the initialized stepsize value $$\eta$$.

This allows for techniques such as learning rate scheduling.

Parameters

stepsize (float) – the user-defined hyperparameter $$\eta$$
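For example, a decay schedule could be generated up front and passed to update_stepsize at each iteration (the schedule below is a hypothetical choice; any schedule keeping $$\eta < 2/L$$ is valid):

```python
# Hypothetical geometric decay schedule for the learning rate eta.
eta0 = 0.07                                   # initial stepsize
schedule = [eta0 * 0.9**t for t in range(5)]  # eta_t = eta_0 * 0.9^t
# at iteration t one would call: opt.update_stepsize(schedule[t])
```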

static weighted_random_sampling(qnodes, coeffs, shots, argnums, *args, **kwargs)[source]

Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient. The shots are distributed randomly over the terms in the Hamiltonian, as per a multinomial distribution.

Parameters
• qnodes (Sequence[QNode]) – Sequence of QNodes, each one when evaluated returning the corresponding expectation value of a term in the Hamiltonian.

• coeffs (Sequence[float]) – Sequences of coefficients corresponding to each term in the Hamiltonian. Must be the same length as qnodes.

• shots (int) – The number of shots used to estimate the Hamiltonian expectation value. These shots are distributed over the terms in the Hamiltonian according to a multinomial distribution.

• argnums (Sequence[int]) – the QNode argument indices which are trainable

• *args – Arguments to the QNodes

• **kwargs – Keyword arguments to the QNodes

Returns

the single-shot gradients of the Hamiltonian expectation value

Return type

array[float]
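The multinomial allocation of shots over terms can be sketched with the standard library (an illustration of the idea, not PennyLane's implementation, using probabilities proportional to $$|c_i|$$; all names below are chosen for illustration):

```python
import random
from collections import Counter

# Sketch: distribute a shot budget across Hamiltonian terms, each single
# shot landing on term i with probability proportional to |c_i|.
coeffs = [2, 4, -1, 5, 2]
weights = [abs(c) for c in coeffs]
shots = 100

random.seed(1234)                  # fixed seed for reproducibility
draws = random.choices(range(len(coeffs)), weights=weights, k=shots)
shots_per_term = Counter(draws)    # term index -> number of shots allocated

assert sum(shots_per_term.values()) == shots
```

Each shot is then spent on a single-shot expectation estimate of its assigned term, and the per-term results are recombined with the coefficients to form the gradient estimate.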