qml.ShotAdaptiveOptimizer¶

class ShotAdaptiveOptimizer(min_shots, term_sampling='weighted_random_sampling', mu=0.99, b=1e-06, stepsize=0.07)[source]¶

Bases: pennylane.optimize.gradient_descent.GradientDescentOptimizer
Optimizer where the shot rate is adaptively calculated using the variances of the parameter-shift gradient.
By keeping a running average of the parameter-shift gradient and the variance of the parameter-shift gradient, this optimizer frugally distributes a shot budget across the partial derivatives of each parameter.
In addition, if computing the expectation value of a Hamiltonian using ExpvalCost, weighted random sampling can be used to further distribute the shot budget across the local terms from which the Hamiltonian is constructed.

Note

The shot adaptive optimizer only supports single QNodes or ExpvalCost objects as objective functions. The bound device must also be instantiated with a finite number of shots.

Parameters
min_shots (int) – The minimum number of shots used to estimate the expectations of each term in the Hamiltonian. Note that this must be larger than 2 for the variance of the gradients to be computed.
mu (float) – The running average constant \(\mu \in [0, 1]\). Used to control how quickly the number of shots recommended for each gradient component changes.
b (float) – Regularization bias. The bias should be kept small, but nonzero.
term_sampling (str) – The random sampling algorithm used to multinomially distribute the shot budget across terms in the Hamiltonian expectation value. Currently, only "weighted_random_sampling" is supported. Only takes effect if the objective function provided is an instance of ExpvalCost. Set this argument to None to turn off random sampling of Hamiltonian terms.
stepsize (float) – The learning rate \(\eta\). The learning rate must satisfy \(\eta < 2/L = 2/\sum_i c_i\), where:
\(L \leq \sum_ic_i\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function, and
\(c_i\) are the coefficients of the Hamiltonian used in the objective function.
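As a sanity check on this bound, the maximum admissible learning rate can be computed directly from the Hamiltonian coefficients. A minimal sketch (the helper name `max_learning_rate` is hypothetical, not part of PennyLane):

```python
def max_learning_rate(coeffs):
    # Hypothetical helper: the Lipschitz constant of the objective is bounded
    # by the sum of the absolute Hamiltonian coefficients, so any stepsize
    # below 2 / sum_i |c_i| satisfies the requirement eta < 2/L.
    lipschitz_bound = sum(abs(c) for c in coeffs)
    return 2 / lipschitz_bound

# For the Hamiltonian coefficients used in the example below:
eta_max = max_learning_rate([2, 4, 1, 5, 2])  # 2/14 ≈ 0.143
```

Since the default stepsize of 0.07 is below this value, it is admissible for that Hamiltonian.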
Example
For VQE/VQE-like problems, the objective function for the optimizer can be realized as an ExpvalCost object, constructed using a Hamiltonian.

>>> coeffs = [2, 4, 1, 5, 2]
>>> obs = [
...     qml.PauliX(1),
...     qml.PauliZ(1),
...     qml.PauliX(0) @ qml.PauliX(1),
...     qml.PauliY(0) @ qml.PauliY(1),
...     qml.PauliZ(0) @ qml.PauliZ(1)
... ]
>>> H = qml.Hamiltonian(coeffs, obs)
>>> dev = qml.device("default.qubit", wires=2, shots=100)
>>> cost = qml.ExpvalCost(qml.templates.StronglyEntanglingLayers, H, dev)
Once constructed, the cost function can be passed directly to the optimizer's step method. The attributes opt.shots_used and opt.total_shots_used can be used to track the number of shots per iteration, and across the life of the optimizer, respectively.

>>> params = qml.init.strong_ent_layers_uniform(n_layers=2, n_wires=2)
>>> opt = qml.ShotAdaptiveOptimizer(min_shots=10)
>>> for i in range(60):
...     params = opt.step(cost, params)
...     print(f"Step {i}: cost = {cost(params):.2f}, shots_used = {opt.total_shots_used}")
Step 0: cost = 5.69, shots_used = 240
Step 1: cost = 2.98, shots_used = 336
Step 2: cost = 4.97, shots_used = 624
Step 3: cost = 5.53, shots_used = 1054
Step 4: cost = 6.50, shots_used = 1798
Step 5: cost = 6.68, shots_used = 2942
Step 6: cost = 6.99, shots_used = 4350
Step 7: cost = 6.97, shots_used = 5814
Step 8: cost = 7.00, shots_used = 7230
Step 9: cost = 6.69, shots_used = 9006
Step 10: cost = 6.85, shots_used = 11286
Step 11: cost = 6.63, shots_used = 14934
Step 12: cost = 6.86, shots_used = 17934
Step 13: cost = 7.19, shots_used = 22950
Step 14: cost = 6.99, shots_used = 28302
Step 15: cost = 7.38, shots_used = 34134
Step 16: cost = 7.66, shots_used = 41022
Step 17: cost = 7.21, shots_used = 48918
Step 18: cost = 7.53, shots_used = 56286
Step 19: cost = 7.46, shots_used = 63822
Step 20: cost = 7.31, shots_used = 72534
Step 21: cost = 7.23, shots_used = 82014
Step 22: cost = 7.31, shots_used = 92838
Usage Details
The shot adaptive optimizer is based on the iCANS1 optimizer by Kübler et al. (2020), and works as follows:
The initial step of the optimizer is performed with some specified minimum number of shots, \(s_{min}\), for all partial derivatives.
The parameter-shift rule is then used to estimate the gradient \(g_i\) with \(s_i\) shots for each parameter \(\theta_i\), as well as the variances \(v_i\) of the estimated gradients.
Gradient descent is performed for each parameter \(\theta_i\), using the pre-defined learning rate \(\eta\) and the gradient information \(g_i\): \(\theta_i \rightarrow \theta_i - \eta g_i\).
A maximum shot number is set by maximizing the improvement in the expected gain per shot. For a specific parameter value, the improvement in the expected gain per shot is then calculated via
\[\gamma_i = \frac{1}{s_i} \left[ \left(\eta - \frac{1}{2} L\eta^2\right) g_i^2 - \frac{L\eta^2}{2s_i}v_i \right],\]

where:
\(L \leq \sum_ic_i\) is the bound on the Lipschitz constant of the variational quantum algorithm objective function,
\(c_i\) are the coefficients of the Hamiltonian, and
\(\eta\) is the learning rate, and must be bound such that \(\eta < 2/L\) for the above expression to hold.
Finally, the new value \(s_{i+1}\) (the number of shots for the partial derivative of parameter \(\theta_i\)) is given by:
\[s_{i+1} = \frac{2L\eta}{2 - L\eta}\left(\frac{v_i}{g_i^2}\right)\propto \frac{v_i}{g_i^2}.\]
In addition to the above, to counteract the presence of noise in the system, running averages of \(g_i\) and \(v_i\) (\(\chi_i\) and \(\xi_i\) respectively) are used when computing \(\gamma_i\) and \(s_i\).
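As a rough illustration (not PennyLane's implementation), the shot-recommendation rule above can be sketched as follows; the helper name and the use of the bias `b` to stabilize the ratio are assumptions for this sketch:

```python
import numpy as np

def recommend_shots(chi, xi, lipschitz, stepsize, b=1e-06):
    # Hypothetical sketch of the iCANS1 shot-recommendation rule: chi and xi
    # are the running averages of the gradient and of the gradient variance
    # for each parameter; the small bias b keeps the ratio well defined when
    # the averaged gradient is close to zero.
    prefactor = 2 * lipschitz * stepsize / (2 - lipschitz * stepsize)
    return np.ceil(prefactor * xi / (chi ** 2 + b)).astype(int)

# A parameter with a small (noisy) gradient estimate is allocated more shots
# than one with a large, stable gradient.
shots = recommend_shots(chi=np.array([1.0, 0.1]), xi=np.array([4.0, 4.0]),
                        lipschitz=14.0, stepsize=0.07)
```

The key qualitative behavior is that the recommended shot count scales with the variance-to-squared-gradient ratio \(v_i / g_i^2\).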
For more details, see:
Andrew Arrasmith, Lukasz Cincio, Rolando D. Somma, and Patrick J. Coles. "Operator Sampling for Shot-frugal Optimization in Variational Algorithms." arXiv:2004.06252 (2020).
Jonas M. Kübler, Andrew Arrasmith, Lukasz Cincio, and Patrick J. Coles. "An Adaptive Optimizer for Measurement-Frugal Variational Algorithms." Quantum 4, 263 (2020).
Methods
apply_grad(grad, args)
Update the variables to take a single optimization step.

check_device(dev)
Verifies that the device used by the objective function is non-analytic.

check_learning_rate(coeffs)
Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by \(\sum c_i\) for Hamiltonian coefficients \(c_i\).

compute_grad(objective_fn, args, kwargs)
Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.

step(objective_fn, *args, **kwargs)
Update trainable arguments with one step of the optimizer.

step_and_cost(objective_fn, *args, **kwargs)
Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.

update_stepsize(stepsize)
Update the initialized stepsize value \(\eta\).

weighted_random_sampling(qnodes, coeffs, …)
Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient.
apply_grad(grad, args)¶

Update the variables to take a single optimization step. Flattens and unflattens the inputs to maintain nested iterables as the parameters of the optimization.
 Parameters
grad (tuple [array]) – the gradient of the objective function at point \(x^{(t)}\): \(\nabla f(x^{(t)})\)
args (tuple) – the current value of the variables \(x^{(t)}\)
 Returns
the new values \(x^{(t+1)}\)
 Return type
list [array]

static check_device(dev)[source]¶

Verifies that the device used by the objective function is non-analytic.
 Parameters
dev (Device) – the device to verify
 Raises
ValueError – if the device is analytic

check_learning_rate(coeffs)[source]¶

Verifies that the learning rate is less than 2 over the Lipschitz constant, where the Lipschitz constant is given by \(\sum c_i\) for Hamiltonian coefficients \(c_i\).
 Parameters
coeffs (Sequence[float]) – the coefficients of the terms in the Hamiltonian
 Raises
ValueError – if the learning rate is larger than \(2/\sum c_i\)

compute_grad(objective_fn, args, kwargs)[source]¶

Compute the gradient of the objective function, as well as the variance of the gradient, at the given point.
 Parameters
objective_fn (function) – the objective function for optimization
args – arguments to the objective function
kwargs – keyword arguments to the objective function
 Returns
a tuple of NumPy arrays containing the gradient \(\nabla f(x^{(t)})\) and the variance of the gradient
 Return type
tuple[array[float], array[float]]
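The parameter-shift evaluations underlying this method can be illustrated on a classical stand-in for a rotation-generated expectation value, where the shifted evaluations recover the exact derivative (a sketch, not PennyLane's implementation):

```python
import numpy as np

def parameter_shift_grad(f, theta):
    # Parameter-shift rule: for costs generated by Pauli rotations, the exact
    # derivative equals (f(theta + pi/2) - f(theta - pi/2)) / 2.
    return (f(theta + np.pi / 2) - f(theta - np.pi / 2)) / 2

# For f(theta) = sin(theta), the rule returns exactly cos(theta).
grad = parameter_shift_grad(np.sin, 0.3)
```

On hardware each shifted evaluation is itself a finite-shot estimate, which is what makes the per-parameter gradient variances \(v_i\) available to the optimizer.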

step(objective_fn, *args, **kwargs)[source]¶

Update trainable arguments with one step of the optimizer.
 Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument list for objective function
**kwargs – variable length of keyword arguments for the objective function
 Returns
The new variable values \(x^{(t+1)}\). If a single arg is provided, list[array] is replaced by array.
 Return type
list[array]

step_and_cost(objective_fn, *args, **kwargs)[source]¶

Update trainable arguments with one step of the optimizer and return the corresponding objective function value prior to the step.
The objective function will be evaluated using the maximum number of shots across all parameters as determined by the optimizer during the optimization step.
Warning
Unlike other gradient descent optimizers, the objective function will be evaluated separately from the gradient computation, resulting in extra device evaluations.
 Parameters
objective_fn (function) – the objective function for optimization
*args – variable length argument list for objective function
**kwargs – variable length of keyword arguments for the objective function
 Returns
the new variable values \(x^{(t+1)}\) and the objective function output prior to the step. If a single arg is provided, list[array] is replaced by array.
 Return type
tuple[list [array], float]

update_stepsize(stepsize)¶

Update the initialized stepsize value \(\eta\).
This allows for techniques such as learning rate scheduling.
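For instance, a simple exponential-decay schedule could be combined with update_stepsize; the schedule function below is a hypothetical sketch, not part of PennyLane:

```python
def exponential_decay(initial_stepsize, decay_rate, step):
    # Hypothetical schedule: eta_t = eta_0 * decay_rate ** t. In an
    # optimization loop one might call
    #     opt.update_stepsize(exponential_decay(0.07, 0.95, i))
    # before each opt.step(...) to anneal the learning rate.
    return initial_stepsize * decay_rate ** step
```

Any schedule used this way should still keep \(\eta\) below the \(2/L\) bound described above.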
 Parameters
stepsize (float) – the user-defined hyperparameter \(\eta\)

static weighted_random_sampling(qnodes, coeffs, shots, argnums, *args, **kwargs)[source]¶

Returns an array of length shots containing single-shot estimates of the Hamiltonian gradient. The shots are distributed randomly over the terms in the Hamiltonian, as per a multinomial distribution.

 Parameters
qnodes (Sequence[QNode]) – Sequence of QNodes, each one when evaluated returning the corresponding expectation value of a term in the Hamiltonian.
coeffs (Sequence[float]) – Sequence of coefficients corresponding to each term in the Hamiltonian. Must be the same length as qnodes.
shots (int) – The number of shots used to estimate the Hamiltonian expectation value. These shots are distributed over the terms in the Hamiltonian, as per a multinomial distribution.
argnums (Sequence[int]) – the QNode argument indices which are trainable
*args – Arguments to the QNodes
**kwargs – Keyword arguments to the QNodes
 Returns
the singleshot gradients of the Hamiltonian expectation value
 Return type
array[float]
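The multinomial allocation of shots across terms can be sketched as follows, assuming term probabilities proportional to the coefficient magnitudes (the helper below is illustrative, not PennyLane's implementation):

```python
import numpy as np

def distribute_shots(coeffs, shots, seed=None):
    # Draw one multinomial shot allocation over the Hamiltonian terms, with
    # each term's probability proportional to the magnitude of its
    # coefficient, so larger-|c_i| terms receive more shots on average.
    rng = np.random.default_rng(seed)
    weights = np.abs(np.asarray(coeffs, dtype=float))
    return rng.multinomial(shots, weights / weights.sum())

# Allocate 100 shots over the five terms from the example above; the
# allocation always sums to the total shot budget.
allocation = distribute_shots([2, 4, 1, 5, 2], shots=100, seed=0)
```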