batch_vjp(tapes, dys, gradient_fn, reduction='append', gradient_kwargs=None)

Generate the gradient tapes and processing function required to compute the vector-Jacobian products of a batch of tapes.

Consider a function $$\mathbf{f}: \mathbb{R}^n \rightarrow \mathbb{R}^m$$ evaluated at $$\mathbf{x}$$. Its Jacobian is the $$m \times n$$ matrix of partial derivatives

$\mathbf{J}_{\mathbf{f}}(\mathbf{x}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}.$
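As a concrete illustration of this definition, the Jacobian of a small function can be approximated numerically, one column per input, with central finite differences. The following is a minimal NumPy sketch; the function `f` and the helper `jacobian_fd` are hypothetical and are not part of PennyLane.

```python
import numpy as np


def f(x):
    # Hypothetical example function f: R^2 -> R^3
    return np.array([np.sin(x[0]), x[0] * x[1], np.cos(x[1])])


def jacobian_fd(func, x, eps=1e-6):
    """Approximate the m x n Jacobian of func at x with central differences."""
    x = np.asarray(x, dtype=float)
    m = func(x).shape[0]
    n = x.shape[0]
    jac = np.zeros((m, n))
    for j in range(n):
        step = np.zeros(n)
        step[j] = eps
        # Column j holds the partial derivatives df_i/dx_j for every output i
        jac[:, j] = (func(x + step) - func(x - step)) / (2 * eps)
    return jac


x = np.array([0.1, 0.2])
J = jacobian_fd(f, x)
print(J.shape)  # (3, 2): one row per output f_i, one column per input x_j
```

For this `f`, the exact Jacobian is `[[cos(x0), 0], [x1, x0], [0, -sin(x1)]]`, which the numerical estimate matches to within the finite-difference error.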

During backpropagation, the chain rule is applied. For example, consider the cost function $$h = y\circ f: \mathbb{R}^n \rightarrow \mathbb{R}$$, where $$y: \mathbb{R}^m \rightarrow \mathbb{R}$$. The gradient is:

$\nabla h(\mathbf{x}) = \frac{\partial y}{\partial \mathbf{f}} \frac{\partial \mathbf{f}}{\partial \mathbf{x}} = \frac{\partial y}{\partial \mathbf{f}} \mathbf{J}_{\mathbf{f}}(\mathbf{x}).$

Denoting $$d\mathbf{y} = \frac{\partial y}{\partial \mathbf{f}}$$, we can write this component-wise as a matrix multiplication:

$\left[\nabla h(\mathbf{x})\right]_{j} = \sum_{i=1}^{m} d\mathbf{y}_i ~ \mathbf{J}_{ij}.$

Thus, the gradient of the cost function is given by the so-called vector-Jacobian product (VJP): the product of the row vector $$d\mathbf{y}$$, representing the gradient of the downstream components of the cost function, and $$\mathbf{J}$$, the Jacobian of the current node of interest.
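The chain-rule identity above can be checked numerically. This NumPy sketch uses a hypothetical $$\mathbf{f}: \mathbb{R}^2 \rightarrow \mathbb{R}^3$$ standing in for a circuit's outputs and the illustrative cost $$y(\mathbf{f}) = \sum_i f_i^2$$, so that $$d\mathbf{y} = 2\mathbf{f}$$. It forms the VJP $$d\mathbf{y}\,\mathbf{J}$$ and compares it with a finite-difference gradient of $$h = y \circ f$$.

```python
import numpy as np


def f(x):
    # Hypothetical f: R^2 -> R^3, a stand-in for a circuit's outputs
    return np.array([np.sin(x[0]), x[0] * x[1], np.cos(x[1])])


def jacobian_f(x):
    # Analytic Jacobian J_ij = df_i/dx_j, shape (m, n) = (3, 2)
    return np.array([
        [np.cos(x[0]), 0.0],
        [x[1], x[0]],
        [0.0, -np.sin(x[1])],
    ])


def h(x):
    # Cost h = y o f with y(f) = sum_i f_i^2
    return np.sum(f(x) ** 2)


x = np.array([0.1, 0.2])
dy = 2 * f(x)              # dy_i = dy/df_i for y(f) = sum_i f_i^2
vjp = dy @ jacobian_f(x)   # [grad h]_j = sum_i dy_i J_ij

# Independent check: central finite-difference gradient of h
eps = 1e-6
grad_fd = np.array([
    (h(x + eps * e) - h(x - eps * e)) / (2 * eps) for e in np.eye(2)
])
print(np.allclose(vjp, grad_fd, atol=1e-6))  # True
```

The VJP only needs $$d\mathbf{y}$$ and $$\mathbf{J}$$, not the downstream function $$y$$ itself, which is why the downstream gradient information is passed in as the `dys` vectors.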

Parameters
• tapes (Sequence[QuantumTape]) – sequence of quantum tapes to differentiate

• dys (Sequence[tensor_like]) – Sequence of gradient-output vectors dy. Must be the same length as tapes. Each dy tensor should have shape matching the output shape of the corresponding tape.

• gradient_fn (callable) – the gradient transform to use to differentiate the tapes

• reduction (str) – Determines how the vector-Jacobian products are returned. If "append", the output is of the form List[tensor_like], with one element per input tape containing that tape's VJP. If "extend", the VJPs of all tapes are concatenated into a single list.

• gradient_kwargs (dict) – dictionary of keyword arguments to pass when determining the gradients of tapes
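The two reduction modes behave like Python's `list.append` and `list.extend`. The pure-Python sketch below uses made-up per-tape VJP values rather than real tape results to show how the output layout differs; `combine_vjps` is a hypothetical helper for illustration, not a PennyLane function.

```python
def combine_vjps(per_tape_vjps, reduction="append"):
    """Hypothetical helper mirroring the two reduction modes.

    per_tape_vjps: list with one VJP (here, a list of numbers) per tape.
    """
    if reduction not in ("append", "extend"):
        raise ValueError(f"unknown reduction: {reduction!r}")
    out = []
    for vjp in per_tape_vjps:
        # "append" keeps one entry per tape; "extend" splices the entries in
        getattr(out, reduction)(vjp)
    return out


per_tape = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(combine_vjps(per_tape, "append"))  # [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
print(combine_vjps(per_tape, "extend"))  # [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

With "append", the i-th output can be matched back to the i-th input tape; with "extend", that per-tape grouping is lost in exchange for a flat list.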

Returns

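A tuple containing the list of generated VJP tapes, followed by a processing function. Applying the processing function to the results of executing these tapes yields the list of vector-Jacobian products; with reduction="append", None elements correspond to tapes with no trainable parameters.

Return type

tuple[List[QuantumTape], function]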
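Returning tapes plus a processing function lets the caller execute all generated tapes in a single batch and hand back one flat list of results. The pure-Python sketch below illustrates the bookkeeping such a processing function needs, under simplifying assumptions: each input tape is represented only by how many gradient tapes it produced and a per-tape post-processing callable. `make_batch_processing_fn` and its arguments are hypothetical, not PennyLane's internals.

```python
from typing import Callable, List, Optional, Sequence


def make_batch_processing_fn(
    num_tapes_per_input: Sequence[int],
    per_tape_fns: Sequence[Callable[[list], Optional[list]]],
):
    """Hypothetical sketch: split a flat result list back into per-tape VJPs."""

    def processing_fn(results: list) -> List[Optional[list]]:
        vjps = []
        start = 0
        for count, post in zip(num_tapes_per_input, per_tape_fns):
            # Slice out the results that belong to this input tape
            chunk = results[start:start + count]
            start += count
            # A tape with no trainable parameters yields None ("append" layout)
            vjps.append(post(chunk))
        return vjps

    return processing_fn


# Input tape 0 produced 2 gradient tapes (toy post-processing: their difference);
# input tape 1 has no trainable parameters, so it produced none and maps to None.
fn = make_batch_processing_fn([2, 0], [lambda r: [r[0] - r[1]], lambda r: None])
print(fn([0.75, 0.25]))  # [[0.5], None]
```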

Example

Consider the following Torch-compatible quantum tapes:

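import pennylane as qml
import torch

# Two rows of three rotation angles; requires_grad lets Torch track gradients
x = torch.tensor([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], requires_grad=True, dtype=torch.float64)

def ansatz(x):
    # Shared two-qubit circuit with 6 trainable rotation angles
    qml.RX(x[0, 0], wires=0)
    qml.RY(x[0, 1], wires=1)
    qml.RZ(x[0, 2], wires=0)
    qml.CNOT(wires=[0, 1])
    qml.RX(x[1, 0], wires=1)
    qml.RY(x[1, 1], wires=0)
    qml.RZ(x[1, 2], wires=1)

# tape1: 1 expectation value + 2 probabilities = 3 outputs
with qml.tape.QuantumTape() as tape1:
    ansatz(x)
    qml.expval(qml.PauliZ(0))
    qml.probs(wires=1)

# tape2: a single expectation value
with qml.tape.QuantumTape() as tape2:
    ansatz(x)
    qml.expval(qml.PauliZ(0) @ qml.PauliZ(1))

tapes = [tape1, tape2]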


Both tapes share the same circuit ansatz, but have different measurement outputs.

We can use the batch_vjp function to compute the vector-Jacobian products, given one gradient-output vector dy per tape:

>>> dys = [torch.tensor([1., 1., 1.], dtype=torch.float64),
...  torch.tensor([1.], dtype=torch.float64)]


Note that each dy has shape matching the output dimension of its tape: tape1 has 1 expectation value and 2 probabilities (3 outputs in total), and tape2 has 1 expectation value.
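One way to read these all-ones dys: when dy is a vector of ones, the VJP is simply the gradient of the sum of the tape's outputs. For tape1, the two probabilities always sum to 1, so their derivatives cancel and the VJP reduces to the gradient of the expectation value. The NumPy sketch below checks this with an illustrative 3 x 6 Jacobian; its numbers are made up and are not computed from tape1.

```python
import numpy as np

# Hypothetical 3 x 6 Jacobian for a tape with outputs [<Z0>, p(0), p(1)]
# and 6 trainable parameters (illustrative numbers, not computed from tape1).
d_expval = np.array([-0.1, -0.01, 0.0, 0.0, -0.5, 0.0])
d_p0 = np.array([0.05, 0.02, 0.0, -0.3, 0.1, 0.0])
# p(0) + p(1) = 1 for every parameter value, so d p(1) = -d p(0)
J = np.stack([d_expval, d_p0, -d_p0])

dy = np.ones(3)  # same shape as the tape output, as required
vjp = dy @ J     # equals the gradient of the sum of all outputs

print(vjp.shape)                   # (6,)
print(np.allclose(vjp, d_expval))  # True: the probability rows cancel
```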

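Generating the VJP tapes with a gradient transform (here, the parameter-shift rule), executing them, and applying the processing function:

>>> vjp_tapes, fn = qml.gradients.batch_vjp(tapes, dys, qml.gradients.param_shift)
>>> dev = qml.device("default.qubit", wires=2)
>>> vjps = fn(qml.execute(vjp_tapes, dev, gradient_fn=qml.gradients.param_shift, interface="torch"))
>>> vjps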
[tensor([-1.1562e-01, -1.3862e-02, -9.0841e-03, -1.3878e-16, -4.8217e-01,
tensor([ 1.7393e-01, -1.6412e-01, -5.3983e-03, -2.9366e-01, -4.0083e-01,

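We obtain two VJPs, one per tape, each with one entry per trainable tape parameter (6 here).

The output VJPs are also differentiable with respect to the tape parameters:

>>> cost = torch.sum(vjps[0] + vjps[1])
>>> cost.backward()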