`

.. figure:: ../_static/quantum_gradient.svg
    :align: center
    :width: 70%
    :target: javascript:void(0);

    Decomposing the gradient of a quantum circuit function as a linear combination of quantum circuit functions.

Making a rough analogy to classically computable functions, this is similar to how the derivative of the function :math:`f(x)=\sin(x)` is identical to :math:`\frac{1}{2}\sin(x+\frac{\pi}{2}) - \frac{1}{2}\sin(x-\frac{\pi}{2})`. So the same underlying algorithm can be reused to compute both :math:`\sin(x)` and its derivative (by evaluating at :math:`x\pm\frac{\pi}{2}`). This intuition holds for many quantum functions of interest: *the same circuit can be used to compute both the quantum function and the gradient of the quantum function* [#]_.

A more technical explanation
----------------------------

Circuits in PennyLane are specified by a sequence of gates. The unitary transformation carried out by the circuit can thus be broken down into a product of unitaries:

.. math::

    U(x; \bm{\theta}) = U_N(\theta_{N}) U_{N-1}(\theta_{N-1}) \cdots U_i(\theta_i) \cdots U_1(\theta_1) U_0(x).

Each of these gates is unitary, and therefore must have the form :math:`U_{j}(\gamma_j)=\exp{(i\gamma_j H_j)}` where :math:`H_j` is a Hermitian operator which generates the gate and :math:`\gamma_j` is the gate parameter. We have omitted which wire each unitary acts on, since it is not necessary for the following discussion.

.. note::

    In this example, we have used the input :math:`x` as the argument for gate :math:`U_0` and the parameters :math:`\bm{\theta}` for the remaining gates. This is not required. Inputs and parameters can be arbitrarily assigned to different gates.

A single parameterized gate
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let us single out a single parameter :math:`\theta_i` and its associated gate :math:`U_i(\theta_i)`. For simplicity, we remove all gates except :math:`U_i(\theta_i)` and :math:`U_0(x)` for the moment. In this case, we have a simplified quantum circuit function

.. math::

    f(x; \theta_i) = \langle 0 | U_0^\dagger(x)U_i^\dagger(\theta_i)\hat{B}U_i(\theta_i)U_0(x) | 0 \rangle = \langle x | U_i^\dagger(\theta_i)\hat{B}U_i(\theta_i) | x \rangle.

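
As an illustration, the simplified circuit function can be evaluated numerically for one concrete choice of gates. The sketch below is not PennyLane's API; it uses plain NumPy and assumes, purely for the example, that :math:`U_0(x)` is a rotation about the X axis, :math:`U_i(\theta_i)` is a rotation about the Y axis, and :math:`\hat{B}` is the Pauli-Z operator. The names ``rotation`` and ``circuit_function`` are hypothetical. For this choice the function should reduce to :math:`\cos(x)\cos(\theta_i)`, and evaluating the same function at :math:`\theta_i \pm \frac{\pi}{2}` should reproduce its derivative, mirroring the :math:`\sin(x)` analogy given earlier.

.. code-block:: python

    import numpy as np

    # Pauli matrices, used as gate generators and as the observable (assumed choice).
    PAULI_X = np.array([[0, 1], [1, 0]], dtype=complex)
    PAULI_Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
    PAULI_Z = np.array([[1, 0], [0, -1]], dtype=complex)


    def rotation(pauli, angle):
        """Return exp(-i * angle * pauli / 2), i.e. exp(i * angle * H) with H = -pauli / 2."""
        return np.cos(angle / 2) * np.eye(2) - 1j * np.sin(angle / 2) * pauli


    def circuit_function(x, theta):
        """Evaluate f(x; theta) = <x| U_i^dagger(theta) B U_i(theta) |x>."""
        ket0 = np.array([1, 0], dtype=complex)
        ket_x = rotation(PAULI_X, x) @ ket0          # |x> = U_0(x)|0>
        u_i = rotation(PAULI_Y, theta)
        conjugated_b = u_i.conj().T @ PAULI_Z @ u_i  # U_i^dagger B U_i
        return float(np.real(np.vdot(ket_x, conjugated_b @ ket_x)))


    x, theta = 0.3, 0.7
    value = circuit_function(x, theta)

    # Difference of shifted evaluations of the same function (shift of pi/2, factor 1/2).
    shifted = 0.5 * (circuit_function(x, theta + np.pi / 2)
                     - circuit_function(x, theta - np.pi / 2))

    print(value, np.cos(x) * np.cos(theta))     # the two values should agree
    print(shifted, -np.cos(x) * np.sin(theta))  # the two values should agree

The shift :math:`\frac{\pi}{2}` and the factor :math:`\frac{1}{2}` used here are specific to the rotation gates assumed in this sketch; other gate types may require different values.
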
For convenience, we rewrite the unitary conjugation as a linear transformation :math:`\mathcal{M}_{\theta_i}` acting on the operator :math:`\hat{B}`:

.. math::

    U_i^\dagger(\theta_i)\hat{B}U_i(\theta_i) = \mathcal{M}_{\theta_i}(\hat{B}).

The transformation :math:`\mathcal{M}_{\theta_i}` depends smoothly on the parameter :math:`\theta_i`, so this quantum function will have a well-defined gradient:

.. math::

    \nabla_{\theta_i}f(x; \theta_i) = \langle x | \nabla_{\theta_i}\mathcal{M}_{\theta_i}(\hat{B}) | x \rangle \in \mathbb{R}.

The key insight is that we can, in many cases of interest, express this gradient as a linear combination of the same transformation :math:`\mathcal{M}`, but with different parameters. Namely,

.. math::

    \nabla_{\theta_i}\mathcal{M}_{\theta_i}(\hat{B}) = c[\mathcal{M}_{\theta_i + s}(\hat{B}) - \mathcal{M}_{\theta_i - s}(\hat{B})],

where the multiplier :math:`c` and the shift :math:`s` are determined completely by the type of transformation :math:`\mathcal{M}` and independent of the value of :math:`\theta_i`.

.. note::

    While this construction bears some resemblance to the numerical finite-difference method for computing derivatives, here :math:`s` is finite rather than infinitesimal.

Multiple parameterized gates
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To complete the story, we now go back to the case where there are many gates in the circuit. We can absorb any gates applied before gate :math:`i` into the initial state: :math:`|\psi_{i-1}\rangle = U_{i-1}(\theta_{i-1}) \cdots U_{1}(\theta_{1})U_{0}(x)|0\rangle`. Similarly, any gates applied after gate :math:`i` are combined with the observable :math:`\hat{B}`: :math:`\hat{B}_{i+1} = U_{N}^\dagger(\theta_{N}) \cdots U_{i+1}^\dagger(\theta_{i+1}) \hat{B} U_{i+1}(\theta_{i+1}) \cdots U_{N}(\theta_{N})`.

With this simplification, the quantum circuit function becomes

.. math::

    f(x; \bm{\theta}) = \langle \psi_{i-1} | U_i^\dagger(\theta_i) \hat{B}_{i+1} U_i(\theta_i) | \psi_{i-1} \rangle = \langle \psi_{i-1} | \mathcal{M}_{\theta_i} (\hat{B}_{i+1}) | \psi_{i-1} \rangle,

and its gradient is

.. math::

    \nabla_{\theta_i}f(x; \bm{\theta}) = \langle \psi_{i-1} | \nabla_{\theta_i}\mathcal{M}_{\theta_i} (\hat{B}_{i+1}) | \psi_{i-1} \rangle.

This gradient has exactly the same form as in the single-gate case, except that the state is replaced, :math:`|x\rangle \rightarrow |\psi_{i-1}\rangle`, and so is the measurement operator, :math:`\hat{B}\rightarrow\hat{B}_{i+1}`. In terms of the circuit, this means we can leave all other gates as they are, and only modify gate :math:`U_i(\theta_i)` when we want to differentiate with respect to the parameter :math:`\theta_i`.

.. note:: Sometimes we may want to use the same classical parameter with multiple gates in the circuit. Due to the `product rule