Quantum embeddings

A quantum embedding represents classical data as quantum states in a Hilbert space via a quantum feature map. It takes a classical datapoint \(x\) and translates it into a set of gate parameters in a quantum circuit, creating a quantum state \(| \psi_x \rangle\). This process is a crucial part of designing quantum algorithms and affects their computational power—for more details, see [schuld2018supervised] and [schuld2018quantum].

Let’s consider classical input data consisting of \(M\) examples, with \(N\) features each,

\[\mathcal{D}=\{x^{(1)}, \ldots, x^{(m)}, \ldots, x^{(M)}\},\]

where \(x^{(m)}\) is an \(N\)-dimensional vector for \(m=1,\ldots,M\). To embed this data into \(n\) quantum subsystems (\(n\) qubits or \(n\) qumodes for discrete- and continuous-variable quantum computing, respectively), we can use various embedding techniques, some of which are explained briefly below.

Basis Embedding

Basis embedding associates each input with a computational basis state of a qubit system. Therefore, classical data has to be in the form of binary strings. The embedded quantum state is the bit-wise translation of a binary string to the corresponding states of the quantum subsystems. For example, \(x=1001\) is represented by the 4-qubit quantum state \(| 1001 \rangle\). Hence, one bit of classical information is represented by one quantum subsystem.

Let’s consider the classical dataset \(\mathcal{D}\) mentioned above. For basis embedding, each example has to be an \(N\)-bit binary string; \(x^{(m)}=(b_1,\ldots,b_N)\) with \(b_i \in \{0,1\}\) for \(i=1,\ldots,N\). Assuming all features are represented with unit binary precision (one bit), each input example \(x^{(m)}\) can be directly mapped to the quantum state \(| x^{(m)}\rangle\). This means that the number of quantum subsystems, \(\bm{n}\), must be at least equal to \(\bm{N}\). An entire dataset can be represented as a superposition of computational basis states,

\[| \mathcal{D} \rangle = \frac{1}{\sqrt{M}} \sum_{m=1}^{M} |x^{(m)} \rangle.\]

For example, let’s say we have a classical dataset containing two examples \(x^{(1)}=01\) and \(x^{(2)}=11\). The corresponding basis encoding uses two qubits to represent \(| x^{(1)} \rangle=|01 \rangle\) and \(| x^{(2)} \rangle=|11 \rangle\), resulting in

\[| \mathcal{D} \rangle = \frac{1}{\sqrt{2}}|01 \rangle + \frac{1}{\sqrt{2}} |11 \rangle.\]


For \(N\) bits, there are \(2^N\) possible basis states. Given \(M \ll 2^N\), the basis embedding of \(\mathcal{D}\) will be sparse.
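As a minimal sketch in plain NumPy (no quantum library involved), the superposition \(| \mathcal{D} \rangle\) from the two-example dataset above can be built explicitly as a state vector. The function name `basis_embed_dataset` is just an illustrative choice, and the sketch assumes all examples in the dataset are distinct, so the \(1/\sqrt{M}\) prefactor normalizes the state:

```python
import numpy as np

def basis_embed_dataset(bitstrings):
    """Build |D> = (1/sqrt(M)) * sum_m |x^(m)> as a 2^n state vector.

    Assumes all bitstrings are distinct, so dividing by sqrt(M)
    yields a normalized state.
    """
    n = len(bitstrings[0])            # number of qubits = bits per example
    state = np.zeros(2 ** n)
    for b in bitstrings:
        # |b> is the computational basis state whose (0-based) index
        # is the integer value of the bit string
        state[int(b, 2)] += 1.0
    return state / np.sqrt(len(bitstrings))

# Dataset from the example: x(1) = 01 and x(2) = 11
state = basis_embed_dataset(["01", "11"])
# amplitudes 1/sqrt(2) at |01> (index 1) and |11> (index 3), zero elsewhere
```

Note that only \(M=2\) of the \(2^2=4\) amplitudes are nonzero, illustrating the sparsity mentioned above.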

Amplitude Embedding

In the amplitude-embedding technique, data is encoded into the amplitudes of a quantum state. A normalized classical \(N\)-dimensional datapoint \(x\) is represented by the amplitudes of an \(n\)-qubit quantum state \(| \psi_x \rangle\) as

\[| \psi_x \rangle = \sum_{i=1}^{N} x_i |i \rangle,\]

where \(N=2^n\), \(x_i\) is the \(i\)-th element of \(x\), and \(| i \rangle\) is the \(i\)-th computational basis state. In this case, however, \(x_i\) can have different numeric data types, e.g., integer or floating point. For example, let’s say we want to encode the four-dimensional floating-point array \(x=(1.0, 0.0, -5.5, 0.0)\) using amplitude embedding. The first step is to normalize it, i.e., \(x_{norm}=\frac{1}{\sqrt{31.25}}(1.0, 0.0, -5.5, 0.0)\). The corresponding amplitude encoding uses two qubits to represent \(x_{norm}\) as

\[| \psi_{x_{norm}} \rangle = \frac{1}{\sqrt{31.25}}\left[|00 \rangle - 5.5|10 \rangle\right].\]
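The normalization step can be sketched in plain NumPy; the resulting entries of `x_norm` are exactly the amplitudes of the two-qubit state above:

```python
import numpy as np

x = np.array([1.0, 0.0, -5.5, 0.0])
x_norm = x / np.linalg.norm(x)   # divides by sqrt(1.0**2 + 5.5**2) = sqrt(31.25)

# x_norm holds the amplitudes of |psi>:
# index 0 -> |00>, index 2 -> |10>, so
# |psi> = (1/sqrt(31.25)) * (|00> - 5.5 |10>)
```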

Let’s consider the classical dataset \(\mathcal{D}\) mentioned above. Its amplitude embedding can be easily understood if we concatenate all the input examples \(x^{(m)}\) together into one vector, i.e.,

\[\alpha = C_{norm} \{ x^{(1)}_1, \ldots, x^{(1)}_N, x^{(2)}_1, \ldots, x^{(2)}_N, \ldots, x^{(M)}_1, \ldots, x^{(M)}_N \},\]

where \(C_{norm}\) is the normalization constant, chosen such that \(|\alpha|^2=1\). The input dataset can now be represented in the computational basis as

\[| \mathcal{D} \rangle = \sum_{i=1}^{2^n} \alpha_i |i \rangle,\]

where \(\alpha_i\) are the elements of the amplitude vector \(\alpha\) and \(| i \rangle\) are the computational basis states. The number of amplitudes to be encoded is \(N \times M\). As a system of \(n\) qubits provides \(2^n\) amplitudes, amplitude embedding requires \(\bm{n \geq \log_2({NM})}\) qubits.


If the total number of amplitudes to embed, i.e., \(N \times M\), is less than \(2^n\), \(\alpha\) can be padded with non-informative constants [schuld2018supervised].

For example, if we have 3 examples with 2 features each, we have \(3\times 2= 6\) amplitudes to embed. We therefore need at least \(\lceil \log_2(6)\rceil = 3\) qubits, which provide \(2^3=8\) available amplitudes, and we have to concatenate \(2^3-6=2\) constants at the end of \(\alpha\).
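The concatenate-pad-normalize procedure for this three-example dataset can be sketched as follows. The feature values in `data` are made up for illustration, and the sketch assumes zeros as the non-informative padding constants:

```python
import numpy as np

# Hypothetical dataset: 3 examples with 2 features each -> 6 amplitudes
data = np.array([[0.1, 0.2],
                 [0.3, 0.4],
                 [0.5, 0.6]])

alpha = data.flatten()                       # concatenate all examples
n = int(np.ceil(np.log2(alpha.size)))        # qubits needed: ceil(log2(6)) = 3
pad = 2 ** n - alpha.size                    # 8 - 6 = 2 padding constants
alpha = np.concatenate([alpha, np.zeros(pad)])
alpha = alpha / np.linalg.norm(alpha)        # apply C_norm so |alpha|^2 = 1
```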


There are many other embedding techniques with similar embedding protocols. For example, squeezing embedding and displacement embedding are used with continuous-variable quantum computing models, where classical information is encoded in the squeezing and displacement operator parameters. Hamiltonian embedding uses an implicit technique by encoding information in the evolution of a quantum system [schuld2018supervised].

See also

PennyLane provides built-in embedding templates; see pennylane.templates.embeddings for more details.