Theory

Foundational Material

A certain amount of prerequisite knowledge is necessary to utilize and understand quantum computational algorithms and processes. Someday this material may be substantially diminished by intelligently chosen abstractions, but today quantum systems are still heavily dependent on an understanding of the underlying mathematical principles.

It is outside the scope of this document to cover that material. However, a set of references have been collected here. These materials provide a sufficient foundation for onboarding a new engineer or scientist.

Quantum Bit Simulation

Quantum bits are simulated by recording the complex number amplitude of a wave function solution to Schrödinger’s equation. All this means is, we record one complex number with legnth between 0 and 1 correspendonding to each possible permutation of bits in the “coherent” set of quantum bits. The sum of the norms of all permutation amplitudes must sum to 1. The norm is given by the complex number times its “complex conjugate.” The complex conjugate of a number is given by flipping the plus/minus sign on its imaginary component. Any possible state of a three qubit system can be expressed as follows:

(1)\[ \lvert\psi\rangle = x_0 \lvert 000\rangle + x_1 \lvert 001\rangle + x_2 \lvert 010\rangle + x_3 \lvert 011\rangle + \ x_4 \lvert 100\rangle + x_5 \lvert 101\rangle + x_6 \lvert 110\rangle + x_7 \lvert 111\rangle\]

Each of the \(x_n\) are complex numbers. These are the amplitudes of the quantum system, multiplied times the “eigenstates.” If all bits are measured simultaneously to check if they are 0 or 1, the norm of an amplitude gives its probability on measurement, resulting in a particular bit pattern based on the amplitudes and a randomly generated number. While a real quantum mechanical system would produce probabilistic measurements based on these amplitudes, a simulation like this is deterministic, but pseudo-random.

Note

Probability vs Amplitude It is a common misconception that the defining characteristic of a quantum computer, compared to a classical computer, is that a quantum computer is probabilistic. Except, an \(x_n\) eigenstate has both probability and a phase. It is not a bit with just a dimension of probability, but rather a bit with two dimensions, one of probablity and one of phase, an “amplitude.” The root of this misconception lies in the measurement operation, which can have a probabilistic outcome. But the \(x_n\) coefficients are not probabilistic values - rather, they are the amplitude of complex number wave function. If we both represent and measure the state as a permutation of 0 and 1 bits, the value of the wave function for any state is the square root of its probability times a phase factor.

By collecting \(x_n\) into a complex number array (called Qrack::QInterface::stateVec), the full quantum representation of the system can be recorded using \(2^N\) complex number variables for N quantum bits:

std::unique_ptr<Complex16[]> sv(new Complex16[1 << qBitCount]);

Given a standard \(X\) gate matrix,

(2)\[\begin{split}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\end{split}\]

one might ask, how can this \(2\times2\) matrix be applied against the \(1xN\) vector for N arbitrary entangled qubits, where the vector is the \(x_n\) amplitudes from (1)?

\[\begin{split}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} ??? \begin{bmatrix} x_{000} \\ x_{001} \\ x_{010} \\ x_{011} \\ x_{100} \\ x_{101} \\ x_{110} \\ x_{111} \end{bmatrix}\end{split}\]

To do so, we apply a Kronecker product to the gate matrix. This expands the matrix out to the appropriate number of dimensions - in this case we would need to perform two Kronecker products for each of the two bits whose values are irrelevant to the result:

(3)\[\left(X \otimes I \otimes I\right) \times M\]
(4)\[\begin{split}\left(\begin{bmatrix} 0 & 1 \\\ 1 & 0 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\\ 0 & 1 \end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\\ 0 & 1 \end{bmatrix}\right) \times \begin{bmatrix} x_{000} \\ x_{001} \\ x_{010} \\ x_{011} \\ x_{100} \\ x_{101} \\ x_{110} \\ x_{111} \end{bmatrix}\end{split}\]
(5)\[\begin{split}\begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{000} \\ x_{001} \\ x_{010} \\ x_{011} \\ x_{100} \\ x_{101} \\ x_{110} \\ x_{111} \end{bmatrix}\end{split}\]
(6)\[\begin{split} (X \otimes I \otimes I) \times \begin{bmatrix} x_{000} \\ x_{001} \\ x_{010} \\ x_{011} \\ x_{100} \\ x_{101} \\ x_{110} \\ x_{111} \end{bmatrix} = \begin{bmatrix} x_{001} \\ x_{000} \\ x_{011} \\ x_{010} \\ x_{101} \\ x_{100} \\ x_{111} \\ x_{110} \end{bmatrix}\end{split}\]

The operation in (3) swaps the amplitudes of 0 and 1 for the first bit out of three, but leaves the second and third bits alone. Using the identity matrix \(I\) preserves the amplitudes of the \(x_{0nn}\) and \(x_{1nn}\) positions. The expanded matrix in (5) now has the proper dimensionality to be multiplied directly against the amplitude vector.

Note

It’s important to remember here that, unlike a classical \(NOT\) which directly inverts a bit, the \(X\) gate swaps the amplitudes for the states where the qubit is 1 with the amplitudes where the qubit is 0. If the value of \(M[0]\) is \(\lvert100\rangle\), then a subsequent \(X[0]\) gate would exchange \(x_{100}\) and \(x_{000}\) and therefore leave the state as \(\lvert000\rangle\). See Quantum Logic Gates for more information.

Implementing this naively would require matrices sized at \(2^{2N}\) complex numbers for \(N\) bits (as illustrated above in (5)). This rapidly grows prohibitive in memory usage, and this is the primary limitation for simulating quantum systems using classical components. Fortunately, these types of matrix operations are easily optimized for both memory usage and parallelization.

There are two immediate optimizations that can be performed. The first is an optimization on the matrix size: by performing the math with only a \(2\times2\) matrix, the amount of memory allocated is substantially reduced. The Qrack::QInterface::Apply2x2() method utilizes this optimization.

In shorthand for clarity, an optimized \(X\) gate is calculated using the following linear algebra:

(7)\[\begin{split}\begin{bmatrix} { \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{000} \\ x_{001} \end{bmatrix} }\\ { \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{010} \\ x_{011} \end{bmatrix} }\\ { \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{100} \\ x_{101} \end{bmatrix} }\\ { \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{110} \\ x_{111} \end{bmatrix} } \end{bmatrix} = \begin{bmatrix} { \begin{bmatrix} x_{001} \\ x_{000} \end{bmatrix} } \\ { \begin{bmatrix} x_{011} \\ x_{010} \end{bmatrix} } \\ { \begin{bmatrix} x_{101} \\ x_{100} \end{bmatrix} } \\ { \begin{bmatrix} x_{111} \\ x_{110} \end{bmatrix} } \end{bmatrix}\end{split}\]

And, fully decomposing (7):

\[\begin{split}\begin{bmatrix} { \begin{bmatrix} 0 & 1 \end{bmatrix} \times \begin{bmatrix} x_{000} \\ x_{001} \end{bmatrix} } \\ { \begin{bmatrix} 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{000} \\ x_{001} \end{bmatrix} } \\ { \begin{bmatrix} 0 & 1 \end{bmatrix} \times \begin{bmatrix} x_{010} \\ x_{011} \end{bmatrix} } \\ { \begin{bmatrix} 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{010} \\ x_{011} \end{bmatrix} } \\ { \begin{bmatrix} 0 & 1 \end{bmatrix} \times \begin{bmatrix} x_{100} \\ x_{101} \end{bmatrix} } \\ { \begin{bmatrix} 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{100} \\ x_{101} \end{bmatrix} } \\ { \begin{bmatrix} 0 & 1 \end{bmatrix} \times \begin{bmatrix} x_{110} \\ x_{111} \end{bmatrix} } \\ { \begin{bmatrix} 1 & 0 \end{bmatrix} \times \begin{bmatrix} x_{110} \\ x_{111} \end{bmatrix} } \end{bmatrix} = \begin{bmatrix} x_{001} \\ x_{000} \\ x_{011} \\ x_{010} \\ x_{101} \\ x_{100} \\ x_{111} \\ x_{110} \end{bmatrix}\end{split}\]

It’s worth pointing out that the operation detailed in (7) is heavily parallelize-able, yielding substantial benefits when working with gates spanning more than just one register (e.g. \(CNOT\) and \(CCNOT\) gates). In C++, this would be implemented like so:

// Create a three qubit register.
Qrack::QInterface qReg(3);

// X-gate the bit at index 0
qReg->X(0);

The second optimization is to maintain separability of state vectors between bits where entanglement is not necessary. See IBM’s article and related publication for details on how to optimize these operations in more detail. The Qrack::QUnit and Qrack::QInterface register-wide operations (e.g. Qrack::QInterface::X()) leverage these types of optimizations, with parallelization provided through threading and OpenCL, as supported.

LDA,X Unitary Matrix

Note that the VM6502Q X-addressed LDA, ADC, and SBC operations can load, add, or subtract with a superposed X register. If the permutation states of the classical memory addressed by the X register are treated as quantum degrees of freedom, these operations are unitary. A simplified example of the unitary matrix or operator for 2 qubits and a “lookup table” of two independent bits is given below. The least significant bit is the index (or X register), the second least significant bit is the value (or accumulator), and the third and fourth bits are the 0 and 1 indexed classical bits in the “lookup table,” treated as quantum degrees of freedom. The rows and columns of the matrix proceed in bit signifance permutation order from \(\lvert0000\rangle\) to \(\lvert1111\rangle\).

\[\begin{split}\begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \end{bmatrix}\end{split}\]

This allows search of a “real unstructured database” or unstructured lookup table, per [Broda2016]. That paper also proposes a model for the memory of the lookup table.

6502 Reference Documents

[MOS-6502]The 6502 CPU - https://en.wikipedia.org/wiki/MOS_Technology_6502
[6502ASM]6502 Assembly Reference - http://www.6502.org/tutorials/6502opcodes.html

For details on the added opcodes supported by vm6502q, see MOS-6502Q Opcodes.