# TN Quantum-Circuit Simulation¶

Here we show how to simulate quantum circuits using the tensor network
capabilities of `quimb`. We’ll do this by taking one of the Google random
circuits from https://github.com/sboixo/GRCS for 7x7 = 49
qubits with a gate depth of 1 + 30 + 1, and exactly computing a single output
amplitude.

The main relevant object is `Circuit`.
Under the hood this uses `gate_TN_1D()`, which
applies an operator on some number of sites to any notionally 1D tensor network
(not just an MPS), whilst maintaining the outer indices (e.g. `'k0', 'k1', 'k2', ...`).
The various options for applying the operator
and propagating tags to it (if not contracted in) can be found in
`gate_TN_1D()`. Note that the ‘1D’ nature of the
TN is just for indexing: gates can be applied to arbitrary combinations of
sites within this ‘register’.
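
As a purely illustrative sketch (plain `numpy`, not the `quimb` API), applying a gate is just a tensor contraction over the target site indices. Here a hypothetical 3-qubit example applies a Hadamard then a CNOT:

```python
import numpy as np

# 3-qubit product state |000> as a (2, 2, 2) tensor
psi = np.zeros((2, 2, 2), dtype=complex)
psi[0, 0, 0] = 1.0

# Hadamard on qubit 0, so the CNOT has something to entangle
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
psi = np.einsum('ia,abc->ibc', H, psi)

# CNOT on qubits (0, 1), reshaped from 4x4 to a rank-4 tensor
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex).reshape(2, 2, 2, 2)

# contract the gate's two input legs with the qubit-0/1 indices
psi = np.einsum('ijab,abc->ijc', CNOT, psi)

# psi is now the entangled state (|000> + |110>) / sqrt(2)
```

Reshaping the 4x4 gate matrix to rank-4 is what lets it contract with two separate site indices at once; `gate_TN_1D()` does the analogous contraction lazily on the network.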

```
[1]:
```

```
%matplotlib inline
import quimb as qu
import quimb.tensor as qtn
```

## Basic Usage¶

First of all, here is a basic example of using the `Circuit`
object with nearest-neighbour gates only:

```
[2]:
```

```
# 10 qubits and tag the initial wavefunction tensors
circ = qtn.Circuit(N=10, tags='PSI0')

# initial layer of hadamards
for i in range(10):
    circ.apply_gate('H', i, gate_round=0)

# 8 rounds of entangling gates
for r in range(1, 9):

    # even pairs
    for i in range(0, 10, 2):
        circ.apply_gate('CNOT', i, i + 1, gate_round=r)

    # Y-rotations
    for i in range(10):
        circ.apply_gate('RY', 1.234, i, gate_round=r)

    # odd pairs
    for i in range(1, 9, 2):
        circ.apply_gate('CZ', i, i + 1, gate_round=r)

# final layer of hadamards
for i in range(10):
    circ.apply_gate('H', i, gate_round=r + 1)

circ
```

```
[2]:
```

```
<Circuit(n=10, n_gates=172, gate_opts={'contract': 'split-gate', 'propagate_tags': 'register'})>
```

The tensor network representing the state is stored in the `.psi`
attribute, which we can then visualize:

```
[3]:
```

```
circ.psi.graph(color=['PSI0', 'H', 'CNOT', 'RY', 'CZ'])
```

Note that by default the CNOT gates have been split into two parts acting on each site separately. We can also graph the default (`propagate_tags='register'`) method for adding site tags to the applied operators:

```
[4]:
```

```
circ.psi.graph(color=[f'I{i}' for i in range(10)])
```

Or, since we supplied `gate_round`
as a keyword (which is optional), the tensors are also tagged in that way:

```
[5]:
```

```
circ.psi.graph(color=['PSI0'] + [f'ROUND_{i}' for i in range(10)])
```

All of these are helpful when addressing only certain tensors:

```
[6]:
```

```
# select the subnetwork of tensors with *all* following tags
circ.psi.select(['CNOT', 'I3', 'ROUND_3'], which='all')
```

```
[6]:
```

```
<TensorNetwork(tensors=1, structure='I{}', nsites=10)>
```

The full list of implemented gates is here:

```
[7]:
```

```
qtn.circuit.APPLY_GATES.keys()
```

```
[7]:
```

```
dict_keys(['RZ', 'RY', 'RX', 'H', 'X', 'Y', 'Z', 'S', 'T', 'X_1_2', 'Y_1_2', 'CX', 'CY', 'CZ', 'CNOT'])
```

To check the gates were entangling, we can contract the whole network into a dense representation of the state:

```
[8]:
```

```
psi_dense = circ.psi ^ all
# check the half chain entanglement
psi_dense.entropy(left_inds=['k0', 'k1', 'k2', 'k3', 'k4'])
```

```
[8]:
```

```
2.660737743656919
```
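
The entropy value above comes from the Schmidt decomposition across the chosen bipartition. As a minimal standalone check of the formula (on a single Bell pair rather than the 10-qubit state), one can reproduce it with a plain `numpy` SVD:

```python
import numpy as np

# Bell state (|00> + |11>) / sqrt(2) as a dense vector
psi = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

# reshape into a (left, right) matrix and take the singular values
svals = np.linalg.svd(psi.reshape(2, 2), compute_uv=False)

# entanglement entropy in bits: -sum p * log2(p), with p = s**2
p = svals**2
entropy = -np.sum(p * np.log2(p, where=p > 0, out=np.zeros_like(p)))
print(entropy)  # 1 bit for a maximally entangled pair
```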

## QASM Format and Advanced Manipulation¶

A common format for specifying quantum circuits is the QASM format. An example from
https://github.com/sboixo/GRCS can be found in `quimb/docs/examples/inst_7x7_31_0.txt`
for a 7x7 qubit circuit of depth 1 + 30 + 1 (the 0th and 31st rounds of gates
being un-entangling). We will exactly compute a single amplitude for this circuit.

A `Circuit` can be instantiated directly from a QASM
file or URL, which automatically applies all the gates:

```
[9]:
```

```
circ = qtn.Circuit.from_qasm_file('inst_7x7_31_0.txt', tags='PSI_i')
```

We’ll compute the output amplitude of a random bitstring:

```
[10]:
```

```
import random
random.seed(42)
bitstring = "".join(random.choice('01') for _ in range(49))
print(bitstring)
# the squeeze removes all size 1 bonds
psi_sample = qtn.MPS_computational_state(bitstring, tags='PSI_f').squeeze()
```

```
0010000010000000101100111001001011101010110000100
```

The amplitude tensor network is then given by:

```
[11]:
```

```
c_tn = circ.psi & psi_sample
c_tn
```

```
[11]:
```

```
<TensorNetwork(tensors=1402, structure='I{}', nsites=49)>
```

(Note if we instead left some indices open we could compute *slices* of amplitudes.)
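
Conceptually, the amplitude network computes the overlap of the circuit state with a product basis state, one basis vector capping each open index. A toy 2-qubit `numpy` analogue:

```python
import numpy as np

# a 2-qubit state tensor, here the Bell state (|00> + |11>) / sqrt(2)
psi = np.array([[1, 0], [0, 1]], dtype=complex) / np.sqrt(2)

# basis vectors encoding the bitstring '11'
b0 = np.array([0, 1], dtype=complex)
b1 = np.array([0, 1], dtype=complex)

# amplitude <11|psi>: cap every open index with a basis vector
c = np.einsum('ab,a,b->', psi, b0, b1)

# leaving index 'b' open instead yields a *slice* of amplitudes
slice_b = np.einsum('ab,a->b', psi, b0)
```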

We can now view the same three tagging schemes as earlier to visualize the network with:

```
[12]:
```

```
gate_tags = ['PSI_i', 'H', 'CZ', 'T', 'X_1_2', 'Y_1_2', 'PSI_f']
site_tags = [c_tn.site_tag(i) for i in range(c_tn.nsites)]
round_tags = ['PSI_i'] + [f"ROUND_{i}" for i in range(31)]
```

```
[13]:
```

```
c_tn.graph(color=gate_tags) # could be slow
```

```
[14]:
```

```
c_tn.graph(color=site_tags) # could be slow
```

Clearly this is a complex network (1402 tensors) for which it will be difficult to find a contraction scheme! One thing we might try is contracting all the rank-1 and rank-2 tensors. Doing this never increases the complexity, and it is in this sense that single qubit operations are ‘free’. This can be done with the `.rank_simplify`
method:

```
[15]:
```

```
c_tn_derank = c_tn.rank_simplify()
c_tn_derank
```

```
[15]:
```

```
<TensorNetwork(tensors=534, structure='I{}', nsites=49)>
```

```
[16]:
```

```
c_tn_derank.graph(color=site_tags)
```
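
Why absorbing low-rank tensors is ‘free’ can be seen in isolation: contracting a rank-2 tensor into a neighbour gives a result with exactly the neighbour’s shape, so no later contraction step gets more expensive. A hypothetical `numpy` sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(2, 2))      # a rank-2 'gate' tensor
T = rng.normal(size=(2, 4, 4))   # a rank-3 neighbouring tensor

# absorb G into T over their shared index: the result has exactly the
# same index structure as T, so the network just has one fewer tensor
T2 = np.einsum('ia,ajk->ijk', G, T)
assert T2.shape == T.shape
```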

Every tensor is now rank-3 or more. We can check \(\log_2\) of the size of the largest tensor produced in the contraction by calling:

```
[17]:
```

```
c_tn_derank.contraction_width(optimize='random-greedy')
```

```
[17]:
```

```
43.0
```

(Note that for this particular contraction the `'random-greedy'`
approach is quite sub-optimal, and one could use more advanced methods.)
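
For a feel of what a contraction width is, `numpy`’s own `einsum_path` reports the largest intermediate tensor produced along a contraction path; its \(\log_2\) is the width. A small sketch on a toy ring of matrices (nothing to do with the 49-qubit network):

```python
import numpy as np

# trace of a product of four 8x8 matrices: tr(ABCD)
A, B, C, D = (np.random.rand(8, 8) for _ in range(4))

path, info = np.einsum_path('ab,bc,cd,da->', A, B, C, D, optimize='greedy')

# the report includes the largest intermediate (here an 8x8 = 64 element
# matrix), giving a contraction width of log2(64) = 6
print(info)
```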

However, this corresponds to far too much memory to naively contract, so instead we will give the contraction some manual help, following the rough ideas set out in https://arxiv.org/abs/1811.09599. This involves two steps:

1. Contracting vertical slices for each site, yielding a simple flat network
2. ‘Cutting’ some bonds of the network to trade memory for time

We’ll also initially take a copy of the network in single precision:

```
[18]:
```

```
c_tn_flat = c_tn.astype('complex64')

# inplace contract every tensor tagged with each site tag
for i in range(c_tn.nsites):
    c_tn_flat ^= i

# inplace fuse multi-bonds between tensors
c_tn_flat.squeeze_(fuse=True)

# compress every bond to about 'complex64' precision
c_tn_flat.compress_all(cutoff_mode='rel', cutoff=1e-6)
```

We should now have a ‘flat’ tensor network of 49 tensors joined in a grid:

```
[19]:
```

```
# we can now easily specify grid positions as well
fix = {
    c_tn_flat.site_tag(i): (i // 7, i % 7)
    for i in range(c_tn_flat.nsites)
}
c_tn_flat.graph(site_tags, fix=fix)
```

Note that the new bonds are larger (and variable) but the sizes of the tensors are still manageable:

```
[20]:
```

```
{qu.log2(tensor.size) for tensor in c_tn_flat}
```

```
[20]:
```

```
{6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0}
```

And we have reduced the contraction width as well:

```
[21]:
```

```
c_tn_flat.contraction_width(optimize='random-greedy')
```

```
[21]:
```

```
27.0
```

Although a single tensor with `dtype='complex64'`
of this size only takes
2GB of memory, taking overheads into account we probably still want to reduce
this memory requirement a bit. To do this we will ‘cut’ some bonds.

### Cutting bonds¶

We can think of any bond in a tensor network as a trace over the rest of the network. In other words, we can enumerate and then sum all the different values that bond can take, turning the network into a sum of simpler networks (as long as the tensor network with that bond removed has a better contraction width).
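
The identity underlying a cut is elementary: fix the shared index to each of its values, contract the resulting simpler networks, and sum. A self-contained `numpy` check on two small tensors:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(3, 5))

# full contraction over the shared bond of size 3
full = np.einsum('ib,bj->ij', A, B)

# 'cut' the bond: one simpler (outer-product) network per bond value
cut_sum = sum(np.einsum('i,j->ij', A[:, k], B[k, :]) for k in range(3))

assert np.allclose(full, cut_sum)
```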

For example, let’s make two incisions at the sides (an educated guess to make the ‘cut’ network simpler):

```
[22]:
```

```
# get bonds between 14-21 and 45-46
bnds = (
    qtn.bonds(c_tn_flat[14], c_tn_flat[21]) |
    qtn.bonds(c_tn_flat[45], c_tn_flat[46])
)
bnds
```

```
[22]:
```

```
{'_63081700005Ca', '_63081700005E7'}
```

We can view what it would look like to cut these bonds by ‘selecting’ values for them, here 0:

```
[23]:
```

```
selector = {ix: 0 for ix in bnds}
c_tn_cut = c_tn_flat.isel(selector)
c_tn_cut.graph(color=site_tags, fix=fix)
```

The product of the sizes of the bonds is the number of tensor networks we will need to sum over:

```
[24]:
```

```
d_cut = qu.prod(c_tn_flat.ind_size(ix) for ix in bnds)
d_cut
```

```
[24]:
```

```
64
```

And we can check this ‘cut’ network has better contraction width:

```
[25]:
```

```
c_tn_cut.contraction_width(optimize='random-greedy')
```

```
[25]:
```

```
21.0
```

In fact, this cut is optimal in the sense that the reduction in contraction
complexity (\(2^{27} / 2^{21} = 64\)) exactly matches the number of independent tensor networks
produced, `d_cut`.

To exactly compute the amplitude \(c = \langle \psi_i | \psi_f \rangle\) we need to sum over
every combination of values the bonds can take. The `cut_iter()`
method automatically generates all the tensor networks we need:

```
[26]:
```

```
gen_cuts = c_tn_flat.cut_iter(*bnds)

# use tqdm to show progress / time (can comment this out)
from tqdm import tqdm
gen_cuts = tqdm(gen_cuts, total=d_cut)

# sum over each value the bond can take
c = sum([
    c_tn_cut.contract(all, optimize='random-greedy')
    for c_tn_cut in gen_cuts
])
```

```
100%|██████████| 64/64 [00:27<00:00, 2.43it/s]
```

```
[27]:
```

```
c
```

```
[27]:
```

```
(-5.526265972610744e-10-2.40785672650512e-08j)
```

### Parallel Contract¶

Since each contraction is independent, we can also trivially parallelize the work. The above calculation was already multi-threaded, so we don’t expect much speedup, but we could for example distribute the work across many computers.
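
As a generic illustration of this map-reduce pattern (using the standard library’s `concurrent.futures` rather than `loky`, and a cheap, hypothetical stand-in function in place of a real contraction):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# hypothetical stand-in for contracting one 'cut' network
def contract_slice(k):
    return np.full(1000, float(k)).sum()

# the 64 slices are independent, so summing a parallel map gives the same
# result as the serial loop; a process pool or a pool spanning many
# machines drops in with the same map interface
with ThreadPoolExecutor() as pool:
    c = sum(pool.map(contract_slice, range(64)))
```

Note that, as with the `quimb.get_mpi_pool` caveat above, process-based pools generally require the mapped function to be picklable, which is why a module-level function is used here rather than a lambda.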

Here, we’ll just use a `loky` pool (rather than the `mpi4py` pool
from `quimb.get_mpi_pool`, which can’t pickle interactively defined,
local functions):

```
[28]:
```

```
from joblib.externals import loky

# make sure we single-thread BLAS on each process
def initializer():
    import os
    os.environ['OMP_NUM_THREADS'] = '1'

pool = loky.get_reusable_executor(initializer=initializer)
```

```
[29]:
```

```
%%time
c = sum(pool.map(
    lambda c_tn_cut: c_tn_cut.contract(all, optimize='random-greedy'),
    c_tn_flat.cut_iter(*bnds)
))
```

```
CPU times: user 1.22 s, sys: 83.8 ms, total: 1.31 s
Wall time: 17.7 s
```

```
[30]:
```

```
c
```

```
[30]:
```

```
(-5.526280266732186e-10-2.407847733004731e-08j)
```

As expected, this matches the non-parallel version (to single precision), and is in fact faster.

### Dask Distributed Contract¶

Another convenient way to parallelize our computation and reduce its memory footprint is to use dask. Instead of many independent parallel runs, we then get a single task graph, meaning that no parts of the calculation are ever redundantly computed.

The memory reduction comes from dask’s ability to slice, or ‘chunk’, arrays across specified dimensions such that the whole array is never needed in memory at once. Which dimensions should we chunk? Well, we know that the indices we chose to ‘cut’ earlier are critical indices that probably appear on the largest intermediate tensors, so those are a wise choice.
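
The memory saving can be seen without `dask` at all: slicing the contracted dimension into chunks means only one chunk pair is live at a time, and the partial products sum to the full result. A minimal `numpy` sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 100))
B = rng.normal(size=(100, 6))

# full contraction, with both operands entirely in memory
full = A @ B

# process the size-100 contracted dimension in chunks of 25: only one
# (6, 25) x (25, 6) pair of slices is needed per step
chunked = np.zeros((6, 6))
for start in range(0, 100, 25):
    sl = slice(start, start + 25)
    chunked += A[:, sl] @ B[sl, :]

assert np.allclose(full, chunked)
```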

Finally, how `dask` stores its blocks of arrays is also customizable, and one
possibility here, if the blocks are not too big, is to use the GPU.
Here we’ll make use of cupy arrays, which
should make all the numeric operations much faster! (You can comment out
the relevant lines if you have no GPU.)

```
[31]:
```

```
import dask.array as da
import cupy as cp

# copy the *uncut* network to modify each tensor
dc_tn_flat = c_tn_flat.copy()

for t in dc_tn_flat:
    # get the current tensor data
    data = t.data
    # convert data to GPU
    data = cp.asarray(data)
    # convert to dask array - only chunk along 'cut' bonds
    chunks = [1 if ix in bnds else None for ix in t.inds]
    data = da.from_array(data, chunks, asarray=False)
    # modify the tensor's data inplace
    t.modify(data=data)
```

We now have a tensor network of tensors backed by `dask`
arrays, each of which is itself backed by possibly many `cupy`
arrays.

To perform the full contraction, however, we just need to specify `'dask'`
as the backend to `opt_einsum`:

```
[32]:
```

```
c_graph = dc_tn_flat.contract(all, optimize='random-greedy', backend='dask')
```

You will notice this was essentially instant: so far `c_graph`
is just a lazily represented computation graph with unknown value:

```
[33]:
```

```
c_graph
```

```
[33]:
```

```
dask.array<sum-aggregate, shape=(), dtype=complex64, chunksize=()>
```

In this way, `dask`
abstracts the computational graph from the method used to compute it, meaning we can choose to actually perform the calculation in a number of ways (for example, on a distributed server). Here we’ll just use the simplest, in-process scheduler:

```
[34]:
```

```
%%time
c = c_graph.compute(scheduler='sync')
```

```
CPU times: user 353 ms, sys: 144 ms, total: 497 ms
Wall time: 498 ms
```

```
[35]:
```

```
c
```

```
[35]:
```

```
array(-5.5257476e-10-2.407854e-08j, dtype=complex64)
```

So about 30x faster than the parallel contract!