4. Optimization
quimb supports optimizing an arbitrary tensor network with respect to an arbitrary ‘loss’ function using automatic differentiation. This is encapsulated in the TNOptimizer object. Internally, this makes use of one of various autodiff libraries to compute the necessary tensor gradients, then maps these tensor parameters into a ‘ravelled’ single real vector for optimization in an outer loop by scipy.optimize.minimize().
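To make that mechanism concrete, here is a purely conceptual sketch (not quimb's actual internals) of how a few tensor parameters become the single flat, real vector that the outer scipy optimizer sees:

import numpy as np
from scipy.optimize import minimize

# stand-ins for the shapes of the tensors in a network
shapes = [(2, 3), (3, 4)]

def unravel(x, shapes):
    """Reshape the flat vector back into the individual 'tensors'."""
    arrays, i = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(x[i:i + size].reshape(shape))
        i += size
    return arrays

def loss_vec(x):
    # in quimb the gradient of the loss is also computed by autodiff and
    # ravelled in the same way - here we simply let scipy estimate it
    a, b = unravel(x, shapes)
    return np.sum((a @ b) ** 2)

x0 = np.concatenate([np.random.randn(*s).ravel() for s in shapes])
res = minimize(loss_vec, x0, method="L-BFGS-B")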
Note
You can also use quimb simply as a way to orchestrate operations on e.g. torch or jax arrays, and then use optimization libraries from those frameworks directly. This is demonstrated in the examples Using quimb within torch and Using quimb within jax, flax and optax.
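A minimal sketch of that pattern, assuming torch is installed and using a small MPS purely for illustration, might look like:

import torch
import quimb.tensor as qtn

# small illustrative problem
mps = qtn.MPS_rand_state(16, bond_dim=8)
mpo = qtn.MPO_ham_heis(16)

# convert every tensor's data into a trainable torch parameter
mps.apply_to_arrays(lambda x: torch.tensor(x, requires_grad=True))
mpo.apply_to_arrays(torch.as_tensor)

# hand the raw torch arrays straight to a torch optimizer
optimizer = torch.optim.Adam([t.data for t in mps], lr=0.01)

for _ in range(100):
    optimizer.zero_grad()
    # quimb just orchestrates the (torch backed) contractions
    norm = mps.H @ mps
    b, h, k = qtn.tensor_network_align(mps.H, mpo, mps)
    energy = (b | h | k).contract()
    loss = energy / norm
    loss.backward()
    optimizer.step()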
The basic steps are:
1. Define a target tensor network (or pytree[1] of TensorNetwork, Tensor, raw array, or Circuit objects). Optionally, if it's a single tensor network, define sets of tags that specify which tensors to optimize, keep constant, or share parameters among.
2. Optionally define a norm_fn that takes the target and projects or constrains it to some valid form, e.g. normalizing or mapping into unitary form. This is the identity by default.
3. Define a loss function that takes tn (or norm_fn(tn)) and returns a single real scalar to minimize.
Here we’ll demonstrate this with a PBC MPS optimization for the Heisenberg model.
%config InlineBackend.figure_formats = ['svg']
import quimb as qu
import quimb.tensor as qtn
L = 64
D = 16
pbc = True
# create a random MPS as our initial target to optimize
psi = qtn.MPS_rand_state(L, bond_dim=D, cyclic=pbc)
# create the hamiltonian MPO, this is a constant TN not to be optimized
ham = qtn.MPO_ham_heis(L, cyclic=pbc)
Next we define our norm_fn, which here just normalizes the MPS, and our loss_fn, which computes the energy of the Heisenberg model by exactly contracting an MPS-MPO-MPS overlap.
def norm_fn(psi):
    # we could always define this within the loss function, but separating it
    # out can be clearer - it's also called before returning the optimized TN
    nfact = (psi.H @ psi)**0.5
    return psi.multiply(1 / nfact, spread_over='all')


def loss_fn(psi, ham):
    b, h, k = qtn.tensor_network_align(psi.H, ham, psi)
    energy_tn = b | h | k
    return energy_tn ^ ...
We can check the initial loss value with:
loss_fn(norm_fn(psi), ham)
-0.16003514944501307
Next we supply these to a TNOptimizer object. Since we have an extra tensor object ham that is needed to compute the loss but should not be optimized, we pass it in loss_constants, which allows it to be converted to the correct backend etc.
tnopt = qtn.TNOptimizer(
    # the tensor network we want to optimize
    psi,
    # the functions specifying the loss and normalization
    loss_fn=loss_fn,
    norm_fn=norm_fn,
    # we specify constants so that the arguments can be converted
    # to the desired autodiff backend automatically
    loss_constants={"ham": ham},
    # the underlying algorithm to use for the optimization
    # 'l-bfgs-b' is the default and often good for fast initial progress,
    # here we use 'adam' instead
    optimizer="adam",
    # which gradient computation backend to use
    autodiff_backend="jax",
)
tnopt
<TNOptimizer(d=32768, backend=jax)>
Hint
You can also pass general non-numeric or tensor options in loss_kwargs.
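For example, a minimal sketch of this (the optimize keyword of the loss function here is purely illustrative):

def loss_fn_opt(psi, ham, optimize="auto-hq"):
    # `optimize` is a plain (non-array) option, so it goes in `loss_kwargs`
    b, h, k = qtn.tensor_network_align(psi.H, ham, psi)
    return (b | h | k).contract(optimize=optimize)

tnopt_kw = qtn.TNOptimizer(
    psi,
    loss_fn=loss_fn_opt,
    norm_fn=norm_fn,
    loss_constants={"ham": ham},
    # passed through to the loss function unchanged, no backend conversion
    loss_kwargs={"optimize": "auto-hq"},
    autodiff_backend="jax",
)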
We can see there are 32,768 parameters to optimize, which would be tricky without gradients. We are ready to start optimizing (note for backends like jax which compile the computation by default, there will be some initial overhead):
psi_opt = tnopt.optimize(1000)
-28.295518875122 [best: -28.295518875122] : : 1001it [01:03, 15.81it/s]
There is a simple tnopt.plot method to visualize the loss progress (note by default the first 20 points are shown on a linear plot, the rest on a log plot):
tnopt.plot(hlines={'analytic': qu.heisenberg_energy(L)})
(<Figure size 640x480 with 1 Axes>, <Axes: xlabel='Iteration', ylabel='Loss'>)
We can check that the returned optimized target psi_opt indeed matches the final loss:
loss_fn(psi_opt, ham)
-28.295518668947235
Note this TN (which can be retrieved from tnopt.get_tn_opt) is a copy of the original target TN, with the optimized parameters set. It has also been passed through norm_fn, so is in normalized/projected form, and converted back to numpy-backed arrays.
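For instance, we can quickly verify both of these properties:

# the optimized MPS should now be normalized...
psi_opt.H @ psi_opt            # ~1.0
# ...and its arrays converted back to numpy
type(psi_opt.tensors[0].data)  # numpy.ndarray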
4.2. Optimizing Circuit objects
One special case of optimizing pytrees is when Circuit objects are encountered. These contain a tensor network representation of the quantum circuit, but only the gates/tensors which have been specified as parametrized are optimized.
import autoray as ar
rng = ar.do('random.default_rng', 42, like="numpy")
circ = qtn.Circuit(2)
circ.u3(*rng.uniform(size=3, high=2 * qu.pi), 0, parametrize=True)
circ.u3(*rng.uniform(size=3, high=2 * qu.pi), 1, parametrize=True)
circ.cnot(0, 1)
circ.u3(*rng.uniform(size=3, high=2 * qu.pi), 0, parametrize=True)
circ.u3(*rng.uniform(size=3, high=2 * qu.pi), 1, parametrize=True)
H = qu.ham_heis(2).astype("complex128")
def loss(circ, H):
    en = circ.local_expectation(H, (0, 1), simplify_sequence="ADCRS")
    # we use `autoray.do` to allow arbitrary autodiff backends
    return ar.do("real", en)
tnopt = qtn.TNOptimizer(
    circ,
    loss,
    loss_constants=dict(H=H),
    # because we are using dynamic (entry dependent) simplification,
    # we need an eager autodiff backend rather than e.g. jax
    autodiff_backend="autograd",
)
circ_opt = tnopt.optimize(10)
-0.749999999208 [best: -0.749999999208] : : 13it [00:00, 39.17it/s]
The returned circuit now has the optimized parameters set.
circ_opt.gates
(<Gate(label=U3, params=[4.71234134 2.55777369 5.39472984], qubits=(0,), parametrize=True)>,
 <Gate(label=U3, params=[3.14157063 0.52032894 6.13001603], qubits=(1,), parametrize=True)>,
 <Gate(label=CNOT, params=[], qubits=(0, 1))>,
 <Gate(label=U3, params=[4.20005643 5.20517656 0.60518082], qubits=(0,), parametrize=True)>,
 <Gate(label=U3, params=[2.08314126 2.06360383 6.30453863], qubits=(1,), parametrize=True)>)
loss(circ_opt, H)
-0.7499999992075422
But the initial state and constant gates have not been changed, and because the parametrized tensors are manifestly unitary, the state remains normalized.
Note
Note also that because we used simplify_sequence="ADCRS", which performs TN simplifications that depend on the tensor entries, we cannot use a statically compiled autodiff backend like jax. Instead we use an ‘eager’ library, autograd, here. Another option would be to instead use simplify_sequence="R".
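For instance, a jax-compatible variant of the above loss might look like the following sketch:

def loss_static(circ, H):
    # 'R' (rank simplification) depends only on the network structure,
    # not the tensor entries, so the computation can be traced and compiled
    en = circ.local_expectation(H, (0, 1), simplify_sequence="R")
    return ar.do("real", en)

tnopt_jax = qtn.TNOptimizer(
    circ,
    loss_static,
    loss_constants=dict(H=H),
    autodiff_backend="jax",
)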
If you want fine control over which gates to optimize and share parameters among in the circuit, it is best to extract the TN representation first and optimize it directly, as sketched below. You can then call circ.update_params_from(tn) to update the circuit parameters from an optimized tensor network.
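A rough sketch of that workflow (the loss function my_tn_loss and the tag used here are purely illustrative):

# extract the tensor network of the circuit wavefunction
tn = circ.psi

tnopt_tn = qtn.TNOptimizer(
    tn,
    loss_fn=my_tn_loss,          # hypothetical loss acting on the TN directly
    tags=["ROUND_0"],            # e.g. only optimize tensors with this tag
    autodiff_backend="autograd",
)
tn_opt = tnopt_tn.optimize(100)

# write the optimized parameters back into the circuit's gates
circ.update_params_from(tn_opt)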
4.3. Optimizing PTensor objects
Behind the scenes, the circuit object's parametrized gates use a ‘parametrized’ tensor object, PTensor, which holds a PArray. This is a generalization of Tensor whose data is defined by a function and some parameters (kind of like a local norm_fn that is always applied). You can use these directly for even finer control.
Here we show a very roundabout way of trying to diagonalize a non-symmetric matrix G using two orthogonal matrices, Ua and Ub.
# each parametrized tensor has implicit `data = fn(params)`
Ua = qtn.PTensor(
    # projects the raw params into isometric / unitary / orthogonal form
    fn=qtn.decomp.isometrize_cayley,
    # parameters here are some arbitrary array
    params=qu.identity(10, dtype="float64"),
    inds=["w", "x"],
    tags="Ua",
)

G = qtn.Tensor(
    data=qu.randn((10, 10)),
    inds=['x', 'y'],
    tags="G",
)

Ub = qtn.PTensor(
    fn=qtn.decomp.isometrize_cayley,
    params=qu.identity(10, dtype="float64"),
    inds=["y", "z"],
    tags="Ub",
)
(Ua | G | Ub).draw(["Ua", "G", "Ub"])
(Ua | G | Ub).contract().visualize()
(<Figure size 500x500 with 1 Axes>, <Axes: >)
def loss(target, G):
    Ua, Ub = target["Ua"], target["Ub"]
    UGU = (Ua | G | Ub).contract(output_inds=['w', 'z']).data
    # minimize off-diagonal elements of UGU
    return ar.do("linalg.norm", UGU - ar.do('diag', ar.do('diag', UGU)))
We'll also here make use of automatic Hessian-vector product computation by some backends (e.g. jax), which can be used with second order optimization methods like Newton-CG.
tnopt = qtn.TNOptimizer(
    {"Ua": Ua, "Ub": Ub},
    loss,
    loss_constants=dict(G=G),
    optimizer="newton-cg",
    autodiff_backend="jax",
)
to = tnopt.optimize(1000, hessp=True)
+0.000001728355 [best: +0.000001728355] : 60%|██████ | 601/1000 [00:01<00:01, 307.78it/s]
tnopt.plot()
(<Figure size 640x480 with 1 Axes>, <Axes: xlabel='Iteration', ylabel='Loss'>)
Uao, Ubo = to["Ua"], to["Ub"]
Uao
PTensor(shape=(10, 10), inds=[w, x], tags={Ua}), backend=numpy, dtype=None, data=
array([[ 0.32642075, -0.6051291 ,  0.2978906 ,  0.39854315,  0.02511937,  0.0838748 , -0.29238603,  0.0739949 , -0.21712632, -0.36594826],
       [ 0.0908875 ,  0.33829898,  0.42404962, -0.4448276 ,  0.18260145,  0.42017654,  0.08901135,  0.26795745, -0.29496846, -0.35068703],
       [-0.19292217, -0.27753505,  0.0356112 ,  0.15946661,  0.47263956,  0.2622383 ,  0.54696834, -0.28224143, -0.32863796,  0.282954  ],
       [ 0.07516661,  0.08561171, -0.151499  ,  0.3938459 , -0.45643058,  0.49859524,  0.24238792,  0.48964697, -0.01888227,  0.23057617],
       [ 0.09385245, -0.0597567 , -0.7957913 , -0.08307468,  0.39757654,  0.09958041, -0.1210546 ,  0.2673096 , -0.14329785, -0.26983204],
       [-0.34093434, -0.35484943, -0.00559698, -0.38959286, -0.08984588,  0.3742761 , -0.5319735 ,  0.02843514, -0.0771872 ,  0.41001788],
       [-0.35343474, -0.46307853, -0.07277285, -0.31267455, -0.3303163 , -0.0156951 ,  0.4368167 ,  0.0749092 ,  0.23010063, -0.44593632],
       [ 0.18945608, -0.24993688,  0.17366748, -0.21469343,  0.22132577, -0.4433413 ,  0.16982655,  0.63924885,  0.03606365,  0.3755888 ],
       [ 0.46199533, -0.12276153,  0.02312034, -0.13623571,  0.23001385,  0.3858519 ,  0.06653047, -0.15374139,  0.71851975,  0.07920738],
       [ 0.5856843 , -0.11639551, -0.19472294, -0.37635726, -0.39644542, -0.05807046,  0.16411862, -0.30522007, -0.40060037,  0.15082498]], dtype=float32)
# still orthogonal
Uao.norm()**2, Ubo.norm()**2
(10.000000000000002, 10.000000000000002)
# check gauged G
(Uao | G | Ubo).contract().visualize()
(<Figure size 500x500 with 1 Axes>, <Axes: >)
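As a final check, we can recompute the same off-diagonal weight that the loss was minimizing, which should now be tiny (of order the final loss value):

import numpy as np

UGU = (Uao | G | Ubo).contract(output_inds=['w', 'z']).data
# identical quantity to the loss function above
np.linalg.norm(UGU - np.diag(np.diag(UGU)))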