# How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 2

###### Friday Nov 24, 2017 01:12

##### Mind your type #2

In the previous part, I talked about how one shall be careful about the size of integer types. There are more considerations to take into account, in particular integral promotion. I have been surprised to see that the return type of an arithmetic operation is not the type of the two operands. For example, in the following code

```
unsigned char a;
unsigned char b;

auto c = a + b;
```

`c` is not an `unsigned char`. This is surprising because if both operands have the same type, no further conversion should be needed. **However**, as Andrey Chubukov, whom I had as a teacher during a workshop, told us, *all great stories begin with* *However*. And indeed, one can check that `c` is an `int`. Notice that the promotion is not bijective: converting the `int` result back to an `unsigned char` may lose information.
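One can check this at compile time; a minimal example:

```
#include <type_traits>

int main()
{
    unsigned char a = 0;
    unsigned char b = 0;
    auto c = a + b;
    // Both operands undergo integral promotion to int before the
    // addition, so the type of c is int, not unsigned char.
    static_assert(std::is_same<decltype(c), int>::value, "c is an int");
    (void)c;
}
```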
###### A concluding word

Implicit conversions make the use of `auto` more complicated because, for types smaller than `int`, the prototype is effectively not
```
T T::operator+(const T2 & b) const;
```

but
```
int T::operator+(const int & b) const;
```

In our case, it can be problematic because memory consumption is the limiting factor.
##### Sparse symmetric matrix

Let us consider the antiferromagnetic Heisenberg model on one triangle at zero field.
The matrix representation of the Hamiltonian is
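$$
H = \frac{J}{4}
\begin{pmatrix}
 3 &  0 &  0 &  0 &  0 &  0 &  0 & 0 \\
 0 & -1 &  2 &  2 &  0 &  0 &  0 & 0 \\
 0 &  2 & -1 &  2 &  0 &  0 &  0 & 0 \\
 0 &  2 &  2 & -1 &  0 &  0 &  0 & 0 \\
 0 &  0 &  0 &  0 & -1 &  2 &  2 & 0 \\
 0 &  0 &  0 &  0 &  2 & -1 &  2 & 0 \\
 0 &  0 &  0 &  0 &  2 &  2 & -1 & 0 \\
 0 &  0 &  0 &  0 &  0 &  0 &  0 & 3
\end{pmatrix},
$$

written here, as an illustration, in the $S^z$ product basis ordered by total $S^z$, with coupling $J$ and $\hbar = 1$.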

###### Sparse matrix

Apart from the diagonal terms, only a few off-diagonal terms are non-zero.

Let us consider that we need maximum machine precision; then all elements are `long double`.
The size of the matrix is then 2^{2×Ns} × `sizeof(long double)`, with Ns the number of sites.
There are 3 sites on a triangle, and a `long double` is 16 bytes, so this matrix weighs 1 kB (1024 bytes).

With a sparse matrix, one only needs to store the non-zero elements and their positions.
There are 8 diagonal terms, plus 6 on the upper triangle and 6 on the lower triangle, so 20 non-zero elements.
We know that we will eventually deal with numbers up to 2^{72} for the index in the matrix, so the index will be stored in a 16-byte integer type, called `index` in the following, that we will define later.

The size in memory is then 20 × (`sizeof(long double)` + `sizeof(index)`) = 640 bytes.
So we saved memory by a factor of 1.6.
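To fix ideas, here is a minimal sketch of such a storage in the coordinate (COO) format; the 128-bit `index` used here, GCC/Clang's `unsigned __int128`, is only a placeholder for the type defined later:

```
#include <vector>

// Placeholder for the 16-byte integer type defined later in the post;
// GCC and Clang provide unsigned __int128.
using index = unsigned __int128;

// One entry per non-zero element: its position and its value.
struct Entry {
    index row;
    index col;
    long double value;
};

// The sparse matrix is simply the list of its non-zero entries.
using SparseMatrix = std::vector<Entry>;
```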

Notice that in this case, the indices can be stored in an `unsigned char` and the values in a `signed char`, which are both 1 byte, so this matrix can be stored in 40 bytes.

One can also notice that many elements share the same value, so it can be useful to save more memory by storing each distinct value only once.
In addition, many values do not need to be stored as `long double`, but in smaller floating-point types, to save even more memory.
To achieve this, we can rewrite the Hamiltonian to favor `signed char` and then apply a function to restore the actual physical values.
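For instance, taking the triangle Hamiltonian written above, every element is J/4 times a small integer, so a sketch of such a restore function could be:

```
// Assumption: elements of the triangle Hamiltonian are (J/4) times a
// small integer (3, -1 or 2), so a signed char per element suffices.
long double restore(signed char stored, long double J)
{
    return (J / 4.0L) * stored;
}
```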

Sparse matrices take even better advantage of larger sizes. Indeed, the ratio between the memory consumption of the dense matrix and that of the sparse matrix increases with the size. In other words, a sparse matrix saves relatively more space as the size increases, e.g. the ratio is about 2.3 for Ns = 4.

###### Hermitian matrix

The Hamiltonian is a Hermitian operator.
Thus, one needs only the diagonal and either the upper or the lower triangle of the matrix to have complete knowledge of it.

The size in memory is then 14 × (`sizeof(long double)` + `sizeof(index)`) = 448 bytes, or 28 bytes with all optimizations, so about 2.3 times less memory.
##### Symmetries

The next step is to use symmetries to block-diagonalize the Hamiltonian.
This is introduced in Section 4.4.1 Symmetries of my PhD thesis.
The short version is that one can diagonalize each block independently, and thus save more memory, because one works only in a subspace of the Hilbert space.

##### Algorithm

There are many algorithms to solve eigenproblems, but most of them need to know the matrix explicitly. So we want to focus on algorithms where the matrix can be implicit.

###### Power iteration

The power iteration is one of them.
It is a very simple algorithm to implement,
and probably the one using the least memory, because one only needs to store one vector.
Assuming the elements are `double`, and applying the up/down symmetry,
it takes 256 GB of memory.
Applying any other symmetry, e.g. translations or the rotation C_{2},
will make it fit.
The problem is that it converges to the greatest (in absolute value) eigenvalue.
We are looking for the ground state, which is the smallest eigenvalue,
but there is always a degenerate state of higher absolute eigenvalue.
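To fix ideas, here is a minimal matrix-free sketch; the `apply` callback, a hypothetical interface, computes H·v without ever storing H:

```
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Matrix-free power iteration: `apply` computes w = H*v without ever
// storing H. Returns the Rayleigh quotient, which converges to the
// eigenvalue of largest absolute value.
double power_iteration(
    std::size_t dim,
    const std::function<void(const std::vector<double>&,
                             std::vector<double>&)>& apply,
    std::vector<double>& v,        // in: random start; out: eigenvector
    std::size_t max_iter)
{
    std::vector<double> w(dim);    // work vector (a second 2^Ns array)
    double lambda = 0.0;
    for (std::size_t it = 0; it < max_iter; ++it) {
        apply(v, w);                           // w = H v
        lambda = 0.0;
        double norm = 0.0;
        for (std::size_t i = 0; i < dim; ++i) {
            lambda += v[i] * w[i];             // Rayleigh quotient v.Hv
            norm += w[i] * w[i];
        }
        norm = std::sqrt(norm);
        for (std::size_t i = 0; i < dim; ++i)  // normalize for next step
            v[i] = w[i] / norm;
    }
    return lambda;
}
```

Note that this textbook version keeps a work vector `w` alongside the iterate, so the single-vector figure quoted above assumes an in-place application of the matrix.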

###### Locally Optimal Block Preconditioned Conjugate Gradient

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG)
is a pretty good candidate:
one can directly compute the smallest eigenvalue.
The first step is to minimize the Rayleigh quotient,
which can be done by storing only two extra floating-point variables.
Then, one applies a simple iterative method to find the eigenvector.
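For reference, the quantity being minimized is

$$ \rho(x) = \frac{x^{\dagger} H x}{x^{\dagger} x}, $$

whose minimum over non-zero vectors $x$ of the Hilbert space is the smallest eigenvalue.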
I have not tried this method yet, but it looks quite promising.
My main concern, and I guess that is why it is not used for very large matrices,
is that one needs to minimize the eigenvalue over 2^{Ns} variables,
in my case 68,719,476,736 independent variables.
So it will definitely take a very long time to converge,
but I will give it a try eventually.

###### Lanczos algorithm

The Lanczos algorithm is widely used in condensed matter physics to find the ground state of a system.
It is a very fast algorithm,
but one may need to store more information.
I said *may* because, now that I am having a fresh look at the algorithm,
it may be possible to store only one vector.
I have to think about it.

In any case,
a common, because very easy, way to write the routine is:

- Build the reduced sparse matrix (sparse matrix + Hermitian + symmetries)
- Send the matrix to the routine
- In the routine, at each iteration, apply the matrix to the Lanczos vector
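As an illustration, here is a minimal sketch of the three-term recurrence at the heart of the algorithm, with the same hypothetical matrix-free `apply` interface as in the power-iteration sketch:

```
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Minimal Lanczos recurrence with an implicit matrix: `apply` computes
// w = H*v; alpha/beta build the tridiagonal matrix whose extremal
// eigenvalues approximate those of H.
void lanczos(std::size_t dim,
             const std::function<void(const std::vector<double>&,
                                      std::vector<double>&)>& apply,
             std::vector<double> v,       // random, normalized start vector
             std::size_t m,               // number of iterations
             std::vector<double>& alpha,  // out: diagonal of T
             std::vector<double>& beta)   // out: off-diagonal of T
{
    std::vector<double> v_prev(dim, 0.0), w(dim);
    double b = 0.0;
    for (std::size_t j = 0; j < m; ++j) {
        apply(v, w);                               // w = H v_j
        double a = 0.0;
        for (std::size_t i = 0; i < dim; ++i) a += v[i] * w[i];
        alpha.push_back(a);
        for (std::size_t i = 0; i < dim; ++i)      // w -= a v_j + b v_{j-1}
            w[i] -= a * v[i] + b * v_prev[i];
        b = 0.0;
        for (double x : w) b += x * x;
        b = std::sqrt(b);
        if (b == 0.0) break;                       // invariant subspace found
        beta.push_back(b);
        v_prev = v;
        for (std::size_t i = 0; i < dim; ++i) v[i] = w[i] / b;
    }
}
```

This version keeps three vectors (`v_prev`, `v` and `w`), which is exactly the extra storage discussed above.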

##### Random numbers

The Lanczos algorithm begins by initializing a random vector.

###### Broken random numbers generators on Apple products

Two weeks ago, I wrote about bad practices in random number generation, where the actual good practices are inside the source code and only lightly documented. I wrote a small test using `rand()`, and it turned out that it always gave 0. The reason behind this is beyond the scope of this post, but after some web research I discovered that it is a well-known issue: Apple products have a zero probability for some numbers, e.g. the range [0, 7], in their implementation of what is supposed to be a uniform distribution. So, **never use an Apple product to perform scientific computation involving** `rand()`!

On a homemade computer, which has a true random number generator based on stochastic processes, I obtained good results.
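For comparison, a minimal C++11 version of such a draw, using the standard generators discussed below:

```
#include <iostream>
#include <random>

int main()
{
    // Seed a standard engine and draw from an explicit distribution,
    // instead of relying on rand() % range.
    std::mt19937_64 engine{std::random_device{}()};
    std::uniform_int_distribution<int> uniform(0, 7);  // closed range [0, 7]
    for (int i = 0; i < 10; ++i)
        std::cout << uniform(engine) << ' ';
    std::cout << '\n';
}
```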

**It proves that** `rand()%range` is not uniform, shows that the random generators of the C++11 standard are indeed uniform, and demonstrates that MacBook Pros have a broken random number generator.

###### Lanczos vector

Because the range of the random numbers does not span the whole range of numbers that it is supposed to,
the initial vector is *de facto* constrained to a subset of the Hilbert space.
In some cases, maybe most of them, this effect will be cancelled out by the iterative method,
but it can also be too restrictive and stall the iterative process.
In particular, it will be problematic on very small systems.