Dr Clèm's Blog

Tags: Security Coding C++ Optimization C++11 Physics

How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 2

Friday Nov 24, 2017 01:12
Mind your type #2

In the previous part, I was talk about how one shall be careful at the size of integer types. There are more considerations to take into account, in particular integral promotion. I have been been surprised to see that the return type of arithmetic operations are not the type of type of the two operands. For example, in the following code

unsigned char a;
unsigned char b;
auto c = a + b;
c is not an unsigned char. This is surprising because if both operands have the same type, then no further conversion is needed. However, and Andrey Chubukov, whom I had as a teacher during a workshop, was telling us that All great stories begin with However. Notice that it is not bijective. However,

"Arithmetic operators do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable."


And indeed, one can check that c is an int.

A concluding word

Implicit conversions makes the use of auto more complicated because for types smaller than int the prototype is not

T T::operator+(const T2 & b) const;
int T::operator+(const int & b) const;
In our case, it can be problematic because memory consumption is the limiting factor.

Sparse symmetric matrix

Let us consider the antiferromagnetic Heisenberg model on one triangle at zero-field. The matrix representation of the Hamiltonian is

Antiferromagnetic Heisenberg model on one triangle

Sparse matrix

Apart from the diagonal terms, only few off-diagonal terms are non zero

Antiferromagnetic Heisenberg model on one triangle
It is a sparse matrix. We can use this property to save memory by storing only finite elements.

Let us consider that we need maximum machine precision, then, all elements are long double. The size of the matrix of is then 22×Ns×sizeof(long double), with Ns the number of sites. There are 3 sites on a triangle, and long double is 16Bytes, so, this matrix is 1kB (1024Bytes).

With a sparse matrix, one need only to store non-zero elements and their position. There are 8 diagonal terms, plus 6 on the upper triangle, and 6 on the lower triangle, so 20 non-zero elements. We know that eventually, we will deal with numbers up to 272 for the index in the matrix, so the index will be store on a 16Bytes integer type, called index in the following, that we will define later.
The size in memory is then 20×(sizeof(long double)+sizeof(index)) = 640Bytes. So we saved memory by a factor 1.6.

Notice that in this case, the indices can be stored in unsigned char and the values in signed char, which are both 1Bytes, thus this matrix can be stored in 40Bytes.

One can also notice that many elements share the same value, it can be useful to safe more memory by storing only different value. In addition, many values does not need to be stored in long double, but in smaller floating-point types to save even more memory. To achieve this, we can rewrite the Hamiltonian to favor signed char and then, apply a function to restore the actual physical values.

Sparse matrices take advance of the size. Indeed, the ratio between the memory consumption of the dense matrix and the sparse matrix increases with the size. In other words, sparse matrix will make you save relatively more space while the size increases, e.g. the ration is about 2.3 for Ns = 4.

Hermitian matrix

The Hamiltonian is an Hermitian operator. Thus, one need only the diagonal and either the upper or lower triangle of the matrix to have a complete knowledge of the matrix.

Antiferromagnetic Heisenberg model on one triangle
Now, one computes only these elements
Antiferromagnetic Heisenberg model on one triangle
and stores the non-zero ones, leading to a memory consumption of (8+6)×(sizeof(long double)+sizeof(index)) = 448Bytes, or 28Bytes will all optimizations, so about 2.3 times less memory.


The next step, is to use symmetries to block-diagonalize the Hamiltonian. It is introduced in Section 4.4.1 Symmetries of my PhD thesis. The short version is that one can diagonalize independently each block, and so save more memory because one works only in a subspace of the Hilbert space.

Antiferromagnetic Heisenberg model on one triangle


There are many algorithm to solve eigen-problems. But most of them need to know the matrix. So we want to focus on algorithm where the matrix can be implicit.

Power iteration

The power iteration is one of them. It is a very simple algorithm to implement and probably the one using the less memory because one need only to store one vector. Assuming elements are double, and applying the up/down symmetry, it takes 256GB of memory. Applying any other symmetry, e.g. translations or rotation C2, will make it fit. The problem is that it converge to the greatest (in absolute value) eigenvalue. We are looking for the ground state, which is the smallest eigenvalue, but there are always a degenerate state of higher absolute eigenvalue.

Locally Optimal Block Preconditioned Conjugate Gradient

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a pretty good candidate. One can directly compute the smallest eigenvalue. The first step is to minimize the Rayleigh quotient, which can be done by only storing two floating-point variables, so at most 4Bytes. Then, one applies a simple iterative method to find the eigenvector. I did not try yet this method, but it looks quite promising. My main concern, and I guess that is why it is not used for very large matrix, is that one need to minimize the eigenvalue over 2Ns variables, in my case, 68,719,476,736 independent variables. So it will definitely takes a very long time to converge, but I will give it a try eventually.

Lanczos algorithm

Lanczos algorithm is wildly use in condensed matter physics to find the ground state of a system. It is a very fast algorithm. But one may need to store more informations. I said may because now that I am having a fresh look at the algorithm, it may be possible to store only one vector. I have to think about it.

In any case, a common way, because very easy, to write the routine is

  1. Build the reduced sparse matrix (sparse matrix + Hermitian + symmetries)
  2. Send the matrix to the routine
  3. In the routine, at each iteration, apply the Lanczos vector to the matrix
This is what I did during my PhD thesis. Now, I want to use the fact that one can build the vector v with just a function such that v ↦ H v, where H is the Hamiltonian. In this case, one do not need to store the Hamiltonian anymore. I did not finish yet to rewrite my code, so I will come back to this algorithm later.

Random numbers

Lanczos algorithm begins by initializing a random vector.

Broken random numbers generators on Apple products
Two weeks ago, I wrote about bad practices in random number generation, where the actual good practices are inside the code sources and slightly documented. I wrote

"After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one."

rand()%range, or the wrong way to generate random numbers

It turned out that it always gave 0. The reason behind is beyond the scope of this post but I discover after some web research that it is a well-known issue that Apple products have a zero-probably for some numbers, e.g. the range [0, 7], on their implementation of what is suppose to be a uniform distribution. So, never use an Apple product to perform scientific computation involving rand()!

On a homemade computer, which has a true random generator, using stochastic processes, I obtain good results.

As you can see, I did not wait for the true random generator part to end because even in an environment with a fair level of entropy, it took couple of days for these around 161061273600 iterations because the software pends all its time waiting for the hardware to accumulate enough entropy. In any case, the trend is already clear.
It proves that rand()%range is not uniform, shows that random generators of C++11 standard are indeed uniform, and demonstrates that MaxBook Pro have a broken random number generator.

Lanczos vector

Because the range of random numbers do not span the whole range of numbers that they are suppose to, the initial vector is de facto constrain to a subset of the Hilbert space. In some cases, maybe most of them, this effect will cancel by the iterative method, but it can also be too restive and stuck the iterative process. In particular, it will be problematic on very small systems.

Mastodon Follow me Mastodon Share

Re: promotion to a larger type
Posted by Shafik Yaghmour on Friday Nov 24, 2017 04:40.
Great article, thank you for sharing!

Note, the rationale for promotions to larger types is mainly about performance, it is covered in the C99 rationale and the most relevant quote is:

"Explicit license was added to perform calculations in a “wider” type than absolutely necessary, since this can sometimes produce smaller and faster code, not to mention the correct answer more often"

This is definitely surprising for most and the cases that involve mixed sign operations can cause some very puzzling results if you don't know the rules well.

See my Stackoverflow answer here for more details: https://stackoverflow.com/a/24372323/1708801

Note, rand() should be considered broken everywhere as this CERT posting explains: https://wiki.sei.cmu.edu/confluence/display/c/MSC30-C.+Do+not+use+the+rand()+function+for+generating+pseudorandom+numbers
Posted by Clément Février on Friday Nov 24, 2017 19:28.
Thank you, Shafik Yaghmour. This is interesting that promotion to a wider type can cause smaller and faster code. I guess that it is linked to the size of processor registry.
For the rand(), I will use the information in post for the codes that I share with people not using C++11. In C++11, there is the <random> header and it works fine
Post a comment

* required field.

Your comment


About you



Dr Clément Février

Bonjour, Je suis Clément Février, docteur en physique théorique de l’université de Grenoble Alpes, ingénieur Recherche et Développement dans le domaine de l’imagerie médicale et de la chirurgie mini-invasive chez Surgivisio et soutien du mouvement La France Insoumise.

Non, mais je rêve. La faille de sécurité Windows CVE-2021-40444 ne touche que le moteur de rendu d'Internet Explorer, pas celui de Edge. Donc, ça semblait cool sauf pour les très vieux Windows ? Sauf que Office utilise toujours le moteur de rendu d'Internet Explorer (oui, il y a un navigateur Web dans le tableur et dans le traitement de texte) et est donc vulnérable, et cette faille est exploitée (on ouvre le document vérolé et paf).