Friday Nov 24, 2017 01:12
Mind your type #2
In the previous part, I talked about how one should be careful with the size of integer types. There are more considerations to take into account, in particular integral promotion. I was surprised to see that the result type of an arithmetic operation is not necessarily the type of the two operands. For example, in the following code
unsigned char a;
unsigned char b;
auto c = a + b;
c is not an unsigned char. This is surprising because one would expect that, since both operands have the same type, no further conversion is needed. However, and Andrey Chubukov, whom I had as a teacher during a workshop, used to tell us that all great stories begin with "However" (notice that it is not bijective). However,
"Arithmetic operators do not accept types smaller than
int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable."
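This can be checked directly; a minimal sketch:

#include <type_traits>

int main() {
    unsigned char a = 1;
    unsigned char b = 2;
    auto c = a + b;
    // Both operands are promoted to int before operator+ is applied,
    // so the deduced type of c is int, not unsigned char.
    static_assert(std::is_same<decltype(c), int>::value, "c is an int");
}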
A concluding word
This behavior makes the use of auto more complicated because, for types smaller than int, the effective prototype is not T operator+(T, T) but int operator+(int, int).
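If the small type is really wanted, one workaround is to make the narrowing explicit; a minimal sketch:

unsigned char a = 200;
unsigned char b = 100;
// The addition is performed on int (here 300), then the explicit cast
// truncates the result modulo 256, so c is deduced as unsigned char.
auto c = static_cast<unsigned char>(a + b);  // c == 44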
Sparse symmetric matrix
Apart from the diagonal terms, only a few off-diagonal terms are non-zero.
Let us consider that we need maximum machine precision; then all elements are long double. The size of the matrix is then 2^(2Ns) × sizeof(long double), with Ns the number of sites.
There are 3 sites on a triangle, and long double is 16 Bytes, so this matrix is 1kB (1024 Bytes).
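One can check this arithmetic directly; a minimal sketch:

#include <cstddef>
#include <cstdio>

int main() {
    const unsigned Ns = 3;             // three sites on the triangle
    const std::size_t dim = 1u << Ns;  // Hilbert-space dimension 2^Ns = 8
    // 2^(2Ns) elements of sizeof(long double) = 16 Bytes each
    std::printf("%zu Bytes\n", dim * dim * sizeof(long double));  // prints 1024 Bytes
}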
With a sparse matrix, one only needs to store the non-zero elements and their positions.
There are 8 diagonal terms, plus 6 on the upper triangle and 6 on the lower triangle, so 20 non-zero elements.
We know that eventually we will deal with numbers up to 2^72 for the index in the matrix, so the index will be stored in a 16-Byte integer type, called index in the following, that we will define later.
The size in memory is then 20×(sizeof(long double)+sizeof(index)) = 640 Bytes.
So we saved memory by a factor of 1.6.
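As an illustration, the storage could look like the following sketch. The definition of index through unsigned __int128 is an assumption: it is a GCC/Clang extension, since standard C++ has no 16-Byte integer type.

#include <vector>

using index = unsigned __int128;  // one possible 16-Byte index type (compiler extension)

struct Element {
    index pos;        // flattened position, e.g. row * dimension + column
    long double val;  // the non-zero value
};

// 20 elements * (sizeof(long double) + sizeof(index)) = 20 * (16 + 16) = 640 Bytes
std::vector<Element> sparse(20);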
Notice that in this case the indices can be stored in unsigned char and the values in signed char, which are both 1 Byte, so this matrix can be stored in 40 Bytes.
One can also notice that many elements share the same value; it can be useful to save more memory by storing only the distinct values.
In addition, many values do not need to be stored in long double: smaller floating-point types can be used to save even more memory.
To achieve this, we can rewrite the Hamiltonian to favor signed char and then apply a function to restore the actual physical values.
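A hypothetical sketch of this idea, where the coupling J and the factor 1/4 are placeholders rather than values from the actual Hamiltonian:

// The Hamiltonian is rewritten so that its elements are small integers,
// here in units of J/4 (both J and the 1/4 are assumptions for illustration).
long double restore(signed char stored) {
    const long double J = 1.0L;  // physical coupling constant
    return stored * (J / 4.0L);  // back to the physical matrix element
}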
Sparse matrices also scale well with size. Indeed, the ratio between the memory consumption of the dense matrix and that of the sparse matrix increases with the size. In other words, a sparse matrix saves relatively more space as the size increases; e.g. the ratio is about 2.3 for Ns = 4.
The Hamiltonian is a Hermitian operator.
Thus, one needs only the diagonal and either the upper or the lower triangle to have complete knowledge of the matrix.
The size in memory is then 14×(sizeof(long double)+sizeof(index)) = 448 Bytes, or 28 Bytes with all optimizations, so about 2.3 times less memory.
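For illustration, a matrix-vector product over this reduced storage could look like the following sketch (the entry layout is an assumption); each stored off-diagonal entry acts on both triangles:

#include <cstddef>
#include <vector>

struct Entry { std::size_t i, j; long double val; };  // stored with i <= j

// y = H x for a real symmetric H stored as diagonal + upper triangle:
// the entry (i, j) also plays the role of (j, i).
void apply(const std::vector<Entry>& H,
           const std::vector<long double>& x,
           std::vector<long double>& y) {
    for (auto& v : y) v = 0.0L;
    for (const auto& e : H) {
        y[e.i] += e.val * x[e.j];
        if (e.i != e.j) y[e.j] += e.val * x[e.i];
    }
}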
The next step is to use symmetries to block-diagonalize the Hamiltonian.
It is introduced in Section 4.4.1 Symmetries of my PhD thesis.
The short version is that one can diagonalize each block independently, and thus save more memory, because one works only in a subspace of the Hilbert space.
There are many algorithms to solve eigenproblems, but most of them need to know the matrix explicitly. So we want to focus on algorithms where the matrix can be implicit.
The power iteration is one of them. It is a very simple algorithm to implement, and probably the one using the least memory, because one only needs to store one vector.
Assuming the elements are double and applying the up/down symmetry, this vector takes 256GB of memory (2^36 elements × 8 Bytes = 512GB, halved by the symmetry). Applying any other symmetry, e.g. translations or the C2 rotation, will make it fit in memory.
The problem is that it converges to the greatest eigenvalue in absolute value.
We are looking for the ground state, which is the smallest eigenvalue, but there is always a (possibly degenerate) state with a higher absolute eigenvalue.
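A minimal matrix-free sketch of the power iteration, assuming a user-provided apply_H that computes y = H x on the fly (a second work vector is kept here for clarity, though it can be avoided):

#include <cmath>
#include <cstddef>
#include <vector>

// Assumed to compute y = H x without ever storing H (implicit matrix).
void apply_H(const std::vector<double>& x, std::vector<double>& y);

double power_iteration(std::size_t dim, std::size_t iterations) {
    std::vector<double> x(dim, 1.0 / std::sqrt(double(dim)));  // normalized start
    std::vector<double> y(dim);
    double lambda = 0.0;
    for (std::size_t it = 0; it < iterations; ++it) {
        apply_H(x, y);
        double dot = 0.0, norm = 0.0;
        for (std::size_t i = 0; i < dim; ++i) {
            dot  += x[i] * y[i];   // Rayleigh quotient <x|H|x>
            norm += y[i] * y[i];
        }
        lambda = dot;
        norm = std::sqrt(norm);
        for (std::size_t i = 0; i < dim; ++i) x[i] = y[i] / norm;
    }
    return lambda;  // converges to the eigenvalue of largest magnitude
}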
Locally Optimal Block Preconditioned Conjugate Gradient
Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a pretty good candidate: one can directly compute the smallest eigenvalue. The first step is to minimize the Rayleigh quotient, which can be done by storing only two floating-point variables, so a few Bytes. Then one applies a simple iterative method to find the eigenvector. I have not tried this method yet, but it looks quite promising. My main concern, and I guess the reason why it is not used for very large matrices, is that one needs to minimize the eigenvalue over 2^Ns variables, in my case 68,719,476,736 independent variables. So it will definitely take a very long time to converge, but I will give it a try eventually.
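For reference, the Rayleigh quotient itself only needs the same implicit matrix-vector product (apply_H as in the sketch above):

// R(x) = <x|H|x> / <x|x>; its minimum over x is the smallest eigenvalue.
double rayleigh_quotient(const std::vector<double>& x) {
    std::vector<double> y(x.size());
    apply_H(x, y);
    double num = 0.0, den = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        num += x[i] * y[i];
        den += x[i] * x[i];
    }
    return num / den;
}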
The Lanczos algorithm is widely used in condensed matter physics to find the ground state of a system. It is a very fast algorithm, but one may need to store more information. I say may because, now that I am having a fresh look at the algorithm, it may be possible to store only one vector. I have to think about it.
In any case, a common way to write the routine, because it is very easy, is:
- Build the reduced sparse matrix (sparse matrix + Hermitian + symmetries)
- Send the matrix to the routine
- In the routine, at each iteration, apply the matrix to the Lanczos vector
The Lanczos algorithm begins by initializing a random vector.
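A sketch of that initialization with the C++11 facilities (rather than rand() % range, for the reasons below):

#include <cstddef>
#include <random>
#include <vector>

std::vector<double> random_vector(std::size_t dim) {
    std::mt19937_64 gen{std::random_device{}()};             // seeded Mersenne Twister
    std::uniform_real_distribution<double> dist(-1.0, 1.0);  // uniform over [-1, 1)
    std::vector<double> v(dim);
    for (auto& x : v) x = dist(gen);
    return v;
}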
Broken random number generators on Apple products
Two weeks ago, I wrote about bad practices in random number generation, where the actual good practices are hidden in the source code and barely documented. I wrote
"After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one."
On a homemade computer, which has a true random number generator based on stochastic processes, I obtain good results.
It proves that rand() % range is not uniform, shows that the random generators of the C++11 standard are indeed uniform, and demonstrates that MacBook Pros have a broken random number generator.
Because the range of the generated random numbers does not span the whole range it is supposed to, the initial vector is de facto constrained to a subset of the Hilbert space. In some cases, maybe most of them, this effect will be cancelled out by the iterative method, but it can also be too restrictive and stall the iterative process. In particular, it will be problematic for very small systems.
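Incidentally, the non-uniformity of rand() % range can be seen without any hardware involved by picking a range close to RAND_MAX; a minimal sketch:

#include <cstdio>
#include <cstdlib>

int main() {
    // With range = 2 * (RAND_MAX / 3), the draws above range wrap around
    // to small remainders, which therefore occur about twice as often.
    const long range = 2L * (RAND_MAX / 3L);
    long low = 0, high = 0;
    for (long i = 0; i < 10000000L; ++i) {
        long r = std::rand() % range;
        (r < range / 2 ? low : high) += 1;
    }
    // A uniform generator would give low and high roughly equal;
    // here low is about twice high.
    std::printf("low half: %ld  high half: %ld\n", low, high);
}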