# How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 2

###### Friday Nov 24, 2017 01:12

In the previous part, I was talk about how one shall be careful at the size of integer types. There are more considerations to take into account, in particular integral promotion. I have been been surprised to see that the return type of arithmetic operations are not the type of type of the two operands. For example, in the following code

unsigned char a;
unsigned char b;
auto c = a + b;
`c` is not an `unsigned char`. This is surprising because if both operands have the same type, then no further conversion is needed. However, and Andrey Chubukov, whom I had as a teacher during a workshop, was telling us that All great stories begin with However. Notice that it is not bijective. However,

"Arithmetic operators do not accept types smaller than `int` as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable."

cppreference.com

And indeed, one can check that `c` is an `int`.

###### A concluding word

Implicit conversions makes the use of `auto` more complicated because for types smaller than `int` the prototype is not

T T::operator+(const T2 & b) const;
but
int T::operator+(const int & b) const;
In our case, it can be problematic because memory consumption is the limiting factor.

##### Sparse symmetric matrix

Let us consider the antiferromagnetic Heisenberg model on one triangle at zero-field. The matrix representation of the Hamiltonian is ###### Sparse matrix

Apart from the diagonal terms, only few off-diagonal terms are non zero It is a sparse matrix. We can use this property to save memory by storing only finite elements.

Let us consider that we need maximum machine precision, then, all elements are `long double`. The size of the matrix of is then 22×Ns×`sizeof(long double)`, with Ns the number of sites. There are 3 sites on a triangle, and `long double` is 16Bytes, so, this matrix is 1kB (1024Bytes).

With a sparse matrix, one need only to store non-zero elements and their position. There are 8 diagonal terms, plus 6 on the upper triangle, and 6 on the lower triangle, so 20 non-zero elements. We know that eventually, we will deal with numbers up to 272 for the index in the matrix, so the index will be store on a 16Bytes integer type, called `index` in the following, that we will define later.
The size in memory is then 20×(`sizeof(long double)+sizeof(index)`) = 640Bytes. So we saved memory by a factor 1.6.

Notice that in this case, the indices can be stored in `unsigned char` and the values in `signed char`, which are both 1Bytes, thus this matrix can be stored in 40Bytes.

One can also notice that many elements share the same value, it can be useful to safe more memory by storing only different value. In addition, many values does not need to be stored in `long double`, but in smaller floating-point types to save even more memory. To achieve this, we can rewrite the Hamiltonian to favor `signed char` and then, apply a function to restore the actual physical values.

Sparse matrices take advance of the size. Indeed, the ratio between the memory consumption of the dense matrix and the sparse matrix increases with the size. In other words, sparse matrix will make you save relatively more space while the size increases, e.g. the ration is about 2.3 for Ns = 4.

###### Hermitian matrix

The Hamiltonian is an Hermitian operator. Thus, one need only the diagonal and either the upper or lower triangle of the matrix to have a complete knowledge of the matrix. Now, one computes only these elements and stores the non-zero ones, leading to a memory consumption of (8+6)×(`sizeof(long double)+sizeof(index)`) = 448Bytes, or 28Bytes will all optimizations, so about 2.3 times less memory.

##### Symmetries

The next step, is to use symmetries to block-diagonalize the Hamiltonian. It is introduced in Section 4.4.1 Symmetries of my PhD thesis. The short version is that one can diagonalize independently each block, and so save more memory because one works only in a subspace of the Hilbert space. ##### Algorithm

There are many algorithm to solve eigen-problems. But most of them need to know the matrix. So we want to focus on algorithm where the matrix can be implicit.

###### Power iteration

The power iteration is one of them. It is a very simple algorithm to implement and probably the one using the less memory because one need only to store one vector. Assuming elements are `double`, and applying the up/down symmetry, it takes 256GB of memory. Applying any other symmetry, e.g. translations or rotation C2, will make it fit. The problem is that it converge to the greatest (in absolute value) eigenvalue. We are looking for the ground state, which is the smallest eigenvalue, but there are always a degenerate state of higher absolute eigenvalue.

###### Locally Optimal Block Preconditioned Conjugate Gradient

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a pretty good candidate. One can directly compute the smallest eigenvalue. The first step is to minimize the Rayleigh quotient, which can be done by only storing two floating-point variables, so at most 4Bytes. Then, one applies a simple iterative method to find the eigenvector. I did not try yet this method, but it looks quite promising. My main concern, and I guess that is why it is not used for very large matrix, is that one need to minimize the eigenvalue over 2Ns variables, in my case, 68,719,476,736 independent variables. So it will definitely takes a very long time to converge, but I will give it a try eventually.

###### Lanczos algorithm

Lanczos algorithm is wildly use in condensed matter physics to find the ground state of a system. It is a very fast algorithm. But one may need to store more informations. I said may because now that I am having a fresh look at the algorithm, it may be possible to store only one vector. I have to think about it.

In any case, a common way, because very easy, to write the routine is

1. Build the reduced sparse matrix (sparse matrix + Hermitian + symmetries)
2. Send the matrix to the routine
3. In the routine, at each iteration, apply the Lanczos vector to the matrix
This is what I did during my PhD thesis. Now, I want to use the fact that one can build the vector v with just a function such that v ↦ H v, where H is the Hamiltonian. In this case, one do not need to store the Hamiltonian anymore. I did not finish yet to rewrite my code, so I will come back to this algorithm later.

##### Random numbers

Lanczos algorithm begins by initializing a random vector.

###### Broken random numbers generators on Apple products
Two weeks ago, I wrote about bad practices in random number generation, where the actual good practices are inside the code sources and slightly documented. I wrote

"After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one."

rand()%range, or the wrong way to generate random numbers

It turned out that it always gave 0. The reason behind is beyond the scope of this post but I discover after some web research that it is a well-known issue that Apple products have a zero-probably for some numbers, e.g. the range [0, 7], on their implementation of what is suppose to be a uniform distribution. So, never use an Apple product to perform scientific computation involving `rand()`!

On a homemade computer, which has a true random generator, using stochastic processes, I obtain good results.

As you can see, I did not wait for the true random generator part to end because even in an environment with a fair level of entropy, it took couple of days for these around 161061273600 iterations because the software pends all its time waiting for the hardware to accumulate enough entropy. In any case, the trend is already clear.
It proves that `rand()%range` is not uniform, shows that random generators of C++11 standard are indeed uniform, and demonstrates that MaxBook Pro have a broken random number generator.

###### Lanczos vector

Because the range of random numbers do not span the whole range of numbers that they are suppose to, the initial vector is de facto constrain to a subset of the Hilbert space. In some cases, maybe most of them, this effect will cancel by the iterative method, but it can also be too restive and stuck the iterative process. In particular, it will be problematic on very small systems.

# How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 1

###### Introduction

I am trying to diagonalize a very large matrix (4,722,366,482,869,645,213,696 elements) in C++ and it leads to some issues. I noticed that I solved most of the issues that I encounter by myself by just pretending to explain the problem to someone. So I decided that for now on, I will share these notes. I will not explain everything, just the technical difficulties. I will give references for the rest, especially my PhD thesis.

Today, I will present you one of the first technical issue that I faced. It is linked to the type of variable.

A simple example is the results of 2N. In my code, I need to compute 236. The most straight forward way to achieve is by using bitwise operations. In fact, 2N can be written `1 << N`, for `N∈ℕ*`. Indeed, the operator `<<` move to the right all the bits by the number on the right hand. Few examples:
`1 << 0`. 1 is not changed, so `1 << 0` = 12 = 110 ≠ 20.
`1 << 1` moved the 1 to the right, so `1 << 1` = 102 = 210 = 21
`1 << 2` moved the 1 two times to the right, so `1 << 2` = 1002 = 410 = 22.
Here the issue. For C++, `1` is a `int`, which means that it is at least 16 bits, but the actual number of bits is not specified and is implementation dependent. Let us assume that `sizeof(int)` is equal to 4. It is the number of bytes, so it means that `int` is 32 bits (`sizeof(int)`×8).
××××××××××××××××××××××××××××××××
Only the first 31 bits (`sizeof(int)`×8-1) represent a positive number, which means that one can only move a bit up to 30 times (`sizeof(int)`×8-1-1) to the right and still get a positive number.
`1 << 30` = 010000000000000000000000000000002 = 107374182410 = 230
`1 << 31` = 100000000000000000000000000000002 = -214748364810 ≠ 231
`1 << 32` = 000000000000000000000000000000002 = 010 ≠ 232

If you try to compile a code with the last case, you will have a warning

main.cpp: In function ‘`int` main()’:
main.cpp:10:17: warning: left shift count >= width of type [-Wshift-count-overflow]
`auto` val = 1<<32;
^

As you can see `auto` does not become `long long int` but will be `int`, i.e. the return type of `1 << 32`.

###### What can we do?

The case `1 << 31` is pretty easy. We just need to explicitly declare `1 << 31` to be `unsigned int`. In this case, the last bit is not for the sign, so `(unsigned int)1 << 31` = 2147483648 = 231

What about `1 << 32`? Can I write `(unsigned long long int)1 << 32`? According to C++ standard, `unsigned long long int` is at least 64 bits, so one can think that `(unsigned long long int)1 << 32` will do the job. Let us see what happens. It is first `1 << 32` that is computed, which is an int, so it returns `0`, which is cast into an unsigned long long int. The result is then 0, so, no, it is not a workaround.
Notice that in this case, the compiler will not give you a warning, which is a bug, I think.

The solution is to actually to recast `1` into a type that stores enough bits for the bitwise operation and then apply the operator. Let us do it step by step
`1` is an `int`, so its binary representation is
000000000000000000000000000000012
Now, we add enough bits with the cast `((unsigned long long int) 1)`. It has the following binary representation
00000000000000000000000000000000000000000000000000000000000000012
`((unsigned long long int) 1) << 32` moves the first 1 32 times
00000000000000000000000000000001000000000000000000000000000000002 = 429496729610 = 232. We now have a safe 2N for N∈[1, 63]. It is enough for 236, which was the first thing that I had difficulties with.

Notice the largest number in C++ is 264-1 = 18446744073709551615, which is smaller than the number of elements in my matrix, 4722366482869645213696 = 272. We will see that it will be another problem to tackle.

###### What's next?

The main problem that I am facing is the memory consumption. I am using Lanczos algorithm do solve the eigen-problem. Right now, I am rewriting my code to use a very nice property of this method, which is the fact that the matrix can be defined implicitly.

Edit: The source code that I used for the tests. To compile

% g++ -std=gnu++14 main.cpp

###### main.cpp
```     1 #include <string>   // std::string
2 #include <iostream> // ostream (std::ostream, std::endl)
3                     // std::cout
4 #include <typeinfo> // typeid
5 #include <bitset>   // std::bitset
6 #include <array>    // std::array
7 #include <memory>   // std::shared_ptr
8
9 // Reviewed
10 const std::string exec(const auto * cmd)
+   -   11 +-- 14 lines: {
11 {
|   12  std::array<char, 128> buffer;
|   13  std::string result;
|   14  std::shared_ptr< FILE > pipe(popen(cmd, "r"), pclose);
|   15  if(!pipe) throw std::runtime_error("popen() failed!");
|   16  while(!feof(pipe.get()))
|+  |-  17 +---  6 lines: {
17  {
||  18   if(fgets(buffer.data(), 128, pipe.get()) != nullptr)
||+ ||- 19 +----  3 lines: {
19   {
||| 20    result += buffer.data();
||| 21   }
||  22  }
|   23  return result;
|   24 }
25
26 // Reviewed
27 // No auto for std::ostream
28 constexpr void display_type(
29  const auto & name,
30  const auto & var,
31  std::ostream & s = std::cout
32 )
+   -   33 +--  8 lines: {
33 {
|   34  // No auto
|   35  std::string command = "c++filt -t ";
|   36  command += typeid(var).name();
|   37  s<<"Type of \""<<name<<"\""<<std::endl;
|   38  auto stype = exec(command.data());
|   39  s<<stype<<std::endl;
|   40 }
41
42 int main()
+   -   43 +-- 31 lines: {
43 {
|   44  std::cout<<"* 1<<32"<<std::endl;
|   45  auto val1 = 1<<32;
|   46  display_type("1<<32", val1);
|   47  std::cout<<"1<<32 in base 2";
|   48  std::<<std::endl;
|   49  unsigned char * binary1 = reinterpret_cast< decltype(binary1) >(&val1);
|   50  for(auto i = (signed) sizeof(val1) - 1; i >= 0; --i)
|+  |-  51 +---  3 lines: {
51  {
||  52   std::cout<<std::bitset< 8 >(binary1[i]);
||  53  }
|   54  std::cout<<std::endl;
|   55  std::cout<<"1<<32 in base 10";
|   56  std::cout<<std::endl<<val1<<std::endl;
|   57
|   58  std::cout<<std::endl;
|   59  std::cout<<"* ((unsigned long long int) 1)<<32"<<std::endl;
|   60  auto val2 = ((unsigned long long int) 1)<<32;
|   61  display_type("((unsigned long long int) 1)<<32", val2);
|   62  std::cout<<"((unsigned long long int) 1)<<32 in base 2";
|   63  std::cout<<std::endl;
|   64  unsigned char * binary2 = reinterpret_cast< decltype(binary2) >(&val2);
|   65  for(auto i = (signed) sizeof(val2) - 1; i >= 0; --i)
|+  |-  66 +---  3 lines: {
66  {
||  67   std::cout<<std::bitset< 8 >(binary2[i]);
||  68  }
|   69  std::cout<<std::endl;
|   70  std::cout<<"((unsigned long long int) 1)<<32 in base 10";
|   71  std::cout<<std::endl<<val2<<std::endl;
|   72  return 0;
|   73 }
```

# rand()%range, or the wrong way to generate random numbers

##### The mathematics

In C++, `rand()` returns a random number between `0` and `RAND_MAX`, included. It is a uniform distribution, meaning that the probability Pn to have n ∈ [0, RAND_MAX] is Pn = 1∕(RAND_MAX+1) = P. Now, a common practice is to generate random numbers using the modulo operator `%`. `rand()%range` returns a number x ∈ [0, range-1]. The distribution is not uniform and favors smaller numbers. Indeed, the probabilities become

 Px = ∑n=0RAND_MAX Pn δn%range, x Px = P × (floor(RAND_MAX∕range) + 1), for x ≤ RAND_MAX%range P × floor(RAND_MAX∕range), otherwise.
One can see that Px is not a constant and the desired behavior is usually a uniform distribution with a probability Pu = 1/range.

##### Unrealistic working example

Let us assume that `RAND_MAX = 2`. `rand()` can return `0`, `1` and `2`. the probability P to have n ∈ [0, 2] is P = 1∕(2+1) = 1∕3. If `range = 2`, `rand()%range` has 3 cases.

• `rand()` returns `0`, `0%2 = 0`.
• `rand()` returns `1`, `1%2 = 1`.
• `rand()` returns `2`, `2%2 = 0`.
Each with a probability P = 1∕3. So, Px=0 = Pn=0+Pn=2 = 2P = 2∕3 and Px=1 = Pn=1 = P = 1∕3, whereas a uniform probability would have been Pu = 1∕2. The ratio between the two probabilities is Px=0∕Px=1 = 2. It means that there is two times more chance to get 0 than 1. The relative error is ∣(Px=0-Px=1)∕Px=1∣ = 1.

##### Proof of concept

Here a small code that demonstrates the discussion above. It simulate the function `rand()` for `RAND_MAX = 3`.

###### main.cpp
```    1 #include <cstdio>   // NULL
2 #include <ctime>    // time
3 #include <cstdlib>  // srand
4 #include <random>   // std::random_device,
5                     // std::uniform_int_distribution
6                     // std::default_random_engine generator
7 #include <iostream> // std::cout, ostream (std::endl)
8
9 #include "functions.h"
10
11 int main()
+  -  12 +-- 72 lines: {
12 {
|  13  // Value of rand_max
|  14  constexpr unsigned long long int rand_max = 2;
|  15  // Set range to maximize the error.
|  16  constexpr unsigned long int range = rand_max;
|  17  // Natural scale is rand_max + 1
|  18  constexpr unsigned long long int scale = rand_max + 1;
|  19  // Number of itrations to accumulate statistic
|  20  // In the unit of the natural scale
|  21  constexpr unsigned long long int Niterations = (unsigned long long int)500000 * scale;
|  22  // Random seed
|  23  srand(time(NULL));
|  24  // True random generator
|  25  std::random_device rd;
|  26  // Entropy of the true random generator
|  27  const auto entropy = rd.entropy();
|  28  std::cout<<"Entropy: "<<entropy<<std::endl;
|  29  // If the entropy is 0, it means there is no true random generator available (e.g. Apple computers)
|  30  // so a pseudo random number generator will be sused instead
|  31  if(entropy > 0)
|+ |- 32 +---  3 lines: {
32  {
|| 33   std::cout<<"True random generator"<<std::endl;
|| 34  }
|  35  else
|+ |- 36 +---  3 lines: {
36  {
|| 37   std::cout<<"Pseudo random generator"<<std::endl;
|| 38  }
|  39  // Uniform distribution to simultate rand() with our custom RAND_MAX
|  40  std::uniform_int_distribution< unsigned long int > distribution1(0, rand_max);
|  41  // Uniform distribution in the range [0, range-1]
|  42  std::uniform_int_distribution< unsigned long int > distribution2(0, range - 1);
|  43  std::cout<<"RAND_MAX: "<<rand_max<<std::endl;
|  44  std::cout<<"range:    "<<range<<std::endl;
|  45  std::cout<<"The "<<(rand_max % range) + 1;
|  46  std::cout<<" first probabilities are greater than the ";
|  47  std::cout<<range - ((rand_max % range) + 1);
|  48  std::cout<<" other ones by a factor:"<<std::endl;
|  49  std::cout<<"Predicted p(0)/p(range-1):                               ";
|  50  std::cout<<(((rand_max + 1) / range) + 1) / (float)(rand_max / range);
|  51  std::cout<<std::endl;
|  52  std::cout<<"Predicted relative error (p(0) - p(range-1))/p(range-1): ";
|  53  std::cout<<(((scale / range) + 1) - (scale / range)) / (float)(scale / range);
|  54  std::cout<<std::endl;
|  55
|  56  std::cout<<" * Making statistic *"<<std::endl;
|  57
|  58  std::cout<<" * rand()%range *"<<std::endl;
|  59  constexpr char nameMod [] = "results/mod3.dat";
|  60  if(entropy > 0)
|+ |- 61 +---  3 lines: {
61  {
|| 62   statistic1(rand_max, range, Niterations, nameMod, rd, distribution1, scale);
|| 63  }
|  64  else
|+ |- 65 +---  4 lines: {
65  {
|| 66   std::default_random_engine generator;
|| 67   statistic1(rand_max, range, Niterations, nameMod, generator, distribution1, scale);
|| 68  }
|  69
|  70  std::cout<<" * uniform *"<<std::endl;
|  71  constexpr char nameUniform [] = "results/uniform3.dat";
|  72  if(entropy > 0)
|+ |- 73 +---  3 lines: {
73  {
|| 74   statistic2(rand_max, range, Niterations, nameUniform, rd, distribution2, scale);
|| 75  }
|  76  else
|+ |- 77 +---  4 lines: {
77  {
|| 78   std::default_random_engine generator;
|| 79   statistic2(rand_max, range, Niterations, nameUniform, generator, distribution2, scale);
|| 80  }
|  81
|  82  return 0;
|  83 }
```
###### functions.h
```       1 #ifndef _FUNCTIONS_H_
2 #define _FUNCTIONS_H_
3
4 #include <fstream>  // std::fstream
5 #include <iostream> // std::cout, ostream (std::endl)
6 #include <cmath>    // abs
7 #include <cstdlib>  // rand
8
9 void read(const auto name, auto & start, auto p)
+    -     10 +-- 23 lines: {
10 {
|     11  for(unsigned char it = 0; it < 2; ++it)
|+   |-    12 +---  3 lines: {
12  {
||    13   p[it] = 0;
||    14  }
|     15
|     16  std::fstream file(name);
|     17  if(file.is_open())
|+   |-    18 +---  9 lines: {
18  {
||    19   while(file.good())
||+  ||-   20 +----  5 lines: {
20   {
|||   21    file>>start;
|||   22    file>>p;
|||   23    file>>p;
|||   24   }
||    25   file.clear();
||    26  }
|     27  file.close();
|     28
|     29  std::cout<<"Start at trial number "<<start + 1;
|     30  std::cout<<" with probabilities "<<p / (float)(start + 1);
|     31  std::cout<<" and "<<p / (float)(start + 1)<<std::endl;
|     32 }
33
34 void statistic1
35 (
36  const auto & rand_max,
37  const auto & range,
38  const auto & Niterations,
39  const auto name,
40  const auto & scale
41 )
+    -     42 +-- 36 lines: {
42 {
|     43  unsigned long long int * p = new unsigned long long int ;
|     44  unsigned long int index = 0;
|     45  unsigned long long int start = 0;
|     46
|     48  std::ofstream file(name, std::ofstream::app);
|     49  for(unsigned long long int it = start + 1; it < Niterations; ++it)
|+   |-    50 +--- 21 lines: {
50  {
||    51   index = rand() % range;
||    52   if(index == 0)
||+  ||-   53 +----  3 lines: {
53   {
|||   54    ++p;
|||   55   }
||    56   else
||+  ||-   57 +----  6 lines: {
57   {
|||   58    if(index == (range - 1))
|||+ |||-  59 +-----  3 lines: {
59    {
||||  60     ++p;
||||  61    }
|||   62   }
||    63   if(it % scale == rand_max)
||+  ||-   64 +----  6 lines: {
64   {
|||   65    if(p != 0 && p != 0)
|||+ |||-  66 +-----  3 lines: {
66    {
||||  67     file<<it<<" "<<p<<" "<<p<<std::endl;
||||  68    }
|||   69   }
||    70  }
|     71  file.close();
|     72
|     73  std::cout<<"Computed p(0)/p(range-1):                                ";
|     74  std::cout<<p/(long double)p<<std::endl;
|     75  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
|     76  std::cout<<std::abs((float)p - p)/(long double)p<<std::endl;
|     77 }
78
79 void statistic2
80 (
81  const auto & rand_max,
82  const auto & range,
83  const auto & Niterations,
84  const auto name,
85  auto & generator,
86  auto & distribution,
87  const auto & scale
88 )
+    -     89 +-- 36 lines: {
89 {
|     90  unsigned long long int * p = new unsigned long long int ;
|     91  unsigned long int index = 0;
|     92  unsigned long long int start = 0;
|     93
|     95  std::ofstream file(name, std::ofstream::app);
|     96  for(unsigned long long int it = start + 1; it < Niterations; ++it)
|+   |-    97 +--- 21 lines: {
97  {
||    98   index = distribution(generator);
||    99   if(index == 0)
||+  ||-  100 +----  3 lines: {
100   {
|||  101    ++p;
|||  102   }
||   103   else
||+  ||-  104 +----  6 lines: {
104   {
|||  105    if(index == (range - 1))
|||+ |||- 106 +-----  3 lines: {
106    {
|||| 107     ++p;
|||| 108    }
|||  109   }
||   110   if(it % scale == rand_max)
||+  ||-  111 +----  6 lines: {
111   {
|||  112    if(p != 0 && p != 0)
|||+ |||- 113 +-----  3 lines: {
113    {
|||| 114     file<<it<<" "<<p<<" "<<p<<std::endl;
|||| 115    }
|||  116   }
||   117  }
|    118  file.close();
|    119
|    120  std::cout<<"Computed p(0)/p(range-1):                                ";
|    121  std::cout<<p/(long double)p<<std::endl;
|    122  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
|    123  std::cout<<std::abs((float)p - p)/(long double)p<<std::endl;
|    124 }
125
126 void statistic1
127 (
128  const auto & rand_max,
129  const auto & range,
130  const auto & Niterations,
131  const auto name,
132  auto & generator,
133  auto & distribution,
134  const auto & scale
135 )
+    -    136 +-- 36 lines: {
136 {
|    137  unsigned long long int * p = new unsigned long long int ;
|    138  unsigned long int index = 0;
|    139  unsigned long long int start = 0;
|    140
|    142  std::ofstream file(name, std::ofstream::app);
|    143  for(unsigned long long int it = start + 1; it < Niterations; ++it)
|+   |-   144 +--- 21 lines: {
144  {
||   145   index = distribution(generator) % range;
||   146   if(index == 0)
||+  ||-  147 +----  3 lines: {
147   {
|||  148    ++p;
|||  149   }
||   150   else
||+  ||-  151 +----  6 lines: {
151   {
|||  152    if(index == (range - 1))
|||+ |||- 153 +-----  3 lines: {
153    {
|||| 154     ++p;
|||| 155    }
|||  156   }
||   157   if(it % scale == rand_max)
||+  ||-  158 +----  6 lines: {
158   {
|||  159    if(p != 0 && p != 0)
|||+ |||- 160 +-----  3 lines: {
160    {
|||| 161     file<<it<<" "<<p<<" "<<p<<std::endl;
|||| 162    }
|||  163   }
||   164  }
|    165  file.close();
|    166
|    167  std::cout<<"Computed p(0)/p(range-1):                                ";
|    168  std::cout<<p/(long double)p<<std::endl;
|    169  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
|    170  std::cout<<std::abs((float)p - p)/(long double)p<<std::endl;
|    171 }
172
173 #endif
```
You can download the files by clicking on their name. To compile,
% g++ -std=gnu++14 main.cpp
Of course, you can add more options, like `-Ofast` and other that will speed up the code. Create a folder `results` before executing the code
% mkdir results
and finally, execute it
% ./a.out
It should last less than a second. It generate two files in the folder `results`. Each contains 3 columns, the number of iterations, Px=0, and Px=1. `mod.dat` contains the results when using `rand()%RAND_MAX` and `uniform.dat` the results for a uniform distribution.

Let us now look at the results. I plotted the expected probabilities for a uniform distribution Pu, P and 2P has explained in the previous section. I also plotted the probabilities Pm using the modulo operator and Pu using a uniform distribution. I plotted them in the natural scale in this context, `(RAND_MAX+1)`.

After only few tens of iterations per `(RAND_MAX+1)`, the probabilities are already stables and converged to the probabilities computed above. To ensure the stability, let us draw for a large number of iterations.

After 1500000 iterations, there is no doubt about the stability. It clearly shows that using modulo operator favors smaller number up to 2 times the largest.

Of course, each set of data will look different, but they will converge to the same results. I ran this code many times with the same outcome.

Here the gnuplot files to reproduce the plots

###### results3_200.gnu
set term svg enhanced mouse
set style fill transparent
set key at 190,0.66
set title 'Convergence after 600 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities'
set output "r3_200.svg"
set xrange[0:200]
set yrange[0.3:0.7]
plot 'results/mod3.dat' u (\$1/3.):(\$2/\$1) t 'P_m(x=0)' w lp, \
'results/mod3.dat' u (\$1/3.):(\$3/\$1) t 'P_m(x=1)' w lp, \
'results/uniform3.dat' u (\$1/3.):(\$3/\$1) t 'P_u(x=0)' w lp, \
'results/uniform3.dat' u (\$2/3.):(\$3/\$1) t 'P_u(x=1)' w lp, \
1./3. lw 4 t 'P', \
2./3. lw 4 t '2P', \
1./2. lw 4 t 'P_u'
###### results3.gnu
set term svg enhanced mouse
set style fill transparent
set key at 499000,0.66
set title 'Convergence after 1500000 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities'
set output "svg/r3.svg"
set xrange[0:500000]
set yrange[0.3:0.7]
plot 'results/mod3.dat' u (\$1/3.):(\$2/\$1) t 'P_m(x=0)' w l, \
'results/mod3.dat' u (\$1/3.):(\$3/\$1) t 'P_m(x=1)' w l, \
'results/uniform3.dat' u (\$1/3.):(\$3/\$1) t 'P_u(x=0)' w l, \
'results/uniform3.dat' u (\$2/3.):(\$3/\$1) t 'P_u(x=1)' w l, \
1./3. lw 1 t 'P', \
2./3. lw 1 t '2P', \
1./2. lw 1 t 'P_u'

##### General proof of concept

I entitled unrealistic example above because `RAND_MAX` is most likely defined as (232∕2)-1, i.e. the greatest `int`. The worst case happens when `range = RAND_MAX`. Indeed, the probability to have `0`, Pm(x=0), is twice the any other probability Pm(x≠0). Using the same notation as above, Pm(x=0) = 2P and Pm(x≠0) = P.
Notice that P = Pu.

I chose to compare Pm(x=0) with Pm(x=RAND_MAX-1). In the code above, one can just change the variable `rand_max` by `RAND_MAX` on line 14, and call the proper function `statistic1` that uses the actual `rand()` and that is enough. However, it will take too long to terminate (few years). So one should replace `Niterations` by a smaller number, for example `(unsigned long long int)50 * scale = 107374182350`. It should take couple of hours to run and should be enough to see the trend.

Let me show you the new main

###### main.cpp
```    1 #include <cstdio>   // NULL
2 #include <ctime>    // time
3 #include <cstdlib>  // srand, RAND_MAX
4 #include <random>   // std::random_device,
5                     // std::uniform_int_distribution
6                     // std::default_random_engine generator
7 #include <iostream> // std::cout, ostream (std::endl)
8
9 #include "functions.h"
10
11 int main()
+  -  12 +-- 62 lines: {
12 {
|  13  // Value of rand_max
|  14  constexpr unsigned long long int rand_max = RAND_MAX;
|  15  // Set range to maximize the error.
|  16  constexpr unsigned long int range = rand_max;
|  17  // Natural scale is rand_max + 1
|  18  constexpr unsigned long long int scale = rand_max + 1;
|  19  // Number of itrations to accumulate statistic
|  20  // In the unit of the natural scale
|  21  constexpr unsigned long long int Niterations = (unsigned long long int)50 * scale;
|  22  // Random seed
|  23  srand(time(NULL));
|  24  // True random generator
|  25  std::random_device rd;
|  26  // Entropy of the true random generator
|  27  const auto entropy = rd.entropy();
|  28  std::cout<<"Entropy: "<<entropy<<std::endl;
|  29  // If the entropy is 0, it means there is no true random generator available (e.g. Apple computers)
|  30  // so a pseudo random number generator will be sused instead
|  31  if(entropy > 0)
|+ |- 32 +---  3 lines: {
32  {
|| 33   std::cout<<"True random generator"<<std::endl;
|| 34  }
|  35  else
|+ |- 36 +---  3 lines: {
36  {
|| 37   std::cout<<"Pseudo random generator"<<std::endl;
|| 38  }
|  39  // Uniform distribution in the range [0, range-1]
|  40  std::uniform_int_distribution< unsigned long int > distribution2(0, range - 1);
|  41  std::cout<<"RAND_MAX: "<<rand_max<<std::endl;
|  42  std::cout<<"range:    "<<range<<std::endl;
|  43  std::cout<<"The "<<(rand_max % range) + 1;
|  44  std::cout<<" first probabilities are greater than the ";
|  45  std::cout<<range - ((rand_max % range) + 1);
|  46  std::cout<<" other ones by a factor:"<<std::endl;
|  47  std::cout<<"Predicted p(0)/p(range-1):                               ";
|  48  std::cout<<(((rand_max + 1) / range) + 1) / (float)(rand_max / range);
|  49  std::cout<<std::endl;
|  50  std::cout<<"Predicted relative error (p(0) - p(range-1))/p(range-1): ";
|  51  std::cout<<(((scale / range) + 1) - (scale / range)) / (float)(scale / range);
|  52  std::cout<<std::endl;
|  53
|  54  std::cout<<" * Making statistic *"<<std::endl;
|  55
|  56  std::cout<<" * rand()%range *"<<std::endl;
|  57  constexpr char nameMod [] = "results/mod.dat";
|  58  statistic1(rand_max, range, Niterations, nameMod, scale);
|  59
|  60  std::cout<<" * uniform *"<<std::endl;
|  61  constexpr char nameUniform [] = "results/uniform.dat";
|  62  if(entropy > 0)
|+ |- 63 +---  3 lines: {
63  {
|| 64   statistic2(rand_max, range, Niterations, nameUniform, rd, distribution2, scale);
|| 65  }
|  66  else
|+ |- 67 +---  4 lines: {
67  {
|| 68   std::default_random_engine generator;
|| 69   statistic2(rand_max, range, Niterations, nameUniform, generator, distribution2, scale);
|| 70  }
|  71
|  72  return 0;
|  73 }
```

At the time of writing, I launched the code a bit more than two hour ago, so only the part for `rand()%range` is done. The second part seems to have difficulties to accumulate statistic. After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one.

Anyway, let us look at what we obtained so far.

Well, it looks good. The probabilities are converging to the expected one.

It shows that even in the case of a real life RAND_MAX, in this case, there are two times more chance to obtain 0 than any other number.

Here the gnuplot file to reproduce the plot

###### results.gnu
set term svg enhanced mouse
set style fill transparent
#set key at 499000,0.66
set title 'Convergence after 107374182350 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities (10^-^9)'
set output "r.svg"
set xrange[0:50]
#set yrange[0.3:0.7]
plot 'results/mod.dat' u (\$1/2147483648.):(\$2/\$1*1000000000.) t 'P_m(x=0)' w lp lc 1, \
'results/mod.dat' u (\$1/2147483648.):(\$3/\$1*1000000000.) t 'P_m(x=RAND\_MAX-1)' w lp lc 2, \
'results/uniform.dat' u (\$1/2147483648.):(\$2/\$1*1000000000.) t 'P_u(x=0)' w lp lc 3, \
'results/uniform.dat' u (\$1/2147483648.):(\$3/\$1*1000000000.) t 'P_u(x=1)' w lp lc 4, \
1./2147483648.*1000000000. lw 1 t 'P, P_u', \
2./2147483648.*1000000000. lw 1 t '2P'

##### The worst and the best, a short story on prime numbers

Let us see what are the best and the worst cases.

###### The best

Let us assume that `RAND_MAX` is an odd number, which most likely the case because probably defined as (28×2n)∕2-1. It implies that `RAND_MAX%2 = 1`. So, Pm(x=0) = Pm(x=1). It means that for an odd `RAND_MAX`, it is always safe to generate random numbers x ∈[0, 1]. It is not the only case. As a general rule, it is always safe to generate random numbers when `(RAND_MAX+1)%range = 0`.

###### The worst

The worst is if `RAND_MAX+1` is a prime number. Indeed, the condition `(RAND_MAX+1)%range = 0` can be never be satisfied to generate random numbers using the modulo operator.
In practice, it is unlikely to happen because of the definition above implies that `RAND_MAX+1` will be even.

But as a general rule, I use two criteria, the rarity and the ratio between the two different possible probabilities. With these two criteria, the worst is considered when a single number has a bigger probability than the other ones. In fact, it will be always 0 and it happens each time the following condition is fulfilled `RAND_MAX%range = 0`. The ratio increases as `range` increases and thus is maximum for `range = RAND_MAX`.

##### A concluding word

I hope that you will be careful next time that you will generate random numbers. C++, and especially C++11, provides great useful tools to tackles this problems. Using former standards or C, you will just need to write a simple function that will make your distribution uniform again. I advice you to look at the default one, It is pretty simple, easy to understand, and quick to code.

I hope that you enjoyed it. There is comment section yet, I am writing my own, but feel free to contact me for review, comment, thoughts, using social media like Mastodon, emails, XMPP, etc.
Share if you liked it :)

# SHA1BruteForce

###### Wednesday Nov 08, 2017 00:05

I'm proud to announce my first (free) software, SHA1BruteForce, that performs brute-force attack to crack SHA-1 hash.
Page of the project

###### Why?

After my 10yo Firefox session crashed, I lost a password stored in it. But, I managed to find the hash and it turns out to be a SHA-1 hash (software installed in 2009 on my server). I could change it, I guess, but I knew that SHA-1 is now considered as a weak encryption (although the first real collision is from February), so I challenged myself to recover it by writing a piece of code that do the job. It took me a bit more than a day to achieve a working code.
After, I began to optimized it and to have fundamental questioning about C++.

It is pretty simple and it seems to perform well. It takes about 4h to crack any 6 characters password on my computer. So I decided to publish it on my server, which was not as simple as I expected.
But I also have account in the main git platforms.
So I also published it on GitHub, GitLab and FramaGit, more know by the French.

It performs the tasks on the CPU only. GPU implementation does not seems possible at the time using only free software. Indeed, CUDA required the proprietary drivers and OpenCL does not seems to work properly with Nouveau (last version of the Linux kernel, i.e.4.13, on Ubuntu 16.04). But I want to use only free software (and I cannot install Nvidia drivers anyway, they do not work on my system).

It is not a revolutionary tools that intends to bit existing ones. I did it for myself and share it for anyone interested.

It is my first published code, so there are most likely some improvements to do on how to write the manual, how to write the code so it can be used by others, how I should comment it, and so on. The same goes for the code itself. Feel free to comment, share, submit commits, report bugs, etc.

# Are ^ and != operators equivalent on unsigned char in C++?

###### Wednesday Oct 25, 2017 03:51, last edition on Wednesday Oct 25, 2017 03:57

I was trying to optimize a C++ code where I check if two `unsigned char` are different. So the first thing that I wrote was `n1 != n2`. `unsigned char` are an integer type, and an efficient way to perform this operation is an exclusive or using bitwise operators, which is `^` in C++. So I could simply change my code to `n1 ^ n2`. But I was wondering if there are equivalent or if one of them is more efficient. I wrote the simplest functions for each operation in separate files.

% cat neq.cpp
bool f(unsigned char n1, unsigned char n2)
{
return n1 != n2;
}
% cat xor.cpp
bool f(unsigned char n1, unsigned char n2)
{
return n1 ^ n2;
}
Then, I compiled them to produce assembler code with the `-S` option of GCC
% g++ -S neq.cpp xor.cpp
For each source file .cpp, the corresponding assembler code is store in a .s file. If we compare the two file, they are equivalent, of course apart for the file name of the source code
% diff neq.s xor.s
1c1
< .file "neq.cpp"
---
> .file "xor.cpp"
Using `^` or `!=` are strictly equivalent for `unsigned char`.

My initial goal was to optimize my code but there is nothing that we can do to improve it in C++. To perform optimizations, we will use the compiler. Let us take a look at the assembler code, where the first line is skipped because irrelevant

.text .globl _Z1fhh .type _Z1fhh, @function _Z1fhh: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 movl %edi, %edx movl %esi, %eax movb %dl, -4(%rbp) movb %al, -8(%rbp) movzbl -4(%rbp), %eax cmpb -8(%rbp), %al setne %al popq %rbp .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size _Z1fhh, .-_Z1fhh .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits
Let us compile with different sets of flats, -O, or -O1 which is equivalent, will drastically reduce the number of instruction passed to the CPU. Indeed, the block between .cfi_startproc and .cfi_endproc will be reduced to 3 instructions, against 15 without flag.
.text .globl _Z1fhh .type _Z1fhh, @function _Z1fhh: .LFB0: .cfi_startproc cmpb %sil, %dil setne %al ret .cfi_endproc .LFE0: .size _Z1fhh, .-_Z1fhh .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits
`-O2` can be used, `-O3` and `-Ofast` will not affect the assembler code. In my case, the option `-march` (`-march=bdver2`) also affects the generate assembler code when combine with `-O2`. There is also `-Os` to optimize for size. Because I cannot judge which one is the best based on the assembler code, let us benchmark the operator with a dummy program
% cat main.cpp
int main()
{
for(unsigned char n1 = 0; n1 < 255; ++n1)
{
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
for(unsigned short int i = 0; i < 65535; ++i)
{
n1 ^ n2;
}
}
}
unsigned char n1 = 255;
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
for(unsigned short int i = 0; i < 65535; ++i)
{
n1 ^ n2;
}
}
}
Then, let us write a script to display the computation time in each case
% cat script.sh
#!/bin/bash

printf "g++ main.cpp; time ./a.out"
g++ main.cpp; time ./a.out
printf "\n"
printf "g++ -O1 main.cpp -o o1; time ./o1"
g++ -O1 main.cpp -o o1; time ./o1
printf "\n"
printf "g++ -O2 main.cpp -o o2; time ./o2"
g++ -O2 main.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main.cpp -o os; time ./os"
g++ -Os main.cpp -o os; time ./os
We now makethis script executable
% chmod +x script.sh
and we execute it
% ./script.sh g++ main.cpp; time ./a.out real 0m11.138s user 0m11.138s sys 0m0.000s g++ -O1 main.cpp -o o1; time ./o1 real 0m2.649s user 0m2.645s sys 0m0.004s g++ -O2 main.cpp -o o2; time ./o2 real 0m0.002s user 0m0.001s sys 0m0.000s g++ -O2 -march=bdver2 main.cpp -o march; time ./march real 0m0.002s user 0m0.001s sys 0m0.000s g++ -Os main.cpp -o os; time ./os real 0m0.002s user 0m0.001s sys 0m0.000s
With no doubt, the flags optimizes the code. But we don't have enough data to distinguish between `-O2`, `-O2 -march=bdver2` and `-Os`, so let us change the code to deasable some optimizations closer to real life case and perform it only on these three last cases because the first two cases might take a very long time.
% cat main2.cpp
#include <iostream> // std::cout
#include <ostream> // std::endl

bool f(const unsigned char & n1, const unsigned char & n2)
{
return n1 ^ n2;
}

int main()
{
bool * tmp = new bool ;
for(unsigned short int i = 0; i < 256; ++i)
{
tmp[i] = false;
}
for(unsigned char n1 = 0; n1 < 255; ++n1)
{
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
{
for(unsigned short int i = 0; i < 65535; ++i)
{
tmp[n1] += f(n1, n2);
}
}
}
}
unsigned char n1 = 255;
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
{
for(unsigned short int i = 0; i < 65535; ++i)
{
tmp[n1] += f(n1, n2);
}
}
}
std::cout<  return 0;
}
Here the new version of the script to test only the relevent flags
% cat script2.sh
#!/bin/bash

printf "g++ -O2 main2.cpp -o o2; time ./o2"
g++ -O2 main2.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main2.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main2.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main2.cpp -o os; time ./os"
g++ -Os main2.cpp -o os; time ./os
Now, let us execute this script
% ./script2.sh g++ -O2 main2.cpp -o o2; time ./o2 0x13cfc20 real 0m3.803s user 0m3.803s sys 0m0.001s g++ -O2 -march=bdver2 main2.cpp -o march; time ./march 0x16d0c20 real 0m3.843s user 0m3.841s sys 0m0.000s g++ -Os main2.cpp -o os; time ./os 0x1cc1c20 real 0m4.962s user 0m4.953s sys 0m0.008s
Running several times gives similar results. `-Os` made a code a little bit slower than the ones produced by `-O2` and `-O2 -march=bdver2`. Notice that forcing inlining of `f` improves the computation time for `-Os` and give similar results than for the other flags.

About 50% of the time`-O2` is faster `-O2 -march=bdver2`, but the relative difference is less than 1%. There is no clear gain from `-march=bdver2` in this case.

.section .text.unlikely,"ax",@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl _Z1fhh .type _Z1fhh, @function _Z1fhh: .LFB0: .cfi_startproc cmpb %sil, %dil setne %al ret .cfi_endproc .LFE0: .size _Z1fhh, .-_Z1fhh .section .text.unlikely .LCOLDE0: .text .LHOTE0: .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits
Here the difference in the `f` function
% diff o2.s march.s
6c6,7
< .p2align 4,,15
---
> .p2align 4,,10
> .p2align 3
To conclude, we first showed that `^` and `!=` are strictly equivalent for `unsigned char`, then that the maximum of optimization for this operator is achived with `-O2` and finally that `-march=bdver2` does not provide noticeable improvement. In a real life code, other optimizations can make `-Os` performing better than `-O2` and other flags, such as `-O3`, might affect the executable. Bonjour, Je suis Clément Février, docteur en physique théorique de l’université de Grenoble Alpes, ingénieur Recherche et Développement dans le domaine de l’imagerie médicale et de la chirurgie mini-invasive chez Surgivisio et soutien du mouvement La France Insoumise.

On notera les 7 jours entre les premiers symptômes et les résultats.

Dites les velotaffeur, vous avez entendu parler d'un casque qui a un feu avant, un feu arrière et des clignotants sur les cotés / avant arrière ? (Enfin un simple bandeau led de chaque coté se gère hein)
Genre un truc pour que les voitures voient qu'on tourne à droite ou gauche sans qu'on ai besoin de tendre le bras ?

Parce que bon, j'ai testé ce soir, passer sur les marquages au sol + les rails du tramway avec un bras levé sous la pluie, niveau stabilité on repassera ^^'

J'arrive pas à trouver :/ et les solutions existantes de clignotants sont seulement pour l'arrière

Si le coeur vous en dit vous pouvez partager ;)

#velotaff #boostappréciés :)

I also struggles with a global "std::vector< std::any > g", but I end up with run time crash when calling "*dynamic_cast< decltype(g[index]) * >(p)" (probably for good reason? I guess it casts to std::any)

The only thing that really matter is, at the end of the day, to choose the correct method automatically without conditional statement at runtime. I really wonder is a workaround is possible.
I already tried to add "std::tuple< d1*, d2 > b_tuple" to class b, using some forward declaration, but, of course, std::get< i >(b_tuple) cannot compile since it will be known at runtime only.

But I allow complete refactoring code, even the b and d1 & d2. Templates, new classes and sub classes and even are fine (I already put 100-lines define to add iterator to enum and another one to allow enum inheritance, and my coworkers begin to hate me ^^)
For example, method "create" can become a class, or class b can be construct using a macro to make somehow the base class aware of derived ones
BASE(b, ...) std::tuple< __VA_ARGS__ > g_ ## b; class b

How to choose a function at run time using overloaded resolution in ?
I feel it's an old known issue without (obvious) answer, but many things changed since c++98
stackoverflow.com/questions/64

I add the new following constrain to my original question, it can be up to C++14, no more (although if a solution in c++17 or C++20 exist, I'm interested) because of compiler limitation (appears to be based on gcc 5.14 from what I understood).

@kensanata @cadadr @celia it's been 2 years I didn't try. I was trying once a year since they changed the api. I gave up eventually. I'll take a look again.