Dr Clèm's Blog

How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 2

Friday Nov 24, 2017 01:12
Mind your type #2

In the previous part, I was talk about how one shall be careful at the size of integer types. There are more considerations to take into account, in particular integral promotion. I have been been surprised to see that the return type of arithmetic operations are not the type of type of the two operands. For example, in the following code

unsigned char a;
unsigned char b;
auto c = a + b;
c is not an unsigned char. This is surprising because if both operands have the same type, then no further conversion is needed. However, and Andrey Chubukov, whom I had as a teacher during a workshop, was telling us that All great stories begin with However. Notice that it is not bijective. However,

"Arithmetic operators do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable."

cppreference.com

And indeed, one can check that c is an int.

A concluding word

Implicit conversions makes the use of auto more complicated because for types smaller than int the prototype is not

T T::operator+(const T2 & b) const;
but
int T::operator+(const int & b) const;
In our case, it can be problematic because memory consumption is the limiting factor.

Sparse symmetric matrix

Let us consider the antiferromagnetic Heisenberg model on one triangle at zero-field. The matrix representation of the Hamiltonian is

Antiferromagnetic Heisenberg model on one triangle

Sparse matrix

Apart from the diagonal terms, only few off-diagonal terms are non zero

Antiferromagnetic Heisenberg model on one triangle
It is a sparse matrix. We can use this property to save memory by storing only finite elements.

Let us consider that we need maximum machine precision, then, all elements are long double. The size of the matrix of is then 22×Ns×sizeof(long double), with Ns the number of sites. There are 3 sites on a triangle, and long double is 16Bytes, so, this matrix is 1kB (1024Bytes).

With a sparse matrix, one need only to store non-zero elements and their position. There are 8 diagonal terms, plus 6 on the upper triangle, and 6 on the lower triangle, so 20 non-zero elements. We know that eventually, we will deal with numbers up to 272 for the index in the matrix, so the index will be store on a 16Bytes integer type, called index in the following, that we will define later.
The size in memory is then 20×(sizeof(long double)+sizeof(index)) = 640Bytes. So we saved memory by a factor 1.6.

Notice that in this case, the indices can be stored in unsigned char and the values in signed char, which are both 1Bytes, thus this matrix can be stored in 40Bytes.

One can also notice that many elements share the same value, it can be useful to safe more memory by storing only different value. In addition, many values does not need to be stored in long double, but in smaller floating-point types to save even more memory. To achieve this, we can rewrite the Hamiltonian to favor signed char and then, apply a function to restore the actual physical values.

Sparse matrices take advance of the size. Indeed, the ratio between the memory consumption of the dense matrix and the sparse matrix increases with the size. In other words, sparse matrix will make you save relatively more space while the size increases, e.g. the ration is about 2.3 for Ns = 4.

Hermitian matrix

The Hamiltonian is an Hermitian operator. Thus, one need only the diagonal and either the upper or lower triangle of the matrix to have a complete knowledge of the matrix.

Antiferromagnetic Heisenberg model on one triangle
Now, one computes only these elements
Antiferromagnetic Heisenberg model on one triangle
and stores the non-zero ones, leading to a memory consumption of (8+6)×(sizeof(long double)+sizeof(index)) = 448Bytes, or 28Bytes will all optimizations, so about 2.3 times less memory.

Symmetries

The next step, is to use symmetries to block-diagonalize the Hamiltonian. It is introduced in Section 4.4.1 Symmetries of my PhD thesis. The short version is that one can diagonalize independently each block, and so save more memory because one works only in a subspace of the Hilbert space.

Antiferromagnetic Heisenberg model on one triangle

Algorithm

There are many algorithm to solve eigen-problems. But most of them need to know the matrix. So we want to focus on algorithm where the matrix can be implicit.

Power iteration

The power iteration is one of them. It is a very simple algorithm to implement and probably the one using the less memory because one need only to store one vector. Assuming elements are double, and applying the up/down symmetry, it takes 256GB of memory. Applying any other symmetry, e.g. translations or rotation C2, will make it fit. The problem is that it converge to the greatest (in absolute value) eigenvalue. We are looking for the ground state, which is the smallest eigenvalue, but there are always a degenerate state of higher absolute eigenvalue.

Locally Optimal Block Preconditioned Conjugate Gradient

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a pretty good candidate. One can directly compute the smallest eigenvalue. The first step is to minimize the Rayleigh quotient, which can be done by only storing two floating-point variables, so at most 4Bytes. Then, one applies a simple iterative method to find the eigenvector. I did not try yet this method, but it looks quite promising. My main concern, and I guess that is why it is not used for very large matrix, is that one need to minimize the eigenvalue over 2Ns variables, in my case, 68,719,476,736 independent variables. So it will definitely takes a very long time to converge, but I will give it a try eventually.

Lanczos algorithm

Lanczos algorithm is wildly use in condensed matter physics to find the ground state of a system. It is a very fast algorithm. But one may need to store more informations. I said may because now that I am having a fresh look at the algorithm, it may be possible to store only one vector. I have to think about it.

In any case, a common way, because very easy, to write the routine is

  1. Build the reduced sparse matrix (sparse matrix + Hermitian + symmetries)
  2. Send the matrix to the routine
  3. In the routine, at each iteration, apply the Lanczos vector to the matrix
This is what I did during my PhD thesis. Now, I want to use the fact that one can build the vector v with just a function such that v ↦ H v, where H is the Hamiltonian. In this case, one do not need to store the Hamiltonian anymore. I did not finish yet to rewrite my code, so I will come back to this algorithm later.

Random numbers

Lanczos algorithm begins by initializing a random vector.

Broken random numbers generators on Apple products
Two weeks ago, I wrote about bad practices in random number generation, where the actual good practices are inside the code sources and slightly documented. I wrote

"After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one."

rand()%range, or the wrong way to generate random numbers

It turned out that it always gave 0. The reason behind is beyond the scope of this post but I discover after some web research that it is a well-known issue that Apple products have a zero-probably for some numbers, e.g. the range [0, 7], on their implementation of what is suppose to be a uniform distribution. So, never use an Apple product to perform scientific computation involving rand()!

On a homemade computer, which has a true random generator, using stochastic processes, I obtain good results.


As you can see, I did not wait for the true random generator part to end because even in an environment with a fair level of entropy, it took couple of days for these around 161061273600 iterations because the software pends all its time waiting for the hardware to accumulate enough entropy. In any case, the trend is already clear.
It proves that rand()%range is not uniform, shows that random generators of C++11 standard are indeed uniform, and demonstrates that MaxBook Pro have a broken random number generator.

Lanczos vector

Because the range of random numbers do not span the whole range of numbers that they are suppose to, the initial vector is de facto constrain to a subset of the Hilbert space. In some cases, maybe most of them, this effect will cancel by the iterative method, but it can also be too restive and stuck the iterative process. In particular, it will be problematic on very small systems.

Captain's Log #2

Tuesday Nov 21, 2017 17:38
To Do List
TootLine

I am working on TootLine, the PHP code that allows you to share your TootLine on your blog, like the one there is on the right or bottom, depending on the size of your screen. I have couple of issues to address before publishing it, which are proper word wrapping, create a cache for the media in order to solve CSP issue, handle the NSFW content that is displayed so far.

Tags' categories

I am planning on adding click-able tags to fetch all articles with matching tags.
Done

Translation

I already published a note in French and I began to write couple of drafts in French, so I would like to make a French version of this blog, with most of the articles translated.

Content Security Policy (CSP) headers

I added Content Security Policy (CSP) headers.

RSS feed

I added few elements to MySQLiToRSS, I add proper end of file to the generated RSS feed and I used the fact that I made tags click-able to add domain to the category element.

Better looking links

I changed the URL of some links to make them better looking.

Future

I addition to what remains on the To Do List, I want to add couple of improvement.

More restrictive CSP headers

I want to rewrite some part of the web site to be able to provide more secure CSP headers.

Comments

I am planning on adding a RSS feed for each commentary section so it will be easy to follow. I will also add a cookie to auto fill the fields Name and Website. I will put a check box if you want to add the cookie when you comment.

Tags

I will add the list of all tags on the right panel.

SEO

I will add proper Open Graph protocol and Twitter cards in the headers. I already updated the MySQL database, so everything is ready on this side, I just need to rewrite the headers that I include to make them dynamic.

Better looking links

I already changed the URL of some links to make them better looking, but I did not finish yet. The rewriting rules are not as simple as I expected, if you want to make them SEO compliant.

Better CMS

My work-flow is not the most efficient. Each article points to an actual file, which is not so good, because I need to create a file each I add an entry in this blog. I will improve this soon.

Trackback, impossible?

I tried to add trackback, but the most up to date page about it is more than 5 years old and all links are dead links, including the sample codes to make it work. If someone can explain me how to implement it, I will be pleased :)

How to set up Content Security Policy headers?

Tuesday Nov 21, 2017 04:45

I think a have a simple methodology to build Content Security Policy (CSP) headers. I did it with Apache HTTP Server and Firefox, but it is a generic methodology. I assume that you know what are CSP headers.

Before we start
Add-ons

First, you might want to disable most of your extensions. Indeed, some will add scripts or rewrite part of the displayed HTML code making it hard to distinguish warnings and errors form your code than from the extensions. I kept Test Pilot , Multi-Account Containers and Firefox Lightbeam by Mozilla, Privacy Badger and HTTPS Everywhere by the Electronic Frontier Foundation, and DuckDuckGo Plus by DuckDuckGo.

A useful tool

Laboratory by April King is an extension that allows you to record your website in order to provide a ready CSP header. But it is no fun and it does not push you think about how to rewrite some parts of website in order to improve security. You can also enforce CSP, which is useful if you add another component, for example if you add your Twitter timeline, you will need to adjust your CSP headers. With this tool, it is easy to check, try, add CSP headers, that you can later add to you web server. But for the purpose of this small how to, do not use this extension. If installed, please make sure that none of the check boxes are enabled and that Generated CSP configuration: is set to default-src 'none' if not, click on Delete All Settings or it will be a nightmare.

Laboratory extension

Always keep Developer Tools opened

Another step to avoid spend plenty of time looking at irrelevant answers on Stack Exchange is to open the Developer Tools, go on the options tab and check Disable HTTP Cache (when toolbox is open), and then keep it open at all time. Otherwise, some requests can be cached and you will not understand why your brand new configuration is not taking into account.

Building your CSP headers without blocking content
Report only

We will use the Content-Security-Policy-Report-Only header. When the web browser receive the content of a web page, it will display warnings for each violated CSP directive. We will see later that it can be used in a better way.

Be verbose

I strongly advise to add all possible directives to the header because any warning or error not related to an explicitly defined directive will be reported as violating default-src. This is because directives inherit from default-src if not explicitly set. If you only have something like default-src 'none'; style-src 'self';, then an unsafe-inline for script-src will be reported as violating default-src.

So our starting point will be default-src 'none'; child-src 'none'; connect-src 'none'; font-src 'none'; frame-src 'none'; img-src 'none'; manifest-src 'none'; media-src 'none'; object-src 'none'; script-src 'none'; style-src 'none'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self'; and from this, we will allow one by one what we need.

Send report to your website

We will use a very nice feature of CSP, report-uri. It makes web browsers send a JavaScript Object Notation (JSON) file to the web server at the specified Uniform Resource Identifier (URI). This file will give you the basic instructions to build the correct CSP header. In my case, I created a folder /csp/ with the proper ownership and write rights containing one file index.php

<?php
// Start configure
$log_file dirname(__FILE__) . '/csp-violations.log';
$log_file_size_limit 1000000// bytes - once exceeded no further entries are added
// End configuration
$current_domain $_SERVER['SERVER_NAME'];
http_response_code(204); // HTTP 204 No Content
$json_data file_get_contents('php://input');
// We pretty print the JSON before adding it to the log file
if ($json_data json_decode($json_data)) {
  
$json_data json_encode($json_dataJSON_PRETTY_PRINT JSON_UNESCAPED_SLASHES);
  
// Do not write is file size exceeded
  
if (filesize($log_file) > $log_file_size_limit) {
    exit(
0);
  }
  
file_put_contents($log_file$json_dataFILE_APPEND LOCK_EX);
}
?>
It is a simplified and adapted version of the code you can find here. It will log errors and warnings in /csp/csp-violations.log.

All together

In your virtual host, add the following line

Header set Content-Security-Policy-Report-Only "report-uri /csp/; default-src 'none'; child-src 'none'; connect-src 'none'; font-src 'none'; frame-src 'none'; img-src 'none'; manifest-src 'none'; media-src 'none'; object-src 'none'; script-src 'none'; style-src 'none'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self';"
Mind the ' and the ". The set of directives must start and end with ". All arguments of each directive must be surrounded by ' if, and only if, they are keywords. The corollary is do not use ' to surround Uniform Resource Locator (URL) and URI.

Restart Apache HTTP Server

# systemctl restart apache2

Debugging

Visit your website. You should see in the Console tab of the Developer Tools.

In the /csp/ folder of you virtual host, you should see the csp-violations.log file. It contains lines like

{
     "csp-report": {
         "blocked-uri": "self",
         "document-uri": "https://clementfevrier.fr/images/r3.svg",
         "original-policy": "report-uri https://clementfevrier.fr/csp/ https://clementfevrier.fr/images/default-src https://clementfevrier.fr/images/'none'; child-src 'none'; connect-src 'none'; font-src 'none'; frame-src 'none'; img-src 'none'; manifest-src 'none'; media-src 'none'; object-src 'none'; script-src 'none'; style-src 'none'; worker-src 'none'",
         "referrer": "https://clementfevrier.fr/articles/11_rand.php",
         "script-sample": "onclick attribute on g element",
         "source-file": "https://clementfevrier.fr/images/r3.svg",
         "violated-directive": "script-src 'none'"
     }
 }
original-policy displays the CSP header. It is useful to ensure that it matches what you set in our web server. violated-directive tells you with directive you should adjust, in this example script-src and it also remind you its arguments 'none' because it is our starting point. blocked-uri tells you what it blocked, here it is self. So, you just need to replace script-src 'none' by script-src 'self' and the warning will disappear.

Repeat this for each violated directive.

Notice that without explicitly setting all directives, violated-directive will always report to default-src which makes it more difficult to debug.

For each directive that I set, I restart the web server, delete the log file, reload a page from my website in my web browser, and look at the new log.

Don't forget to check different pages of your web site to check all cases.

Your CSP log file does not report anymore violated directive? Let us add the real header.

Apply your CSP

Your header should look like

Header set Content-Security-Policy-Report-Only "report-uri /csp/; default-src 'none'; connect-src 'self'; font-src 'self'; frame-src 'none'; img-src 'self' data: https://toot.forumanalogue.fr; manifest-src 'none'; media-src 'self'; object-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self';
Just remove the -Report-Only and you are done.
Header set Content-Security-Policy "report-uri /csp/; default-src 'none'; connect-src 'self'; font-src 'self'; frame-src 'none'; img-src 'self' data: https://toot.forumanalogue.fr; manifest-src 'none'; media-src 'self'; object-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self';

Restart Apache HTTP Server

# systemctl restart apache2

Final checks

First, check that your website renders as you wish.

Then, there are couple of useful tools to perform checks on you CSP headers.

Check their recommendations and the links, they are full of useful informations.

As you can see, I still have to work to achieve a good CSP, but I know you to eliminate most of the so-called insecure inline code. I say so-called because for most of it, if not all of it, it is perfectly secure since I am not in the cases where it can be potentially insecure.

Changing you website and modifying your CSP headers

You can have both Content-Security-Policy-Report-Only and Content-Security-Policy at the time. It is useful to make a more restrictive CSP without blocking content. You can have one report-uri for the block content and one for the report only, which will simplify debugging.

Further reading

MDN Web Docs, by Mozilla, is the only website that I found with proper explanation and reference of CSP.

rand()%range, or the wrong way to generate random numbers

Saturday Nov 11, 2017 03:44
The mathematics

In C++, rand() returns a random number between 0 and RAND_MAX, included. It is a uniform distribution, meaning that the probability Pn to have n ∈ [0, RAND_MAX] is Pn = 1∕(RAND_MAX+1) = P. Now, a common practice is to generate random numbers using the modulo operator %. rand()%range returns a number x ∈ [0, range-1]. The distribution is not uniform and favors smaller numbers. Indeed, the probabilities become

Px = n=0RAND_MAX Pn δn%range, x
Px = P × (floor(RAND_MAX∕range) + 1), for x ≤ RAND_MAX%range
P × floor(RAND_MAX∕range), otherwise.
One can see that Px is not a constant and the desired behavior is usually a uniform distribution with a probability Pu = 1/range.

Unrealistic working example

Let us assume that RAND_MAX = 2. rand() can return 0, 1 and 2. the probability P to have n ∈ [0, 2] is P = 1∕(2+1) = 1∕3. If range = 2, rand()%range has 3 cases.

  • rand() returns 0, 0%2 = 0.
  • rand() returns 1, 1%2 = 1.
  • rand() returns 2, 2%2 = 0.
Each with a probability P = 1∕3. So, Px=0 = Pn=0+Pn=2 = 2P = 2∕3 and Px=1 = Pn=1 = P = 1∕3, whereas a uniform probability would have been Pu = 1∕2. The ratio between the two probabilities is Px=0∕Px=1 = 2. It means that there is two times more chance to get 0 than 1. The relative error is ∣(Px=0-Px=1)∕Px=1∣ = 1.

Proof of concept

Here a small code that demonstrates the discussion above. It simulate the function rand() for RAND_MAX = 3.

main.cpp
    1 #include <cstdio>   // NULL
    2 #include <ctime>    // time
    3 #include <cstdlib>  // srand
    4 #include <random>   // std::random_device,
    5                     // std::uniform_int_distribution
    6                     // std::default_random_engine generator
    7 #include <iostream> // std::cout, ostream (std::endl)
    8 
    9 #include "functions.h"
   10 
   11 int main()
+  -  12 +-- 72 lines: {
12 {
|  13  // Value of rand_max
|  14  constexpr unsigned long long int rand_max = 2;
|  15  // Set range to maximize the error.
|  16  constexpr unsigned long int range = rand_max;
|  17  // Natural scale is rand_max + 1
|  18  constexpr unsigned long long int scale = rand_max + 1;
|  19  // Number of itrations to accumulate statistic
|  20  // In the unit of the natural scale
|  21  constexpr unsigned long long int Niterations = (unsigned long long int)500000 * scale;
|  22  // Random seed
|  23  srand(time(NULL));
|  24  // True random generator
|  25  std::random_device rd;
|  26  // Entropy of the true random generator
|  27  const auto entropy = rd.entropy();
|  28  std::cout<<"Entropy: "<<entropy<<std::endl;
|  29  // If the entropy is 0, it means there is no true random generator available (e.g. Apple computers)
|  30  // so a pseudo random number generator will be sused instead
|  31  if(entropy > 0)
|+ |- 32 +---  3 lines: {
32  {
|| 33   std::cout<<"True random generator"<<std::endl;
|| 34  }
|  35  else
|+ |- 36 +---  3 lines: {
36  {
|| 37   std::cout<<"Pseudo random generator"<<std::endl;
|| 38  }
|  39  // Uniform distribution to simultate rand() with our custom RAND_MAX
|  40  std::uniform_int_distribution< unsigned long int > distribution1(0, rand_max);
|  41  // Uniform distribution in the range [0, range-1]
|  42  std::uniform_int_distribution< unsigned long int > distribution2(0, range - 1);
|  43  std::cout<<"RAND_MAX: "<<rand_max<<std::endl;
|  44  std::cout<<"range:    "<<range<<std::endl;
|  45  std::cout<<"The "<<(rand_max % range) + 1;
|  46  std::cout<<" first probabilities are greater than the ";
|  47  std::cout<<range - ((rand_max % range) + 1);
|  48  std::cout<<" other ones by a factor:"<<std::endl;
|  49  std::cout<<"Predicted p(0)/p(range-1):                               ";
|  50  std::cout<<(((rand_max + 1) / range) + 1) / (float)(rand_max / range);
|  51  std::cout<<std::endl;
|  52  std::cout<<"Predicted relative error (p(0) - p(range-1))/p(range-1): ";
|  53  std::cout<<(((scale / range) + 1) - (scale / range)) / (float)(scale / range);
|  54  std::cout<<std::endl;
|  55 
|  56  std::cout<<" * Making statistic *"<<std::endl;
|  57 
|  58  std::cout<<" * rand()%range *"<<std::endl;
|  59  constexpr char nameMod [] = "results/mod3.dat";
|  60  if(entropy > 0)
|+ |- 61 +---  3 lines: {
61  {
|| 62   statistic1(rand_max, range, Niterations, nameMod, rd, distribution1, scale);
|| 63  }
|  64  else
|+ |- 65 +---  4 lines: {
65  {
|| 66   std::default_random_engine generator;
|| 67   statistic1(rand_max, range, Niterations, nameMod, generator, distribution1, scale);
|| 68  }
|  69 
|  70  std::cout<<" * uniform *"<<std::endl;
|  71  constexpr char nameUniform [] = "results/uniform3.dat";
|  72  if(entropy > 0)
|+ |- 73 +---  3 lines: {
73  {
|| 74   statistic2(rand_max, range, Niterations, nameUniform, rd, distribution2, scale);
|| 75  }
|  76  else
|+ |- 77 +---  4 lines: {
77  {
|| 78   std::default_random_engine generator;
|| 79   statistic2(rand_max, range, Niterations, nameUniform, generator, distribution2, scale);
|| 80  }
|  81 
|  82  return 0;
|  83 }
functions.h
       1 #ifndef _FUNCTIONS_H_
       2 #define _FUNCTIONS_H_
       3 
       4 #include <fstream>  // std::fstream
       5 #include <iostream> // std::cout, ostream (std::endl)
       6 #include <cmath>    // abs
       7 #include <cstdlib>  // rand
       8 
       9 void read(const auto name, auto & start, auto p)
+    -     10 +-- 23 lines: {
 10 {
|     11  for(unsigned char it = 0; it < 2; ++it)
|+   |-    12 +---  3 lines: {
 12  {
||    13   p[it] = 0;
||    14  }
|     15 
|     16  std::fstream file(name);
|     17  if(file.is_open())
|+   |-    18 +---  9 lines: {
 18  {
||    19   while(file.good())
||+  ||-   20 +----  5 lines: {
 20   {
|||   21    file>>start;
|||   22    file>>p[0];
|||   23    file>>p[1];
|||   24   }
||    25   file.clear();
||    26  }
|     27  file.close();
|     28 
|     29  std::cout<<"Start at trial number "<<start + 1;
|     30  std::cout<<" with probabilities "<<p[0] / (float)(start + 1);
|     31  std::cout<<" and "<<p[1] / (float)(start + 1)<<std::endl;
|     32 }
      33 
      34 void statistic1
      35 (
      36  const auto & rand_max,
      37  const auto & range,
      38  const auto & Niterations,
      39  const auto name,
      40  const auto & scale
      41 )
+    -     42 +-- 36 lines: {
 42 {
|     43  unsigned long long int * p = new unsigned long long int [2];
|     44  unsigned long int index = 0;
|     45  unsigned long long int start = 0;
|     46 
|     47  read(name, start, p);
|     48  std::ofstream file(name, std::ofstream::app);
|     49  for(unsigned long long int it = start + 1; it < Niterations; ++it)
|+   |-    50 +--- 21 lines: {
 50  {
||    51   index = rand() % range;
||    52   if(index == 0)
||+  ||-   53 +----  3 lines: {
 53   {
|||   54    ++p[0];
|||   55   }
||    56   else
||+  ||-   57 +----  6 lines: {
 57   {
|||   58    if(index == (range - 1))
|||+ |||-  59 +-----  3 lines: {
 59    {
||||  60     ++p[1];
||||  61    }
|||   62   }
||    63   if(it % scale == rand_max)
||+  ||-   64 +----  6 lines: {
 64   {
|||   65    if(p[0] != 0 && p[1] != 0)
|||+ |||-  66 +-----  3 lines: {
 66    {
||||  67     file<<it<<" "<<p[0]<<" "<<p[1]<<std::endl;
||||  68    }
|||   69   }
||    70  }
|     71  file.close();
|     72 
|     73  std::cout<<"Computed p(0)/p(range-1):                                ";
|     74  std::cout<<p[0]/(long double)p[1]<<std::endl;
|     75  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
|     76  std::cout<<std::abs((float)p[0] - p[1])/(long double)p[1]<<std::endl;
|     77 }
      78 
      79 void statistic2
      80 (
      81  const auto & rand_max,
      82  const auto & range,
      83  const auto & Niterations,
      84  const auto name,
      85  auto & generator,
      86  auto & distribution,
      87  const auto & scale
      88 )
+    -     89 +-- 36 lines: {
 89 {
|     90  unsigned long long int * p = new unsigned long long int [2];
|     91  unsigned long int index = 0;
|     92  unsigned long long int start = 0;
|     93 
|     94  read(name, start, p);
|     95  std::ofstream file(name, std::ofstream::app);
|     96  for(unsigned long long int it = start + 1; it < Niterations; ++it)
|+   |-    97 +--- 21 lines: {
 97  {
||    98   index = distribution(generator);
||    99   if(index == 0)
||+  ||-  100 +----  3 lines: {
100   {
|||  101    ++p[0];
|||  102   }
||   103   else
||+  ||-  104 +----  6 lines: {
104   {
|||  105    if(index == (range - 1))
|||+ |||- 106 +-----  3 lines: {
106    {
|||| 107     ++p[1];
|||| 108    }
|||  109   }
||   110   if(it % scale == rand_max)
||+  ||-  111 +----  6 lines: {
111   {
|||  112    if(p[0] != 0 && p[1] != 0)
|||+ |||- 113 +-----  3 lines: {
113    {
|||| 114     file<<it<<" "<<p[0]<<" "<<p[1]<<std::endl;
|||| 115    }
|||  116   }
||   117  }
|    118  file.close();
|    119 
|    120  std::cout<<"Computed p(0)/p(range-1):                                ";
|    121  std::cout<<p[0]/(long double)p[1]<<std::endl;
|    122  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
|    123  std::cout<<std::abs((float)p[0] - p[1])/(long double)p[1]<<std::endl;
|    124 }
     125 
     126 void statistic1
     127 (
     128  const auto & rand_max,
     129  const auto & range,
     130  const auto & Niterations,
     131  const auto name,
     132  auto & generator,
     133  auto & distribution,
     134  const auto & scale
     135 )
+    -    136 +-- 36 lines: {
136 {
|    137  unsigned long long int * p = new unsigned long long int [2];
|    138  unsigned long int index = 0;
|    139  unsigned long long int start = 0;
|    140 
|    141  read(name, start, p);
|    142  std::ofstream file(name, std::ofstream::app);
|    143  for(unsigned long long int it = start + 1; it < Niterations; ++it)
|+   |-   144 +--- 21 lines: {
144  {
||   145   index = distribution(generator) % range;
||   146   if(index == 0)
||+  ||-  147 +----  3 lines: {
147   {
|||  148    ++p[0];
|||  149   }
||   150   else
||+  ||-  151 +----  6 lines: {
151   {
|||  152    if(index == (range - 1))
|||+ |||- 153 +-----  3 lines: {
153    {
|||| 154     ++p[1];
|||| 155    }
|||  156   }
||   157   if(it % scale == rand_max)
||+  ||-  158 +----  6 lines: {
158   {
|||  159    if(p[0] != 0 && p[1] != 0)
|||+ |||- 160 +-----  3 lines: {
160    {
|||| 161     file<<it<<" "<<p[0]<<" "<<p[1]<<std::endl;
|||| 162    }
|||  163   }
||   164  }
|    165  file.close();
|    166 
|    167  std::cout<<"Computed p(0)/p(range-1):                                ";
|    168  std::cout<<p[0]/(long double)p[1]<<std::endl;
|    169  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
|    170  std::cout<<std::abs((float)p[0] - p[1])/(long double)p[1]<<std::endl;
|    171 }
     172 
     173 #endif
You can download the files by clicking on their name. To compile,
% g++ -std=gnu++14 main.cpp
Of course, you can add more options, like -Ofast and other that will speed up the code. Create a folder results before executing the code
% mkdir results
and finally, execute it
% ./a.out
It should last less than a second. It generate two files in the folder results. Each contains 3 columns, the number of iterations, Px=0, and Px=1. mod.dat contains the results when using rand()%RAND_MAX and uniform.dat the results for a uniform distribution.

Let us now look at the results. I plotted the expected probabilities for a uniform distribution Pu, P and 2P has explained in the previous section. I also plotted the probabilities Pm using the modulo operator and Pu using a uniform distribution. I plotted them in the natural scale in this context, (RAND_MAX+1).

After only few tens of iterations per (RAND_MAX+1), the probabilities are already stables and converged to the probabilities computed above. To ensure the stability, let us draw for a large number of iterations.

After 1500000 iterations, there is no doubt about the stability. It clearly shows that using modulo operator favors smaller number up to 2 times the largest.

Of course, each set of data will look different, but they will converge to the same results. I ran this code many times with the same outcome.

Here the gnuplot files to reproduce the plots

results3_200.gnu
set term svg enhanced mouse
set style fill transparent
set key at 190,0.66
set title 'Convergence after 600 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities'
set output "r3_200.svg"
set xrange[0:200]
set yrange[0.3:0.7]
plot 'results/mod3.dat' u ($1/3.):($2/$1) t 'P_m(x=0)' w lp, \
     'results/mod3.dat' u ($1/3.):($3/$1) t 'P_m(x=1)' w lp, \
     'results/uniform3.dat' u ($1/3.):($3/$1) t 'P_u(x=0)' w lp, \
     'results/uniform3.dat' u ($2/3.):($3/$1) t 'P_u(x=1)' w lp, \
     1./3. lw 4 t 'P', \
     2./3. lw 4 t '2P', \
     1./2. lw 4 t 'P_u'
results3.gnu
set term svg enhanced mouse
set style fill transparent
set key at 499000,0.66
set title 'Convergence after 1500000 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities'
set output "svg/r3.svg"
set xrange[0:500000]
set yrange[0.3:0.7]
plot 'results/mod3.dat' u ($1/3.):($2/$1) t 'P_m(x=0)' w l, \
     'results/mod3.dat' u ($1/3.):($3/$1) t 'P_m(x=1)' w l, \
     'results/uniform3.dat' u ($1/3.):($3/$1) t 'P_u(x=0)' w l, \
     'results/uniform3.dat' u ($2/3.):($3/$1) t 'P_u(x=1)' w l, \
     1./3. lw 1 t 'P', \
     2./3. lw 1 t '2P', \
     1./2. lw 1 t 'P_u'

General proof of concept

I entitled unrealistic example above because RAND_MAX is most likely defined as (232∕2)-1, i.e. the greatest int. The worst case happens when range = RAND_MAX. Indeed, the probability to have 0, Pm(x=0), is twice the any other probability Pm(x≠0). Using the same notation as above, Pm(x=0) = 2P and Pm(x≠0) = P.
Notice that P = Pu.

I chose to compare Pm(x=0) with Pm(x=RAND_MAX-1). In the code above, one can just change the variable rand_max by RAND_MAX on line 14, and call the proper function statistic1 that uses the actual rand() and that is enough. However, it will take too long to terminate (few years). So one should replace Niterations by a smaller number, for example (unsigned long long int)50 * scale = 107374182350. It should take couple of hours to run and should be enough to see the trend.

Let me show you the new main

main.cpp
    1 #include <cstdio>   // NULL
    2 #include <ctime>    // time
    3 #include <cstdlib>  // srand, RAND_MAX
    4 #include <random>   // std::random_device,
    5                     // std::uniform_int_distribution
    6                     // std::default_random_engine generator
    7 #include <iostream> // std::cout, ostream (std::endl)
    8 
    9 #include "functions.h"
   10 
   11 int main()
+  -  12 +-- 62 lines: {
12 {
|  13  // Value of rand_max
|  14  constexpr unsigned long long int rand_max = RAND_MAX;
|  15  // Set range to maximize the error.
|  16  constexpr unsigned long int range = rand_max;
|  17  // Natural scale is rand_max + 1
|  18  constexpr unsigned long long int scale = rand_max + 1;
|  19  // Number of itrations to accumulate statistic
|  20  // In the unit of the natural scale
|  21  constexpr unsigned long long int Niterations = (unsigned long long int)50 * scale;
|  22  // Random seed
|  23  srand(time(NULL));
|  24  // True random generator
|  25  std::random_device rd;
|  26  // Entropy of the true random generator
|  27  const auto entropy = rd.entropy();
|  28  std::cout<<"Entropy: "<<entropy<<std::endl;
|  29  // If the entropy is 0, it means there is no true random generator available (e.g. Apple computers)
|  30  // so a pseudo random number generator will be sused instead
|  31  if(entropy > 0)
|+ |- 32 +---  3 lines: {
32  {
|| 33   std::cout<<"True random generator"<<std::endl;
|| 34  }
|  35  else
|+ |- 36 +---  3 lines: {
36  {
|| 37   std::cout<<"Pseudo random generator"<<std::endl;
|| 38  }
|  39  // Uniform distribution in the range [0, range-1]
|  40  std::uniform_int_distribution< unsigned long int > distribution2(0, range - 1);
|  41  std::cout<<"RAND_MAX: "<<rand_max<<std::endl;
|  42  std::cout<<"range:    "<<range<<std::endl;
|  43  std::cout<<"The "<<(rand_max % range) + 1;
|  44  std::cout<<" first probabilities are greater than the ";
|  45  std::cout<<range - ((rand_max % range) + 1);
|  46  std::cout<<" other ones by a factor:"<<std::endl;
|  47  std::cout<<"Predicted p(0)/p(range-1):                               ";
|  48  std::cout<<(((rand_max + 1) / range) + 1) / (float)(rand_max / range);
|  49  std::cout<<std::endl;
|  50  std::cout<<"Predicted relative error (p(0) - p(range-1))/p(range-1): ";
|  51  std::cout<<(((scale / range) + 1) - (scale / range)) / (float)(scale / range);
|  52  std::cout<<std::endl;
|  53 
|  54  std::cout<<" * Making statistic *"<<std::endl;
|  55 
|  56  std::cout<<" * rand()%range *"<<std::endl;
|  57  constexpr char nameMod [] = "results/mod.dat";
|  58  statistic1(rand_max, range, Niterations, nameMod, scale);
|  59 
|  60  std::cout<<" * uniform *"<<std::endl;
|  61  constexpr char nameUniform [] = "results/uniform.dat";
|  62  if(entropy > 0)
|+ |- 63 +---  3 lines: {
63  {
|| 64   statistic2(rand_max, range, Niterations, nameUniform, rd, distribution2, scale);
|| 65  }
|  66  else
|+ |- 67 +---  4 lines: {
67  {
|| 68   std::default_random_engine generator;
|| 69   statistic2(rand_max, range, Niterations, nameUniform, generator, distribution2, scale);
|| 70  }
|  71 
|  72  return 0;
|  73 }

At the time of writing, I launched the code a bit more than two hour ago, so only the part for rand()%range is done. The second part seems to have difficulties to accumulate statistic. After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one.

Anyway, let us look at what we obtained so far.

Well, it looks good. The probabilities are converging to the expected one.

It shows that even in the case of a real life RAND_MAX, in this case, there are two times more chance to obtain 0 than any other number.

Here the gnuplot file to reproduce the plot

results.gnu
set term svg enhanced mouse
set style fill transparent
#set key at 499000,0.66
set title 'Convergence after 107374182350 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities (10^-^9)'
set output "r.svg"
set xrange[0:50]
#set yrange[0.3:0.7]
plot 'results/mod.dat' u ($1/2147483648.):($2/$1*1000000000.) t 'P_m(x=0)' w lp lc 1, \
     'results/mod.dat' u ($1/2147483648.):($3/$1*1000000000.) t 'P_m(x=RAND\_MAX-1)' w lp lc 2, \
     'results/uniform.dat' u ($1/2147483648.):($2/$1*1000000000.) t 'P_u(x=0)' w lp lc 3, \
     'results/uniform.dat' u ($1/2147483648.):($3/$1*1000000000.) t 'P_u(x=1)' w lp lc 4, \
     1./2147483648.*1000000000. lw 1 t 'P, P_u', \
     2./2147483648.*1000000000. lw 1 t '2P'

The worst and the best, a short story on prime numbers

Let us see what are the best and the worst cases.

The best

Let us assume that RAND_MAX is an odd number, which most likely the case because probably defined as (28×2n)∕2-1. It implies that RAND_MAX%2 = 1. So, Pm(x=0) = Pm(x=1). It means that for an odd RAND_MAX, it is always safe to generate random numbers x ∈[0, 1]. It is not the only case. As a general rule, it is always safe to generate random numbers when (RAND_MAX+1)%range = 0.

The worst

The worst is if RAND_MAX+1 is a prime number. Indeed, the condition (RAND_MAX+1)%range = 0 can be never be satisfied to generate random numbers using the modulo operator.
In practice, it is unlikely to happen because of the definition above implies that RAND_MAX+1 will be even.

But as a general rule, I use two criteria, the rarity and the ratio between the two different possible probabilities. With these two criteria, the worst is considered when a single number has a bigger probability than the other ones. In fact, it will be always 0 and it happens each time the following condition is fulfilled RAND_MAX%range = 0. The ratio increases as range increases and thus is maximum for range = RAND_MAX.

A concluding word

I hope that you will be careful next time that you will generate random numbers. C++, and especially C++11, provides great useful tools to tackles this problems. Using former standards or C, you will just need to write a simple function that will make your distribution uniform again. I advice you to look at the default one, It is pretty simple, easy to understand, and quick to code.

I hope that you enjoyed it. There is comment section yet, I am writing my own, but feel free to contact me for review, comment, thoughts, using social media like Mastodon, emails, XMPP, etc.
Share if you liked it :)

SHA1BruteForce

Wednesday Nov 08, 2017 00:05

I'm proud to announce my first (free) software, SHA1BruteForce, that performs brute-force attack to crack SHA-1 hash.
Page of the project

Why?

After my 10yo Firefox session crashed, I lost a password stored in it. But, I managed to find the hash and it turns out to be a SHA-1 hash (software installed in 2009 on my server). I could change it, I guess, but I knew that SHA-1 is now considered as a weak encryption (although the first real collision is from February), so I challenged myself to recover it by writing a piece of code that do the job. It took me a bit more than a day to achieve a working code.
After, I began to optimized it and to have fundamental questioning about C++.

It is pretty simple and it seems to perform well. It takes about 4h to crack any 6 characters password on my computer. So I decided to publish it on my server, which was not as simple as I expected.
But I also have account in the main git platforms.
So I also published it on GitHub, GitLab and FramaGit, more know by the French.

It is licensed under GPL3.

It performs the tasks on the CPU only. GPU implementation does not seems possible at the time using only free software. Indeed, CUDA required the proprietary drivers and OpenCL does not seems to work properly with Nouveau (last version of the Linux kernel, i.e.4.13, on Ubuntu 16.04). But I want to use only free software (and I cannot install Nvidia drivers anyway, they do not work on my system).

It is not a revolutionary tools that intends to bit existing ones. I did it for myself and share it for anyone interested.

It is my first published code, so there are most likely some improvements to do on how to write the manual, how to write the code so it can be used by others, how I should comment it, and so on. The same goes for the code itself. Feel free to comment, share, submit commits, report bugs, etc.

View server certificate in Thunderbird, a 11yo request.

Friday Oct 20, 2017 18:43, last edition on Friday Oct 20, 2017 19:23

While I was configuring the certificate for my mail server, I wanted to check if everything works fine in Thunderbird. It turns out that one cannot check the server certificates in Thunderbird. With a simple web search, I found a post from August 2006, i.e. more than 11 years ago, with the same question. The answer was that it was not possible and no extension doing it exists. I asked to see if things changed since them and the answer is no. This process is very simple in Firefox.

Check certificate #1
Check certificate #2
Check certificate #3
Check certificate #4

In my opinion, this should be a critical feature to have.


Dr Clément Février

Bonjour, Je suis Clément Février, docteur en physique théorique de l’université de Grenoble Alpes, ingénieur Recherche et Développement dans le domaine de l’imagerie médicale et de la chirurgie mini-invasive chez Surgivisio et soutien du mouvement La France Insoumise.


Dites les velotaffeur, vous avez entendu parler d'un casque qui a un feu avant, un feu arrière et des clignotants sur les cotés / avant arrière ? (Enfin un simple bandeau led de chaque coté se gère hein)
Genre un truc pour que les voitures voient qu'on tourne à droite ou gauche sans qu'on ai besoin de tendre le bras ?

Parce que bon, j'ai testé ce soir, passer sur les marquages au sol + les rails du tramway avec un bras levé sous la pluie, niveau stabilité on repassera ^^'

J'arrive pas à trouver :/ et les solutions existantes de clignotants sont seulement pour l'arrière

Si le coeur vous en dit vous pouvez partager ;)

#velotaff #boostappréciés :)

The only thing that really matter is, at the end of the day, to choose the correct method automatically without conditional statement at runtime. I really wonder is a workaround is possible.
I already tried to add "std::tuple< d1*, d2 > b_tuple" to class b, using some forward declaration, but, of course, std::get< i >(b_tuple) cannot compile since it will be known at runtime only.

But I allow complete refactoring code, even the b and d1 & d2. Templates, new classes and sub classes and even are fine (I already put 100-lines define to add iterator to enum and another one to allow enum inheritance, and my coworkers begin to hate me ^^)
For example, method "create" can become a class, or class b can be construct using a macro to make somehow the base class aware of derived ones
BASE(b, ...) std::tuple< __VA_ARGS__ > g_ ## b; class b

How to choose a function at run time using overloaded resolution in ?
I feel it's an old known issue without (obvious) answer, but many things changed since c++98
stackoverflow.com/questions/64

I add the new following constrain to my original question, it can be up to C++14, no more (although if a solution in c++17 or C++20 exist, I'm interested) because of compiler limitation (appears to be based on gcc 5.14 from what I understood).