Dr Clèm's Blog

We are going to kill the radio, so what?

Wednesday Jan 31, 2018 17:35

While going through my RSS feeds, I came across an op-ed published on Libération's website on January 22, 2018 at 12:22, titled "Ils vont tuer la radio" ("They are going to kill the radio"), by Fanch Langoët and Francine Leduc.

The first question I asked myself was "so what?" From the title, I imagined two scenarios. The first is that something has just changed, some disruptive element that will unjustly destroy this medium. The second, more likely given the generic title, is that a paradigm shift is under way, notably through the change of business models from an economy of scarcity to an economy of abundance. Let me roughly summarize these two notions. In the first case, the production cost increases with the quantity; this is for example the case of books, whose main production cost is the paper, meaning that for a publisher, producing two books costs twice as much as producing one. In the second case, the production cost is independent of the quantity, which is typically the case in the digital world. Back to our book publisher: making a digital book available to one reader, for example a PDF file on a web server, costs exactly the same as making that same PDF file available to two readers.

"They" refers to the villains who seek to assassinate a medium made obsolete by the advent of new technologies, that is, all the people who publish content online. This op-ed is written in bad faith, and the arguments it puts forward are easily refuted.

"In the name of technological mutation, progress and globalization, the preachers of the all-digital proclaim the obsolescence of the radio medium." One immediately senses the bitterness, subjectivity and partiality of the piece, notably in the use of "preachers", a negatively connoted word that suggests the transition to digital is a dogma, just like the later description of that transition as a "religion".

The authors of this op-ed talk about the "massive destruction" of the radio medium and its "ravages", a very particular register of language which suggests that the authors are speaking from emotion. Thus, by the end of the first paragraph, one can already see that the authors have picked the wrong target for their anger, to paraphrase an author who felt a barely hidden resentment towards his cancer. Here is the key passage: "a relentless plan of massive destruction which [...] adorns itself with the supposed virtues of the all-digital, to uberize the radio sector". I will not dwell on the "supposed virtues of the all-digital", since there would be much to say, but "uberize the radio sector" is not correct. The authors blame the vector instead of the cause. We do not accuse the postman of being the reason for our misfortune when he brings us bad news. First, let us note that radio is a medium subject to the economy of scarcity. Indeed, the number of available radio stations is limited by the number of available frequency bands, also called carriers in signal theory, and by the fact that the radio wave transmits a signal at a given instant, so that, to put it simply again, we cannot replay a program or a song from our radio set. On the contrary, on the Internet, there is no limit on the number of broadcasters, I am one myself, nor on the availability of a published medium at any time. If a content provider puts an audio file online, for example a song or a podcast, then there is no technical reason preventing that content from being available at any time, hence replayable endlessly. Note that this is generic: the fact that there is a change of economic paradigm is independent of the medium affected.
There are various emerging business models to make broadcasters economically viable in this new era, for example donations, advertising, reselling users' personal data, etc., but there is no single one like the "supply and demand" of the era of scarcity. The fact that this op-ed appears today makes me think that the radio sector did not anticipate this change enough and thus finds itself in a financial dead end, most certainly with jobs at risk. "Uberization" is not due to the digital aspect but, in the context of an economic paradigm of abundance, to the business models applied. It is in particular a consequence of the neoliberal model which, by removing employees' rights, makes them precarious. It is also a consequence of the choices of radio companies which, like the main print newspapers, chose advertisers as investors.

"If it has to align itself with the project of a fully delinearized radio (where every program will be accessible at any time), this big bang will be, unsurprisingly, a steamroller that will turn radio and TV into producers of content designed according to the rules and spirit of the free competition of social networks." This is a misreading of the consequence of social networks on the media. In any case, if we accept that social networks have a significant influence on the economics of media broadcasters, which seems a reasonable hypothesis, then social networks will have an impact. If one radio station is shared more on social networks than another, its audience will certainly grow. Radio stations were already content producers before the Internet; that is why each station has a target audience that it tries to satisfy in order to keep and grow its listeners. It is certain that a radio station that allows rebroadcasting of its content will be able to offer another business model, and will thus be more financially sustainable. To avoid "uberization", it will be up to the radio station to choose a good business model or to create one, for I am convinced that better solutions can be found than the ones applied today.

"this technological mutation does not come without an editorial renewal and seems calibrated to smooth out voices and ideas." This is false; the technology is not the cause. The dominant models of this transition are, such as "clickbait", which changes the way of titling, summarizing and previewing content so as to attract users to pages not for the content, but for the advertisers. So it is indeed neoliberalism that is the cause, and in particular the choice of advertisers as investors.

For the third paragraph, I feel obliged to be sarcastic, since the authors see ultimate debauchery in the mix of sound and image, commonly called, in modern times, video. I cannot imagine the authors' indignation at the appearance of the first talking films in the 1930s. Ah, the nostalgic era when you had to know how to read to watch a film; in what antidemocratic times do we live, where the media are more accessible than ever? In short, I find the argument of this paragraph simply stupid. With the increase of Internet connection speeds, we have seen the world's interest in video. I found France Inter's idea of filming its shows and then putting them online interesting. Logically, I tell myself "who can do more can do less": if a radio show comes with video, one can just as well listen to the show without watching the video, which, personally, I do most of the time. "Marketing criteria replace all others and slowly but surely threaten to sweep away the radio heritage." Once again, the authors confuse the Internet with neoliberalism.

I find the use of "garde-fou" (safeguard) interesting: who are they? Is it "them", the radio hosts? It is on the one hand very pretentious and on the other hand dangerous to cast oneself as the guardian of society against debauchery.

In the fourth paragraph, the authors drift into fantasy. Unless they know of a secret plan to deploy radio broadcast exclusively over 3G instead of over the airwaves, for, I quote, the authors tell us about "the disappearance of terrestrial radio in favor of a fully 3G broadcast", they are engaging in pure invention. This shows a great technical ignorance of what networks are. They do not even seem aware that radio is already broadcast on the Internet today. Ironically, their point was almost more coherent in the original version of the article, where they talked about "digital terrestrial radio". But this simplifies my task, since it makes the rest of the paragraph completely wrong, whereas otherwise there would have been things to discuss. For example, the authors imagine that the French will subscribe to a mobile network connection to listen to the radio and that "therefore" 3G antennas will have to be installed across the whole territory and everyone equipped with new devices able to receive this famous "digital radio". As if there were mobile 3G antennas, "digital radio" 3G antennas, and specific devices to receive it. The authors seem to conflate broadcasting content over an existing network, which is what Internet radio does, with creating a new system, as was the case with digital terrestrial television, where there was indeed a need to build a network and devices able to receive the signal. Put together, between mobile networks and Internet connections, access to digital radio is most certainly broader than the number of households with a radio set at home (I could not find that figure, but I did not look very hard either).
There is an obvious confusion between the Internet and the mobile network, what they do, and how this is fundamentally different from radio or television.

The last paragraph shows the same confusion between digital technology and the economic models applied to it: the authors do not see that other paths are possible, and they blame digital technology for the new formats instead of neoliberalism.

"The fake progressivism of the preachers of the all-digital is, by the way, not very far from a strange 'radiophobia'." And this op-ed is, by the way, not very far from a strange technophobia.

I am aware that my piece is full of generalities and approximations, but having already written a wall of text, I did not want to drown my point in too many details.

Captain's Log #3

Wednesday Jan 31, 2018 09:53
EnableSendfile Directive

I enabled the EnableSendfile Directive in Apache HTTP Server by adding EnableSendfile On to the Virtual Host.
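For reference, a sketch of the kind of virtual host block this goes into; the server name and paths below are placeholders, not my actual configuration:

```apache
<VirtualHost *:443>
    ServerName example.org
    DocumentRoot /var/www/example

    # Let the kernel send static files directly via sendfile(2),
    # skipping the read/write round-trip through Apache.
    EnableSendfile On
</VirtualHost>
```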

To Do List
TootLine

I am working on TootLine, the PHP code that allows you to share your TootLine on your blog, like the one on the right or at the bottom, depending on the size of your screen. I have a couple of issues to address before publishing it: proper word wrapping, creating a cache for the media in order to solve a CSP issue, and handling the NSFW content, which is displayed unhidden so far.

Translations

I would like to make a French version of this blog, with most of the articles translated.

In addition to what remains on the To Do List, I want to add a couple of improvements.

More restrictive CSP headers

I want to rewrite some parts of the web site to be able to provide more secure CSP headers.

Comments

I am planning on adding an RSS feed for each comment section so it will be easy to follow. I will also add a cookie to auto-fill the Name and Website fields, with a check box so you can choose whether to set the cookie when you comment.

Tags

I will add the list of all tags on the right panel.

SEO

I will add proper Open Graph protocol and Twitter Cards markup in the headers. I already updated the MySQL database, so everything is ready on that side; I just need to rewrite the headers that I include to make them dynamic.

Better looking links

I already changed the URLs of some links to make them better looking, but I have not finished yet. The rewriting rules are not as simple as I expected if you want to make them SEO compliant.

Better CMS

My work-flow is not the most efficient. Each article points to an actual file, which is not ideal, because I need to create a file each time I add an entry to this blog. I will improve this soon.

Migration to LXD

I am not entirely sure that I will do it, but I am considering migrating from Xen to LXD.

How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 2

Friday Nov 24, 2017 01:12
Mind your type #2

In the previous part, I talked about how one shall be careful with the size of integer types. There are more considerations to take into account, in particular integral promotion. I was surprised to see that the return type of an arithmetic operation is not necessarily the type of its two operands. For example, in the following code

unsigned char a;
unsigned char b;
auto c = a + b;
c is not an unsigned char. This is surprising because if both operands have the same type, then no further conversion should be needed. Andrey Chubukov, whom I had as a teacher during a workshop, used to tell us that all great stories begin with "However". Notice that the converse does not hold. However,

"Arithmetic operators do not accept types smaller than int as arguments, and integral promotions are automatically applied after lvalue-to-rvalue conversion, if applicable."

cppreference.com

And indeed, one can check that c is an int.

A concluding word

Implicit conversions make the use of auto more complicated because for types smaller than int the prototype is not

T T::operator+(const T2 & b) const;
but
int T::operator+(const int & b) const;
In our case, this can be problematic because memory consumption is the limiting factor.

Sparse symmetric matrix

Let us consider the antiferromagnetic Heisenberg model on one triangle at zero-field. The matrix representation of the Hamiltonian is

Antiferromagnetic Heisenberg model on one triangle

Sparse matrix

Apart from the diagonal terms, only a few off-diagonal terms are non-zero

Antiferromagnetic Heisenberg model on one triangle
It is a sparse matrix. We can use this property to save memory by storing only the non-zero elements.

Let us consider that we need maximum machine precision; then all elements are long double. The size of the matrix is then 2^(2Ns)×sizeof(long double), with Ns the number of sites. There are 3 sites on a triangle, and long double is 16 bytes, so this matrix weighs 1kB (1024 bytes).

With a sparse matrix, one only needs to store the non-zero elements and their positions. There are 8 diagonal terms, plus 6 in the upper triangle and 6 in the lower triangle, so 20 non-zero elements. We know that we will eventually deal with numbers up to 2^72 for the index in the matrix, so the index will be stored in a 16-byte integer type, called index in the following, that we will define later.
The size in memory is then 20×(sizeof(long double)+sizeof(index)) = 640 bytes. So we saved memory by a factor of 1.6.

Notice that in this case, the indices can be stored in unsigned char and the values in signed char, which are both 1 byte, so this matrix can be stored in 40 bytes.

One can also notice that many elements share the same value; it can be useful to save more memory by storing each distinct value only once. In addition, many values do not need to be stored in long double, but in smaller floating-point types, to save even more memory. To achieve this, we can rewrite the Hamiltonian to favor signed char and then apply a function to restore the actual physical values.

Sparse matrices take advantage of the size. Indeed, the ratio between the memory consumption of the dense matrix and that of the sparse matrix increases with the size. In other words, a sparse matrix will save you relatively more space as the size increases, e.g. the ratio is about 2.3 for Ns = 4.

Hermitian matrix

The Hamiltonian is a Hermitian operator. Thus, one only needs the diagonal and either the upper or the lower triangle of the matrix to have complete knowledge of the matrix.

Antiferromagnetic Heisenberg model on one triangle
Now, one computes only these elements
Antiferromagnetic Heisenberg model on one triangle
and stores the non-zero ones, leading to a memory consumption of (8+6)×(sizeof(long double)+sizeof(index)) = 448 bytes, or 28 bytes with all optimizations, so about 2.3 times less memory.

Symmetries

The next step is to use symmetries to block-diagonalize the Hamiltonian. This is introduced in Section 4.4.1 Symmetries of my PhD thesis. The short version is that one can diagonalize each block independently, and thus save more memory, because one works only in a subspace of the Hilbert space.

Antiferromagnetic Heisenberg model on one triangle

Algorithm

There are many algorithms to solve eigenproblems, but most of them need to know the matrix. So we want to focus on algorithms where the matrix can be implicit.

Power iteration

The power iteration is one of them. It is a very simple algorithm to implement and probably the one using the least memory, because one only needs to store one vector. Assuming the elements are double, and applying the up/down symmetry, the vector takes 256GB of memory. Applying any other symmetry, e.g. translations or the C2 rotation, will make it fit. The problem is that it converges to the greatest eigenvalue in absolute value. We are looking for the ground state, which is the smallest eigenvalue, but there is always a degenerate state of higher absolute eigenvalue.

Locally Optimal Block Preconditioned Conjugate Gradient

Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a pretty good candidate. One can directly compute the smallest eigenvalue. The first step is to minimize the Rayleigh quotient, which can be done while storing only a couple of floating-point variables. Then, one applies a simple iterative method to find the eigenvector. I have not tried this method yet, but it looks quite promising. My main concern, and I guess that is why it is not used for very large matrices, is that one needs to minimize the eigenvalue over 2^Ns variables, in my case 68,719,476,736 independent variables. So it will definitely take a very long time to converge, but I will give it a try eventually.

Lanczos algorithm

The Lanczos algorithm is widely used in condensed matter physics to find the ground state of a system. It is a very fast algorithm, but one may need to store more information. I say may because, now that I am having a fresh look at the algorithm, it may be possible to store only one vector. I have to think about it.

In any case, a common way to write the routine, because it is very easy, is

  1. Build the reduced sparse matrix (sparse matrix + Hermitian + symmetries)
  2. Send the matrix to the routine
  3. In the routine, at each iteration, apply the Lanczos vector to the matrix
This is what I did during my PhD thesis. Now, I want to use the fact that one can apply the Hamiltonian with just a function v ↦ H v, where H is the Hamiltonian. In this case, one does not need to store the Hamiltonian anymore. I have not finished rewriting my code yet, so I will come back to this algorithm later.

Random numbers

The Lanczos algorithm begins by initializing a random vector.

Broken random numbers generators on Apple products
Two weeks ago, I wrote about bad practices in random number generation, where the actual good practices are buried in the source code and barely documented. I wrote

"After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from here. Latter, I will try on my server which have one."

rand()%range, or the wrong way to generate random numbers

It turned out that it always gave 0. The reason behind this is beyond the scope of this post, but after some web research I discovered that it is a well-known issue that Apple products have a zero probability for some numbers, e.g. the range [0, 7], in their implementation of what is supposed to be a uniform distribution. So, never use an Apple product to perform scientific computation involving rand()!

On a homemade computer, which has a true random generator based on stochastic processes, I obtained good results.


As you can see, I did not wait for the true random generator run to finish: even in an environment with a fair level of entropy, these roughly 161,061,273,600 iterations took a couple of days, because the software spends all its time waiting for the hardware to accumulate enough entropy. In any case, the trend is already clear.
It proves that rand()%range is not uniform, shows that the random generators of the C++11 standard are indeed uniform, and demonstrates that MacBook Pros have a broken random number generator.

Lanczos vector

Because the range of random numbers does not span the whole range of numbers that it is supposed to, the initial vector is de facto constrained to a subset of the Hilbert space. In some cases, maybe most of them, this effect will be canceled by the iterative method, but it can also be too restrictive and stall the iterative process. In particular, it will be problematic on very small systems.

How to diagonalize a matrix of 4,722,366,482,869,645,213,696 elements? Part 1

Wednesday Nov 22, 2017 23:22, last edition on Thursday Nov 23, 2017 00:20
Introduction

I am trying to diagonalize a very large matrix (4,722,366,482,869,645,213,696 elements) in C++, and it leads to some issues. I noticed that I solve most of the issues that I encounter by just pretending to explain the problem to someone, so I decided that from now on, I will share these notes. I will not explain everything, just the technical difficulties. I will give references for the rest, especially my PhD thesis.

Mind your type
Today, I will present one of the first technical issues that I faced. It is linked to variable types.

A simple example is the result of 2^N. In my code, I need to compute 2^36. The most straightforward way to achieve this is by using bitwise operations. In fact, 2^N can be written 1 << N, for N∈ℕ. Indeed, the operator << moves all the bits to the left by the number on the right-hand side. A few examples:
1 << 0. The 1 is not moved, so 1 << 0 = 1 (base 2) = 1 (base 10) = 2^0.
1 << 1 moves the 1 one position to the left, so 1 << 1 = 10 (base 2) = 2 (base 10) = 2^1.
1 << 2 moves the 1 two positions to the left, so 1 << 2 = 100 (base 2) = 4 (base 10) = 2^2.
Here is the issue. In C++, 1 is an int, which means that it is at least 16 bits, but the actual number of bits is not specified and is implementation dependent. Let us assume that sizeof(int) is equal to 4. This is the number of bytes, so it means that int is 32 bits (sizeof(int)×8).
××××××××××××××××××××××××××××××××
Only the first 31 bits (sizeof(int)×8-1) represent a positive number, which means that one can only move a bit up to 30 times (sizeof(int)×8-1-1) to the left and still get a positive number.
1 << 30 = 01000000000000000000000000000000 (base 2) = 1073741824 (base 10) = 2^30
1 << 31 = 10000000000000000000000000000000 (base 2) = -2147483648 (base 10) ≠ 2^31
1 << 32 = 00000000000000000000000000000000 (base 2) = 0 (base 10) ≠ 2^32

If you try to compile code containing the last case, you will get a warning

main.cpp: In function ‘int main()’:
main.cpp:10:17: warning: left shift count >= width of type [-Wshift-count-overflow]
  auto val = 1<<32;
                 ^

As you can see, auto does not become long long int but int, i.e. the return type of 1 << 32.

What can we do?

The case 1 << 31 is pretty easy. We just need to explicitly declare the result of 1 << 31 to be unsigned int. In this case, the last bit is not used for the sign, so (unsigned int)1 << 31 = 2147483648 = 2^31.

What about 1 << 32? Can I write (unsigned long long int)(1 << 32)? According to the C++ standard, unsigned long long int is at least 64 bits, so one could think that (unsigned long long int)(1 << 32) will do the job. Let us see what happens. 1 << 32 is computed first, as an int, so it returns 0, which is then cast into an unsigned long long int. The result is then 0, so, no, it is not a workaround. Note the parentheses around the shift: a cast binds tighter than <<, so without them the cast would apply to the 1 alone.
Notice that in this case, the compiler did not give me a warning, which is a bug, I think.

The solution is actually to first cast 1 into a type that stores enough bits for the bitwise operation and then apply the operator. Let us do it step by step.
1 is an int, so its binary representation is
00000000000000000000000000000001 (base 2)
Now, we add enough bits with the cast ((unsigned long long int) 1). It has the following binary representation
0000000000000000000000000000000000000000000000000000000000000001 (base 2)
((unsigned long long int) 1) << 32 moves the 1 32 positions to the left
0000000000000000000000000000000100000000000000000000000000000000 (base 2) = 4294967296 (base 10) = 2^32.
Output of 1 << 32 and ((unsigned long long int) 1) << 32

We now have a safe 2^N for N∈[0, 63]. It is enough for 2^36, which was the first thing that I had difficulties with.

Notice that the largest value of a 64-bit unsigned integer is 2^64-1 = 18446744073709551615, which is smaller than the number of elements in my matrix, 4722366482869645213696 = 2^72. We will see that this will be another problem to tackle.

What's next?

The main problem that I am facing is memory consumption. I am using the Lanczos algorithm to solve the eigenproblem. Right now, I am rewriting my code to use a very nice property of this method: the matrix can be defined implicitly.

Edit: The source code that I used for the tests. To compile

% g++ -std=gnu++14 main.cpp

main.cpp
#include <string>    // std::string
#include <iostream>  // ostream (std::ostream, std::endl)
                     // std::cout
#include <typeinfo>  // typeid
#include <bitset>    // std::bitset
#include <array>     // std::array
#include <memory>    // std::shared_ptr
#include <cstdio>    // popen, pclose, feof, fgets
#include <stdexcept> // std::runtime_error

// Reviewed
// Runs a shell command and returns its standard output
const std::string exec(const char * cmd)
{
 std::array<char, 128> buffer;
 std::string result;
 std::shared_ptr< FILE > pipe(popen(cmd, "r"), pclose);
 if(!pipe) throw std::runtime_error("popen() failed!");
 while(!feof(pipe.get()))
 {
  if(fgets(buffer.data(), 128, pipe.get()) != nullptr)
  {
   result += buffer.data();
  }
 }
 return result;
}

// Reviewed
// Prints the demangled type of var
// No auto for std::ostream
template< typename T >
void display_type(
 const std::string & name,
 const T & var,
 std::ostream & s = std::cout
)
{
 // No auto
 std::string command = "c++filt -t ";
 command += typeid(var).name();
 s<<"Type of \""<<name<<"\""<<std::endl;
 auto stype = exec(command.data());
 s<<stype<<std::endl;
}

int main()
{
 std::cout<<"* 1<<32"<<std::endl;
 auto val1 = 1<<32;
 display_type("1<<32", val1);
 std::cout<<"1<<32 in base 2";
 std::cout<<std::endl;
 unsigned char * binary1 = reinterpret_cast< decltype(binary1) >(&val1);
 for(auto i = (signed) sizeof(val1) - 1; i >= 0; --i)
 {
  std::cout<<std::bitset< 8 >(binary1[i]);
 }
 std::cout<<std::endl;
 std::cout<<"1<<32 in base 10";
 std::cout<<std::endl<<val1<<std::endl;

 std::cout<<std::endl;
 std::cout<<"* ((unsigned long long int) 1)<<32"<<std::endl;
 auto val2 = ((unsigned long long int) 1)<<32;
 display_type("((unsigned long long int) 1)<<32", val2);
 std::cout<<"((unsigned long long int) 1)<<32 in base 2";
 std::cout<<std::endl;
 unsigned char * binary2 = reinterpret_cast< decltype(binary2) >(&val2);
 for(auto i = (signed) sizeof(val2) - 1; i >= 0; --i)
 {
  std::cout<<std::bitset< 8 >(binary2[i]);
 }
 std::cout<<std::endl;
 std::cout<<"((unsigned long long int) 1)<<32 in base 10";
 std::cout<<std::endl<<val2<<std::endl;
 return 0;
}

Captain's Log #2

Tuesday Nov 21, 2017 17:38
To Do List
TootLine

I am working on TootLine, the PHP code that allows you to share your TootLine on your blog, like the one on the right or at the bottom, depending on the size of your screen. I have a couple of issues to address before publishing it: proper word wrapping, creating a cache for the media in order to solve a CSP issue, and handling the NSFW content, which is displayed unhidden so far.

Tags' categories

I am planning on adding click-able tags to fetch all articles with matching tags.
Done

Translation

I already published a note in French and I began to write a couple of drafts in French, so I would like to make a French version of this blog, with most of the articles translated.

Content Security Policy (CSP) headers

I added Content Security Policy (CSP) headers.

RSS feed

I added a few elements to MySQLiToRSS: I added a proper end of file to the generated RSS feed, and I used the fact that I made tags click-able to add the domain to the category element.

Better looking links

I changed the URL of some links to make them better looking.

Future

In addition to what remains on the To Do List, I want to add a couple of improvements.

More restrictive CSP headers

I want to rewrite some parts of the web site to be able to provide more secure CSP headers.

Comments

I am planning on adding an RSS feed for each comment section so it will be easy to follow. I will also add a cookie to auto-fill the Name and Website fields, with a check box so you can choose whether to set the cookie when you comment.

Tags

I will add the list of all tags on the right panel.

SEO

I will add proper Open Graph protocol and Twitter Cards markup in the headers. I already updated the MySQL database, so everything is ready on that side; I just need to rewrite the headers that I include to make them dynamic.

Better looking links

I already changed the URLs of some links to make them better looking, but I have not finished yet. The rewriting rules are not as simple as I expected if you want to make them SEO compliant.

Better CMS

My work-flow is not the most efficient. Each article points to an actual file, which is not ideal, because I need to create a file each time I add an entry to this blog. I will improve this soon.

Trackback, impossible?

I tried to add Trackback, but the most up-to-date page about it is more than 5 years old and all the links are dead, including the sample code to make it work. If someone can explain to me how to implement it, I will be pleased :)

How to set up Content Security Policy headers?

Tuesday Nov 21, 2017 04:45

I think I have a simple methodology to build Content Security Policy (CSP) headers. I did it with Apache HTTP Server and Firefox, but the methodology is generic. I assume that you know what CSP headers are.

Before we start
Add-ons

First, you might want to disable most of your extensions. Indeed, some will add scripts or rewrite part of the displayed HTML code, making it hard to distinguish the warnings and errors coming from your code from those coming from the extensions. I kept Test Pilot, Multi-Account Containers and Firefox Lightbeam by Mozilla, Privacy Badger and HTTPS Everywhere by the Electronic Frontier Foundation, and DuckDuckGo Plus by DuckDuckGo.

A useful tool

Laboratory by April King is an extension that records your browsing on your website in order to provide a ready-made CSP header. But that is no fun, and it does not push you to think about how to rewrite some parts of the website to improve security. You can also enforce CSP, which is useful when you add another component; for example, if you add your Twitter timeline, you will need to adjust your CSP headers. With this tool, it is easy to check, try and add CSP headers, which you can later add to your web server. But for the purpose of this small how-to, do not use this extension. If it is installed, please make sure that none of the check boxes are enabled and that Generated CSP configuration: is set to default-src 'none'; if not, click on Delete All Settings, or it will be a nightmare.

Laboratory extension

Always keep Developer Tools opened

Another step to avoid spending plenty of time looking at irrelevant answers on Stack Exchange is to open the Developer Tools, go to the Options tab, check Disable HTTP Cache (when toolbox is open), and then keep the tools open at all times. Otherwise, some requests may be cached and you will not understand why your brand new configuration is not taken into account.

Building your CSP headers without blocking content
Report only

We will use the Content-Security-Policy-Report-Only header. When the web browser receives the content of a web page, it will display a warning for each violated CSP directive, without blocking anything. We will see later that it can be used in an even better way.

Be verbose

I strongly advise adding all possible directives to the header, because any warning or error not related to an explicitly defined directive will be reported as violating default-src. This is because directives inherit from default-src when not explicitly set. If you only have something like default-src 'none'; style-src 'self';, then an unsafe-inline violation for script-src will be reported as violating default-src.
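As an illustration (a hypothetical policy, not the one built in this guide): with the header below, a blocked inline script would be reported as violating default-src, because script-src is never set and silently falls back to it.

```apache
Header set Content-Security-Policy-Report-Only "default-src 'none'; style-src 'self';"
```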

So our starting point will be default-src 'none'; child-src 'none'; connect-src 'none'; font-src 'none'; frame-src 'none'; img-src 'none'; manifest-src 'none'; media-src 'none'; object-src 'none'; script-src 'none'; style-src 'none'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self'; and from this, we will allow one by one what we need.

Send report to your website

We will use a very nice feature of CSP, report-uri. It makes web browsers send a JavaScript Object Notation (JSON) report to the web server at the specified Uniform Resource Identifier (URI). This report gives you the basic information needed to build the correct CSP header. In my case, I created a folder /csp/ with the proper ownership and write permissions, containing a single file, index.php

<?php
// Start configuration
$log_file = dirname(__FILE__) . '/csp-violations.log';
$log_file_size_limit = 1000000; // bytes - once exceeded, no further entries are added
// End configuration
$current_domain = $_SERVER['SERVER_NAME'];
http_response_code(204); // HTTP 204 No Content
$json_data = file_get_contents('php://input');
// We pretty-print the JSON before adding it to the log file
if ($json_data = json_decode($json_data)) {
  $json_data = json_encode($json_data, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
  // Do not write if the file size limit is exceeded
  if (filesize($log_file) > $log_file_size_limit) {
    exit(0);
  }
  file_put_contents($log_file, $json_data, FILE_APPEND | LOCK_EX);
}
?>
It is a simplified and adapted version of the code you can find here. It logs errors and warnings to /csp/csp-violations.log.

All together

In your virtual host, add the following line

Header set Content-Security-Policy-Report-Only "report-uri /csp/; default-src 'none'; child-src 'none'; connect-src 'none'; font-src 'none'; frame-src 'none'; img-src 'none'; manifest-src 'none'; media-src 'none'; object-src 'none'; script-src 'none'; style-src 'none'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self';"
Mind the ' and the ". The set of directives must start and end with ". Each directive argument must be surrounded by ' if, and only if, it is a keyword. The corollary: do not use ' to surround a Uniform Resource Locator (URL) or URI.

Restart Apache HTTP Server

# systemctl restart apache2

Debugging

Visit your website. You should see warnings in the Console tab of the Developer Tools.

In the /csp/ folder of your virtual host, you should see the csp-violations.log file. It contains entries like

{
     "csp-report": {
         "blocked-uri": "self",
         "document-uri": "https://clementfevrier.fr/images/r3.svg",
         "original-policy": "report-uri https://clementfevrier.fr/csp/ https://clementfevrier.fr/images/default-src https://clementfevrier.fr/images/'none'; child-src 'none'; connect-src 'none'; font-src 'none'; frame-src 'none'; img-src 'none'; manifest-src 'none'; media-src 'none'; object-src 'none'; script-src 'none'; style-src 'none'; worker-src 'none'",
         "referrer": "https://clementfevrier.fr/articles/11_rand.php",
         "script-sample": "onclick attribute on g element",
         "source-file": "https://clementfevrier.fr/images/r3.svg",
         "violated-directive": "script-src 'none'"
     }
 }
original-policy displays the CSP header. It is useful to ensure that it matches what you set in your web server. violated-directive tells you which directive you should adjust, in this example script-src, and also reminds you of its arguments, 'none', because that is our starting point. blocked-uri tells you what was blocked, here self. So you just need to replace script-src 'none' with script-src 'self' and the warning will disappear.

Repeat this for each violated directive.

Notice that without explicitly setting all directives, violated-directive would always report default-src, which makes debugging more difficult.

For each directive that I set, I restart the web server, delete the log file, reload a page from my website in my web browser, and look at the new log.

Don't forget to visit different pages of your website to cover all cases.

Your CSP log file no longer reports any violated directive? Let us add the real header.

Apply your CSP

Your header should look like

Header set Content-Security-Policy-Report-Only "report-uri /csp/; default-src 'none'; connect-src 'self'; font-src 'self'; frame-src 'none'; img-src 'self' data: https://toot.forumanalogue.fr; manifest-src 'none'; media-src 'self'; object-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self';"
Just remove the -Report-Only and you are done.
Header set Content-Security-Policy "report-uri /csp/; default-src 'none'; connect-src 'self'; font-src 'self'; frame-src 'none'; img-src 'self' data: https://toot.forumanalogue.fr; manifest-src 'none'; media-src 'self'; object-src 'self'; script-src 'self' 'unsafe-inline'; style-src 'self' 'unsafe-inline'; worker-src 'none'; base-uri 'none'; frame-ancestors 'self';"

Restart Apache HTTP Server

# systemctl restart apache2

Final checks

First, check that your website renders as you wish.

Then, there are a couple of useful tools to perform checks on your CSP headers.

Check their recommendations and the links; they are full of useful information.

As you can see, I still have work to do to achieve a good CSP, but I now know how to eliminate most of the so-called insecure inline code. I say so-called because most of it, if not all, is perfectly secure, since I am not in the cases where it can potentially be insecure.

Changing your website and modifying your CSP headers

You can have both Content-Security-Policy-Report-Only and Content-Security-Policy at the same time. This is useful to test a more restrictive CSP without blocking content. You can have one report-uri for the blocked content and one for the report-only policy, which simplifies debugging.
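For example (a sketch with hypothetical report endpoints /csp-enforced/ and /csp-testing/, each served by its own copy of the index.php logger): the enforced policy still allows inline styles, while the report-only one tests dropping them without breaking the page.

```apache
Header set Content-Security-Policy "report-uri /csp-enforced/; default-src 'none'; img-src 'self'; style-src 'self' 'unsafe-inline';"
Header set Content-Security-Policy-Report-Only "report-uri /csp-testing/; default-src 'none'; img-src 'self'; style-src 'self';"
```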

Further reading

MDN Web Docs, by Mozilla, is the only website that I found with a proper explanation and reference for CSP.

Captain's Log

Sunday Nov 19, 2017 01:30

A couple of news items about my blog.

MySQLiToRSS

First, I fixed the error in the git repository for MySQLiToRSS. The git link now actually points to the MySQLiToRSS repository and no longer to SHA1BruteForce.

Comments

Second, I added a comment section. I started my intellectual development by arguing with people on a forum, the Forum Analogue, so it is important to me to have people's reviews. I am planning to add an RSS feed for each comment section so it will be easy to follow. I will also add a cookie to auto-fill the Name and Website fields, with a check box to opt in to the cookie when you comment.

Mastodon

In order to promote Mastodon, I added two buttons at the bottom of each article, one to follow me, the other one to share the article.


<style>
.mastodon-intent-btn {
  display: inline-block;
  color: #fff; 
  text-decoration: none; 
  font-size: 14px; 
  line-height: 32px; 
  font-weight: 500; 
  background: #2b90d9; 
  border-radius: 4px; 
  padding: 0 18px; 
  padding-left: 16px; 
  font-family: Roboto, sans-serif; 
 box-sizing: content-box;
}
.mastodon-intent-btn img {
    width: 48px; 
    height: 48px; 
    vertical-align: middle; 
    padding-top: 10px; 
    padding-bottom: 10px; 
  }
</style>

<script>
[].forEach.call(document.querySelectorAll('.mastodon-intent-btn'), function (el) {
    el.addEventListener('click', function(e) {
    e.preventDefault();
    window.open(e.target.href, 'mastodon-intent', 'width=400,height=400,resizable=no,menubar=no,status=no,scrollbars=yes');
  });
});
</script>

<a href='web+mastodon://follow?uri=acct:me@myinstance.tld' class='mastodon-intent-btn' target="_blank">
  <img src='data:image/svg+xml;charset=utf-8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI2MS4wNzY5NTRtbSIgaGVpZ2h0PSI2NS40NzgzMW1tIiB2aWV3Qm94PSIwIDAgMjE2LjQxNDQgMjMyLjAwOTc2Ij48cGF0aCBkPSJNMjExLjgwNzM0IDEzOS4wODc1Yy0zLjE4MTI1IDE2LjM2NjI1LTI4LjQ5MjUgMzQuMjc3NS01Ny41NjI1IDM3Ljc0ODc1LTE1LjE1ODc1IDEuODA4NzUtMzAuMDgzNzUgMy40NzEyNS00NS45OTg3NSAyLjc0MTI1LTI2LjAyNzUtMS4xOTI1LTQ2LjU2NS02LjIxMjUtNDYuNTY1LTYuMjEyNSAwIDIuNTMzNzUuMTU2MjUgNC45NDYyNS40Njg3NSA3LjIwMjUgMy4zODM3NSAyNS42ODYyNSAyNS40NyAyNy4yMjUgNDYuMzkxMjUgMjcuOTQyNSAyMS4xMTYyNS43MjI1IDM5LjkxODc1LTUuMjA2MjUgMzkuOTE4NzUtNS4yMDYyNWwuODY3NSAxOS4wOXMtMTQuNzcgNy45MzEyNS00MS4wODEyNSA5LjM5Yy0xNC41MDg3NS43OTc1LTMyLjUyMzc1LS4zNjUtNTMuNTA2MjUtNS45MTg3NUM5LjIzMjM0IDIxMy44MiAxLjQwNjA5IDE2NS4zMTEyNS4yMDg1OSAxMTYuMDkxMjVjLS4zNjUtMTQuNjEzNzUtLjE0LTI4LjM5Mzc1LS4xNC0zOS45MTg3NSAwLTUwLjMzIDMyLjk3NjI1LTY1LjA4MjUgMzIuOTc2MjUtNjUuMDgyNUM0OS42NzIzNCAzLjQ1Mzc1IDc4LjIwMzU5LjI0MjUgMTA3Ljg2NDg0IDBoLjcyODc1YzI5LjY2MTI1LjI0MjUgNTguMjExMjUgMy40NTM3NSA3NC44Mzc1IDExLjA5IDAgMCAzMi45NzUgMTQuNzUyNSAzMi45NzUgNjUuMDgyNSAwIDAgLjQxMzc1IDM3LjEzMzc1LTQuNTk4NzUgNjIuOTE1IiBmaWxsPSIjZmZmIi8+PHBhdGggZD0iTTE3Ny41MDk4NCA4MC4wNzd2NjAuOTQxMjVoLTI0LjE0Mzc1di01OS4xNWMwLTEyLjQ2ODc1LTUuMjQ2MjUtMTguNzk3NS0xNS43NC0xOC43OTc1LTExLjYwMjUgMC0xNy40MTc1IDcuNTA3NS0xNy40MTc1IDIyLjM1MjV2MzIuMzc2MjVIOTYuMjA3MzRWODUuNDIzMjVjMC0xNC44NDUtNS44MTYyNS0yMi4zNTI1LTE3LjQxODc1LTIyLjM1MjUtMTAuNDkzNzUgMC0xNS43NCA2LjMyODc1LTE1Ljc0IDE4Ljc5NzV2NTkuMTVIMzguOTA0ODRWODAuMDc3YzAtMTIuNDU1IDMuMTcxMjUtMjIuMzUyNSA5LjU0MTI1LTI5LjY3NSA2LjU2ODc1LTcuMzIyNSAxNS4xNzEyNS0xMS4wNzYyNSAyNS44NS0xMS4wNzYyNSAxMi4zNTUgMCAyMS43MTEyNSA0Ljc0ODc1IDI3Ljg5NzUgMTQuMjQ3NWw2LjAxMzc1IDEwLjA4MTI1IDYuMDE1LTEwLjA4MTI1YzYuMTg1LTkuNDk4NzUgMTUuNTQxMjUtMTQuMjQ3NSAyNy44OTc1LTE0LjI0NzUgMTAuNjc3NSAwIDE5LjI4IDMuNzUzNzUgMjUuODUgMTEuMDc2MjUgNi4zNjg3NSA3LjMyMjUgOS41NCAxNy4yMiA5LjU0IDI5LjY3NSIgZmlsbD0iIzMwODhkNCIvPjwvc3ZnPg==' alt='Mastodon'>
  Follow me
</a>

<?php
$title = "My title";
$url = "https://mydomain.tld/myarticle.php";
$mastodon = "<a href='web+mastodon://share?text=";
$mastodonText = $title;
$mastodonText .= " " . $url;

$url_reserved_char = array("!", "#", "$", "&", "'", "(", ")", "*", "+", ",", "/", ":", ";", "=", "?", "@", "[", "]");
$url_substitute = array("%21", "%23", "%24", "%26", "%27", "%28", "%29", "%2A", "%2B", "%2C", "%2F", "%3A", "%3B", "%3D", "%3F", "%40", "%5B", "%5D");
for ($x = 0; $x < count($url_reserved_char); $x++)
{
 $mastodonText = str_replace($url_reserved_char[$x], $url_substitute[$x], $mastodonText);
}

$mastodonText = str_replace(" ", "+", $mastodonText);
$mastodon .= $mastodonText;
$mastodon .= "' class='mastodon-intent-btn' target=\"_blank\" >";
echo $mastodon;
?>

<img src='data:image/svg+xml;charset=utf-8;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI2MS4wNzY5NTRtbSIgaGVpZ2h0PSI2NS40NzgzMW1tIiB2aWV3Qm94PSIwIDAgMjE2LjQxNDQgMjMyLjAwOTc2Ij48cGF0aCBkPSJNMjExLjgwNzM0IDEzOS4wODc1Yy0zLjE4MTI1IDE2LjM2NjI1LTI4LjQ5MjUgMzQuMjc3NS01Ny41NjI1IDM3Ljc0ODc1LTE1LjE1ODc1IDEuODA4NzUtMzAuMDgzNzUgMy40NzEyNS00NS45OTg3NSAyLjc0MTI1LTI2LjAyNzUtMS4xOTI1LTQ2LjU2NS02LjIxMjUtNDYuNTY1LTYuMjEyNSAwIDIuNTMzNzUuMTU2MjUgNC45NDYyNS40Njg3NSA3LjIwMjUgMy4zODM3NSAyNS42ODYyNSAyNS40NyAyNy4yMjUgNDYuMzkxMjUgMjcuOTQyNSAyMS4xMTYyNS43MjI1IDM5LjkxODc1LTUuMjA2MjUgMzkuOTE4NzUtNS4yMDYyNWwuODY3NSAxOS4wOXMtMTQuNzcgNy45MzEyNS00MS4wODEyNSA5LjM5Yy0xNC41MDg3NS43OTc1LTMyLjUyMzc1LS4zNjUtNTMuNTA2MjUtNS45MTg3NUM5LjIzMjM0IDIxMy44MiAxLjQwNjA5IDE2NS4zMTEyNS4yMDg1OSAxMTYuMDkxMjVjLS4zNjUtMTQuNjEzNzUtLjE0LTI4LjM5Mzc1LS4xNC0zOS45MTg3NSAwLTUwLjMzIDMyLjk3NjI1LTY1LjA4MjUgMzIuOTc2MjUtNjUuMDgyNUM0OS42NzIzNCAzLjQ1Mzc1IDc4LjIwMzU5LjI0MjUgMTA3Ljg2NDg0IDBoLjcyODc1YzI5LjY2MTI1LjI0MjUgNTguMjExMjUgMy40NTM3NSA3NC44Mzc1IDExLjA5IDAgMCAzMi45NzUgMTQuNzUyNSAzMi45NzUgNjUuMDgyNSAwIDAgLjQxMzc1IDM3LjEzMzc1LTQuNTk4NzUgNjIuOTE1IiBmaWxsPSIjZmZmIi8+PHBhdGggZD0iTTE3Ny41MDk4NCA4MC4wNzd2NjAuOTQxMjVoLTI0LjE0Mzc1di01OS4xNWMwLTEyLjQ2ODc1LTUuMjQ2MjUtMTguNzk3NS0xNS43NC0xOC43OTc1LTExLjYwMjUgMC0xNy40MTc1IDcuNTA3NS0xNy40MTc1IDIyLjM1MjV2MzIuMzc2MjVIOTYuMjA3MzRWODUuNDIzMjVjMC0xNC44NDUtNS44MTYyNS0yMi4zNTI1LTE3LjQxODc1LTIyLjM1MjUtMTAuNDkzNzUgMC0xNS43NCA2LjMyODc1LTE1Ljc0IDE4Ljc5NzV2NTkuMTVIMzguOTA0ODRWODAuMDc3YzAtMTIuNDU1IDMuMTcxMjUtMjIuMzUyNSA5LjU0MTI1LTI5LjY3NSA2LjU2ODc1LTcuMzIyNSAxNS4xNzEyNS0xMS4wNzYyNSAyNS44NS0xMS4wNzYyNSAxMi4zNTUgMCAyMS43MTEyNSA0Ljc0ODc1IDI3Ljg5NzUgMTQuMjQ3NWw2LjAxMzc1IDEwLjA4MTI1IDYuMDE1LTEwLjA4MTI1YzYuMTg1LTkuNDk4NzUgMTUuNTQxMjUtMTQuMjQ3NSAyNy44OTc1LTE0LjI0NzUgMTAuNjc3NSAwIDE5LjI4IDMuNzUzNzUgMjUuODUgMTEuMDc2MjUgNi4zNjg3NSA3LjMyMjUgOS41NCAxNy4yMiA5LjU0IDI5LjY3NSIgZmlsbD0iIzMwODhkNCIvPjwvc3ZnPg==' alt='Mastodon'>
  Share
</a>
Favicon

I added a favicon to the blog.

Tags

I added tags for each article, currently shown at the top of the article.

PhD thesis

I managed to get rid of the iframe used to display my PhD thesis. I generate the HTML base using pdf2htmlEX, which needs new maintainers. After a bit of rewriting, I think I achieved something better integrated with this blog.

If anyone is interested, I can publish the code, but because it is based on unmaintained software, I do not know if it is worth the trouble.

Statistics

Last, but not least, I spent two days rewriting AWStats from Perl to PHP and adding W3.CSS. I managed to get rid of the iframe and integrate the statistics into the blog. Overall, I made it a lot more modular than the original script, or maybe I just have this feeling because I know my own code better or because I am more comfortable in PHP.



I think one of the nicest improvements is the filters. They were using GET, which I kept but no longer display by default. In my version, I use JavaScript to make an on-the-fly filter.



Fonts

I fixed a few serious font issues. From now on, fonts should be displayed without problem.

Future
TootLine

I am working on TootLine, the PHP code that allows you to share your toot timeline on your blog, like the one on the right or at the bottom, depending on the size of your screen. I have a couple of issues to address before publishing it: proper word wrapping, creating a cache for the media in order to solve a CSP issue, and handling the NSFW content that is currently displayed.

Tags' categories

I am planning on adding clickable tags to fetch all articles with matching tags.

Translation

I already published a note in French and began writing a couple of drafts in French, so I would like to make a French version of this blog, with most of the articles translated.

rand()%range, or the wrong way to generate random numbers

Saturday Nov 11, 2017 03:44
The mathematics

In C++, rand() returns a random number between 0 and RAND_MAX, inclusive. The distribution is uniform, meaning that the probability Pn of getting n ∈ [0, RAND_MAX] is Pn = 1∕(RAND_MAX+1) = P. Now, a common practice is to generate random numbers in a smaller range using the modulo operator %: rand()%range returns a number x ∈ [0, range-1]. This distribution is not uniform and favors smaller numbers. Indeed, the probabilities become

Px = Σn=0..RAND_MAX Pn δ(n%range, x), i.e.
Px = P × (floor(RAND_MAX∕range) + 1), for x ≤ RAND_MAX%range,
Px = P × floor(RAND_MAX∕range), otherwise.
One can see that Px is not constant, whereas the desired behavior is usually a uniform distribution with probability Pu = 1∕range.

Unrealistic working example

Let us assume that RAND_MAX = 2. Then rand() can return 0, 1, and 2; the probability P of getting n ∈ [0, 2] is P = 1∕(2+1) = 1∕3. If range = 2, rand()%range has 3 cases.

  • rand() returns 0, 0%2 = 0.
  • rand() returns 1, 1%2 = 1.
  • rand() returns 2, 2%2 = 0.
Each occurs with probability P = 1∕3. So Px=0 = Pn=0 + Pn=2 = 2P = 2∕3 and Px=1 = Pn=1 = P = 1∕3, whereas a uniform probability would be Pu = 1∕2. The ratio between the two probabilities is Px=0∕Px=1 = 2: you are twice as likely to get 0 as 1. The relative error is ∣(Px=0 - Px=1)∕Px=1∣ = 1.

Proof of concept

Here is a small piece of code that demonstrates the discussion above. It simulates the function rand() for RAND_MAX = 2.

main.cpp
    1 #include <cstdio>   // NULL
    2 #include <ctime>    // time
    3 #include <cstdlib>  // srand
    4 #include <random>   // std::random_device,
    5                     // std::uniform_int_distribution
    6                     // std::default_random_engine generator
    7 #include <iostream> // std::cout, ostream (std::endl)
    8 
    9 #include "functions.h"
   10 
   11 int main()
   12 {
   13  // Value of rand_max
   14  constexpr unsigned long long int rand_max = 2;
   15  // Set range to maximize the error.
   16  constexpr unsigned long int range = rand_max;
   17  // Natural scale is rand_max + 1
   18  constexpr unsigned long long int scale = rand_max + 1;
   19  // Number of iterations to accumulate statistics
   20  // In units of the natural scale
   21  constexpr unsigned long long int Niterations = (unsigned long long int)500000 * scale;
   22  // Random seed
   23  srand(time(NULL));
   24  // True random generator
   25  std::random_device rd;
   26  // Entropy of the true random generator
   27  const auto entropy = rd.entropy();
   28  std::cout<<"Entropy: "<<entropy<<std::endl;
   29  // If the entropy is 0, it means there is no true random generator available (e.g. Apple computers)
   30  // so a pseudo random number generator will be used instead
   31  if(entropy > 0)
   32  {
   33   std::cout<<"True random generator"<<std::endl;
   34  }
   35  else
   36  {
   37   std::cout<<"Pseudo random generator"<<std::endl;
   38  }
   39  // Uniform distribution to simulate rand() with our custom RAND_MAX
   40  std::uniform_int_distribution< unsigned long int > distribution1(0, rand_max);
   41  // Uniform distribution in the range [0, range-1]
   42  std::uniform_int_distribution< unsigned long int > distribution2(0, range - 1);
   43  std::cout<<"RAND_MAX: "<<rand_max<<std::endl;
   44  std::cout<<"range:    "<<range<<std::endl;
   45  std::cout<<"The "<<(rand_max % range) + 1;
   46  std::cout<<" first probabilities are greater than the ";
   47  std::cout<<range - ((rand_max % range) + 1);
   48  std::cout<<" other ones by a factor:"<<std::endl;
   49  std::cout<<"Predicted p(0)/p(range-1):                               ";
   50  std::cout<<(((rand_max + 1) / range) + 1) / (float)(rand_max / range);
   51  std::cout<<std::endl;
   52  std::cout<<"Predicted relative error (p(0) - p(range-1))/p(range-1): ";
   53  std::cout<<(((scale / range) + 1) - (scale / range)) / (float)(scale / range);
   54  std::cout<<std::endl;
   55 
   56  std::cout<<" * Making statistic *"<<std::endl;
   57 
   58  std::cout<<" * rand()%range *"<<std::endl;
   59  constexpr char nameMod [] = "results/mod3.dat";
   60  if(entropy > 0)
   61  {
   62   statistic1(rand_max, range, Niterations, nameMod, rd, distribution1, scale);
   63  }
   64  else
   65  {
   66   std::default_random_engine generator;
   67   statistic1(rand_max, range, Niterations, nameMod, generator, distribution1, scale);
   68  }
   69 
   70  std::cout<<" * uniform *"<<std::endl;
   71  constexpr char nameUniform [] = "results/uniform3.dat";
   72  if(entropy > 0)
   73  {
   74   statistic2(rand_max, range, Niterations, nameUniform, rd, distribution2, scale);
   75  }
   76  else
   77  {
   78   std::default_random_engine generator;
   79   statistic2(rand_max, range, Niterations, nameUniform, generator, distribution2, scale);
   80  }
   81 
   82  return 0;
   83 }
functions.h
       1 #ifndef _FUNCTIONS_H_
       2 #define _FUNCTIONS_H_
       3 
       4 #include <fstream>  // std::fstream
       5 #include <iostream> // std::cout, ostream (std::endl)
       6 #include <cmath>    // abs
       7 #include <cstdlib>  // rand
       8 
       9 void read(const auto name, auto & start, auto p)
      10 {
      11  for(unsigned char it = 0; it < 2; ++it)
      12  {
      13   p[it] = 0;
      14  }
      15 
      16  std::fstream file(name);
      17  if(file.is_open())
      18  {
      19   while(file.good())
      20   {
      21    file>>start;
      22    file>>p[0];
      23    file>>p[1];
      24   }
      25   file.clear();
      26  }
      27  file.close();
      28 
      29  std::cout<<"Start at trial number "<<start + 1;
      30  std::cout<<" with probabilities "<<p[0] / (float)(start + 1);
      31  std::cout<<" and "<<p[1] / (float)(start + 1)<<std::endl;
      32 }
      33 
      34 void statistic1
      35 (
      36  const auto & rand_max,
      37  const auto & range,
      38  const auto & Niterations,
      39  const auto name,
      40  const auto & scale
      41 )
      42 {
      43  unsigned long long int * p = new unsigned long long int [2];
      44  unsigned long int index = 0;
      45  unsigned long long int start = 0;
      46 
      47  read(name, start, p);
      48  std::ofstream file(name, std::ofstream::app);
      49  for(unsigned long long int it = start + 1; it < Niterations; ++it)
      50  {
      51   index = rand() % range;
      52   if(index == 0)
      53   {
      54    ++p[0];
      55   }
      56   else
      57   {
      58    if(index == (range - 1))
      59    {
      60     ++p[1];
      61    }
      62   }
      63   if(it % scale == rand_max)
      64   {
      65    if(p[0] != 0 && p[1] != 0)
      66    {
      67     file<<it<<" "<<p[0]<<" "<<p[1]<<std::endl;
      68    }
      69   }
      70  }
      71  file.close();
      72 
      73  std::cout<<"Computed p(0)/p(range-1):                                ";
      74  std::cout<<p[0]/(long double)p[1]<<std::endl;
      75  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
      76  std::cout<<std::abs((float)p[0] - p[1])/(long double)p[1]<<std::endl;
      77 }
      78 
      79 void statistic2
      80 (
      81  const auto & rand_max,
      82  const auto & range,
      83  const auto & Niterations,
      84  const auto name,
      85  auto & generator,
      86  auto & distribution,
      87  const auto & scale
      88 )
      89 {
      90  unsigned long long int * p = new unsigned long long int [2];
      91  unsigned long int index = 0;
      92  unsigned long long int start = 0;
      93 
      94  read(name, start, p);
      95  std::ofstream file(name, std::ofstream::app);
      96  for(unsigned long long int it = start + 1; it < Niterations; ++it)
      97  {
      98   index = distribution(generator);
      99   if(index == 0)
     100   {
     101    ++p[0];
     102   }
     103   else
     104   {
     105    if(index == (range - 1))
     106    {
     107     ++p[1];
     108    }
     109   }
     110   if(it % scale == rand_max)
     111   {
     112    if(p[0] != 0 && p[1] != 0)
     113    {
     114     file<<it<<" "<<p[0]<<" "<<p[1]<<std::endl;
     115    }
     116   }
     117  }
     118  file.close();
     119 
     120  std::cout<<"Computed p(0)/p(range-1):                                ";
     121  std::cout<<p[0]/(long double)p[1]<<std::endl;
     122  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
     123  std::cout<<std::abs((float)p[0] - p[1])/(long double)p[1]<<std::endl;
     124 }
     125 
     126 void statistic1
     127 (
     128  const auto & rand_max,
     129  const auto & range,
     130  const auto & Niterations,
     131  const auto name,
     132  auto & generator,
     133  auto & distribution,
     134  const auto & scale
     135 )
     136 {
     137  unsigned long long int * p = new unsigned long long int [2];
     138  unsigned long int index = 0;
     139  unsigned long long int start = 0;
     140 
     141  read(name, start, p);
     142  std::ofstream file(name, std::ofstream::app);
     143  for(unsigned long long int it = start + 1; it < Niterations; ++it)
     144  {
     145   index = distribution(generator) % range;
     146   if(index == 0)
     147   {
     148    ++p[0];
     149   }
     150   else
     151   {
     152    if(index == (range - 1))
     153    {
     154     ++p[1];
     155    }
     156   }
     157   if(it % scale == rand_max)
     158   {
     159    if(p[0] != 0 && p[1] != 0)
     160    {
     161     file<<it<<" "<<p[0]<<" "<<p[1]<<std::endl;
     162    }
     163   }
     164  }
     165  file.close();
     166 
     167  std::cout<<"Computed p(0)/p(range-1):                                ";
     168  std::cout<<p[0]/(long double)p[1]<<std::endl;
     169  std::cout<<"Computed relative error (p(0) - p(range-1))/p(range-1):  ";
     170  std::cout<<std::abs((float)p[0] - p[1])/(long double)p[1]<<std::endl;
     171 }
     172 
     173 #endif
You can download the files by clicking on their names. To compile,
% g++ -std=gnu++14 main.cpp
Of course, you can add more options, like -Ofast and others that will speed up the code. Create a folder results before executing the code
% mkdir results
and finally, execute it
% ./a.out
It should last less than a second. It generates two files in the folder results. Each contains 3 columns: the number of iterations, Px=0, and Px=1. mod3.dat contains the results when using rand()%range and uniform3.dat the results for a uniform distribution.

Let us now look at the results. I plotted the expected probabilities for a uniform distribution Pu, P, and 2P as explained in the previous section. I also plotted the measured probabilities Pm using the modulo operator and Pu using a uniform distribution. The x axis is in the natural scale of this context, (RAND_MAX+1).

After only a few tens of iterations per (RAND_MAX+1), the probabilities are already stable and have converged to the probabilities computed above. To ensure the stability, let us run for a large number of iterations.

After 1500000 iterations, there is no doubt about the stability. It clearly shows that using the modulo operator favors smaller numbers, which are up to 2 times more likely than the largest.

Of course, each set of data will look different, but they will converge to the same result. I ran this code many times with the same outcome.

Here are the gnuplot files to reproduce the plots.

results3_200.gnu
set term svg enhanced mouse
set style fill transparent
set key at 190,0.66
set title 'Convergence after 600 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities'
set output "r3_200.svg"
set xrange[0:200]
set yrange[0.3:0.7]
plot 'results/mod3.dat' u ($1/3.):($2/$1) t 'P_m(x=0)' w lp, \
     'results/mod3.dat' u ($1/3.):($3/$1) t 'P_m(x=1)' w lp, \
     'results/uniform3.dat' u ($1/3.):($2/$1) t 'P_u(x=0)' w lp, \
     'results/uniform3.dat' u ($1/3.):($3/$1) t 'P_u(x=1)' w lp, \
     1./3. lw 4 t 'P', \
     2./3. lw 4 t '2P', \
     1./2. lw 4 t 'P_u'
results3.gnu
set term svg enhanced mouse
set style fill transparent
set key at 499000,0.66
set title 'Convergence after 1500000 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities'
set output "svg/r3.svg"
set xrange[0:500000]
set yrange[0.3:0.7]
plot 'results/mod3.dat' u ($1/3.):($2/$1) t 'P_m(x=0)' w l, \
     'results/mod3.dat' u ($1/3.):($3/$1) t 'P_m(x=1)' w l, \
     'results/uniform3.dat' u ($1/3.):($2/$1) t 'P_u(x=0)' w l, \
     'results/uniform3.dat' u ($1/3.):($3/$1) t 'P_u(x=1)' w l, \
     1./3. lw 1 t 'P', \
     2./3. lw 1 t '2P', \
     1./2. lw 1 t 'P_u'

General proof of concept

I entitled the example above unrealistic because RAND_MAX is most likely defined as (2³²∕2)−1, i.e. the greatest int. The worst case happens when range = RAND_MAX. Indeed, the probability of getting 0, Pm(x=0), is twice any other probability Pm(x≠0). Using the same notation as above, Pm(x=0) = 2P and Pm(x≠0) = P.
Notice that P = Pu.

I chose to compare Pm(x=0) with Pm(x=RAND_MAX-1). In the code above, one can just change the variable rand_max to RAND_MAX on line 14 and call the statistic1 overload that uses the actual rand(), and that is enough. However, it would take too long to terminate (a few years). So one should replace Niterations with a smaller number, for example (unsigned long long int)50 * scale = 107374182350. It should take a couple of hours to run, which should be enough to see the trend.

Let me show you the new main

main.cpp
    1 #include <cstdio>   // NULL
    2 #include <ctime>    // time
    3 #include <cstdlib>  // srand, RAND_MAX
    4 #include <random>   // std::random_device,
    5                     // std::uniform_int_distribution
    6                     // std::default_random_engine generator
    7 #include <iostream> // std::cout, ostream (std::endl)
    8 
    9 #include "functions.h"
   10 
   11 int main()
   12 {
   13  // Value of rand_max
   14  constexpr unsigned long long int rand_max = RAND_MAX;
   15  // Set range to maximize the error.
   16  constexpr unsigned long int range = rand_max;
   17  // Natural scale is rand_max + 1
   18  constexpr unsigned long long int scale = rand_max + 1;
   19  // Number of iterations to accumulate statistics
   20  // In units of the natural scale
   21  constexpr unsigned long long int Niterations = (unsigned long long int)50 * scale;
   22  // Random seed
   23  srand(time(NULL));
   24  // True random generator
   25  std::random_device rd;
   26  // Entropy of the true random generator
   27  const auto entropy = rd.entropy();
   28  std::cout<<"Entropy: "<<entropy<<std::endl;
   29  // If the entropy is 0, it means there is no true random generator available (e.g. Apple computers)
   30  // so a pseudo random number generator will be used instead
   31  if(entropy > 0)
   32  {
   33   std::cout<<"True random generator"<<std::endl;
   34  }
   35  else
   36  {
   37   std::cout<<"Pseudo random generator"<<std::endl;
   38  }
   39  // Uniform distribution in the range [0, range-1]
   40  std::uniform_int_distribution< unsigned long int > distribution2(0, range - 1);
   41  std::cout<<"RAND_MAX: "<<rand_max<<std::endl;
   42  std::cout<<"range:    "<<range<<std::endl;
   43  std::cout<<"The "<<(rand_max % range) + 1;
   44  std::cout<<" first probabilities are greater than the ";
   45  std::cout<<range - ((rand_max % range) + 1);
   46  std::cout<<" other ones by a factor:"<<std::endl;
   47  std::cout<<"Predicted p(0)/p(range-1):                               ";
   48  std::cout<<(((rand_max + 1) / range) + 1) / (float)(rand_max / range);
   49  std::cout<<std::endl;
   50  std::cout<<"Predicted relative error (p(0) - p(range-1))/p(range-1): ";
   51  std::cout<<(((scale / range) + 1) - (scale / range)) / (float)(scale / range);
   52  std::cout<<std::endl;
   53 
   54  std::cout<<" * Making statistic *"<<std::endl;
   55 
   56  std::cout<<" * rand()%range *"<<std::endl;
   57  constexpr char nameMod [] = "results/mod.dat";
   58  statistic1(rand_max, range, Niterations, nameMod, scale);
   59 
   60  std::cout<<" * uniform *"<<std::endl;
   61  constexpr char nameUniform [] = "results/uniform.dat";
   62  if(entropy > 0)
   63  {
   64   statistic2(rand_max, range, Niterations, nameUniform, rd, distribution2, scale);
   65  }
   66  else
   67  {
   68   std::default_random_engine generator;
   69   statistic2(rand_max, range, Niterations, nameUniform, generator, distribution2, scale);
   70  }
   71 
   72  return 0;
   73 }

At the time of writing, I launched the code a bit more than two hours ago, so only the part for rand()%range is done. The second part seems to have difficulties accumulating statistics. After 107374182350 iterations, the probabilities are still 0. I launched it on my laptop, a MacBook Pro from my lab (without Mac OS, but Ubuntu instead), but it does not have a hardware random generator. Maybe the issue comes from there. Later, I will try on my server, which has one.

Anyway, let us look at what we obtained so far.

Well, it looks good. The probabilities are converging to the expected ones.

It shows that even with a real-life RAND_MAX, in this case, it is twice as likely to obtain 0 as any other number.

Here is the gnuplot file to reproduce the plot

results.gnu
set term svg enhanced mouse
set style fill transparent
#set key at 499000,0.66
set title 'Convergence after 107374182350 iterations'
set xlabel 'Number of iterations / (RAND\_MAX + 1)'
set ylabel 'Probabilities (10^{-9})'
set output "r.svg"
set xrange[0:50]
#set yrange[0.3:0.7]
plot 'results/mod.dat' u ($1/2147483648.):($2/$1*1000000000.) t 'P_m(x=0)' w lp lc 1, \
     'results/mod.dat' u ($1/2147483648.):($3/$1*1000000000.) t 'P_m(x=RAND\_MAX-1)' w lp lc 2, \
     'results/uniform.dat' u ($1/2147483648.):($2/$1*1000000000.) t 'P_u(x=0)' w lp lc 3, \
     'results/uniform.dat' u ($1/2147483648.):($3/$1*1000000000.) t 'P_u(x=1)' w lp lc 4, \
     1./2147483648.*1000000000. lw 1 t 'P, P_u', \
     2./2147483648.*1000000000. lw 1 t '2P'

The worst and the best, a short story on prime numbers

Let us see what the best and the worst cases are.

The best

Let us assume that RAND_MAX is an odd number, which is most likely the case because it is probably defined as (2^8×2^n)∕2 − 1. It implies that RAND_MAX%2 = 1. So, Pm(x=0) = Pm(x=1). It means that for an odd RAND_MAX, it is always safe to generate random numbers x ∈ [0, 1]. It is not the only safe case. As a general rule, it is always safe to generate random numbers when (RAND_MAX+1)%range = 0.

The worst

The worst case is when RAND_MAX+1 is a prime number. Indeed, the condition (RAND_MAX+1)%range = 0 can then never be satisfied (except for the trivial ranges 1 and RAND_MAX+1), so random numbers generated with the modulo operator are never uniform.
In practice, it is unlikely to happen, because the definition above implies that RAND_MAX+1 will be even.

But as a general rule, I use two criteria: the rarity, and the ratio between the two different possible probabilities. With these two criteria, the worst case is when a single number has a bigger probability than all the other ones. That number will always be 0, and it happens each time the condition RAND_MAX%range = 0 is fulfilled. The ratio increases as range increases and is thus maximum for range = RAND_MAX.

A concluding word

I hope that you will be careful next time you generate random numbers. C++, and especially C++11, provides great tools to tackle this problem. With former standards, or with C, you just need to write a simple function that makes your distribution uniform again. I advise you to look at the default one; it is pretty simple, easy to understand, and quick to code.

I hope that you enjoyed it. There is no comment section yet, I am writing my own, but feel free to contact me for reviews, comments, or thoughts using social media like Mastodon, email, XMPP, etc.
Share if you liked it :)

MySQLiToRSS

Wednesday Nov 08, 2017 00:54

I wrote a new piece of code to generate an RSS feed from a MySQL database. I named it MySQLiToRSS. It is a PHP file that generates an RSS 2.0 feed. It is licensed under GPLv3. It handles sorting by date, multi-tag articles, and the use of HTML in the description of an item, which allows the article to be rendered in the feed reader as it appears on the site, except for the CSS style sheets. It is based on version 2.0.11 of the RSS 2.0 specification, the most up to date at the time of writing this software. Some optional items are missing because I do not use them. I might add them later (some or all of them), and I will be pleased to add optional items if requested.

SHA1BruteForce

Wednesday Nov 08, 2017 00:05

I'm proud to announce my first (free) software, SHA1BruteForce, which performs a brute-force attack to crack SHA-1 hashes.
Page of the project

Why?

After my 10-year-old Firefox session crashed, I lost a password stored in it. But I managed to find the hash, and it turned out to be a SHA-1 hash (the software was installed in 2009 on my server). I could have changed it, I guess, but I knew that SHA-1 is now considered weak (although the first real collision only dates from February), so I challenged myself to recover the password by writing a piece of code that does the job. It took me a bit more than a day to get a working code.
Afterwards, I began to optimize it and to ask myself fundamental questions about C++.

It is pretty simple and it seems to perform well. It takes about 4 hours to crack any 6-character password on my computer. So I decided to publish it on my server, which was not as simple as I expected.
But I also have accounts on the main git platforms, so I also published it on GitHub, GitLab and FramaGit, the latter better known among the French.

It is licensed under GPL3.

It performs the tasks on the CPU only. A GPU implementation does not seem possible at the moment using only free software. Indeed, CUDA requires the proprietary drivers, and OpenCL does not seem to work properly with Nouveau (latest version of the Linux kernel, i.e. 4.13, on Ubuntu 16.04). But I want to use only free software (and I cannot install the Nvidia drivers anyway, they do not work on my system).

It is not a revolutionary tool that intends to beat existing ones. I did it for myself and share it for anyone interested.

It is my first published code, so there is most likely room for improvement in how I write the manual, how I write the code so it can be used by others, how I should comment it, and so on. The same goes for the code itself. Feel free to comment, share, submit commits, report bugs, etc.

A × C = 8, or why, when I introduce myself, the reaction is "I saw a documentary on Arte".

Monday Oct 30, 2017 22:30
What do you do for a living?

I am a doctor in theoretical physics. When I say that I did a PhD in physics, the reaction I often get is

≪ Ah yes, I once saw a documentary about astrophysics on Arte. ≫
It happened to me again this Friday. I come from a field where explaining what I do or study is not directly accessible to the public. This is generally not the case in the humanities, where researchers can often state their thesis topic and be understood, for example, Collections of funeral poetry printed in Italy, France and the British Isles (1587-1644). I have never studied comparative literature, and yet I understand this thesis title. My thesis title is New electronic phases with eg orbitals on triangular lattices. I never state it, because more than one word out of two is incomprehensible to the public. I talk even less about what I do in practice, because it makes the subject even more obscure. Indeed, I often do computations like A × C = 8. And yet it is extremely simple once explained step by step.

The system

I have a lattice, that is to say, my system is a collection of points, called sites. For example, a line of 4 sites
× × × ×

The object

I study electrons. They have the particularity of not being able to be in the same place at the same time. Here, "place" has a somewhat special meaning, because it does not refer to a location but to a state. Let us forget this aspect for the sake of simplicity. It means that on each site ×, I can have either 0 or 1 electron. Here are all the possibilities:

××××
0000
0001
0010
0011
0100
0101
0110
0111
1000
1001
1010
1011
1100
1101
1110
1111
Each line represents what we call a state, not to be confused with the state of a single electron. The order is arbitrary; I chose natural binary. There are 16 of them, and one way to represent them is to assign each one a symbol, for example the 10 digits followed by the first letters of the alphabet:
0000=0
0001=1
0010=2
0011=3
0100=4
0101=5
0110=6
0111=7
1000=8
1001=9
1010=A
1011=B
1100=C
1101=D
1110=E
1111=F
This is the convention used in hexadecimal.

The problem

We consider two states, for example 1010 (A) and 1100 (C), and we ask which sites are occupied in both states. To do so, we ask for each site whether there is an electron in both A and C. This "and" behaves as one might expect.

  • 0 and 0 gives nothing (0).
  • 0 and 1 gives nothing (0).
  • 1 and 0 gives nothing (0).
  • 1 and 1 gives something (1).
This is Boolean algebra. These four lines, which form what is called a truth table, because 0 is generally associated with False and 1 with True, behave exactly like a multiplication.
0×0=0
0×1=0
1×0=0
1×1=1
By repeating this operation for each site, we can write A × C = 1010 × 1100.
1010
× 1100
= 1000
Only the leftmost site is occupied in both states, 1000. Looking at the list above, we see that this state, 1000, corresponds to 8. So we indeed have A × C = 8. Nothing mystical or complicated about it.

The scientist in society

In fact, even subjects that appear complicated are just a combination of simple concepts like the one I have just presented. Despite this, the popular representation of the hard sciences is far from the truth and is perpetuated by documentaries and entertainment full of artistic renderings of the work and of complicated equations. Here is an example from an episode of The Big Bang Theory.

Although it looks complicated, it is fairly simple. First, there is an error in the question itself.

≪ Solve the equation! ≫
Note the almost prophetic intonation. This is not an equation. An equation contains an equals sign and at least one unknown to find. What is meant here is: compute the integral. The integral in question is at the level of a first-year master's student in physics. An integral is just a kind of sum. Someone like Sheldon Cooper, the character in the yellow shirt in the series, should not look so surprised. If we quickly go through this sum together, we can soon understand it.
The Big Bang Theory
We have an electron, on the left, which interacts with a muon, on the right, through a photon, in the middle. If we take the squared modulus of this sum, we obtain the probability of interaction between an electron and a muon, called the cross section. Taking the square means multiplying the result by itself, for example x × x. The modulus is the distance from 0: for example, 1 is at a distance 1 from 0, and -2 at a distance 2 from 0. When objects are very small, the rules change, and things do not always happen the way our intuition perceives them. We do not know whether two particles will interact, so we compute their probability of interaction. That is what this sum is for. There are a few rules to respect in physics, such as the conservation of energy and momentum; that is the term circled in blue on the second line. A physics student will spend about two hours computing such a term the first time. The upper part with the diagram is just a more elegant and shorter way of writing the sum at the bottom; the two notations are perfectly equivalent. All this to tell you that it all looks complicated only because it is written in another language, physics, but the translation is understandable.

Server Git repository with Apache HTTP Server - Smart HTTP

Wednesday Oct 25, 2017 18:59, last edition on Wednesday Oct 25, 2017 20:17
There is a lot of outdated documentation about how to make your git repositories available through HTTP, even in the official Reference Manual. These documents do not work, are poorly explained, and trying to adjust them can lead to serious security issues, like allowing anonymous users to push commits. After hours of reading the official manual and discussions of people struggling with the configuration, I finally got a working setup with the expected behavior: sharing one of my repositories through HTTP with Apache HTTP Server, with anonymous users allowed to pull and clone, and only registered users allowed to push, which is the expected behavior in most cases. This is for Apache HTTP Server version 2.4, because the configuration changed compared to version 2.2. The first step is to enable the required mods
# a2enmod cgi alias env
# systemctl restart apache2
With mpm event instead of prefork, which is the case if you enabled HTTP/2 on Ubuntu Xenial, you will get a warning about the fact that this enables cgid instead of cgi, but it does not affect this configuration. The second step is to change your virtual host. You need to choose a method to authenticate the users. Because I will be the only one to push commits, I will use the simplest configuration, which is AuthType Basic. You can choose other ones, but I will not cover them. You need to create a file that contains the users and their passwords, if not already done. To create the user USER in the file /etc/htpasswd/.htpasswd
# mkdir -p /etc/htpasswd/
# htpasswd -c /etc/htpasswd/.htpasswd USER
Now, if you want to serve repositories stored in /var/www/git with the URL mydomain.tld/git/, change your virtual host by adding the following lines
SetEnv GIT_PROJECT_ROOT /var/www/git
SetEnv GIT_HTTP_EXPORT_ALL
ScriptAlias /git/ /usr/lib/git-core/git-http-backend/
<Files "git-http-backend">
 AuthType Basic
 AuthName "Git Access"
 AuthUserFile /etc/htpasswd/.htpasswd
 Require expr !(%{QUERY_STRING} -strmatch '*service=git-receive-pack*' || %{REQUEST_URI} =~ m#/git-receive-pack$#)
 Require valid-user
</Files>
Restart Apache HTTP Server and you are done!
# systemctl restart apache2
To go a bit further with git, let us consider that you have a local repository called MyProject that you want to share. On the web server, you need to initialize the repository
# mkdir -p /var/www/git/MyProject.git
# cd /var/www/git/MyProject.git
# git init --bare
# chown -R www-data:www-data /var/www/git/
The last command sets the proper ownership on the files. If you don't do it, you will be able to push locally but not remotely. For the next repository, you will not need to perform the command on the whole directory. Just do
# mkdir /var/www/git/MyNEWProject.git
# cd /var/www/git/MyNEWProject.git
# git init --bare
# chown -R www-data:www-data /var/www/git/MyNEWProject.git
Now, push your project from your computer to the web server
% git remote set-url origin --push --add https://USER:PASSWORD@mydomain.tld/git/MyProject.git
% git remote -v
origin https://USER:PASSWORD@mydomain.tld/git/MyProject.git
% git push
Delta compression using up to 8 threads.
Compressing objects: 100% (9/9), done.
Writing objects: 100% (9/9), 20.15 KiB | 0 bytes/s, done.
Total 9 (delta 1), reused 0 (delta 0)
To https://USER:PASSWORD@mydomain.tld/git/MyProject.git
 * [new branch]      master -> master
The second line just checks that the previous command worked. You can then clone the project on other computers with
% git clone https://mydomain.tld/git/MyProject.git
If you try to push from the second computer, you will need to authenticate
% git push
Username for 'https://mydomain.tld': USER
Password for 'https://USER@mydomain.tld':
Counting objects: 2, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 222 bytes | 0 bytes/s, done.
Total 2 (delta 1), reused 0 (delta 0)
To https://mydomain.tld/git/MyProject.git
   bc49af8..f7121b5  master -> master
To add the authentication to the remote URL
% git remote set-url origin --push --delete https://mydomain.tld/git/MyProject.git
% git remote set-url origin --push --add https://USER:PASSWORD@mydomain.tld/git/MyProject.git
and you are done!

Are ^ and != operators equivalent on unsigned char in C++?

Wednesday Oct 25, 2017 03:51, last edition on Wednesday Oct 25, 2017 03:57

I was trying to optimize a C++ code in which I check whether two unsigned char are different. So the first thing I wrote was n1 != n2. unsigned char is an integer type, and an efficient way to perform this operation is an exclusive or, using the bitwise operator ^ in C++. So I could simply change my code to n1 ^ n2. But I was wondering whether they are equivalent, or whether one of them is more efficient. I wrote the simplest function for each operation in separate files.

% cat neq.cpp
bool f(unsigned char n1, unsigned char n2)
{
 return n1 != n2;
}
% cat xor.cpp
bool f(unsigned char n1, unsigned char n2)
{
 return n1 ^ n2;
}
Then, I compiled them with the -S option of GCC to produce assembler code
% g++ -S neq.cpp xor.cpp
For each source file .cpp, the corresponding assembler code is stored in a .s file. If we compare the two files, they are equivalent, apart of course from the file name of the source code
% diff neq.s xor.s
1c1
< .file "neq.cpp"
---
> .file "xor.cpp"
Using ^ or != is strictly equivalent for unsigned char.

My initial goal was to optimize my code, but there is nothing we can do to improve it in C++ itself. To perform optimizations, we will use the compiler. Let us take a look at the assembler code, skipping the first line because it is irrelevant

 .text
 .globl _Z1fhh
 .type _Z1fhh, @function
_Z1fhh:
.LFB0:
 .cfi_startproc
 pushq %rbp
 .cfi_def_cfa_offset 16
 .cfi_offset 6, -16
 movq %rsp, %rbp
 .cfi_def_cfa_register 6
 movl %edi, %edx
 movl %esi, %eax
 movb %dl, -4(%rbp)
 movb %al, -8(%rbp)
 movzbl -4(%rbp), %eax
 cmpb -8(%rbp), %al
 setne %al
 popq %rbp
 .cfi_def_cfa 7, 8
 ret
 .cfi_endproc
.LFE0:
 .size _Z1fhh, .-_Z1fhh
 .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609"
 .section .note.GNU-stack,"",@progbits
Let us compile with different sets of flags. -O, or -O1 which is equivalent, drastically reduces the number of instructions passed to the CPU. Indeed, the block between .cfi_startproc and .cfi_endproc is reduced to 3 instructions, against 15 without the flag.
 .text
 .globl _Z1fhh
 .type _Z1fhh, @function
_Z1fhh:
.LFB0:
 .cfi_startproc
 cmpb %sil, %dil
 setne %al
 ret
 .cfi_endproc
.LFE0:
 .size _Z1fhh, .-_Z1fhh
 .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609"
 .section .note.GNU-stack,"",@progbits
-O2 can also be used, while -O3 and -Ofast do not affect the assembler code further. In my case, the option -march (-march=bdver2) also affects the generated assembler code when combined with -O2. There is also -Os to optimize for size. Because I cannot judge which one is the best based on the assembler code alone, let us benchmark the operator with a dummy program
% cat main.cpp
int main()
{
 for(unsigned char n1 = 0; n1 < 255; ++n1)
 {
  for(unsigned char n2 = 0; n2 < 255; ++n2)
  {
   for(unsigned short int i = 0; i < 65535; ++i)
   {
    n1 ^ n2;
   }
  }
 }
 unsigned char n1 = 255;
 for(unsigned char n2 = 0; n2 < 255; ++n2)
 {
  for(unsigned short int i = 0; i < 65535; ++i)
  {
   n1 ^ n2;
  }
 }
}
Then, let us write a script to display the computation time in each case
% cat script.sh
#!/bin/bash

printf "g++ main.cpp; time ./a.out"
g++ main.cpp; time ./a.out
printf "\n"
printf "g++ -O1 main.cpp -o o1; time ./o1"
g++ -O1 main.cpp -o o1; time ./o1
printf "\n"
printf "g++ -O2 main.cpp -o o2; time ./o2"
g++ -O2 main.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main.cpp -o os; time ./os"
g++ -Os main.cpp -o os; time ./os
We now make this script executable
% chmod +x script.sh
and we execute it
% ./script.sh
g++ main.cpp; time ./a.out
real 0m11.138s
user 0m11.138s
sys 0m0.000s

g++ -O1 main.cpp -o o1; time ./o1
real 0m2.649s
user 0m2.645s
sys 0m0.004s

g++ -O2 main.cpp -o o2; time ./o2
real 0m0.002s
user 0m0.001s
sys 0m0.000s

g++ -O2 -march=bdver2 main.cpp -o march; time ./march
real 0m0.002s
user 0m0.001s
sys 0m0.000s

g++ -Os main.cpp -o os; time ./os
real 0m0.002s
user 0m0.001s
sys 0m0.000s
Without a doubt, the flags optimize the code. But we don't have enough data to distinguish between -O2, -O2 -march=bdver2 and -Os, so let us change the code to disable some optimizations, closer to a real-life case, and run it only on these last three cases, because the first two might take a very long time.
% cat main2.cpp
#include <iostream> // std::cout
#include <ostream> // std::endl

bool f(const unsigned char & n1, const unsigned char & n2)
{
 return n1 ^ n2;
}

int main()
{
 bool * tmp = new bool [256];
 for(unsigned short int i = 0; i < 256; ++i)
 {
  tmp[i] = false;
 }
 for(unsigned char n1 = 0; n1 < 255; ++n1)
 {
  for(unsigned char n2 = 0; n2 < 255; ++n2)
  {
   {
    for(unsigned short int i = 0; i < 65535; ++i)
    {
     tmp[n1] += f(n1, n2);
    }
   }
  }
 }
 unsigned char n1 = 255;
 for(unsigned char n2 = 0; n2 < 255; ++n2)
 {
  {
   for(unsigned short int i = 0; i < 65535; ++i)
   {
    tmp[n1] += f(n1, n2);
   }
  }
 }
 std::cout << tmp << std::endl;
 return 0;
}
Here is the new version of the script to test only the relevant flags
% cat script2.sh
#!/bin/bash

printf "g++ -O2 main2.cpp -o o2; time ./o2"
g++ -O2 main2.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main2.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main2.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main2.cpp -o os; time ./os"
g++ -Os main2.cpp -o os; time ./os
Now, let us execute this script
% ./script2.sh
g++ -O2 main2.cpp -o o2; time ./o2
0x13cfc20
real 0m3.803s
user 0m3.803s
sys 0m0.001s

g++ -O2 -march=bdver2 main2.cpp -o march; time ./march
0x16d0c20
real 0m3.843s
user 0m3.841s
sys 0m0.000s

g++ -Os main2.cpp -o os; time ./os
0x1cc1c20
real 0m4.962s
user 0m4.953s
sys 0m0.008s
Running it several times gives similar results. -Os produced code a little slower than the ones produced by -O2 and -O2 -march=bdver2. Notice that forcing the inlining of f improves the computation time for -Os and gives results similar to the other flags.

About 50% of the time, -O2 is faster than -O2 -march=bdver2, but the relative difference is less than 1%. There is no clear gain from -march=bdver2 in this case.

 .section .text.unlikely,"ax",@progbits
.LCOLDB0:
 .text
.LHOTB0:
 .p2align 4,,15
 .globl _Z1fhh
 .type _Z1fhh, @function
_Z1fhh:
.LFB0:
 .cfi_startproc
 cmpb %sil, %dil
 setne %al
 ret
 .cfi_endproc
.LFE0:
 .size _Z1fhh, .-_Z1fhh
 .section .text.unlikely
.LCOLDE0:
 .text
.LHOTE0:
 .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609"
 .section .note.GNU-stack,"",@progbits
Here is the difference in the f function between -O2 and -O2 -march=bdver2
% diff o2.s march.s
6c6,7
< .p2align 4,,15
---
> .p2align 4,,10
> .p2align 3
To conclude, we first showed that ^ and != are strictly equivalent for unsigned char, then that the maximum optimization for this operator is achieved with -O2, and finally that -march=bdver2 does not provide any noticeable improvement. In a real-life code, other optimizations can make -Os perform better than -O2, and other flags, such as -O3, might affect the executable.

View server certificate in Thunderbird, an 11-year-old request.

Friday Oct 20, 2017 18:43, last edition on Friday Oct 20, 2017 19:23

While I was configuring the certificate for my mail server, I wanted to check that everything works fine in Thunderbird. It turns out that one cannot check the server certificate in Thunderbird. With a simple web search, I found a post from August 2006, i.e. more than 11 years ago, with the same question. The answer was that it was not possible and that no extension doing it existed. I asked whether things have changed since then, and the answer is no. This process is very simple in Firefox.

Check certificate #1
Check certificate #2
Check certificate #3
Check certificate #4

In my opinion, this should be a critical feature to have.

SVG with GZIP Compression in Apache HTTP Server

Friday Oct 20, 2017 16:45, last edition on Friday Oct 20, 2017 17:42

To save bandwidth with Apache HTTP Server, you can compress SVG files. On Ubuntu 16.04.3, Apache HTTP Server comes in version 2.4.27, which enables GZIP compression by default.

# apache2 -v
Server version: Apache/2.4.27 (Ubuntu)
Server built: 2017-09-28T00:00:00
If it is not enabled, you can enable it with mod_deflate
# a2enmod deflate
For SVG compression, edit the file /etc/apache2/mods-available/deflate.conf and add AddOutputFilterByType DEFLATE image/svg+xml inside the IfModule blocks. It should look like this
# cat /etc/apache2/mods-available/deflate.conf
<IfModule mod_deflate.c>
 <IfModule mod_filter.c>
  AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css
  AddOutputFilterByType DEFLATE application/x-javascript application/javascript application/ecmascript
  AddOutputFilterByType DEFLATE application/rss+xml
  AddOutputFilterByType DEFLATE application/xml
  AddOutputFilterByType DEFLATE image/svg+xml
 </IfModule>
</IfModule>
Restart Apache HTTP Server
# systemctl restart apache2
Now, let us check if it is working with curl and its -I argument to query only the headers.
% curl -I https://clementfevrier.fr/images/CC-BY-SA_icon.svg
HTTP/1.1 200 OK
Date: Fri, 20 Oct 2017 14:23:11 GMT
Server: Apache/2.4.27 (Ubuntu)
Upgrade: h2
Connection: Upgrade
Last-Modified: Mon, 07 Oct 2013 03:09:42 GMT
ETag: "1bf8-4e81dfbbce980"
Accept-Ranges: bytes
Content-Length: 7160
Vary: Accept-Encoding
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Frame-Options: sameorigin
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Content-Type: image/svg+xml
and now with -H 'Accept-Encoding: gzip,deflate' to let Apache HTTP Server know that it can serve the file using compression
% curl -I -H 'Accept-Encoding: gzip,deflate' https://clementfevrier.fr/images/CC-BY-SA_icon.svg
HTTP/1.1 200 OK
Date: Fri, 20 Oct 2017 14:25:19 GMT
Server: Apache/2.4.27 (Ubuntu)
Upgrade: h2
Connection: Upgrade
Last-Modified: Mon, 07 Oct 2013 03:09:42 GMT
ETag: "1bf8-4e81dfbbce980-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
X-Frame-Options: sameorigin
X-Download-Options: noopen
X-Permitted-Cross-Domain-Policies: none
Referrer-Policy: strict-origin-when-cross-origin
Content-Length: 3077
Content-Type: image/svg+xml
In this case, the file is more than 2 times smaller (7160÷3077 ≈ 2.33). Enjoy!

Identity

Friday Oct 20, 2017 13:40, last edition on Thursday Oct 26, 2017 13:56

Someone's identity evolves with time and can be multiple. For example, someone's identity for a person met at the workplace will usually be the name given by their parents; sometimes, it can be the spouse's family name. Friends and family, on the other hand, may call you by a nickname.
On the web, this identity is a pseudonym, an avatar, or an email address. Each time a new account is created on a server, a new identity for this server is created. It means that a person can have multiple identities in one place, which is usually not the case in the real world.

Building identity

There are two paths to building virtual identities. If someone decides to take a different pseudonym for each created account, then this person gains a new identity each time. It is a simple way to protect your privacy. The second choice is to keep the same pseudonym, avatar, email address, and so on, to build a strong and consistent identity, not necessarily the same as in the real world.

My identity

I have many virtual identities. Some correspond to my real-world one; this website is a perfect example, as it includes my name and a picture of myself. Others are not related at all and can be difficult to link to me. Let me give you some of my virtual identities.

As Clément Février or related

Mastodon
Twitter
Facebook
LinkedIn
Physical Review Journals
arXiv.org
Launchpad, although it was clement.analogue for more than 9 years.

As Analogue or related

Mozilla Add-ons
Mozilla Support
GitHub, but my real name is displayed.
FramaGit, but my real name is displayed.
GitLab, but a nickname is displayed.
Documentation Ubuntu-fr
Forum Ubuntu-fr

Others

Clèm on StackExchange
clement on Forum Analogue
Dr Clèm on this blog

and I can continue like that for a while.

The trend in my case is to use my real name when I need to be a public figure, which was needed when I ran as deputy deputy for the parliamentary elections, or for work; the pseudonym Analogue or related ones when it is linked to software; and my first name or nickname for more personal accounts.

Edit on October 25, 2017: Add new identities, update identity on FramaGit and GitHub.
Edit on October 26, 2017: Add new identities (Ubuntu-fr)

Welcome

Friday Oct 20, 2017 10:55

Hello,
Welcome to my blog. I decided to create this website to have a place to express myself. I wrote it from scratch to learn web development. You can find a quick presentation on the right panel or below, depending on the size of your screen and your window, and my Mastodon feed below. In the footer, you can find some of my other websites, my profile on a few social media, the contact information, and the legal notice. The only tab on the navigation bar, so far, is my Ph.D. thesis. I use PHP to develop this website. It is served exclusively through HTTPS, with strong encryption if your device supports it and medium encryption for older devices, like Android smartphones, as they don't receive updates, even for critical security issues. The web server is Apache HTTP Server, using the HTTP/2 protocol. The template is a reproduction of La France Insoumise's website, but using W3.CSS for styling instead of Bootstrap, as it is smaller, faster, and easier to use.


Dr Clément Février

I am Dr Clément Février, French, living in Grenoble. I defended my PhD on July 4th, 2016. After my defense, I ran as deputy deputy (not a typo) for the national parliamentary elections in the 1st circonscription of Isère for the political movement La France Insoumise.

