Dr Clèm's Blog

Tags: Coding C++ Assembler Optimization GCC

Are ^ and != operators equivalent on unsigned char in C++?

Wednesday Oct 25, 2017 03:51, last edition on Wednesday Oct 25, 2017 03:57

I was trying to optimize a C++ code where I check if two unsigned char are different. So the first thing that I wrote was n1 != n2. unsigned char are an integer type, and an efficient way to perform this operation is an exclusive or using bitwise operators, which is ^ in C++. So I could simply change my code to n1 ^ n2. But I was wondering if there are equivalent or if one of them is more efficient. I wrote the simplest functions for each operation in separate files.

% cat neq.cpp
bool f(unsigned char n1, unsigned char n2)
{
 return n1 != n2;
}
% cat xor.cpp
bool f(unsigned char n1, unsigned char n2)
{
 return n1 ^ n2;
}
Then, I compiled them to produce assembler code with the -S option of GCC
% g++ -S neq.cpp xor.cpp
For each source file .cpp, the corresponding assembler code is store in a .s file. If we compare the two file, they are equivalent, of course apart for the file name of the source code
% diff neq.s xor.s
1c1
< .file "neq.cpp"
---
> .file "xor.cpp"
Using ^ or != are strictly equivalent for unsigned char.

My initial goal was to optimize my code but there is nothing that we can do to improve it in C++. To perform optimizations, we will use the compiler. Let us take a look at the assembler code, where the first line is skipped because irrelevant

.text .globl _Z1fhh .type _Z1fhh, @function _Z1fhh: .LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 movl %edi, %edx movl %esi, %eax movb %dl, -4(%rbp) movb %al, -8(%rbp) movzbl -4(%rbp), %eax cmpb -8(%rbp), %al setne %al popq %rbp .cfi_def_cfa 7, 8 ret .cfi_endproc .LFE0: .size _Z1fhh, .-_Z1fhh .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits
Let us compile with different sets of flats, -O, or -O1 which is equivalent, will drastically reduce the number of instruction passed to the CPU. Indeed, the block between .cfi_startproc and .cfi_endproc will be reduced to 3 instructions, against 15 without flag.
.text .globl _Z1fhh .type _Z1fhh, @function _Z1fhh: .LFB0: .cfi_startproc cmpb %sil, %dil setne %al ret .cfi_endproc .LFE0: .size _Z1fhh, .-_Z1fhh .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits
-O2 can be used, -O3 and -Ofast will not affect the assembler code. In my case, the option -march (-march=bdver2) also affects the generate assembler code when combine with -O2. There is also -Os to optimize for size. Because I cannot judge which one is the best based on the assembler code, let us benchmark the operator with a dummy program
% cat main.cpp
int main()
{
 for(unsigned char n1 = 0; n1 < 255; ++n1)
 {
  for(unsigned char n2 = 0; n2 < 255; ++n2)
  {
   for(unsigned short int i = 0; i < 65535; ++i)
   {
    n1 ^ n2;
   }
  }
 }
 unsigned char n1 = 255;
 for(unsigned char n2 = 0; n2 < 255; ++n2)
 {
  for(unsigned short int i = 0; i < 65535; ++i)
  {
   n1 ^ n2;
  }
 }
}
Then, let us write a script to display the computation time in each case
% cat script.sh
#!/bin/bash

printf "g++ main.cpp; time ./a.out"
g++ main.cpp; time ./a.out
printf "\n"
printf "g++ -O1 main.cpp -o o1; time ./o1"
g++ -O1 main.cpp -o o1; time ./o1
printf "\n"
printf "g++ -O2 main.cpp -o o2; time ./o2"
g++ -O2 main.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main.cpp -o os; time ./os"
g++ -Os main.cpp -o os; time ./os
We now makethis script executable
% chmod +x script.sh
and we execute it
% ./script.sh g++ main.cpp; time ./a.out real 0m11.138s user 0m11.138s sys 0m0.000s g++ -O1 main.cpp -o o1; time ./o1 real 0m2.649s user 0m2.645s sys 0m0.004s g++ -O2 main.cpp -o o2; time ./o2 real 0m0.002s user 0m0.001s sys 0m0.000s g++ -O2 -march=bdver2 main.cpp -o march; time ./march real 0m0.002s user 0m0.001s sys 0m0.000s g++ -Os main.cpp -o os; time ./os real 0m0.002s user 0m0.001s sys 0m0.000s
With no doubt, the flags optimizes the code. But we don't have enough data to distinguish between -O2, -O2 -march=bdver2 and -Os, so let us change the code to deasable some optimizations closer to real life case and perform it only on these three last cases because the first two cases might take a very long time.
 % cat main2.cpp
#include <iostream> // std::cout
#include <ostream> // std::endl

bool f(const unsigned char & n1, const unsigned char & n2)
{
 return n1 ^ n2;
}

int main()
{
 bool * tmp = new bool [256];
 for(unsigned short int i = 0; i < 256; ++i)
 {
  tmp[i] = false;
 }
 for(unsigned char n1 = 0; n1 < 255; ++n1)
 {
  for(unsigned char n2 = 0; n2 < 255; ++n2)
  {
   {
    for(unsigned short int i = 0; i < 65535; ++i)
    {
     tmp[n1] += f(n1, n2);
    }
   }
  }
 }
 unsigned char n1 = 255;
 for(unsigned char n2 = 0; n2 < 255; ++n2)
 {
  {
   for(unsigned short int i = 0; i < 65535; ++i)
   {
    tmp[n1] += f(n1, n2);
   }
  }
 }
 std::cout<  return 0;
}
Here the new version of the script to test only the relevent flags
% cat script2.sh
#!/bin/bash

printf "g++ -O2 main2.cpp -o o2; time ./o2"
g++ -O2 main2.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main2.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main2.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main2.cpp -o os; time ./os"
g++ -Os main2.cpp -o os; time ./os
Now, let us execute this script
% ./script2.sh g++ -O2 main2.cpp -o o2; time ./o2 0x13cfc20 real 0m3.803s user 0m3.803s sys 0m0.001s g++ -O2 -march=bdver2 main2.cpp -o march; time ./march 0x16d0c20 real 0m3.843s user 0m3.841s sys 0m0.000s g++ -Os main2.cpp -o os; time ./os 0x1cc1c20 real 0m4.962s user 0m4.953s sys 0m0.008s
Running several times gives similar results. -Os made a code a little bit slower than the ones produced by -O2 and -O2 -march=bdver2. Notice that forcing inlining of f improves the computation time for -Os and give similar results than for the other flags.

About 50% of the time-O2 is faster -O2 -march=bdver2, but the relative difference is less than 1%. There is no clear gain from -march=bdver2 in this case.

.section .text.unlikely,"ax",@progbits .LCOLDB0: .text .LHOTB0: .p2align 4,,15 .globl _Z1fhh .type _Z1fhh, @function _Z1fhh: .LFB0: .cfi_startproc cmpb %sil, %dil setne %al ret .cfi_endproc .LFE0: .size _Z1fhh, .-_Z1fhh .section .text.unlikely .LCOLDE0: .text .LHOTE0: .ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609" .section .note.GNU-stack,"",@progbits
Here the difference in the f function
% diff o2.s march.s
6c6,7
< .p2align 4,,15
---
> .p2align 4,,10
> .p2align 3
To conclude, we first showed that ^ and != are strictly equivalent for unsigned char, then that the maximum of optimization for this operator is achived with -O2 and finally that -march=bdver2 does not provide noticeable improvement. In a real life code, other optimizations can make -Os performing better than -O2 and other flags, such as -O3, might affect the executable.

Mastodon Follow me Mastodon Share
Comments
There is no comment yet.
Dr Clément Février

I am Dr Clément Février, French, living in Grenoble. I defended my PhD on July 4th, 2016. After my defense I run as deputy deputy (not a typo) for the national parliamentary elections in the 1st circonscription of Isère for the political movement La France Insoumise.


Ça se passe dans l'Ain (entre autre). La France n'a pas les moyens de mettre 10000€ pour faire une enquête sur des malformations de bébés, on doit vraiment être un des pays les plus pauvre de la planète. L'État abandonnait déjà les vieux avec les retraites (ça coûte trop cher), la santé (ça coûte trop cher), la sécurité (ça coûte trop cher), la recherche (ça coûte trop cher), les pauvres (ça coûte trop cher), l'environnement (ça coûte trop cher),

mamot.fr/@LeMediaTV/1032052807

"Les états-unis [gentil] aurait espionner Assange [méchant]" (sauf dans Médiapart)
"Durant la manifestation, 15 policiers ont été gravement blessés par des projectiles a base d’œufs et de farines lancés par les manifestants, désigné de "ultra" par le gouvernement, et 12 manifestants auraient été légèrement blessés par des projectiles de provenances inconnus"

Si le gentil est accusé d'une exaction, alors non seulement le conditionnel est utilisé, mais en plus, les propos sont rapportés par le méchant. Exemple : "Selon [gilet jaune/Erdogan/Putin/Trump], le [gouvernement/kurdes/USA/UnPeuToutLeMonde] aurait fait un truc mal."
"Selon Erdogan [méchant], le PKK [gentil] est a l'origine de l'attaque suicide qui a tué des dizaines de civiles sur la place public."
"Erdogan [méchant] a lancé l'offensive contre les kurdes [gentils]"

"Gilet jaune blessé : les "selon" du 20 Heures" par Arrêt sur Images.

Maintenant, regardez n'importe quel article, si le conditionnel est utilisé sauf pour les exactions, où le présent de vérité général est utilisé, alors c'est un méchant/ennemis/terroriste/"ultra"/dictateur/..., si au contraire, le présent de vérité général est utilisé par défaut, alors c'est un gentil.

arretsurimages.net/emissions/c