Are ^ and != operators equivalent on unsigned char in C++?
Wednesday Oct 25, 2017 03:51, last edition on Wednesday Oct 25, 2017 03:57
I was trying to optimize a C++ code where I check if two unsigned char
are different.
So the first thing that I wrote was n1 != n2
.
unsigned char
are an integer type,
and an efficient way to perform this operation is an exclusive or using bitwise operators,
which is ^
in C++.
So I could simply change my code to n1 ^ n2
.
But I was wondering if there are equivalent or if one of them is more efficient.
I wrote the simplest functions for each operation in separate files.
% cat neq.cpp
bool f(unsigned char n1, unsigned char n2)
{
return n1 != n2;
}
% cat xor.cpp
bool f(unsigned char n1, unsigned char n2)
{
return n1 ^ n2;
}
Then, I compiled them to produce assembler code with the -S
option of GCC
% g++ -S neq.cpp xor.cpp
For each source file .cpp, the corresponding assembler code is store in a .s file.
If we compare the two file, they are equivalent, of course apart for the file name of the source code
% diff neq.s xor.s
1c1
< .file "neq.cpp"
---
> .file "xor.cpp"
Using ^
or !=
are strictly equivalent for unsigned char
.
My initial goal was to optimize my code but there is nothing that we can do to improve it in C++. To perform optimizations, we will use the compiler. Let us take a look at the assembler code, where the first line is skipped because irrelevant
.text
.globl _Z1fhh
.type _Z1fhh, @function
_Z1fhh:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, %edx
movl %esi, %eax
movb %dl, -4(%rbp)
movb %al, -8(%rbp)
movzbl -4(%rbp), %eax
cmpb -8(%rbp), %al
setne %al
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size _Z1fhh, .-_Z1fhh
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
Let us compile with different sets of flats,
-O, or -O1 which is equivalent,
will drastically reduce the number of instruction passed to the CPU.
Indeed, the block between .cfi_startproc and .cfi_endproc will be reduced to 3 instructions, against 15 without flag.
.text
.globl _Z1fhh
.type _Z1fhh, @function
_Z1fhh:
.LFB0:
.cfi_startproc
cmpb %sil, %dil
setne %al
ret
.cfi_endproc
.LFE0:
.size _Z1fhh, .-_Z1fhh
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
-O2
can be used, -O3
and -Ofast
will not affect the assembler code.
In my case, the option -march
(-march=bdver2
) also affects the generate assembler code when combine with -O2
.
There is also -Os
to optimize for size.
Because I cannot judge which one is the best based on the assembler code,
let us benchmark the operator with a dummy program
% cat main.cpp
int main()
{
for(unsigned char n1 = 0; n1 < 255; ++n1)
{
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
for(unsigned short int i = 0; i < 65535; ++i)
{
n1 ^ n2;
}
}
}
unsigned char n1 = 255;
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
for(unsigned short int i = 0; i < 65535; ++i)
{
n1 ^ n2;
}
}
}
Then, let us write a script to display the computation time in each case
% cat script.sh
#!/bin/bash
printf "g++ main.cpp; time ./a.out"
g++ main.cpp; time ./a.out
printf "\n"
printf "g++ -O1 main.cpp -o o1; time ./o1"
g++ -O1 main.cpp -o o1; time ./o1
printf "\n"
printf "g++ -O2 main.cpp -o o2; time ./o2"
g++ -O2 main.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main.cpp -o os; time ./os"
g++ -Os main.cpp -o os; time ./os
We now makethis script executable
% chmod +x script.sh
and we execute it
% ./script.sh
g++ main.cpp; time ./a.out
real 0m11.138s
user 0m11.138s
sys 0m0.000s
g++ -O1 main.cpp -o o1; time ./o1
real 0m2.649s
user 0m2.645s
sys 0m0.004s
g++ -O2 main.cpp -o o2; time ./o2
real 0m0.002s
user 0m0.001s
sys 0m0.000s
g++ -O2 -march=bdver2 main.cpp -o march; time ./march
real 0m0.002s
user 0m0.001s
sys 0m0.000s
g++ -Os main.cpp -o os; time ./os
real 0m0.002s
user 0m0.001s
sys 0m0.000s
With no doubt, the flags optimizes the code.
But we don't have enough data to distinguish between -O2
, -O2 -march=bdver2
and -Os
, so let us change the code to deasable some optimizations closer to real life case and perform it only on these three last cases because the first two cases might take a very long time.
% cat main2.cpp
#include <iostream> // std::cout
#include <ostream> // std::endl
bool f(const unsigned char & n1, const unsigned char & n2)
{
return n1 ^ n2;
}
int main()
{
bool * tmp = new bool [256];
for(unsigned short int i = 0; i < 256; ++i)
{
tmp[i] = false;
}
for(unsigned char n1 = 0; n1 < 255; ++n1)
{
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
{
for(unsigned short int i = 0; i < 65535; ++i)
{
tmp[n1] += f(n1, n2);
}
}
}
}
unsigned char n1 = 255;
for(unsigned char n2 = 0; n2 < 255; ++n2)
{
{
for(unsigned short int i = 0; i < 65535; ++i)
{
tmp[n1] += f(n1, n2);
}
}
}
std::cout<
return 0;
}
Here the new version of the script to test only the relevent flags
% cat script2.sh
#!/bin/bash
printf "g++ -O2 main2.cpp -o o2; time ./o2"
g++ -O2 main2.cpp -o o2; time ./o2
printf "\n"
printf "g++ -O2 -march=bdver2 main2.cpp -o march; time ./march"
g++ -O2 -march=bdver2 main2.cpp -o march; time ./march
printf "\n"
printf "g++ -Os main2.cpp -o os; time ./os"
g++ -Os main2.cpp -o os; time ./os
Now, let us execute this script
% ./script2.sh
g++ -O2 main2.cpp -o o2; time ./o2
0x13cfc20
real 0m3.803s
user 0m3.803s
sys 0m0.001s
g++ -O2 -march=bdver2 main2.cpp -o march; time ./march
0x16d0c20
real 0m3.843s
user 0m3.841s
sys 0m0.000s
g++ -Os main2.cpp -o os; time ./os
0x1cc1c20
real 0m4.962s
user 0m4.953s
sys 0m0.008s
Running several times gives similar results.
-Os
made a code a little bit slower than the ones produced by -O2
and -O2 -march=bdver2
.
Notice that forcing inlining of f
improves the computation time for -Os
and give similar results than for the other flags.
About 50% of the time-O2
is faster -O2 -march=bdver2
, but the relative difference is less than 1%. There is no clear gain from -march=bdver2
in this case.
.section .text.unlikely,"ax",@progbits
.LCOLDB0:
.text
.LHOTB0:
.p2align 4,,15
.globl _Z1fhh
.type _Z1fhh, @function
_Z1fhh:
.LFB0:
.cfi_startproc
cmpb %sil, %dil
setne %al
ret
.cfi_endproc
.LFE0:
.size _Z1fhh, .-_Z1fhh
.section .text.unlikely
.LCOLDE0:
.text
.LHOTE0:
.ident "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609"
.section .note.GNU-stack,"",@progbits
Here the difference in the f
function
% diff o2.s march.s
6c6,7
< .p2align 4,,15
---
> .p2align 4,,10
> .p2align 3
To conclude, we first showed that ^
and !=
are strictly equivalent for unsigned char
, then that the maximum of optimization for this operator is achived with -O2
and finally that -march=bdver2
does not provide noticeable improvement.
In a real life code, other optimizations can make -Os
performing better than -O2
and other flags, such as -O3
, might affect the executable.