v swap

---
layout: post
title: A false trick on swapping two variables
date: 2017-8-16
categories: blog
tags: [C Lang]
description:
---
There are two typical approaches to implement an function to compute the n'th element of the Fibonacci sequence.
* One approach:
<pre><code>class Solution {
public:
int Fibonacci(int n) {
if (n <= 0) {
return 0;
}
if (n == 1) {
return 1;
}
int a_i_1 = 0;
int a_i = 1;
int tmp;
for (int i = 2; i <= n; i++) {
tmp = a_i;
a_i = a_i_1 + a_i;
a_i_1 = tmp;
}
return a_i;
}
};
</code></pre>
* Yet another approach:
<pre><code>class Solution {
public:
int Fibonacci(int n) {
if (n <= 0) {
return 0;
}
if (n == 1) {
return 1;
}
int a_i_1 = 0;
int a_i = 1;
//int tmp;
for (int i = 2; i <= n; i++) {
//tmp = a_i;
a_i = a_i_1 + a_i;
a_i_1 = a_i - a_i_1;
}
return a_i;
}
};
</code></pre>
It seems that the second approach will ease the number of registers used and the decrease in number of statements will result in a better performance. However, it is proven to be false.
Below are my experiments:
First write a <code>Makefile</code> to compile them:
<pre><code>CC = g++
CFLAGS = -Wall -g
TARGET = driver
OBJECT = driver.o
default: driver
$(TARGET): $(OBJECT)
    $(CC) $(CFLAGS) -o $(TARGET) $(OBJECT)
OBJECT: driver.cpp
    $(CC) $(CFLAGS) -o driver.cpp
clean:
-rm -f *.o *~
</code></pre>
Then <code>make</code> to compile the source <code>driver.cpp</code> and generate the binary <code>driver</code>,
Run the binary to see if anything goes right.
<pre><code>./driver</code></pre>
Dump the binary into assembly:
<pre><code>objdump -D driver > driver.s</code></pre>
Do these steps for both the versions with and without <code>tmp</code>
For the version with tmp:
<pre><code>00000000004005dc &lt;_ZN8Solution9FibonacciEi&gt;:
403 4005dc: 55 push %rbp
404 4005dd: 48 89 e5 mov %rsp,%rbp
405 4005e0: 48 89 7d e8 mov %rdi,-0x18(%rbp)
406 4005e4: 89 75 e4 mov %esi,-0x1c(%rbp)
407 4005e7: 83 7d e4 00 cmpl $0x0,-0x1c(%rbp)
408 4005eb: 7f 07 jg 4005f4 &lt;_ZN8Solution9FibonacciEi+0x18&gt;
409 4005ed: b8 00 00 00 00 mov $0x0,%eax
410 4005f2: eb 45 jmp 400639 &lt;_ZN8Solution9FibonacciEi+0x5d&gt;
411 4005f4: 83 7d e4 01 cmpl $0x1,-0x1c(%rbp)
412 4005f8: 75 07 jne 400601 &lt;_ZN8Solution9FibonacciEi+0x25&gt;
413 4005fa: b8 01 00 00 00 mov $0x1,%eax
414 4005ff: eb 38 jmp 400639 &lt;_ZN8Solution9FibonacciEi+0x5d&gt;
415 400601: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
416 400608: c7 45 f8 01 00 00 00 movl $0x1,-0x8(%rbp)
417 40060f: c7 45 f4 02 00 00 00 movl $0x2,-0xc(%rbp)
418 400616: eb 16 jmp 40062e &lt;_ZN8Solution9FibonacciEi+0x52&gt;
419 400618: 8b 45 f8 mov -0x8(%rbp),%eax
420 40061b: 89 45 f0 mov %eax,-0x10(%rbp)
421 40061e: 8b 45 fc mov -0x4(%rbp),%eax
422 400621: 01 45 f8 add %eax,-0x8(%rbp)
423 400624: 8b 45 f0 mov -0x10(%rbp),%eax
424 400627: 89 45 fc mov %eax,-0x4(%rbp)
425 40062a: 83 45 f4 01 addl $0x1,-0xc(%rbp)
426 40062e: 8b 45 f4 mov -0xc(%rbp),%eax
427 400631: 3b 45 e4 cmp -0x1c(%rbp),%eax
428 400634: 7e e2 jle 400618 &lt;_ZN8Solution9FibonacciEi+0x3c&gt;
429 400636: 8b 45 f8 mov -0x8(%rbp),%eax
430 400639: 5d pop %rbp
431 40063a: c3 retq
</code></pre>
For the version without tmp:
<pre><code>00000000004005dc &lt;_ZN8Solution9FibonacciEi&gt;:
401 4005dc: 55 push %rbp
402 4005dd: 48 89 e5 mov %rsp,%rbp
403 4005e0: 48 89 7d e8 mov %rdi,-0x18(%rbp)
404 4005e4: 89 75 e4 mov %esi,-0x1c(%rbp)
405 4005e7: 83 7d e4 00 cmpl $0x0,-0x1c(%rbp)
406 4005eb: 7f 07 jg 4005f4 &lt;_ZN8Solution9FibonacciEi+0x18&gt;
407 4005ed: b8 00 00 00 00 mov $0x0,%eax
408 4005f2: eb 46 jmp 40063a &lt;_ZN8Solution9FibonacciEi+0x5e&gt;
409 4005f4: 83 7d e4 01 cmpl $0x1,-0x1c(%rbp)
410 4005f8: 75 07 jne 400601 &lt;_ZN8Solution9FibonacciEi+0x25&gt;
411 4005fa: b8 01 00 00 00 mov $0x1,%eax
412 4005ff: eb 39 jmp 40063a &lt;_ZN8Solution9FibonacciEi+0x5e&gt;
413 400601: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%rbp)
414 400608: c7 45 f8 01 00 00 00 movl $0x1,-0x8(%rbp)
415 40060f: c7 45 f4 02 00 00 00 movl $0x2,-0xc(%rbp)
416 400616: eb 17 jmp 40062f &lt;_ZN8Solution9FibonacciEi+0x53&gt;
417 400618: 8b 45 fc mov -0x4(%rbp),%eax
418 40061b: 01 45 f8 add %eax,-0x8(%rbp)
419 40061e: 8b 45 fc mov -0x4(%rbp),%eax
420 400621: 8b 55 f8 mov -0x8(%rbp),%edx
421 400624: 29 c2 sub %eax,%edx
422 400626: 89 d0 mov %edx,%eax
423 400628: 89 45 fc mov %eax,-0x4(%rbp)
424 40062b: 83 45 f4 01 addl $0x1,-0xc(%rbp)
425 40062f: 8b 45 f4 mov -0xc(%rbp),%eax
426 400632: 3b 45 e4 cmp -0x1c(%rbp),%eax
427 400635: 7e e1 jle 400618 &lt;_ZN8Solution9FibonacciEi+0x3c&gt;
428 400637: 8b 45 f8 mov -0x8(%rbp),%eax
429 40063a: 5d pop %rbp
</code></pre>
Compare these two versions of assembly code, the corresponding parts inside the loop are:
line <code>419</code> to <code>427</code> in first version;
line <code>417</code> to <code>426</code> in second version;
They both have 6 <code>mov</code>, 1 <code>add</code>, 1 <code>addl</code>,
but the second version has one more <code>sub</code>
Notes:
To make it easier to read the assembly code, here I briefly give a map between C variables and the corresponding notation in assembly:
<code>-0x1c(%rbp)</code> is n.
<code>-0xc(%rbp)</code> is i inside the loop.
<code>-0x8(%rbp)</code> is corresponding to <code>a_i</code>.
<code>-0x4(%rbp)</code> is corresponding to <code>a_i_1</code>.
Conclusion:
There is no actual performance benefit to use subtraction in order to avoid extra temperal variables in C code, since the actual variable swap is handled by compiler.