GNU Assembler Examples
GAS, the GNU Assembler, is the default assembler for the GNU Operating System. It works on many
different architectures and supports several assembly language syntaxes. These examples are only for
operating systems using the Linux kernel and an x86-64 processor, however.
Getting Started
Here is the traditional Hello World program, written with Linux system calls, for a 64-bit
installation:
hello.s
# ----------------------------------------------------------------------------------------
# Writes "Hello, world" to the console using only system calls. Runs on 64-bit Linux only.
# To assemble and run:
#
#     gcc -c hello.s && ld hello.o && ./a.out
#
# or
#
#     gcc -nostdlib hello.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global _start

        .text
_start:
        # write(1, message, 13)
        mov     $1, %rax                # system call 1 is write
        mov     $1, %rdi                # file handle 1 is stdout
        lea     message(%rip), %rsi     # address of string (RIP-relative, so it also links as PIE)
        mov     $13, %rdx               # number of bytes
        syscall                         # invoke operating system to do the write

        # exit(0)
        mov     $60, %rax               # system call 60 is exit
        xor     %rdi, %rdi              # we want return code 0
        syscall                         # invoke operating system to exit
message:
        .ascii  "Hello, world\n"

$ gcc -c hello.s && ld hello.o && ./a.out
Hello, world
If you are using a different operating system, such as OSX or Windows, the system call numbers
and the registers used will likely be different.
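Even within Linux, the numbers depend on the architecture. For comparison, the write half of hello.s can be expressed in C through glibc's syscall() wrapper, which takes the system call number followed by its arguments. This is a sketch assuming glibc on Linux; write_hello is a hypothetical name. Using the SYS_write constant instead of a literal 1 lets the C library supply the correct number for whichever Linux architecture the code is compiled on.

```c
#define _GNU_SOURCE          /* glibc: make sure syscall() is declared */
#include <sys/syscall.h>     /* SYS_write: 1 on x86-64 Linux, different on other targets */
#include <unistd.h>          /* syscall() */

/* write_hello is a hypothetical helper: it performs the same
   write(1, message, 13) as hello.s, but through glibc's generic syscall()
   wrapper instead of a raw syscall instruction.
   Returns the number of bytes written, or -1 on error. */
long write_hello(void) {
    static const char message[] = "Hello, world\n";
    return syscall(SYS_write, 1, message, sizeof message - 1);  /* 13 bytes */
}
```

The exit(0) half is left out on purpose: calling it would end the program immediately.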
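Here is a version that calls puts from the C library instead of making system calls directly:
hola.s
# ----------------------------------------------------------------------------------------
# Writes "Hola, mundo" to the console using a C library. Runs on Linux or any other system
# that does not use underscores for symbols in its C library. To assemble and run:
#
#     gcc hola.s && ./a.out
# ----------------------------------------------------------------------------------------

        .global main

        .text
main:                                   # called by the C library's startup code
        sub     $8, %rsp                # align the stack to 16 bytes for the call
        lea     message(%rip), %rdi     # first integer (or pointer) parameter goes in %rdi
        call    puts                    # puts(message)
        add     $8, %rsp                # restore the stack pointer
        ret                             # return to the C library code
message:
        .asciz  "Hola, mundo"

$ gcc hola.s && ./a.out
Hola, mundo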
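In the 64-bit Linux calling convention, the first six integer or pointer arguments are passed in
rdi, rsi, rdx, rcx, r8, and r9, and floating point arguments are passed in xmm0 through xmm7.
The registers rbx, rbp, and r12 through r15 are callee-save: a function that uses them must
restore them before returning. All other registers are caller-save, so a called function may
destroy them. The stack pointer must be a multiple of 16 when a call instruction is executed.
The callee is also supposed to preserve the control bits of the MXCSR and the x87 control
word, but x87 instructions are rare in 64-bit code, so you probably don't have to worry
about this.
Integers are returned in rax or rdx:rax, and floating point values are returned in xmm0 or
xmm1:xmm0.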
This program prints the first 90 Fibonacci numbers, illustrating how registers have to be saved
and restored:
fib.s
# -----------------------------------------------------------------------------
# A 64-bit Linux application that writes the first 90 Fibonacci numbers. It
# needs to be linked with a C library.
#
# Assemble and Link:
#     gcc fib.s
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %rbx                    # we have to save this since we use it

        mov     $90, %ecx               # ecx will count down to 0
        xor     %rax, %rax              # rax will hold the current number
        xor     %rbx, %rbx              # rbx will hold the next number
        inc     %rbx                    # rbx is originally 1
print:
        # We need to call printf, but we are using rax, rbx, and rcx. printf
        # may destroy rax and rcx so we will save these before the call and
        # restore them afterwards.

        push    %rax                    # caller-save register
        push    %rcx                    # caller-save register

        lea     format(%rip), %rdi      # set 1st parameter (format)
        mov     %rax, %rsi              # set 2nd parameter (current number)
        xor     %rax, %rax              # because printf is varargs

        # Stack is already aligned because we pushed three 8-byte registers
        call    printf                  # printf(format, current number)

        pop     %rcx                    # restore caller-save register
        pop     %rax                    # restore caller-save register

        mov     %rax, %rdx              # save the current number
        mov     %rbx, %rax              # next number is now current
        add     %rdx, %rbx              # get the new next number
        dec     %ecx                    # count down
        jnz     print                   # if not done counting, do some more

        pop     %rbx                    # restore rbx before returning
        ret
format:
        .asciz  "%20ld\n"

$ gcc fib.s && ./a.out
                   0
                   1
                   1
                   2
                   3
...
  420196140727489673
  679891637638612258
 1100087778366101931
 1779979416004714189
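The register juggling is easier to follow next to the same loop in C. The sketch below is a hypothetical rendering (fib_fill and its variable names are assumptions): current plays the role of rax, next of rbx, and remaining of ecx. It stores the numbers in an array instead of printing them, so the values can be checked directly.

```c
#include <stdint.h>

/* fib_fill is a hypothetical C rendering of the loop in fib.s.
   Fills out[0..count-1] with the first count Fibonacci numbers. */
void fib_fill(uint64_t out[], int count) {
    uint64_t current = 0;   /* xor %rax, %rax */
    uint64_t next = 1;      /* xor %rbx, %rbx; inc %rbx */
    for (int remaining = count, i = 0; remaining > 0; remaining--, i++) {
        out[i] = current;           /* this is where fib.s calls printf */
        uint64_t saved = current;   /* mov %rax, %rdx */
        current = next;             /* mov %rbx, %rax */
        next += saved;              /* add %rdx, %rbx */
    }
}
```

In C the compiler decides which values live in caller-save registers and spills them around calls itself; in assembly that bookkeeping is the push/pop pair around printf.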
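Functions written in assembly can also be called from C. Here is one that returns the largest
of its three arguments:
# -----------------------------------------------------------------------------
# A 64-bit function that returns the maximum value of its three 64-bit integer
# arguments. The function has signature:
#
#   int64_t maxofthree(int64_t x, int64_t y, int64_t z)
#
# Note that the parameters have already been passed in rdi, rsi, and rdx. We
# just have to return the value in rax.
# -----------------------------------------------------------------------------

        .globl  maxofthree

        .text
maxofthree:
        mov     %rdi, %rax              # result (rax) initially holds x
        cmp     %rsi, %rax              # is x less than y?
        cmovl   %rsi, %rax              # if so, set result to y
        cmp     %rdx, %rax              # is max(x,y) less than z?
        cmovl   %rdx, %rax              # if so, set result to z
        ret                             # the max will be in rax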
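To check the logic without assembling anything, here is the same algorithm in C. It is a sketch; maxofthree_c is a hypothetical name chosen so it cannot clash with the assembly symbol. With optimization enabled, compilers often turn each if statement into a cmp/cmovl pair like the ones above, but that is not guaranteed.

```c
#include <stdint.h>

/* maxofthree_c is a hypothetical C version of maxofthree.
   Each step mirrors one compare-and-conditional-move in the assembly. */
int64_t maxofthree_c(int64_t x, int64_t y, int64_t z) {
    int64_t result = x;          /* mov %rdi, %rax */
    if (result < y) result = y;  /* cmp %rsi, %rax; cmovl %rsi, %rax (signed compare) */
    if (result < z) result = z;  /* cmp %rdx, %rax; cmovl %rdx, %rax */
    return result;               /* returned in rax */
}
```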
echo.s
# -----------------------------------------------------------------------------
# A 64-bit program that displays its command line arguments, one per line.
#
# On entry, %rdi will contain argc and %rsi will contain argv.
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %rdi                    # save registers that puts uses
        push    %rsi
        sub     $8, %rsp                # must align stack before call

        mov     (%rsi), %rdi            # the argument string to display
        call    puts                    # print it

        add     $8, %rsp                # restore %rsp to pre-aligned value
        pop     %rsi                    # restore registers puts used
        pop     %rdi

        add     $8, %rsi                # point to next argument
        dec     %rdi                    # count down
        jnz     main                    # if not done counting keep going

        ret
format:
        .asciz  "%s\n"

$ gcc echo.s && ./a.out 25782 dog huh $$
./a.out
25782
dog
huh
9971
$ gcc echo.s && ./a.out 25782 dog huh '$$'
./a.out
25782
dog
huh
$$
Note that as far as the C library is concerned, command line arguments are always strings. If
you want to treat them as integers, call atoi. Here's a little program to compute x^y. Another
feature of this example is that it shows how to restrict values to 32-bit ones.
power.s
# -----------------------------------------------------------------------------
# A 64-bit command line application to compute x^y.
#
# Syntax: power x y
# x and y are integers
# -----------------------------------------------------------------------------

        .global main

        .text
main:
        push    %r12                    # save callee-save registers
        push    %r13
        push    %r14
        # By pushing 3 registers our stack is already aligned for calls

        cmp     $3, %rdi                # must have exactly two arguments
        jne     error1

        mov     %rsi, %r12              # argv

        # We will use r13d to count down from the exponent to zero, r14d to
        # hold the value of the base, and eax to hold the running product.

        mov     16(%r12), %rdi          # argv[2]
        call    atoi                    # y in eax
        cmp     $0, %eax                # disallow negative exponents
        jl      error2
        mov     %eax, %r13d             # y in r13d

        mov     8(%r12), %rdi           # argv[1]
        call    atoi                    # x in eax
        mov     %eax, %r14d             # x in r14d

        mov     $1, %eax                # start with answer = 1
check:
        test    %r13d, %r13d            # we're counting y down to 0
        jz      gotit                   # done
        imul    %r14d, %eax             # multiply in another x
        dec     %r13d
        jmp     check
gotit:                                  # print report on success
        lea     answer(%rip), %rdi
        movslq  %eax, %rsi
        xor     %rax, %rax
        call    printf
        jmp     done
error1:                                 # print error message
        lea     badArgumentCount(%rip), %rdi
        call    puts
        jmp     done
error2:                                 # print error message
        lea     negativeExponent(%rip), %rdi
        call    puts
done:                                   # restore saved registers
        pop     %r14
        pop     %r13
        pop     %r12
        ret

answer:
        .asciz  "%d\n"
badArgumentCount:
        .asciz  "Requires exactly two arguments\n"
negativeExponent:
        .asciz  "The exponent may not be negative\n"

$ ./power 2 19
524288
$ ./power 3 -8
The exponent may not be negative
$ ./power 1 500
1
Exercise: Rewrite this example to use 64-bit numbers everywhere. You will also need to
switch from atoi to strtol.
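The library side of that exercise might look like this in C. It is a sketch, not a solution in assembly: parse_int64 and power64 are hypothetical names, and the code relies on long being 64 bits, which holds on x86-64 Linux. Unlike atoi, strtol reports where parsing stopped and sets errno on overflow, so malformed or out-of-range arguments can be rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* parse_int64 (hypothetical name): parses a whole decimal string with strtol.
   Returns false for empty input, trailing junk, or out-of-range values. */
bool parse_int64(const char *text, int64_t *out) {
    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);   /* long is 64 bits on x86-64 Linux */
    if (errno != 0 || end == text || *end != '\0') {
        return false;
    }
    *out = value;
    return true;
}

/* power64 (hypothetical name): the repeated-multiplication loop from power.s,
   widened to 64 bits. The caller must reject negative exponents, as power.s does. */
int64_t power64(int64_t x, int64_t y) {
    int64_t result = 1;
    for (; y > 0; y--) {
        result *= x;
    }
    return result;
}
```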
sum.s
# -----------------------------------------------------------------------------
# A 64-bit function that returns the sum of the elements in a floating-point
# array. The function has prototype:
#
#   double sum(double array[], unsigned length)
# -----------------------------------------------------------------------------

        .global sum

        .text
sum:
        xorpd   %xmm0, %xmm0            # initialize the sum to 0
        mov     %esi, %esi              # zero-extend length (an unsigned is only 32 bits)
        cmp     $0, %rsi                # special case for length = 0
        je      done
next:
        addsd   (%rdi), %xmm0           # add in the current array element
        add     $8, %rdi                # move to next array element
        dec     %rsi                    # count down
        jnz     next                    # if not done counting, continue
done:
        ret                             # return value already in xmm0

$ gcc callsum.c sum.s && ./a.out
26.7000000
67.2000000
0.0000000
89.1000000
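For comparison, here is the same loop in C. It is a sketch; sum_c is a hypothetical name that avoids clashing with the assembly sum. Each iteration performs one scalar double addition, which is exactly what addsd does.

```c
/* sum_c (hypothetical name): C rendering of sum.s. */
double sum_c(const double array[], unsigned length) {
    double total = 0.0;                   /* xorpd %xmm0, %xmm0 */
    for (unsigned i = 0; i < length; i++) {
        total += array[i];                /* addsd (%rdi), %xmm0 */
    }
    return total;                         /* a double result comes back in xmm0 */
}
```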
Data Sections
The text section is read-only on most operating systems, so you might find the need for a data
section. On most operating systems, the data section is only for initialized data, and you have a
special .bss section for uninitialized data. Here is a program that averages the command line
arguments, expected to be integers, and displays the result as a floating point number.
average.s
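# -----------------------------------------------------------------------------
# 64-bit program that treats all its command line arguments as integers and
# displays their average as a floating point number. This program uses a data
# section to store intermediate results, not that it has to, but only to
# illustrate how data sections are used.
# -----------------------------------------------------------------------------

        .globl  main

        .text
main:
        dec     %rdi                    # argc-1, since we don't count program name
        jz      nothingToAverage
        mov     %rdi, count(%rip)       # save number of real arguments
accumulate:
        push    %rdi                    # save registers across call to atoi
        push    %rsi
        sub     $8, %rsp                # align stack pointer for the call
        mov     (%rsi,%rdi,8), %rdi     # argv[rdi]
        call    atoi                    # now eax has the int value of arg
        add     $8, %rsp                # restore stack pointer
        pop     %rsi                    # restore registers after atoi call
        pop     %rdi
        movslq  %eax, %rax              # sign-extend the int result to 64 bits
        add     %rax, sum(%rip)         # accumulate sum as we go
        dec     %rdi                    # count down
        jnz     accumulate              # more arguments?
average:
        cvtsi2sdq sum(%rip), %xmm0      # convert the 64-bit sum to double
        cvtsi2sdq count(%rip), %xmm1    # convert the 64-bit count to double
        divsd   %xmm1, %xmm0            # xmm0 is sum/count
        lea     format(%rip), %rdi      # 1st arg to printf
        mov     $1, %rax                # printf is varargs, there is 1 non-int argument

        sub     $8, %rsp                # align stack pointer
        call    printf                  # printf(format, sum/count)
        add     $8, %rsp                # restore stack pointer

        ret

nothingToAverage:
        lea     error(%rip), %rdi
        xor     %rax, %rax
        sub     $8, %rsp                # align stack pointer
        call    printf
        add     $8, %rsp                # restore stack pointer
        ret

        .data
count:  .quad   0
sum:    .quad   0
format: .asciz  "%g\n"
error:  .asciz  "There are no command line arguments to average\n"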
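The same computation in C makes the data flow easier to follow. This is a hypothetical sketch (average_args is an assumed name) that returns the average instead of printing it, and returns 0.0 when there is nothing to average, where average.s prints an error message instead. The assembly keeps count and sum in the data section only for illustration; here they are ordinary local variables.

```c
#include <stdlib.h>

/* average_args (hypothetical name): treats argv[1..argc-1] as integers
   and returns their average as a double. */
double average_args(int argc, char *argv[]) {
    long count = argc - 1;                /* don't count the program name */
    if (count <= 0) {
        return 0.0;                       /* average.s prints an error here instead */
    }
    long sum = 0;
    for (int i = 1; i < argc; i++) {
        sum += atoi(argv[i]);             /* int result widened into the 64-bit sum */
    }
    return (double)sum / (double)count;   /* cvtsi2sdq twice, then divsd */
}
```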
Recursion
Perhaps surprisingly, there's nothing out of the ordinary required to implement recursive
functions. You just have to be careful to save registers, as usual. Here's an example. In C:
uint64_t factorial(unsigned n) {
return (n <= 1) ? 1 : n * factorial(n-1);
}
factorial.s
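# -----------------------------------------------------------------------------
# A 64-bit recursive implementation of the function
#
#     uint64_t factorial(unsigned n)
# -----------------------------------------------------------------------------

        .globl  factorial

        .text
factorial:
        mov     %edi, %edi              # zero-extend n (an unsigned is only 32 bits)
        cmp     $1, %rdi                # n <= 1?
        jnbe    L1                      # if not, go do a recursive call
        mov     $1, %rax                # otherwise return 1
        ret
L1:
        push    %rdi                    # save n on stack (also aligns %rsp!)
        dec     %rdi                    # n-1
        call    factorial               # factorial(n-1), result goes in %rax
        pop     %rdi                    # restore n
        imul    %rdi, %rax              # n * factorial(n-1), stored in %rax
        ret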
An example caller:
callfactorial.c
/*
* An application that illustrates calling the factorial function defined elsewhere.
*/
#include <stdio.h>
#include <inttypes.h>
uint64_t factorial(unsigned n);
int main() {
    for (unsigned i = 0; i < 20; i++) {
        printf("factorial(%2u) = %" PRIu64 "\n", i, factorial(i));
    }
}
SIMD Parallelism
The XMM registers can do arithmetic on floating point values one operation at a time (scalar)
or multiple operations at a time (packed). The operations have the form:
    operation xmmregister_or_memorylocation, xmmregister
For floating point addition, the instructions are:
    addpd    do 2 double-precision additions
    addsd    do just one double-precision addition, using the low 64 bits of the register
    addps    do 4 single-precision additions
    addss    do just one single-precision addition, using the low 32 bits of the register
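GCC and Clang expose the packed form directly through their vector extensions. In the sketch below (the type name v2df and both function names are hypothetical), a v2df holds two doubles, just like an XMM register used with addpd, and a single + adds both lanes at once. When compiling for x86-64, GCC typically emits addpd for add_packed and addsd for add_scalar.

```c
/* v2df (hypothetical name): a 16-byte vector of two doubles,
   the same shape as one XMM register holding packed doubles. */
typedef double v2df __attribute__((vector_size(16)));

/* Packed: both lanes are added by one operation (addpd-style). */
v2df add_packed(v2df a, v2df b) {
    return a + b;
}

/* Scalar: only one double is added (addsd-style). */
double add_scalar(double a, double b) {
    return a + b;
}
```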
Saturated Arithmetic
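The XMM registers can also do arithmetic on integers. The instructions have the form:
    operation xmmregister_or_memorylocation, xmmregister
For integer addition, the instructions are:
    paddb     do 16 byte additions
    paddw     do 8 word additions
    paddd     do 4 doubleword additions
    paddq     do 2 quadword additions
    paddsb    do 16 byte additions with signed saturation (-128..127)
    paddsw    do 8 word additions with signed saturation (-32768..32767)
    paddusb   do 16 byte additions with unsigned saturation (0..255)
    paddusw   do 8 word additions with unsigned saturation (0..65535)
With saturation, a result that would overflow is clamped to the largest or smallest
representable value instead of wrapping around.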
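Saturation is easy to model lane by lane in portable C. The sketch below (all function names are hypothetical) computes what one byte lane of paddsb or paddusb produces, then applies the signed rule to 16 lanes the way a single paddsb updates a whole XMM register. Real code would use the instruction, or a compiler intrinsic for it, rather than this loop.

```c
#include <stdint.h>

/* add_sat_i8 (hypothetical name): one lane of paddsb.
   Signed 8-bit add that clamps instead of wrapping. */
int8_t add_sat_i8(int8_t a, int8_t b) {
    int sum = a + b;                 /* exact: int is wide enough for two int8_t */
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* add_sat_u8 (hypothetical name): one lane of paddusb.
   Unsigned 8-bit add that clamps at 255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b) {
    int sum = a + b;
    return sum > UINT8_MAX ? UINT8_MAX : (uint8_t)sum;
}

/* paddsb_model (hypothetical name): like "paddsb src, dst",
   each of the 16 byte lanes becomes dst + src with signed saturation. */
void paddsb_model(int8_t dst[16], const int8_t src[16]) {
    for (int i = 0; i < 16; i++) {
        dst[i] = add_sat_i8(dst[i], src[i]);
    }
}
```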
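Local Variables and Frames
Suppose a function takes two 64-bit integer parameters, x and y, has three 64-bit local
variables, a, b, and c, sets b to 7, and returns x * b + y. On entry to the function, x will be
in %rdi, y will be in %rsi, and the return address will be on the top of the stack. Where can we
put the local variables? An easy choice is on the stack itself, though if you have enough
registers, use those.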
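For reference, a C function of the kind described here might look like the following. It is a reconstruction from the assembly in this section, not a listing taken from it: the name example matches the assembly label, and only b is declared because the locals a and c, which appear in the frame diagram, are never used.

```c
#include <stdint.h>

/* Reconstructed C counterpart of the assembly "example" function:
   x arrives in rdi, y in rsi, and the result is returned in rax. */
int64_t example(int64_t x, int64_t y) {
    int64_t b = 7;          /* the assembly keeps b in a stack slot */
    return x * b + y;
}
```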
If you are running on a system that respects the standard ABI, as Linux does, you can leave %rsp
where it is and access the "extra parameters" and the local variables directly from %rsp, for example:
        +----------+
rsp-24  |    a     |
        +----------+
rsp-16  |    b     |
        +----------+
rsp-8   |    c     |
        +----------+
rsp     | retaddr  |
        +----------+
rsp+8   | caller's |
        | stack    |
        | frame    |
        | ...      |
        +----------+
example:
        movq    $7, -16(%rsp)           # b = 7, in its slot at rsp-16
        mov     %rdi, %rax              # rax = x
        imul    -16(%rsp), %rax         # rax = x * b
        add     %rsi, %rax              # rax = x * b + y
        ret
If the function were to make another call, it would have to adjust %rsp first, because the call
instruction pushes its return address onto the area holding the local variables.
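On Windows you can't use this scheme, because if an interrupt were to occur, the area beyond the
stack pointer, where a, b, and c live in the diagram, could be overwritten at any moment. This
doesn't happen on most other operating systems because there is a "red zone" of 128 bytes past
the stack pointer which is safe from these things. (Windows also passes the first integer
arguments in rcx, rdx, r8, and r9 rather than rdi and rsi; the code below keeps the Linux
registers so that only the stack handling changes.) In this case, you make room on the stack
immediately: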
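example:
        sub     $24, %rsp               # make room for a, b, and c
        movq    $7, 8(%rsp)             # b = 7 (a full 64-bit store)
        mov     %rdi, %rax              # rax = x
        imul    8(%rsp), %rax           # rax = x * b
        add     %rsi, %rax              # rax = x * b + y
        add     $24, %rsp               # release the space for the locals
        ret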