Learning assembly - 1.0
I started learning assembly a while back but, for some reason, left it halfway without fully understanding it. I am going to give it another shot and document my journey in a series of blog posts. Mostly, I will be following Ray Seyfarth's fine book, Introduction to 64 Bit Intel Assembly Language Programming for Linux, along with some experiments of my own along the way.
What’s the point of even learning it?
A skilled assembly language coder can write code which uses less CPU time and less memory than that produced by a compiler. However, modern C and C++ compilers optimize very well, and beginning assembly programmers are no match for a good compiler.
I think, and it’s completely my own opinion, it’s the best way to understand how computers work at a deeper level. C comes very close to that but it’s still considered a higher level language.
Some other things I think you can get out of it:
- optimizing a program to run faster and more efficiently
- reducing the size of the binary
- reading disassembled code when reverse engineering
Alright, now let’s get started.
First program (does nothing, kinda)
segment .text
global _start
_start:
mov eax, 1 ; 1 is the exit syscall number
mov ebx, 5 ; the status value to return
int 0x80 ; generates a software interrupt numbered 0x80, which
; is the way Linux handles 32-bit system calls
Note: This program does nothing except exit with a non-zero status code (5).
So, what’s going on here? What the hell are eax and ebx? Those are registers: small, fast storage locations inside the CPU that hold the values it is currently working with.
segment indicates that the data or instructions following it are to be placed in the .text segment or section. In Linux, this is where the instructions of a program are located.
global _start makes the _start label visible to the linker. _start acts like the main function of a C program: it is the entry point of our assembly program.
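int 0x80 is what actually performs the syscall. It hands control to the kernel, which reads the syscall number from eax and the argument from ebx and carries out the request. For most syscalls the kernel then returns control to the program; for exit, the process simply ends.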
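Since the book is about 64-bit assembly, it is worth seeing what the same program might look like with the native 64-bit system call interface. This is a minimal sketch of my own, assuming an x86-64 Linux kernel, where exit is syscall number 60 and the first argument goes in rdi instead of ebx:

```asm
        segment .text
        global  _start
_start:
        mov     eax, 60         ; 60 is the exit syscall number in the 64-bit interface
        mov     edi, 5          ; the status value goes in edi (low half of rdi)
        syscall                 ; native 64-bit way of entering the kernel
```

Writing to eax and edi clears the upper halves of rax and rdi, so the shorter 32-bit moves are enough here. It should assemble and link exactly like the int 0x80 version and exit with the same status.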
Executing it
How do we assemble and run it? We will use an assembler, which turns the source into an object file: the instructions and data in a form ready to link with code from other object files or libraries.
We will use the yasm assembler.
yasm -f elf64 -g dwarf2 -l first.lst first.asm
- -f elf64 selects the 64-bit ELF object format, which is what Linux uses.
- -g dwarf2 adds debugging information in the DWARF2 format.
- -l first.lst writes a listing file, which shows the source next to the generated machine code in hexadecimal.
You will get two files: an object file (first.o) and a listing file (first.lst).
Now, to make an executable file we will use ld.
ld -o first first.o
You should now have an executable called first. Run it with ./first, then check its exit status:
echo $?
It should print 5, because that’s the value we put in the ebx register.
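One small experiment of my own at this point: the shell only sees the low 8 bits of the exit status, so values above 255 wrap around. A sketch to try this out, using 300 as an arbitrary test value:

```asm
        segment .text
        global  _start
_start:
        mov     eax, 1          ; exit syscall number (32-bit interface)
        mov     ebx, 300        ; deliberately larger than 255
        int     0x80            ; only the low 8 bits of ebx reach the shell
```

After assembling, linking and running it the same way, echo $? should print 44, which is 300 mod 256.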
Exercise time
I think these questions are interesting and easy to solve (they are taken from the book).
Q-1. Enter the assembly language program from this chapter and assemble and link it. Then execute the program and enter echo $?. A non-zero status indicates an error. Change the program to yield a 0 status.
We did most of it above. Now, to change the return status, we can simply change the constant moved into ebx from 5 to 0. After assembling and linking it again, you should get 0 when you run echo $?.
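For reference, here is roughly what the changed program looks like. Only the value moved into ebx differs from the original:

```asm
        segment .text
        global  _start
_start:
        mov     eax, 1          ; 1 is the exit syscall number
        mov     ebx, 0          ; status 0 means success
        int     0x80            ; ask the kernel to run the syscall
```

A common idiom for zeroing a register is xor ebx, ebx, which has the same effect here as mov ebx, 0.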
Q-2. Modify the assembly program to define main rather than _start. Assemble it and link it using gcc. What is the difference in size of the executables?
Let’s change it. First, we will look at the size of the binary we built earlier.
It’s 5.1K. Alright, now let’s replace _start with main in the source, assemble it with yasm as before, and then link it with gcc:
gcc -o first first.o
What’s the size?
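16K! That’s roughly 3x larger, a massive increase. The extra size most likely comes from the C runtime startup code and dynamic linking information that gcc adds by default.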
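For completeness, this is a sketch of the source with main in place of _start, which is the version linked with gcc above. Nothing else changes:

```asm
        segment .text
        global  main            ; gcc's startup code looks for main
main:
        mov     eax, 1          ; exit syscall number
        mov     ebx, 5          ; the status value to return
        int     0x80            ; exit directly, without returning to the C runtime
```

Because the C startup code calls main like an ordinary function, putting the status in eax and finishing with ret instead of the syscall should also work. I have kept the syscall here so the two binaries are comparable.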
That’s the end of the first part. I will try to be consistent and keep posting more of these.
Keep experimenting!