Learning assembly - 1.0

I wanted to learn assembly a while back, but for some reason I left it halfway without fully understanding it. I am going to give it another shot and document my journey in a series of blog posts. Mostly, I will be following Ray Seyfarth’s fine book, Introduction to 64 Bit Intel Assembly Language Programming for Linux, with some experiments of my own along the way.

What’s the point of even learning it?

A skilled assembly language coder can write code which uses less CPU time and less memory than that produced by a compiler. However, modern C and C++ compilers do excellent optimization, and beginning assembly programmers are no match for a good compiler.

I think, and it’s completely my own opinion, it’s the best way to understand how computers work at a deeper level. C comes very close to that but it’s still considered a higher level language.

Some other things I think you can get out of it:

optimizing a program to run faster and more efficiently
reducing the size of the binary
making sense of binaries when reverse engineering

Alright, now let’s get started.

First program (does nothing, kinda)

segment .text
global _start

_start:
    mov eax, 1      ; 1 is the exit syscall number (32-bit ABI)
    mov ebx, 5      ; the status value to return
    int 0x80        ; software interrupt 0x80, which is the way
                    ; Linux handles 32-bit system calls

Note: This program does nothing except exit with a non-zero status code (5).

So, what’s going on here? What the hell are eax and ebx? Those are registers: small, fast storage locations inside the CPU that hold the values it is currently working with.

segment indicates that the data or instructions following it are to be placed in the .text segment or section. In Linux this is where the instructions of a program are located.

global _start makes the _start label visible to the linker. _start plays a role similar to main in a C program: by default, ld uses it as the entry point of our assembly program.

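int 0x80 raises a software interrupt that hands control to the kernel. The kernel reads the syscall number from eax and the first argument from ebx, so here it runs exit with status 5. Most syscalls return to the program afterwards, but exit ends the process, so this one never comes back.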

Executing it

How do we compile and run it? We will use an assembler, which turns the source into an object file: the instructions and data in a form ready to link with code from other object files or libraries.

We will use the yasm assembler.

yasm -f elf64 -g dwarf2 -l first.lst first.asm

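-f elf64 selects the 64-bit ELF object format used on Linux (the output is an object file, not an executable yet).
-g dwarf2 selects DWARF 2 as the format for debugging information.
-l first.lst writes a listing file, which shows the source next to the generated machine code in hexadecimal.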

You will get 2 files: an object file (first.o) and a listing file (first.lst).


Now, to make an executable file we will use ld, the linker.

ld -o first first.o

You should get the executable file. Run it with ./first, then check the exit status of the execution.

echo $?

It should print 5, because that’s the value we put in the ebx register.
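One experiment worth trying at this point: int 0x80 is the 32-bit way in, but we are assembling for elf64. Below is a minimal sketch of the native 64-bit variant, assuming x86-64 Linux, where exit is syscall number 60, the status goes in edi, and the syscall instruction takes the place of int 0x80. The file name first64.asm is made up for this sketch, and this is a variation on the listing above, not a replacement for it. The script writes the file and only builds it if yasm and ld are installed.

```shell
# Write a 64-bit variant of the program.
# first64.asm is a hypothetical file name chosen for this sketch.
cat > first64.asm <<'EOF'
segment .text
global _start

_start:
    mov eax, 60     ; 60 is the exit syscall number on x86-64 Linux
    mov edi, 5      ; the status value to return
    syscall         ; native 64-bit system call instruction
EOF

# Assemble, link and run only when both tools are available.
if command -v yasm >/dev/null 2>&1 && command -v ld >/dev/null 2>&1; then
    if yasm -f elf64 -o first64.o first64.asm && ld -o first64 first64.o; then
        ./first64
        echo "exit status: $?"
    fi
else
    echo "yasm or ld not found; first64.asm was written but not built"
fi
```

If everything is installed, the reported exit status should again be 5, this time taken from edi instead of ebx.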

Exercise time

I think these questions are interesting and will be easy to solve. (these are taken from the book)

Q-1. Enter the assembly language program from this chapter and assemble and link it. Then execute the program and enter echo $?. A non-zero status indicates an error. Change the program to yield a 0 status.

We did most of it above. To change the return status, we can simply change the value moved into ebx from 5 to 0. After assembling and linking it again, you should get 0 when you run the program and then echo $?.
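The "non-zero status indicates an error" part is what the shell itself relies on. Here is a small sketch of that convention, using sh -c 'exit N' as a stand-in for our binary so it runs without assembling anything. The helper name check is made up for this example.

```shell
# Run a command and report whether the shell treats it as success or failure.
# "check" is a hypothetical helper name used only in this sketch.
check() {
    if "$@"; then
        echo "success (status 0)"
    else
        # In the else branch, $? still holds the status of the command tested by if.
        echo "failure (status $?)"
    fi
}

check sh -c 'exit 0'   # stand-in for the program changed to return 0
check sh -c 'exit 5'   # stand-in for the original program
```

Swapping the stand-in for ./first should give the same kind of report for the real executable.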

Q-2. Modify the assembly program to define main rather than _start. Assemble it and link it using gcc. What is the difference in size of the executables?

Let’s change it. First, we will look at the previous size of the binary that we got.


It’s 5.1K. Alright, now replace _start with main (in both the global line and the label), assemble it with yasm again, and then link it with gcc instead of ld:

gcc -o first first.o

What’s the size?


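16K! That’s a little over 3x the size. Massive increase. The extra bytes come from the C runtime startup code and dynamic linking information that gcc links in around main.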
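Both link commands write to the same file name, first, so the gcc build overwrites the ld one. To compare them side by side, one option is to give each build its own -o name. A rough sketch, assuming the two executables were saved as first_ld and first_gcc (both names are made up here):

```shell
# Print the size of a file in bytes (wc -c counts bytes).
size_in_bytes() {
    wc -c < "$1" | tr -d ' '
}

# first_ld and first_gcc are hypothetical output names for the two builds.
for f in first_ld first_gcc; do
    if [ -f "$f" ]; then
        echo "$f: $(size_in_bytes "$f") bytes"
    else
        echo "$f: not built yet"
    fi
done
```

The exact numbers will depend on the toolchain and distribution, so they may not match the 5.1K and 16K seen here.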

This is the end of the first part. I will try to be consistent and keep posting more of these.

Keep experimenting!