
x86 Assembly Language

For those of you who have been following my eventful career, you already know that this is actually my second published tutorial.
Just to bring you up to date, my first tutorial was titled "Pas à Pas vers l'Assembleur" and was originally written in April 2009.
After that, I made a small project called asmguru, a knowledge base and library for assembly developers.

Now, welcome to Getting Your Hands Dirty in x86 Assembly Code.

Another tutorial? How come?

There are lots of books concentrating on the topic of Assembly programming in general and Windows programming using the Assembly language in particular. This saga is intended for everybody who wants to master the art of x86 assembly language under the Win32 platform and start exploiting the security aspects of the Windows operating system.
Many people believe that assembly language is dead and useless when it comes to writing real programs. To some extent I agree, but assembly is not only for coding: it is for Software Vulnerability Analysts, Bug Hunters, Shell-coders, Exploit Writers, Reverse Code Engineers, Virus Authors, and Malware Analysts. Indeed, when you spend your day reading ASM routines, ASM becomes a must.
This saga is more code-oriented than theory-oriented, so you will be provided with all materials and resources at the start of each part. I really don't favor theory when it comes to programming. In other words, the blabla is in the books; code is what you'll mostly find here. Additionally, the code will be heavily commented and structured, written with the JWasm assembler for the Intel x86 architecture, and able to run under the 32-bit Windows platform.
First and foremost, our objective is to learn:

Windows System Programming

Portable Executable Format

Self-Modifying Code

Code Protection, Anti-Reverse Code Engineering

However, many people may not be familiar with:

IA-32 Instruction Set

Assembly Language Foundations

JWasm Assembler Syntax

Win32 Programming

Tools of The Trade (IDA Pro, Immunity Debugger, WinDBG, ...)

As a consequence, before getting directly to discovering the security of Windows, we need to have a solid background and handle some prerequisites. So, we will start by giving you a complete look at the assembly basics and Windows API programming, then we will move to the heart of Windows at kernel-mode level and Windows internals. Afterwards, we will explore the Portable Executable File Format, and we will try to write self-modifying code and polymorphic and metamorphic engines. Finally, we will end our journey by looking closer at some advanced anti-reverse engineering techniques and code tampering.
Each code will be structured in the following way:

New Concepts

New APIs

Key Notes

ASM Code

Output

In some cases, as in Part I, there is a book called Programming Windows by Charles Petzold. It's an amazing book that teaches Windows programming; unfortunately, it's in the C language. I said to myself: instead of writing ASM code directly, why not reconstruct the code from the disassembled C code? You'll see how C code is disassembled, you'll reconstruct it in ASM, then you'll look at it again to see how close your code is to the disassembled one.

We will follow this Reversing, Coding, Reversing approach as much as possible.


Before we go on with this saga: every time you read it, it's going to make more sense to you. Every time you go back to the principles you'll learn here, they are going to hit you at a different level. What that means is: if you go out now, maybe the surface-level principles are hitting you; a year from now, the same principles will take on a different meaning; and five years from now (the time it took me to master assembly), they're going to have an entirely different meaning.

Happy ASM Coding

Part 0x00 : Assembly vs Itself Getting The Basics


0x00 A Few Words About Assembly
0x01 Set Up The Environment And Tools
0x02 Basic Computer Organization
0x03 Assembly Language Fundamentals
0x04 Data Transfers, Addressing, and Arithmetic
0x05 Arrays and Strings
0x06 Branching and Looping
0x07 Procedural Calls, Calling Convention, Stack Frame and Heap Structure.
0x08 Structures, Macros, and Unions.
0x09 Could You Solve This ?
0x10 Appendix and References
Part 0x01 : Assembly vs Win32 API Programming
Getting Started
0x11. Message Box
0x12. First Windows Program
The GDI Philosophy
0x13. Painting with Text using

x86 Assembly Language, Part 1


As usual, last Friday night I was hanging out with friends, picking up girls in the street and chasing after them. That night, I had a strange feeling, as if something was going to happen, but I was not sure whether it was going to be good or bad. My closest friend, Esp!oNLerAvaGe!, came out with me. I dressed up and we were ready. We took a taxi to Aïn Diab, a very lively place in Casablanca, Morocco. While walking, I saw a very gorgeous girl; everybody was looking at her, and I decided to give myself a chance and have a talk with her. On my way to her, I heard someone say loudly: "JAVA IS AWESOME".
When I heard that, I lost my focus and kept my thoughts fixed on JAVA and AWESOME. Not because I have something against Java, but because at that moment I really wasn't expecting someone to say such a thing. I kept walking towards the girl, and she was gazing at me. I said:
Me: Hi, could I have a word with you?
Her: Hi! .. ohhh yeah !
Me: I've seen you leaving the cafe, you look so adorable and I want to ask you something.
Her: Ooh .. Wh..at kind of question is th.at ?
In the meantime, there was a bunch of guys trying to figure something out. I was kinda out of it, looking at the girl but listening to the guys. Then I heard: "YOU WRITE ONCE AND RUN EVERYWHERE."
I said to myself, shouldn't it be "write once, debug everywhere"? Afterward, he said "JAVA IS THE FUTURE" and asked his friends to forget about Assembly. I decided to intervene. I said:
Sorry, but do you know what Assembly is?
The guy replied: Mm, not much actually, do you?

I said: Oh yes.
Him: Can I ask you some questions then?
Then I smiled, asked the girl to join us, and the conversation started.

What is Assembly Language?

Assembly language programming is referred to as low-level programming because each assembly language instruction performs a much lower-level task compared to an instruction in a high-level language. As a consequence, to perform the same task, assembly language code tends to be much larger than the equivalent high-level language code.

So Assembly Language is Machine Language?

In a way. Machine language is a close relative of the assembly language. Typically, there is a one-to-one correspondence between the assembly language and machine language instructions. In fact, they differ only in appearance.
The processor understands only machine language, whose instructions consist of sequences of 1s and 0s. So, you need a program that can do this magic for you! This program is called the Assembler.
"Writing code only with 1s and 0s is cumbersome, that's why we don't write machine code anymore," he murmured.
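To make the one-to-one correspondence concrete, here is a minimal sketch in C of what an assembler does for just two instructions. The function names are hypothetical and chosen for illustration; the byte values are the documented IA-32 encodings of mov eax, imm32 (opcode B8 followed by the 32-bit immediate, least significant byte first) and add eax, imm8 (bytes 83 C0 followed by the immediate). A real assembler handles many more forms, so treat this as an illustration and not as how JWasm is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode "mov eax, imm32": opcode 0xB8, then the immediate in
   little-endian byte order. Returns the number of bytes written (5). */
size_t encode_mov_eax_imm32(uint32_t imm, uint8_t *out)
{
    assert(out != NULL);
    out[0] = 0xB8;
    out[1] = (uint8_t)(imm & 0xFFu);
    out[2] = (uint8_t)((imm >> 8) & 0xFFu);
    out[3] = (uint8_t)((imm >> 16) & 0xFFu);
    out[4] = (uint8_t)((imm >> 24) & 0xFFu);
    return 5;
}

/* Encode "add eax, imm8" (the immediate is sign-extended by the CPU):
   bytes 0x83 0xC0 followed by the immediate. Returns 3. */
size_t encode_add_eax_imm8(int8_t imm, uint8_t *out)
{
    assert(out != NULL);
    out[0] = 0x83;
    out[1] = 0xC0;
    out[2] = (uint8_t)imm;
    return 3;
}
```

Calling encode_add_eax_imm8(4, buf) fills buf with 83 C0 04: one line of assembly text, one short run of machine-code bytes.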

What is an Assembler?

An assembler is a utility program that converts source code programs from assembly language into machine language, so the CPU can
understand it. A picture is worth a thousand words:

Is Assembly Language Portable?


Absolutely not! Assembly language is directly influenced by the instruction set and architecture of the processor. The instructions are native to the processor used in the system. In other words, porting an assembly language program from one computer to another with a different processor usually means starting over from scratch. For example, a program written in the Intel assembly language cannot be executed on a Motorola or an ARM processor.

Which Assembler is the Best?

There are well over a dozen different assemblers available for the x86 processor running on PCs. They have widely varying feature sets and syntax. Some are suitable for beginners, some are suitable only for advanced programmers. Some are very well documented, others have little or no documentation. Some are supported by lots of programming examples, some have very little in the way of example code. Certain assemblers have tutorials and books available that use their particular syntax, others have nothing. Some are very basic, others are very complex.

Which assembler is best, then?

Like many of life's questions, there is no simple answer to the question "which assembler is best?" This is because different people have different criteria for judging what is best. Without a universal metric for judging between various assemblers, there is no way to pick a single assembler and call it the best. In this saga, we will use an assembler called JWasm. In the next chapter, I'll tell you why we chose this assembler. Here is a small map I've designed to give you a global picture of the different assemblers.

How Does Java Relate to Assembly Language?


High-level languages such as C++ and Java have a one-to-many relationship with assembly language. A single statement in C++
expands into multiple assembly language or machine instructions. We can show how C/C++ statements expand into machine code.
Most people cannot read raw machine code, so we will use its closest relative, assembly language. The following C++ code carries out
two arithmetic operations and assigns the result to a variable. Assume myVariableA and myVariableB are integers:
int myVariableA;
int myVariableB = (myVariableA + 4) * 3;

Following is the equivalent translation to assembly language. The translation requires multiple statements because assembly language
works at a detailed level:
mov eax,myVariableA    ; move myVariableA to the eax register
add eax,4              ; add 4 to the eax register
mov ebx,3              ; move 3 to the ebx register
imul ebx               ; multiply eax by ebx
mov myVariableB,eax    ; move eax to myVariableB
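As a rough cross-check of the listing, the five instructions can be replayed one by one in C, with plain variables standing in for the registers. This is only a sketch: the Regs structure and the run_listing name are made up for illustration. The one detail taken from the Intel documentation is that the single-operand imul leaves the 64-bit signed product in the edx:eax pair.

```c
#include <stdint.h>

/* Only the three registers the listing touches. */
typedef struct {
    uint32_t eax;
    uint32_t ebx;
    uint32_t edx;
} Regs;

/* Replays the listing step by step and returns the final register state.
   The value stored into myVariableB is the final eax. */
Regs run_listing(int32_t myVariableA)
{
    Regs r = {0, 0, 0};

    r.eax = (uint32_t)myVariableA;   /* mov eax, myVariableA */
    r.eax += 4u;                     /* add eax, 4           */
    r.ebx = 3u;                      /* mov ebx, 3           */

    /* imul ebx: signed 64-bit product of eax and ebx, split into edx:eax */
    int64_t product = (int64_t)(int32_t)r.eax * (int64_t)(int32_t)r.ebx;
    r.eax = (uint32_t)((uint64_t)product & 0xFFFFFFFFu);
    r.edx = (uint32_t)((uint64_t)product >> 32);

    return r;                        /* mov myVariableB, eax */
}
```

With myVariableA equal to 1, the final eax holds 15, which matches (1 + 4) * 3 from the one-line C++ statement.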

Have a look at the figure below:

A statement in a high-level language is typically translated into several assembly language instructions, and into a lot of 1 and 0 bits in binary form. Well, ultimately there has to be something to execute the machine language instructions. This is the system hardware, which consists of digital logic circuits and the associated support.

"Pff!! This is all crap! I don't get anything in this code and I am still not convinced. In JAVA, there is a reduced risk of bugs, no absence of library routines, and programs are easier to maintain. And you don't get BORED writing long routines."

Why Should I Care?

It's fast: Assembly programs are generally faster than programs created in higher-level languages. Often, programmers write speed-essential functions in Assembly.

It's powerful: You are given unlimited power over your assembly programs. Sometimes, higher-level languages have restrictions that make implementing certain things difficult.

It's small: Assembly programs are often much smaller than programs written in other languages. This can be very useful if space is an issue.

It's magic: To investigate an application whose source code is not available (and most frequently, this is the case), it is necessary to discover and analyze its algorithm, which is spread over the jungle of assembly code. Or, to understand how a client/server application communicates, it is necessary to analyze packets and reverse engineer the undocumented protocol. Sometimes, when a specific vulnerability is exposed, a company may discover more related bugs and fix them silently with no public announcement; a person may then reverse engineer the patches to detect what changes have been made to a particular file, and possibly create exploit code for the flaw. Investigation of undocumented features of the operating system or of a file format is also carried out using Assembly.

Other tasks that can be done using this language include searching for backdoors, neutralizing viruses, customizing applications for the hacker's own goals, cracking secret algorithms: the list is endless. The area of application of Assembly language is so wide that it is much easier to list the areas to which it has no relation.

Assembly language is the only computer language that lets you talk to a computer in its native tongue, commanding the hardware to perform exactly as you say. If you like to be in charge, if you like to control things, if you're interested in details, you'll be right at home with assembly language. Believe me, Assembly is the true language for programmers! A hacker who hasn't mastered Assembly language is not a hacker, because nothing really moves without it.

Who Needs to Learn It?
Software Vulnerability Analysts, Bug Hunters, Shell-coders, Exploit Writers, Reverse Code Engineers, Virus Authors, Malware Analysts, and many more! Sometimes, some math applications or 3D games need optimization, so they call on Assembly.
For instance, consider the situation in which an infamous General Protection Fault window pops up, containing a message informing the user about a critical error. Application Programmers or Software Engineers, cursing and swearing, obediently close the application and are at a loss (they only guess that this is the program's karma). All of these messages and dumps are unintelligible to them. The situation is different for those who have mastered Assembly. These guys go to the specified address, correct the bug, and often recover unsaved data.

What Types of Programs Will I Create?

I'd also like to mention that all examples included in this saga were tested under operating systems of the Windows NT family, from Windows 2000 upwards. Therefore, although I did my best, I cannot guarantee that all examples will work under Windows 9x systems or Windows ME.

You can write desktop, networking, or database management apps;

You can write gaming and DirectX apps;

You can write crackmes, trainers, or security tools.

In Assembly, you are limited only by your imagination.

Tiny Web Browser from the WinAsm Forum.

EzProcess : Process/Thread Manager Program from the WinAsm Forum.

Oldies but Goodies, PacMan in pure ASM :

For our beloved crackers, a key generator from FOFF Team :

Why x86 Family Processors? Why Windows?


Assembly language programs can be written for any operating system and CPU model. Most people at this point are using Windows on x86 CPUs, so we will start off with programs that run in this environment. Once a basic grasp of the assembly language is obtained, it should be easy to write programs for different environments.

What Background Should I Have?

You should have programmed in at least one structured high-level language, such as Java, C, C++, Pascal, Python, or Visual Basic. Generally speaking, you should know what a variable, an array, and a string are, what functions are, and how to use an IF/WHILE statement to solve programming problems. It's not a must, but it is advisable.

Final Words

Listen, gentlemen:

Now you know that any programming task that can be done in a high-level language can also be done in Assembly language, since every high-level program ultimately has to be translated down to machine instructions, the level Assembly maps onto, for the CPU to execute it.

I hope that you also understand that Assembly is needed most when size or speed matters.

Finally, I'm sure that you get the idea that Assembly is CPU-dependent; we are focusing on the 32-bit x86 family here, under the Windows platform.

With that piece of information in hand, we shall move on to the next chapter: setting up a development environment with the right tools. What about meeting tomorrow, folks? Same time, same place. Bring your laptops.

The girl: Humm!! Impressive. Tell me, what is that question you wanted to ask me?

Me: Aha!! Let me ask you first what is your name?

Her: They call me Megabyte. You?

Me: They call me Noteworthy. Were you interested in the conversation?

Her : Oh yes :)

Me: So what about joining us tomorrow?

Her: That would be my pleasure. See you tomorrow.


X86 Assembly Language, Part 2


Hi Guys, Hi Megabyte, let's get the ball rolling.

In this chapter, I provide a brief introduction to Assembly language programming tools. This chapter is intended for beginners; therefore, experienced programmers can skip it.

To program in Assembly, you will need some software, namely an assembler and a code editor, as we saw in chapter 1. An assembler takes the written assembly code and converts it into machine code; it comes with a linker that links the assembled files and produces an executable from them (.exe extension).
Sometimes a crash may happen when the program cannot continue its execution normally, or cannot even run, because of a programming bug. Fortunately, there is a program called the debugger that runs other programs, allowing its user to exercise some degree of control over them and to examine them when things go amiss.
Another tool, as you may have guessed, is the disassembler, which translates executable code into assembly language: the inverse operation to that of an assembler. Note that it is impossible to recover 100% of the original source code from the disassembled code.
Finally, there is a tool called a resource compiler, which I'm going to explain later in this saga.
For each tool, there is quite a good selection that can do the job very well:

Code Editor: (Notepad++, UltraEdit, VIM, ...)

Assemblers: (JWasm, GoAsm, yASM, Fasm, ...)

Linker: (JWlink, Link, PoLink, ...)

Resource Compiler: (Microsoft RC, PoRC, GoRC, ...)

Debugger: (OllyDBG, Immunity Debugger, WinDBG, SoftICE, ...)

Disassembler: (IDA Pro, Win32Dasm, HDasm, ...)

There are some people who like the old-fashioned style and use each piece of software separately, and others who prefer something we call the IDE.
An Integrated Development Environment (IDE) is a toolkit, an all-in-one utility that provides comprehensive facilities to programmers for software development. An IDE normally consists of a:

Source Code Editor + [Assembler + Linker + Resource Compiler]

For each tool, there are a lot of choices around, so pick the one you like. For me, I've highlighted in red the ones that concern this saga, and I'm going to tell you why. You don't have to stick with my choices if you know what you are doing, and you don't have to pay anything since all of them are free.
Assembler / Linker:
It goes without saying that MASM, originally by Microsoft, is the king of the hill when it comes to the sheer volume of books describing how to program in assembly language. There are literally dozens and dozens of books that use MASM as their assembler of choice for teaching assembly language.
The real problem with MASM is its restrictive license, and also that it is not updated constantly but only on an as-needed basis by Microsoft. Some will say that MASM is not a true low-level assembler because it substantially hides the beauty of the Assembly language and its capabilities, and that using macros makes you forget that programming is, for many individuals, an art.
In my humble opinion, this criticism is a flattering idea but not true, because no one forces you to use MACROS, UNIONS or any sort of high-level structures in your coding style. Anyway, these views often come into conflict. However, this relates to specific philosophical aspects; therefore, this topic will not be covered in this saga.
JWasm fixes it all: it takes MASM, and adds more:

JWasm is free, has no artificial license restrictions, and can be used to create binaries for any OS.

JWasm's source is open. Hence JWasm is able to run natively on Windows, DOS, Linux, FreeBSD and OS/2.

More output formats are supported (Bin, ELF).

Optionally, very small object modules can be created.

Better support for Open Watcom, for example the register-based calling convention.

JWasm is faster than MASM.

We will use PoLink as a linker. We can use LINK (the Microsoft linker) too; the main difference between them is that PoLink accepts RES files for resources, whereas LINK wants an OBJ file. Another difference is that PoLink can make smaller EXEs with the right switches, and it is more up to date.
If you need more information about JWasm, visit its official website, where you can also download it. Also, you need to download the MASM32 package, which contains all the import libraries and include files for building Win32 applications or DLLs (Dynamic Link Libraries).
It is also possible to use WinInc, a set of include files for MASM created by h2incx. To quote the JWasm website: "Be aware that WinInc is intended for people being familiar with the command line interface and experienced in programming (not necessarily Assembler, however). There is also no installer supplied, just a compressed package of directories and files together with a simple README.TXT trying to explain things." That's why we will omit it, but it is also an excellent choice of headers.
Debugger/Disassembler
Now, we will look at some of the differences between several of the most widely used debuggers and disassemblers. This is by no means exhaustive. Consider it a brief overview that gives people new to assembly and reversing a quick start.
Before we look at IDA Pro (Free), Immunity Debugger (ImmDBG) and Olly Debugger (OllyDBG), we must first fully understand the differences between a debugger and a disassembler. I have heard these terms used interchangeably, but they are two separate tools. A disassembler takes a binary and breaks it down into human-readable assembly. With a disassembler, you can take a binary and see exactly how it functions (static analysis), whereas with a debugger you can step through, break and edit the assembly while it is executing (dynamic analysis).
IDA Pro (Free)

Honestly, IDA Pro should be in a category by itself. It is an interactive, extensible disassembler and debugger. IDA is also programmable, with a complete development environment. This allows users to build plug-ins and scripts to assist them in their research. The standard version of IDA is quite expensive and gives you support for over 50 families of processors. But for someone who is new to reversing and disassembling, the free version will do just fine.
One of the main advantages you'll notice that IDA has over Immunity Debugger (ImmDBG) and Olly Debugger (OllyDBG) is its platform support. IDA is available for Windows and Linux as well as Mac OS X.
You can download it here.
Olly Debugger (OllyDBG)

OllyDBG is a user-friendly, very small and portable 32-bit user-mode debugger with an intuitive interface. As you gain experience, you'll be able to discover how powerful OllyDBG is. OllyDBG knows most of the Windows APIs, so when you're examining your binary, it will show you what each register and parameter means. Unfortunately, it does not understand Microsoft's symbol file format or debug information.

Immunity Debugger (ImmDBG)

Immunity Debugger is very similar to OllyDBG; the only new features ImmDBG offers over Olly are Python scripting and function graphing, both of which are already supported in Olly through plug-ins. There are also plug-ins to fix the numerous bugs Olly has as well. This is what it's all about.

Official version of OllyDBG is available to download from here.

Official version of Immunity Debugger, is available to download from here.

Personally, I'm going to choose Immunity Debugger; nevertheless, you know, sometimes you get bored, you feel that you need a change, and you switch to Olly :). When I do cracking sessions, it's in OllyDBG; when I do exploitation sessions, it's in Immunity Debugger. It is a question of taste.

Integrated Development Environment


In this case too, there are plenty of IDEs, and all of them are quite awesome. I made a screenshot of the most famous ones:

Again, to avoid putting images of all of them, here is a quick album you can look at. Choose the one you get a good feel for. There is not really a significant difference between them.

If you don't want to bother yourself, follow me: we will set up EasyCode. It's very suitable for beginners and makes things easier than was ever possible before. You can download it from here (MASM version).
Once you have the JWasm Assembler, the MASM32 SDK, and the EasyCode IDE, extract them to a folder on your hard disk. You don't actually need the other tools for this part; keep them for later.

Unzip the MASM32 package and run install.exe. A series of message boxes will then pop up; keep hitting OK till it asks to start extracting the package. Again, click OK till it says that the installation has proceeded to its completion and appears to have run correctly.

Unzip the EasyCode.zip file and the EasyCode.Ms folder will be created.

Place the whole EasyCode.Ms folder anywhere you like in one of your hard disks. If the folder already exists, overwrite it.

Close all applications, open the EasyCode.Ms folder and run the Settings.exe program (if possible, as an Administrator).
Choose the desired options and press the OK button.

Now extract the JWasm archive, locate JWasm.exe, and copy it in the C:\masm32\bin directory.

Run the EasyCode.exe file (located in the EasyCode.Ms\Bin folder, or from its desktop shortcut) and set the paths for the Masm32 files. To do so, use the Tools > Settings menu. Go to the Compiler/Link tab and set up the paths as below:

Apply the changes, then press OK. Now that we have our tools working like a charm, let's begin programming! This is the most commonly written program in the world, the "Hello World!" program. Press CTRL+N for a new project, choose "classic executable file", and uncheck all the options:
Copy and paste the following code in your IDE:
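; MessageBox.asm: displays a greeting in a message box
; (c) Ayoub Faouzi aka Lord Noteworthy / i753CURi7Y Team 2012

.386
.model flat, stdcall                    ; 32-bit flat memory model, stdcall convention
option casemap:none                     ; keep API names case-sensitive

; Assumption: MASM32 is installed in its default location (C:\masm32).
; Drop these lines if your IDE already supplies the includes.
include \masm32\include\windows.inc     ; MB_OK, MB_ICONASTERISK, NULL
include \masm32\include\user32.inc      ; MessageBox prototype
include \masm32\include\kernel32.inc    ; ExitProcess prototype
includelib \masm32\lib\user32.lib
includelib \masm32\lib\kernel32.lib

.Data
MsgBoxCaption   DB "Simple Message Box", 0
MsgBoxText      DB "Hello, 0ld W0rld !", 0

.Code
start:
    push MB_OK + MB_ICONASTERISK        ; uType: OK button plus information icon
    push offset MsgBoxCaption           ; lpCaption
    push offset MsgBoxText              ; lpText
    push NULL                           ; hWnd: no owner window
    call MessageBox                     ; stdcall: the callee cleans up the stack
    invoke ExitProcess, NULL            ; exit with code 0
End start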

Press F7 to build the project; you'll be asked to save it. First of all, I recommend you create a new folder called Projects in EasyCode.Ms and save all your projects in it. Afterward, create a new folder in the Projects directory, call it myFirstProgram, and save all the files there:

myFirstProgram.ecp (The Project File).

myFirstProgram.asm (The Assembly code file).

Press CTRL+F5 to run it:

Congratulations, you have just run your first assembly code! Take your time to discover your favorite IDE and its features. Also, you should take into consideration that IDA Pro alone requires a book, or at least a whole chapter, to be presented as it deserves, and this also goes for OllyDBG and ImmDBG. But don't worry, I'll teach you things on demand, I mean when you really need them. As an additional note, I didn't describe each IDE, and I didn't explain the main ideas and origins of why we still see some new assemblers appear and others die.
On the one hand, because you don't yet have enough knowledge to handle it; on the other hand, because even if, for the sake of argument, you picked an assembler capable of optimizing your code as effectively as possible, that's awesome, but it doesn't mean that you can write shitty code and expect the assembler to do its magic.
All in all, I want you to go straight to Assembly as a language, and I don't want you to waste your time; these things are just tools. What you need to master is the assembly language itself, the Intel x86 instruction set, and how to write great code regardless of any assembler product. This is general programming advice.
Don't believe that because something is trendy, it's good. I see some people doing something against their own inner guidance because they think the community wants them to do it that way, so people work on some subject even though they aren't terribly interested in it, because they think that they will get more prestige. This is ridiculous. Seek good science instead of popular science, because it's the most beneficial to the world. Sorry if I went off topic!
The last thing you need is the Win32 Help Library (Win32.hlp file) if you don't have Internet access; otherwise, you should browse MSDN (the Microsoft Developer Network), so we can look up API definitions and prototypes.

In this chapter, the primary goal was to get you familiar with some assembly and debugging/disassembling tools. I assume you understand that the syntax of assembly code differs slightly from one assembler to another; nevertheless, different assemblers will in the end generate the same machine code. I hope everything was clear.

X86 Assembly Language, Part 3.1



Programming in a high-level language does not require a detailed knowledge of the system hardware. Assembly language
programmers, however, should have some basic understanding of the underlying system architecture.
Although you can write software that is ignorant of these concepts, understanding the organization of the system will allow you to write code that runs as fast as possible. If you want to delve further, consult a computer architecture tutorial; you will find more information there than in an assembly-based one.
The basic operational design of a computer system is called its architecture. John Von Neumann, a pioneer in computer
design, is given credit for the architecture of most computers in use today. For example, the x86 family uses the Von Neumann
architecture. A typical Von Neumann system has three major components:
The Central Processing Unit (CPU)

The Memory

Input/Output (I/O) devices.


This refinement of the Von Neumann model combines the Arithmetic and Logic Unit (ALU) and the Control Unit
(CU) into one functional unit, the CPU. The Input and Output units are also combined into a single I/O unit.
The Arithmetic Logic Unit (ALU) performs arithmetic operations such as addition and subtraction and logical operations such as AND, OR, and NOT.

The Control Unit (CU) coordinates the sequencing of steps involved in executing machine instructions.

Registers are local storage areas within the processor that are used to hold data that is being worked on by the
processor.
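The ALU's role can be pictured as a single function that takes an operation selector and operands and returns a result. The sketch below, in C, is only a mental model: the AluOp names and the alu function are made up for illustration, the 32-bit width is an assumption, and a real ALU also updates status flags, which are left out here.

```c
#include <assert.h>
#include <stdint.h>

/* The operations named in the text: two arithmetic, three logical. */
typedef enum { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_NOT } AluOp;

/* Applies one operation to 32-bit operands, the way a control unit
   would select an ALU function. Arithmetic wraps modulo 2^32. */
uint32_t alu(AluOp op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ALU_ADD: return a + b;
    case ALU_SUB: return a - b;
    case ALU_AND: return a & b;
    case ALU_OR:  return a | b;
    case ALU_NOT: return ~a;      /* unary: the second operand is ignored */
    }
    assert(0 && "unknown ALU operation");
    return 0;
}
```

In this picture, the values passed in and returned play the part of registers, and the code that picks the AluOp plays the part of the Control Unit.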

These three major components are interconnected together using the System Bus, which is made up of:
The Control Bus

The Data Bus

The Address Bus


The CPU communicates with memory and I/O devices by placing a numeric value on the address bus to select one of the
memory locations or I/O device port locations, each of which has a unique binary numeric address. Then the CPU, I/O, and memory
devices pass data among themselves by placing the data on the data bus. The control bus contains signals that determine the
direction of the data transfer:
to memory or from memory.

to I/O device or from I/O device.


During a memory read or write operation, the address bus contains the address of the memory location where the data is to be
read from or written to. Note that the terms read and write are with respect to the CPU: the CPU reads data from memory and
writes data into memory. If data is to be read from memory then the data bus contains the value read from that address in memory. If
the data is to be written into memory then the data bus contains the data value to be written into memory.

The CPU (Central Processing Unit)

The CPU is the main component, the brain of the computer. It executes a sequence of instructions, each of which performs
some primitive operation, such as adding two numbers. An instruction is encoded in binary form as a sequence of 1s and 0s.
The instructions supported by a particular processor and their byte-level encodings are known as its instruction-set
architecture (ISA). Different families of processors, such as Intel, MIPS, PowerPC, Motorola, Zilog, Texas
Instruments, and the ARM processor family, have different ISAs.
As you can see, there are several commonly used computer architectures. Since the bulk of the processors in use in PCs today are Intel
x86, and x86 is the dominant architecture for the world's desktop computers, we will focus on that architecture from here on.
You have seen me writing x86 from the beginning of this saga, but you may wonder what exactly that means. To start,
let's go through a brief history of Intel's family of microprocessors.
Evolution from the 4004 to today's microprocessors (Core i7)
I'm going to narrate the story of Intel. Keep your eyes open, guys, and if you come across any unfamiliar word, ask me.

Intel introduced microprocessors way back in 1971. Their first microprocessor was the 4-bit 4004. This was followed by
the 8080 and 8085 processors. The work on these early microprocessors led to the development of the Intel
architecture (IA). The first processor in the Intel family was the 8086 processor, introduced in 1978. It has a 20-bit
address bus and a 16-bit data bus.

The 8088 is a less expensive version of the 8086 processor. The cost reduction is obtained by using an 8-bit data bus.
Except for this difference, the 8088 is identical to the 8086 processor. Intel introduced segmentation with these
processors. These processors can address up to four segments of 64KB each. This IA segmentation is referred to as
real-mode segmentation.
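As a small preview of real-mode segmentation, here is a hedged sketch in C of how a 16-bit segment and a 16-bit offset combine into one 20-bit physical address. The rule (segment shifted left by 4 bits, plus offset) is not spelled out in the text above; it is the standard 8086 behaviour, and the function name `real_mode_physical` is made up for this example.

```c
#include <stdint.h>

/* Hypothetical helper: 8086 real-mode address = segment * 16 + offset. */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset) {
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    /* The 8086 has only 20 address lines, so the result wraps at 1 MB. */
    return linear & 0xFFFFFu;
}
```

For example, segment `0x1234` with offset `0x0010` selects physical address `0x12350`. Because the offset is 16 bits wide, one segment can span at most 64KB, which matches the segment size mentioned above.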

X86 Assembly Language, Part 3.2


Gee! Stop! What's segmentation?
Memory segmentation

The 80186 is a faster version of the 8086. It also has a 20-bit address bus and a 16-bit data bus, but has an
improved instruction set. The 80186 was never widely used in computer systems.
The real successor to the 8086 is the 80286, which was introduced in 1982. It has a 24-bit address bus, which
implies 16MB of memory address space. The data bus is still 16 bits wide, but the 80286 has some memory
protection capabilities. It introduced protected mode into the Intel architecture. Segmentation in this new
mode is different from real-mode segmentation. We present details on this new segmentation later. The
80286 is backward compatible in that it can run 8086-based software.
Huh, what's protected mode? And how many modes can the processor run in?
And what is backward compatibility?
Each processor introduced into the Intel family since the 8086 has been backward-compatible with earlier processors. This approach
enables older software to run on newer computers without modification or recompilation. Newer software eventually appeared,
requiring features of more advanced processors.

Intel introduced its first 32-bit processor, the 80386, in 1985. It has a 32-bit data bus and a 32-bit address bus.
It follows Intel's 32-bit architecture, known as IA-32. The memory address space has grown substantially (from
16MB to 4GB). This processor introduced paging into the Intel architecture. It also allowed
definition of segments as large as 4GB. This effectively allowed for a flat model (i.e., effectively turning off
segmentation). Later sections present details on this aspect. Like the 80286, it can run all the programs written
for 8086 and 8088 processors.
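The address-space figures quoted so far all come from the same arithmetic: an address bus with n lines can select 2^n distinct byte addresses. Here is a minimal sketch in C; the function name `address_space_bytes` is invented for this example, and it assumes fewer than 64 address bits.

```c
#include <stdint.h>

/* Hypothetical helper: number of addressable bytes for a given bus width.
   Assumes address_bits < 64 so the shift is well defined. */
uint64_t address_space_bytes(unsigned address_bits) {
    return (uint64_t)1 << address_bits;
}
```

With 20 bits (8086) this gives 1,048,576 bytes (1MB), with 24 bits (80286) 16,777,216 bytes (16MB), and with 32 bits (80386) 4,294,967,296 bytes (4GB).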
What is paging?
Paging is a technique that microprocessors use to make the memory available in a system appear larger and more
flexible than it actually is. In a paging system, a certain amount of space may be set aside on the hard drive (or on any secondary
storage), called the swap file or swap partition. The virtual memory of the system is everything a program can access as if it were
memory, and it includes physical RAM and the swap space.
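Here is a rough sketch in C of the bookkeeping behind paging. It assumes 4KB pages (the usual IA-32 page size) and a simplified single-level table; the names `page_entry` and `translate` are made up for this example, and the real IA-32 page tables are more elaborate than this.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* assumed 4 KB pages: 2^12 bytes */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical, simplified page-table entry. */
struct page_entry {
    uint32_t frame;    /* physical frame number when the page is in RAM */
    int present;       /* 0 means the page currently lives in swap space */
};

/* Returns 1 and fills *paddr when the page is in RAM.
   Returns 0 (a "page fault") when it must first be brought in from swap. */
int translate(const struct page_entry *table, uint32_t table_len,
              uint32_t vaddr, uint32_t *paddr) {
    uint32_t page = vaddr >> PAGE_SHIFT;          /* which page */
    uint32_t offset = vaddr & (PAGE_SIZE - 1u);   /* where inside the page */
    if (page >= table_len || !table[page].present) {
        return 0;
    }
    *paddr = (table[page].frame << PAGE_SHIFT) | offset;
    return 1;
}
```

If page 0 is mapped to frame 5, virtual address `0x00000ABC` translates to physical address `0x5ABC`. A page marked as not present makes `translate` return 0, which is the point where the operating system would load it from the swap file.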
Eh! Flat model, what on earth could that mean? What other models can the memory operate on?
The Intel 80486 processor was introduced in 1989. This is an improved version of the 80386. While maintaining
the same address and data buses, it integrated the coprocessor functions for performing floating-point
arithmetic. The 80486 added more parallelism to its instruction-decode and execution units to achieve a
scalar execution rate of one instruction per clock. It has an 8KB on-chip L1 cache.
Furthermore, support for the L2 cache and multiprocessing has been added. Later versions of the 80486
processors incorporated features such as energy saving mode for notebooks.
What is floating point arithmetic?
Floating-point arithmetic covers all mathematical operations on floating-point numbers, that is, numbers with a fractional part. It is
handled by the floating-point unit (FPU), a dedicated logic unit designed to work on floating-point numbers and nothing else, hence the
name. The FPU can be thought of as a specialized coprocessor that manipulates such numbers faster than the basic microprocessor
circuitry could.
Instruction per what?!!
The latest in the family is the Pentium series. It is not named 80586 because Intel found belatedly that
numbers couldn't be trademarked! The first Pentium was introduced in 1993.
The Pentium is similar to the 80486, but uses a 64-bit-wide data bus. Internally, it has 128-bit and 256-bit data
paths to speed up internal data transfers. However, the Pentium instruction set supports 32-bit operands, like
the 80486 processor. It added a second execution pipeline to achieve superscalar performance, giving it
the capability to execute two instructions per clock. It also doubled the on-chip L1 cache, with 8KB for data
and another 8KB for instructions. Branch prediction was also added. The Pentium Pro processor has a
three-way superscalar architecture. That is, it can execute three instructions per clock cycle. The address bus
has been expanded to 36 bits, which gives it an address space of 64GB. It also provides dynamic execution,
including out-of-order and speculative execution. In addition to the L1 caches provided by the Pentium, the
Pentium Pro has a 256KB L2 cache in the same package as the CPU.
The Pentium II processor added multimedia (MMX) instructions to the Pentium Pro architecture. It
expanded the L1 data and instruction caches to 16KB each. It also added more comprehensive power
management features, including Sleep and Deep Sleep modes, to conserve power during idle times.
What are multimedia MMX instructions?
The Pentium III processor introduced the streaming SIMD extensions (SSE), cache prefetch instructions, and
memory fences. SSE brings a single-instruction multiple-data (SIMD) architecture for concurrent execution of
multiple floating-point operations. The Pentium 4 enhanced these features further.
What is SIMD?
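One way to picture SIMD is a single operation applied to several data elements at once. The sketch below models that idea in plain C with four floating-point "lanes". The names `vec4` and `simd_add` are invented for this example, and the loop is only a model: on real hardware, an SSE instruction such as ADDPS performs all four additions in one instruction.

```c
#define LANES 4

/* Hypothetical model of a 128-bit SSE register holding four floats. */
struct vec4 {
    float lane[LANES];
};

/* One "instruction", four additions: each lane is added independently. */
struct vec4 simd_add(struct vec4 a, struct vec4 b) {
    struct vec4 result;
    for (int i = 0; i < LANES; i++) {
        result.lane[i] = a.lane[i] + b.lane[i];
    }
    return result;
}
```

Adding {1, 2, 3, 4} to {10, 20, 30, 40} this way yields {11, 22, 33, 44} in what the programmer sees as a single step.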

Intel's 64-bit Itanium processor is targeted for server applications. For these applications, the 32-bit memory
address space is not adequate. The Itanium uses a 64-bit address bus to provide substantially larger address
space. Its data bus is 128 bits wide. In a major departure, Intel has moved from the CISC designs used in their
32-bit processors to RISC orientation for their 64-bit Itanium processors. The Itanium also incorporates several
advanced architectural features to provide improved performance for the high-end server market.
What is the difference between RISC and CISC?
RISC and CISC stand for two different competing philosophies in designing modern computer architecture. The debate between them has
been going on for a long time and will likely continue. The difference between RISC and CISC can lie on many levels.

CISC, pronounced sisk, stands for Complex Instruction Set Computer. What is a complex instruction? For example, adding
two integers is considered a simple instruction. But an instruction that copies an element from one array to another and automatically
updates both array subscripts is considered a complex instruction.
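To see the difference, here is a small sketch in C that models the same work both ways. The names `copy_state`, `complex_copy_step` and `simple_copy_steps` are made up for this example. The first function stands in for one complex instruction (on x86, MOVSB behaves like this when the direction flag is clear); the second spells out the simple steps a simpler instruction set would need.

```c
#include <stdint.h>

/* Hypothetical state: where we read from and where we write to. */
struct copy_state {
    const uint8_t *src;
    uint8_t *dst;
};

/* Models ONE complex instruction: copy an element and advance both positions. */
void complex_copy_step(struct copy_state *s) {
    *s->dst++ = *s->src++;
}

/* The same work as a sequence of simple operations. */
void simple_copy_steps(struct copy_state *s) {
    uint8_t tmp = *s->src;   /* load the element */
    *s->dst = tmp;           /* store the element */
    s->src = s->src + 1;     /* update the source position */
    s->dst = s->dst + 1;     /* update the destination position */
}
```

Both functions leave the arrays and the two positions in the same state; the only difference is how many separate steps the programmer (or compiler) has to write.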
The philosophy behind CISC is that hardware is always faster than software, therefore one should make a powerful instruction set that
lets programmers do a lot with short assembly programs. In fact, in a CISC architecture, what you do is just keep
layering on more and more instructions. Whenever you find a sequence of instructions that a lot of people use frequently, such as one the
compiler always needs to generate, you say: let's put that whole sequence into one single complex instruction. For instance, Intel
and AMD CPUs are based on CISC architectures.
The other major architecture is RISC, which stands for Reduced Instruction Set Computer. This term is misleading; many are
under the impression that there are fewer instructions in the processor's instruction set. You should realize that RISC actually means
(Reduced Instruction) Set Computer, not Reduced (Instruction Set) Computer. That is, the goal of RISC was to reduce the complexity of
individual instructions, not necessarily to reduce the number of instructions a RISC CPU supports. RISC is a sort of pushback against
CISC designs that just keep adding things.
The argument is that, most of the time, programs use only a small subset of the instruction set, and that compiler writers often cannot
figure out how to generate the more elaborate instructions from high-level code.

IBM PowerPC processors have a RISC architecture. Apple Macs used to be based on PowerPC processors, but that is no longer true.
However, we can still find PowerPC processors in video game consoles such as the Wii, Xbox 360, and PlayStation 3. Another RISC
architecture is ARM, used extensively in consumer electronics, including:
Mobile phones (some Nokia and Sony Ericsson models).
Palm and Pocket PC PDAs, tablets, and smartphones (Samsung Galaxy, iPhone).
Digital media and music players (iPods).
Calculators and computer peripherals such as hard drives and routers.

Here is a small additional side by side comparison between the two competing architectures:

And which one is better?


Right now this is still pretty much up in the air. While the PC world is dominated by CISC processors, RISC processors are mostly
used elsewhere.
But really now, which one is better?
It is just a matter of time. Some will claim that RISC is cheaper and faster, so it is the processor that will withstand the test of time, others
say that RISC architecture puts too much of a burden on software, that the only way to go is to push the complexity to the hardware with
CISC processors, as they are becoming faster and cheaper. Yet more and more I believe that RISC and CISC processors will someday
merge because of the common goal of high performance.
That's the point: looking at the most modern processors, it becomes evident that the whole rivalry between CISC and RISC is no longer of
great importance. This is because the two architectures are converging, with CPUs from each side incorporating ideas
from the other.
Today's RISC chips support as many instructions as older CISC chips.
CISC chips are starting to use techniques that were associated with RISC chips.
Finally, you now understand why the architecture is called x86: the earliest processors in this family were identified by model numbers
ending in the sequence 86: the 8086, the 80186, the 80286, the 386, and the 486. Because one cannot establish trademark rights on
numbers, Intel and most of its competitors began to use trademark-acceptable names such as Pentium for subsequent generations of
processors, but the earlier naming scheme remains as a term for the entire family.
Registers
Most of the operations of the processor require processing data. Unfortunately, the slowest operations a processor can undertake are
reading or writing data in memory. As shown in the first figure, when the processor accesses a data element, the request must travel outside
of the processor, across the system bus, and into the memory storage unit. This process is not only complicated, but also forces the
processor to wait while the memory access is being performed. This downtime could be spent processing other instructions.
To help solve this problem, the processor includes internal memory locations called registers. The registers are capable of storing data
elements for processing without having to access the memory storage unit. The downside to registers is that only a limited number of them
are built into the processor chip. If you look carefully at the figure below, you should notice that the lower you go, the larger the
storage capacity you get, but the slower the access as well.

The IA-32 platform processors have multiple groups of registers of different sizes. They are classified according to the functions they
perform. Different processors within the IA-32 platform include specialized registers. The core groups of registers available to all processors
in the IA-32 family are shown in the following table.

Here we come to the end of our part 3. In the next tutorial, we will discuss the uses of registers in greater detail.
