
Engr. Julius Cansino

Assembly language is a compiled language


Source code must first be created with a text-editor program.
Then the source code will be compiled.
Assembly language compilers => assemblers

Auxiliary Programs

First: Text editor (source code editor)

Second: Assembler
Assembles source code to generate object code in the process.

Third: Linker
Combines object code modules created by the assembler.

Fourth: Loader
Built in to the operating system and never explicitly executed. Takes the relocatable code created by the linker, loads it into memory at the lowest available location, then runs it.

Fifth: Debugger
Environment for running and testing assembly language programs.

[Figure: Source Code -> Assembler -> Object Code; Object Code, Other Object Code 1, and Other Object Code 2 -> Linker -> Relocatable Code -> Loader -> RAM]

DOS
Provides the environment in which programs run.
Provides a set of helpful utility functions.
Must be understood in order to create programs in DOS.

You can use the edit command in DOS or just use Notepad.

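[Figure: CPU block diagram showing the general registers (AH/AL, BH/BL, CH/CL, DH/DL), the segment registers (CS, DS, SS, ES), the pointer and index registers (SP, BP, SI, DI), the Bus Control Unit, the ALU, the CU, the Flag Register, and the Instruction Pointer]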

Assembly language
Much thought goes into the use of the computer memory and the CPU registers.

Register
Like a memory location in that it can store a byte (or word) value. It has no address in memory; it is not part of the computer memory (it is built into the CPU).

Importance of Registers in Assembly Programming

Instructions using registers are preferred over instructions operating on values stored at memory locations:
Instructions tend to be shorter (less room to store in memory).
Register-oriented instructions operate faster than memory-oriented instructions, since the computer hardware can access a register much faster than a memory location.

AX      The accumulator
BX      The pointer register
CX      The loop counter
DX      Used for multiplication and division
SI      The source string index register
DI      The destination string index register
BP      Used for passing arguments on the stack
SP      The stack pointer
IP      The instruction pointer
CS      The code segment register
DS      The data segment register
SS      The stack segment register
ES      The extra segment register
FLAG    The flag register

CS    Code Segment           16-bit number that points to the active code segment
DS    Data Segment           16-bit number that points to the active data segment
SS    Stack Segment          16-bit number that points to the active stack segment
ES    Extra Segment          16-bit number that points to the active extra segment

IP    Instruction Pointer    16-bit number that points to the offset of the next instruction
SP    Stack Pointer          16-bit number that points to the offset that the stack is using
BP    Base Pointer           used to pass data to and from the stack

AX    Accumulator Register   mostly used for calculations and for input/output
BX    Base Register          the only one of these four registers that can be used as an index register
CX    Count Register         used for the loop instruction
DX    Data Register          input/output, and used by multiply and divide

SI    Source Index           used by string operations as source
DI    Destination Index      used by string operations as destination

AX, BX, CX, and DX are more flexible than the other registers:
They can be used as word registers (16-bit values)
Or as pairs of byte registers (8-bit values)

Each of these general-purpose registers can be split into a high byte and a low byte:
AX = AH + AL
BX = BH + BL
CX = CH + CL
DX = DH + DL

Ex: DX = 1234h, then DH = 12h and DL = 34h
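
The DX example can be checked with a few lines of Python. This is only an illustrative model, not part of the original slides; the function names split_word and join_bytes are made up for the sketch.

```python
def split_word(word):
    """Return (high_byte, low_byte) of a 16-bit value, e.g. DX -> (DH, DL)."""
    word &= 0xFFFF                     # keep only 16 bits, like a word register
    return word >> 8, word & 0xFF


def join_bytes(high, low):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


dh, dl = split_word(0x1234)
print(f"DH = {dh:02X}h, DL = {dl:02X}h")   # DH = 12h, DL = 34h
```

Note that "AH + AL" on the slide means the two bytes placed side by side, not an arithmetic sum, which is why join_bytes shifts the high byte left by 8 bits.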

The flag register consists of 9 status bits (flags). They are called flags because each one can be either:
SET (1)
NOT SET (0)

Abbr.  Name              Bit no.  Description
OF     Overflow Flag     11       indicates an overflow when set
DF     Direction Flag    10       used for string operations to check direction
IF     Interrupt Flag    9        if set, interrupts are enabled, else disabled
TF     Trap Flag         8        if set, the CPU can work in single-step mode
SF     Sign Flag         7        if set, the resulting number of a calculation is negative
ZF     Zero Flag         6        if set, the resulting number of a calculation is zero
AF     Auxiliary Carry   4        some sort of second carry flag
PF     Parity Flag       2        indicates even or odd parity
CF     Carry Flag        0        contains the carry out of the leftmost bit after calculations
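
As a rough illustration of how individual flags sit inside the 16-bit flag register, the following Python sketch lists which flags are set in a given register value. It is not from the original slides: the bit positions are those of the standard 8086 FLAGS layout, and the sample value 0241h is an arbitrary assumption chosen for the demonstration.

```python
# Bit position of each status flag in the 16-bit flag register
# (standard 8086 layout, assumed here).
FLAG_BITS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7,
             "TF": 8, "IF": 9, "DF": 10, "OF": 11}


def set_flags(flags_word):
    """Return the names of the flags whose bit is 1 in flags_word."""
    return [name for name, bit in FLAG_BITS.items() if (flags_word >> bit) & 1]


print(set_flags(0x0241))   # ['CF', 'ZF', 'IF']
```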

Do you want to see all these registers and flags?

Go to DOS.
Type debug.
Type "r".
Then you'll see all the registers and some abbreviations for the flags.
Type "q" to quit again.

How DOS uses memory

Data bus = 16-bit
It can move and store 16 bits (1 word = 2 bytes) at a time.

If the processor stores 1 word (16 bits), it stores the bytes in reverse order in memory (low byte first).
1234h (word) ---> memory: 34h (byte) 12h (byte)
Memory value: 78h 56h ---> derived value: 5678h
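
The reversed byte order can be modelled in Python as follows. This is a minimal sketch, not part of the original slides; the helper names store_word and load_word and the 16-byte memory are assumptions for illustration.

```python
def store_word(memory, address, word):
    """Store a 16-bit word the way the processor does: low byte first."""
    memory[address] = word & 0xFF
    memory[address + 1] = (word >> 8) & 0xFF


def load_word(memory, address):
    """Read two bytes from memory and combine them into a 16-bit word."""
    return memory[address] | (memory[address + 1] << 8)


memory = bytearray(16)                 # a tiny stand-in for RAM
store_word(memory, 0, 0x1234)
print(memory[:2].hex())                # 3412

memory[2:4] = bytes([0x78, 0x56])      # the bytes 78h 56h as they sit in memory
print(f"{load_word(memory, 2):04X}h")  # 5678h
```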

The computer divides its memory into segments

Standard in DOS.
Segments are 64KB big and have a number.
These numbers are stored in the segment registers.
The three main segments are the code, data, and stack segments.
They overlap each other almost completely.
Try typing d in debug:
4576:0100 -> memory address, where 4576 is the segment number and 0100 is the offset.

Segments overlap
The address 0000:0010 = 0001:0000. Therefore, segments start at paragraph boundaries.
A paragraph = 16 bytes, so a segment starts at an address divisible by 16.

0000:0010 => 0h:10h => 0:16
Memory location (linear address): (0*16)+16 = 0+16 = 16

0001:0000 => 1h:0h => 1:0
Memory location (linear address): (1*16)+0 = 16+0 = 16
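
The same calculation, written as a small Python sketch (not from the original slides). The function name linear_address is made up, and the wrap to 20 bits is an assumption that matches the original 8086 address bus.

```python
def linear_address(segment, offset):
    """segment * 16 + offset, kept to 20 bits (assumed 8086 behaviour)."""
    return ((segment << 4) + offset) & 0xFFFFF


# Two different segment:offset pairs that name the same byte.
print(linear_address(0x0000, 0x0010))            # 16
print(linear_address(0x0001, 0x0000))            # 16

# The address seen in debug, 4576:0100.
print(f"{linear_address(0x4576, 0x0100):05X}h")  # 45860h
```

Shifting left by 4 bits is the same as multiplying by 16, which is why adding 1 to the segment number moves the start of the segment by exactly one paragraph.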

My First Program

.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"

.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h

    mov ax,4c00h
    int 21h
main endp
end main

Identifiers

An identifier is a name you apply to items in your program. The two types of identifiers are "name", which refers to the address of a data item, and "label", which refers to the address of an instruction. The same rules apply to names and labels.

Statements

A program is made of a set of statements. There are two types of statements: "instructions" such as MOV and LEA, and "directives", which tell the assembler to perform a specific action, like .model small or .code.

Here's the general format of a statement:

identifier - operation - operand(s) - comment

The identifier is the name as explained above. The operation is an instruction like MOV. The operands provide information for the operation to act on, like MOV (operation) AX,BX (operands). The comment is a line of text you can add as a comment; everything the assembler sees after a ";" is ignored.

Example
MOV AX,BX ;this is a MOV instruction

The source code can only be turned into a program by an assembler and the linker. Some assemblers:
A86
MASM
TASM (we will use this one)

Install TASM

Then use tasm.exe and tlink.exe.

How to Assemble
To assemble, type the following at the command prompt:
cd c:\tasm\bin
tasm <filename/path of the source code>
Example: tasm c:\first.asm

To link, type:
tlink <filename/path of the object code>
Example: tlink c:\tasm\bin\first.obj or tlink first.obj

To run, call the .exe at the command prompt.
Example for our program (First.asm): C:\tasm\bin\first.exe or just first.exe

.model small
.stack
.data
message db "Hello world, I'm learning Assembly !!!", "$"

.code
main proc
    mov ax,seg message
    mov ds,ax
    mov ah,09
    lea dx,message
    int 21h
    mov ax,4c00h
    int 21h
main endp
end main

.model small
Lines that start with a "." are used to provide the assembler with information. The word(s) behind it say what kind of info.
In this case it just tells the assembler that the program is small and doesn't need a lot of memory. I'll get back on this later.

.stack
This one tells the assembler that the "stack" segment starts here.
The stack is used to store temporary data.

.data
indicates that the data segment starts here and that the stack segment ends there.


.code
Indicates that the code segment starts there and the data segment ends there.

main proc
Code must be in procedures, just like in C or any other language. This indicates that a procedure called main starts here.

main endp
endp states that the procedure is finished.

end main
Tells the assembler that the program is finished. It also tells the assembler where to start in the program: at the procedure called main in this case.

message db "xxxx"

DB means Define Byte, and so it does: in the data segment it defines a number of bytes. These bytes contain the information between the quotation marks. "message" is a name to identify this byte string. It's called an "identifier".

Memory space for variables

DB (Byte, 8 bits)
DW (Word, 16 bits)
DD (Doubleword, 32 bits)

Example:
foo db 27            ; by default all numbers are decimal
bar dw 3e1h          ; appending an "h" means hexadecimal
real_fat_rat dd ?    ; "?" means "don't care about the value"

Variable name
The address can't be changed.
The value can be changed.


mov ax, seg message


AX is a register.
You use registers all the time, so that's why you had to know about them before.

MOV is an instruction that moves data.
It can have a few "operands". Here the operands are AX and seg message.

seg message can be seen as a number.
It's the number of the segment "message" is in (the data segment). We have to know this number so we can load the DS register with it; otherwise we can't get to the byte string in memory. We need to know WHERE the byte string is located in memory.

The number is loaded into the AX register.
MOV always moves data to the operand left of the comma and from the operand right of the comma.

Syntax:
MOV destination, source

Allows you to move data into and out of the registers.

The destination can be either a register or a memory location. The source can be a register, a memory location, or a numeric value.

Memory-to-memory transfer is NOT ALLOWED.

Declarations we made earlier

foo db 27            ; by default all numbers are decimal
bar dw 3e1h          ; appending an "h" means hexadecimal
real_fat_rat dd ?    ; "?" means "don't care about the value"

Notice the sizes of the source and destination (they must match in reg-reg, mem-reg, and reg-mem transfers)

mov ax,bar     ; load the word-size register ax with the word value stored at location bar.
mov dl,foo     ; load the byte-size register dl with the byte value stored at location foo.
mov bx,ax      ; load the word-size register bx with the word value in ax.
mov bl,ch      ; load the byte-size register bl with the byte value in ch.
mov bar,si     ; store the value in the word-size register si at the memory location labelled "bar".
mov foo,dh     ; store the byte value in the register dh at memory location foo.
mov ax,5       ; store the word 5 in the ax register.
mov al,5       ; store the byte 5 in the al register.
mov bar,5      ; store the word 5 at location bar.
mov foo,5      ; store the byte 5 at location foo.

A constant must be consistent with the size of the destination

MOV AL, 3172
MOV foo, 3172

Why are the instructions above illegal?
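
One way to approach the question is to check whether the constant fits in the destination. The Python sketch below is only an illustration and not part of the original slides; the helper name fits is made up, and it considers unsigned constants only.

```python
def fits(value, bits):
    """True if an unsigned constant fits in a destination that is `bits` wide."""
    return 0 <= value < (1 << bits)


print(fits(3172, 8))    # False: AL and foo (declared with db) hold 8 bits
print(fits(3172, 16))   # True: a word register such as AX could hold it
print(fits(5, 8))       # True: this is why "mov al,5" is accepted
```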


mov ds,ax

Here it moves the number in the AX register (the number of the data segment) into the DS register. We have to load the DS register this way (with two instructions). Just typing "mov ds,seg message" isn't possible.

mov ah, 09

MOV again. This time it loads the AH register with the constant value nine.

lea dx, message

LEA - Load Effective Address. This instruction stores the offset within the data segment of the byte string message into the DX register. This offset is the second thing we need to know when we want to know where "message" is in memory. So now we have DS:DX.



int 21h

This instruction causes an interrupt. The processor calls a routine somewhere in memory. 21h tells the processor what kind of routine, in this case a DOS routine. For now, assume that INT just calls a procedure from DOS. The procedure looks at the AH register to find out what it has to do. In this example the value 9 in the AH register indicates that the procedure should write a byte string to the screen.

mov ax, 4c00h

Load the AX register with the constant value 4c00h. This time the AH register contains the value 4ch (AX=4c00h), and to the DOS procedure that means "exit program". The value of AL is used as an "exit code"; 00h means "no error".

int 21h

After running:
Go to DOS and type debug FIRST.exe to debug.
Type d -> display some addresses
Type u -> you will see something like this:

Segment Number & Offset    Machine Code    Instruction
0F77:0000                  B8C813          MOV AX,13C8
0F77:0003                  8ED8            MOV DS,AX
0F77:0005                  B409            MOV AH,09

Take a line such as:

0F77:0000                  B8790F          MOV AX,0F79

Originally: mov ax, seg message
B8 -> mov ax
790F -> the number 0F79 (stored low byte first)
It means that the data is stored in the segment with number 0F79.

The other instruction, lea dx,message, turned into mov dx,0.
So that means that the offset of the byte string is 0 -> 13C8:0000.
Try typing d 13C8:0000

Calculating another address for the same location
Subtract 2 from the segment number: 13C8 - 2 = 13C6.
2 segment numbers = 2 paragraphs = 32 bytes (0002:0000), so the other address is 13C6:0020.

The stack is a place where data is temporarily stored. The SS and SP registers point to that place like this: SS:SP
So the SS register is the segment and the SP register contains the offset.

There are a few instructions that make use of the stack:
PUSH - push a value onto the stack
POP - retrieve that value from the stack

MOV AX,1234H
PUSH AX
MOV AH,09
INT 21H
POP AX

First we load 1234h into AX, then we push that value onto the stack. We now store 9 in AH, so AX will be 0934h, and execute an INT. Then we pop the AX register: we retrieve the pushed value from the stack. So AX contains 1234h again; the final value of AX will be 1234h.

MOV AX, 1234H
MOV BX, 5678H
PUSH AX
POP BX

We pushed AX onto the stack and we popped that value into BX. What is the final value of AX and BX?

A stack is easily set up with the directive .stack, which creates a stack of 1024 bytes.
The stack uses a LIFO system (Last In First Out).

MOV AX,1234H
MOV BX,5678H
PUSH AX
PUSH BX
POP AX
POP BX

First the value 1234h was pushed, and after that the value 5678h was pushed onto the stack. According to LIFO, 5678h comes off first, so AX will pop that value and BX will pop the next. What is the value of AX and BX?

The stack "grows" downwards in memory. When you push a word (2 bytes), for example, SP is decreased by two and the word is stored at SS:SP. So in the beginning SP points to the top of the stack, and (if you don't pay attention) the stack can grow so far downwards in memory that it overwrites the program's code or data. A major system crash is the result.
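
The last PUSH/POP example can be traced with a small Python model of the stack. This is a sketch under stated assumptions, not part of the original slides: the class name Stack8086 is made up, the stack area is the 1024 bytes created by .stack, SP is assumed to start just past the end of that area, and no overflow checking is done.

```python
class Stack8086:
    """A toy model of a downward-growing stack of 16-bit words."""

    def __init__(self, size=1024):
        self.memory = bytearray(size)   # the stack segment
        self.sp = size                  # SP starts at the top of the stack

    def push(self, word):
        self.sp -= 2                    # SP moves down by two bytes first
        self.memory[self.sp] = word & 0xFF             # low byte
        self.memory[self.sp + 1] = (word >> 8) & 0xFF  # high byte

    def pop(self):
        word = self.memory[self.sp] | (self.memory[self.sp + 1] << 8)
        self.sp += 2                    # SP moves back up
        return word


stack = Stack8086()
ax, bx = 0x1234, 0x5678
stack.push(ax)      # PUSH AX
stack.push(bx)      # PUSH BX
ax = stack.pop()    # POP AX
bx = stack.pop()    # POP BX
print(f"AX = {ax:04X}h, BX = {bx:04X}h, SP = {stack.sp}")
# AX = 5678h, BX = 1234h, SP = 1024
```

Because the last word pushed is the first one popped, pushing two registers and popping them in the same order swaps their contents.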

If you fully understand this stuff (registers, flags, segments, stack, names, etc.) you may, from now on, call yourself a
"Level 0 Assembly Coder"

Suppose that we have 4 word-sized values stored in the variables MY, NAME, IS, NOBODY (initial values 4, 5, 6, and 32) and that we want to move these values to the variables PLAY, MISTY, FOR, ME.

Fortran Program
      INTEGER*2 MY,NAME,IS,NOBODY,PLAY,MISTY,FOR,ME
      DATA MY,NAME,IS,NOBODY/4,5,6,32/
      ....
      PLAY=MY
      MISTY=NAME
      FOR=IS
      ME=NOBODY
      ....

Assembly Version
; destination variables
play db ?
misty db ?
for db ?
me db ?
; source variables
my db 4
name db 5
is db 6
nobody db 32
.....
mov al,my          ; PLAY=MY
mov play,al
mov al,name        ; MISTY=NAME
mov misty,al
mov al,is          ; FOR=IS
mov for,al
mov al,nobody      ; ME=NOBODY
mov me,al

We can write programs in DEBUG

The reason for this is that with DEBUG we can concentrate our thoughts purely on assembly language.

DEBUG
System debugger.
Has its own built-in editor and a primitive assembler.
Its code does not need to be linked.
Also has facilities for modifying and examining memory locations.

DEBUG
Cannot be used to conveniently develop larger programs.
One must literally know the memory addresses of all data items.
An (immediate) value is distinguished from the value stored at an address in that an address is enclosed in square brackets.
MOV AX, 200      load ax with the value 200
MOV AX, [200]    load ax with the value at address 200
In DEBUG all numbers are hexadecimal, so 200 means 200H, or 512 decimal.

My Name is Nobody program, DEBUG version

Let us say we want the byte variables MY, NAME, IS, NOBODY, PLAY, MISTY, FOR, and ME to reside at memory locations 200 (hex) through 207.

Our program might look like this:

mov ax,[200]     ; PLAY=MY
mov [204],ax
mov ax,[201]     ; MISTY=NAME
mov [205],ax
mov ax,[202]     ; FOR=IS
mov [206],ax
mov ax,[203]     ; ME=NOBODY
mov [207],ax

The program may be entered with the "A" or "assemble" command followed by the address (Annnn):

-a100
48EE:0100 mov ax,[200]
48EE:0103 mov [204],ax
48EE:0106 mov ax,[201]
48EE:0109 mov [205],ax
48EE:010C mov ax,[202]
48EE:010F mov [206],ax
48EE:0112 mov ax,[203]
48EE:0115 mov [207],ax
48EE:0118

Entering a blank line terminates this process.

We can check that the program is actually in the computer at address 100 with the "U" or "unassemble" command.

You may also type U100,118 to specify the ending address to view.

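[Figure: DEBUG Program. The assembler turns MOV AX,[200] into the bytes A10002 in RAM; the U command (unassembler) reads A10002 from RAM and shows the deduced code MOV AX,[200]. For an executable program, the loader places A10002 in RAM.]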

Initialize the variables MY, NAME, IS, and NOBODY (which is to say, the values stored at memory locations 200 through 203). This can be done with the "E" or "enter" instruction (Ennnn):

-E200
419F:0200  77.4  20.5  64.6  69.20

where
77, 20, 64, 69 are the original values stored at those addresses
<space> moves the cursor to the next address
<enter> terminates the enter command

It is also possible to use DB and DW using A:

-a200
419F:200 db 4
419F:201 db 5
419F:202 db 6
419F:203 db 20

View the entered values using the "D" or "display" command:
dnnnn - display from address nnnn
dnnnn,mmmm - display from nnnn to mmmm

Running the program

Use the "G" or "Go" command:
G=nnnn,mmmm runs the program from address nnnn to mmmm
In our case: G=100,118

Verify that it really works by displaying locations 200 to 207:
-d200,207
419F:0200 04 05 06 20 72 65 63 74 .......

Terminating the Debugger
Q or Quit command

Other DEBUG commands:

R - register command
Examine or modify registers.

Modifying registers: Rrn, where rn is the name of the register (AX, BX, ...)
Ex.: to store 4567 (hex) in the CX register, type RCX and then enter 4567 at the prompt.

Instructions
ADD - Addition
SUB - Subtraction

Syntax
mnemonic destination, source
ADD destination, source
SUB destination, source

Things to remember:
no memory-to-memory operations are allowed
the source operand can be an immediate value
the sizes of the source and destination operands must match

The ADD (SUB) instruction adds (subtracts) the value of the source operand to (from) the value of the destination operand, and stores the result in the destination.

add dx,dx        ; add the DX register to itself.
add cx,5         ; add the value 5 to the cx reg.
add si,di        ; add the di register to si reg.
add bl,cl        ; add cl reg. to bl reg.
add foo,5        ; add the value 5 to the variable foo.
add foo,al       ; add contents of al to foo.
sub bar,5        ; subtract word value 5 from bar
sub bar,3e1h     ; subtract 3e1h from variable bar
sub al,foo       ; subtract value of var. foo from al
sub si,ax        ; subtract contents of ax from si

Why are the following instructions illegal?

add cl,3e1h
add cl,bx
sub foo,cx

MY NAME IS NOBODY Program

; destination variables
play db ?
misty db ?
for db ?
me db ?
; source variables
my db 4
name db 5
is db 6
nobody db 32
.....
mov al,my
mov play,al
mov al,name
mov misty,al
mov al,is
mov for,al
mov al,nobody
mov me,al

Give the code of MY NAME IS NOBODY.
Modify the program so that this time it does the equivalent of:
PLAY=MY+1
MISTY=NAME-1
FOR=IS+1
ME=NOBODY-1

INC
INC destination
Adds one to destination.

DEC
DEC destination
Subtracts one from destination.

Both are smaller and execute more quickly than the equivalent ADD or SUB of 1.

It can happen in integer addition that the result of an addition is too big for the destination to hold.

The carry flag is used to store both carries and borrows in integer addition and subtraction. Ex:

MOV AL,200
MOV BL,195
MOV CL,25
ADD AL,BL

The result, 395, does not fit in a byte, so the carry flag would be "set" to one, and the result would be truncated to 8 bits: i.e., AL would contain 139.

MOV AL,200
MOV BL,195
MOV CL,25
ADD AL,CL

The result, 225 (<256), is byte sized, so we would find that AL contains 225 and the carry flag is "cleared" to zero.

MOV AL,200
MOV BL,195
MOV CL,25
SUB AL,BL

We are subtracting a smaller number from a bigger number, so the AL register contains the result, 5, and the carry flag (which stores the "borrow") is cleared.
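
The three cases can be reproduced with a short Python model of 8-bit addition and subtraction. This is an illustrative sketch rather than part of the original slides; the function names add8 and sub8 are made up, and each returns the truncated 8-bit result together with the carry flag.

```python
def add8(a, b):
    """8-bit ADD: return (result truncated to 8 bits, carry flag)."""
    total = a + b
    return total & 0xFF, int(total > 0xFF)


def sub8(a, b):
    """8-bit SUB: return (result truncated to 8 bits, carry flag as borrow)."""
    diff = a - b
    return diff & 0xFF, int(diff < 0)


print(add8(200, 195))   # (139, 1)  ADD AL,BL: carry set
print(add8(200, 25))    # (225, 0)  ADD AL,CL: carry cleared
print(sub8(200, 195))   # (5, 0)    SUB AL,BL: no borrow
```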

ADC
ADC destination,source ("add with carry")
ADC automatically adds in the carry left over from previous operations.

SBB
SBB destination,source ("subtract with borrow")
SBB automatically subtracts the borrow left over from previous operations.
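
A typical use of ADC is adding numbers that are wider than one register, one piece at a time. The Python sketch below models adding two 16-bit values with byte-sized operations: an ADD on the low bytes followed by an ADC on the high bytes. It is not from the original slides; the function names and the sample values 12C8h and 01C3h are assumptions for illustration.

```python
def add8(a, b, carry_in=0):
    """8-bit add with optional carry in: return (8-bit result, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, int(total > 0xFF)


def add16_bytewise(x, y):
    """Add two 16-bit values one byte at a time, as ADD then ADC would."""
    low, carry = add8(x & 0xFF, y & 0xFF)                          # ADD low bytes
    high, carry = add8((x >> 8) & 0xFF, (y >> 8) & 0xFF, carry)    # ADC high bytes
    return (high << 8) | low, carry


result, carry = add16_bytewise(0x12C8, 0x01C3)
print(f"{result:04X}h, carry = {carry}")   # 148Bh, carry = 0
```

The low bytes C8h + C3h overflow a byte, and the carry they leave behind is what ADC adds into the high bytes.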

The instructions we have considered so far are limited in that they allow only linear code. However, for real programming we need to have a way of transferring control from one program location to another.
We need to be able to choose which part of the computer's memory contains the program to be executed.

The control is accomplished with "jump" instructions.


Jump instructions have the syntax
mnemonic address

The mnemonic here can be a number of different things, but for the moment we will assume that it is "JMP". A JMP instruction "jumps" from the present location in memory (as indicated by the instruction pointer register, IP) to the specified address in memory. In essence, JMP simply stores the given address in the IP register.

In DEBUG, the address operand is, of course, simply a number. For example, if we executed the instruction
JMP 121

then the very next instruction executed would be the instruction located at address 121h.

In the TASM assembler:
. . .
JMP FOOBAR
ADD AX,21
FOOBAR: INC AX
. . .

"FOOBAR" is a label.

JMP performs an unconditional jump: it always goes to the specified address, regardless of any special conditions that may obtain.

There are also a series of conditional jump instructions which perform a jump only if some special condition is met.

These instructions all have the general syntax given above, but their
