
von Neumann

Stored Program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing
Input and output equipment operated by control unit
Princeton Institute for Advanced Studies (IAS)
Completed 1952

1000 x 40-bit words
Each word holds either a binary number or 2 x 20-bit instructions

Set of registers (storage in CPU)


  Memory Buffer Register (MBR)
  Memory Address Register (MAR)
  Instruction Register (IR)
  Instruction Buffer Register (IBR)
  Program Counter (PC)
  Accumulator (AC)
  Multiplier Quotient (MQ)
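
As a rough illustration of the word layout above, the sketch below splits a 40-bit word into its two 20-bit instructions. It assumes the usual IAS instruction format of an 8-bit opcode followed by a 12-bit address in each half; the function names and the opcode values in the example are arbitrary choices for illustration.

```python
WORD_BITS = 40
HALF_BITS = 20
ADDR_BITS = 12  # assumption: each 20-bit instruction = 8-bit opcode + 12-bit address


def split_word(word):
    """Return the (left, right) 20-bit instruction halves of a 40-bit word."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("word must fit in 40 bits")
    left = word >> HALF_BITS
    right = word & ((1 << HALF_BITS) - 1)
    return left, right


def decode_instruction(instr):
    """Return (opcode, address) for one 20-bit instruction."""
    opcode = instr >> ADDR_BITS
    address = instr & ((1 << ADDR_BITS) - 1)
    return opcode, address


# Example word: left half = opcode 0x01, address 0x0FA;
# right half = opcode 0x05, address 0x0FB (values chosen arbitrarily).
word = (0x01 << 32) | (0x0FA << 20) | (0x05 << 12) | 0x0FB
left, right = split_word(word)
print(decode_instruction(left))
print(decode_instruction(right))
```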

Moore's Law
Increased density of components on chip
Gordon Moore - cofounder of Intel
Number of transistors on a chip will double every year
Since 1970s development has slowed a little
  Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higher performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increases reliability
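
The doubling rule can be written as N(t) = N0 * 2^(t / T), where T is the doubling period. The following is a minimal sketch of that arithmetic only; the function name and the starting count of 1000 are made up for the example and are not historical data.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count assuming one doubling per period.

    doubling_period_years defaults to 1.5 (18 months); pass 1.0 for the
    original "doubles every year" form of the rule.
    """
    return initial_count * 2 ** (years / doubling_period_years)


# Three years at an 18-month doubling period is two doublings.
print(projected_transistors(1000, 3))
# The same three years at a 12-month period is three doublings.
print(projected_transistors(1000, 3, doubling_period_years=1.0))
```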

Speeding it up
Pipelining
On board cache
On board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution

Performance Mismatch
Processor speed increased
Memory capacity increased
Memory speed lags behind processor speed

Solutions
Increase number of bits retrieved at one time
  Make DRAM wider rather than deeper
Change DRAM interface
  Cache
Reduce frequency of memory access
  More complex cache and cache on chip
Increase interconnection bandwidth
  High speed buses
  Hierarchy of buses

Intel 8086
The 8086 is a 16-bit microprocessor chip designed by Intel and introduced on the market in 1978, which gave rise to the x86 architecture. Intel 8088, released in 1979, was essentially the same chip, but with an external 8-bit data bus (allowing the use of cheaper and fewer supporting logic chips), and is notable as the processor used in the original IBM PC.

Segmentation
Compilers for the 8086 commonly supported two types of pointer, "near" and "far". Near pointers were 16-bit addresses implicitly associated with the program's code or data segment (and so made sense only in programs small enough to fit in one segment). Far pointers were 32-bit segment:offset pairs. C compilers also supported "huge" pointers, which were like far pointers except that pointer arithmetic on a huge pointer treated it as a flat 20-bit pointer, while pointer arithmetic on a far pointer wrapped around within its initial 64-kilobyte segment.
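
The difference between far and huge pointer arithmetic can be sketched as follows. The sketch relies on the 8086 rule that a physical address is the segment shifted left by four bits plus the offset. The function names are hypothetical, and renormalizing a huge pointer so that its offset is below 16 is only one possible convention, assumed here for illustration.

```python
SEGMENT_SIZE = 0x10000  # 64 kilobytes addressable through one 16-bit offset
ADDRESS_MASK = 0xFFFFF  # 20-bit physical address space


def physical_address(segment, offset):
    """Combine a segment:offset pair into a 20-bit physical address."""
    return ((segment << 4) + offset) & ADDRESS_MASK


def far_add(segment, offset, delta):
    """Far-pointer arithmetic: only the offset changes and it wraps at 64 KB."""
    return segment, (offset + delta) % SEGMENT_SIZE


def huge_add(segment, offset, delta):
    """Huge-pointer arithmetic: add on the flat address, then renormalize."""
    linear = (physical_address(segment, offset) + delta) & ADDRESS_MASK
    return linear >> 4, linear & 0xF  # assumed normalization: offset < 16


# Incrementing a pointer that sits at the last byte of its segment:
print(far_add(0x1000, 0xFFFF, 1))   # wraps back to the start of segment 0x1000
print(huge_add(0x1000, 0xFFFF, 1))  # moves on to the next physical byte
```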

To avoid the need to specify "near" and "far" on every pointer and every function which took or returned a pointer, compilers also supported "memory models" which specified default pointer sizes. The "small", "compact", "medium", and "large" models covered every combination of near and far pointers for code and data. The "tiny" model was like "small" except that code and data shared one segment. The "huge" model was like "large" except that all pointers were huge instead of far by default. Precompiled libraries often came in several versions compiled for different memory models.
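
The four combinations can be laid out as a small lookup table. This is only an illustrative sketch: the dictionary and function names are hypothetical, and the bit widths follow the 16-bit near and 32-bit far pointer sizes described earlier.

```python
POINTER_BITS = {"near": 16, "far": 32}

# model name -> (default code pointer, default data pointer)
MEMORY_MODELS = {
    "small": ("near", "near"),
    "compact": ("near", "far"),
    "medium": ("far", "near"),
    "large": ("far", "far"),
}
# "tiny" uses the same pointer sizes as "small" but shares one segment for
# code and data; "huge" is like "large" with huge pointers by default.


def default_pointer_bits(model):
    """Return (code_bits, data_bits) for one of the four basic models."""
    code_kind, data_kind = MEMORY_MODELS[model]
    return POINTER_BITS[code_kind], POINTER_BITS[data_kind]


for name in MEMORY_MODELS:
    print(name, default_pointer_bits(name))
```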

Assembly Language
Most modern assemblers include a macro facility and are therefore called macro assemblers. Macros perform textual substitution, e.g., to generate common short sequences of instructions that run inline instead of in a subroutine.

Assembler
Typically a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, e.g., to generate common short sequences of instructions to run inline, instead of in a subroutine. Assemblers are generally simpler to write than compilers for high-level languages, and have been available since the 1950s. Modern assemblers, especially for RISC architectures such as MIPS, Sun SPARC and HP PA-RISC, as well as for x86(-64), optimize instruction scheduling to exploit the CPU pipeline efficiently.

There are two types of assemblers, based on how many passes through the source are needed to produce the executable program. One-pass assemblers go through the source code once and assume that all symbols will be defined before any instruction that references them. Two-pass assemblers (and multi-pass assemblers) create a table with all unresolved symbols in the first pass, then use the second pass to resolve these addresses. The advantage of one-pass assemblers is speed, which is not as important as it once was given advances in computer speed and capabilities. The advantage of the two-pass assembler is that symbols can be defined anywhere in the program source.

As a result, the program can be defined in a more logical and meaningful way. This makes two-pass assembler programs easier to read and maintain.
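
A toy version of the two-pass idea might look like the sketch below. It is not modeled on any real assembler: the mnemonics (JMP, NOP, HALT), the `label:` syntax, and the assumption that every instruction occupies exactly one address are all invented for the example. Here the first pass records the address of every label, and the second pass replaces symbolic operands with those addresses, so a forward reference such as `JMP end` resolves correctly.

```python
def first_pass(lines):
    """Record the address of every label; return (symbols, instructions)."""
    symbols, instructions = {}, []
    for line in lines:
        line = line.split(";")[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":"):
            # A label names the address of the next instruction.
            symbols[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())
    return symbols, instructions


def second_pass(symbols, instructions):
    """Replace symbolic operands with the addresses found in the first pass."""
    resolved = []
    for mnemonic, *operands in instructions:
        values = []
        for op in operands:
            if op in symbols:
                values.append(symbols[op])
            elif op.lstrip("-").isdigit():
                values.append(int(op))
            else:
                raise ValueError(f"undefined symbol: {op}")
        resolved.append((mnemonic, *values))
    return resolved


source = """
start:
    JMP end      ; forward reference, not yet defined when first seen
    NOP
end:
    HALT
"""
symbols, instructions = first_pass(source.splitlines())
program = second_pass(symbols, instructions)
print(symbols)
print(program)
```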

Language design
Basic elements
Any assembly language consists of three types of instruction statements, which are used to define the program operations:
opcode mnemonics
data sections
assembly directives

Opcode mnemonics
Instructions (statements) in assembly language are generally very simple, unlike those in high-level languages. Generally, an opcode mnemonic is a symbolic name for a single executable machine language instruction, and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value, or a pair of values. Operands can be either immediate (typically one-byte values, coded in the instruction itself) or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works.

Data sections
There are instructions used to define data elements to hold data and variables. They define the type, length, and alignment of the data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined.

Assembly directives / pseudo-ops


Assembly directives are instructions that are executed by the assembler at assembly time, not by the CPU at run time. They can make the assembly of the program dependent on parameters input by the programmer, so that one program can be assembled in different ways, perhaps for different applications. They can also be used to manipulate the presentation of the program to make it easier for the programmer to read and maintain. (For example, pseudo-ops would be used to reserve storage areas and optionally their initial contents.) The names of pseudo-ops often start with a dot to distinguish them from machine instructions.
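
To make the assembly-time versus run-time distinction concrete, here is a small sketch in which lines starting with a dot are consumed by the assembler itself. The directive names and syntax (`.byte`, `.space`, `.if`, `.endif`) are a simplified, hypothetical dialect chosen for the example; nesting of `.if` blocks is not handled.

```python
def assemble(lines, params):
    """Return the output items; directives are consumed at assembly time."""
    output = []
    skipping = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        head, args = fields[0], fields[1:]
        if head == ".if":
            # Conditional assembly: keep the block only if the parameter is set.
            skipping = not params.get(args[0], False)
        elif head == ".endif":
            skipping = False
        elif skipping:
            continue
        elif head == ".space":
            output.extend([0] * int(args[0]))  # reserve zero-filled storage
        elif head == ".byte":
            output.extend(int(a) for a in args)  # storage with initial contents
        else:
            output.append(" ".join(fields))  # stands in for a machine instruction
    return output


source = [".byte 1 2", ".space 2", ".if DEBUG", "CALL trace", ".endif", "HALT"]
print(assemble(source, {"DEBUG": True}))
print(assemble(source, {}))
```

The same source produces two different outputs depending on the `DEBUG` parameter, which is the "assembled in different ways" behaviour described above.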

Some assemblers also support pseudo-instructions, which generate two or more machine instructions. Symbolic assemblers allow programmers to associate arbitrary names (labels or symbols) with memory locations. Usually, every constant and variable is given a name so instructions can reference those locations by name, thus promoting self-documenting code. In executable code, the name of each subroutine is associated with its entry point, so any calls to a subroutine can use its name. Inside subroutines, GOTO destinations are given labels. Some assemblers support local symbols which are lexically distinct from normal symbols (e.g., the use of "10$" as a GOTO destination).

Most assemblers provide flexible symbol management, allowing programmers to manage different namespaces, automatically calculate offsets within data structures, and assign labels that refer to literal values or the result of simple computations performed by the assembler. Labels can also be used to initialize constants and variables with relocatable addresses. Assembly languages, like most other computer languages, allow comments to be added to assembly source code that are ignored by the assembler.

Good use of comments is even more important with assembly code than with higher-level languages, as the meaning and purpose of a sequence of instructions is harder to decipher from the code itself. Wise use of these facilities can greatly simplify the problems of coding and maintaining low-level code. Raw assembly source code as generated by compilers or disassemblers (code without any comments, meaningful symbols, or data definitions) is quite difficult to read when changes must be made.

Assembling the Source Code File


The text editor first creates a new text file, and later changes that same text file, as you extend, modify, and perfect your assembly language program. As a convention, most assembly language source code files are given a file extension of .ASM. In other words, for the program named FOO, the assembly language source code file would be named FOO.ASM. It is possible to use file extensions other than .ASM, but I feel that using the .ASM extension can eliminate some confusion by allowing you to tell at a glance what a file is for, just by looking at its name. All told, about nine different kinds of files can be involved during assembly language development, and more if you take the horrendous leap into Windows software development.

Each type of file will have its own standard file extension. Anything that will help you keep all that complexity in line will be worth the (admittedly) rigid confines of a standard naming convention. As you can see from the flow in the figure above, the editor produces a source code text file, which we show as having the .ASM extension. This file is then passed to the assembler program itself, for translation to a relocatable object module file with an extension of .OBJ. When you invoke the assembler, DOS will load the assembler from disk and run it.

The assembler will open the source code file that you named on the command line, after the name of the assembler, and begin processing the file. Almost immediately afterward, it will create an object file with the same name as the source file, but with an .OBJ extension. As the assembler reads lines from the source code file, it will examine them, construct the binary machine instructions the source code lines represent, and then write those machine instructions to the object code file. When the assembler comes to the end of the source code file, it will close both the source code file and the object code file and return control to DOS.

Assembler Errors
The previous paragraphs describe what happens if the .ASM file is correct. By correct, I mean that the file is completely comprehensible to the assembler and can be translated into machine instructions without the assembler getting confused. If the assembler encounters something it doesn't understand when it reads a line from the source code file, we call the misunderstood text an error, and the assembler displays an error message. For example, the following line of assembly language will confuse the assembler and summon an error message:

MOV AX,VX
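
The kind of check that rejects such a line can be sketched as follows. This is only an illustration: it assumes VX is neither a register nor a defined symbol, it handles only a two-operand MOV with register, decimal, or symbol operands, and the function name and message wording are hypothetical rather than those of any real assembler.

```python
# The eight 16-bit general-purpose registers of the 8086.
REGISTERS_16 = {"AX", "BX", "CX", "DX", "SI", "DI", "BP", "SP"}


def check_mov(line, symbols=()):
    """Return a list of error messages for a two-operand MOV line."""
    mnemonic, _, rest = line.strip().partition(" ")
    if mnemonic.upper() != "MOV":
        return [f"unsupported mnemonic: {mnemonic}"]
    operands = [op.strip().upper() for op in rest.split(",")]
    if len(operands) != 2:
        return ["MOV expects exactly two operands"]
    known = {s.upper() for s in symbols}
    errors = []
    for op in operands:
        if op in REGISTERS_16 or op in known or op.isdigit():
            continue
        errors.append(f"unknown operand: {op}")
    return errors


print(check_mov("MOV AX,BX"))  # both operands are registers
print(check_mov("MOV AX,VX"))  # VX is not recognized
```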

Linking
In traditional assembly language work, what actually happens is that the assembler writes an intermediate object code file with an .OBJ extension to disk. You can't run this .OBJ file, even though it generally contains all the machine instructions that your assembly language source code file specified. The .OBJ file needs to be processed by another translator program, the linker. The linker performs a number of operations on the .OBJ file, most of which would be meaningless to you at this point. The most obvious task the linker does is to weave several .OBJ files into a single .EXE executable file.

Why create multiple .OBJ files when writing a single executable program? One of two major reasons is size. A middling assembly language application might be 50,000 lines long. Cutting that single monolithic .ASM file up into multiple 8,000-line .ASM files would make the individual .ASM files smaller and much easier to understand.

The other reason is to avoid assembling completed portions of the program every time any part of the program is assembled. One thing you'll be doing is writing assembly language procedures, which are small detours from the main run of steps and tests that can be taken from anywhere within the assembly language program. Once you write and perfect a procedure, you can tuck it away in an .ASM file with other completed procedures, assemble it, and then simply link the resulting .OBJ file into the working .ASM file. The alternative is to waste time by reassembling perfected source code over and over again every time you assemble the main portion of the program.

This is shown in the figure above. In the upper-right corner is a row of .OBJ files. These .OBJ files were assembled earlier from correct .ASM files, yielding binary disk files containing ready-to-go machine instructions. When the linker links the .OBJ file produced from your in-progress .ASM file, it adds in the previously assembled .OBJ files, which are called modules. The single .EXE file that the linker writes to disk contains the machine instructions from all of the .OBJ files handed to the linker when the linker is invoked.

Once the in-progress .ASM file is completed and made correct, its .OBJ module can be put up on the rack with the others and added to the next in-progress .ASM source code file. Little by little you construct your application program out of the modules you build one at a time. A very important bonus is that some of the procedures in an .OBJ module may be used in a future assembly language program that hasn't even been begun yet. Creating such libraries of "toolkit" procedures can be an extraordinarily effective way to save time by reusing code over and over again, without even passing it through the assembler again!
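
The core of what a linker does with several modules can be sketched in a few lines. Everything here is a simplification invented for illustration: a module is a plain dictionary standing in for an .OBJ file, each instruction occupies one address, and string operands are symbol names to be resolved. Real object file formats and linkers are far more involved.

```python
def link(modules):
    """Concatenate module code and resolve symbol names to final addresses."""
    # Step 1: give every module a base address and collect its symbols.
    global_symbols = {}
    base = 0
    for module in modules:
        for name, offset in module["symbols"].items():
            if name in global_symbols:
                raise ValueError(f"duplicate symbol: {name}")
            global_symbols[name] = base + offset
        base += len(module["code"])

    # Step 2: copy the code into one image, replacing names with addresses.
    image = []
    for module in modules:
        for mnemonic, *operands in module["code"]:
            resolved = []
            for op in operands:
                if isinstance(op, str):
                    if op not in global_symbols:
                        raise ValueError(f"unresolved symbol: {op}")
                    resolved.append(global_symbols[op])
                else:
                    resolved.append(op)
            image.append((mnemonic, *resolved))
    return image


# The in-progress program calls a procedure from a previously built module.
main_module = {
    "code": [("CALL", "print_hello"), ("HALT",)],
    "symbols": {"main": 0},
}
toolkit_module = {
    "code": [("NOP",), ("RET",)],
    "symbols": {"print_hello": 0},
}
executable = link([main_module, toolkit_module])
print(executable)
```

Note that `toolkit_module` is never "reassembled" here; it is reused as-is, which mirrors the time saving described above.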

Testing the .EXE File


If you receive no linker errors, the linker will create and fill a single .EXE file with the machine instructions present in all of the .OBJ files named on the linker command line. The .EXE file is your executable program. You can run it by simply naming it on the DOS command line and pressing Enter:

C:\ASM>FOO

Debuggers and Debugging


The final, and almost certainly the most painful, part of the assembly language development process is debugging. Debugging is simply the systematic process by which bugs are located and corrected. A debugger is a utility program designed specifically to help you locate and identify bugs.

Debugger programs are among the most mysterious and difficult to understand of all programs. Debuggers are part X-ray machine and part magnifying glass. A debugger loads into memory with your program and remains in memory, side by side with your program. The debugger then puts tendrils down into both the operating system (for our purposes, DOS, and later Linux) and into your program and enables some truly peculiar things to be done.

One of the problems with debugging computer programs is that they operate so quickly. Thousands of machine instructions can be executed in a single second, and if one of those instructions isn't quite right, it's past and gone long before you can identify which one it is by staring at the screen. A debugger allows you to execute the machine instructions in a program one at a time, allowing you to pause indefinitely between each one to examine the effects of the last instruction on the screen. The debugger also lets you look at the contents of any location in memory, and the values stored in any register, during that pause between instructions.
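
Single-stepping can be illustrated with a toy machine that executes one instruction per call and exposes its registers between steps. This is a sketch under strong assumptions: the `ToyMachine` class, its two registers, and its two-instruction repertoire (MOV and ADD) are invented for the example and do not reflect how a real debugger attaches to a running program.

```python
class ToyMachine:
    """A tiny interpreter that can be advanced one instruction at a time."""

    def __init__(self, program):
        self.program = program
        self.registers = {"AX": 0, "BX": 0}
        self.pc = 0  # index of the next instruction to execute

    def step(self):
        """Execute exactly one instruction; return False once the program ends."""
        if self.pc >= len(self.program):
            return False
        op, dest, src = self.program[self.pc]
        # The source operand is either a register name or an immediate value.
        value = self.registers[src] if src in self.registers else src
        if op == "MOV":
            self.registers[dest] = value
        elif op == "ADD":
            self.registers[dest] += value
        else:
            raise ValueError(f"unknown instruction: {op}")
        self.pc += 1
        return True


machine = ToyMachine([("MOV", "AX", 2), ("MOV", "BX", 3), ("ADD", "AX", "BX")])
trace = []
while machine.step():
    # The pause between instructions: inspect the registers at leisure.
    snapshot = (machine.pc, dict(machine.registers))
    trace.append(snapshot)
    print(snapshot)
```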
