
EC6703

EMBEDDED AND REAL-TIME SYSTEMS

UNIT I: INTRODUCTION TO EMBEDDED COMPUTING AND ARM PROCESSORS

Complex systems and microprocessors
Embedded system design process
Design example: Model train controller
Instruction sets preliminaries
ARM Processor
CPU: programming input and output
Supervisor mode, exceptions and traps
Co-processors
Memory system mechanisms
CPU performance
CPU power consumption
What is an Embedded System (ES)?
"Embedded" reflects the fact that these systems are an integral part of a larger system.

“It is a computer system that is built to control one or a few dedicated
functions, and is not designed to be programmed by the end user in
the same way that a desktop computer is.”

It contains processing cores that are either microcontrollers or DSPs.

“An embedded system is some combination of computer hardware
and software, either fixed in capability or programmable, that is
specifically designed for a particular function.”
What is an Embedded System (ES)? Contd…

“Computing systems performing specific tasks
within a framework of real-world constraints”

“An Embedded System is a microprocessor based
system that is embedded as a subsystem, in a
larger system (which may or may not be a
computer system).”

“An ES is designed to run on its own without
human intervention, and may be required to
respond to events in real time”
Application of ES

Automotive: ECS, ABS; Aircraft
Network Appliances: Routers, Modems
Cell Phones, PDAs, Mouse, E-Star Power
Printers, Hand Mixers, Toasters!
What is a real time system?
• A real-time system is one that must process
information and produce a response within a
specified time, else risk severe consequences,
including failure.
Modern Embedded Systems

[Figure: a modern embedded system combining DSP code, processor cores,
application-specific gates, memory, and analog I/O]

• Embedded systems employ a combination of
– application-specific h/w (boards, ASICs, FPGAs etc.)
• performance, low power
– s/w on programmable processors: DSPs, controllers etc.
• flexibility, complexity
– mechanical transducers and actuators
INTRODUCTION TO EMBEDDED COMPUTING AND
ARM PROCESSORS

We first need to understand how and why microprocessors are
used for control, user interface, signal processing, and many other
tasks.

The microprocessor has become so common that it is easy to
forget how hard some things are to do without it.
COMPLEX SYSTEMS AND MICROPROCESSORS

Embedded Computer System:

“It is any device that includes a programmable computer but
is not itself intended to be a general-purpose computer”.

Thus a PC is not itself an embedded computing system,
although PCs are often used to build embedded computing
systems.
But a fax machine or a clock built from a microprocessor is
an embedded computing system.
Examples: automobiles, cell phones, and even household
appliances.
COMPLEX SYSTEMS AND MICROPROCESSORS contd…

Designers in many fields must be able to identify where microprocessors can be used,
design a hardware platform with I/O devices that can support the required tasks, and
implement software that performs the required processing.

Embedding Computers

[Figure: an embedded computer (CPU and memory) with analog input and analog output]

Characteristics of Embedded Computing Applications:
Complex algorithms
User interface (sophisticated functionality)
Real time
Multirate
Manufacturing cost
Power and energy
Finally, most embedded computing systems are designed by small teams on tight deadlines.
COMPLEX SYSTEMS AND MICROPROCESSORS contd…

Why Use Microprocessors?


■ Microprocessors are a very efficient way to implement digital systems.
■ Microprocessors make it easier to design families of products that can be built
to provide various feature sets at different price points and can be extended to
provide new features to keep up with rapidly changing markets.
■ Microprocessors execute programs very efficiently.
Modern RISC processors can execute one instruction per clock cycle most
of the time, and high-performance processors can execute several instructions per
cycle.
■ Microprocessor manufacturers spend a great deal of money to make their CPUs
run very fast.
■ Microprocessors generally dominate new fabrication lines because they can be
manufactured in large volume and are guaranteed to command high prices.
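■ Microprocessors make very efficient use of logic. The generality of a
microprocessor and the need for a separate memory may suggest that
microprocessor-based designs are inherently much larger than custom logic
designs. In many cases, however, the microprocessor is smaller when size is
measured in logic gates, because special-purpose logic built for one function
cannot be reused for others, while a microprocessor runs many different
algorithms simply by changing its program.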
COMPLEX SYSTEMS AND MICROPROCESSORS contd…

Why not use PCs for all embedded computing?

PCs are widely used and provide a very flexible programming
environment. Components of PCs are, in fact, used in many
embedded computing systems.

1. Real-time performance requirements often drive us to different
architectures.

2. Low power and low cost also drive us away from PC architectures
and toward multiprocessors. Personal computers are designed to
satisfy a broad mix of computing requirements and to be very
flexible. Those features increase the complexity and price of the
components. They also cause the processor and other components
to use more energy to perform a given function.
COMPLEX SYSTEMS AND MICROPROCESSORS contd…

Challenges in Embedded Computing System Design

How much hardware do we need?


How do we meet deadlines?
How do we minimize power consumption?
How do we design for upgradability?
Does it really work? (Reliability)
■ Complex testing
■ Limited observability and controllability
■ Restricted development environments
COMPLEX SYSTEMS AND MICROPROCESSORS contd…

Performance in Embedded Computing


In order to understand the real-time behavior of an embedded computing
system, we have to analyze the system at several different levels of abstraction.

Those layers include:

■ CPU: The CPU clearly influences the behavior of the program,
particularly when the CPU is a pipelined processor with a cache.

■ Platform: The platform includes the bus and I/O devices. The
platform components that surround the CPU are responsible for
feeding the CPU and can dramatically affect its performance.

■ Program: Programs are very large and the CPU sees only a small
window of the program at a time. We must consider the structure
of the entire program to determine its overall behavior.
Contd….

■ Task: We generally run several programs simultaneously on a
CPU, creating a multitasking system. The tasks interact with each
other in ways that have profound implications for performance.

■ Multiprocessor: Many embedded systems have more than one
processor: they may include multiple programmable CPUs as well
as accelerators. Once again, the interaction between these
processors adds yet more complexity to the analysis of overall
system performance.
THE EMBEDDED SYSTEM DESIGN PROCESS
This overview of the embedded system design process has two objectives:

1. It gives us an introduction to the various steps in embedded
system design before we delve into them in more detail.
2. It allows us to consider the design methodology itself.

A design methodology is important for three reasons.

First, it allows us to keep a scorecard on a design to ensure that we have
done everything we need to do, such as optimizing performance or performing
functional tests.
Second, it allows us to develop computer-aided design (CAD) tools.
Developing a single program that takes in a concept for an embedded system and
emits a completed design would be a daunting task, but by first breaking the
process into manageable steps, we can work on automating (or at least
semi-automating) the steps one at a time.
Third, a design methodology makes it much easier for members of a
design team to communicate.
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

The major goals of the design:

■ manufacturing cost;
■ performance (both overall speed and deadlines); and
■ power consumption.

We must also consider the tasks we need to perform at every step in
the design process. At each step in the design, we add detail:

■ We must analyze the design at each step to determine how we
can meet the specifications.
■ We must then refine the design to add detail.
■ And we must verify the design to ensure that it still meets all
system goals, such as cost, speed, and so on.
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

1. Requirements

-Creating the architecture and components.


- First, we gather an informal description from the customers, known
as requirements, and we refine the requirements into a specification
that contains enough information to begin designing the system
architecture.

Requirements may be functional or nonfunctional.

Typical nonfunctional requirements include:
■ Performance
■ Cost (manufacturing cost; nonrecurring engineering (NRE) costs)
■ Physical size and weight
■ Power consumption
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

Sample requirements form.


THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

Example: GPS moving map

What requirements might we have for our GPS moving map?
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

Here is an initial list:


■ Functionality: This system is designed for highway driving and similar uses, not
nautical or aviation uses that require more specialized databases and functions.
The system should show major roads and other landmarks available in standard
topographic databases.
■ User interface: The screen should have at least 400 × 600 pixel resolution. The
device should be controlled by no more than three buttons. A menu system
should pop up on the screen when buttons are pressed to allow the user to make
selections to control the system.
■ Performance: The map should scroll smoothly. Upon power-up, a display should
take no more than one second to appear, and the system should be able to verify
its position and display the current map within 15 s.
■ Cost: The selling cost (street price) of the unit should be no more than $100.
■ Physical size and weight: The device should fit comfortably in the palm of the
hand.
■ Power consumption: The device should run for at least eight hours on four AA
batteries.
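
The power requirement above can be turned into an average current budget with one division. The sketch below is only a worked estimate: the cell capacity of 2000 mAh and the series wiring are assumptions made for illustration and do not come from the requirements list.

```python
def average_current_budget_ma(capacity_mah: float, runtime_hours: float) -> float:
    """Largest average current draw (mA) that still meets the runtime target."""
    if runtime_hours <= 0:
        raise ValueError("runtime must be positive")
    return capacity_mah / runtime_hours


# Assumption (not in the requirements): the four AA cells are wired in series
# and each has a nominal capacity of 2000 mAh. In series the voltages add, but
# the usable capacity stays that of a single cell.
ASSUMED_CELL_CAPACITY_MAH = 2000.0
REQUIRED_RUNTIME_HOURS = 8.0  # from the power consumption requirement

if __name__ == "__main__":
    budget = average_current_budget_ma(ASSUMED_CELL_CAPACITY_MAH, REQUIRED_RUNTIME_HOURS)
    print(f"Average current budget: {budget:.0f} mA")
```

Under these assumed numbers the whole device (display, GPS receiver, CPU) would have to average no more than 250 mA.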
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

Finally, the completed requirements form:


THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
2. Specification

Specification serves as the contract between the customer
and the architects.

Specification is probably the least familiar phase of this
methodology for neophyte designers, but it is essential to creating
working systems with a minimum of designer effort.

The specification should be understandable enough so that
someone can verify that it meets system requirements and overall
expectations of the customer.
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..

A specification of the GPS system would include several components:

■ Data received from the GPS satellite constellation.
■ Map data.
■ User interface.
■ Operations that must be performed to satisfy customer requests.
■ Background actions required to keep the system running, such as
operating the GPS receiver.
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
3. Architecture Design
The specification does not say how the system does things, only
what the system does. Describing how the system implements those
functions is the purpose of the architecture.

The architecture is a plan for the overall structure of the system that
will be used later to design the components that make up the
architecture.

The creation of the architecture is the first phase of what many
designers think of as design.
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
3. Architecture Design

The figure below shows a sample system architecture in the form of a block
diagram that shows major operations and data flows among them.

Architectural descriptions must be designed to satisfy both functional and
nonfunctional requirements. Not only must all the required functions be
present, but we must meet cost, speed, power, and other nonfunctional
constraints.
We start with a system architecture and refine it into hardware and
software architectures.
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
3. Architecture Design

[Figures: hardware architecture and software architecture block diagrams]
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
4. Designing Hardware and Software Components

The components will in general include both hardware (FPGAs, boards, and so on) and
software modules.

Some of the components will be ready-made.

You will have to design some components yourself.


THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
5. System Integration
Bugs are typically found during system integration, and good planning can help us find
the bugs quickly.

By building up the system in phases and running properly chosen tests, we can often find
bugs more easily.

If we debug only a few modules at a time, we are more likely to uncover the simple bugs
and to recognize them easily.

System integration is difficult because it usually uncovers problems. It is often hard to
observe the system in sufficient detail to determine exactly what is wrong, because the
debugging facilities for embedded systems are usually much more limited than what you
would find on desktop systems. As a result, determining why things do not work
correctly and how they can be fixed is a challenge in itself.

Careful attention to inserting appropriate debugging facilities during design can help
ease system integration problems, but the nature of embedded computing means that
this phase will always be a challenge.
FORMALISMS FOR SYSTEM DESIGN
(STRUCTURAL DESCRIPTION & BEHAVIORAL DESCRIPTION)

It is often helpful to conceptualize these tasks (the top-down design process) in diagrams.

Luckily, there is a visual language that can be used to capture all these design tasks: the
Unified Modeling Language (UML).

UML is an object-oriented modeling language.

UML was designed to be useful at many levels of abstraction in the design process.

UML is useful because it encourages design by successive refinement and
progressively adding detail to the design, rather than rethinking the design at each new
level of abstraction.

Object-oriented design emphasizes two concepts of importance:

■ It encourages the design to be described as a number of interacting objects, rather than
a few large monolithic blocks of code.
■ At least some of those objects will correspond to real pieces of software or hardware in
the system. We can also use UML to model the outside world that interacts with our
system, in which case the objects may correspond to people or other machines. It is
sometimes important to implement something we think of at a high level as a single object
using several distinct pieces of code or to otherwise break up the object correspondence
in the implementation.
The principal component of an object-oriented design is, naturally enough, the object.
An object includes a set of attributes that define its internal state.
When implemented in a programming language, these attributes usually become variables
or constants held in a data structure.

An object describing a display (such as a CRT screen) is shown in UML notation in the figure.

The text in the folded-corner page icon is a note; it does not correspond to an
object in the system and only serves as a comment.

The name is underlined to show that this is a description of an object and not
of a class.

A class is a form of type definition—all objects derived from the same class have the
same characteristics, although their attributes may have different values.
A class defines the attributes that an object may have.
It also defines the operations that determine how the object interacts with the rest of
the world.
There are several types of relationships that can exist between objects and classes:

■ Association occurs between objects that communicate with each other but have no
ownership relationship between them.

■ Aggregation describes a complex object made of smaller objects.

■ Composition is a type of aggregation in which the owner does not allow access to the
component objects.

■ Generalization allows us to define one class in terms of another.


So far we have seen the STRUCTURAL DESCRIPTION part of UML.

Next is the BEHAVIORAL DESCRIPTION.

One way to specify the behavior of an operation is with a state machine.
These state machines do not rely on the operation of a clock, as in hardware; rather,
changes from one state to another are triggered by the occurrence of events.

UML defines three types of events:

1. A signal is an asynchronous occurrence. It is defined in UML by an object that is
labeled as a <<signal>>.
2. A call event follows the model of a procedure call in a programming language.
3. A time-out event causes the machine to leave a state after a certain amount
of time. The label tm(time-value) on the edge gives the amount of time after
which the transition occurs.
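
The event-driven behavior described above can be sketched as a small state machine in code. This is a minimal illustration, not UML tooling: the class names, the states "idle" and "menu", and the event names are all hypothetical. Signals and calls are dispatched explicitly, and a time-out is modeled by advancing a time counter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str       # "signal" or "call"
    name: str = ""


class StateMachine:
    """Event-driven state machine: no clock, only events cause transitions."""

    def __init__(self, start, transitions, timeouts=None):
        self.state = start
        self.transitions = transitions      # (state, kind, name) -> next state
        self.timeouts = timeouts or {}      # state -> (time_value, next state)
        self.time_in_state = 0

    def dispatch(self, event):
        """Handle a signal or call event; unknown events leave the state unchanged."""
        key = (self.state, event.kind, event.name)
        if key in self.transitions:
            self._enter(self.transitions[key])
        return self.state

    def advance_time(self, dt):
        """Let dt time units pass; fire the time-out edge tm(time_value) if due."""
        self.time_in_state += dt
        timeout = self.timeouts.get(self.state)
        if timeout is not None and self.time_in_state >= timeout[0]:
            self._enter(timeout[1])
        return self.state

    def _enter(self, new_state):
        self.state = new_state
        self.time_in_state = 0


def make_demo_machine():
    """Hypothetical display menu: a signal opens it, a call or tm(5) closes it."""
    transitions = {
        ("idle", "signal", "button_press"): "menu",
        ("menu", "call", "select_item"): "idle",
    }
    timeouts = {"menu": (5, "idle")}
    return StateMachine("idle", transitions, timeouts)
```

Keeping the transition table as data rather than nested if statements keeps the code close to the diagram: each dictionary entry corresponds to one labeled edge.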
Design example: Model train controller

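Learning UML through the model train controller

Model train setup

[Figure: model train setup. The console and power supply connect to the track; the
train carries a receiver (rcvr) and a motor. A message from the console has the
fields header, address, command, and ECC.]

Model train controller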

The user sends messages to the train with a control box attached to the tracks.

The control box may have familiar controls such as a throttle, emergency stop button, and so
on.

Since the train receives its electrical power from the two rails of the track, the control box
can send signals to the train over the tracks by modulating the power supply voltage.

The control panel sends packets over the tracks to the receiver on the train.

The train includes analog electronics to sense the bits being transmitted and a control system
to set the train motor’s speed and direction based on those commands.

Each packet includes an address so that the console can control several trains on the same
track; the packet also includes an error correction code (ECC) to guard against transmission
errors.

This is a one-way communication system—the model train cannot send commands back to the
user.
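
The packet idea above (an address so that several trains can share a track, plus a check field to guard against transmission errors) can be sketched as follows. The byte layout, the header value, and the use of a simple XOR check byte are assumptions chosen for illustration; the slide's figure only names the fields header, address, command, and ECC.

```python
HEADER = 0xFF  # hypothetical start-of-packet marker


def build_packet(address: int, command: int) -> bytes:
    """Pack one command for one train, followed by an XOR check byte."""
    if not 0 <= address <= 0xFF or not 0 <= command <= 0xFF:
        raise ValueError("address and command must each fit in one byte")
    check = address ^ command
    return bytes([HEADER, address, command, check])


def parse_packet(packet: bytes, my_address: int):
    """Return the command if the packet is intact and addressed to this train, else None."""
    if len(packet) != 4 or packet[0] != HEADER:
        return None
    _, address, command, check = packet
    if address ^ command != check:
        return None  # transmission error detected, packet is dropped
    if address != my_address:
        return None  # packet is meant for another train
    return command
```

Because the link is one-way, the receiver cannot ask for a retransmission; in this sketch a damaged packet is simply ignored.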

For the design, we start with the requirements.

1. Requirements
Here is a basic set of requirements for the system:
■ The console shall be able to control up to eight trains on a single track.
■ The speed of each train shall be controllable by a throttle to at least 63 different
levels in each direction (forward and reverse).
■ There shall be an inertia control that shall allow the user to adjust the responsiveness
of the train to commanded changes in speed.
■ There shall be an emergency stop button.
■ An error detection scheme will be used to transmit messages.

We can put the requirements into our chart format:


2. Specifications
The Digital Command Control (DCC) standard was created by the National Model Railroad
Association to support interoperable digitally controlled model trains.

DCC was created to provide a standard that could be built by any manufacturer so that
hobbyists could mix and match components from multiple vendors.

The DCC standard is given in two documents:

■ Standard S-9.1, the DCC Electrical Standard, defines how bits are encoded on
the rails for transmission.
■ Standard S-9.2, the DCC Communication Standard, defines the packets that
carry information.

Any DCC-conforming device must meet these specifications.

The DCC standard does not specify many aspects of a DCC train system. It doesn’t
define the control panel, the type of microprocessor used, the programming language
to be used, or many other aspects of a real model train system. The standard
concentrates on those aspects of system design that are necessary for interoperability.
Basic system commands

command name    parameters
set-speed       speed (positive/negative)
set-inertia     inertia-value (non-negative)
estop           none
Typical control sequence

[Sequence diagram: :console sends the following messages to :train_rcvr, in order]
set-inertia
set-speed
set-speed
estop
set-speed
Conceptual Specification
Digital Command Control specifies some important aspects of the system,
particularly those that allow equipment to interoperate. But DCC deliberately does
not specify everything about a model train control system.

There are clearly two major subsystems: the command unit and the train-board
component, as shown in Fig1.

Fig1: Class diagram for the train controller messages.

Fig2: UML collaboration diagram for major subsystems of the train controller system.

Fig: A UML class diagram for the train controller showing the composition of the subsystems.
Console physical object classes

knobs*
  train-knob: integer
  speed-knob: integer
  inertia-knob: unsigned-integer
  emergency-stop: boolean

pulser*
  pulse-width: unsigned-integer
  direction: boolean

sender*
  send-bit()

detector*
  read-bit() : integer


Panel and motor interface classes

panel
  train-number() : integer
  speed() : integer
  inertia() : integer
  estop() : boolean
  new-settings()

motor-interface
  speed: integer
Transmitter and receiver classes

transmitter
  send-speed(adrs: integer, speed: integer)
  send-inertia(adrs: integer, val: integer)
  set-estop(adrs: integer)

receiver
  current: command
  new: boolean
  read-cmd()
  new-cmd() : boolean
  rcv-type(msg-type: command)
  rcv-speed(val: integer)
  rcv-inertia(val: integer)
Formatter class

The formatter class holds state for each train and the setting for the current train.
The operate() operation performs the basic formatting task.

formatter
  current-train: integer
  current-speed[ntrains]: integer
  current-inertia[ntrains]: unsigned-integer
  current-estop[ntrains]: boolean
  send-command()
  panel-active() : boolean
  operate()
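
The formatter class can be sketched in code as follows. This is a simplified illustration under stated assumptions: the panel readings are passed in as arguments instead of being read from a panel object, and the transmitter is replaced by a hypothetical `sent` list that records what would be sent. Attribute and operation names follow the class diagram.

```python
from typing import List, Tuple


class Formatter:
    """Holds per-train settings and formats commands for the transmitter."""

    def __init__(self, ntrains: int = 8):
        self.current_train = 0
        self.current_speed = [0] * ntrains
        self.current_inertia = [0] * ntrains
        self.current_estop = [False] * ntrains
        self.sent: List[Tuple[str, int, int]] = []  # stand-in for the transmitter

    def panel_active(self, train: int, speed: int, inertia: int, estop: bool) -> bool:
        """Copy panel readings into the stored state; report whether anything changed."""
        changed = False
        if train != self.current_train:
            self.current_train = train
            changed = True
        t = self.current_train
        new = (speed, inertia, estop)
        old = (self.current_speed[t], self.current_inertia[t], self.current_estop[t])
        if new != old:
            self.current_speed[t], self.current_inertia[t], self.current_estop[t] = new
            changed = True
        return changed

    def send_command(self) -> None:
        """Queue the commands for the current train."""
        t = self.current_train
        if self.current_estop[t]:
            self.sent.append(("estop", t, 0))
            return
        self.sent.append(("set-inertia", t, self.current_inertia[t]))
        self.sent.append(("set-speed", t, self.current_speed[t]))

    def operate(self, train: int, speed: int, inertia: int, estop: bool) -> None:
        """Basic formatting task: send only when the panel settings have changed."""
        if self.panel_active(train, speed, inertia, estop):
            self.send_command()
```

Storing one entry per train is what lets the console switch between trains without losing each train's last speed and inertia settings.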
Control input sequence diagram

[Sequence diagram with lifelines :knobs, :panel, :formatter, :transmitter.
On a change in speed/inertia/estop, the formatter reads the panel (panel-active),
receives the panel settings, and issues send-command, which results in send-speed,
send-inertia, or send-estop at the transmitter.
On a change in train number, the formatter reads the panel, receives the panel
settings, and issues new-settings, which results in set-knobs.]
Formatter operate behavior

[State diagram: from idle, operate calls panel-active(); a new train number leads to
update-panel(), any other change leads to send-command().]
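Panel-active behavior

[Flowchart: panel*:read-train(); if the train number changed (T): current-train = train-knob,
update-screen, changed = true; otherwise (F) continue.
panel*:read-speed(); if the speed changed (T): current-speed = throttle, changed = true;
otherwise (F) continue.
... (remaining controls follow the same pattern)]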
Instruction sets preliminaries
In this topic, we begin our study of microprocessors by studying instruction sets,
the programmer’s interface to the hardware.

The instruction set is the key to analyzing the performance of programs.

By understanding the types of instructions that the CPU provides, we gain insight into
alternative ways to implement a particular function.

Computer Architecture Taxonomy

[Figure: a Harvard architecture]
[Figure: a von Neumann architecture computer]

Which architecture is best suited for µP and DSP?

Von Neumann architecture: stored program concept (program code is stored along
with data in the same memory).
Harvard architecture: separate memories for program and data.
Computer Architecture Contd…

The CPU has several internal registers that store values used
internally. One of those registers is the program counter
(PC), which holds the address in memory of an instruction. The
CPU fetches the instruction from memory, decodes the
instruction, and executes it.

Processing signals in real time places great strains on the data
access system in two ways:
First, large amounts of data flow through the CPU; and
second, that data must be processed at precise intervals, not
just when the CPU gets around to it. Data sets that arrive
continuously and periodically are called streaming data.
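
The fetch, decode, execute cycle driven by the program counter can be sketched with a toy machine. The three-instruction set below (LOADI, ADD, HALT) and the four-register file are hypothetical and exist only to show the loop; they are not ARM instructions.

```python
def run(program, max_steps=100):
    """Run a toy program and return the register file when HALT is reached."""
    regs = [0, 0, 0, 0]
    pc = 0  # program counter: index of the next instruction in memory
    for _ in range(max_steps):
        instr = program[pc]   # fetch the instruction the PC points to
        pc += 1               # PC now points to the following instruction
        op, *args = instr     # decode into opcode and operands
        if op == "LOADI":     # execute
            rd, value = args
            regs[rd] = value
        elif op == "ADD":
            rd, rn, rm = args
            regs[rd] = regs[rn] + regs[rm]
        elif op == "HALT":
            return regs
        else:
            raise ValueError(f"unknown opcode {op!r}")
    raise RuntimeError("program did not halt")


if __name__ == "__main__":
    demo = [("LOADI", 0, 5), ("LOADI", 1, 7), ("ADD", 2, 0, 1), ("HALT",)]
    print(run(demo))
```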
Computer Architecture in terms of Instruction Set

• Complex instruction set computer (CISC):
– many addressing modes;
– many operations;
– different instruction formats of varying lengths.
• Reduced instruction set computer (RISC):
– load/store (data operands must first be loaded into the
CPU and then stored back to main memory to save the
results);
– fewer and simpler instructions;
– pipelinable instructions.
• Complex Instruction Set Computers (CISC)
– Single instruction procedure entries and exits
– Variable length instruction sets with many formats
– Complex sequence of operations over many clock cycles
– Processors based on CISC were sold on the sophistication and number of their
addressing modes, data types, etc.
– Developed in the 1970s when computers had slow main memory, so
processors were controlled by faster ROMs
– Frequently used operations are drawn from ROM as microcode sequences
rather than having instructions pulled from main memory

• Reduced Instruction Set Computers (RISC)
– Pipeline execution
• Starting a second instruction before the first one has finished
– A fixed (32 bit) instruction size with few formats
– A load-store architecture where instructions that process data
operate only on registers and are separate from instructions that
access memory
– A large register bank of 32-bit registers, all of which can be used
for any purpose, to allow the load-store architecture to operate
efficiently
– Hard-wired instruction decode logic
– Single-cycle execution
RISC Architecture
Advantages/Disadvantages

• Advantages
– A smaller die size: a simpler processor requires fewer transistors and less
silicon area.
– A shorter development time: less design effort and therefore a lower cost.
– A higher performance: simpler instructions are executed faster.

• Disadvantages
– Poor code density compared with CISCs
– Doesn’t execute x86 code
Instruction set characteristics
• Fixed vs variable length.
• Addressing modes.
• Number of operands.
• Types of operations supported.
Programming model
• Programming model: Registers visible to the
programmer.
• Some registers are not visible (IR).
ARM – What is it?
• ARM stands for Advanced RISC Machines (originally Acorn RISC Machine)

• An ARM processor is basically any 16/32-bit
microprocessor designed and licensed by ARM Ltd, a
microprocessor design company headquartered in
England, founded in 1990 as a spin-off from Acorn Computers,
the company co-founded by Hermann Hauser

• A characteristic feature of ARM processors is their low
electric power consumption, which makes them
particularly suitable for use in portable devices.

• ARM is one of the most widely used processor families currently on the
market
Examples of ARM Based Products

• The Toshiba 46HM94 46-inch television
• The iPod Nano
• Samsung S3FJ9SK smartcard IC
History of ARM

• Acorn Computers: a British computer company founded in Cambridge,
England, in 1978, by Hermann Hauser and Chris Curry. The company
produced a number of computers which were especially popular in the
UK. These included the Acorn Electron, the BBC Micro and the Acorn
Archimedes. Acorn's BBC Micro computer dominated the UK educational
computer market during the 1980s and early 1990s.
• VLSI Technology, Inc. produced the first ARM processor based on Acorn
designs.
• ARM-based PCs did not sell well. Acorn was acquired by Olivetti in 1985.
• ARM was contracted to develop a processor for the Apple Newton handheld,
built by VLSI.
• The company was broken up into several independent operations in 2000,
one of which, notably, was ARM Holdings.
• ARM Holdings' primary business model is to license its RISC-based designs
to other manufacturers.
ARM Features

• The ARM7 is a low-power, general purpose 32-bit
RISC microprocessor macrocell (32-bit data and
address bus) for use in application- or customer-
specific integrated circuits (ASICs or CSICs).
• Its simple, elegant and fully static design is
particularly suitable for cost- and power-sensitive
applications.
• The ARM7’s small die size makes it ideal for
integrating into a larger custom chip that could also
contain RAM, ROM, logic, DSP and other cells.
ARM Features contd…

• Big- and little-endian operating modes (in little-endian mode the lowest-order
byte resides in the low-order bits of the word)
• High performance RISC: 17 MIPS (million instructions per
second) sustained @ 25 MHz (25 MIPS peak) @ 3 V
• Low power consumption: 0.6 mA/MHz @ 3 V, fabricated in 0.8 µm
CMOS
• Fast interrupt response for real-time applications
• Virtual Memory System Support
• Excellent high-level language support
• Simple but powerful instruction set
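
The performance and power figures in this list can be combined with a little arithmetic. The sketch below is only a worked calculation from the quoted numbers (25 MHz, 17 MIPS sustained, 0.6 mA/MHz, 3 V); the function names are made up for illustration, and treating current times voltage as the core's power draw is a simplification.

```python
CLOCK_MHZ = 25.0
SUSTAINED_MIPS = 17.0
CURRENT_MA_PER_MHZ = 0.6
SUPPLY_VOLTS = 3.0


def cycles_per_instruction(clock_mhz: float, mips: float) -> float:
    """Average clock cycles spent per instruction."""
    return clock_mhz / mips


def supply_current_ma(clock_mhz: float, ma_per_mhz: float) -> float:
    """Supply current at the given clock rate."""
    return clock_mhz * ma_per_mhz


def power_mw(current_ma: float, volts: float) -> float:
    """Power in milliwatts from current (mA) and voltage (V)."""
    return current_ma * volts


if __name__ == "__main__":
    cpi = cycles_per_instruction(CLOCK_MHZ, SUSTAINED_MIPS)
    current = supply_current_ma(CLOCK_MHZ, CURRENT_MA_PER_MHZ)
    print(f"CPI: {cpi:.2f}, current: {current:.1f} mA, "
          f"power: {power_mw(current, SUPPLY_VOLTS):.1f} mW")
```

With these numbers the sustained rate corresponds to roughly 1.47 cycles per instruction, and the supply current at 25 MHz is about 15 mA, or about 45 mW at 3 V.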
ARM Architecture
• RISC features incorporated by ARM
– A load-store Architecture
– Fixed-length 32-bit instructions
– 3-address instruction formats
• RISC features not incorporated into ARM
– Delayed branches (a pipelining technique)
– Single-cycle execution of all instructions

ARM7 is a von Neumann architecture machine,
while ARM9 uses a Harvard architecture.
INSTRUCTION FORMAT
The Registers of ARM (Programmers model)

• ARM has 37 registers, all of which are 32 bits long.
– 1 dedicated program counter
– 1 dedicated current program status register
– 5 dedicated saved program status registers
– 30 general purpose registers
• The current processor mode governs which of several banks is accessible.
Each mode can access
– a particular set of r0-r12 registers
– a particular r13 (the stack pointer, sp) and r14 (the link register)
– the program counter, r15 (pc)
– the current program status register, CPSR
• Privileged modes (except System) can also access
– a particular SPSR (saved program status register)
• Visible registers
– User addressable
– System addressable
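
The count of 37 registers can be checked by adding up the banks. The sketch below assumes the classic ARM7 bank layout (FIQ banks r8 to r14, the other exception modes bank only r13 and r14), which the slides do not spell out; the dictionary and function names are hypothetical.

```python
# Registers that each exception mode banks privately (assumed ARM7 layout).
# User and System modes share one set and have no SPSR.
BANKED = {
    "fiq": ["r8", "r9", "r10", "r11", "r12", "r13", "r14"],
    "irq": ["r13", "r14"],
    "supervisor": ["r13", "r14"],
    "abort": ["r13", "r14"],
    "undefined": ["r13", "r14"],
}


def register_counts():
    """Count the physical registers by category."""
    shared = 15                                   # r0-r14 as seen from User/System
    banked = sum(len(regs) for regs in BANKED.values())
    general = shared + banked
    spsr = len(BANKED)                            # one SPSR per exception mode
    return {"general": general, "pc": 1, "cpsr": 1, "spsr": spsr,
            "total": general + 1 + 1 + spsr}


def physical_register(mode: str, name: str) -> str:
    """Name of the physical register that `name` refers to in the given mode."""
    if name in BANKED.get(mode, ()):
        return f"{name}_{mode}"
    return name


if __name__ == "__main__":
    print(register_counts())
    print(physical_register("irq", "r13"), physical_register("irq", "r0"))
```

The banked r13 and r14 are what give each exception mode its own stack pointer and return address without saving the user-mode values first.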
ARM Architecture
Instruction Set Foundation
• Current Program Status Register
– Used in user-level programs to store the condition code bits.
• N: Negative; the last ALU operation which changed the flags
produced a negative result
• Z: Zero; the last ALU operation which changed the flags
produced a zero result
• C: Carry; the last ALU operation which changed the flags
generated a carry-out.
• V: Overflow; the last arithmetic ALU operation which
changed the flags generated an overflow into the sign bit.
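
How the four condition code bits follow from a result can be sketched for a 32-bit addition. This is an illustrative model, not ARM's implementation; the function name `adds` is borrowed from the flag-setting form of the add instruction, and the dictionary return format is made up.

```python
MASK32 = 0xFFFFFFFF


def adds(a: int, b: int):
    """32-bit add that also returns the N, Z, C, V flags it would set."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    n = bool(result >> 31)        # N: bit 31 of the result is set
    z = result == 0               # Z: result is zero
    c = full > MASK32             # C: carry out of bit 31
    # V: both operands have a sign bit that differs from the result's sign bit.
    v = bool(((a ^ result) & (b ^ result)) >> 31)
    return result, {"N": n, "Z": z, "C": c, "V": v}


if __name__ == "__main__":
    print(adds(0x7FFFFFFF, 1))   # largest positive value plus one
    print(adds(0xFFFFFFFF, 1))   # wraps around to zero
```

The two demo calls separate C from V: the first overflows the signed range without a carry-out, the second produces a carry-out without signed overflow.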
ARM Organization and Implementation
• 3-stage pipeline organization
– Principal components
• The register bank
• The barrel shifter
– Can shift or rotate one operand by any number of bits
• The ALU
• The address register and incrementer
– Select and hold all memory addresses and generate
sequential addresses
• The data registers( holds data passing to and from memory)
• The instruction decoder and associated control logic (refer
next slide)
Process Instruction Flow
• In a single-cycle data processing instruction, two register operands
are accessed.
• The value on the B bus is shifted and combined with the value on
the A bus in the ALU, then the result is written back into the
register bank.
• The program counter value is in the address register, from where
it is fed into the incrementer, then the incremented value is copied
back into r15 (PC) in the register bank and also into the address
register to be used as the address for the next instruction fetch.
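
The data flow described above can be sketched for one instruction of the form ADD rd, rn, rm, LSL #n. The function below is a simplified model under stated assumptions: only a logical left shift is modeled, the PC advances by 4 bytes (ARM state), and pipeline effects on the value read from r15 are ignored. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def lsl(value: int, amount: int) -> int:
    """Logical shift left within 32 bits, as the barrel shifter would do."""
    return (value << amount) & MASK32


def data_processing_step(regs, rd, rn, rm, shift_amount):
    """Model ADD rd, rn, rm, LSL #shift_amount plus the PC increment."""
    a_bus = regs[rn]                        # first operand on the A bus
    b_bus = lsl(regs[rm], shift_amount)     # second operand shifted on the B bus
    regs[rd] = (a_bus + b_bus) & MASK32     # ALU result written back
    regs[15] = (regs[15] + 4) & MASK32      # incrementer: address of next fetch
    return regs


if __name__ == "__main__":
    regs = [0] * 16
    regs[1], regs[2], regs[15] = 10, 3, 0x8000
    data_processing_step(regs, rd=0, rn=1, rm=2, shift_amount=2)
    print(regs[0], hex(regs[15]))
```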
ARM Organization and Implementation
• ARM processors employ a simple 3-stage pipeline with the following
pipeline stages
– Fetch
• The instruction is fetched from memory and placed in the
instruction pipeline
– Decode
• The instruction is decoded and the data path control signals
prepared for the next cycle. In this stage the instruction ‘owns’
the decode logic but not the data path
– Execute
• The instruction ‘owns’ the data path; the register bank is read, an
operand shifted, the ALU result generated and written back into a
destination register
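
The overlap between the three stages can be made visible with a small schedule generator. It assumes an ideal pipeline in which every instruction spends exactly one cycle per stage, with no stalls or branches; the function name and the instruction labels are illustrative.

```python
def pipeline_schedule(instructions):
    """Return, per cycle, which instruction occupies fetch, decode and execute."""
    stages = ("fetch", "decode", "execute")
    n = len(instructions)
    schedule = []
    for cycle in range(n + len(stages) - 1):
        row = {}
        for depth, stage in enumerate(stages):
            index = cycle - depth   # instruction that entered the pipe `depth` cycles ago
            row[stage] = instructions[index] if 0 <= index < n else None
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["ADD", "SUB", "MOV"])):
        print(cycle, row)
```

Once the pipeline is full, one instruction completes per cycle even though each instruction takes three cycles from fetch to execute.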
Summary
• The ARM processor has a rich history both in academia and in the
commercial space. It uses innovative architectural design to achieve
high performance with low power consumption.
• It is widely used in mobile and embedded devices due to its
power characteristics and is one of the most widely deployed processors
currently in use.
• It uses a RISC instruction set to achieve this performance. It also
uses a variety of organizational techniques, such as pipelining, in
addition to the instruction set.
• The ARM processor is a robust development platform that will be in
use for many years to come.
ARM Processor Core
• Current low-end ARM core for applications like
digital mobile phones
• TDMI
– T: Thumb, 16-bit instruction set
– D: on-chip Debug support, enabling the processor to
halt in response to a debug request
– M: enhanced Multiplier, yielding a full 64-bit result, high
performance
– I: EmbeddedICE hardware
• Von Neumann architecture
• 3-stage pipeline
The Thumb instruction set
• The Thumb instruction set is a subset of the most
commonly used 32-bit ARM instructions.
• Thumb instructions are each 16 bits long, and have a
corresponding 32-bit ARM instruction that has the
same effect on the processor model.
• Thumb instructions operate with the standard ARM
register configuration, allowing excellent
interoperability between ARM and Thumb states.
• On execution, 16-bit Thumb instructions are
transparently decompressed to full 32-bit ARM
instructions in real time, without performance loss.
DIFFERENT STATES
• When the processor is executing in ARM state:
– All instructions are 32 bits wide
– All instructions must be word aligned
• When the processor is executing in Thumb state:
– All instructions are 16 bits wide
– All instructions must be halfword aligned
• When the processor is executing in Jazelle state:
– All instructions are 8 bits wide
– Processor performs a word access to read 4
instructions at once
CPUs
• Input and output.
• Supervisor mode, exceptions, traps.
• Co-processors.
Input and Output Devices
• Input and output devices usually have some
analog or non-electronic component.
• For instance, a disk drive has a rotating disk and
analog read/write electronics.
• But the digital logic in the device that is most
closely connected to the CPU very strongly
resembles the logic you would expect in any
computer system.
Input and Output Devices

[Figure: the CPU connects to the device through a status register
and a data register, which sit in front of the device mechanism.]
• Devices typically have several registers:
– Data registers hold values that are treated
as data by the device, such as the data read or
written by a disk.
– Status registers provide information about
the device’s operation, such as whether the
current transaction has completed.
I/O Application: 8251 UART
• Universal asynchronous receiver transmitter
(UART) : provides serial communication.
• 8251 functions are integrated into standard PC
interface chip.
• Allows many communication parameters to be
programmed.
Serial communication

• Characters are transmitted separately:

[Figure: one character on the line over time: idle line (no char),
start bit, bit 0, bit 1, ..., bit n-1, stop bit.]
Serial communication parameters
• Baud (bit) rate.
• Number of bits per character (5 to 8).
• Parity/no parity.
• Even/odd parity.
• Length of stop bit (1, 1.5, 2 bits).
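As a rough illustration of how these parameters combine, the sketch below counts the bits in one character frame (start bit, data bits, optional parity bit, stop bits) and derives the time to send one character. The `serial_config` structure and its field names are hypothetical and are not 8251 register fields; stop length is stored in half bits so that 1.5 stop bits can be represented with integers.

```c
#include <stdint.h>

/* Hypothetical description of one serial configuration. */
typedef struct {
    uint32_t baud;        /* bit rate in bits per second */
    uint8_t  data_bits;   /* 5 to 8 */
    uint8_t  parity;      /* 0 = no parity bit, 1 = parity bit present */
    uint8_t  stop_halves; /* stop length in half bits: 2 = 1, 3 = 1.5, 4 = 2 */
} serial_config;

/* Frame length in half-bit units: start + data + optional parity + stop. */
uint32_t frame_half_bits(const serial_config *c)
{
    return 2u * (1u + c->data_bits + (c->parity ? 1u : 0u)) + c->stop_halves;
}

/* Time to send one character, in microseconds (rounded down). */
uint32_t char_time_us(const serial_config *c)
{
    return (uint32_t)((uint64_t)frame_half_bits(c) * 500000u / c->baud);
}
```

For example, 8 data bits, no parity and 1 stop bit give a 10-bit frame, so at 9600 baud each character takes a little over 1 ms.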
8251 CPU interface
• The UART includes one 8-bit register that buffers
characters between the UART and the CPU bus.
• The Transmitter Ready output indicates that the
transmitter is ready to accept a data character; the
Transmitter Empty signal goes high when the UART
has no characters to send.
• On the receiver side, the Receiver Ready pin goes high
when the UART has a character ready to be read by
the CPU.
8251 CPU interface

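[Figure: the CPU connects to the 8251 through an 8-bit status
register and an 8-bit data register; the 8251 transmits and
receives (xmit/rcv) on the serial port.]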
Programming I/O devices
• Two types of instructions can support I/O:
– special-purpose I/O instructions;
– memory-mapped load/store instructions.

• Intel x86 provides in and out instructions. Most
other CPUs use memory-mapped I/O.
• ARM also uses memory-mapped I/O.
Programming I/O devices contd…
1. ARM memory-mapped I/O
(Programs use normal load/store instructions to
communicate with the devices.)
• Example
• Define location for device:
DEV1 EQU 0x1000
• Read/write code:
LDR r1,=DEV1 ; set up device address
LDR r0,[r1] ; read DEV1
MOV r0,#8 ; set up value to write
STR r0,[r1] ; write value to device
Programming I/O devices contd…
2. Peek and poke
• Used to read and write I/O devices from a high-level language.
– Done through pointers, since the C compiler otherwise hides
the addresses of variables from us.
• Traditional HLL interfaces:
/* read the byte at a device address */
int peek(volatile char *location) {
    return *location;
}
/* write a byte to a device address */
void poke(volatile char *location, char newval) {
    *location = newval;
}
Programming I/O devices contd…
3. Busy/wait output
• Simplest way to program a device.
– Use instructions to test when the device is ready.
char *current_char = mystring;
while (*current_char != '\0') {
    poke(OUT_CHAR, *current_char);   /* send the character */
    while (peek(OUT_STATUS) != 0);   /* wait until the device is done */
    current_char++;
}
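The busy/wait loop can be tried on a desktop machine by replacing the device registers with a small software model. The sketch below is only that: `sim_device`, `dev_poke_char` and `dev_peek_status` are hypothetical stand-ins for OUT_CHAR and OUT_STATUS, and the assumption that the device stays busy for three status reads after each character is made up for illustration.

```c
#include <stddef.h>

/* Simulated device standing in for the OUT_CHAR / OUT_STATUS registers. */
typedef struct {
    char   received[64]; /* characters the device has accepted */
    size_t count;
    int    busy_polls;   /* status reads that will still report "busy" */
    long   total_polls;  /* number of status reads performed by the CPU */
} sim_device;

/* Write one character to the data register; the device becomes busy. */
void dev_poke_char(sim_device *d, char c)
{
    if (d->count < sizeof d->received - 1) {
        d->received[d->count++] = c;
        d->received[d->count] = '\0';
    }
    d->busy_polls = 3; /* assumed: busy for three status reads */
}

/* Read the status register: nonzero while busy, 0 when ready. */
int dev_peek_status(sim_device *d)
{
    d->total_polls++;
    if (d->busy_polls > 0) {
        d->busy_polls--;
        return 1;
    }
    return 0;
}

/* Busy/wait output: send each character, then spin until the device is ready. */
void busy_wait_write(sim_device *d, const char *s)
{
    for (const char *p = s; *p != '\0'; p++) {
        dev_poke_char(d, *p);
        while (dev_peek_status(d) != 0)
            ; /* the CPU does no useful work while waiting */
    }
}
```

The `total_polls` counter makes the cost of busy/wait visible: every status read is CPU time spent doing nothing else, which is the motivation for interrupts.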
INTERRUPTS
Interrupts in ARM
ARM7 supports two types of interrupts:
1. Fast interrupt requests (FIQs), and
2. Interrupt requests (IRQs).
An FIQ takes priority over an IRQ. The interrupt table is always kept in the
bottom memory addresses, starting at location 0.
The entries in the table typically contain subroutine calls to the appropriate
handler.
The ARM7 performs the following steps when responding to an interrupt
■ saves the appropriate value of the PC to be used to return,
■ copies the CPSR into a saved program status register (SPSR),
■ forces bits in the CPSR to note the interrupt, and
■ forces the PC to the appropriate interrupt vector.
When leaving the interrupt handler, the handler should:
■ restore the proper PC value,
■ restore the CPSR from the SPSR, and
■ clear interrupt disable flags.
ARM interrupt latency

• Worst-case latency to respond to an interrupt is 27 cycles:
– Two cycles to synchronize external request.
– Up to 20 cycles to complete current instruction.
– Three cycles for data abort.
– Two cycles to enter interrupt handling state.
Generic interrupt mechanism
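[Figure: flowchart of the generic interrupt mechanism. If no
interrupt is pending (intr? = N), continue execution. If the
interrupt priority is not greater than the current priority, ignore
the request. Otherwise acknowledge it (ack) and wait for a vector;
a timeout causes a bus error, and when the vector arrives the CPU
calls table[vector]. Priority selection is assumed to be handled
before this point.]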
Supervisor mode
• Complex systems are often implemented as several
programs that communicate with each other. These
programs may run under the command of an operating
system. It may be desirable to provide hardware checks
to ensure that the programs do not interfere with each
other.
• For example,
• By erroneously writing into a segment of memory used
by another program.
• In such cases it is often useful to have a supervisor
mode provided by the CPU.
• Normal programs run in user mode.
• Supervisor mode has privileges that user mode does not.
• For example, memory management unit (MMU) systems allow the
addresses of memory locations to be changed dynamically.
• Control of the MMU is typically reserved for supervisor mode, to
avoid the problems that could occur when program bugs cause
inadvertent changes in the memory management registers.
• The ARM instruction that puts the CPU in supervisor
mode is called SWI:
SWI CODE_1
Supervisor mode Contd….

• SWI causes the CPU to go into supervisor mode and sets
the PC to 0x08.
• The argument to SWI is a 24-bit immediate value that is
passed on to the supervisor mode code; it allows the
program to request various services from the supervisor
mode.
• In supervisor mode, the bottom 5 bits of the CPSR (the
mode field) are set to 10011 to indicate that the CPU is in
supervisor mode.
• The old value of the CPSR just before the SWI is
stored in a register called the saved program status
register (SPSR).
• There are several SPSRs for different modes; the supervisor
mode SPSR is referred to as SPSR_svc.
• To return from supervisor mode, the supervisor
restores the PC from register r14 and restores the CPSR
from SPSR_svc.
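A supervisor-mode handler typically finds out which service was requested by reading the SWI instruction word and masking off its 24-bit argument. The sketch below shows that bit manipulation, assuming the standard 32-bit ARM encoding (bits [27:24] = 1111 for SWI) and the standard CPSR mode encodings (user = 10000, supervisor = 10011). The function names are hypothetical.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu /* mode field: CPSR bits [4:0] */
#define MODE_USER      0x10u /* 10000 */
#define MODE_SVC       0x13u /* 10011 */

/* Returns 1 if a 32-bit ARM instruction word is an SWI. */
int is_swi(uint32_t instr)
{
    return ((instr >> 24) & 0xFu) == 0xFu;
}

/* Extracts the 24-bit immediate that SWI passes to the supervisor. */
uint32_t swi_number(uint32_t instr)
{
    return instr & 0x00FFFFFFu;
}

/* Returns 1 if the CPSR mode field selects supervisor mode. */
int in_supervisor_mode(uint32_t cpsr)
{
    return (cpsr & CPSR_MODE_MASK) == MODE_SVC;
}
```

For instance, the word 0xEF000011 is an unconditional SWI whose argument is 0x11.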


Exceptions
• An exception is an internally detected error.
• A simple example is division by zero.
• One way to handle this problem would be to check every
divisor before division to be sure it is not zero, but this
would both substantially increase the size of numerical
programs and cost a great deal of CPU time evaluating
the divisor’s value.
• The CPU can more efficiently check the divisor’s value
during execution.
• Since the time at which a zero divisor will be found is not
known in advance, this event is similar to an interrupt
except that it is generated inside the CPU.
• The exception mechanism provides a way for the program
to react to such unexpected events.
• Just as interrupts can be seen as an extension of the
subroutine mechanism, exceptions are generally
implemented as a variation of an interrupt.
Exceptions Contd….
• Since both deal with changes in the flow of control of a program,
it makes sense to use similar mechanisms. However, exceptions
are generated internally.
• Exceptions in general require both prioritization and vectoring.
• Exceptions must be prioritized because a single operation may
generate more than one exception, for example an illegal operand
and an illegal memory access.
• The priority of exceptions is usually fixed by the CPU architecture.
• Vectoring provides a way for the user to specify the handler for the
exception condition.
• The vector number for an exception is usually predefined by the
architecture; it is used to index into a table of exception handlers.
ARM’s Exceptions (1/6)
• Exceptions arise whenever the normal flow
of a program has to be halted temporarily
– For example to service an interrupt from a peripheral.

• ARM supports 7 types of exception and has
a privileged processor mode for each type
of exception.
• ARM exception vectors:
Address      Exception               Mode on entry
0x00000000   Reset                   Supervisor
0x00000004   Undefined instruction   Undefined
0x00000008   Software interrupt      Supervisor
0x0000000C   Abort (prefetch)        Abort
0x00000010   Abort (data)            Abort
0x00000014   Reserved                Reserved
0x00000018   IRQ                     IRQ
0x0000001C   FIQ                     FIQ
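Because the vectors sit at consecutive word addresses, the table can be modelled as an array indexed by address divided by four. The sketch below is a plain data model of the vector table for illustration; the `arm_vector` type and `vector_at` function are hypothetical names and do not correspond to any ARM library.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t    address;   /* vector address */
    const char *exception; /* exception type */
    const char *mode;      /* processor mode on entry */
} arm_vector;

static const arm_vector arm_vectors[8] = {
    {0x00000000u, "Reset",                 "Supervisor"},
    {0x00000004u, "Undefined instruction", "Undefined"},
    {0x00000008u, "Software interrupt",    "Supervisor"},
    {0x0000000Cu, "Abort (prefetch)",      "Abort"},
    {0x00000010u, "Abort (data)",          "Abort"},
    {0x00000014u, "Reserved",              "Reserved"},
    {0x00000018u, "IRQ",                   "IRQ"},
    {0x0000001Cu, "FIQ",                   "FIQ"},
};

/* Looks up a vector by address; returns NULL if the address is not a vector. */
const arm_vector *vector_at(uint32_t address)
{
    if ((address & 3u) != 0u || address > 0x1Cu)
        return NULL;
    return &arm_vectors[address >> 2];
}
```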
ARM’s Exceptions (2/6)
• When handling an exception, the ARM7TDMI:
– Preserves the address of the next instruction in the
appropriate Link Register
– Copies the CPSR into the appropriate SPSR
– Forces the CPSR mode bits to a value which depends
on the exception
– Forces the PC to fetch the next instruction from the
relevant exception vector
• It may also set the interrupt disable flags to prevent
otherwise unmanageable nestings of exceptions.
• If the processor is in Thumb state when an exception
occurs, it will automatically switch into ARM state.
• On completion, the exception handler:
– Moves the Link Register, minus an offset where
appropriate, to the PC. (The offset will vary
depending on the type of exception.)
– Copies the SPSR back to the CPSR
– Clears the interrupt disable flags, if they were
set on entry
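The entry and return steps can be traced with a small software model of the affected registers. The sketch below models only the IRQ case and only the registers named above. The `cpu_state` structure and function names are hypothetical, and it assumes the standard ARM values: IRQ mode = 10010, I flag = bit 7, T flag = bit 5, with the link register holding the address of the next instruction plus 4, so that the handler returns with SUBS pc, lr, #4.

```c
#include <stdint.h>

#define CPSR_I    0x80u /* IRQ disable flag */
#define CPSR_T    0x20u /* Thumb state flag */
#define MODE_MASK 0x1Fu
#define MODE_IRQ  0x12u /* 10010 */

typedef struct {
    uint32_t pc;
    uint32_t cpsr;
    uint32_t lr_irq;   /* banked link register for IRQ mode */
    uint32_t spsr_irq; /* saved program status register for IRQ mode */
} cpu_state;

/* Exception entry for an IRQ; next_instr is the address of the
   instruction that would have executed next. */
void enter_irq(cpu_state *s, uint32_t next_instr)
{
    s->lr_irq   = next_instr + 4u;  /* preserve the return address */
    s->spsr_irq = s->cpsr;          /* copy CPSR into SPSR_irq */
    s->cpsr     = (s->cpsr & ~(MODE_MASK | CPSR_T)) /* ARM state */
                  | MODE_IRQ | CPSR_I;              /* IRQ mode, IRQs off */
    s->pc       = 0x18u;            /* IRQ vector */
}

/* Models SUBS pc, lr, #4: restores the PC and copies SPSR_irq to CPSR. */
void return_from_irq(cpu_state *s)
{
    s->pc   = s->lr_irq - 4u;
    s->cpsr = s->spsr_irq;
}
```

Restoring the CPSR from the SPSR also restores the old interrupt disable flags, so no separate step is needed to clear them in this model.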
ARM’s Exceptions (4/6)
• Reset
– When the processor’s Reset input is asserted
• CPSR ← Supervisor + I + F
• PC ← 0x00000000

• Undefined Instruction
– If an attempt is made to execute an instruction that is undefined
• LR_undef ← Undefined Instruction Address + #4
• PC ← 0x00000004, CPSR ← Undefined + I
• Return with : MOVS pc, lr

• Prefetch Abort
– Instruction fetch memory abort, invalid fetched instruction
• LR_abt ← Aborted Instruction Address + #4, SPSR_abt ← CPSR
• PC ← 0x0000000C, CPSR ← Abort + I
• Return with : SUBS pc, lr, #4
ARM’s Exceptions (5/6)
• Data Abort
– Data access memory abort, invalid data
• LR_abt ← Aborted Instruction + #8, SPSR_abt ← CPSR
• PC ← 0x00000010, CPSR ← Abort + I
• Return with : SUBS pc, lr, #4 or SUBS pc, lr, #8

• Software Interrupt
– Enters Supervisor mode
• LR_svc ← SWI Address + #4, SPSR_svc ← CPSR
• PC ← 0x00000008, CPSR ← Supervisor + I
• Return with : MOVS pc, lr
ARM’s Exceptions (6/6)
• Interrupt Request
– Externally generated by asserting the processor’s IRQ input
• LR_irq ← PC - #4, SPSR_irq ← CPSR
• PC ← 0x00000018, CPSR ← Interrupt + I
• Return with : SUBS pc, lr, #4

• Fast Interrupt Request (FIQ)
– Externally generated by asserting the processor’s FIQ input
• LR_fiq ← PC - #4, SPSR_fiq ← CPSR
• PC ← 0x0000001C, CPSR ← Fast Interrupt + I + F
• Return with : SUBS pc, lr, #4
• The handler can start directly at 0x1C, which speeds up the
response time
Traps
• A Trap, also known as a software interrupt, is an
instruction that explicitly generates an exception
condition.
• The most common use of a trap is to enter supervisor
mode.
• The entry into supervisor mode must be controlled to
maintain security—if the interface between user and
supervisor mode is improperly designed , a user
program may be able to sneak code into the supervisor
mode that could be executed to perform harmful
operations.

• The ARM provides the SWI instruction for software
interrupts. This instruction causes the CPU to enter
supervisor mode.
Co-processor
CPU architects often want to provide flexibility in what features
are implemented in the CPU. One way to provide such flexibility
at the instruction set level is to allow co-processors, which are
attached to the CPU and implement some of the instructions.

• Example:
– Floating-point units are often structured as co-processors.
• ARM allows up to 16 designer-selected co-processors.
• The floating-point unit occupies two co-processor units in the
ARM architecture, numbered 1 and 2, but it
appears as a single unit to the programmer.
Co-processor contd….

To support co-processors, certain opcodes must be reserved in
the instruction set for co-processor operations.

Co-processor instructions can load and store co-processor
registers or can perform internal operations.

A CPU may, of course, receive co-processor instructions even when
there is no co-processor attached. Most architectures use illegal
instruction traps to handle these situations.
CPUs
• Caches.
• Memory management.
Caches and CPUs

[Figure: the CPU sends addresses to the cache controller, which
looks in the cache and passes the address on to main memory when
needed; data returns to the CPU from the cache or from main memory.]
Cache definition: cache memory is a small, fast, volatile memory
placed very close to the CPU (so it is also called CPU memory). It
holds copies of recently used instructions and data.
It is faster than main memory and gives the microprocessor
high-speed access to the locations it holds.
Cache operation
• Many main memory locations are mapped
onto one cache entry.
• May have caches for:
– instructions;
– data;
– data + instructions (unified).
• Memory access time is no longer
deterministic.
Terms
• Cache hit: required location is in cache.
• Cache miss: required location is not in cache.
• Working set: set of locations used by program
in a time interval.
Types of misses
• Compulsory (cold): location has never been
accessed.
• Capacity: working set is too large.
• Conflict: multiple locations in working set map
to same cache entry.
Memory system performance
• h = cache hit rate.
• t_cache = cache access time, t_main = main memory
access time.
• Average memory access time:
– t_av = h · t_cache + (1 - h) · t_main
Multiple levels of cache

[Figure: CPU, L1 cache and L2 cache connected in series.]


Multi-level cache access time
• h1 = L1 cache hit rate.
• h2 = fraction of accesses that hit in either L1 or L2.
• Average memory access time:
– t_av = h1 · t_L1 + (h2 - h1) · t_L2 + (1 - h2) · t_main
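A short numerical sketch of both access-time formulas follows. It assumes that h2 is the cumulative hit rate (the fraction of accesses satisfied by L1 or L2), so that the three weights h1, h2 - h1 and 1 - h2 add up to one. The function names are hypothetical.

```c
/* Single-level cache: t_av = h * t_cache + (1 - h) * t_main. */
double t_av_single(double h, double t_cache, double t_main)
{
    return h * t_cache + (1.0 - h) * t_main;
}

/* Two-level cache; h2 is the fraction of accesses that hit in L1 or L2,
   so h2 >= h1 and the three weights sum to 1. */
double t_av_two_level(double h1, double h2,
                      double t_l1, double t_l2, double t_main)
{
    return h1 * t_l1 + (h2 - h1) * t_l2 + (1.0 - h2) * t_main;
}
```

With a 75% hit rate, a 2-cycle cache and a 10-cycle main memory, `t_av_single(0.75, 2.0, 10.0)` gives 4 cycles on average.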
Replacement policies
• Replacement policy: strategy for choosing
which cache entry to throw out to make room
for a new memory location.
• Two popular strategies:
– Random.
– Least-recently used (LRU).
Cache organizations
• Fully-associative: any memory location can be
stored anywhere in the cache (almost never
implemented).
• Direct-mapped: each memory location maps
onto exactly one cache entry.
• N-way set-associative: each memory location
can go into any one of n entries, one in each of n
direct-mapped banks (called sets in these slides).
Cache performance benefits
• Keep frequently-accessed locations in fast
cache.
• Cache retrieves more than one word at a time.
– Sequential accesses are faster after first access.
Direct-mapped cache

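[Figure: each cache block holds a valid bit (1), a tag (0xabcd) and
the data bytes. The address is divided into tag, index and offset
fields: the index selects the cache block, the stored tag is
compared with the address tag to produce the hit signal, and the
offset selects the byte that is returned as the value.]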
Write operations
• Write-through: immediately copy write to
main memory.
• Write-back: write to main memory only when
location is removed from cache.
Direct-mapped cache locations
• Many locations map onto the same cache
block.
• Conflict misses are easy to generate:
– Array a[] uses locations 0, 1, 2, …
– Array b[] uses locations 1024, 1025, 1026, …
– Operation a[i] + b[i] generates conflict misses.
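The conflict can be reproduced with a tiny direct-mapped cache model. The sketch below assumes a cache of 1024 one-word blocks, so that locations 0 and 1024 share a block, and it runs the a[i] + b[i] loop for several passes so that reuse is visible. All names and the cache size are assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BLOCKS 1024u /* assumed cache size: 1024 one-word blocks */

typedef struct {
    uint8_t       valid[NUM_BLOCKS];
    uint32_t      tag[NUM_BLOCKS];
    unsigned long misses;
} dm_cache;

/* Accesses one location; returns 1 on a hit, 0 on a miss (block is refilled). */
int dm_access(dm_cache *c, uint32_t location)
{
    uint32_t index = location % NUM_BLOCKS; /* which cache block */
    uint32_t tag   = location / NUM_BLOCKS; /* which memory location is held */
    if (c->valid[index] && c->tag[index] == tag)
        return 1;
    c->valid[index] = 1;
    c->tag[index]   = tag;
    c->misses++;
    return 0;
}

/* Counts misses for `passes` repetitions of: for i in 0..n-1, read a[i], b[i]. */
unsigned long loop_misses(uint32_t a_base, uint32_t b_base,
                          uint32_t n, unsigned passes)
{
    dm_cache c;
    memset(&c, 0, sizeof c);
    for (unsigned p = 0; p < passes; p++) {
        for (uint32_t i = 0; i < n; i++) {
            dm_access(&c, a_base + i);
            dm_access(&c, b_base + i);
        }
    }
    return c.misses;
}
```

With a[] at 0 and b[] at 1024, every access misses on every pass because a[i] and b[i] keep evicting each other. Moving b[] to 512 leaves only the compulsory misses of the first pass.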
Set-associative cache

• A set of direct-mapped caches:
[Figure: Set 1, Set 2, ..., Set n are looked up in parallel; their
outputs are combined into the hit signal and the data.]
Example: direct-mapped vs. set-associative
address data
000 0101
001 1111
010 0000
011 0110
100 1000
101 0001
110 1010
111 0100
Direct-mapped cache behavior
• After 001 access: • After 010 access:
block tag data block tag data
00 - - 00 - -
01 0 1111 01 0 1111
10 - - 10 0 0000
11 - - 11 - -
Direct-mapped cache behavior, cont’d.
• After 011 access: • After 100 access:
block tag data block tag data
00 - - 00 1 1000
01 0 1111 01 0 1111
10 0 0000 10 0 0000
11 0 0110 11 0 0110
Direct-mapped cache behavior, cont’d.
• After 101 access: • After 111 access:
block tag data block tag data
00 1 1000 00 1 1000
01 1 0001 01 1 0001
10 0 0000 10 0 0000
11 0 0110 11 1 0100
2-way set-associative cache behavior
• Final state of cache (twice as big as direct-
mapped):
set blk 0 tag blk 0 data blk 1 tag blk 1 data
00 1 1000 - -
01 0 1111 1 0001
10 0 0000 - -
11 0 0110 1 0100
2-way set-associative cache behavior
• Final state of cache (same size as direct-
mapped):
set blk 0 tag blk 0 data blk 1 tag blk 1 data
0 01 0000 10 1000
1 10 0001 11 0100
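The same-size case can be checked with a small simulation of a cache with two sets of two blocks each. The sketch assumes least-recently-used replacement (the slides do not state the policy for this example), uses the 3-bit addresses and 4-bit data values from the example table, and takes the low address bit as the set index and the upper two bits as the tag. All type and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Memory contents from the example table, indexed by 3-bit address. */
static const uint8_t example_memory[8] = {
    0x5, 0xF, 0x0, 0x6, 0x8, 0x1, 0xA, 0x4
};

typedef struct {
    uint8_t  valid, tag, data;
    unsigned last_used; /* time stamp used for LRU */
} cache_block;

typedef struct {
    cache_block set[2][2]; /* [set index][block within the set] */
    unsigned    clock;
} cache2way;

void cache2way_reset(cache2way *c)
{
    memset(c, 0, sizeof *c);
}

/* Accesses a 3-bit address; returns 1 on a hit, 0 on a miss. */
int access2way(cache2way *c, uint8_t addr)
{
    uint8_t index = addr & 1u;        /* low bit selects the set */
    uint8_t tag   = (addr >> 1) & 3u; /* upper two bits are the tag */
    cache_block *b = c->set[index];
    int victim;

    c->clock++;
    for (int i = 0; i < 2; i++) {
        if (b[i].valid && b[i].tag == tag) {
            b[i].last_used = c->clock;
            return 1;
        }
    }
    /* Miss: fill an empty block if there is one, else replace the LRU block. */
    if (!b[0].valid)      victim = 0;
    else if (!b[1].valid) victim = 1;
    else                  victim = (b[0].last_used <= b[1].last_used) ? 0 : 1;

    b[victim].valid     = 1;
    b[victim].tag       = tag;
    b[victim].data      = example_memory[addr & 7u];
    b[victim].last_used = c->clock;
    return 0;
}
```

Feeding it the access sequence 001, 010, 011, 100, 101, 111 leaves set 0 holding tags 01 and 10 and set 1 holding tags 10 and 11.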
Example caches
• StrongARM:
– 16 Kbyte, 32-way, 32-byte block instruction cache.
– 16 Kbyte, 32-way, 32-byte block data cache (write-
back).
• C55x:
– Various models have 16KB, 24KB cache.
– Can be used as scratch pad memory.
Scratch pad memories
• Alternative to cache:
– Software determines what is stored in scratch
pad.
• Provides predictable behavior at the cost of
software control.
• C55x cache can be configured as scratch pad.
Memory management units

• Memory management unit (MMU) translates addresses:
[Figure: the CPU sends a logical address to the memory management
unit, which sends the physical address to main memory.]
Memory management tasks
• Allows programs to move in physical memory
during execution.
• Allows virtual memory:
– memory images kept in secondary storage;
– images returned to main memory on demand
during execution.
• Page fault: request for location not resident in
memory.
Address translation
• Requires some sort of register/table to allow
arbitrary mappings of logical to physical
addresses.
• Two basic schemes:
– segmented;
– paged.
• Segmentation and paging can be combined
(x86).
Segments and pages

[Figure: memory holding segment 1 and segment 2, and memory divided
into page 1 and page 2.]
Segment address translation

[Figure: the logical address is added to the segment base address
to form the physical address; a range check against the segment
lower bound and segment upper bound signals a range error.]
Page address translation

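[Figure: the logical address is split into a page number and an
offset; the page number selects the page i base, which is
concatenated with the offset to form the physical address.]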
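The two translation schemes can be sketched in a few lines. The code below assumes 4-kbyte pages, a tiny 16-entry flat page table and a segment described by a base and a size. These sizes and all names (`page_entry`, `translate_paged`, `translate_segment`) are hypothetical and chosen only for illustration.

```c
#include <stdint.h>

#define PAGE_BITS 12u               /* assumed 4-kbyte pages */
#define PAGE_SIZE (1u << PAGE_BITS)
#define NUM_PAGES 16u               /* assumed tiny logical address space */

/* Flat page table entry: physical base address of one logical page. */
typedef struct {
    uint8_t  present; /* 0 means the page is not resident (page fault) */
    uint32_t base;    /* page-aligned physical base address */
} page_entry;

/* Paged translation; returns 0 on success, -1 on a page fault. */
int translate_paged(const page_entry table[NUM_PAGES],
                    uint32_t logical, uint32_t *physical)
{
    uint32_t page   = logical >> PAGE_BITS;
    uint32_t offset = logical & (PAGE_SIZE - 1u);
    if (page >= NUM_PAGES || !table[page].present)
        return -1;
    *physical = table[page].base | offset; /* concatenate base and offset */
    return 0;
}

/* Segmented translation; returns 0 on success, -1 on a range error. */
int translate_segment(uint32_t seg_base, uint32_t seg_size,
                      uint32_t offset, uint32_t *physical)
{
    if (offset >= seg_size)
        return -1;                  /* outside the segment bounds */
    *physical = seg_base + offset;  /* add offset to the segment base */
    return 0;
}
```

Paging concatenates because page bases are aligned to the page size, while segmentation needs an addition and a bounds check because segments can start anywhere and have arbitrary length.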
Page table organizations

[Figure: two page table organizations, flat and tree; each holds
page descriptors.]
Caching address translations
• Large translation tables require main memory
access.
• TLB: cache for address translation.
– Typically small.
ARM memory management
• Memory region types:
– section: 1 Mbyte block;
– large page: 64 kbytes;
– small page: 4 kbytes.
• An address is marked as section-mapped or
page-mapped.
• Two-level translation scheme.
ARM address translation
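[Figure: the virtual address is split into a 1st index, a 2nd index
and an offset. The translation table base register is concatenated
with the 1st index to select a descriptor in the 1st-level table;
that descriptor is concatenated with the 2nd index to select a
descriptor in the 2nd-level table; that descriptor is concatenated
with the offset to give the physical address.]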
CPUs
• CPU performance
• CPU power consumption.
Elements of CPU performance
• Cycle time.
• CPU pipeline.
• Memory system.
Pipelining
• Several instructions are executed
simultaneously at different stages of
completion.
• Various conditions can cause pipeline bubbles
that reduce utilization:
– branches;
– memory system delays;
– etc.
Performance measures
• Latency: time it takes for an instruction to get
through the pipeline.
• Throughput: number of instructions executed
per time period.
• Pipelining increases throughput without
reducing latency.
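The difference between latency and throughput shows up in a simple cycle count. The sketch below assumes an ideal pipeline in which every stage takes one cycle and there are no stalls: the first instruction needs one cycle per stage, and each later instruction completes one cycle after its predecessor. The function names are hypothetical.

```c
/* Cycles to run n instructions on an ideal pipeline with `stages` stages. */
unsigned long pipelined_cycles(unsigned long n, unsigned stages)
{
    return n == 0 ? 0 : stages + (n - 1);
}

/* Cycles for the same work if each instruction must finish all of its
   stages before the next one starts (no overlap). */
unsigned long unpipelined_cycles(unsigned long n, unsigned stages)
{
    return n * stages;
}
```

For a 3-stage pipeline, each instruction still takes 3 cycles from fetch to completion, but 1000 instructions finish in 1002 cycles instead of 3000.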
ARM7 pipeline
• ARM 7 has 3-stage pipe:
– fetch instruction from memory;
– decode opcode and operands;
– execute.
ARM pipeline execution

add r0,r1,#5      fetch    decode   execute
sub r2,r3,r6               fetch    decode   execute
cmp r2,#3                           fetch    decode   execute
time              1        2        3
Pipeline stalls
• If every step cannot be completed in the same
amount of time, pipeline stalls.
• Bubbles introduced by stall increase latency,
reduce throughput.
ARM multi-cycle LDMIA instruction

ldmia r0,{r2,r3}  fetch    decode   ex ld r2 ex ld r3
sub r2,r3,r6               fetch    decode            ex sub
cmp r2,#3                           fetch             decode   ex cmp
time
Control stalls
• Branches often introduce stalls (branch
penalty).
– Stall time may depend on whether branch is
taken.
• May have to squash instructions that already
started executing.
• Don’t know what to fetch until condition is
evaluated.
ARM pipelined branch

bne foo           fetch    decode   ex bne   ex bne   ex bne
sub r2,r3,r6               fetch    decode
foo add r0,r1,r2                             fetch    decode   ex add
time
Delayed branch
• To increase pipeline efficiency, the delayed branch
mechanism requires that the n instructions after the
branch are always executed, whether the branch is
taken or not.
Memory system performance
• Caches introduce indeterminacy in execution
time.
– Depends on order of execution.
• Cache miss penalty: added time due to a
cache miss.
Types of cache misses
• Compulsory miss: location has not been
referenced before.
• Conflict miss: two locations are fighting for
the same block.
• Capacity miss: working set is too large.
CPU power consumption
• Most modern CPUs are designed with power
consumption in mind to some degree.
• Power vs. energy:
– heat depends on power consumption;
– battery life depends on energy consumption.
CMOS power consumption
• Voltage drops: power consumption
proportional to V^2.
• Toggling: more activity means more power.
• Leakage: basic circuit characteristics; can be
eliminated by disconnecting power.
CPU power-saving strategies
• Reduce power supply voltage.
• Run at lower clock frequency.
• Disable function units with control signals
when not in use.
• Disconnect parts from power supply when not
in use.
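The effect of the first two strategies can be estimated with the usual first-order model of CMOS dynamic power, in which power scales with the square of the supply voltage and linearly with clock frequency. The sketch below computes power relative to a baseline. It ignores leakage and assumes the switched capacitance and activity stay the same, and the function name is hypothetical.

```c
/* Dynamic power relative to a baseline, given the ratio of the new supply
   voltage and the new clock frequency to their baseline values.
   First-order model: P is proportional to V^2 * f. */
double relative_dynamic_power(double v_ratio, double f_ratio)
{
    return v_ratio * v_ratio * f_ratio;
}
```

Halving the voltage alone cuts dynamic power to a quarter; halving both voltage and frequency cuts it to one eighth, at the cost of running half as fast.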
Power management styles
• Static power management: does not depend
on CPU activity.
– Example: user-activated power-down mode.
• Dynamic power management: based on CPU
activity.
– Example: disabling function units.
Application: PowerPC 603 energy features
• Provides doze, nap, sleep modes.
• Dynamic power management features:
– Uses static logic.
– Can shut down unused execution units.
– Cache organized into subarrays to minimize
amount of active circuitry.
PowerPC 603 activity
• Percentage of time units are idle for SPEC
integer/floating-point:
unit Specint92 Specfp92
D cache 29% 28%
I cache 29% 17%
load/store 35% 17%
fixed-point 38% 76%
floating-point 99% 30%
system register 89% 97%
Power-down costs
• Going into a power-down mode costs:
– time;
– energy.
• Must determine if going into mode is
worthwhile.
• Can model CPU power states with power state
machine.
Application: StrongARM SA-1100 power saving
• Processor takes two supplies:
– VDD is main 3.3V supply.
– VDDX is 1.5V.
• Three power modes:
– Run: normal operation.
– Idle: stops CPU clock, with logic still powered.
– Sleep: shuts off most of chip activity; 3 steps, each about
30 µs; wakeup takes > 10 ms.
SA-1100 power state machine
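[Figure: power state machine with three states: run (Prun = 400 mW),
idle (Pidle = 50 mW) and sleep (Psleep = 0.16 mW). Transition times:
run to idle 10 µs, idle to run 10 µs, run to sleep 90 µs, idle to
sleep 90 µs, sleep to run 160 ms.]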
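Whether entering a power-down mode is worthwhile can be estimated by comparing the energy spent over an idle gap. The sketch below uses the SA-1100 power figures, reading the transition times as 10 µs (run to idle and back), 90 µs (run to sleep) and 160 ms (sleep to run). As a simplifying assumption that does not come from the slides, it charges run-mode power during every transition. The function names are hypothetical.

```c
/* SA-1100 figures: power in mW, time in seconds. */
#define P_RUN_MW    400.0
#define P_IDLE_MW    50.0
#define P_SLEEP_MW    0.16
#define T_RUN_IDLE   10e-6  /* run <-> idle, each direction */
#define T_RUN_SLEEP  90e-6  /* run -> sleep */
#define T_SLEEP_RUN 160e-3  /* sleep -> run */

/* Energy in mJ over a gap of t seconds if the CPU waits in idle mode. */
double energy_idle_mj(double t)
{
    double trans = 2.0 * T_RUN_IDLE;
    return P_RUN_MW * trans + P_IDLE_MW * (t - trans);
}

/* Energy in mJ over a gap of t seconds if the CPU goes to sleep. */
double energy_sleep_mj(double t)
{
    double trans = T_RUN_SLEEP + T_SLEEP_RUN;
    return P_RUN_MW * trans + P_SLEEP_MW * (t - trans);
}

/* Returns 1 if sleeping through a gap of t seconds uses less energy
   than idling, 0 otherwise (including gaps too short to sleep and wake). */
int sleep_is_worthwhile(double t)
{
    if (t < T_RUN_SLEEP + T_SLEEP_RUN)
        return 0;
    return energy_sleep_mj(t) < energy_idle_mj(t);
}
```

Under these assumptions the long wakeup dominates: for a gap of one second idle mode is still cheaper, and sleep only pays off once the gap grows beyond roughly 1.3 seconds.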
