Designers in many fields must be able to identify where microprocessors can be used,
design a hardware platform with I/O devices that can support the required tasks, and
implement software that performs the required processing.
Embedding Computers
2. Low power and low cost also drive us away from PC architectures
and toward multiprocessors. Personal computers are designed to
satisfy a broad mix of computing requirements and to be very
flexible. Those features increase the complexity and price of the
components. They also cause the processor and other components
to use more energy to perform a given function.
COMPLEX SYSTEMS AND MICROPROCESSORS contd…
■ Platform: The platform includes the bus and I/O devices. The
platform components that surround the CPU are responsible for
feeding the CPU and can dramatically affect its performance.
■ Program: Programs are very large and the CPU sees only a small
window of the program at a time. We must consider the structure
of the entire program to determine its overall behavior.
Contd….
1. Requirements
EX:
GPS Moving Map
The architecture is a plan for the overall structure of the system that
will be used later to design the components that make up the
architecture.
Hardware
Software
THE EMBEDDED SYSTEM DESIGN PROCESS Contd..
4. Designing Hardware and Software Components
The components will in general include both hardware (FPGAs, boards, and so on) and
software modules.
By building up the system in phases and running properly chosen tests, we can often find
bugs more easily.
If we debug only a few modules at a time, we are more likely to uncover the simple bugs
and to recognize them easily.
Careful attention to inserting appropriate debugging facilities during design can help
ease system integration problems, but the nature of embedded computing means that
this phase will always be a challenge.
FORMALISMS FOR SYSTEM DESIGN
(STRUCTURAL DESCRIPTION & BEHAVIORAL DESCRIPTION)
An object describing a display (such as a CRT screen) is shown in UML notation in Figure
A class is a form of type definition—all objects derived from the same class have the
same characteristics, although their attributes may have different values.
A class defines the attributes that an object may have.
It also defines the operations that determine how the object interacts with the rest of
the world.
There are several types of relationships that can exist between objects and classes:
■ Association occurs between objects that communicate with each other but have no
ownership relationship between them.
■ Composition is a type of aggregation in which the owner does not allow access to the
component objects.
1. A signal is an asynchronous occurrence. It is defined in UML by an object that is labeled as a <<signal>>.
2. A call event follows the model of a procedure call in a programming language.
Fig: Model train control system: a console and power supply drive the track; each train carries a receiver (rcvr) and a motor.
The user sends messages to the train with a control box attached to the tracks.
The control box may have familiar controls such as a throttle, emergency stop button, and so
on.
Since the train receives its electrical power from the two rails of the track, the control box
can send signals to the train over the tracks by modulating the power supply voltage.
The control panel sends packets over the tracks to the receiver on the train.
The train includes analog electronics to sense the bits being transmitted and a control system
to set the train motor’s speed and direction based on those commands.
Each packet includes an address so that the console can control several trains on the same
track; the packet also includes an error correction code (ECC) to guard against transmission
errors.
This is a one-way communication system—the model train cannot send commands back to the
user.
DCC was created to provide a standard that could be built by any manufacturer so that
hobbyists could mix and match components from multiple vendors.
■ Standard S-9.1, the DCC Electrical Standard, defines how bits are encoded on
the rails for transmission.
■ Standard S-9.2, the DCC Communication Standard, defines the packets that
carry information.
The DCC standard does not specify many aspects of a DCC train system. It doesn’t
define the control panel, the type of microprocessor used, the programming language
to be used, or many other aspects of a real model train system. The standard
concentrates on those aspects of system design that are necessary for interoperability.
Basic system commands
Command name    Parameters
set-speed       speed (positive/negative)
set-inertia     inertia-value (non-negative)
estop           none
Typical control sequence
:console :train_rcvr
set-inertia
set-speed
set-speed
estop
set-speed
Conceptual Specification
Digital Command Control specifies some important aspects of the system,
particularly those that allow equipment to interoperate. But DCC deliberately does
not specify everything about a model train control system.
Fig2: UML collaboration diagram for major subsystems of the train controller system
Fig: A UML class diagram for the train controller showing the composition of the subsystems
Console physical object classes
knobs*    pulser*
sender*   detector*
panel:
  train-number() : integer
  speed() : integer
  inertia() : integer
  estop() : boolean
  new-settings()
motor-interface:
  speed: integer
Transmitter and receiver classes
transmitter:
  send-speed(adrs: integer, speed: integer)
  send-inertia(adrs: integer, val: integer)
  set-estop(adrs: integer)
receiver:
  current: command
  new: boolean
  read-cmd()
  new-cmd() : boolean
  rcv-type(msg-type: command)
  rcv-speed(val: integer)
  rcv-inertia(val: integer)
Formatter class
The formatter class holds state for each train and the setting for the current train.
The operate() operation performs the basic formatting task.
formatter:
  current-train: integer
  current-speed[ntrains]: integer
  current-inertia[ntrains]: unsigned-integer
  current-estop[ntrains]: boolean
  send-command()
  panel-active() : boolean
  operate()
Control input sequence diagram
Fig: Sequence diagram among :knobs, :panel, :formatter and :transmitter. Messages shown: change in control settings, read panel, panel-active, change in speed/inertia/estop, change in train number, new-settings, set-knobs.
Formatter operate behavior
Fig: State diagram with an idle state and the actions update-panel() and send-command(); the remaining transition is labeled other.
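Panel-active behavior
1. panel*:read-train(): if true (T), set current-train = train-knob, update-screen, and set changed = true; if false (F), continue.
2. panel*:read-speed(): if true (T), set current-speed = throttle and set changed = true; if false (F), continue.
... ...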
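The panel-active checks above can be sketched in Python. This is only an illustration: the `Panel` and `FormatterState` classes and their field names are assumptions made for the example, and the update-screen step is reduced to a flag.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    """Hypothetical stand-in for the panel: current knob positions."""
    train_knob: int
    throttle: int


@dataclass
class FormatterState:
    """Hypothetical subset of the formatter's attributes."""
    current_train: int
    current_speed: list          # one speed entry per train
    screen_dirty: bool = False   # stands in for update-screen


def panel_active(panel, state):
    """Return True if any panel setting differs from the stored state."""
    changed = False
    # First test: has the selected train changed?
    if panel.train_knob != state.current_train:
        state.current_train = panel.train_knob
        state.screen_dirty = True
        changed = True
    # Second test: has the throttle changed for the current train?
    if panel.throttle != state.current_speed[state.current_train]:
        state.current_speed[state.current_train] = panel.throttle
        changed = True
    return changed
```

Calling `panel_active` twice with an unchanged panel returns True the first time (the state is updated) and False the second time.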
Instruction sets preliminaries
In this topic, we begin our study of microprocessors by studying instruction sets: "the
programmer’s interface to the hardware".
A Harvard architecture.
A von Neumann architecture computer.
Which Architecture is Best Suited for µp and DSP?
Von Neumann Architecture vs. Harvard Architecture
Von Neumann: stored program concept (program code is stored along with data).
Computer Architecture Contd…
The CPU has several internal registers that store values used
internally. One of those registers is the program counter
(PC), which holds the address in memory of an instruction. The
CPU fetches the instruction from memory, decodes the
instruction, and executes it.
Advantages
• A smaller die size
– A simpler processor requires fewer transistors and less silicon area.
• A shorter development time
– Less design effort and therefore a lower cost
• A higher performance
– Simpler instructions are executed faster.
Disadvantages
• Poor code density compared with CISC
• Doesn’t execute x86 code
Instruction set characteristics
• Fixed vs variable length.
• Addressing modes.
• Number of operands.
• Types of operations supported.
Programming model
• Programming model: Registers visible to the
programmer.
• Some registers are not visible (IR).
ARM – What is it?
• ARM stands for Advanced RISC Machines
Fig: Example products pictured: the Toshiba 46HM94 46-inch television, the iPod Nano, and the Samsung S3FJ9SK smartcard IC.
History of ARM
Fig: The CPU accesses an I/O device through its status register and data register, which sit in front of the device mechanism.
• Devices typically have several registers:
– Data registers hold values that are treated as data by the device, such as the data read or written by a disk.
– Status registers provide information about the device’s operation, such as whether the current transaction has completed.
I/O Application: 8251 UART
• Universal asynchronous receiver transmitter
(UART) : provides serial communication.
• 8251 functions are integrated into standard PC
interface chip.
• Allows many communication parameters to be
programmed.
Serial communication
Fig: Serial line level versus time, starting from the no-character (idle) state.
Serial communication parameters
• Baud (bit) rate.
• Number of bits per character (5 to 8).
• Parity/no parity.
• Even/odd parity.
• Length of stop bit (1, 1.5, 2 bits).
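As a rough illustration of how these parameters interact, the sketch below computes the length of one character frame and the parity bit. It assumes the usual asynchronous framing of one start bit before the data bits; the function names are made up for this example.

```python
def frame_bits(data_bits=8, parity=False, stop_bits=1):
    """Bits on the line per character: start + data + parity + stop."""
    if not 5 <= data_bits <= 8:
        raise ValueError("data_bits must be 5 to 8")
    if stop_bits not in (1, 1.5, 2):
        raise ValueError("stop_bits must be 1, 1.5 or 2")
    return 1 + data_bits + (1 if parity else 0) + stop_bits


def chars_per_second(baud, **frame_options):
    """Maximum character rate at a given bit rate."""
    return baud / frame_bits(**frame_options)


def parity_bit(value, data_bits=8, even=True):
    """Parity bit that makes the number of 1s even (or odd)."""
    ones = bin(value & ((1 << data_bits) - 1)).count("1")
    bit = ones % 2            # 1 if the data already has an odd count
    return bit if even else bit ^ 1
```

For instance, 8 data bits with no parity and one stop bit occupy 10 bit times, so a 9600 baud line carries at most 960 characters per second.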
8251 CPU interface
• The UART includes one 8-bit register that buffers
characters between the UART and the CPU bus.
Fig: 8251 CPU-side registers: status (8 bit) and data (8 bit).
Programming I/O devices
• Two types of instructions can support I/O:
– special-purpose I/O instructions;
– memory-mapped load/store instructions.
• ARM uses memory-mapped I/O.
Programming I/O devices contd…
1. ARM memory-mapped I/O
(Programs use normal read/write instructions to
communicate with the devices.)
• Example
• Define location for device:
DEV1 EQU 0x1000
• Read/write code:
LDR r1,=DEV1 ; set up device address
LDR r0,[r1] ; read DEV1
MOV r0,#8 ; set up value to write
STR r0,[r1] ; write value to device
Programming I/O devices contd…
2. Peek and poke (peek reads a memory location, poke writes it)
• Used to access I/O devices from a high-level language
– Done through pointers, since the C compiler otherwise hides the
addresses of variables from us
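The pointer idea behind peek and poke can be tried on a desktop machine with Python's `ctypes`, which dereferences a raw address much as a C pointer would. This is a sketch under assumptions: the two-byte `device` buffer is an ordinary block of memory standing in for a device register, and `DEV1` is its address rather than a real device location such as 0x1000.

```python
import ctypes


def peek(address):
    """Read one byte from a raw address, like *(unsigned char *)address in C."""
    return ctypes.c_ubyte.from_address(address).value


def poke(address, value):
    """Write one byte to a raw address."""
    ctypes.c_ubyte.from_address(address).value = value & 0xFF


# Stand-in for a device: two bytes of ordinary memory (an assumption;
# on real hardware the address would come from the memory map).
device = (ctypes.c_ubyte * 2)()
DEV1 = ctypes.addressof(device)

poke(DEV1, 8)          # write a value to the "device"
status = peek(DEV1)    # read it back
```

On a hosted operating system, passing an arbitrary address to these functions would normally crash the process, which is why the example only touches memory it has allocated itself.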
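Fig: Interrupt flowchart: after the interrupt is acknowledged (ack), the CPU waits for a vector; a timeout raises a bus error, and when the vector arrives the CPU calls table[vector].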
Supervisor mode
• Complex systems are often implemented as several
programs that communicate with each other. These
programs may run under the command of an operating
system. It may be desirable to provide hardware checks
to ensure that the programs do not interfere with each
other.
• For example, one program may erroneously write into a segment
of memory used by another program.
• In such cases it is often useful to have a supervisor
mode provided by the CPU.
• Normal programs run in user mode.
• The supervisor mode has privileges that user modes do
not.
For example, memory management unit (MMU)
systems allow the addresses of memory locations to be
changed dynamically.
Control of the memory management unit (MMU) is
typically reserved for supervisor mode to avoid the
obvious problems that could occur when program bugs
cause inadvertent changes in the memory management
registers.
• Vectoring provides a way for the user to specify the handler for the
exception condition.
Address      Exception          Mode on entry
0x0000000C   Abort (prefetch)   Abort
0x00000010   Abort (data)       Abort
0x00000014   Reserved           Reserved
0x00000018   IRQ                IRQ
0x0000001C   FIQ                FIQ
ARM’s Exceptions (2/6)
• When handling an exception, the ARM7TDMI:
– Preserves the address of the next instruction in the appropriate Link Register
– Copies the CPSR into the appropriate SPSR
– Forces the CPSR mode bits to a value which depends on the exception
– Forces the PC to fetch the next instruction from the relevant exception vector
• It may also set the interrupt disable flags to prevent otherwise unmanageable nestings of exceptions.
• If the processor is in THUMB state when an exception occurs, it will automatically switch into ARM state.
ARM’s Exceptions (3/6)
• On completion, the exception handler:
– Moves the Link Register, minus an offset where
appropriate, to the PC. (The offset will vary
depending on the type of exception.)
– Copies the SPSR back to the CPSR
– Clears the interrupt disable flags, if they were
set on entry
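The entry and return steps can be modeled in a few lines of Python. This is a simplified sketch, not a description of the hardware: the CPU is a dictionary holding only the state these steps touch, the mode abbreviations are chosen for the example, and the caller supplies the value saved in the link register.

```python
# Vector address and entry mode for each ARM7TDMI exception.
VECTORS = {
    "reset":          (0x00, "svc"),
    "undefined":      (0x04, "und"),
    "swi":            (0x08, "svc"),
    "prefetch_abort": (0x0C, "abt"),
    "data_abort":     (0x10, "abt"),
    "irq":            (0x18, "irq"),
    "fiq":            (0x1C, "fiq"),
}


def new_cpu(pc):
    """Hypothetical CPU model with banked LR and SPSR keyed by mode."""
    return {
        "pc": pc,
        "cpsr": {"mode": "usr", "I": False, "F": False, "T": False},
        "lr": {},
        "spsr": {},
    }


def enter_exception(cpu, kind, link_value):
    vector, mode = VECTORS[kind]
    cpu["lr"][mode] = link_value             # preserve return address
    cpu["spsr"][mode] = dict(cpu["cpsr"])    # copy CPSR into SPSR
    cpu["cpsr"]["mode"] = mode               # force the mode bits
    cpu["cpsr"]["I"] = True                  # disable IRQ
    if kind in ("reset", "fiq"):
        cpu["cpsr"]["F"] = True              # these also disable FIQ
    cpu["cpsr"]["T"] = False                 # switch to ARM state
    cpu["pc"] = vector                       # fetch from the vector


def return_from_exception(cpu, offset):
    mode = cpu["cpsr"]["mode"]
    cpu["pc"] = cpu["lr"][mode] - offset     # LR minus offset into PC
    cpu["cpsr"] = dict(cpu["spsr"][mode])    # restore CPSR, flags included
```

Restoring the whole CPSR from the SPSR is what clears the interrupt disable flags and returns the processor to its previous mode and state in one step.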
ARM’s Exceptions (4/6)
• Reset
– When the processor’s Reset input is asserted
• CPSR ← Supervisor + I + F
• PC ← 0x00000000
• Undefined Instruction
– If an attempt is made to execute an instruction that is undefined
• LR_undef ← Undefined Instruction Address + #4
• PC ← 0x00000004, CPSR ← Undefined + I
• Return with : MOVS pc, lr
• Prefetch Abort
– Instruction fetch memory abort, invalid fetched instruction
• LR_abt ← Aborted Instruction Address + #4, SPSR_abt ← CPSR
• PC ← 0x0000000C, CPSR ← Abort + I
• Return with : SUBS pc, lr, #4
ARM’s Exceptions (5/6)
• Data Abort
– Data access memory abort, invalid data
• LR_abt ← Aborted Instruction + #8, SPSR_abt ← CPSR
• PC ← 0x00000010, CPSR ← Abort + I
• Return with : SUBS pc, lr, #4 or SUBS pc, lr, #8
• Software Interrupt
– Enters Supervisor mode
• LR_svc ← SWI Address + #4, SPSR_svc ← CPSR
• PC ← 0x00000008, CPSR ← Supervisor + I
• Return with : MOVS pc, lr
ARM’s Exceptions (6/6)
• Interrupt Request
– Externally generated by asserting the processor’s IRQ input
• LR_irq ← PC - #4, SPSR_irq ← CPSR
• PC ← 0x00000018, CPSR ← Interrupt + I
• Return with : SUBS pc, lr, #4
• EX:
– Floating-point units are often structured as co-processors.
• ARM allows up to 16 designer-selected co-processors.
• The floating-point unit occupies two co-processor units in the
ARM architecture, numbered 1 and 2, but it
appears as a single unit to the programmer.
Co-processor contd….
Fig: The cache controller sits between the CPU and main memory and manages the cache; address and data lines connect each stage.
Cache definition: A cache is a small, fast, volatile memory placed close to
the CPU (it is sometimes called CPU memory). It holds copies of recently
used instructions and data.
Because it is much faster than main memory, it gives the microprocessor
high-speed access to that data.
Cache operation
• Many main memory locations are mapped
onto one cache entry.
• May have caches for:
– instructions;
– data;
– data + instructions (unified).
• Memory access time is no longer
deterministic.
Terms
• Cache hit: required location is in cache.
• Cache miss: required location is not in cache.
• Working set: set of locations used by program
in a time interval.
Types of misses
• Compulsory (cold): location has never been
accessed.
• Capacity: working set is too large.
• Conflict: multiple locations in working set map
to same cache entry.
Memory system performance
• h = cache hit rate.
• $t_{cache}$ = cache access time, $t_{main}$ = main memory access time.
• Average memory access time:
– $t_{av} = h\,t_{cache} + (1-h)\,t_{main}$
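A short sketch of the average access time formula follows. The function name and the sample numbers in the usage note are illustrative assumptions, not measurements.

```python
def average_access_time(h, t_cache, t_main):
    """t_av = h * t_cache + (1 - h) * t_main, in the units of the inputs."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("hit rate must be between 0 and 1")
    return h * t_cache + (1.0 - h) * t_main
```

With a hypothetical 90% hit rate, a 1-cycle cache and a 10-cycle main memory, the result is about 1.9 cycles, so even a modest miss rate nearly doubles the average access time compared with the cache alone.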
Multiple levels of cache
Fig: Cache block lookup with hit, value, and byte outputs.
Write operations
• Write-through: immediately copy write to
main memory.
• Write-back: write to main memory only when
location is removed from cache.
Direct-mapped cache locations
• Many locations map onto the same cache
block.
• Conflict misses are easy to generate:
– Array a[] uses locations 0, 1, 2, …
– Array b[] uses locations 1024, 1025, 1026, …
– Operation a[i] + b[i] generates conflict misses.
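The conflict can be reproduced with a small miss counter. This sketch assumes a direct-mapped cache of 1024 one-word blocks, so that a[i] and b[i] land on the same block; the array length of 8 and the two passes over the data are also choices made for the example.

```python
def count_misses(addresses, n_blocks):
    """Count misses in a direct-mapped cache with one word per block."""
    tags = [None] * n_blocks
    misses = 0
    for addr in addresses:
        index = addr % n_blocks     # which cache block the address maps to
        tag = addr // n_blocks      # identifies which address occupies it
        if tags[index] != tag:
            misses += 1
            tags[index] = tag       # replace the previous occupant
    return misses


def interleaved(n, base_a, base_b):
    """Address trace for computing a[i] + b[i] for i = 0..n-1."""
    trace = []
    for i in range(n):
        trace.append(base_a + i)
        trace.append(base_b + i)
    return trace


# Two passes over the arrays so that reuse is possible.
conflicting = interleaved(8, 0, 1024) * 2
shifted = interleaved(8, 0, 1024 + 8) * 2
```

With b[] at 1024, every access in both passes misses because a[i] and b[i] keep evicting each other; moving b[] by 8 words leaves only the compulsory misses of the first pass.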
Set-associative cache
Fig: Set-associative cache producing hit and data outputs.
Example: direct-mapped vs. set-associative
address data
000 0101
001 1111
010 0000
011 0110
100 1000
101 0001
110 1010
111 0100
Direct-mapped cache behavior
• After 001 access: • After 010 access:
block tag data block tag data
00 - - 00 - -
01 0 1111 01 0 1111
10 - - 10 0 0000
11 - - 11 - -
Direct-mapped cache behavior, cont’d.
• After 011 access: • After 100 access:
block tag data block tag data
00 - - 00 1 1000
01 0 1111 01 0 1111
10 0 0000 10 0 0000
11 0 0110 11 0 0110
Direct-mapped cache behavior, cont’d.
• After 101 access: • After 111 access:
block tag data block tag data
00 1 1000 00 1 1000
01 1 0001 01 1 0001
10 0 0000 10 0 0000
11 0 0110 11 1 0100
2-way set-associative cache behavior
• Final state of cache (twice as big as direct-mapped):
set blk 0 tag blk 0 data blk 1 tag blk 1 data
00 1 1000 - -
01 0 1111 1 0001
10 0 0000 - -
11 0 0110 1 0100
2-way set-associative cache behavior
• Final state of cache (same size as direct-mapped):
set blk 0 tag blk 0 data blk 1 tag blk 1 data
0 01 0000 10 1000
1 10 0001 11 0100
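The access sequence in this example (001, 010, 011, 100, 101, 111) can be replayed with a small simulator to check the final cache contents for each organization. The sketch assumes least-recently-used replacement within a set, which the example does not state explicitly; the function and variable names are illustrative.

```python
# Memory contents from the example: 3-bit address -> 4-bit data.
MEMORY = {
    0b000: "0101", 0b001: "1111", 0b010: "0000", 0b011: "0110",
    0b100: "1000", 0b101: "0001", 0b110: "1010", 0b111: "0100",
}
TRACE = [0b001, 0b010, 0b011, 0b100, 0b101, 0b111]


def simulate(trace, n_sets, ways):
    """Return, for each set, its (tag, data) lines, oldest first."""
    sets = [[] for _ in range(n_sets)]
    for addr in trace:
        index = addr % n_sets      # low address bits select the set
        tag = addr // n_sets       # remaining high bits form the tag
        lines = sets[index]
        hit = next((line for line in lines if line[0] == tag), None)
        if hit is not None:
            lines.remove(hit)      # re-appended below as most recent
        elif len(lines) == ways:
            lines.pop(0)           # evict the least recently used line
        lines.append((tag, MEMORY[addr]))
    return sets


direct_mapped = simulate(TRACE, n_sets=4, ways=1)
two_way_double = simulate(TRACE, n_sets=4, ways=2)   # twice as big
two_way_same = simulate(TRACE, n_sets=2, ways=2)     # same size
```

A direct-mapped cache is simply the case `ways=1`, so the same routine covers all three organizations; tags are returned as integers, for example tag 2 corresponds to the bits 10.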
Example caches
• StrongARM:
– 16 Kbyte, 32-way, 32-byte block instruction cache.
– 16 Kbyte, 32-way, 32-byte block data cache (write-back).
• C55x:
– Various models have 16KB, 24KB cache.
– Can be used as scratch pad memory.
Scratch pad memories
• Alternative to cache:
– Software determines what is stored in scratch
pad.
• Provides predictable behavior at the cost of
software control.
• C55x cache can be configured as scratch pad.
Memory management units
Fig: The CPU issues a logical address to the memory management unit, which sends the corresponding physical address to main memory.
Memory management tasks
• Allows programs to move in physical memory
during execution.
• Allows virtual memory:
– memory images kept in secondary storage;
– images returned to main memory on demand
during execution.
• Page fault: request for location not resident in
memory.
Address translation
• Requires some sort of register/table to allow
arbitrary mappings of logical to physical
addresses.
• Two basic schemes:
– segmented;
– paged.
• Segmentation and paging can be combined
(x86).
Segments and pages
Fig: Memory divided into segments (segment 1, segment 2) and pages (page 1, page 2).
Segment address translation
Fig: Segment address translation producing a physical address.
Page address translation
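Fig: The logical address is split into a page number and an offset; the page number selects the base of page i, which is concatenated with the offset to form the physical address.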
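Paged translation amounts to a table lookup followed by a concatenation, as the sketch below shows. The 4-kbyte page size, the dictionary used as a page table, and the function name are assumptions chosen for illustration.

```python
PAGE_BITS = 12                      # assumed 4-kbyte pages
OFFSET_MASK = (1 << PAGE_BITS) - 1


def translate(logical, page_table):
    """Map a logical address to a physical address.

    page_table maps a logical page number to a physical page number.
    """
    page = logical >> PAGE_BITS         # upper bits: page number
    offset = logical & OFFSET_MASK      # lower bits: offset in the page
    if page not in page_table:
        # The page is not resident in memory: a page fault.
        raise LookupError(f"page fault on page {page}")
    # Concatenate the physical page base with the unchanged offset.
    return (page_table[page] << PAGE_BITS) | offset
```

For example, with logical page 1 mapped to physical page 2, address 0x1ABC translates to 0x2ABC: only the page number changes, never the offset.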
Page table organizations
Fig: Page descriptors organized as a flat table or as a tree.
Caching address translations
• Large translation tables require main memory
access.
• TLB: cache for address translation.
– Typically small.
ARM memory management
• Memory region types:
– section: 1 Mbyte block;
– large page: 64 kbytes;
– small page: 4 kbytes.
• An address is marked as section-mapped or
page-mapped.
• Two-level translation scheme.
ARM address translation
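Fig: ARM two-level translation. The logical address is split into a 1st index, a 2nd index and an offset. The translation table base register is concatenated with the 1st index to select a descriptor in the 1st-level table; that descriptor is concatenated with the 2nd index to select a descriptor in the 2nd-level table, which is combined with the offset to form the physical address.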
CPUs
• CPU performance
• CPU power consumption.
Elements of CPU performance
• Cycle time.
• CPU pipeline.
• Memory system.
Pipelining
• Several instructions are executed
simultaneously at different stages of
completion.
• Various conditions can cause pipeline bubbles
that reduce utilization:
– branches;
– memory system delays;
– etc.
Performance measures
• Latency: time it takes for an instruction to get
through the pipeline.
• Throughput: number of instructions executed
per time period.
• Pipelining increases throughput without
reducing latency.
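The difference between latency and throughput shows up in a simple cycle count. The sketch assumes an ideal pipeline with no stalls and one cycle per stage; the function names are made up for the example.

```python
def pipelined_cycles(n_instructions, n_stages):
    """Cycles to finish n instructions on an ideal, stall-free pipeline."""
    if n_instructions == 0:
        return 0
    # The first instruction takes n_stages cycles; each later one
    # completes one cycle after its predecessor.
    return n_stages + n_instructions - 1


def unpipelined_cycles(n_instructions, n_stages):
    """Cycles when each instruction must finish before the next starts."""
    return n_instructions * n_stages
```

For 100 instructions on a 3-stage pipeline this gives 102 cycles instead of 300: throughput approaches one instruction per cycle, while each individual instruction still needs 3 cycles to pass through.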
ARM7 pipeline
• ARM 7 has 3-stage pipe:
– fetch instruction from memory;
– decode opcode and operands;
– execute.
ARM pipeline execution
Fig: Pipeline execution over time (axis marked 1, 2, 3).
Pipeline stalls
• If every step cannot be completed in the same
amount of time, pipeline stalls.
• Bubbles introduced by stall increase latency,
reduce throughput.
ARM multi-cycle LDMIA instruction
Fig: Pipeline timing of the multi-cycle LDMIA instruction over time.
Control stalls
• Branches often introduce stalls (branch
penalty).
– Stall time may depend on whether branch is
taken.
• May have to squash instructions that already
started executing.
• Don’t know what to fetch until condition is
evaluated.
ARM pipelined branch
Fig: Pipeline timing of a branch over time.
Delayed branch
• To increase pipeline efficiency, the delayed branch
mechanism always executes the n instructions after a
branch, whether or not the branch is taken.
Memory system performance
• Caches introduce indeterminacy in execution
time.
– Depends on order of execution.
• Cache miss penalty: added time due to a
cache miss.
Types of cache misses
• Compulsory miss: location has not been
referenced before.
• Conflict miss: two locations are fighting for
the same block.
• Capacity miss: working set is too large.
CPU power consumption
• Most modern CPUs are designed with power
consumption in mind to some degree.
• Power vs. energy:
– heat depends on power consumption;
– battery life depends on energy consumption.
CMOS power consumption
• Voltage drops: power consumption proportional to $V^2$.
• Toggling: more activity means more power.
• Leakage: basic circuit characteristics; can be
eliminated by disconnecting power.
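The voltage and toggling effects are often combined in the textbook dynamic power model P = a·C·V²·f. The sketch below uses that model as an assumption; the parameter names and the sample values in the test are illustrative, not data for any particular CPU.

```python
def dynamic_power(capacitance, voltage, frequency, activity=1.0):
    """Dynamic CMOS power: activity * C * V^2 * f (watts for SI inputs)."""
    return activity * capacitance * voltage ** 2 * frequency


def voltage_scaling_ratio(v_old, v_new):
    """Factor by which dynamic power changes when only V changes."""
    return (v_new / v_old) ** 2
```

Halving the supply voltage cuts dynamic power to a quarter at the same clock frequency, which is why lowering the supply voltage is usually the most effective of these levers.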
CPU power-saving strategies
• Reduce power supply voltage.
• Run at lower clock frequency.
• Disable function units with control signals
when not in use.
• Disconnect parts from power supply when not
in use.
Power management styles
• Static power management: does not depend
on CPU activity.
– Example: user-activated power-down mode.
• Dynamic power management: based on CPU
activity.
– Example: disabling function units.
Application: PowerPC 603 energy features
• Provides doze, nap, sleep modes.
• Dynamic power management features:
– Uses static logic.
– Can shut down unused execution units.
– Cache organized into subarrays to minimize
amount of active circuitry.
PowerPC 603 activity
• Percentage of time units are idle for SPEC
integer/floating-point:
unit Specint92 Specfp92
D cache 29% 28%
I cache 29% 17%
load/store 35% 17%
fixed-point 38% 76%
floating-point 99% 30%
system register 89% 97%
Power-down costs
• Going into a power-down mode costs:
– time;
– energy.
• Must determine if going into mode is
worthwhile.
• Can model CPU power states with power state
machine.
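Whether a power-down mode is worthwhile can be estimated with a break-even calculation. This sketch assumes a simple two-state model: staying on costs p_on for the whole idle period, while powering down costs a fixed transition energy plus p_down for the rest of the period. All names and numbers are hypothetical.

```python
def break_even_time(p_on, p_down, t_transition, e_transition):
    """Shortest idle period for which powering down saves energy.

    p_on, p_down: power in watts when staying on / powered down.
    t_transition: total time in seconds to enter and leave the mode.
    e_transition: total energy in joules spent on those transitions.
    """
    if p_on <= p_down:
        raise ValueError("power-down mode must use less power")
    # Solve p_on * T = e_transition + p_down * (T - t_transition) for T.
    t = (e_transition - p_down * t_transition) / (p_on - p_down)
    # The idle period must at least cover the transitions themselves.
    return max(t, t_transition)
```

With hypothetical values of 0.4 W when on, 0 W when down, 0.16 s of transition time and 0.1 J of transition energy, the idle period must last at least about 0.25 s before powering down pays off.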
Application: StrongARM SA-1100 power saving
• Processor takes two supplies:
– VDD is main 3.3V supply.
– VDDX is 1.5V.
• Three power modes:
– Run: normal operation.
– Idle: stops CPU clock, with logic still powered.
– Sleep: shuts off most of chip activity; 3 steps, each about
30 µs; wakeup takes > 10 ms.
SA-1100 power state machine
Fig: Power state machine with states run (Prun = 400 mW), idle and sleep. Transition times shown: 10 µs, 160 ms, 90 µs, 10 µs, 90 µs.