Joseph R. Marshall
703-367-1326
Joe.Marshall@BAESystems.com

Jeff Robertson
703-367-3476
Jeff.Robertson@BAESytems.com

9300 Wellington Road
Manassas, Virginia 20110

Abstract: Originally conceived for fault tolerance control of its associated general purpose processor, the Embedded Microcontroller (EMC) present in BAE Systems' Power PCI Bridge Application Specific Integrated Circuit (ASIC) has evolved into a processing workhorse, finding applications spanning memory controllers and I/O processors as well as continuing to support the RAD750 PowerPC processor. Development tools have also evolved from a simple assembler to a full development environment, including a compiler and simulator integrated with the PowerPC tools supporting the RAD750. This paper describes the evolution of the EMC within the Power PCI Bridge, the development of its support tools, and some of its applications both as a capable assistant to the RAD750 and as a standalone processing element. Power and performance improvements are highlighted. Comparisons to other processor cores that might be used in space are also shown. Future enhancements are also discussed.

0-7803-9546-8/06/$20.00 ©2006 IEEE
IEEEAC paper #1456, Version 6, Updated Jan. 12, 2006

TABLE OF CONTENTS
1. INTRODUCTION
2. RAD750 BOARD ARCHITECTURE
3. BRIDGE ASICS AND THEIR CORES
4. EMC CORE EVOLUTION
5. PERFORMANCE COMPARISON
6. DEVELOPMENT ENVIRONMENT
7. RUNTIME ENVIRONMENT
8. FUTURE ENHANCEMENTS
9. SUMMARY
ACRONYM LIST
REFERENCES
BIOGRAPHY

1. INTRODUCTION
The RAD750 Processor, a radiation-hardened version of the PowerPC 750 Central Processing Unit (CPU), was flown for the first time in 2005 as part of the Deep Impact mission that successfully impacted the comet Tempel 1 on July 4th. This is the beginning of numerous spacecraft applications of this high performance general purpose processor for military, civil and commercial space.

In order to utilize a PowerPC processor such as the RAD750, a bridge device is required. BAE Systems has developed and enhanced its Power PCI Bridge ASIC to match the RAD750 and provide high speed standardized interfaces such as the Peripheral Component Interconnect (PCI), Joint Test Action Group (JTAG), memory and Universal Asynchronous Receiver Transmitters (UART). This chip was designed using reusable cores connected internally to each other through a high speed cross-bar switch. The reuse of these cores was demonstrated when the ASIC was upgraded, and then again when most of the cores were combined with a new high speed interface to form a SpaceWire Bridge ASIC.
RAD6000 (typically taking seconds to reset) and monitor the processor for potential checkstops and other critical errors. This engine was called the Processor Availability Sequencer (PAS). It had a very limited number of instructions, inputs and actions, all centered on processor control. It did not have access to most of the functional interfaces, only those needed to control the RAD6000. It did have access to all scan strings within the RAD6000 CPU; thus, with enough detailed low-level programming, one could cause the processor to access external elements, though no one ever had the time or need to do this.

In the late 1990s, we began work on the next generation of radiation-hardened commercial processors, drawing on many of the lessons we learned with the RAD6000. This processor, named the RAD750, is a radiation-hardened version of the IBM PowerPC 750. Since no single-chip bridge ASIC was available, we created a bridge ASIC for the RAD750, which we named the Power PCI ASIC [2]. The original Power PCI ASIC provided many of the same interfaces provided by the LIO. For this reason, it was possible to reuse chips using the PCI interface in creating board solutions for the RAD750. We have now developed a number of RAD750 boards utilizing this interface. These are best summarized by Figure 1, which shows the typical elements of a RAD750 processing board and its I/O fabric. On a given board, only a subset of the memory and I/O shown would be implemented.
[Figures 1 and 2: board block diagrams. Labels recoverable from the two figures: RAD750, L2 Cache (1 MB), 60X I/F, PCI Bridge, SpaceWire Bridge, SpaceWire Ports, PCI Bus(s), Test JTAG, UART, Memory I/F, Data Port(s), PCI or Memory, SDRAM 128-540 MB, SDRAM 128-2048 MB, SRAM 1-52 MB, 1553 A, 1553 B, SMMIT DXE, ASIC or FPGA, Other Interfaces, OSC, Digital POR, Regulator(s) +2.5/1.8/1.5V, +3.3V, +5.0V.]
Figure 1. RAD750 Processing Board Fabric

The Power PCI Bridge ASIC enables a second type of processing board to be created; these standalone boards do not contain a RAD750 but may contain other processing elements, memory and/or I/O. Figure 2 shows the typical connections and elements spanning this class of boards. Note the ability to utilize the 60X interface as a second high speed interface port to memory, with some limitations on handshaking and performance when synchronous dynamic random access memory (SDRAM) is used. When this high speed interface is not required, the SpaceWire bridge may serve directly in place of the PCI Bridge, because many of the same interfaces are present on that ASIC. See the next section for more detail on the cores that make up the ASICs.

BAE Systems has created a variety of these standalone boards, including a reconfigurable processor board using Xilinx Virtex field programmable gate arrays (FPGA) [3], a large memory processing board that provided multiple high speed as well as PCI ports to the data [3], and a solid state recorder board for the Messenger Mission to Mercury. The key component of each of these boards is the Power PCI ASIC, which serves as the PCI interface and board controller.

Making this control possible is an embedded microcontroller (EMC) inside each of these ASICs. The EMC was designed as a much more general purpose processor than the PAS so that many of the limitations of the PAS could be overcome. The EMC has access to anything connected to the bridge it resides in. It has a small cache for instructions and a set of general purpose registers. All instructions are single cycle. It runs at the same speed as the bridge chip and thus can execute 66 million operations per second with the current generation of bridge chips. These features, combined with the better embedded features of the RAD750, have resulted in more widespread use of the EMC. Two years after the original Power PCI ASIC was developed, an enhanced Power PCI (EPPC) ASIC was created that expanded the capabilities of the bridge functions and further improved the EMC.
Besides the need for various memory and I/O interfaces, the recent Power PCI, EPPC and SpaceWire ASICs all address a special need of embedded processing: off-loading and distributing data bottlenecks away from specific processors. In the initial Power PCI ASIC, the EMC registers, instruction set and cache were designed to handle direct memory access (DMA) transfers without fetching instructions after the initial setup, thereby keeping the memory interface free for the DMA operations. In the EPPC, a separate DMA engine was added to handle DMAs, along with a scratchpad RAM to hold programs and supplement the EMC register space. The EMC also contains sufficient logical instructions to operate as an I/O controller or packet handler. It can process data as it is dropped in memory by a DMA or bus operation, and it can set up the DMA controller to start a transfer.
[Block diagram labels: Embedded MicroController, UART.]
interrupt. Stopped, the final PAS state, represents complete inactivity and is entered by command (for debug) or by error.
[Figure: PAS state diagram. States: Initialize, Sequence, Monitor, Stopped. Transition labels: Reset off, Start Command, Monitor Command, Single Step or Stop Command, Error Stop, any Reset Error or Halt, External Reset. Adjacent block diagram labels: JTAG Master, On-Chip Bus, JTAG Slave.]
The PAS is built around a small set of 32-bit registers. These include an accumulator, program counter, control register, breakpoint register, program counter storage register, vector control register and a vector address register for each of the seven PAS vectors. PAS microinstructions are formatted as shown in Figure 5. These 32-bit instructions are protected with a single error correct double error detect (SECDED) error correcting code (ECC). The PAS instruction set included OR, AND, XOR, LOAD, INCREMENT, STORE, BRANCH, BRANCH CONDITION, BRANCH & STORE Program Counter (PC), RETURN from STORED PC, MONITOR and STOP. The PAS could load from or store to a limited number of registers in the LIO, based on a defined need for reset or error handling. This limitation and the RAD6000 architecture kept the PAS from performing as a standalone microcontroller beyond its basic purpose of reset and error handling.
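The field widths of the PAS instruction (8-bit op-code, 8-bit register address, 16-bit immediate data) can be illustrated with a small C sketch that packs and unpacks the 32-bit word. This is only an illustration under stated assumptions: the placement of the op-code in the most significant byte, and all type and function names, are hypothetical and are not taken from the PAS design. The 8-bit ECC field is stored alongside the instruction and is not computed here.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields of the 32-bit PAS instruction (widths from Figure 5). */
typedef struct {
    uint8_t  opcode;     /* Op-code, 8 bits */
    uint8_t  reg_addr;   /* Register Address, 8 bits */
    uint16_t immediate;  /* Immediate Data, 16 bits */
} pas_fields;

/* Pack the fields into one word.
 * Assumption: op-code in the top byte, immediate in the low half-word. */
uint32_t pas_pack(uint8_t opcode, uint8_t reg_addr, uint16_t immediate)
{
    return ((uint32_t)opcode << 24) |
           ((uint32_t)reg_addr << 16) |
           (uint32_t)immediate;
}

/* Split a 32-bit word back into its three fields. */
pas_fields pas_unpack(uint32_t word)
{
    pas_fields f;
    f.opcode    = (uint8_t)(word >> 24);
    f.reg_addr  = (uint8_t)((word >> 16) & 0xFFu);
    f.immediate = (uint16_t)(word & 0xFFFFu);
    return f;
}

/* Returns 1 when unpacking and repacking reproduces the original word. */
int pas_round_trip_ok(uint32_t word)
{
    pas_fields f = pas_unpack(word);
    return pas_pack(f.opcode, f.reg_addr, f.immediate) == word;
}

/* Self-check on one arbitrary example word. */
void pas_self_check(void)
{
    assert(pas_round_trip_ok(0x1234BEEFu));
}
```

Because the three fields together fill exactly 32 bits, every possible word round-trips through this pair of functions.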
Op-code (0:7) | Register Address (0:7) | Immediate Data (0:15) | ECC (0:7)
Figure 5. PAS Instruction Format

The EMC core improved significantly on the PAS instruction set and capabilities. A block diagram of the EMC is shown in Figure 6. The EMC connects to the On Chip Bus (OCB) of the ASIC where it is located via a Master and a Slave Interface. The OCB gives the EMC access to any interface within the ASIC in which it resides and allows it to run programs from within the chip or external to it. Instructions are prefetched from the OCB and stored in the instruction pre-fetch buffer. They are then decoded and passed to the execution unit, based on a 32-bit program counter, to operate on one of the 24 32-bit General Purpose Registers (GPR) or five Special Purpose Registers (SPR). Any of the eight vector interrupts to the EMC are stored in one of the SPRs and cause a vectoring of instruction execution. The Master OCB interface is used by the EMC for Load and Store instructions. All EMC major registers are mapped to the Slave OCB interface for debug purposes.
[Figure 6 block labels: Instruction Pre-fetch Buffer, Instruction Decode, Execution Unit, Special Purpose Registers, Vector Interrupts, General Purpose Registers, OCB Slave Interface, OCB.]
Figure 6. EMC Core Block Diagram

The EMC operates in one of five states. These are illustrated in Figure 7. The EMC always transitions to or remains in the Reset state as long as the ASIC is in reset or the signal EMC_Enable is inactive. Once reset is removed and EMC_Enable is active, the EMC enters its principal state, Sequence. While in Sequence, the EMC fetches and executes instructions based on its program counter. The Monitor state allows the EMC to stop executing instructions and wait for a vector to appear on one of its inputs. This provides the lowest power, yet error free, state. Once a vector is received, the EMC returns to the Sequence state and executes instructions. The Stop state is provided mostly for debug. It is typically reached based on an instruction or on a debug action such as single step or stop on address. The final EMC state, Error, is used to keep the EMC in a known state after a critical error has been detected within the EMC or on an interface access.

[Figure 7 state diagram. States: Reset, Sequence, Monitor, Stop, Error. Transition labels: Not reset and EMC Enabled; Stop Inst or Cmd, Step Cmd Done or Breakpoint; Step or Go Command; Critical Error; any Reset.]
Figure 7. EMC Core Operational States

The EMC has eight prioritized vector inputs. The highest of these, Vector 7, is non-maskable. It is tied to a watchdog timer that must be reset as part of the recovery routine, or else an external reset is required. Vector 6 is utilized for core processing errors such as invalid instructions or errors during fetches from the OCB. The remaining six vectors are connected to external interrupt events outside of the EMC core and may be tied to other ASIC cores or external discrete input pins on an ASIC.

The EMC instruction set uses six different 32-bit reference formats, as shown in Figure 8. These are used to create 44 different instructions. Included in this set are logical, arithmetic, load, store, branch, link, and debug commands.

[Figure 8 layout. Recovered format names: Format A, Format B, Format C, Format D, Format I. Recovered field names: Opcode, Src1, Src2, Immediate, Reserved. Recovered bit position markers: 31, 25, 15, 10, 0.]
Figure 8. EMC Instruction Formats

The EMC continues in the tradition of the PAS with significant fault tolerance detection capabilities. All state machines are coded as one-hot to detect single bit errors. Parity is checked on all registers every cycle. Various sequences and protocols are checked. Invalid instructions and accesses to invalid address space are flagged. Any of these results in a transition to the Error state or a vector interrupt, depending on the severity. Error state transitions must be cleared by reset; vectors are resolved by EMC software.

When the EPPC was developed, the EMC core was improved based on user inputs from the initial EMC. Multiply and bit shift instructions were added. The instruction cache was increased from 96 bytes to 2048 bytes. Branch prediction was added to better pre-fetch instructions around conditional logic. Execution was allowed under load and store. The number of GPRs was increased from 8 to 24. Taken together with the addition of scratchpad static random access memory (SRAM) within the chip and a separate DMA controller, these improvements allow the EMC to be used as a small standalone microcontroller.
5. PERFORMANCE COMPARISON
Table 1 compares the major performance features of the PAS, EMC and enhanced EMC processor with the current RAD750 [6] shown for reference.

Table 1. Embedded Processor Performance Comparison
Parameter | PAS | EMC | EPPC EMC | RAD750
Raw Processing | 5 MOPS | 33 MOPS | 66 MOPS | 264 MOPS
Cache Size | None | 96 Bytes Instr | 2 KB Instr | 32 KB Instr, 32 KB Data
Max Frequency | 33 MHz | 33 MHz | 66 MHz | 132 MHz
TID | 1 MRad (LIO) | 1 MRad (PPCI) | 200 KRad (EPPC) | 200 KRad (R25)
Special Instructions | Error Vector Handling | Error Vector and Monitoring | EMC + Shift and Multiply | Power Architecture
Exception Handling | Error Only | Error and Functional | Error and Functional | Error and Functional
Accessibility | Limited LIO Registers and I/O | All OCB devices | All OCB devices | All 60x connected devices
(row label not recovered) | None | 4 GB | 4 GB | 4 GB
(row label not recovered) | 1 | 8 | 32 | 32
(row label not recovered) | 0 | 0 | 0 | 32
(row label not recovered) | 9 | 5 | 5 | Multitude (14 in user mode)

6. DEVELOPMENT ENVIRONMENT
PAS code was expected to be minimal and likely copied from previous versions due to its limited flexibility. For this reason, only a PAS Assembler was created. The PAS Assembler allowed one to write PAS instructions by hand, assemble them, and link them into a load module containing RAD6000 code. Since the PAS instructions are located in the same memory block as the start-up read only memory (SUROM), it was necessary to link these together when creating a RAD6000 load module for electrically erasable programmable read only memory (EEPROM) or PROM.

The EMC C Development System runs on a Microsoft Windows XP Professional based personal computer. The major components are a C compiler, an EMC assembler, a linker, a mapper, a debugger, and the CYGWIN Linux-like environment. A make facility is used to integrate these components. Figure 9 shows an overview of the system.

[Figure 9 flow labels: C-Code, lcc (compiler), Generated Assembler, emcmap, Executable (elf format), Binary, Debug.]
Figure 9. EMC C Development System

A driving requirement for the EMC toolset development was that it be done as quickly and cheaply as possible. The simulator/debugger was initially developed in early 2002. The compiler and major updates to the simulator/debugger were made in 2005. Actual time spent working on the tools was about 4 labor months. This productivity was achieved because the majority of the tools are based on pre-existing commercial off the shelf (COTS) tools. The GNU tools were either part of the CYGWIN distribution or GNU binutils targeted to the PowerPC. The C compiler itself is based on the lcc compiler, which was designed to easily accept new backends.

The C compiler consists of a C Pre-Processor (CPP) and the lcc compiler. The Pre-Processor comes with the CYGWIN environment. It is the program that interprets the #include, #if and other preprocessor directives and generates output for the C compiler. lcc is a retargetable American National Standards Institute (ANSI) C compiler designed by Christopher Fraser and David Hanson. The design of lcc makes it relatively easy to retarget it to different architectures. BAE Systems retargeted this compiler to the EMC architecture. lcc was chosen because of the substantial documentation available (see references) and the simplicity of the retargeting procedures. Using the GNU C compiler was briefly considered, but rejected as too daunting a task.

The BAE Systems EMC backend to the lcc compiler is ANSI C compatible except that it does not support floating point operations. The EMC does not support floating point operations, and adding software floating point emulation would not be an effective use of engineering resources since it would never realistically be used.

The output of the lcc compiler is EMC assembler source code. The source code is assembled using the GNU assembler, targeted to the PowerPC, available via the GNU binutils distribution. The EMC assembler instructions are implemented via a series of assembler macros which convert the EMC mnemonics into EMC op codes. Using the GNU assembler in combination with its macro capabilities significantly reduced development time and made the EMC object file compatible with the GNU linker and other utilities. Several pseudo instructions were created, each of which combines multiple assembler operations into a single mnemonic.

The GNU linker is used to link multiple EMC object files into a single executable. The linker has a multitude of options available to it. For embedded systems, one of the more useful is the ability to link the executable so that it is loaded at one location but is intended to execute at another. This allows executable code, data and constants to be placed in EEPROM or PROM and copied to RAM for execution. Many additional binutils commands (such as archive, to build libraries) are available.

The EMC Mapper takes the executable and linking format (ELF) load image output of the linker and converts it into two files. One file is the binary image, which can be loaded into memory or the EMC Symbolic Debugger. The other is debug information, which can be used by the Debugger.

The EMC Symbolic Debugger (see Figure 10) is a window-based debugger and simulator. It simulates the EMC instruction set and supplies source level C debugging.
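The load-at-one-address, execute-at-another arrangement mentioned for the linker implies a small copy step at start-up. The following C sketch illustrates that step only; the descriptor type, the function names and the use of ordinary arrays in place of PROM and RAM are all hypothetical, and a real image would take its addresses and sizes from linker-defined symbols rather than from a structure filled in by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one section of a linked image. */
typedef struct {
    const uint8_t *load_addr;  /* where the bytes are stored (EEPROM or PROM) */
    uint8_t       *run_addr;   /* where the section expects to execute (RAM)  */
    size_t         size;       /* section length in bytes                     */
} section_copy;

/* Copy one section from its load address to its execution address. */
void copy_section(const section_copy *s)
{
    assert(s != NULL && s->load_addr != NULL && s->run_addr != NULL);
    memcpy(s->run_addr, s->load_addr, s->size);
}

/* Demonstration with arrays standing in for PROM and RAM.
 * Returns 1 when the RAM copy matches the stored image. */
int demo_copy(void)
{
    static const uint8_t prom[4] = {0xDE, 0xAD, 0xBE, 0xEF};
    uint8_t ram[4] = {0, 0, 0, 0};
    section_copy s = {prom, ram, sizeof prom};

    copy_section(&s);
    return memcmp(prom, ram, sizeof prom) == 0;
}
```

Once the copy is complete, control can be transferred to the execution address, since every address the linker resolved refers to the RAM location rather than to the stored image.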
7. RUNTIME ENVIRONMENT
In the simplest cases a C program requires a very minimal runtime environment. Many features one thinks of as part of the compiler (malloc, printf, etc.) are actually subroutines written in C or assembler that access Operating System (O/S) supplied interfaces. The compiler must meet the calling conventions defined by the O/S. In the case of the EMC there was no O/S. This had the advantage that we did not have to meet a pre-existing (and possibly hard to implement) interface. It had the disadvantage that we had to invent one.

The biggest design decision for the runtime environment was the register usage convention. The convention was chosen to simplify compiler design and allow the EMC to support multi-tasking operations. The EMC has 24 general purpose 32-bit registers named R0..R15 and R24..R31. In addition there are 8 special purpose registers. Of the special purpose registers, the compiler is indirectly aware of the Program Counter (PC) and the Condition Status Register (CSR). Table 2 is the final register usage convention. Many of the decisions (such as using R2 to return a value from a function and using R4..R7 for parameters) reflect conventions used on many other processors. The decision to use R3 as a working register simplified the compiler design.

The compiler wants to generate code that accesses data relative to the stack. An instruction

    LD Dest, Src, Offset

which loads a destination register based on the stack pointer (or other register) and an offset would be very useful. Unfortunately the EMC does not support this type of command. It does support, however, the instruction

    LD Dest, Src1, Src2

which loads a destination register from the address Src1+Src2. The EMC assembler uses R3 (which is never used directly by the compiler) to build the instruction we really want. The instruction

    LDRO Dest, Src1, Immed

is a pseudo instruction which is converted by the assembler into the instructions

    ADDI R3, ZERO, Immed
    LD Dest, R3, Src1

The assembler uses R3 in this manner for several other instructions.

One of the trickier tasks is to handle interrupts without changing any of the registers before saving or after restoring them. The CSR is the most difficult case since it will change if any arithmetic operation is performed. To solve this problem, R15 is not used by the compiler and is used in assembler code only when interrupts are disabled (normally in an interrupt handler). R15 is loaded with the address to store the CSR and other registers (the load instructions do not modify the CSR).
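The expansion of the LDRO pseudo instruction into ADDI and LD can be modeled in a few lines of C. This is a behavioral sketch and not the assembler macro itself: the register file and memory structure, the byte addressing of word-aligned memory, and the 32-bit immediate are simplifying assumptions, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define REG_COUNT 32u   /* index space only; the EMC implements R0..R15 and R24..R31 */
#define MEM_WORDS 64u   /* size of the toy memory, in 32-bit words */
#define R3        3u    /* assembler working register, never used by the compiler */

typedef struct {
    uint32_t reg[REG_COUNT];
    uint32_t mem[MEM_WORDS];   /* assumption: byte addresses, word-aligned accesses */
} emc_model;

/* ADDI dest, ZERO, immed: ZERO reads as 0, so dest receives immed. */
void addi_zero(emc_model *m, unsigned dest, uint32_t immed)
{
    assert(dest < REG_COUNT);
    m->reg[dest] = immed;
}

/* LD dest, src1, src2: load from the address src1 + src2. */
void ld(emc_model *m, unsigned dest, unsigned src1, unsigned src2)
{
    assert(dest < REG_COUNT && src1 < REG_COUNT && src2 < REG_COUNT);
    uint32_t addr = m->reg[src1] + m->reg[src2];
    assert(addr % 4u == 0u && addr / 4u < MEM_WORDS);
    m->reg[dest] = m->mem[addr / 4u];
}

/* LDRO dest, src1, immed expands to:
 *   ADDI R3, ZERO, immed
 *   LD   dest, R3, src1                                   */
void ldro(emc_model *m, unsigned dest, unsigned src1, uint32_t immed)
{
    addi_zero(m, R3, immed);
    ld(m, dest, R3, src1);
}

/* Load the word stored 8 bytes above the stack pointer (R0) into R8. */
uint32_t demo_ldro(void)
{
    emc_model m = {{0}, {0}};
    m.reg[0] = 16u;                      /* R0 is the stack pointer */
    m.mem[(16u + 8u) / 4u] = 0xCAFEu;    /* value at stack pointer + 8 */
    ldro(&m, 8u, 0u, 8u);
    return m.reg[8];
}
```

The model also shows why R3 must stay out of the compiler's hands: every LDRO overwrites it, so any value the compiler kept there would be lost.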
8. FUTURE ENHANCEMENTS
We have begun development of our next generation bridge ASIC, named Golden Gate. This will match the RH15 RAD750, currently under development, in speed and capability. Among other changes, the EMC will be enhanced with some specialized instructions drawn from a user wish list, such as counting the number of zeroes in a word. When that effort is complete, the EMC will be even more usable both as a companion to the RAD750 and as a standalone processor for lower or distributed processing applications. All EMC tools will also be updated to match.
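To show what a zero-counting instruction would save, the C sketch below computes the same kind of result in software. The paper does not say whether the planned instruction counts all zero bits or only leading zeroes, so both readings are shown as assumptions; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reading 1 (assumption): count every zero bit in a 32-bit word. */
unsigned count_zero_bits(uint32_t word)
{
    unsigned ones = 0;
    while (word != 0u) {
        word &= word - 1u;   /* clear the lowest set bit */
        ones++;
    }
    assert(ones <= 32u);
    return 32u - ones;
}

/* Reading 2 (assumption): count zero bits above the highest set bit. */
unsigned count_leading_zeros(uint32_t word)
{
    unsigned n = 0;
    for (uint32_t mask = 0x80000000u; mask != 0u && (word & mask) == 0u; mask >>= 1) {
        n++;
    }
    return n;
}
```

Either loop takes several single-cycle instructions per bit examined, which is the cost a dedicated instruction would remove.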
9. SUMMARY
Based on the small hard core developed for the RAD6000 family, the EMC has been created and enhanced to be a versatile and high performance processing core. It has been used on a variety of processing, memory and I/O nodes. With the full set of tools including a compiler, assembler and linker, it will be utilized not just with the RAD750 but as a strong hard controller for a variety of spacecraft processing, memory and I/O nodes in the future.

Register | Usage
R0 | Used as the Stack Pointer. The stack grows by subtracting from R0. The stack grows when a function call is made.
R1 | When a function is called, R1 contains the address of the return location.
R2 | The function return value. For example, R2 will contain the integer value returned by int foo().
R3 | A working register not used by the compiler but used by the assembler to build more complex pseudo instructions required by the compiler.
R4..R7 | The first 4 words' worth of parameters passed to a function. For example, in the function int calc(int a, int b, int c, int d); a would be in R4, b in R5, etc.
R8..R14 | These registers are scratch registers used by the compiler when evaluating expressions.
(register range not recovered) | These registers are used to store local variables. The compiler will automatically choose variables to place in a register. The programmer can override the compiler's decision by using the register attribute when declaring a local variable. For instance, {int register i; int register j; } will place i and j in these registers.
R15 aka IR | This register is not used by the compiler. It is reserved for use by assembly code which is handling interrupts or executing with interrupts disabled.
ZERO | This register (logically) contains the value 0 when used as a source register. It can also be a target register when the result of an operation is not needed. For example, the compiler might generate SUB ZERO,R4,R5 to compare R4 and R5 (the command will set the Condition Status Register, but discard the numeric result of the subtraction).
PC and other special purpose registers | PC is the Program Counter; the other registers are not directly used by the compiler except when accessed via built-in functions.
Table 2. EMC Register Usage
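The parameter-passing part of the register convention can be expressed as a small C helper that maps a parameter word index to its register. This is a sketch only: the enumeration and function names are hypothetical, and because the paper does not state where parameter words beyond the fourth are placed, the helper simply reports that they are not in a register.

```c
#include <assert.h>

/* Register roles named in the text; the identifiers are illustrative. */
enum emc_reg {
    EMC_SP           = 0,   /* R0: stack pointer */
    EMC_RETURN_ADDR  = 1,   /* R1: return location */
    EMC_RETURN_VALUE = 2,   /* R2: function return value */
    EMC_ASM_WORK     = 3,   /* R3: assembler working register */
    EMC_PARAM_FIRST  = 4,   /* R4: first parameter word */
    EMC_PARAM_LAST   = 7,   /* R7: fourth parameter word */
    EMC_INTERRUPT    = 15   /* R15: reserved for interrupt handling code */
};

/* Return the register number that holds parameter word `word_index`
 * (0-based), or -1 when the word does not fit in R4..R7. */
int emc_param_register(int word_index)
{
    assert(word_index >= 0);
    if (word_index > EMC_PARAM_LAST - EMC_PARAM_FIRST) {
        return -1;
    }
    return EMC_PARAM_FIRST + word_index;
}
```

For int calc(int a, int b, int c, int d), indices 0 through 3 map to R4 through R7, matching the example in Table 2.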
ACRONYM LIST
ANSI: American National Standards Institute
ASIC: Application Specific Integrated Circuit
CAT: Clock and Test
COTS: Commercial off the Shelf
CPP: C Pre-Processor
CPU: Central Processing Unit
CSR: Control and Status Register
ECC: Error Correcting Code
EEPROM: Electrically Erasable PROM
ELF: Executable and Linking Format
EMC: Embedded Micro Controller
EPPC: Enhanced Power PCI
FPGA: Field Programmable Gate Array
GNU: GNU's Not Unix (recursive acronym)
GPR: General Purpose Register
I/O: Input / Output
IR: Instruction Register
JTAG: Joint Test Action Group
LD: Load
LIO: Local Input / Output
OCB: On Chip Bus
O/S: Operating System
PAS: Processor Availability Sequencer
PC: Program Counter
PCI: Peripheral Component Interconnect
PROM: Programmable Read Only Memory
RAD6000: Radiation Hardened RISC System 6000
RAD750: Radiation Hardened PowerPC 750
RAM: Random Access Memory
RISC: Reduced Instruction Set Computer
SDRAM: Synchronous Dynamic Random Access Memory
SECDED: Single Error Correct Double Error Detect
SP: Stack Pointer
SPR: Special Purpose Register
SRAM: Static Random Access Memory
SUROM: Start Up Read Only Memory
UART: Universal Asynchronous Receiver Transmitter
REFERENCES
[1] Joseph R. Marshall, Dynamic Space Processor Architecture Built on Commercial Open System Interface Standards, 18th Digital Avionics Systems Conference Proceedings, October 1999.
[2] Joseph R. Marshall and Richard W. Berger, A Processor Solution for the Second Century of Powered Space Flight, 19th Digital Avionics Systems Conference Proceedings, October 2000.
[3] Joseph R. Marshall, A Reconfigurable Digital Processing System for Space, 20th Digital Avionics Systems Conference, October 2001.
[4] Glenn Parker Rakow et al., NASA / BAE Systems SpaceWire Efforts, International SpaceWire Seminar (ISWS) 2003 Proceedings, November 2003.
[5] Joseph R. Marshall and Myrna Milliser, Application of Reusable Cores to System-On-A-Chip, Government Microcircuits Application Conference 2001, March 2001.
[6] Richard W. Berger et al., The RAD750 A Radiation Hardened PowerPC Processor for High Performance Spaceborne Applications, IEEE Aerospace Conference 2001, April 2001.
[7] Christopher Fraser and David Hanson, A Retargetable C Compiler: Design and Implementation, 3rd Edition 2003. Published by Addison-Wesley. ISBN 0805316701.
[8] Cygwin: a Linux-like environment for Windows. http://www.cygwin.com
[9] Binutils: Collection of binary utilities. http://directory.fsf.org/GNU/binutils.html

RAD750 is a registered trademark of BAE Systems and is a radiation-hardened licensed version of the PowerPC 750. PowerPC is a registered trademark of IBM. CompactPCI is a registered trademark of the PCI Industrial Computers Manufacturers Group.

BIOGRAPHY
Joe Marshall is a Senior Principal Systems Engineer in embedded aerospace processor systems at BAE Systems in Manassas, Virginia. He has developed and / or led development of hardware, software and subsystems at GTE, IBM, Loral, Lockheed Martin and BAE Systems with the last 17 years focused on Spacecraft Processing Systems and their applications. Joe is lead architect for BAE Systems reconfigurable processing architecture and products. He was chair of the AIAA Computer Systems Technical Committee from 2001 to 2005 and specializes in dependable systems, reconfigurable systems, processor architectures, interconnects and systems engineering. Joe has a BSEE and MSEE from Purdue University and has been certified twice as a System Architect by Lockheed Martin.
Jeff Robertson is a Senior Principal Systems Engineer at BAE Systems in Manassas, Va. He has over 23 years of experience in embedded systems for both space and terrestrial systems. Currently he is the principal investigator for the BAE Systems Power Aware Computing and Communications (PAC/C) Awareness and Management of Power for Space (AMPS) project. AMPS is sponsored by DARPA and is performing research in adaptive power for space based applications. He received his BS in Computer Science from the University of Dayton.