
Networking Solutions

Giles Gillison August 2002

New and emerging technology for wireless connectivity is driving a proliferation of product development targeted at next-generation applications to enable the networked business and home. As a result, the volume-driven customer premises equipment (CPE) sector of the networking market is undergoing a transformation to consumer-market status. Success in the CPE market will require a design platform and product development process that is flexible, that meets the performance needs of a variety of new and emerging network protocols, enables rapid time-to-market, and delivers cost and power-efficient end products. ARM cores are already at the heart of many network equipment designs, offering low cost and power, with excellent embedded performance. A dual core architecture is ideally suited to many CPE applications, especially where protocol translation is a requirement. Addressing the key time-to-market issue, ARM has introduced the PrimeXsys Dual Core Platform (946 DCP), based on the successful ARM946E-S microprocessor core. The 946 DCP provides the development team with a design platform capable of running a real-time operating system out of the box, that is flexible, easy to extend and differentiate.

Networking Market Drivers


The segmentation of networking markets is often presented as a confusing array of sub-segments that reference the layers in the protocol stack. In broad terms, however, networking applications can be divided into two easily definable segments, each with quite different implementation requirements. Firstly, infrastructure and backbone equipment, such as complex switches and routers, tends to demand high bandwidth and therefore requires high-performance subsystems. In general, this kind of equipment is expensive and is shipped in relatively low volumes to network operators and large enterprises. Secondly, at the other end of the spectrum, CPE such as Network Interface Cards (NICs), cable modems, xDSL interfaces and wireless LAN access points tends to have lower bandwidth requirements. Gateway devices provide a bridge (and more) between WAN and LAN traffic and protocols, and may run specific applications such as voice over IP (VOIP) to enable voice services to be delivered via broadband. CPE has the potential for high product volumes, and is therefore more aligned with the business models of many of ARM's semiconductor partners (Figure 1).

Consumers are now buying second or third PCs for home or home office use, stimulating sales in low-cost PC networking and wireless LAN solutions. There are real business benefits for large and small enterprises in terms of the reduced cost of installing and extending a wireless LAN. In addition, manufacturers are building network connectivity into an increasing variety of business and consumer equipment. Intelligent networked printers, networked games consoles, even high-end digital imaging products such as video cameras now offer network connectivity. In the future, home control systems will see the network opportunity expand further to other domestic applications.

All of these applications are expanding the requirement for solutions that bridge and translate from the broadband network into the premises network.

Figure 1. Bandwidth and Unit-volume in the Networking Market (diagram labels: Infrastructure Equipment, Backbone, Access/Edge, ISP, Enterprise, Central Office, Customer Premises Equipment, Consumer/SOHO)

Technology Drivers
Increasing bandwidth is the most obvious technology driver for networking equipment. Eldering, Sylla and Eisenach [1] predict that bandwidth will follow a growth curve similar to Moore's Law, i.e. a doubling of client bandwidth every 18-24 months, whereas core/optical bandwidth doubles approximately every 8 months [2].
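The gap between these two growth rates is easier to appreciate with a small calculation. The sketch below is illustrative only: it assumes simple exponential growth with the doubling periods quoted above, and the 48-month horizon and the function name are assumptions chosen for the example.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplicative growth after elapsed_months for a given doubling period."""
    if doubling_months <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (elapsed_months / doubling_months)


# Doubling periods quoted in the text, in months.
CLIENT_DOUBLING_MONTHS = (18.0, 24.0)
CORE_DOUBLING_MONTHS = 8.0

if __name__ == "__main__":
    horizon = 48.0  # hypothetical four-year planning horizon (an assumption)
    for period in CLIENT_DOUBLING_MONTHS:
        factor = growth_factor(horizon, period)
        print(f"client, doubling every {period:.0f} months: x{factor:.1f}")
    core = growth_factor(horizon, CORE_DOUBLING_MONTHS)
    print(f"core, doubling every {CORE_DOUBLING_MONTHS:.0f} months: x{core:.1f}")
```

Under these assumptions, client bandwidth grows by a factor of roughly 4 to 6 over four years, while core bandwidth grows by a factor of 64.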

However, increasing bandwidth is not the only factor. Functionality and intelligence are also growing in the CPE space, so simply using bigger and faster processors will not lead to the most efficient solutions.

Functional convergence, increased security and multiple, evolving protocols are all driving CPE complexity. Translation between different protocols is a common requirement for many CPE applications. For example, the core functionality within a wireless LAN interface will be conversion between Ethernet and a variant of the IEEE802.11 interface standard. The product requirement may also specify multiple access points coming into the interface box, resulting in the need to manage multiple packet streams.

Even for non-portable networking products, achieving low power dissipation is extremely important in controlling cost in the end product. A low power design will put less stringent requirements on the casing design and PCB manufacturing tolerance, and is less likely to need active cooling through a fan. Relaxing these types of manufacturing issues can also improve reliability.

Figure 2. Product Convergence Driving Application Integration (diagram labels: Wired Connectivity: Ethernet, USB; xDSL, Cable Modem, Fixed Wireless, Satellite, Set-top box; Broadband Gateway; Wireless Connectivity: Bluetooth, 802.11, a, b, SWAP, HOP; Phones, Wireless Voice, VOIP; Integration)

For good real-time performance, a processor implementation should be able to manage interrupts efficiently, as well as having sufficient performance to implement the required data processing. Context switching is also a key requirement: being able to switch between interrupt service routines and mission-mode code effectively will result in much greater performance. Indeed, the delay or latency incurred in serving interrupts and performing context switching must often be minimized.

Memory management is an issue that should be carefully considered for these applications. Deeply embedded systems typically run a fixed set of programs with no requirement for the introduction of new programs. Such applications do not demand a full memory management unit (MMU) with address translation. A simpler memory protection unit (MPU) is usually sufficient to enable the use of an RTOS, allowing multiple programs to run but with less overhead when context switching. Cores that include MMUs may not enable interrupt servicing to be performed as efficiently as an MPU-based core, and also place bigger demands on external memory for address translation.

As CPE moves towards the home consumer market, the price of the end product becomes a more critical factor in the competitive environment. Equally, time-to-market advantage can be fundamental in achieving market share. The consumerization of end-user networking equipment will undoubtedly compress the time between releases of new product variants and derivatives. These competing pressures call for a balanced approach that optimizes cost without compromising time-to-market, while enabling design flexibility for future derivative product needs.

Both the business and technical requirements mandate a system-on-chip (SoC) approach, with a processor solution capable of meeting the specific control and signal processing needs of a diverse range of networking applications. For the design, this means that while a gross overspecification of processor performance would lead to unnecessary cost in the end product, some performance headroom and flexibility is desirable if successive product derivatives are to be introduced without the need for fundamental re-design. Summarising these drivers, development teams must be equipped to meet both the technical and business challenges in the implementation of subscriber equipment.
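The distinction drawn above between an MPU and an MMU can be made concrete with a small model. The sketch below is a conceptual illustration, not the register interface of any ARM core: the `Region` class, the `check_access` function and the rule that later regions override earlier ones are all assumptions made for the example. The point it shows is that an MPU only checks permissions on an address, which passes through unchanged, so no translation tables in external memory are needed.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    """A hypothetical protection region: power-of-two size, base aligned to size."""
    base: int
    size: int
    readable: bool
    writable: bool

    def __post_init__(self) -> None:
        if self.size <= 0 or self.size & (self.size - 1):
            raise ValueError("size must be a power of two")
        if self.base % self.size:
            raise ValueError("base must be aligned to size")

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def check_access(regions: Sequence[Region], addr: int, write: bool) -> bool:
    """Return True if the access is permitted.

    Assumption: when regions overlap, the one listed last wins.
    The address itself is never translated.
    """
    match: Optional[Region] = None
    for region in regions:
        if region.contains(addr):
            match = region
    if match is None:
        return False  # no region covers the address: access is refused
    return match.writable if write else match.readable


if __name__ == "__main__":
    regions = [
        Region(base=0x00000000, size=0x100000, readable=True, writable=False),
        Region(base=0x00080000, size=0x10000, readable=True, writable=True),
    ]
    print(check_access(regions, 0x00001000, write=True))   # code area, write refused
    print(check_access(regions, 0x00080010, write=True))   # data area, write allowed
```

Because the check is a handful of comparisons against a fixed set of regions, a context switch only needs to update region settings rather than swap page tables, which is the overhead saving the text refers to.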

ARM in Networking
ARM has built a solid foundation in providing cores to satisfy the needs of the networking market. In 2001, ARM partners shipped an estimated 45 million cores into this market. This represents the manufacturing output from over 500 design wins in the networking space. Networking design wins have been reflected across the ARM core portfolio. Because of the need to balance good embedded performance, low power and cost, many current CPE applications make use of cores from the ARM7 and ARM9 microarchitecture families.

With the drive for increased bandwidth and more complex protocol translation, many developers are considering (if not already implementing) a move to dual core designs. CPE is typical of the kind of application that has multiple, complex control requirements. Many system architectures have been built around a single core for managing the higher-level functions, such as system configuration operations and the RTOS, which is then integrated with dedicated state machines to perform operations such as packet processing. However, such implementations are often very complex to design and debug, especially when the design involves multiple proprietary state machines.

Properly supported, embedded software solutions are easier to maintain and adapt than complex proprietary hardware solutions. They can be programmed in high-level languages rather than demanding low-level design techniques, they follow a well-known programmer's model, and they allow access to a large base of highly experienced engineers. Because ASIC integration has increased dramatically, the CPU core now forms a relatively small part of completed devices in terms of area. It is more cost-effective and realistic than ever before to consider replacing multiple dedicated hardware control blocks with a second or third CPU, for example to handle data access operations. These trends have led ARM to introduce the PrimeXsys Dual Core Platform (946 DCP), based on the integration of two ARM946E-S microprocessor cores.

PrimeXsys Reviewed
ARM introduced the first PrimeXsys Platform in 2001. The PrimeXsys Wireless Platform based on the ARM926EJ-S core (926 PWP) provides a re-usable platform solution that is intended to support the development of high-end consumer products such as PDAs and 2.5G/3G phones, and hand-held games machines. Other OS-based consumer products that involve functions such as audio codecs or video streaming, and applications that require Java support, can also benefit from the PrimeXsys Platform. As well as the core, the 926 PWP comprises other blocks such as LCD controller, vectored interrupt controller, watchdog and timers that enable a consumer OS to be booted on the device. The 926 PWP provides more than just extendable hardware IP. It encompasses a validation methodology, development tools, application software, and crucially, a choice of pre-ported operating systems. Using PrimeCell Peripherals and other ARM IP, the 926 PWP can be extended for connectivity, storage and key applications in audio, video, and graphics. These options can be complemented with proprietary and third-party IP to enable wireless baseband processing, for example. More recently, the introduction of the Dual Core Platform has brought the benefit of pre-integration to networking and embedded control applications. This platform also provides a pre-integrated RTOS, multi-core debugging capability and real-time trace options, further reducing time-to-market.

Both of these platforms are intended to provide a standard and scalable system-on-chip (SoC) infrastructure, offering a high level of reusability to the developer. In engineering these platforms, the aim has been to move the developer's starting point to a higher level than the core or IP library, and at the same time to encourage differentiation.

ARM946E-S Features
The ARM946E-S combines the ARM9E-S core with instruction (I) and data (D) caches, tightly coupled memory (TCM), a write buffer, and a memory protection unit (MPU) for embedded applications running an RTOS. Both the I and D caches and the TCMs are configurable. This memory architecture enables the designer to scale the cache and TCM sizes according to the application requirements. The ARM9E-S microarchitecture enables fast interrupt response and context switching. It is very well suited to hosting a small real-time OS (RTOS) such as Wind River's VxWorks or Mentor Graphics' Nucleus, both of which are very applicable to CPE designs. The DSP-enhanced instruction set means that some DSP requirements can be implemented directly on the CPU core, without the need for a separate DSP processor.

The EDN Embedded Microprocessor Benchmark Consortium (EEMBC, www.eembc.org) provides a suite of benchmarks designed to reflect real-world applications for embedded processors. The EEMBC Netmark networking benchmarks focus on routing. They include the Open Shortest Path First (OSPF/Dijkstra) algorithm, the packet-flow routing benchmark, and the route-lookup algorithm. These benchmarks simulate some of the functions that a processor would perform in a networking application. The benchmark suite includes a comprehensive routing benchmark, which performs packet parsing and routing table lookups for a realistic set of IP frames. This benchmark is particularly effective because it can use more than one size of routing table, avoiding the one-size-fits-all approach of synthetic benchmarks. ARM has applied the Netmark benchmarks to the ARM946E-S core, and preliminary results are available from ARM under NDA.

PrimeXsys Dual Core Platform


The PrimeXsys Dual Core Platform (Figure 3) provides an extendable, pre-integrated base level of IP that can support RTOS out-of-the-box. The 946 DCP exploits the multi-layer AMBA on-chip bus architecture that supports multiple bus masters in a full cross-bar AHB bus matrix giving, potentially, very high bandwidths. IP added to the 946 DCP can access the system bus either as a master or slave device. Alternatively, the AMBA Peripheral Bus (APB) offers more power-efficient on-chip connectivity for slower peripherals.

Each core has its own Vectored Interrupt Controller (VIC) and Embedded Trace Macrocell (ETM). The ETM monitors the ARM instruction and data buses at full core speeds, using the MultiTrace analyzer to buffer the collected information before transmission to the Trace Debug Tools. Table 1 provides a summary of the main IP blocks within the 946 DCP.

Core = 160MHz x 2 = 352MIPS

Block                          kGates
Core
  2 x 946E-S                      280
  2 x VIC                          30
  2 x ETM                          74
AHB Peripherals
  DDR Memory Controller            70
  DMA Controller                   95
  Static Memory Controller         10
  AHB Bus Matrix                   20
APB Peripherals
  RTC                               4
  GPIO                              8
  Timers                            4
  Watchdog                          2
  System control                    5
  UART                             22
  SSP                               8
Total                             632

Table 1. Dual Core Platform Block Gate Count (kGates)

The 946 DCP has been designed to be easily extendable by the semiconductor partner or system integrator, enabling differentiated solutions that meet the needs of the target application. As well as providing multiple bus master and slave ports for extending the hardware integration with other IP, PrimeXsys Platforms address the hardware and software development process. Other peripheral IP is of course available through ARM's PrimeCell library. One of the key values of the PrimeXsys Platform will be in setting a new standard in IP beyond the CPU core, which will enable third party partners to provide other differentiated IP solutions in both hardware and software specifically for PrimeXsys platforms.

The PrimeXsys Technology Foundation (Figure 3) provides a design sub-system layer, which encompasses the CPU cores and other pre-integrated components essential to providing a stable base design for evolution into an application-specific platform. The pre-integrated components would include, as a minimum, the additions to the core necessary to enable the operating system to be booted. In the case of a dual core technology foundation, the pre-integrated blocks add the logic necessary to implement functions such as inter-core communication and debugging. Other foundations would incorporate other functions appropriate to creating a functionally useful sub-system architecture, without necessarily constraining the foundation to a particular application-specific purpose.

As well as the hardware sub-system, the Technology Foundation includes the software parts of the sub-system: not just the pre-ported OS, but also the drivers and libraries necessary to drive any special hardware blocks within the Foundation. ARM PrimeXsys Technology Foundations will serve as the basis for application-specific PrimeXsys Platform Solutions. The Dual Core Platform is an attractive target for third parties to develop both hardware and software IP to enable functionality such as TCP/IP (through software protocol stacks), security processing such as encryption/decryption algorithms, private key solutions and many other applications.
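The headline figures in Table 1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers printed in the table (clock, core count, total MIPS and the fourteen block gate counts, listed here without assigning them to blocks); the function names are assumptions made for the example.

```python
# Figures taken from Table 1.
CORE_CLOCK_MHZ = 160
CORE_COUNT = 2
TOTAL_MIPS = 352
BLOCK_KGATES = [4, 8, 4, 2, 5, 22, 8, 70, 95, 10, 20, 280, 30, 74]
QUOTED_TOTAL_KGATES = 632


def mips_per_mhz(total_mips: float, clock_mhz: float, cores: int) -> float:
    """Per-core throughput implied by the platform headline figure."""
    return total_mips / (clock_mhz * cores)


def total_kgates(blocks: list) -> int:
    """Sum of the individual block gate counts, in kGates."""
    return sum(blocks)


if __name__ == "__main__":
    print(f"{mips_per_mhz(TOTAL_MIPS, CORE_CLOCK_MHZ, CORE_COUNT):.2f} MIPS/MHz per core")
    print(f"{total_kgates(BLOCK_KGATES)} kGates (table quotes {QUOTED_TOTAL_KGATES})")
```

The block counts sum to the quoted 632 kGates, and the headline 352MIPS corresponds to 1.1 MIPS/MHz for each core at 160MHz.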

Figure 3. ARM PrimeXsys 946 Dual Core Platform

Dual Core Performance


The key parameters that define the performance of a system implementation include the CPU performance, memory bandwidth, system bus bandwidth and interrupt response latency.

The architecture of the PrimeXsys Dual Core Platform has been designed to provide sufficient memory bandwidth for a range of CPE applications. Table 2 shows estimated memory bandwidth for the 946 DCP.

Bus clock (MHz)      75     133
Bandwidth (MB/s)     100    177

Table 2. PrimeXsys 946 DCP Estimated Memory Bandwidth*

*Table 2 assumes:
- 32-bit wide SDRAM
- AHB access type: 70% reads, 30% writes
- All accesses are word wide
- 50% four-beat bursts, 30% 8-beat bursts, 20% 16-beat bursts
- Memory accesses:
  o 50% bank open, correct page
  o 30% bank closed
  o 20% bank open, incorrect page
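To put the Table 2 estimates in context, they can be compared with the theoretical peak of a 32-bit bus. The sketch below assumes one 32-bit word transferred per bus clock as the peak (an assumption for illustration; the source does not state the peak figure), and also computes the mean burst length implied by the burst mix in the Table 2 assumptions.

```python
BYTES_PER_WORD = 4  # 32-bit wide memory, as stated in the Table 2 assumptions

# Estimated bandwidth from Table 2: bus clock (MHz) -> bandwidth (MB/s).
ESTIMATED_MB_S = {75: 100.0, 133: 177.0}

# Burst mix from the Table 2 assumptions: beats per burst -> share of accesses.
BURST_MIX = {4: 0.5, 8: 0.3, 16: 0.2}


def peak_bandwidth_mb_s(bus_clock_mhz: float) -> float:
    """Theoretical peak, assuming one word per bus clock (illustrative assumption)."""
    return bus_clock_mhz * BYTES_PER_WORD


def efficiency(estimated_mb_s: float, bus_clock_mhz: float) -> float:
    """Fraction of the assumed theoretical peak that the estimate represents."""
    return estimated_mb_s / peak_bandwidth_mb_s(bus_clock_mhz)


def mean_burst_beats(mix: dict) -> float:
    """Weighted average burst length in beats."""
    return sum(beats * share for beats, share in mix.items())


if __name__ == "__main__":
    for clock, estimate in ESTIMATED_MB_S.items():
        print(f"{clock} MHz: {estimate:.0f} of {peak_bandwidth_mb_s(clock):.0f} MB/s "
              f"({efficiency(estimate, clock):.0%})")
    print(f"mean burst length: {mean_burst_beats(BURST_MIX):.1f} beats")
```

Under this assumption, both estimates work out to about one third of the theoretical peak, which suggests the table already accounts for page misses, bank activity and read/write turnaround rather than quoting a best-case number.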

Peripheral     Peak Bandwidth (MB/s)     Average Bandwidth (MB/s)
IEEE802.11     7                         7
PCMCIA         13                        7*
Ethernet       25                        7*
USB 1v1        1.5                       0*
USB 2v0        60                        0*

Table 3. Estimated Application Bandwidth Requirements

*Table 3 conditions: average bandwidth assumes that the PCMCIA and USB interfaces, or the Ethernet and USB interfaces, are not required in parallel.

The application bandwidth requirements in Table 3 can provide an estimate for CPE interface combinations by summing the peak figures of the interfaces involved. For example, an 802.11 interface with PCMCIA and USB1v1 connections would require a peak bandwidth of 21.5MB/s (7 + 13 + 1.5), whilst an access point application consisting of 802.11, Ethernet and USB2v0 interfaces would need a platform capable of providing a peak memory bandwidth of at least 92MB/s (7 + 25 + 60).
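The same estimate can be scripted for any interface combination. The sketch below sums the peak figures from Table 3 and compares the result with the estimated memory bandwidth from Table 2; treating the two tables as directly comparable is an assumption of the example, and the function names are hypothetical.

```python
# Peak bandwidth per interface from Table 3 (MB/s).
PEAK_MB_S = {
    "IEEE802.11": 7.0,
    "PCMCIA": 13.0,
    "Ethernet": 25.0,
    "USB1v1": 1.5,
    "USB2v0": 60.0,
}

# Estimated memory bandwidth from Table 2: bus clock (MHz) -> MB/s.
ESTIMATED_MEMORY_MB_S = {75: 100.0, 133: 177.0}


def peak_requirement(interfaces: list) -> float:
    """Worst-case requirement: the sum of the peak figures of all interfaces."""
    return sum(PEAK_MB_S[name] for name in interfaces)


def bus_clocks_meeting(requirement_mb_s: float) -> list:
    """Bus clocks from Table 2 whose estimated bandwidth covers the requirement."""
    return sorted(clock for clock, bandwidth in ESTIMATED_MEMORY_MB_S.items()
                  if bandwidth >= requirement_mb_s)


if __name__ == "__main__":
    access_point = ["IEEE802.11", "Ethernet", "USB2v0"]
    need = peak_requirement(access_point)
    print(f"access point peak: {need} MB/s, met at {bus_clocks_meeting(need)} MHz")
```

The 21.5MB/s figure quoted in the text is reproduced by summing the 802.11, PCMCIA and USB1v1 peaks, and the 92MB/s access point case falls within the 100MB/s estimated for a 75MHz bus clock.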

PrimeXsys Development
The aim of the PrimeXsys Platform is to deliver far more than just the inherent value within pre-integrated IP. For the system company to fully realize the value within a platform, it must be possible to build a differentiated solution on top of the platform quickly and accurately. To enable this, PrimeXsys systems include a number of testbenches to ensure the development process is accelerated and delivers high-quality designs.

The AMBA compliance testbench ensures that IP added to the AMBA bus is connected properly and conforms to the AMBA on-chip bus standard. Further testbenches, currently based on e, Verisity's testbench automation language, enable rapid automatic generation of functional tests, data and temporal checking, functional coverage analysis, and HDL simulation control. PrimeXsys testbenches are supplied for system integration and system validation purposes.

A software development model (SDM) is also provided. This consists of an instruction set simulator (ISS) model of the cores, plus C models of added IP blocks running within an ARMulator Test Environment. Although this model is capable of running the RTOS, the speed of the simulation limits the usefulness of the ARMulator for real application testing. This environment is most useful for verifying register integrity, driver initialization and the OS port. For faster system emulation, an FPGA-based development board can provide an environment for application development and test, ahead of the delivery of prototype silicon.

Debugging Multi-core Systems


There are significant benefits to a dual core architecture for certain applications, but traditionally the difficulty of debugging a dual core system has been a barrier to adoption for some design teams. At any time, the RTOS may be executing a number of different threads. Particular applications will be executing within those threads, for example an encode-decode function, or the handling of an incoming data packet from a particular channel. If the system is managing two channels, this may result in one or two sets of tasks running simultaneously, either on the same core or across different cores. It is possible that the RTOS may also be running across both cores.

Whatever the system configuration, successful debug depends on being able to track the code execution that results from an incoming data packet: the resulting interrupt, the context switch and the data processing. Attempting to debug code in this situation can be very difficult with separate debug systems running independently on each core. The PrimeXsys 946 DCP provides debug integration specifically for complex, interdependent, multi-threaded applications. The 946 DCP enables cross-triggering of a breakpoint on a core by a number of sources, including a second core and additional IP. A designer can set a trigger point and single-step from the trigger, giving the same ease of debug as a single core system. The ARM RealView Multicore Debugger is a single emulator and debugger providing synchronized debugging for multiple ARM cores.
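The idea of cross-triggering can be illustrated with a small software model. This is a conceptual sketch only and does not represent the 946 DCP hardware or the RealView tools: the `Core` and `CrossTrigger` classes, the `route` and `fire` methods and the source names are all hypothetical. It shows how routing trigger sources to target cores lets one event stop every core involved at the same point.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Core:
    """A hypothetical core that records why it was halted."""
    name: str
    halted: bool = False
    halt_log: List[str] = field(default_factory=list)

    def halt(self, cause: str) -> None:
        if not self.halted:
            self.halted = True
            self.halt_log.append(cause)


class CrossTrigger:
    """Routes trigger sources (cores or other IP) to the cores they should halt."""

    def __init__(self, cores: List[Core]) -> None:
        self.cores: Dict[str, Core] = {core.name: core for core in cores}
        self.routes: Dict[str, Set[str]] = {}

    def route(self, source: str, target: str) -> None:
        if target not in self.cores:
            raise KeyError(f"unknown target core: {target}")
        self.routes.setdefault(source, set()).add(target)

    def fire(self, source: str) -> List[str]:
        """Halt every core routed from this source; return the targets in name order."""
        targets = sorted(self.routes.get(source, ()))
        for target in targets:
            self.cores[target].halt(cause=source)
        return targets


if __name__ == "__main__":
    core0, core1 = Core("core0"), Core("core1")
    xt = CrossTrigger([core0, core1])
    xt.route("core0", "core1")      # a breakpoint on core0 also stops core1
    core0.halt(cause="breakpoint")  # core0 hits its own breakpoint
    print(xt.fire("core0"), core0.halted, core1.halted)
```

With both cores stopped by the same event, their state can be inspected together, which is the property that makes single-stepping from a trigger point meaningful in a multi-core system.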

Summary
SoC implementation for CPE applications needs to ensure an appropriate level of performance, excellent time-to-market with low cost and power to achieve success. ARM is enabling faster time to market, reducing risk and delivering a highly competitive base platform for SoC solutions with the introduction of the ARM PrimeXsys Dual Core Platform. Establishing the foundations of the PrimeXsys Platform as a standard, in the same way that the ARM architecture has been treated, will ensure that ARM partners benefit from continued amortisation of development costs across the ARM partnership, and a continued supply of innovative third party IP for hardware and software.

References
[1] C.A. Eldering, M.L. Sylla, J.A. Eisenach, "Is There a Moore's Law for Bandwidth?", IEEE Communications Magazine, Vol. 37 No. 10, October 1999.
[2] George Gilder, author of Telecosm, asserted that bandwidth grows at least three times faster than computer power.
