Voice-over-IP (VoIP) has been a shining star in a still-tough communications sector. Both
cable operators and traditional telecom providers are offering or adding VoIP capabilities to
their networks. And, going forward, they plan to increase their VoIP deployments to end
users.
One of the reasons VoIP has become so successful is the development of analog
telephone adapter (ATA) boxes that allow consumers to tap VoIP services while still using
their existing telephone sets. While a number of these systems have been deployed, there
is a clear demand at the carrier level to build more ATA systems. The problem, however, is
that these systems are extremely challenging to design. At the same time, they have to be
cheap enough to be covered by a subsidy at retail or recovered over time from the customer's
monthly bill.
These challenges require designers to re-evaluate the software and hardware design
approaches they are taking during the development of an ATA. In this article, we'll look at
some of the key hardware and software decisions that must be made. We'll also show
potential technology choices that will allow designers to meet the performance and cost
demands of today's operators.
Hardware Design Considerations
Consumer VoIP ATA hardware developers need to balance many factors to produce a
cost-effective product that delivers toll-quality voice. Core processor selection is the first and
perhaps most critical decision since this device will dictate the bulk of the bill of materials
cost, controlling essential functions including external peripherals, non-volatile and run-time
data storage.
Depending on a designer's familiarity with a particular processor, he or she may choose to
use a familiar device and integrate an external digital signal processor (DSP) for voice
processing operations such as compression and decompression and echo cancellation.
Typically, a 32-bit RISC processor handles network and signaling protocols such as TCP/IP
and the session initiation protocol (SIP). While it is possible to use a RISC-only processor
to handle both voice and control functions, it would likely only support a single voice
channel using a simple codec.
A dual-processor core implementation gives the engineer greater flexibility and the added
benefit of using a single package. An example would be to use a processor with a RISC
core and a DSP core. There are various low-cost devices available on the market and this
type of device currently drives much of the existing consumer VoIP product landscape. A
dual-core environment does require dual-processor software development, debugging and
the management of inter-processor communication. This can add a development burden to
the software front. Most semiconductor vendors have tool solutions to assist with this
process.
Another option is to use a RISC/DSP processor with dual execution units but a single
instruction pipeline. A RISC/DSP solution is capable of supporting both voice processing
and network protocol functions without the need for inter-processor communication. As a
result, inter-processor communication overhead between the RISC and the DSP is
eliminated, which helps ensure good voice quality.
Most voice processing algorithms require the use of multiply-accumulate (MAC) and other
mathematical operations in loops. A full-featured DSP with on-chip memory, single-cycle
MAC instructions, zero-overhead loops, barrel shifters, and modulo addressing improves
system performance significantly. An integrated RISC/DSP processor eliminates the need
for, and cost of, a separate DSP, reduces overall design complexity, and can shorten
time-to-market.
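To see why these DSP features matter, consider a FIR filter, a common building block in voice processing. The sketch below is illustrative Python rather than DSP code, and the names `FirFilter` and `process` are our own. The inner loop performs one multiply-accumulate per tap, and the delay line is indexed modulo its length. That is the pattern that single-cycle MAC instructions, zero-overhead loops, and modulo addressing are built to speed up.

```python
class FirFilter:
    """Illustrative FIR filter using a circular delay line (hypothetical helper)."""

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delay = [0.0] * len(self.coeffs)  # circular buffer of past samples
        self.head = 0                          # index of the newest sample

    def process(self, sample):
        n = len(self.coeffs)
        # Advance the write index with wraparound and overwrite the oldest sample.
        self.head = (self.head + 1) % n
        self.delay[self.head] = sample
        acc = 0.0
        for k in range(n):
            # One multiply-accumulate per tap: coeffs[k] * x[t - k].
            acc += self.coeffs[k] * self.delay[(self.head - k) % n]
        return acc


fir = FirFilter([0.25, 0.5, 0.25])
output = [fir.process(x) for x in [1.0, 0.0, 0.0, 0.0]]
print(output)  # the impulse response reproduces the coefficients
```

On a general-purpose RISC core, the loop counter, the wraparound test, and the multiply and add each cost instructions. A DSP folds them into the hardware, which is where the per-channel savings come from.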
Peripherals and Memory Matter
On-chip peripherals are also key to designing a cost-effective VoIP ATA product. For
example, having on-chip dual Fast Ethernet MACs eliminates the need for external
Ethernet controllers and ensures the product can scale to multi-port LAN/WAN
configurations.
Most dual-MAC devices include an integrated Ethernet bridge that can auto-forward
Ethernet frames without CPU intervention. In a simple device without integrated router or
firewall functionality, this leaves the CPU free to focus on other tasks. A
processor running at 200 MIPS or so can also support network routing, network address
translation (NAT) and firewall functions at near wire speed. Business-class devices require
strong cryptography, so integrated hardware acceleration for symmetric ciphers and
message digests will become more important over time.
A pulse code modulation (PCM) interface with transmit and receive FIFOs is also required
to enable the designer to seamlessly connect a variety of audio codecs as well as audio
codec/SLIC combo devices for VoIP adapters.
Like the peripherals, the memory subsystem plays an important role in the VoIP ATA
product design. This subsystem can be broken down into four components: instruction and
data cache, DSP memory, flash, and SDRAM. Large on-chip instruction and data cache(s)
enable the core processor to run at its full speed and a multiway set associative cache can
improve the hit rate. Dedicated on-chip DSP memories help keep program coefficients and
voice sample data on-chip maintaining processing throughput, while external memory
including flash and SDRAM are required for program storage, code execution and run-time
data storage.
A designer should look for devices that provide a glueless interface for external SDRAM
and flash memory devices and keep in mind that processors with a 16-bit fixed-length
instruction set have higher code density than those with a 32-bit or variable length
instruction set. This is important to note since code density ultimately determines the
external memory requirement and better density means cost-savings to OEM customers.
One of the more advanced technologies available to system designers is
system-in-package (SiP)-based products. This is a sophisticated packaging technology that offers
designers an option to stack SDRAM and flash memory with the processor die in a single
BGA package. SiP technology offers various advantages not the least of which is a
reduction in PCB form factor. A decrease in PCB complexity through the elimination of
external memory components and buses improves PCB reliability and ensures the
minimization of EMI and switching noise. The result is a cost-effective design, which
mitigates any concern over memory component availability.
DSP Algorithms
In the ATA design, the DSP's primary task is to process speech codec algorithms.
Compression of voice data is necessary to conserve network bandwidth utilization. For
interoperability reasons a typical VoIP product supports a few common ITU codecs, such
as G.711, G.723.1, G.726, and G.729A. These codecs offer trade-offs among bit-rate,
implementation complexity, and voice quality.
For example, G.711 is a simple toll-quality codec that uses less than 1 MIPS of
DSP, but takes up 64 kbit/s of network bandwidth, not including the overhead for RTP,
UDP and IP headers. G.723.1 uses only 5.3 or 6.3 kbit/s of network bandwidth plus
overhead and delivers near toll-quality voice, but consumes significantly more DSP
resources (both MIPS and memory). G.726 supports multiple bit-rates (16 to 40 kbit/s), and
G.729A supports 8 kbit/s. Both of these codecs deliver near toll-quality voice, and are not as
demanding on DSP resources (Table 1).
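The header overhead mentioned above can be a large share of the total for low-bit-rate codecs. The following sketch estimates IP-level bandwidth for one direction of a call. It assumes IPv4 without options (20 bytes), UDP (8 bytes) and a basic RTP header (12 bytes), and ignores link-layer framing. The packetization intervals (20 ms and 30 ms) are illustrative assumptions, and the function name `on_wire_kbps` is our own.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options, no link layer


def on_wire_kbps(codec_kbps, packet_ms):
    """Return IP-level bandwidth in kbit/s for one direction of a call."""
    payload_bytes = math.ceil(codec_kbps * packet_ms / 8)  # kbit/s * ms = bits
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 / packet_ms                    # bits per ms = kbit/s


# (codec, codec bit rate in kbit/s, assumed packetization interval in ms)
for name, rate, interval in [("G.711", 64, 20), ("G.729A", 8, 20), ("G.723.1", 6.3, 30)]:
    print(f"{name}: {on_wire_kbps(rate, interval):.1f} kbit/s")
```

Under these assumptions G.711 grows from 64 to 80 kbit/s, while G.729A triples from 8 to 24 kbit/s. The fixed 40 bytes per packet dominate once the payload becomes small.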

Codec optimization to minimize DSP loading enables the VoIP product to support more
voice channels without using a faster processor or adding another processor. Codec
verification for bit accuracy is required to ensure high voice quality and compliance with the
ITU standards. A RISC/DSP processor designed with a multiple-stage pipeline, running
at 200 MHz and delivering 260 MIPS, can support three voice channels
using a high-complexity codec along with a real-time operating system and networking
protocols.
In addition to codecs, line and acoustic echo cancellation algorithms are needed to remove,
at a minimum, the near-end echo caused by the impedance mismatch between the SLIC hybrid
and inexpensive consumer-grade phones, along with room or handset feed-through echo and,
in some cases, far-end echo.
OEMs can obtain a license for most codecs from many software vendors for various
platforms. Such a license grants the OEM the right to use a specific implementation of a
codec algorithm. However, these codec algorithms are covered by patents owned by many
patent holders. A vendor that provides both a license and patent indemnification, with broad
coverage of major countries in North America, Asia, and Europe, gives OEMs the peace of
mind to sell VoIP products worldwide.
Software Considerations
All ATA devices require an operating system (OS). While rolling your own or using a
traditional RTOS or simple task switcher is possible, be prepared for a long development
and testing cycle and an ongoing commitment to maintain your code base. A
standards-compliant OS keeps the focus on product differentiation, and your code reuse
for the next generation of product will be significantly higher.
There are various OS options available, but keep in mind the key requirements for any
consumer product: price and price. This usually eliminates any third-party proprietary OSes
such as VxWorks and steers designers in the direction of Linux and other open-source
embedded OSes. These OSes tend to give you complete, standards compliant networking
stacks and broad microcontroller support in a reasonable memory footprint, but configuring
and deploying them can sometimes be anything but off-the-shelf.
An excellent example of the benefit provided by an OS is a relatively minor
product change: the addition of a second Ethernet port to an ATA. While in
hardware this is a relatively minor design change, in software the intended use changes
dramatically from an end-point to a router. The device now needs functionality such as a
DHCP server, NAT, PAT, PPPoE, bridging, and MAC or IP address cloning, along with some
form of QoS and a firewall stack. Here's where a mature OS really wins over rolling your
own or using a proprietary OS, where each component represents added cost.

QoS and Latency


Perhaps the most discussed issue in consumer VoIP is quality of service (QoS). While
there are more opinions on implementation strategies than consumer VoIP subscribers, it's
important to understand the limitations and intentions of QoS. Ultimately, VoIP service is
susceptible to quality disruptions of various types, largely attributable to delay, jitter, or
packet loss during transmission over public networks. While this is not a concern for data
traffic, the ATA must have an effective jitter buffer; otherwise it will not meet customers'
toll-quality expectations. QoS can help with this, but it can't cure the problem.
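As a rough illustration of what a jitter buffer does, the sketch below reorders packets by sequence number and holds back playout until a fixed number of packets has accumulated. It is a minimal fixed-depth design under simplifying assumptions: sequence numbers do not wrap, the first packet to arrive is the first in the stream, and the depth is not adapted to network conditions. The class and method names are hypothetical.

```python
class JitterBuffer:
    """Fixed-depth reordering buffer keyed by sequence number (sketch only)."""

    def __init__(self, depth=3):
        self.depth = depth      # packets to accumulate before playout starts
        self.packets = {}       # sequence number -> payload
        self.next_seq = None    # next sequence number due for playout
        self.started = False

    def push(self, seq, payload):
        if self.next_seq is None:
            self.next_seq = seq
        if seq >= self.next_seq:           # packets arriving too late are dropped
            self.packets[seq] = payload
        if len(self.packets) >= self.depth:
            self.started = True

    def pop(self):
        """Call once per frame period; returns a payload, or None if it is missing."""
        if not self.started:
            return None
        payload = self.packets.pop(self.next_seq, None)
        self.next_seq += 1
        return payload


jb = JitterBuffer(depth=3)
for seq, data in [(1, "a"), (3, "c"), (2, "b"), (5, "e")]:   # out of order, 4 is lost
    jb.push(seq, data)
played = [jb.pop() for _ in range(5)]
print(played)
```

A `None` result marks the slot where a real device would apply packet loss concealment. A deeper buffer absorbs more jitter but adds delay to the call.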
QoS is meant to tag "high priority" packet traffic so that it will not be delayed or dropped
due to congestion with lower-priority traffic. Ideally this can be accomplished between the
ATA and broadband connection fairly simply by using a bandwidth reservation protocol
such as RSVP. By requesting a predetermined amount of bandwidth from the broadband
connection the audio stream encounters no bottleneck locally and quality is better
guaranteed but only until it reaches the public Internet.
Another QoS-related consideration is latency associated with packet size. On a 128-kbit/s
upstream connection of low-quality ADSL, a maximum-length Ethernet frame of 1500 bytes
can take almost 2 voice frame times. To address this, ATA devices with integrated NAT and
router functions that are directly connected to a broadband modem can reduce the
maximum segment size (MSS) for the duration of a VoIP call or predictably insert periodic
breaks in outgoing packet stream to allow for the insertion of voice packets regardless of
the data load.
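The blocking effect of a large data frame follows directly from the link rate. The short calculation below shows the time needed to clock a frame onto the upstream link. The 576-byte comparison value is our own illustrative choice, not a figure from a specific device, and how many voice frame times a given delay represents depends on the packetization interval in use.

```python
def serialization_delay_ms(frame_bytes, link_kbps):
    """Time to clock a frame onto a link, in milliseconds."""
    return frame_bytes * 8 / link_kbps   # bits / (kbit/s) = ms


full_frame = serialization_delay_ms(1500, 128)    # maximum-length frame
small_frame = serialization_delay_ms(576, 128)    # assumed reduced size
print(full_frame, small_frame)
```

A voice packet queued behind the full-size frame waits for the whole transmission to finish, which is why limiting the size of data packets during a call reduces worst-case voice delay.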
In a similar scenario where the ATA is passing data traffic, terminating or initiating packets,
the device can employ traffic shaping. This involves bandwidth limiting particular types of
packets and sending the time critical packets prior to those waiting to be sent. This allows
the dynamic categorization of packets including minimum and maximum bandwidth and
levels of priority for both incoming and outgoing data streams. If packets are lost, the TCP
flow control mechanisms at each end will be triggered to reduce send rates.
Much of the Internet fabric will also respond to the "hints" embedded in the IP packet
header type of service (ToS) field. The packet classes of differentiated services (DiffServ)
are used to indicate how the packet should be handled, and by honoring and generating
this field, we gain some QoS. Routers will use DiffServ fields to place voice packets in
higher priority queues, ensuring that they receive a higher proportion of the available
bandwidth and experience less delay and loss.
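As a sketch of how an application can generate this field, the code below converts a DiffServ code point (DSCP) into the byte written to the ToS field and requests that marking on a socket. The DSCP sits in the upper six bits of the byte, and 46 is the Expedited Forwarding code point commonly used for voice. The helper names are our own, and whether the marking is honored depends on the operating system and on the networks along the path.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for voice media


def dscp_to_tos(dscp):
    """Place the 6-bit DSCP in the upper six bits of the ToS byte."""
    return (dscp & 0x3F) << 2


def mark_socket(sock, dscp=DSCP_EF):
    """Ask the OS to mark outgoing IPv4 packets sent on this socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))


print(hex(dscp_to_tos(DSCP_EF)))
```

An RTP sender would call `mark_socket()` once on its UDP socket after creating it. Signaling traffic can be given a different code point in the same way.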
Implementing traffic shaping can be accomplished in any number of ways. The hierarchical
token bucket (HTB) theory of traffic shaping along with other methods helps classify (and
modify) packets into various queues using almost any criteria. HTB uses the concepts of
tokens and buckets along with the class-based system and filters to allow complex and
granular control over traffic.
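The token and bucket idea can be sketched with a single bucket, shown below. This is only the basic building block, not HTB itself: HTB arranges many such buckets in a class tree and adds borrowing between them. The class name and the rate and burst values are illustrative assumptions, and time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Single token bucket; one token represents one byte (sketch only)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes   # the bucket starts full
        self.last = 0.0             # time of the last refill, in seconds

    def allow(self, packet_bytes, now):
        """Return True if the packet may be sent at time `now`."""
        elapsed = now - self.last
        self.last = now
        # Refill at the configured rate, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        return False


bucket = TokenBucket(rate_bytes_per_s=1000, burst_bytes=1500)
decisions = [bucket.allow(1500, 0.0), bucket.allow(1500, 0.5), bucket.allow(1500, 2.0)]
print(decisions)
```

The second packet is refused because only 500 bytes of tokens have accumulated after half a second. By the third attempt the bucket has refilled to its burst limit.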
HTB allows the IP stack to easily and predictably manage the bandwidth that any queue
uses. It allows for minimum and maximum bandwidth allocations. It also allows "lower"
priority queues to temporarily borrow currently unused bandwidth from queues with higher
priority.
One of the major advantages of HTB is that queues are organized in a "tree" where each
classification inherits bandwidth restrictions from its parent node, thus allowing designers to
control traffic in a very granular fashion. This is another advantage of using a mature OS
rather than a custom or proprietary RTOS.

Firewall and Traversal Techniques


NAT has enabled the Internet to grow beyond what would otherwise have been possible with
IPv4, given the limitations of its 32-bit address space. It is also the most challenging issue
to address when trying to reach end-user devices behind one or more NAT firewalls.
NATs come in four typical flavors: full-cone, address restricted cone, port restricted cone
and symmetric NAT. There are several methods to traverse these NATs, including
application layer gateways (ALGs), media tunnels, third-party proxies, and Simple Traversal
of UDP through NATs (STUN). Since providing an ALG, tunnel or third-party proxy requires
the co-operation of the premises NAT device or additional equipment, it's highly impractical
for a consumer level deployment and therefore as the ATA vendor, we are on our own to
solve the NAT problem.
STUN is the most deployed option and will traverse most NAT firewalls. STUN works by
using a lightweight UDP protocol and an external STUN server to identify the type of
translation performed by NAT firewall(s). It will then identify specifically the exact translation
the NAT has chosen to do on a particular UDP connection used for RTP or SIP. This
information is gathered without the specific co-operation of the NAT firewall and is then
used to establish the SIP and RTP sessions. While virtually all consumer premises
equipment uses a flavor of cone NAT, symmetric NAT is more likely to be encountered in a
corporate environment. In this case, an ALG or local proxy is unfortunately needed.
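To make the exchange more concrete, the sketch below builds a STUN Binding Request and extracts the mapped address from a response. It assumes the header layout of the later RFC 5389 revision of STUN (a fixed magic cookie followed by a 96-bit transaction ID) and handles IPv4 only. It does not perform the full series of tests needed to classify the NAT type, and the function names are our own.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442        # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
MAPPED_ADDRESS = 0x0001
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request():
    """Return (message, transaction_id) for a Binding Request with no attributes."""
    txid = os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txid), txid


def parse_mapped_address(response):
    """Return the (ip, port) seen by the server, or None; IPv4 only."""
    pos = 20                                     # skip the 20-byte header
    while pos + 4 <= len(response):
        attr_type, attr_len = struct.unpack_from("!HH", response, pos)
        value = response[pos + 4:pos + 4 + attr_len]
        is_address = attr_type in (MAPPED_ADDRESS, XOR_MAPPED_ADDRESS)
        if is_address and len(value) >= 8 and value[1] == 0x01:
            port, addr = struct.unpack_from("!HI", value, 2)
            if attr_type == XOR_MAPPED_ADDRESS:  # undo the XOR obfuscation
                port ^= MAGIC_COOKIE >> 16
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)    # values are padded to 4 bytes
    return None


request, txid = build_binding_request()
print(len(request))  # a request without attributes is just the header
```

In use, the ATA would send `request` over UDP from the same local port it intends to use for RTP or SIP, then pass the reply to `parse_mapped_address()` to learn the public address and port the NAT assigned.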
Provisioning and Management
There are two strategies employed by VoIP providers to address the configuration of VoIP
ATAs: pre-configuration prior to shipping and auto provisioning typically using TFTP. It's
important to note that much like custom signaling modifications made by carriers to the SIP
standard, every service provider has its own unique provisioning model. Pre-configuration
is impractical from a scalability stand-point so let's identify what is required to remotely
provision a device.
Typically the configuration information would contain the SIP user ID, caller ID, password,
subscribed features and any other account information including perhaps location
information required by E-911. Server information would also need to be discovered, which
includes a SIP proxy server, firmware upgrade server, media and feature servers.
Parameters and variables will need to be defined including QoS tolerances, firmware
revisions, ring types, timers and counters.
The most sensible implementation is to feed a backend management information base
(MIB) database with parsed values from the incoming configuration file. This database can
then hook the appropriate resources and apply the changes to the device.
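One way to picture this arrangement is a small key-value store with change hooks, sketched below. The file format (`key = value` lines), the key names, and the `ConfigStore` class are all hypothetical; a real device would follow the service provider's own provisioning format and a proper MIB schema.

```python
class ConfigStore:
    """Tiny stand-in for a MIB-style store with change hooks (hypothetical)."""

    def __init__(self):
        self.values = {}
        self.hooks = {}   # key -> callback invoked when the value changes

    def on_change(self, key, callback):
        self.hooks[key] = callback

    def set(self, key, value):
        if self.values.get(key) == value:
            return                       # unchanged, so nothing to apply
        self.values[key] = value
        if key in self.hooks:
            self.hooks[key](value)

    def load(self, text):
        """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            self.set(key.strip(), value.strip())


applied = []
store = ConfigStore()
store.on_change("sip.proxy", lambda v: applied.append(("sip.proxy", v)))
store.load("# sample provisioning file\nsip.user_id = 1001\nsip.proxy = proxy.example.com\n")
print(store.values, applied)
```

Because every writer goes through `set()`, a downloaded file, a web page, or a management agent can all change the same value and trigger the same hook.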
Alternately, scripting can be used to accomplish much the same result, but likely at
a cost in portability and extensibility. Consumer products and carrier products have differing
management requirements. While a carrier is going to want a device that integrates with
their existing network management systems (NMS), a consumer requires web-based tools
and has little use for SNMP. Again, this is where the database architecture excels, allowing
multiple points of entry while retaining an organized structure.
Robustness and Upgrades
Since the consumer views the ATA as nothing more than an adapter, robustness is critical.
To address this in higher-end embedded devices such as set-top boxes (STBs), designers
can afford the luxury of memory resources capable of storing multiple images. In the event
of failure these devices arbitrate between images and thereby mitigate failure risk. On a
small embedded device data storage accounts for a large portion of the total BOM cost.

To marry these considerations, developers can build in the same robustness by segmenting
flash into functional blocks that can each have redundant images, but not all at once,
reducing the memory overhead of failover from 2x to some lesser factor. With flash
segmented in this fashion, an extremely low-level arbiter, in conjunction with digital
signatures using PKI and a watchdog timer, can identify and isolate corruption or
unauthorized, possibly malicious new firmware and fall back to the previous known-good
segment. The device can be designed to operate normally in all cases of a failed firmware
upgrade, meeting the requirements of CableLabs, for instance, while eliminating the cost of
completely redundant flash memory. In the event of a catastrophic failure where only
initialization code remains functional, the device can be configured to fail over into a second
'disaster recovery' mode where its low-level initialization code will attempt to seek a set of
external images.
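The arbiter's decision for a single flash segment can be sketched as follows. To keep the example self-contained, a SHA-256 digest checked against a trusted set stands in for PKI signature verification, which is a simplification of the scheme described above. The watchdog timer is not modeled, and the image contents and function names are invented for illustration.

```python
import hashlib


def digest(image):
    """Hex SHA-256 digest of an image (stand-in for a signature check)."""
    return hashlib.sha256(image).hexdigest()


def select_image(candidates, trusted_digests):
    """Return the first candidate with a trusted digest; list newest first."""
    for image in candidates:
        if digest(image) in trusted_digests:
            return image
    return None   # nothing usable: the caller enters disaster-recovery mode


good_old = b"voice-app v1"
good_new = b"voice-app v2"
trusted = {digest(good_old), digest(good_new)}

corrupted_new = good_new + b"\xff"    # simulated bad flash write
chosen = select_image([corrupted_new, good_old], trusted)
print(chosen)
```

Here the corrupted new image is rejected and the previous known-good image is chosen. If no candidate verifies, the `None` result corresponds to the disaster-recovery path.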
Segmenting flash has a second advantage, more apparent during in-field upgrades. Since
each segment of flash can be upgraded independently, only the required components need
to be replaced, and features or localization can be added without replacing the
complete firmware. This conserves bandwidth and, more importantly, expedites the push
upgrade of potentially hundreds of thousands of in-field devices.
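Deciding which segments to fetch can be as simple as comparing two manifests, as in the sketch below. The segment names and version strings are invented for illustration, and a real manifest would also carry sizes and signatures.

```python
def segments_to_upgrade(installed, available):
    """Return names of segments that are new or whose version differs."""
    return sorted(name for name, version in available.items()
                  if installed.get(name) != version)


installed = {"boot": "1.0", "kernel": "2.4", "voice": "1.2", "locale": "en-1"}
available = {"boot": "1.0", "kernel": "2.4", "voice": "1.3", "locale": "fr-1"}
pending = segments_to_upgrade(installed, available)
print(pending)
```

Only the voice and locale segments would be downloaded in this case. The boot and kernel segments are left untouched.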
Wrap Up
As we look forward, the next generation of ATAs will need to carefully address currently
unresolved provisioning and security issues. Designers need to examine the hardware and
software impacts of these requirements and ensure existing platforms provide the
extensibility to future-proof for these considerations. Single-function ATAs in the market
today are likely not the long-term solution, but they will provide the backbone over which the
first wave of consumer VoIP will be rolled out. As the market adopts the technology, we
as designers need to continue innovating the features, services and security that will
ensure its long-term viability.
About the Authors
Jeff Dionne is CEO and chief architect of Arcturus Networks. He has over 15 years of
experience in electrical engineering, hardware design and software. Jeff can be reached
at jdionne@arcturus.com.
Brian Davis is director of the advanced solutions group at Renesas Technology America.
Brian has extensive experience working with semiconductor and software solutions for
personal computers, PDAs, smart phones, communication gateways, and other embedded
systems. Brian can be reached at brian.davis@renesas.com.