Availability and downtime per year (365 1/4 days x 24 hours):
99.9999% 32 seconds
99.999% 5 minutes, 15 seconds
99.99% 52 minutes, 36 seconds
99.95% 4 Hours, 23 minutes
99.9% 8 Hours, 46 minutes
99.5% 1 day, 19 hours, 50 minutes
99% 3 days, 15 hours, 40 minutes
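The downtime figures above follow directly from the length of the year. A minimal sketch of that arithmetic, assuming the 365 1/4-day year used in the table (the function name is illustrative, not from the original):

```python
# Minutes in an average year: 365 1/4 days x 24 hours x 60 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime, in minutes, allowed at a given availability."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Print the downtime budget for each availability level in the table.
for pct in (99.9999, 99.999, 99.99, 99.95, 99.9, 99.5, 99.0):
    minutes = downtime_minutes_per_year(pct)
    print(f"{pct}% -> {minutes:.2f} minutes per year")
```

Five-nines, for example, works out to about 5.26 minutes, which is the 5 minutes, 15 seconds shown in the table.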
2. Power supplies.
3. Switch matrix.
4. Any other hardware component that can cause a total failure.
It does not include:
1. Shut-down of the operating system.
2. Loss of electrical power.
3. Network loss.
4. Time out for application software upgrades and fixes (can be 1-3 hours per month).
5. Preventive maintenance (hours per month).
6. The fact that some call servers must be shut down when line cards, trunk cards or gateways are
installed.
7. Complete server shutdown to install operating system changes or new releases.
When those excluded factors are tabulated, the downtime can grow to one to three days a
year, in addition to the time for the failures that are included in the definition. Assuming 48
hours (2,880 minutes) of downtime a year, then:
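A minimal sketch of that arithmetic, assuming the same 525,960-minute year used for the downtime figures earlier (the function name is illustrative):

```python
# Minutes in an average year: 365 1/4 days x 24 hours x 60 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def availability_percent(downtime_minutes: float) -> float:
    """Convert yearly downtime in minutes into an availability percentage."""
    return (1 - downtime_minutes / MINUTES_PER_YEAR) * 100


# 48 hours of downtime per year, expressed in minutes.
downtime = 48 * 60
print(f"{availability_percent(downtime):.2f}%")  # prints 99.45%
```

In other words, 48 hours of excluded downtime alone puts the system at roughly 99.45 percent, well short of five-nines.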
The bottom line: The longer the backup power lasts, the less downtime per year; it takes more
than 1 hour of backup power to meet 99.999-percent availability. Note also that the downtime
in Table 2 is the result of power loss plus the software-reboot time (per above, an average of 6
minutes). Therefore:
Raw AC with 6 minutes reboot = 99.96% availability
But note that this result assumes that all loss of availability is due to power failure, that is,
that there is no loss of availability due to hardware, software or network failure. Moreover, to
meet the five-nines level, there can be only one power failure lasting only seconds, combined
with only one software reboot, per year.
Nortel recommends an 8-hour battery UPS, but the amount of downtime you can tolerate will
depend on the business, organization and location. For example, the Department of Defense
has a goal of 8 hours of service via battery backup. They have power generators that can
switch in to replace the battery UPS in seconds, and they may test those generators daily to
make sure they're ready.
Hospitals also need long-term support for the entire facility and all of its normal power users
(lights, etc.). Scott Silliman, director of communications, St. Johns Hospital in Springfield, IL,
installed battery backup on the hospital's servers and routers to avoid the several-minute reboot
time for the server and router software. The PBX has generator backup with an 8-second
startup time for the PBX network and the rest of the hospital, and the generators are tested
every two weeks.
Can VoIP PBXs Meet The Challenge?
There are three approaches to providing IP-PBXs:
IP-enabled PBXs are legacy PBXs, equipped with IP adapters for line and trunk cards. These
offer the reliability and field experience that comes with a mature product. If your existing PBX
delivered five-nines, the addition of an IP line card and trunk line should not reduce the
availability. Its hardware and software are known quantities. The power availability depends on
the UPS and generator investment, not the PBX design. There is no network connecting the
pieces together to reduce availability.
Converged PBXs have both circuit- and packet-switching processors with analog/digital and IP
line cards. The reliability of a converged PBX depends on its design. If there are two processors
and two switch matrices, one circuit and one packet switch, the availability of the system will
rival that of a redundant configuration. If one node fails, the other can still operate. If the
circuit-switch portion is built upon proven technology, it will probably deliver five-nines, as will
the packet-switch hardware.
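The benefit of a redundant configuration can be sketched with the standard parallel-availability formula. This is an illustration only: it assumes the two nodes fail independently and that switchover is instantaneous and perfect, and the 99.9 percent figures are hypothetical, not vendor data.

```python
def parallel_availability(*unit_availabilities: float) -> float:
    """Availability of redundant units when any one surviving unit keeps service up.

    Assumes independent failures and instantaneous, perfect switchover.
    """
    unavailability = 1.0
    for a in unit_availabilities:
        # The system is down only when every unit is down at the same time.
        unavailability *= (1.0 - a)
    return 1.0 - unavailability


# Hypothetical example: two nodes, each available 99.9% of the time.
print(f"{parallel_availability(0.999, 0.999):.6f}")  # prints 0.999999
```

In practice the two halves of a converged PBX serve different line cards, so the real gain will be smaller than this idealized figure suggests.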
If, however, both processors and switch matrices are new, then the hardware availability metric
will be a prediction, and software availability will also only be an estimate. The power
availability is the same as a legacy PBX, and no network is involved unless some of the line or
trunk cards are remotely located in gateways.
The underlying network, at best, will deliver 99.9 percent, the figure Sprint's website quotes for
the carrier's frame-relay service level agreement. This means remote devices will deliver less
than 99.9 percent availability.
Client/server IP-PBXs are all-packet-switched systems; they come with IP phones but can
also support legacy interfaces. Since these are new systems, metrics for availability can only be
estimated. The hardware can probably deliver five-nines, if the configuration is redundant
(parallel primary/backup devices). As for software, it's hard to know without more field
experience. If there are frequent software releases, reliability, at least in the short term, will not
be great. The power reliability, on the other hand, is the same as all other forms of PBXs.
The client/server version supports remote gateways (IP line cards), and needs a network in
between the gateway and server. This reduces the reliability and therefore the availability of the
PBX features and functions. Dial tone may be provided locally (i.e., in the gateway), but this is
of little or no value if the server is inaccessible. Distributed control (servers) can be an
advantage: with two or more servers to back up each other, one site can fail and the remote
site can take over. But as noted above, the underlying network may not be able to deliver more
than 99.9 percent.
So, can IP-PBXs deliver five-nines? There's no single answer, but given the breakdown of
system components discussed above, the availability of IP-PBXs looks to be something like this:
Hardware: 99.999%
Software: 99.5% (this is really a guess)
Network: 99.9%
Power: 99.98%
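Because the system is up only when all four components are up, the figures combine as a series system. A minimal sketch, assuming the components fail independently (the dictionary keys are just labels for the four estimates above):

```python
from math import prod

# Component availabilities as fractions, taken from the estimates above.
components = {
    "hardware": 0.99999,
    "software": 0.995,   # the article calls this one a guess
    "network": 0.999,
    "power": 0.9998,
}

# Series system: every component must be up, so availabilities multiply.
total = prod(components.values())
print(f"Total availability: {total:.4f}")  # prints 0.9938

# Translate the result into expected downtime over a 365 1/4-day year.
hours_per_year = 365.25 * 24
print(f"Downtime: {(1 - total) * hours_per_year:.1f} hours per year")
```

The weakest component dominates: at roughly 54 hours of downtime a year, almost all of the loss comes from the software and network figures.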
Multiplying them together produces a total availability of 99.38 percent (.9938). Is that enough?
That's for you to decide. But, whatever you do, you want to ensure that your system provides
the highest-level availability possible, given your investment. Here's a checklist to follow:
Have your vendor demonstrate how your hardware configuration meets your availability
requirements. Do not let the vendor give you some general model for the hardware.
Get MTBF and MTTR figures. You want a short MTTR (in minutes, not hours). How is the MTTR
accomplished? Redundant components, fast hardware swap or automatic switchover?
Discuss your electrical power needs with the local power company, UPS supplier and generator
supplier to determine what is or can be realistically delivered. Exercise the backup power at least
every two weeks to ensure proper operation.
Have the PBX vendor demonstrate the method used for determining software reliability. Separate
the demonstration into two parts: operating system and application software. Are the reliability
figures a calculated prediction, field experience or a guess? Find out how the vendor tests new
software for reliability. Was the software testing functional only or was it stressed, for example,
loaded with traffic?
Focus on new software releases. Do you really need to install each one? Can installed software be easily
removed? Can software modules be suspended (isolated) so that the PBX can continue operating
when there is a release problem?
Check the service level agreement in your network contract, and verify that the stated availability
is being delivered. What is the network restoration time? Is the local access line (loop) as reliable
as needed? What is the backup procedure and MTTR for the local loop? These become important
with distributed call processing servers.
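The MTBF and MTTR figures requested in the checklist translate into availability through the standard steady-state relation, availability = MTBF / (MTBF + MTTR). A minimal sketch follows; the 10,000-hour MTBF and both repair times are hypothetical values chosen only to show why MTTR should be measured in minutes.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and mean time to repair."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical component with a 10,000-hour MTBF.
mtbf = 10_000.0

# Compare a 4-hour repair (technician dispatch) with a 6-minute repair (automatic switchover).
for label, mttr in (("4-hour MTTR", 4.0), ("6-minute MTTR", 0.1)):
    print(f"{label}: {availability_from_mtbf(mtbf, mttr):.5%}")
```

With the same MTBF, cutting repair time from hours to minutes is what moves this hypothetical component from roughly three-nines to five-nines.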
A good tutorial on this topic is "The Change Costs of System Availability," from Enabling
Technologies Group, Inc. It also discusses the difference between high-availability (HA) and
continuous availability (CA) systems, and the attendant costs and risks.
Are Five-Nines Really Necessary?
While there's no question that high availability is essential in a voice networking system, when
you come right down to it, five-nines may not be a realistic or even necessary goal. An office
that is in operation 12 hours per day, 5 days a week and 52 weeks per year would require its
PBX to be in use 187,200 minutes a year out of a possible 525,960 minutes. That equates to
35.6 percent of the full year.
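The business-hours share of the year can be checked with a few lines of arithmetic. This sketch assumes the same 365 1/4-day year used elsewhere in the article; the variable names are illustrative.

```python
# Minutes in an average year: 365 1/4 days x 24 hours x 60 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960

# Office schedule from the example: 12 hours a day, 5 days a week, 52 weeks a year.
business_minutes = 12 * 60 * 5 * 52

# Share of the year during which users actually depend on the PBX.
fraction = business_minutes / MINUTES_PER_YEAR
print(f"{business_minutes:,} of {MINUTES_PER_YEAR:,.0f} minutes ({fraction:.1%})")
```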
If any changes, fixes or failures occur outside this time period, none of the users would ever
know about them or be affected, provided the problem is fixed before the next business day
resumes. The fix might be a repair, a reboot or an automatic reconfiguration, and as we all
know, most changes and fixes are made during off-hours for that very reason: to keep everyone
from being affected.
So, the metric of five-nines, in and of itself, seems like an unnecessary goal. My personal
opinion: It's nice to have, but you may not need it.
Reliability Prediction
It takes two years for a new product to generate accurate, performance-based MTBF and MTTR
measurements. Therefore, methods have been created to develop reliability prediction models,
and the two most popular techniques are MIL-HDBK 217 and the Telcordia prediction models.
There are also the mechanical models NSWC-94/L07, CNET 93 and HRD5.
MIL-HDBK 217: The original standard for reliability, it was designed by the military but is also
used by commercial organizations. The latest version, Revision F Notice 2, was released in
February 1995. It provides mathematical models for reliability prediction for a huge range of
electronic devices, from phones to space vehicles to satellites, and defines two analysis
techniques: parts count and parts stress.
Parts count analysis is often used in early product design, when detailed information is not
available or when only a rough estimate is required. Parts stress analysis provides a more
accurate estimate, by taking into account more detailed information about the components that
make up the product.
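As a rough illustration of parts count analysis, the equipment failure rate is generally taken as the sum, over each part type, of quantity x generic failure rate x quality factor. The sketch below assumes that form; every part name and number in it is hypothetical and is not taken from the handbook's tables.

```python
from dataclasses import dataclass


@dataclass
class PartType:
    name: str
    quantity: int                # number of parts of this type
    generic_failure_rate: float  # failures per million hours (hypothetical values)
    quality_factor: float        # multiplier for part quality level (hypothetical values)


def equipment_failure_rate(parts: list[PartType]) -> float:
    """Parts count estimate: sum of quantity x generic rate x quality factor."""
    return sum(p.quantity * p.generic_failure_rate * p.quality_factor for p in parts)


# Hypothetical bill of materials for a small circuit card.
parts = [
    PartType("resistor", 100, 0.002, 1.0),
    PartType("capacitor", 50, 0.004, 1.0),
    PartType("microprocessor", 1, 0.5, 2.0),
]

rate = equipment_failure_rate(parts)  # failures per million hours
mtbf_hours = 1_000_000 / rate         # predicted MTBF under a constant failure rate
print(f"Failure rate: {rate:.2f} per million hours, MTBF: {mtbf_hours:,.0f} hours")
```

Parts stress analysis refines the same idea by adjusting each part's rate for its actual operating conditions rather than using a generic figure.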
Telcordia Issue 1: The Telcordia reliability prediction model, developed by Bell Labs, is the
successor to Bellcore Issue 6 and was released in May 2000.
It uses modified equations from MIL-HDBK 217 to better reflect what telephone equipment
experiences in the field. Parts count and parts stress analysis are supported, but they are called
Calculation Methods. There are 10 Telcordia Calculation Methods, each of which is designed to
take into consideration different information.
In comparing the two, a large number of factors need to be considered, but since the Telcordia
prediction model was designed for the commercial telecommunications industry, it is the better
method to use for a PBX or IP-based phone system.