Sri Vidya College of Engineering & Technology

Voice over Internet Protocol

MCC-NOTES

Basics of IP transport

Voice over IP (VoIP) is a family of technologies, methodologies, communication protocols, and transmission techniques for the delivery of voice communications and multimedia sessions over Internet Protocol (IP) networks, such as the Internet. Other terms frequently encountered and often used synonymously with VoIP are IP telephony, Internet telephony, voice over broadband (VoBB), broadband telephony, and broadband phone.

Internet telephony refers to communications services (voice, fax, SMS, and/or voice-messaging applications) that are transported via the Internet, rather than the public switched telephone network (PSTN). The steps involved in originating a VoIP telephone call are signaling and media channel setup, digitization of the analog voice signal, encoding, packetization, and transmission as Internet Protocol (IP) packets over a packet-switched network. On the receiving side, similar steps (usually in the reverse order), such as reception of the IP packets, decoding of the packets and digital-to-analog conversion, reproduce the original voice stream. [1] Although the terms IP telephony and VoIP are often used interchangeably, they differ: IP telephony refers to digital telephony systems that use IP protocols for voice communication, while VoIP is a subset of IP telephony, namely the technology IP telephony uses to transport phone calls. [2]
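The packetization step listed above can be sketched in a few lines. This is only an illustration: the 8 kHz sampling rate and 20 ms frame duration are assumed values typical of narrowband telephony, not figures taken from these notes, and the function names are hypothetical.

```python
from typing import List

SAMPLE_RATE_HZ = 8000  # assumed narrowband sampling rate
FRAME_MS = 20          # assumed duration of audio carried per packet


def packetize(samples: List[int],
              sample_rate_hz: int = SAMPLE_RATE_HZ,
              frame_ms: int = FRAME_MS) -> List[List[int]]:
    """Split digitized samples into fixed-duration frames, one per packet."""
    per_frame = sample_rate_hz * frame_ms // 1000
    return [samples[i:i + per_frame]
            for i in range(0, len(samples), per_frame)]


def depacketize(frames: List[List[int]]) -> List[int]:
    """Receiver side: concatenate frames back into one sample stream."""
    return [sample for frame in frames for sample in frame]


# One second of placeholder samples stands in for digitized speech.
one_second = list(range(SAMPLE_RATE_HZ))
frames = packetize(one_second)
```

Under these assumptions one second of speech becomes 50 frames of 160 samples each; a real sender would also encode each frame with a codec and add protocol headers before transmission.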

VoIP systems employ session control protocols to control the set-up and tear-down of calls, as well as audio codecs which encode speech, allowing transmission over an IP network as digital audio via an audio stream. Codec use varies between implementations of VoIP (and often a range of codecs is used); some implementations rely on narrowband and compressed speech, while others support high-fidelity stereo codecs.

There are three types of VoIP tools in common use: IP phones, software VoIP, and mobile and integrated VoIP. IP phones are the most institutionally established but still the least obvious of the VoIP tools. The use of software VoIP increased during the global recession of 2008-2010, as many people looking for ways to cut costs turned to these tools for free or inexpensive calling or video conferencing. [citation needed] Software VoIP can be further broken down into three subcategories: web calling, voice and video instant messaging, and web conferencing. Mobile and integrated VoIP is another example of the adaptability of VoIP. VoIP is available on many smartphones and Internet devices, so even users of portable devices that are not phones can make calls or send SMS text messages over 3G or Wi-Fi. [3]

EC2037-MULTIMEDIA COMPRESSION AND COMMUNICATION

Protocols

Voice over IP has been implemented in various ways using both proprietary and open protocols and standards. Examples of technologies used to implement Voice over IP include:

H.323

The H.323 protocol was one of the first VoIP protocols that found widespread implementation for long-distance traffic, as well as local area network services. However, since the development of newer, less complex protocols, such as MGCP and SIP, H.323 deployments are increasingly limited to carrying existing long-haul network traffic. In particular, the Session Initiation Protocol (SIP) has gained widespread VoIP market penetration.

A notable proprietary implementation is the Skype protocol, which is in part based on the principles of Peer-to-Peer (P2P) networking.

Benefits

Operational cost

VoIP can be a benefit for reducing communication and infrastructure costs. Examples include:

Routing phone calls over existing data networks to avoid the need for separate voice and data networks. [12]

Conference calling, IVR, call forwarding, automatic redial, and caller ID features, for which traditional telecommunication companies (telcos) normally charge extra, are available free of charge from open source VoIP implementations. [citation needed]

Flexibility

VoIP can facilitate tasks and provide services that may be more difficult to implement using the PSTN. Examples include:

The ability to transmit more than one telephone call over a single broadband connection.

Secure calls using standardized protocols (such as Secure Real-time Transport Protocol). Most of the hard parts of creating a secure telephone connection over traditional phone lines, such as digitizing and digital transmission, are already in place with VoIP. It is only necessary to encrypt and authenticate the existing data stream.

Location independence. Only a sufficiently fast and stable Internet connection is needed to get a connection from anywhere to a VoIP provider.

Integration with other services available over the Internet, including video conversation, message or data file exchange during the conversation, audio conferencing, managing address books, and passing information about whether other people are available to interested parties.

Unified Communications, the integration of VoIP with other business systems including E-mail, Customer Relationship Management (CRM), and Web systems.

VoIP Challenges

1. Quality of service

Communication on the IP network is inherently less reliable than on the circuit-switched public telephone network, as it does not provide a network-based mechanism to ensure that data packets are not lost and are delivered in sequential order. It is a best-effort network without fundamental Quality of Service (QoS) guarantees. Therefore, VoIP implementations may face problems mitigating latency and jitter. [13][14]

By default, network routers handle traffic on a first-come, first-served basis. Network routers on high volume traffic links may introduce latency that exceeds permissible thresholds for VoIP. Fixed delays cannot be controlled, as they are caused by the physical distance the packets travel; however, latency can be minimized by marking voice packets as being delay-sensitive with methods such as DiffServ. [13]
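As a rough illustration of DiffServ marking from an application, the sketch below sets the DSCP bits on a UDP socket. The choice of the Expedited Forwarding code point (46) is an assumption about a common convention for voice, not something stated in these notes; the helper names are hypothetical, `socket.IP_TOS` is not available on every platform, and routers along the path must be configured to honor the marking for it to have any effect.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding; assumed here as the class used for voice


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper bits of the IPv4 TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # IP_TOS is platform-dependent; this call may fail where it is unsupported.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

A sender would then transmit its voice packets through the socket returned by `open_marked_udp_socket()`.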

A VoIP packet usually has to wait for the current packet to finish transmission. It is possible to preempt (abort) a less important packet in mid-transmission, but this is not commonly done, especially on high-speed links where transmission times are short even for maximum-sized packets. [15] An alternative to preemption on slower links, such as dialup and DSL, is to reduce the maximum transmission time by reducing the maximum transmission unit (MTU). But every packet must contain protocol headers, so this increases relative header overhead on every link along the user's Internet paths, not just the bottleneck (usually Internet access) link. [15]

ADSL modems provide Ethernet (or Ethernet over USB) connections to local equipment, but inside they are actually Asynchronous Transfer Mode (ATM) modems. They use ATM Adaptation Layer 5 (AAL5) to segment each Ethernet packet into a series of 53-byte ATM cells for transmission and reassemble them back into Ethernet packets at the receiver. A virtual circuit identifier (VCI) is part of the 5-byte header on every ATM cell, so the transmitter can multiplex the active virtual circuits (VCs) in any arbitrary order. Cells from the same VC are always sent sequentially.

However, the great majority of DSL providers use only one VC for each customer, even those with bundled VoIP service. Every Ethernet packet must be completely transmitted before another can begin. If a second permanent virtual circuit (PVC) were established, given high priority and reserved for VoIP, then a low priority data packet could be suspended in mid-transmission and a VoIP packet sent right away on the high priority VC. Then the link would pick up the low priority VC where it left off. Because ATM links are multiplexed on a cell-by-cell basis, a high priority packet would have to wait at most 53 byte times to begin transmission. There would be no need to reduce the interface MTU and accept the resulting increase in higher layer protocol overhead, and no need to abort a low priority packet and resend it later.

ATM has substantial header overhead: 5/53 = 9.4%, roughly twice the total header overhead of a 1500 byte TCP/IP Ethernet packet (with TCP timestamps). This "ATM tax" is incurred by every DSL user whether or not they take advantage of multiple virtual circuits, and few can. [13]

ATM's potential for latency reduction is greatest on slow links, because worst-case latency decreases with increasing link speed. A full-size (1500 byte) Ethernet frame takes 94 ms to transmit at 128 kb/s but only 8 ms at 1.5 Mb/s. If this is the bottleneck link, this latency is probably small enough to ensure good VoIP performance without MTU reductions or multiple ATM PVCs. The latest generations of DSL, VDSL and VDSL2, carry Ethernet without intermediate ATM/AAL5 layers, and they generally support IEEE 802.1p priority tagging so that VoIP can be queued ahead of less time-critical traffic. [13]
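The transmission times quoted above follow directly from frame size and link rate. The short sketch below reproduces them and, for comparison, the time needed to send a single 53-byte ATM cell; the function name is hypothetical and the calculation ignores framing overhead below the packet itself.

```python
def transmission_time_ms(size_bytes: int, link_bps: int) -> float:
    """Time to clock a packet of the given size onto a link, in milliseconds."""
    return size_bytes * 8 * 1000 / link_bps


frame_at_128k = transmission_time_ms(1500, 128_000)    # about 94 ms
frame_at_1m5 = transmission_time_ms(1500, 1_500_000)   # 8 ms
cell_at_128k = transmission_time_ms(53, 128_000)       # about 3.3 ms
atm_header_share = 5 / 53                              # about 9.4 %
```

The cell figure shows why cell-by-cell multiplexing helps most on slow links: at 128 kb/s a voice packet on a high priority VC would wait a few milliseconds for the current cell rather than tens of milliseconds for a full frame.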

Voice, and all other data, travels in packets over IP networks with fixed maximum capacity. This system may be more prone to congestion [citation needed] and DoS attacks [16] than traditional circuit-switched systems; a circuit-switched system of insufficient capacity will refuse new connections while carrying the remainder without impairment, while the quality of real-time data such as telephone conversations on packet-switched networks degrades dramatically. [13]

Fixed delays cannot be controlled as they are caused by the physical distance the packets travel. They are especially problematic when satellite circuits are involved because of the long distance to a geostationary satellite and back; delays of 400 to 600 ms are typical.

When the load on a link grows so quickly that its switches experience queue overflows, congestion results and data packets are lost. This signals a transport protocol like TCP to reduce its transmission rate to alleviate the congestion. But VoIP usually uses UDP, not TCP, because recovering from congestion through retransmission usually entails too much latency. [13] QoS mechanisms can therefore avoid the undesirable loss of VoIP packets by immediately transmitting them ahead of any queued bulk traffic on the same link, even when that bulk traffic queue is overflowing.
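One simple way to picture this behavior is a strict-priority scheduler with two queues. The sketch below is a toy model, not a description of any particular router: the class name, the tail-drop policy for bulk traffic and the unbounded voice queue are all assumptions made for illustration.

```python
from collections import deque


class StrictPriorityLink:
    """Toy model of a link that always serves voice before bulk traffic."""

    def __init__(self, bulk_limit: int):
        self.voice = deque()
        self.bulk = deque()
        self.bulk_limit = bulk_limit
        self.dropped_bulk = 0

    def enqueue(self, packet, is_voice: bool) -> None:
        if is_voice:
            self.voice.append(packet)
        elif len(self.bulk) < self.bulk_limit:
            self.bulk.append(packet)
        else:
            self.dropped_bulk += 1  # bulk queue is overflowing: tail drop

    def dequeue(self):
        """Return the next packet to transmit, or None if both queues are empty."""
        if self.voice:
            return self.voice.popleft()
        if self.bulk:
            return self.bulk.popleft()
        return None


link = StrictPriorityLink(bulk_limit=2)
for name in ("bulk-1", "bulk-2", "bulk-3"):
    link.enqueue(name, is_voice=False)  # third bulk packet overflows the queue
link.enqueue("voice-1", is_voice=True)
order = [link.dequeue() for _ in range(3)]
```

Here the voice packet is sent first even though it arrived last and the bulk queue had already overflowed.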

The receiver must resequence IP packets that arrive out of order and recover gracefully when packets arrive too late or not at all. Jitter results from the rapid and random (i.e., unpredictable) changes in queue lengths along a given Internet path due to competition from other users for the same transmission links. VoIP receivers counter jitter by storing incoming packets briefly in a "de-jitter" or "playout" buffer, deliberately increasing latency to improve the chance that each packet will be on hand when it is time for the voice engine to play it. The added delay is thus a compromise between excessive latency and excessive dropout, i.e., momentary audio interruptions.

Although jitter is a random variable, it is the sum of several other random variables that are at least somewhat independent: the individual queuing delays of the routers along the Internet path in question. Thus according to the central limit theorem, we can model jitter as a Gaussian random variable. This suggests continually estimating the mean delay and its standard deviation and setting the playout delay so that only packets delayed more than several standard deviations above the mean will arrive too late to be useful. In practice, however, the variance in latency of many Internet paths is dominated by a small number (often one) of relatively slow and congested "bottleneck" links. Most Internet backbone links are now so fast (e.g. 10 Gb/s) that their delays are dominated by the transmission medium (e.g. optical fiber) and the routers driving them do not have enough buffering for queuing delays to be significant.
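The estimation idea described above (mean delay plus several standard deviations) can be sketched as follows. The sliding window of 100 packets and the multiplier k = 3 are assumed values chosen for illustration, and the class name is hypothetical; real receivers often use other estimators.

```python
from collections import deque
from statistics import mean, pstdev


class PlayoutEstimator:
    """Set playout delay to mean + k * standard deviation of recent delays."""

    def __init__(self, k: float = 3.0, window: int = 100):
        self.k = k                          # assumed number of standard deviations
        self.delays = deque(maxlen=window)  # most recent one-way delays, in ms

    def observe(self, delay_ms: float) -> None:
        self.delays.append(delay_ms)

    def playout_delay_ms(self) -> float:
        if not self.delays:
            raise ValueError("no delay samples observed yet")
        return mean(self.delays) + self.k * pstdev(self.delays)


steady = PlayoutEstimator()
for d in (50, 50, 50):
    steady.observe(d)

jittery = PlayoutEstimator()
for d in (40, 50, 60):
    jittery.observe(d)
```

Both paths have the same mean delay, but the jittery one yields a larger playout delay, which is the compromise between latency and dropout described earlier.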

It has been suggested to rely on the packetized nature of media in VoIP communications and transmit the stream of packets from the source phone to the destination phone simultaneously across different routes (multi-path routing). [17] In such a way, temporary failures have less impact on the communication quality. In capillary routing it has been suggested to use, at the packet level, Fountain codes or particularly raptor codes for transmitting extra redundant packets, making the communication more reliable. [citation needed]

A number of protocols have been defined to support the reporting of QoS/QoE for VoIP calls. These include RTCP Extended Report (RFC 3611), SIP RTCP Summary Reports, H.460.9 Annex B (for H.323), H.248.30 and MGCP extensions. The RFC 3611 VoIP Metrics block is generated by an IP phone or gateway during a live call and contains information on packet loss rate, packet discard rate (because of jitter), packet loss/discard burst metrics (burst length/density, gap length/density), network delay, end system delay, signal / noise / echo level, Mean Opinion Scores (MOS) and R factors and configuration information related to the jitter buffer.

RFC 3611 VoIP metrics reports are exchanged between IP endpoints on an occasional basis during a call, and an end-of-call message is sent via SIP RTCP Summary Report or one of the other signaling protocol extensions. RFC 3611 VoIP metrics reports are intended to support real-time feedback related to QoS problems, the exchange of information between the endpoints for improved call quality calculation and a variety of other applications.

Layer-2 quality of service

A number of protocols that deal with the data link layer and physical layer include quality-of-service mechanisms that can be used to ensure that applications like VoIP work well even in congested scenarios. Some examples include:

IEEE 802.11e is an approved amendment to the IEEE 802.11 standard that defines a set of quality-of-service enhancements for wireless LAN applications through modifications to the Media Access Control (MAC) layer. The standard is considered of critical importance for delay-sensitive applications, such as Voice over Wireless IP.

IEEE 802.1p defines 8 different classes of service (including one dedicated to voice) for traffic on layer-2 wired Ethernet.

The ITU-T G.hn standard provides a way to create a high-speed (up to 1 gigabit per second) local area network using existing home wiring (power lines, phone lines and coaxial cables). G.hn provides QoS by means of "Contention-Free Transmission Opportunities" (CFTXOPs), which are allocated to flows (such as a VoIP call) that require QoS and have negotiated a "contract" with the network controller.

2. Susceptibility to power failure

Telephones for traditional residential analog service are usually connected directly to telephone company phone lines which provide direct current to power most basic analog handsets independently of locally available power.

IP phones and VoIP telephone adapters connect to routers or cable modems, which typically depend on the availability of mains electricity or locally generated power. [18] Some VoIP service providers use customer-premises equipment (e.g., cable modems) with battery-backed power supplies to assure uninterrupted service for up to several hours in case of local power failures. Such battery-backed devices typically are designed for use with analog handsets.

Some VoIP service providers implement services to route calls to other telephone services of the subscriber, such as a cellular phone, in the event that the customer's network device is inaccessible to terminate the call.

The susceptibility of phone service to power failures is a common problem even with traditional analog service in areas where many customers purchase modern telephone units that operate with wireless handsets to a base station, or that have other modern phone features, such as built-in voicemail or phone book features.

3. Emergency calls

The nature of IP makes it difficult to locate network users geographically. Emergency calls, therefore, cannot easily be routed to a nearby call center. Sometimes, VoIP systems may route emergency calls to a non-emergency phone line at the intended department; in the United States, at least one major police department has strongly objected to this practice as potentially endangering the public. [19][20]

A fixed line phone has a direct relationship between a telephone number and a physical location. If an emergency call comes from that number, then the physical location is known.

In the IP world, it is not so simple. A broadband provider may know the location where the wires terminate, but this does not necessarily allow the mapping of an IP address to that location. [citation needed] IP addresses are often dynamically assigned, so the ISP may allocate an address for online access, or at the time a broadband router is engaged. The ISP recognizes individual IP addresses, but does not necessarily know to which physical location each corresponds. [citation needed] The broadband service provider knows the physical location, but is not necessarily tracking the IP addresses in use. [20]

There are more complications since IP allows a great deal of mobility. For example, a broadband connection can be used to dial a virtual private network that is employer-owned. When this is done, the IP address being used will belong to the range of the employer, rather than the address of the ISP, so this could be many kilometres away or even in another country. To provide another example: if mobile data is used, e.g., a 3G mobile handset or USB wireless broadband adapter, then the IP address has no relationship with any physical location, since a mobile user could be anywhere that there is network coverage, even roaming via another cellular company.

In short, there is no relationship between IP address and physical location, so the address itself reveals no useful information for the emergency services.

At the VoIP level, a phone or gateway may identify itself with a SIP registrar by using a username and password. So in this case, the Internet Telephony Service Provider (ITSP) knows that a particular user is online, and can relate a specific telephone number to the user. However, it does not recognize how that IP traffic was engaged. Since the IP address itself does not necessarily provide location information, today's "best efforts" approach is to use an available database to find that user and the physical address the user chose to associate with that telephone number, which is clearly an imperfect solution. [20]

VoIP Enhanced 911 (E911) is a method by which VoIP providers in the United States support emergency services. The VoIP E911 emergency-calling system associates a physical address with the calling party's telephone number as required by the Wireless Communications and Public Safety Act of 1999. All VoIP providers that provide access to the public switched telephone network are required to implement E911, [20] a service for which the subscriber may be charged. Participation in E911 is not required of customers, however, and they may opt out of E911 service. [20]

One shortcoming of VoIP E911 is that the emergency system is based on a static table lookup. Unlike in cellular phones, where the location of an E911 call can be traced using Assisted GPS or other methods, the VoIP E911 information is only accurate so long as subscribers are diligent in keeping their emergency address information up-to-date. In the United States, the Wireless Communications and Public Safety Act of 1999 leaves the burden of responsibility upon the subscribers and not the service providers to keep their emergency information up to date. [20]
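The static table lookup mentioned above can be pictured as a plain dictionary keyed by telephone number. The sketch below uses made-up numbers and addresses and a hypothetical function name; it is meant only to show why the answer is as good as the last update the subscriber made.

```python
from typing import Dict, Optional

# Hypothetical subscriber records; the number and address are invented.
REGISTERED_ADDRESS: Dict[str, str] = {
    "+12025550100": "100 Example Street, Springfield",
}


def e911_lookup(calling_number: str,
                table: Dict[str, str] = REGISTERED_ADDRESS) -> Optional[str]:
    """Return the address registered for a number, wherever the call comes from."""
    return table.get(calling_number)


# The caller may have moved, but the lookup still returns the registered address.
reported = e911_lookup("+12025550100")
unknown = e911_lookup("+12025550199")
```

Nothing in the lookup depends on where the call actually originates, which is the shortcoming the text describes.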

Lack of redundancy

With the current separation of the Internet and the PSTN, a certain amount of redundancy is provided. An Internet outage does not necessarily mean that a voice communication outage will occur simultaneously, allowing individuals to call for emergency services and many businesses to continue to operate normally. In situations where telephone services become completely reliant on the Internet infrastructure, a single-point failure can isolate communities from all communication, including Enhanced 911 and equivalent services in other locales. [original research?] However, the Internet as designed by DARPA in the early 1980s was specifically designed to be fault tolerant under adverse conditions. Even during the 9/11 attacks on the World Trade Center, the Internet routed data around the failed nodes that were housed in or near the towers. So single-point failures, while possible in some geographic areas, are not the norm for the Internet as a whole.

Number portability

Local number portability (LNP) and Mobile number portability (MNP) also impact VoIP business. In November 2007, the Federal Communications Commission in the United States released an order extending number portability obligations to interconnected VoIP providers and carriers that support VoIP providers. [21] Number portability is a service that allows a subscriber to select a new telephone carrier without requiring a new number to be issued. Typically, it is the responsibility of the former carrier to "map" the old number to the undisclosed number assigned by the new carrier. This is achieved by maintaining a database of numbers. A dialed number is initially received by the original carrier and quickly rerouted to the new carrier. Multiple porting references must be maintained even if the subscriber returns to the original carrier. The FCC mandates carrier compliance with these consumer-protection stipulations.

A voice call originating in the VoIP environment also faces challenges to reach its destination if the number is routed to a mobile phone number on a traditional mobile carrier. VoIP has been identified in the past as a Least Cost Routing (LCR) system, which is based on checking the destination of each telephone call as it is made, and then sending the call via the network that will cost the customer the least. [22] This rating is subject to some debate given the complexity of call routing created by number portability. With GSM number portability now in place, LCR providers can no longer rely on using the network root prefix to determine how to route a call. Instead, they must now determine the actual network of every number before routing the call.

Therefore, VoIP solutions also need to handle MNP when routing a voice call. In countries without a central database, like the UK, it might be necessary to query the GSM network about which home network a mobile phone number belongs to. As the popularity of VoIP increases in the enterprise markets because of least cost routing options, it needs to provide a certain level of reliability when handling calls.

MNP checks are important to assure that this quality of service is met. By handling MNP lookups before routing a call and by assuring that the voice call will actually work, VoIP service providers are able to offer business subscribers the level of reliability they require.
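The effect of number portability on least cost routing can be illustrated with a small sketch. All prefixes, network names, carrier names, rates and the ported number below are hypothetical, and the dictionary standing in for an MNP lookup replaces what would in practice be a database or network query.

```python
# Hypothetical data: number prefix -> network that originally owned the range.
PREFIX_TO_NETWORK = {"0770": "NetA", "0780": "NetB"}

# Hypothetical MNP data: numbers that have been ported to another network.
PORTED_NUMBERS = {"07701234567": "NetB"}

# Hypothetical per-minute rates: carrier -> {destination network: price}.
RATES = {
    "CarrierX": {"NetA": 0.01, "NetB": 0.06},
    "CarrierY": {"NetA": 0.04, "NetB": 0.02},
}


def network_by_prefix(number: str) -> str:
    """Guess the network from the prefix alone (unreliable after porting)."""
    return PREFIX_TO_NETWORK[number[:4]]


def network_with_mnp(number: str) -> str:
    """Check the portability data first, then fall back to the prefix."""
    if number in PORTED_NUMBERS:
        return PORTED_NUMBERS[number]
    return network_by_prefix(number)


def least_cost_carrier(network: str) -> str:
    """Pick the carrier with the lowest rate to the destination network."""
    return min(RATES, key=lambda carrier: RATES[carrier][network])


number = "07701234567"
naive_choice = least_cost_carrier(network_by_prefix(number))
mnp_choice = least_cost_carrier(network_with_mnp(number))
```

With this data the prefix-only decision selects CarrierX, while the MNP-aware decision selects CarrierY, the cheaper route to the network that actually serves the number.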

PSTN integration

E.164 is a global numbering standard for both the PSTN and PLMN. Most VoIP implementations support E.164 to allow calls to be routed to and from VoIP subscribers and the PSTN/PLMN. [23] VoIP implementations can also allow other identification techniques to be used. For example, Skype allows subscribers to choose "Skype names" [24] (usernames) whereas SIP implementations can use URIs [25] similar to email addresses. Often VoIP implementations employ methods of translating non-E.164 identifiers to E.164 numbers and vice-versa, such as the Skype-In service provided by Skype [26] and the ENUM service in IMS and SIP. [27]
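As a small illustration of the E.164 and ENUM ideas, the sketch below checks the general shape of an international number and derives the DNS name under which ENUM looks it up. The 15-digit limit and the `e164.arpa` suffix come from the E.164 and ENUM specifications rather than from these notes, the function names are hypothetical, and the check covers only the overall format, not country-specific numbering plans.

```python
import re

# "+" followed by up to 15 digits, the first of which is not zero.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Check that a string has the general shape of an E.164 number."""
    return E164_PATTERN.fullmatch(number) is not None


def enum_domain(number: str) -> str:
    """Build the ENUM domain: digits reversed, dot-separated, under e164.arpa."""
    if not is_e164(number):
        raise ValueError("not an E.164 number: " + repr(number))
    digits = number[1:]
    return ".".join(reversed(digits)) + ".e164.arpa"


domain = enum_domain("+441632960123")
```

A resolver would then query DNS records at that domain to find, for example, a SIP URI associated with the number.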

Echo can also be an issue for PSTN integration. [28] Common causes of echo include impedance mismatches in analog circuitry and acoustic coupling of the transmit and receive signal at the receiving end.

Security

VoIP telephone systems are susceptible to attacks as are any internet-connected devices. This means that hackers who know about these vulnerabilities (such as insecure passwords) can institute denial-of-service attacks, harvest customer data, record conversations and break into voice mailboxes. [29][30][31]

Another challenge is routing VoIP traffic through firewalls and network address translators. Private Session Border Controllers are used along with firewalls to enable VoIP calls to and from protected networks. For example, Skype uses a proprietary protocol to route calls through other Skype peers on the network, allowing it to traverse symmetric NATs and firewalls. Other methods to traverse NATs involve using protocols such as STUN or Interactive Connectivity Establishment (ICE).

Many consumer VoIP solutions do not support encryption, although having a secure phone is much easier to implement with VoIP than traditional phone lines. As a result, it is relatively easy to eavesdrop on VoIP calls and even change their content. [32] An attacker with a packet sniffer could intercept your VoIP calls if you are not on a secure VLAN. However, physical security of the switches within an enterprise and the facility security provided by ISPs make packet capture less of a problem than originally foreseen. Further research has shown that tapping into a fiber optic network without detection is difficult if not impossible. This means that once a voice packet is within the internet backbone it is relatively safe from interception.

There are open source solutions, such as Wireshark, that facilitate sniffing of VoIP conversations. A modicum of security is afforded by patented audio codecs in proprietary implementations that are not easily available for open source applications; [citation needed] however, such security through obscurity has not proven effective in other fields. [citation needed] Some vendors also use compression, which may make eavesdropping more difficult. [citation needed] However, real security requires encryption and cryptographic authentication, which are not widely supported at a consumer level. The existing security standard Secure Real-time Transport Protocol (SRTP) and the new ZRTP protocol are available on Analog Telephone Adapters (ATAs) as well as various softphones. It is possible to use IPsec to secure P2P VoIP by using opportunistic encryption. Skype does not use SRTP, but uses encryption which is transparent to the Skype provider. [citation needed] In 2005, Skype invited a researcher, Dr Tom Berson, to assess the security of the Skype software, and his conclusions are available in a published report. [33]

Securing VoIP

To address the above security concerns, government and military organizations are using Voice over Secure IP (VoSIP), Secure Voice over IP (SVoIP), and Secure Voice over Secure IP (SVoSIP) to protect confidential and classified VoIP communications. [34] Secure Voice over IP is accomplished by encrypting VoIP with Type 1 encryption. Secure Voice over Secure IP is accomplished by using Type 1 encryption on a classified network, like SIPRNet. [35][36][37][38][39] Public Secure VoIP is also available with free GNU programs. [40]

Caller ID

Caller ID support among VoIP providers varies, but is provided by the majority of VoIP providers.

Many VoIP carriers allow callers to configure arbitrary Caller ID information, thus permitting spoofing attacks. [41] Business grade VoIP equipment and software often makes it easy to modify caller ID information, providing many businesses great flexibility.

The Truth in Caller ID Act has been in preparation in the US Congress since 2006, but as of January 2009 still has not been enacted. This bill proposes to make it a crime in the United States to "knowingly transmit misleading or inaccurate caller identification information with the intent to defraud, cause harm, or wrongfully obtain anything of value".

Compatibility with traditional analog telephone sets

Some analog telephone adapters do not decode pulse dialing from older phones. They may only work with push-button telephones using the touch-tone system. The VoIP user may use a pulse-to-tone converter, if needed. [43]

Fax handling

Support for sending faxes over VoIP implementations is still limited. The existing voice codecs are not designed for fax transmission; they are designed to digitize an analog representation of a human voice efficiently. However, the inefficiency of digitizing an analog representation (modem signal) of a digital representation (a document image) of analog data (an original document) more than negates any bandwidth advantage of VoIP. In other words, the fax "sounds" simply do not fit in the VoIP channel. An alternative IP-based solution for delivering fax-over-IP called T.38 is available.

The T.38 protocol is designed to compensate for the differences between traditional packet-less communications over analog lines and packet-based transmissions, which are the basis for IP communications. The fax machine could be a traditional fax machine connected to the PSTN, or an ATA box (or similar). It could be a fax machine with an RJ-45 connector plugged straight into an IP network, or it could be a computer pretending to be a fax machine. [44] Originally, T.38 was designed to use UDP and TCP transmission methods across an IP network. TCP is better suited for use between two IP devices. However, older fax machines, connected to an analog system, benefit from the near real-time characteristics of UDP due to the "no recovery rule" when a UDP packet is lost or an error occurs during transmission. [45] UDP transmissions are preferred because they do not require testing for dropped packets; since each T.38 packet includes a majority of the data sent in the prior packet, a T.38 termination point has a higher degree of success in reassembling the fax transmission into its original form for interpretation by the end device. This is an attempt to overcome the obstacles of simulating real-time transmissions using a packet-based protocol. [46]

T.30, the core fax protocol, has been updated to resolve fax-over-IP issues. Some newer high-end fax machines, such as the Ricoh 4410NF, have built-in T.38 capabilities that allow the user to plug right into the network and send and receive faxes in native T.38. [47] A unique feature of T.38 is that each packet contains a portion of the main data sent in the previous packet. With T.38, two successive packets must be lost before any data is actually lost. The lost data will only be a small piece, and with the right settings and error correction mode, there is an increased likelihood that enough of the transmission will arrive to satisfy the requirements of the fax machine for output of the sent document.
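The redundancy idea can be sketched in a few lines of Python. This is a deliberately simplified model, not the T.38 wire format: each hypothetical packet carries its own chunk plus a full copy of the previous chunk, and the names packetize and reassemble are invented for illustration.

```python
def packetize(chunks):
    """Build packets that carry the current chunk plus a copy of the previous one."""
    packets = []
    previous = None
    for seq, chunk in enumerate(chunks):
        packets.append({"seq": seq, "primary": chunk, "redundant": previous})
        previous = chunk
    return packets


def reassemble(received, total):
    """Recover chunks from the packets that arrived; None marks lost data."""
    recovered = [None] * total
    for packet in received:
        seq = packet["seq"]
        recovered[seq] = packet["primary"]
        # The redundant copy repairs the chunk whose own packet was lost.
        if seq > 0 and recovered[seq - 1] is None:
            recovered[seq - 1] = packet["redundant"]
    return recovered


chunks = [b"line-0", b"line-1", b"line-2", b"line-3", b"line-4"]
packets = packetize(chunks)

# One lost packet (seq 2): its data is recovered from packet 3.
single_loss = [p for p in packets if p["seq"] != 2]
one_lost = reassemble(single_loss, len(chunks))

# Two successive lost packets (seq 1 and 2): chunk 1 cannot be recovered.
double_loss = [p for p in packets if p["seq"] not in (1, 2)]
two_lost = reassemble(double_loss, len(chunks))
```

With a single lost packet the page data comes back complete; only when two packets in a row are lost does a gap remain, which matches the behaviour described above.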

Support for other telephony devices

Another challenge for VoIP implementations is the proper handling of outgoing calls from other telephony devices such as digital video recorders, satellite television receivers, alarm systems, conventional modems and other similar devices that depend on access to a PSTN telephone line for some or all of their functionality.

These types of calls sometimes complete without any problems, but in other cases they fail. If VoIP and cellular substitution become very popular, some ancillary equipment makers may be forced to redesign equipment, because it would no longer be possible to assume that a conventional PSTN telephone line is available in consumers' homes.


H.323 is a recommendation from the ITU Telecommunication Standardization Sector (ITU-T) that defines the protocols to provide audio-visual communication sessions on any packet network. The H.323 standard addresses call signaling and control, multimedia transport and control, and bandwidth control for point-to-point and multi-point conferences. [1]

It is widely implemented by voice and videoconferencing equipment manufacturers, is used within various Internet real-time applications such as GnuGK and NetMeeting and is widely deployed worldwide by service providers and enterprises for both voice and video services over IP networks.

It is a part of the ITU-T H.32x series of protocols, which also address multimedia communications over ISDN, the PSTN or SS7, and 3G mobile networks.

H.323 call signaling is based on the ITU-T Recommendation Q.931 protocol and is suited for transmitting calls across networks using a mixture of IP, PSTN, ISDN, and QSIG over ISDN. A call model, similar to the ISDN call model, eases the introduction of IP telephony into existing networks of ISDN-based PBX systems, including transitions to IP-based PBXs.

Within the context of H.323, an IP-based PBX might be a gatekeeper or other call control element which provides service to telephones or videophones. Such a device may provide or facilitate both basic services and supplementary services, such as call transfer, park, pick-up, and hold.

While H.323 excels at providing basic telephony functionality and interoperability, its particular strength lies in multimedia communication functionality designed specifically for IP networks.

H.323 was the first VoIP standard to adopt the Internet Engineering Task Force (IETF) standard Real-time Transport Protocol (RTP) to transport audio and video over IP networks. [citation needed]

Protocols

H.323 is a system specification that describes the use of several ITU-T and IETF protocols. The protocols that comprise the core of almost any H.323 system are: [6]

H.225.0 Registration, Admission and Status (RAS), which is used between an H.323 endpoint and a Gatekeeper to provide address resolution and admission control services.

H.225.0 Call Signaling, which is used between any two H.323 entities in order to establish communication.


H.245 control protocol for multimedia communication, which describes the messages and procedures used for capability exchange, opening and closing logical channels for audio, video and data, control and indications.

Real-time Transport Protocol (RTP), which is used for sending or receiving multimedia information (voice, video, or text) between any two entities.

Many H.323 systems also implement other protocols that are defined in various ITU-T Recommendations to provide supplementary services support or deliver other functionality to the user. Some of those Recommendations are: [citation needed]

H.235 series describes security within H.323, including security for both signaling and media.

H.239 describes dual stream use in videoconferencing, usually one for live video, the other for still images.

H.450 series describes various supplementary services.

H.460 series defines optional extensions that might be implemented by an endpoint or a Gatekeeper, including ITU-T Recommendations H.460.17, H.460.18, and H.460.19 for Network address translation (NAT) / Firewall (FW) traversal.

In addition to those ITU-T Recommendations, H.323 implements various IETF Request for Comments (RFCs) for media transport and media packetization, including the Real-time Transport Protocol (RTP).

Codecs

H.323 utilizes both ITU-defined codecs and codecs defined outside the ITU. Codecs that are widely implemented by H.323 equipment include:

Text codecs: T.140

All H.323 terminals providing video communications shall be capable of encoding and decoding video according to H.261 QCIF. All H.323 terminals shall have an audio codec and shall be capable of encoding and decoding speech according to ITU-T Rec. G.711. All terminals shall be capable of transmitting and receiving A-law and μ-law. Support for other audio and video codecs is optional. [5]
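One practical consequence of the mandatory G.711 rule is that two audio terminals always share at least one codec. The sketch below illustrates this with plain preference lists; it is not the H.245 capability exchange itself, and the function names and the example codec lists are assumptions made for illustration.

```python
MANDATORY_AUDIO = "G.711"


def terminal_capabilities(optional_codecs):
    """Return a terminal's audio codecs in preference order, always ending with G.711."""
    capabilities = list(optional_codecs)
    if MANDATORY_AUDIO not in capabilities:
        capabilities.append(MANDATORY_AUDIO)
    return capabilities


def select_codec(local_preferences, remote_capabilities):
    """Pick the first locally preferred codec that the remote side also supports."""
    for codec in local_preferences:
        if codec in remote_capabilities:
            return codec
    return None


terminal_a = terminal_capabilities(["G.729", "G.723.1"])
terminal_b = terminal_capabilities(["G.722"])
terminal_c = terminal_capabilities(["G.722", "G.729"])

fallback = select_codec(terminal_a, terminal_b)   # no optional codec in common
preferred = select_codec(terminal_a, terminal_c)  # both support G.729
```

Terminals A and B have no optional codec in common, so the selection falls back to G.711; A and C can use the lower-bandwidth G.729 that both list.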


H.323 Architecture

The H.323 system defines several network elements that work together in order to deliver rich multimedia communication capabilities. Those elements are Terminals, Multipoint Control Units (MCUs), Gateways, Gatekeepers, and Border Elements. Collectively, terminals, multipoint control units and gateways are often referred to as endpoints.

While not all elements are required, at least two terminals are required in order to enable communication between two people. In most H.323 deployments, a gatekeeper is employed in order to, among other things, facilitate address resolution.

H.323 Network Elements

Terminals


Figure 1 - A complete, sophisticated protocol stack

Terminals in an H.323 network are the most fundamental elements in any H.323 system, as those are the devices that users would normally encounter. They might exist in the form of a simple IP phone or a powerful high-definition videoconferencing system.

Inside an H.323 terminal is something referred to as a Protocol stack, which implements the functionality defined by the H.323 system. The protocol stack would include an implementation of the basic protocol defined in ITU-T Recommendation H.225.0 and H.245, as well as RTP or other protocols described above.


The diagram, figure 1, depicts a complete, sophisticated stack that provides support for voice, video, and various forms of data communication. In reality, most H.323 systems do not implement such a wide array of capabilities, but the logical arrangement is useful in understanding the relationships.

Multipoint Control Units

A Multipoint Control Unit (MCU) is responsible for managing multipoint conferences and is composed of two logical entities referred to as the Multipoint Controller (MC) and the Multipoint Processor (MP). In more practical terms, an MCU is a conference bridge not unlike the conference bridges used in the PSTN today. The most significant difference, however, is that H.323 MCUs might be capable of mixing or switching video, in addition to the normal audio mixing done by a traditional conference bridge. Some MCUs also provide multipoint data collaboration capabilities. What this means to the end user is that, by placing a video call into an H.323 MCU, the user might be able to see all of the other participants in the conference, not only hear their voices.

Gateways

Gateways are devices that enable communication between H.323 networks and other networks, such as PSTN or ISDN networks. If one party in a conversation is utilizing a terminal that is not an H.323 terminal, then the call must pass through a gateway in order to enable both parties to communicate.

Gateways are widely used today in order to enable legacy PSTN phones to interconnect with the large, international H.323 networks that are presently deployed by service providers. Gateways are also used within the enterprise in order to enable enterprise IP phones to communicate through the service provider to users on the PSTN.

Gateways are also used in order to enable videoconferencing devices based on H.320 and H.324 to communicate with H.323 systems. Most of the third generation (3G) mobile networks deployed today utilize the H.324 protocol and are able to communicate with H.323-based terminals in corporate networks through such gateway devices.

Gatekeepers

A Gatekeeper is an optional component in the H.323 network that provides a number of services to terminals, gateways, and MCU devices. Those services include endpoint registration, address resolution, admission control, user authentication, and so forth. Of the various functions performed by the gatekeeper, address resolution is the most important as it enables two endpoints to contact each other without either endpoint having to know the IP address of the other endpoint.


Gatekeepers may be designed to operate in one of two signaling modes, namely "direct routed" and "gatekeeper routed" mode. Direct routed mode is the most efficient and most widely deployed mode. In this mode, endpoints utilize the RAS protocol in order to learn the IP address of the remote endpoint and a call is established directly with the remote device. In the gatekeeper routed mode, call signaling always passes through the gatekeeper. While the latter requires the gatekeeper to have more processing power, it also gives the gatekeeper complete control over the call and the ability to provide supplementary services on behalf of the endpoints.

H.323 endpoints use the RAS protocol to communicate with a gatekeeper. Likewise, gatekeepers use RAS to communicate with other gatekeepers.
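The difference between the two signaling modes can be sketched with a toy gatekeeper. This is only an illustration of address resolution, not the RAS message set: the class, its method names, the aliases and the IP addresses (taken from documentation ranges) are all assumptions.

```python
class Gatekeeper:
    """Toy gatekeeper: keeps a registry of aliases and answers admission requests."""

    def __init__(self, address, gatekeeper_routed=False):
        self.address = address
        self.gatekeeper_routed = gatekeeper_routed
        self.registry = {}  # alias -> IP address of the registered endpoint

    def register(self, alias, ip_address):
        self.registry[alias] = ip_address

    def admission_request(self, callee_alias):
        """Return the address the caller should send call signaling to."""
        if callee_alias not in self.registry:
            return None  # unknown alias: admission is rejected
        if self.gatekeeper_routed:
            return self.address  # signaling passes through the gatekeeper
        return self.registry[callee_alias]  # signaling goes straight to the callee


direct = Gatekeeper("192.0.2.1")
direct.register("alice", "192.0.2.10")
direct.register("bob", "192.0.2.20")

routed = Gatekeeper("192.0.2.1", gatekeeper_routed=True)
routed.register("bob", "192.0.2.20")

direct_target = direct.admission_request("bob")
routed_target = routed.admission_request("bob")
```

In direct routed mode the caller learns the callee's own address, while in gatekeeper routed mode it is told to signal the gatekeeper, which is what gives the gatekeeper control over the call.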

A collection of endpoints that are registered to a single Gatekeeper in H.323 is referred to as a "zone". This collection of devices does not necessarily have to have an associated physical topology. Rather, a zone may be entirely logical and is arbitrarily defined by the network administrator.

Gatekeepers have the ability to neighbor together so that call resolution can happen between zones. Neighboring facilitates the use of dial plans such as the Global Dialing Scheme. Dial plans facilitate "inter-zone" dialing so that two endpoints in separate zones can still communicate with each other.

Border Elements and Peer Elements


Figure 2 - An illustration of an administrative domain with border elements, peer elements, and gatekeepers


Border Elements and Peer Elements are optional entities similar to a Gatekeeper, except that they do not manage endpoints directly and they provide some services that are not described in the RAS protocol. The role of a border or peer element is understood via the definition of an "administrative domain".

An administrative domain is the collection of all zones that are under the control of a single person or organization, such as a service provider. Within a service provider network there may be hundreds or thousands of gateway devices, telephones, video terminals, or other H.323 network elements. The service provider might arrange devices into "zones" that enable the service provider to best manage all of the devices under its control, such as logical arrangement by city. Taken together, all of the zones within the service provider network would appear to another service provider as an "administrative domain".

The border element is a signaling entity that generally sits at the edge of the administrative domain and communicates with another administrative domain. This communication might include such things as access authorization information; call pricing information; or other important data necessary to enable communication between the two administrative domains.

Peer elements are entities within the administrative domain that, more or less, help to propagate information learned from the border elements throughout the administrative domain. Such architecture is intended to enable large-scale deployments within carrier networks and to enable services such as clearing houses.

The diagram, figure 2, provides an illustration of an administrative domain with border elements, peer elements, and gatekeepers.

H.323 and Voice over IP services

Voice over Internet Protocol (VoIP) describes the transmission of voice using the Internet or other packet switched networks. ITU-T Recommendation H.323 is one of the standards used in VoIP. VoIP requires a connection to the Internet or another packet switched network, a subscription to a VoIP service provider and a client (an analogue telephone adapter (ATA), VoIP Phone or "soft phone"). The service provider offers the connection to other VoIP services or to the PSTN. Most service providers charge a monthly fee, then additional costs when calls are made. [citation needed] Not every deployment needs a provider: using VoIP between two enterprise locations, for example, would not necessarily require a VoIP service provider. H.323 has been widely deployed by companies who wish to interconnect remote locations over IP using a number of various wired and wireless technologies.


A codec is an algorithm (OK let's be simple – sort of a program!), most of the time installed as software on a server or embedded within a piece of hardware (ATA, IP Phone etc.), that is used to convert voice signals (in the case of VoIP) into digital data to be transmitted over the Internet or any network during a VoIP call.

The word codec comes from the composed words coder-decoder or compressor-decompressor. Codecs normally achieve the following three tasks (very few do the last one):

Encoding - decoding

Compression - decompression

Encryption - decryption

Encoding - decoding

When you talk over normal PSTN phone, your voice is transported in an analog way over the phone line. But with VoIP, your voice is converted into digital signals. This conversion is technically called encoding, and is achieved by a codec. When the digitized voice reaches its destination, it has to be decoded back to its original analog state so that the other correspondent can hear and understand it.
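As a rough illustration of encoding and decoding, the sketch below uses the continuous μ-law companding formula with μ = 255, the curve that G.711 μ-law approximates. It is not the exact segmented encoding defined in G.711, samples are assumed to be normalized to the range -1 to 1, and the function names are made up for this example.

```python
import math

MU = 255  # companding constant used by mu-law


def compress(x):
    """Continuous mu-law curve: maps x in [-1, 1] to y in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y):
    """Inverse of compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode(x):
    """Encode one normalized sample as an 8-bit code (0..255)."""
    return round((compress(x) + 1) / 2 * 255)


def decode(code):
    """Decode an 8-bit code back to an approximate sample value."""
    return expand(code / 255 * 2 - 1)


samples = [-1.0, -0.5, -0.01, 0.0, 0.01, 0.5, 1.0]
codes = [encode(s) for s in samples]
decoded = [decode(c) for c in codes]
```

The decoded values are close to the originals but not identical, because each sample has been squeezed into 8 bits. The logarithmic curve spends more of those 256 codes on quiet sounds, where the ear is most sensitive to error.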

Compression - decompression

Bandwidth is a scarce commodity. Therefore, if the data to be sent is made lighter, you can send more in a certain amount of time, and thus improve performance. To make the digitized voice less bulky, it is compressed. Compression is a complex process whereby the same data is stored using less space (fewer digital bits). During compression, the data is confined to a structure (packet) proper to the compression algorithm. The compressed data is sent over the network and once it reaches its destination, it is decompressed back to its original state before being decoded. In most cases, however, it is not necessary to decompress the data back, since the compressed data is already in a 'consumable' state.

Types of compression

When data is compressed, it becomes lighter and hence performance is improved. However, the algorithms that compress the most tend to decrease the quality of the compressed data. There are two types of compression: lossless and lossy. With lossless compression, you lose nothing, but you can't compress that much. With lossy compression, you achieve great downsizing, but you lose in quality. You normally can't get the compressed data back to its original state with lossy compression, since the quality has been sacrificed for size. But this is most of the time not necessary.


A good example of lossy compression is MP3 for audio. When you compress audio to MP3, you can't get the original back, but MP3 audio is already very good to listen to, compared to huge uncompressed audio files.
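The difference between the two types can be seen with a small Python sketch. It uses zlib from the standard library for the lossless case and, as a stand-in for a real lossy codec, simply throws away the low 8 bits of each 16-bit sample. The sample values are invented for illustration.

```python
import zlib

# Hypothetical 16-bit audio samples (a short repeating pattern).
samples = [0, 1000, 2000, 3000, 2000, 1000, 0, -1000] * 100
raw = b"".join(s.to_bytes(2, "big", signed=True) for s in samples)

# Lossless: the original bytes come back exactly.
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

# Lossy: keep only the top 8 bits of each sample (half the size),
# then scale back up. The discarded detail cannot be recovered.
quantized = [s >> 8 for s in samples]
approximated = [q << 8 for q in quantized]
```

Here restored equals raw byte for byte, whereas approximated only comes close to samples (1000 comes back as 768, for example). That small error is the quality given up in exchange for size.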

Encryption - decryption

Encryption is one of the best tools for achieving security. It is the process of changing data into such a state that no one can understand it. This way, even if the encrypted data is intercepted by unauthorized people, the data still remains confidential. Once the encrypted data reaches its destination, it is decrypted back to its original form. Often, when data is compressed, it is already obscured to a certain extent, since it is altered from its original state, although this is no substitute for real encryption.

There are many codecs for audio, video, fax and text. Below is a list of the most common codecs for VoIP. As a user, you may think that you have little to do with what these are, but it is always good to know a minimum about them, since you might one day have to make decisions about codecs for VoIP in your business; or at least you might one day understand some words of the Greek that VoIP people speak! I won't drag you into all the technicalities of codecs, but will just mention them.

If you are a techie and want to know more about each one of these codecs in detail, have

Common VoIP Codecs

Codec | Bandwidth (kbps) | Comments
G.711 | 64 | Delivers precise speech transmission. Very low processor requirements. Needs at least 128 kbps for two-way.
G.722 | 48/56/64 | Adapts to varying compressions and bandwidth is conserved with network congestion.
G.723.1 | 5.3/6.3 | High compression with high quality audio. Can use with dial-up. Lot of processor power.
G.726 | 16/24/32/40 | An improved version of G.721 and G.723 (different from G.723.1).
G.729 | 8 | Excellent bandwidth utilization. Error tolerant. License required.
GSM | 13 | High compression ratio. Free and available in many hardware and software platforms. Same encoding is used in GSM cellphones (improved versions are often used nowadays).
iLBC | 15 | Robust to packet loss. Free.
Speex | 2.15 / 44 | Minimizes bandwidth usage by using variable bit rate.
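The G.711 figures in the table can be checked with a little arithmetic. The sketch below assumes the usual telephony sampling of 8000 samples per second at 8 bits each, a 20 ms packet interval, and 40 bytes of IPv4, UDP and RTP headers per packet. The packet interval and the function names are illustrative assumptions, not values taken from these notes.

```python
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP, no options or extensions


def payload_bytes(codec_kbps, packet_ms=20):
    """Bytes of encoded voice carried in one packet."""
    return codec_kbps * 1000 * packet_ms / 1000 / 8


def ip_bandwidth_kbps(codec_kbps, packet_ms=20):
    """One-way bandwidth at the IP layer, including per-packet headers."""
    packets_per_second = 1000 / packet_ms
    packet_bytes = payload_bytes(codec_kbps, packet_ms) + IP_UDP_RTP_HEADER_BYTES
    return packet_bytes * 8 * packets_per_second / 1000


g711_kbps = 8000 * 8 / 1000          # 64.0, the codec rate in the table
g711_two_way_kbps = 2 * g711_kbps    # 128.0, the "two-way" figure in the table

g711_on_wire = ip_bandwidth_kbps(64)  # 80.0 kbps one way with headers
g729_on_wire = ip_bandwidth_kbps(8)   # 24.0 kbps one way with headers
```

The table lists codec bit rates only. Once packet headers are counted, a low-rate codec such as G.729 spends more bits on headers than on voice, which is why the saving on the wire is smaller than the 64 to 8 ratio suggests.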


SIP ARCHITECTURE


Introduction

As the Internet became more popular in the 1990s, network programs that allowed communication with other Internet users also became more common. Over the years, a need was seen for a standard protocol that could allow participants in a chat, videoconference, interactive gaming, or other media to initiate user sessions with one another. In other words, a standard set of rules and services was needed that defined how computers would connect to one another so that they could share media and communicate. The Session Initiation Protocol (SIP) was developed to set up, maintain, and tear down these sessions between computers.

By working in conjunction with a variety of other protocols and specialized servers, SIP provides a number of important functions that are necessary in allowing communications between participants. SIP provides methods of sharing the location and availability of users and explains the capabilities of the software or device being used. SIP then makes it possible to set up and manage the session between the parties. Without these tasks being performed, communication over a large network like the Internet would be impossible. It would be like a message in a bottle being thrown in the ocean; you would have no way of knowing how to reach someone directly or whether the person even could receive the message.

Beyond communicating with voice and video, SIP has also been extended to support instant messaging and is becoming a popular choice that's incorporated in many of the instant messaging applications being produced. This extension, called SIMPLE, provides the means of setting up a session in much the same way as SIP. SIMPLE also provides information on the status of users, showing whether they are online, busy, or in some other state of presence.

Because SIP is being used in these various methods of communications, it has become a widely used and important component of today's communications.

Understanding SIP

SIP was designed to initiate interactive sessions on an IP network. Programs that provide real-time communication between participants can use SIP to set up, modify, and terminate a connection between two or more computers, allowing them to interact and exchange data. The programs that can use SIP include instant messaging, voice over IP (VoIP), video teleconferencing, virtual reality, multiplayer games, and other applications that employ single media or multimedia. SIP doesn't provide all the functions that enable these programs to communicate, but it is an important component that facilitates communication between two or more endpoints.


You could compare SIP to a telephone switchboard operator, who uses other technology to connect you to another party, set up conference calls or other operations on your behalf, and disconnect you when you're done. SIP is a type of signaling protocol that is responsible for sending commands to start and stop transmissions or other operations used by a program. The commands sent between computers are codes that do such things as open a connection to make a phone call over the Internet or disconnect that call later on. SIP supports additional functions, such as call waiting, call transfer, and conference calling, by sending out the necessary signals to enable and disable these functions. Just as the telephone operator isn't concerned with how communication occurs, SIP works with a number of components and can run on top of several different transport protocols to transfer media between the participants.

Overview of SIP

One of the major reasons that SIP is necessary is found in the nature of programs that involve messaging, voice communication, and exchange of other media. The people who use these programs may change locations and use different computers, have several usernames or accounts, or communicate using a combination of voice, text, or other media (requiring different protocols). This creates a situation that's similar to trying to mail a letter to someone who has several aliases, speaks different languages, and could change addresses at any particular moment.

SIP works with various network components to identify and locate these endpoints. Information is passed through proxy servers, which are used to register and route requests to the user's location, invite another user(s) into a session, and make other requests to connect these endpoints. Because there are a number of different protocols available that may be used to transfer voice, text, or other media, SIP runs on top of other protocols that transport data and perform other functions. By working with other components of the network, data can be exchanged between these user agents regardless of where they are at any given point.
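The idea of registering a location and routing requests to it can be sketched as a simple lookup table. This is a toy model rather than an implementation of a SIP registrar or proxy, and the class name, user addresses and contact addresses are all hypothetical.

```python
class Registrar:
    """Toy location service: maps a user's public address to a current contact."""

    def __init__(self):
        self.bindings = {}

    def register(self, address_of_record, contact):
        # A later registration replaces the earlier one, e.g. after the user moves.
        self.bindings[address_of_record] = contact

    def route(self, address_of_record):
        # A proxy would forward the request to this contact; None means unknown user.
        return self.bindings.get(address_of_record)


registrar = Registrar()
registrar.register("sip:alice@example.com", "sip:alice@192.0.2.10")
first = registrar.route("sip:alice@example.com")

# Alice moves to another computer and registers again.
registrar.register("sip:alice@example.com", "sip:alice@198.51.100.7")
second = registrar.route("sip:alice@example.com")
```

Callers keep using the same public address; only the binding behind it changes when the user moves, which is how requests can follow a user from one computer to another.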

It is the simplicity of SIP that makes it so versatile. SIP is an ASCII- or text-based protocol, similar to HTTP or SMTP, which makes it more lightweight and flexible than other signaling protocols (such as H.323). Like HTTP and SMTP, SIP is a request-response protocol, meaning that it makes a request of a server, and awaits a response. Once it has established a session, other protocols handle such tasks as negotiating the type of media to be exchanged, and transporting it between the endpoints. The reusing of existing protocols and their functions means that fewer resources are used, and minimizes the complexity of SIP. By keeping the functionality of SIP simple, it allows SIP to work with a wider variety of applications.
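Because SIP messages are plain text, they can be built and read with ordinary string handling. The sketch below assembles a minimal INVITE request and parses a response in the general layout used by SIP (a start line, then "Name: value" headers, then a blank line). The hosts, tags and identifiers are invented, and the helper functions are illustrative rather than part of any SIP library.

```python
def build_request(method, uri, headers):
    """Assemble a SIP request: start line, headers, blank line, empty body."""
    lines = [f"{method} {uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(text):
    """Split a SIP message into its start line, header dictionary and body."""
    head, _, body = text.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers, body


invite = build_request(
    "INVITE",
    "sip:bob@example.com",
    {
        "Via": "SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds",
        "Max-Forwards": "70",
        "To": "<sip:bob@example.com>",
        "From": "<sip:alice@example.com>;tag=1928301774",
        "Call-ID": "a84b4c76e66710@pc33.example.com",
        "CSeq": "1 INVITE",
        "Content-Length": "0",
    },
)
request_line, request_headers, _ = parse_message(invite)

response = "SIP/2.0 200 OK\r\nCSeq: 1 INVITE\r\nContent-Length: 0\r\n\r\n"
status_line, response_headers, _ = parse_message(response)
status_code = int(status_line.split(" ")[1])
```

The request line names the method and the target, and the response answers with a numeric status code, much as an HTTP server would.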

The similarities to HTTP and SMTP are no accident. SIP was modeled after these text-based protocols, which work in conjunction with other protocols to perform specific tasks. As we'll see later in this chapter, SIP is also similar to these other protocols in that it uses Uniform Resource Identifiers (URIs) for identifying users. A URI identifies resources on the Internet, just as a Uniform Resource Locator (URL) is used to identify Web sites. The URI used by SIP incorporates a phone number or name, such as SIP:user@syngress.com, which makes reading SIP addresses easier. Rather than reinventing the wheel, the development of SIP incorporated familiar aspects of existing protocols that have long been used on IP networks. The modular design allows SIP to be easily incorporated into Internet and network applications, and its similarities to other protocols make it easier to use.
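A SIP address of this simple form can be taken apart much like an e-mail address. The sketch below handles only the basic scheme:user@host shape shown above (no port, password or parameters), and the function name is made up for illustration.

```python
def parse_sip_uri(uri):
    """Split a basic SIP URI into scheme, user and host."""
    scheme, _, rest = uri.partition(":")
    if scheme.lower() != "sip":
        raise ValueError(f"not a SIP URI: {uri!r}")
    user, separator, host = rest.partition("@")
    if not separator:
        # No "@" present: the URI names a host only.
        user, host = None, rest
    return {"scheme": scheme.lower(), "user": user, "host": host}


address = parse_sip_uri("sip:user@syngress.com")
host_only = parse_sip_uri("sip:syngress.com")
```

The user part identifies the person or phone number, and the host part tells the network which domain is responsible for locating that user.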

RFC 2543/RFC 3261

The Session Initiation Protocol is a standard that was developed by the Internet Engineering Task Force (IETF). The IETF is a body of network designers, researchers, and vendors that develops Internet communication standards, with architectural oversight from the Internet Architecture Board (IAB). The standards they create are important because they establish consistent methods and functionality. Unlike proprietary technology, which may or may not work outside of a specific program, standardization allows a protocol or other technology to function the same way in any application or environment. In other words, because SIP is a standard, it can work on any system, regardless of the communication program, operating system, or infrastructure of the IP network.

The IETF develops a standard through documents called Requests for Comments (RFCs). An RFC starts as a draft that is examined by members of a Working Group, and during the review process, it is developed into a finalized document. The first proposed standard for SIP was produced in 1999 as RFC 2543, but in 2002, the standard was further defined in RFC 3261, which made RFC 2543 obsolete. Additional documents outlining extensions and specific issues related to the SIP standard have also been released, and some of these update RFC 3261. The reason for these changes is that as technology changes, the development of SIP also evolves. The IETF continues developing SIP and its extensions as new products are introduced and its applications expand.

SIP and Mbone

Although RFC 2543 and RFC 3261 define SIP as a protocol for setting up, managing, and tearing down sessions, the original version of SIP had no mechanism for tearing down sessions and was designed for the Multicast Backbone (Mbone). Mbone originated as a method of broadcasting audio and video over the Internet. The Mbone is a broadcast channel that is overlaid on the Internet, and it provided a way to broadcast things like IETF meetings, space shuttle launches, live concerts, and other meetings, seminars, and events over the Internet. The ability to communicate with several hosts simultaneously needed a way of inviting users into sessions; the Session Invitation Protocol (as it was originally called) was developed in 1996.


The Session Invitation Protocol was a precursor to SIP that was defined by the IETF MMUSIC Working Group, and a primitive version of the Session Initiation Protocol used today. However, as VoIP and other methods of communications became more popular, SIP evolved into the Session Initiation Protocol. Even with added features like the ability to tear down a session, it was still more lightweight than more complex protocols like H.323. In 1999, the Session Initiation Protocol was defined as RFC 2543, and has become a vital part of multimedia applications used today.

OSI

In designing the SIP standard, the IETF mapped the protocol to the OSI (Open Systems Interconnection) reference model. The OSI reference model is used to associate protocols to different layers, showing their function in transferring and receiving data across a network, and their relation to other existing protocols. A protocol at one layer uses only the functions of the layer below it, while exporting the information it processes to the layer above it. It is a conceptual model that originated to promote interoperability, so that a protocol or element of a network developed by one vendor would work with others.

As seen in Figure 8.1, the OSI model contains seven layers: Application, Presentation, Session, Transport, Network, Data Link, and Physical. As seen in this figure, network communication starts at the Application layer and works its way down through the layers step by step to the Physical layer. The information then passes along the cable to the receiving computer, which starts the information at the Physical layer. From there it steps back up the OSI layers to the Application layer where the receiving computer finalizes the processing and sends back an acknowledgement if needed. Then the whole process starts over.

Figure 8.1 In the OSI Reference Model, Data is Transmitted down through the Layers, across the Medium, and Back up through the Layers

The layers of the OSI reference model have different functions that are necessary in transferring data across a network, and mapping protocols to these layers makes it easier to understand how they relate to the network as a whole. Table 8.1 shows the seven layers of the OSI model, and briefly explains their functions.


Table 8.1 Layers of the OSI Model


Layer

Description

7: Application

The Application layer identifies communication partners, facilitates authentication (if necessary), and allows a program to communicate with lower layer protocols, so that in turn it can communicate across the network. Protocols that map to this layer include SIP, HTTP, and SMTP.

6: Presentation

The Presentation layer converts data from one format to another, such as converting a stream of text into a popup window, and handles encoding and encryption.

5: Session

The Session layer is responsible for coordinating sessions and connections.

4: Transport

The Transport layer is used to transparently transfer data between computers. Protocols that map to this layer include TCP, UDP, and RTP.

3: Network

The Network Layer is used to route and forward data so that it goes to the proper destination. The most common protocol that maps to this layer is IP.

2: Data Link

The Data Link layer detects and corrects errors that may occur at the physical level, and provides physical addressing through the use of MAC addresses that are hard-coded into network cards.

1: Physical

The Physical layer defines electrical and physical specifications of network devices, and provides the means of allowing hardware to send and receive data on a particular type of media. At this level, data is passed as a bit stream across the network.
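The way data travels down the layers on the sending side and back up on the receiving side can be sketched in a few lines of Python. This is only an illustration: the bracketed "headers" and the function names send_down and receive_up are assumptions made for this example, not real protocol formats.

```python
# Layer names follow Table 8.1, ordered from layer 7 down to layer 1.
LAYERS = ["Application", "Presentation", "Session", "Transport",
          "Network", "Data Link", "Physical"]


def send_down(payload):
    """Wrap the payload with one illustrative header per layer, top to bottom."""
    for layer in LAYERS:
        payload = f"[{layer}]{payload}"
    return payload


def receive_up(frame):
    """Strip the headers in reverse order, starting at the Physical layer."""
    for layer in reversed(LAYERS):
        header = f"[{layer}]"
        if not frame.startswith(header):
            raise ValueError(f"expected {header} header")
        frame = frame[len(header):]
    return frame


frame = send_down("hello")
print(frame)              # outermost header belongs to the Physical layer
print(receive_up(frame))  # the receiver recovers the original payload
```

Each layer only adds or removes its own header, which mirrors the idea that a protocol at one layer relies on the layer below it and hands its result to the layer above.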

SIP and the Application Layer

Because SIP is the Session Initiation Protocol, and its purpose is to establish, modify, and terminate sessions, it would seem at face value that this protocol maps to the Session layer of the OSI reference model. However, it is important to remember that the protocols at each layer interact only with the layers directly above and below them. Programs directly access the functions and supported features available through SIP, which disassociates it from the Session layer. SIP is used to invite a user into an interactive session, and can also invite additional participants into existing sessions, such as conference calls or chats. It allows media to be added to or removed from a session, provides the ability to identify and locate a user, and also supports name mapping, redirection, and other services. When comparing these features to the OSI model, it becomes apparent that SIP is actually an Application-layer protocol.

The Application layer identifies communication partners, facilitates authentication (if necessary), and allows a program to communicate with lower layer protocols, so that in turn it can communicate across the network. In the case of SIP, it is setting up, maintaining, and ending interactive sessions, and providing a method of locating and inviting participants into these sessions. The software being used communicates through SIP, which passes the data down to lower layer protocols and sends it across the network.

SIP Functions and Features

When SIP was developed, it was designed to support five specific elements of setting up and tearing down communication sessions. These supported facets of the protocol are:

■ User location, where the endpoint of a session can be identified and found, so that a session can be established

■ User availability, where the participant that's being called has the opportunity and ability to indicate whether he or she wishes to engage in the communication

■ User capabilities, where the media that will be used in the communication is established, and the parameters of that media are agreed upon

■ Session setup, where the parameters of the session are negotiated and established

■ Session management, where the parameters of the session are modified, data is transferred, services are invoked, and the session is terminated

Although these are only a few of the issues involved in connecting parties so they can communicate, they are important ones that SIP is designed to address. Beyond these functions, SIP relies on other protocols to perform the remaining tasks that allow participants to communicate with each other, which we'll discuss later in this chapter.

User Location

The ability to find the location of a user requires being able to translate a participant's username to the current IP address of the computer being used. This is important because the user may be using different computers, or (if DHCP is used) the same computer may be assigned different IP addresses over time. The program can use SIP to register the user with a server, providing a username and IP address to the server. Because the server now knows the current location of the user, other users can find that user on the network. Requests are redirected through the proxy server to the user's current location. By going through the server, other potential participants in a communication can find users, and establish a session after acquiring their IP addresses.

User Availability

The user availability function of SIP allows a user to control whether he or she can be contacted. The user can set themselves as being away or busy, or available for certain types of communication. If available, other users can then invite the user to join in a type of communication (e.g., voice or videoconference), depending on the capabilities of the program being used.

User Capabilities

Determining the user's capabilities involves determining what features are available on the programs being used by each of the parties, and then negotiating which can be used during the session. Because SIP can be used with different programs on different platforms, and can be used to establish a variety of single-media and multimedia communications, the type of communication and its parameters need to be determined. For example, if you were to call a particular user, your computer might support video conferencing, but the person you're calling might not have a camera installed. Determining the user capabilities allows the participants to agree on which features, media types, and parameters will be used during a session.
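As a rough illustration of this negotiation, the sketch below keeps only the media types and formats that both parties support. The dictionaries, the codec names used as sample data, and the function name negotiate are assumptions for this example; real SIP endpoints exchange this information in session descriptions rather than Python dictionaries.

```python
def negotiate(caller, callee):
    """Return the media types and formats supported by both parties."""
    agreed = {}
    for media, caller_formats in caller.items():
        # Keep only the formats the callee also supports for this media type.
        common = caller_formats & callee.get(media, set())
        if common:
            agreed[media] = sorted(common)
    return agreed


# The caller supports video, but the callee has no camera installed.
caller = {"audio": {"PCMU", "G729"}, "video": {"H264"}}
callee = {"audio": {"PCMU", "PCMA"}}

print(negotiate(caller, callee))  # {'audio': ['PCMU']}
```

Because the callee offers no video formats, the agreed session is audio only, which matches the camera example above.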

Session Setup

Session setup is where the participants of the communication connect together. The program of the user who is contacted will "ring" or produce some other notification, and that user has the option of accepting or rejecting the communication. If accepted, the parameters of the session are agreed upon and established, and the two endpoints will have a session started, allowing them to communicate.

Session Management

Session management is the final function of SIP, and is used for modifying the session while it is in use. During the session, data will be transferred between the participants, and the types of media used may change. For example, during a voice conversation, the participants may decide to invoke other services available through the program, and change to video conferencing. During communication, they may also decide to add or drop other participants, place a call on hold, have the call transferred, and finally terminate the session by ending their conversation. These are all aspects of session management, which are performed through SIP.
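The life cycle of a session, from setup to termination, can be pictured as a small state machine. The sketch below is a deliberate simplification: the state names and the function final_state are assumptions for this example, and the message names (INVITE, ACK, BYE) are the SIP requests described in the section on SIP requests and responses. Real SIP dialogs involve more states, provisional responses, and timers.

```python
# Allowed transitions: (current state, SIP message) -> next state.
TRANSITIONS = {
    ("idle", "INVITE"): "inviting",        # caller invites the other party
    ("inviting", "200 OK"): "accepted",    # callee accepts the invitation
    ("accepted", "ACK"): "established",    # caller confirms, session is up
    ("established", "BYE"): "terminated",  # either party ends the session
}


def final_state(messages):
    """Replay a sequence of messages and return the resulting session state."""
    state = "idle"
    for message in messages:
        try:
            state = TRANSITIONS[(state, message)]
        except KeyError:
            raise ValueError(f"{message!r} is not valid in state {state!r}") from None
    return state


print(final_state(["INVITE", "200 OK", "ACK"]))         # established
print(final_state(["INVITE", "200 OK", "ACK", "BYE"]))  # terminated
```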


SIP URIs


Because SIP was based on existing standards that had already been proven on the Internet, it uses established methods for identifying and connecting endpoints. This is seen particularly in the addressing scheme that it uses to identify SIP accounts. SIP uses addresses that are similar to e-mail addresses. The hierarchical URI shows the domain where a user's account is located, and a host name or phone number that serves as the user's account. For example, sip:myaccount@madeupsip.com shows that the account myaccount is located at the domain madeupsip.com. Using this method makes it simple to connect someone to a particular phone number or username.

Because SIP addresses follow a username@domain format, the usernames created for accounts must be unique within the namespace. Usernames and phone numbers must be unique because they identify which account belongs to a specific person, and they are used when someone attempts to send a message or place a call to someone else. Because the usernames are stored on centralized servers, the server can determine whether a particular username is available when a person initially sets up an account.

URIs can also contain other information that is used to connect to a particular user, such as a port number, password, or other parameters. In addition, although SIP URIs generally begin with sip:, others begin with sips:, which indicates that the information must be sent over a secure transmission. In such cases, the data and messages are transported using the Transport Layer Security (TLS) protocol, which we'll discuss later in this chapter.
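To make the parts of a SIP URI concrete, here is a minimal parsing sketch. It assumes the general layout scheme:user:password@host:port;parameters and ignores details such as IPv6 host literals and escaped characters. The function name parse_sip_uri and the second sample address are made up for this example.

```python
def parse_sip_uri(uri):
    """Split a sip: or sips: URI into its main parts (simplified)."""
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("sip", "sips"):
        raise ValueError("not a SIP URI: " + uri)
    rest, _, _headers = rest.partition("?")       # ignore optional ?headers
    address, *param_parts = rest.split(";")       # ;name=value parameters
    userinfo, _, hostport = address.rpartition("@")
    user, _, password = userinfo.partition(":")
    host, _, port = hostport.partition(":")
    params = {}
    for part in param_parts:
        if part:
            name, _, value = part.partition("=")
            params[name] = value
    return {
        "secure": scheme.lower() == "sips",  # sips: requests secure transport
        "user": user or None,
        "password": password or None,
        "host": host,
        "port": int(port) if port else None,
        "params": params,
    }


print(parse_sip_uri("sip:myaccount@madeupsip.com"))
print(parse_sip_uri("sips:myaccount@madeupsip.com:5061;transport=tcp"))
```

The first call yields the user myaccount at the host madeupsip.com with no port or parameters, while the second is flagged as secure and carries an explicit port and one parameter.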

SIP Architecture

Though we've discussed a number of the elements of SIP, there are still a number of essential components that make up SIP's architecture that we need to address. SIP would not be able to function on a network without the use of various devices and protocols. The essential devices are those that you and other participants would use in a conversation, allowing you to communicate with one another, and various servers may also be required to allow the participants to connect together. In addition to this, there are a number of protocols that carry your voice and other data between these computers and devices. Together, they make up the overall architecture of SIP.

SIP Components

Although SIP works in conjunction with other technologies and protocols, there are two fundamental components that are used by the Session Initiation Protocol:

■ User agents, which are endpoints of a call (i.e., each of the participants in a call)

■ SIP servers, which are computers on the network that service requests from clients, and send back responses


User Agents


User agents are both the computer that is being used to make a call and the target computer that is being called. These form the two endpoints of the communication session. There are two components to a user agent: a client and a server. When a user agent makes a request (such as initiating a session), it is the User Agent Client (UAC), and the user agent responding to the request is the User Agent Server (UAS). Because a user agent will send one message and then respond to another, it switches back and forth between these roles throughout a session.

Even though other devices that we'll discuss are optional to various degrees, user agents must exist for a SIP session to be established. Without them, it would be like trying to make a phone call without having another person to call. One UA will invite the other into a session, and SIP can then be used to manage and tear down the session when it is complete. During this time, the UAC will use SIP to send requests to the UAS, which will acknowledge the request and respond to it. Just as a conversation between two people on the phone consists of conveying a message or asking a question and then waiting for a response, the UAC and UAS will exchange messages and swap roles in a similar manner throughout the session. Without this interaction, communication couldn't exist.

Although a user agent is often a software application installed on a computer, it can also be a PDA, USB phone that connects to a computer, or a gateway that connects the network to the Public Switched Telephone Network. In any of these situations however, the user agent will continue to act as both a client and a server, as it sends and responds to messages.

SIP Server

The SIP server is used to resolve usernames to IP addresses, so that requests sent from one user agent to another can be directed properly. A user agent registers with the SIP server, providing it with its username and current IP address, thereby establishing its current location on the network. This also verifies that the user is online, so that other user agents can see whether the user is available and invite them into a session. Because a user agent probably wouldn't know the IP address of another user agent, it makes a request to the SIP server to invite another user into a session. The SIP server then identifies whether the person is currently online, and if so, maps the username to the corresponding IP address to determine their location. If the user isn't part of that domain, and thereby uses a different SIP server, it will also pass on requests to other servers. In performing these various tasks of serving client requests, the SIP server will act in any of several different roles:

■ Registrar server

■ Proxy server

■ Redirect server


Registrar Server


A Registrar server is used to register the location of a user agent that has logged onto the network. It obtains the IP address of the user and associates it with their username on the system. This creates a directory of all those who are currently logged onto the network, and where they are located. When someone wishes to establish a session with one of these users, the Registrar server's information is referred to, thereby identifying the IP addresses of those involved in the session.

Proxy Server

Proxy servers are computers that are used to forward requests on behalf of other computers. If a SIP server receives a request from a client, it can forward the request onto another SIP server on the network. While functioning as a proxy server, the SIP server can provide such functions as network access control, security, authentication, and authorization.

Redirect Server

Redirect servers are used by SIP to redirect clients to the user agent they are attempting to contact. If a user agent makes a request, the Redirect server can respond with the IP address of the user agent being contacted. This is different from a Proxy server, which forwards the request on your behalf, as the Redirect server essentially tells you to contact the other user agent yourself. The Redirect server also has the ability to "fork" a call, by splitting the call to several locations. If a call was made to a particular user, it could be split to a number of different locations, so that it rang at all of them at the same time. The first of these locations to answer the call would receive it, and the other locations would stop ringing.
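The difference between the two roles can be sketched as follows. Everything here is hypothetical: the LOCATIONS table, the deliver helper that stands in for actually sending a request, and the function names. The 302 status used in the redirect reply is a SIP redirection response code (3xx class).

```python
# Hypothetical location data: username -> current IP address.
LOCATIONS = {"friend": "192.0.2.20"}


def deliver(ip_address, request):
    """Stand-in for sending a request to a user agent at the given address."""
    return f"delivered {request} to {ip_address}"


def proxy_handle(user, request):
    """Proxy role: forward the request on the caller's behalf."""
    return deliver(LOCATIONS[user], request)


def redirect_handle(user):
    """Redirect role: reply with the location and let the caller retry."""
    return {"status": 302, "contact": LOCATIONS[user]}


# Through a proxy, the caller sends one request and the server does the rest.
print(proxy_handle("friend", "INVITE"))

# Through a redirect server, the caller learns the address and contacts it itself.
reply = redirect_handle("friend")
print(deliver(reply["contact"], "INVITE"))
```

Both paths end with the same request arriving at the same address; what differs is which party sends it.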

Stateful versus Stateless

The servers used by SIP can run in one of two modes: stateful or stateless. When a server runs in stateful mode, it keeps track of all requests and responses it sends and receives. A server that operates in stateless mode doesn't retain this information, and forgets what it has done once it has processed a request. A server running in stateful mode is generally found in the domain where the user agents reside, whereas stateless servers are often found as part of the backbone, receiving so many requests that it would be difficult to keep track of them.

Location Service

The location service is used to keep a database of those who have registered through a SIP server, and where they are located. When a user agent registers with a Registrar server, a REGISTER request is made (which we'll discuss in a later section). If the Registrar accepts the request, it will obtain the SIP address and IP address of the user agent, and add them to the location service for its domain. This database provides an up-to-date catalog of everyone who is online, and where they are located, which Redirect servers and Proxy servers can then use to acquire information about user agents. This allows the servers to connect user agents together or forward requests to the proper location.
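The relationship between a Registrar and the location service can be sketched with two small classes. This is a minimal in-memory illustration, not how any particular SIP server is implemented: the class and method names, the rule for accepting a registration, and the sample addresses are all assumptions for this example.

```python
class LocationService:
    """In-memory stand-in for the location database of one domain."""

    def __init__(self):
        self._bindings = {}  # SIP address -> current IP address

    def bind(self, sip_address, ip_address):
        self._bindings[sip_address] = ip_address

    def lookup(self, sip_address):
        """Return the registered IP address, or None if the user is unknown."""
        return self._bindings.get(sip_address)


class Registrar:
    """Accepts REGISTER requests and records them in the location service."""

    def __init__(self, domain, location_service):
        self.domain = domain
        self.location_service = location_service

    def handle_register(self, sip_address, ip_address):
        # Assumed policy: only accept addresses in this server's own domain.
        if not sip_address.endswith("@" + self.domain):
            return False
        self.location_service.bind(sip_address, ip_address)
        return True


service = LocationService()
registrar = Registrar("madeupsip.com", service)
registrar.handle_register("sip:myaccount@madeupsip.com", "192.0.2.10")

# A proxy or redirect server would consult the same service to find the user.
print(service.lookup("sip:myaccount@madeupsip.com"))  # 192.0.2.10
```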

Client/Server versus Peer-to-Peer Architecture

In looking at the components of SIP, you can see that requests are processed in different ways. When user agents communicate with one another, they send requests and responses to one another. In doing so, one acts as a User Agent Client, and the other, which fulfills the request, acts as a User Agent Server. When dealing with SIP servers, however, they simply send requests that are processed by a specific server. This reflects two different types of architectures used in network communications:

■ Client/Server

■ Peer-to-peer

Client/Server

In a client/server architecture, the computers are separated into two roles:

■ The client, which requests specific services or resources

■ The server, which is dedicated to fulfilling requests by responding (or attempting to respond) with requested services or resources

An easy-to-understand example of a client/server relationship is seen when using the Internet. When an Internet browser is used to access a Web site, the client is the computer running the browser software, which requests a Web page from a Web server. The Web server receives this request and responds by sending the Web page to the client computer. In VoIP, the same relationship can be seen when a client sends a request to register with a Registrar server, or makes a request to a Proxy server or Redirect server that allows it to connect with another user agent. In all these cases, the client's role is to request services and resources, and the server's role is to listen to the network and await requests that it can process or pass on to other servers.

The servers on a network acquire their ability to service requests from the programs installed on them. Because a server may run a number of services or have multiple server applications installed, a computer dedicated to the role of being a server may provide several functions on a network. For example, a Web server might also act as an e-mail server. In the same way, SIP servers may also provide different services.

A Registrar can register clients and also run the location service that allows clients and other servers to locate other users who have registered on the network. In this way, a single server may provide diverse functionality to a network that would otherwise be unavailable.

Another important function of the server is that, unlike clients, which may be disconnected from the Internet or shut down when the person using them is done, a server is generally active and awaiting client requests. Problems and maintenance aside, a dedicated server is up and running, so that it is accessible. The IP address of the server generally doesn't change, meaning that clients can always find it on a network, which makes it important for such functions as finding other computers on the network.

Peer to Peer

A peer-to-peer (P2P) architecture is different from the client/server model, as the computers involved have similar capabilities, and can initiate sessions with one another to make and service requests from one another. Each computer provides services and resources, so if one becomes unavailable, another can be contacted to exchange messages or access resources. In this way, the user agents act as both client and server, and are considered peers. Once a user agent is able to establish a communication session with another user agent, a P2P architecture is established where each machine makes requests and responds to the other. One machine acting as the User Agent client will make a request, while the other acting as the User Agent server will respond to it. Each machine can then swap roles, allowing them to interact as equals on the network. For example, if the applications being used allowed file sharing, a UAC could request a specific file from the UAS and download it. During this time, the peers could also be exchanging messages or talking using VoIP, and once these activities are completed, one could send a request to terminate the session to end the communications between them. As seen by this, the computers act in the roles of both client and server, but are always peers by having the same functionality of making and responding to requests.

SIP Requests and Responses

Because SIP is a text-based protocol like HTTP, it is used to send information between clients and servers, and User Agent clients and User Agent servers, as a series of requests and responses. When requests are made, there are a number of possible signaling commands that might be used:


REGISTER: Used when a user agent first goes online and registers its SIP address and IP address with a Registrar server.

INVITE: Used to invite another user agent to communicate, and then establish a SIP session between them.

ACK: Used to accept a session and confirm reliable message exchanges.

OPTIONS: Used to obtain information on the capabilities of another user agent, so that a session can be established between them. When this information is provided, a session isn't automatically created as a result.

SUBSCRIBE: Used to request updated presence information on another user agent's status. This is used to acquire updated information on whether a user agent is online, busy, offline, and so on.

NOTIFY: Used to send updated information on a user agent's current status. This sends presence information on whether a user agent is online, busy, offline, and so on.

CANCEL: Used to cancel a pending request without terminating the session.

BYE: Used to terminate the session. Either the user agent who initiated the session, or the one being called, can use the BYE command at any time to terminate the session.
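Because SIP requests are plain text, a request can be assembled with ordinary string handling. The sketch below builds a request with a start line, header lines, and a blank line before the body, in the same general shape as an HTTP request. The helper names build_request and parse_request_line are assumptions for this example, and the header values (host names, tags, identifiers) are made-up sample data.

```python
def build_request(method, target_uri, headers, body=""):
    """Assemble a SIP request: start line, headers, blank line, then body."""
    lines = [f"{method} {target_uri} SIP/2.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


def parse_request_line(message):
    """Return (method, request URI, protocol version) from the first line."""
    method, uri, version = message.split("\r\n", 1)[0].split(" ")
    return method, uri, version


invite = build_request(
    "INVITE",
    "sip:friend@madeupsip.com",
    [
        ("Via", "SIP/2.0/UDP pc1.madeupsip.com;branch=z9hG4bKexample"),
        ("Max-Forwards", "70"),
        ("To", "<sip:friend@madeupsip.com>"),
        ("From", "<sip:myaccount@madeupsip.com>;tag=12345"),
        ("Call-ID", "example-call-id@pc1.madeupsip.com"),
        ("CSeq", "1 INVITE"),
    ],
)

print(invite)
print(parse_request_line(invite))
```

The same helper could build a REGISTER or BYE request by changing the method name and the headers.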

When a request is made to a SIP server or another user agent, one of a number of possible responses may be sent back. These responses are grouped into six different categories, with a three-digit numerical response code that begins with a number relating to one of these categories. The various categories and their response code prefixes are as follows:

Informational (1xx): The request has been received and is being processed.

Success (2xx): The request was acknowledged and accepted.

Redirection (3xx): The request can't be completed and additional steps are required (such as redirecting the user agent to another IP address).

Client error (4xx): The request contained errors, so the server can't process the request.

Server error (5xx): The request was received, but the server can't process it. Errors of this type refer to the server itself, and don't indicate that another server won't be able to process the request.

Global failure (6xx): The request was received and the server is unable to process it. Errors of this type refer to errors that would occur on any server, so the request wouldn't be forwarded to another server for processing.
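Since the category is determined entirely by the first digit of the response code, classifying a response takes only an integer division. The function name response_class is an assumption for this example; the sample codes in the loop are commonly seen SIP response codes.

```python
RESPONSE_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
    6: "Global failure",
}


def response_class(code):
    """Map a three-digit SIP response code to its category."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid SIP response code: {code}")
    return RESPONSE_CLASSES[code // 100]


for code in (180, 200, 302, 404, 503, 603):
    print(code, response_class(code))
```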

Protocols Used with SIP

Although SIP is a protocol in itself, it still needs to work with different protocols at different stages of communication to pass data between servers, devices, and participants. Without the use of these protocols, communication and the transport of certain types of media would either be impossible or insecure. In the sections that follow, we'll discuss a number of the common protocols that are used with SIP, and the functions they provide during a session.

UDP

The User Datagram Protocol (UDP) is part of the TCP/IP suite of protocols, and is used to transport units of data called datagrams over an IP network. It is similar to the Transmission Control Protocol (TCP), except that it doesn't divide messages into packets and reassemble them at the other end. Because UDP doesn't sequence datagrams as the data arrives at the endpoint, it is up to the application to ensure that the data has arrived in the right order and has arrived completely. This may sound less beneficial than using TCP for transporting data, but it makes UDP faster because there is less processing of data. It is often used when messages with small amounts of data (which require less reassembling) are being sent across the network, or with data that will be unaffected overall by a few units of missing data.

Although an application may have features that ensure that datagrams haven't gone missing or arrived out of order, many simply accept the potential for data loss, duplication, or errors. In the case of Voice over IP, streaming video, or interactive games, a minor loss of data or error will be a minor glitch that generally won't affect the overall quality or performance. In these cases, it is more important that the data is passed quickly from one endpoint to another. If reliability were a major issue, then using TCP as the transport protocol would be a better choice than burdening the application with features that check the reliability of the data it receives.

Notes from the Underground… Transport Layer Security

Transport Layer Security (TLS) is a protocol that can be used with transport protocols like TCP to provide security between applications communicating over an IP network. TLS uses encryption to ensure privacy, so that other parties can't eavesdrop on or tamper with the messages being sent. Using TLS, a secure connection is established by authenticating the client and server, or User Agent Client and User Agent Server, and then encrypting the connection between them. Transport Layer Security is the successor to Secure Sockets Layer (SSL), which was developed by Netscape. Although it is based on SSL 3.0, TLS is a standard defined in RFC 2246, and is designed to be SSL's replacement. In this standard, TLS is designed as a multilayer protocol that consists of:

■ TLS Handshake Protocol

■ TLS Record Protocol

The TLS Handshake Protocol is used to authenticate the participants of the communication and negotiate an encryption algorithm. This allows the client and server to agree upon an encryption method and prove who they are using cryptographic keys before any data is sent between them. Once this has been done successfully, a secure channel is established between them. After the TLS Handshake Protocol is used, the TLS Record Protocol ensures that the data exchanged between the parties isn't altered en route. This protocol can be used with or without encryption, but the TLS Record Protocol provides enhanced security using encryption methods like the Data Encryption Standard (DES). In doing so, it provides the security of ensuring data isn't modified, and others can't access the data while in transit.
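In application code, both parts of TLS are usually handled by a library rather than written by hand. As a hedged illustration, the sketch below uses Python's standard ssl module to prepare a client-side configuration. The function name make_client_context and the choice of TLS 1.2 as the minimum version are assumptions for this example, not requirements stated in these notes.

```python
import ssl


def make_client_context():
    """Prepare TLS settings for a client that verifies the server's identity."""
    # The default context requires a valid server certificate and checks that
    # it matches the host name, which covers the authentication step of the
    # handshake.
    context = ssl.create_default_context()
    # Assumed policy for this sketch: refuse SSL 3.0 and early TLS versions.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

A connected TCP socket could then be wrapped with context.wrap_socket(sock, server_hostname=...), at which point the library performs the handshake and protects the data that follows.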
