Stream Control Transmission Protocol

The design of a new reliable transport protocol for IP networks

Supervisor: Professor Raimo Kantola
Instructor: John Loughney, M. Sc.
Author: Iván Arias Rodríguez

Helsinki University of Technology
Electrical and Communications Engineering Department
Networking Laboratory

Espoo, 12th of February, 2002

HELSINKI UNIVERSITY OF TECHNOLOGY
ABSTRACT OF MASTER'S THESIS
Author: Iván Arias Rodríguez
Title: Stream Control Transmission Protocol. The design of a new
reliable transport protocol for IP networks
Date: 12th of February, 2002
Number of pages: 159
Department: Electrical and Communications Engineering
Laboratory: Networking
Supervisor: Raimo Kantola
Instructor: John Loughney, M. Sc.
There is an increasing need for internetworking between telephone and computer
networks. Applications such as Voice over IP (VoIP) and the deployment of 3rd
Generation mobile telephony networks make this integration a necessity.
The Signaling Transport (SIGTRAN) working group of the Internet Engineering
Task Force (IETF) is in charge of designing the standards needed to make this
internetworking possible. The primary purpose of this working group is to address
the transport of packet-based Public Switched Telephone Network (PSTN) signaling
over IP networks, taking into account the functional and performance requirements of
PSTN signaling.
Among the standards that SIGTRAN has defined there is one new reliable
transport protocol, the Stream Control Transmission Protocol (SCTP).
SCTP evolved from an earlier transport protocol, the Multi-Network
Datagram Transmission Protocol (MDTP), which was largely based on TCP.
SCTP has several new features that make it more suitable for PSTN signaling
transport than TCP. SCTP can take advantage of a multihomed host by using all the IP
addresses the host owns. SCTP avoids a very simple attack that affects TCP, the
so-called SYN attack. The new protocol also uses streams to protect an application
from so-called Head-Of-Line (HOL) blocking. Moreover, many features that are
optional in TCP have been included in the basic specification of SCTP, such as
Selective Acknowledgements, the ability to report the receipt of Duplicate
Datagrams, and support for Explicit Congestion Notification (ECN).
This Master's Thesis discusses the evolution of the design of SCTP. We will try to
explain why the different aspects of SCTP were designed the way they were.
When possible, we will explain how the characteristics of SCTP evolved from
those of the initial MDTP, and we will show how SCTP and TCP behave in similar
situations.
Keywords: Internet, Internet Protocol (IP), reliable transport protocol, Stream Control
Transmission Protocol (SCTP), Signaling System #7 (SS7), Signaling Transport
(SIGTRAN), Transmission Control Protocol (TCP).
HELSINKI UNIVERSITY OF TECHNOLOGY
SUMMARY OF THE FINAL DEGREE PROJECT (RESUMEN DEL PROYECTO DE FIN DE CARRERA)
Author: Iván Arias Rodríguez
Title: Stream Control Transmission Protocol. The design of a new
reliable transport protocol for IP networks
Date: 12th of February, 2002
Number of pages: 159
Department: Electrical and Communications Engineering
Laboratory: Networking
Supervisor: Raimo Kantola
Instructor: John Loughney, M. Sc.
Coordinators: Ángel Álvarez Rodríguez (ETSIT)
Anita Bisi (HUT)
There is a growing need for integration between telephone networks and computer
networks. New applications such as Voice over IP (VoIP) and the rollout of 3rd
Generation mobile telephony make this integration between the two networks
increasingly necessary.
The Signaling Transport (SIGTRAN) working group of the Internet Engineering
Task Force (IETF) is responsible for producing the standards needed to make the
integration of these networks possible. The main purpose of this working group is to
address the transport of packet-based Public Switched Telephone Network (PSTN)
signaling over IP networks, taking into account the functions and performance
required for the transport of that signaling.
One of the new standards that emerged from the joint work of many engineers in
SIGTRAN is the Stream Control Transmission Protocol (SCTP). SCTP is a new
reliable transport protocol. The initial goal of this new protocol was to transport the
signaling packets of SS7 networks over IP networks.
The design of SCTP began in the summer of 1998. At that time, Randall R.
Stewart and Qiaobing Xie started designing a protocol that they named the
Multi-Network Datagram Transmission Protocol (MDTP). Its initial design was largely
based on the Transmission Control Protocol (TCP), the reliable transport protocol par
excellence in IP networks. In fact, work on this protocol started even before SIGTRAN
existed, and its original goal was to remedy some of the problems encountered when
using TCP.
Some time later, when SIGTRAN was created and started looking for the ideal
transport protocol for its purposes, its members concluded that MDTP was the closest
thing to what they were looking for. From that moment on, interest in MDTP rose, and
its design began to be debated on the mailing list that SIGTRAN had opened for that
purpose.
During its design phase, the initial structure of MDTP changed a great deal. It had
to be adapted to the specific needs of SIGTRAN: the transport of telephone network
signaling, above all that of the SS7 network. The final design of SCTP was published in
Request For Comments (RFC) 2960 at the end of October 2000. SCTP includes many
improvements over TCP that make it more suitable than TCP for signaling transport,
and it can even compete with TCP as a general-purpose reliable transport protocol on
the Internet.
SCTP has a mechanism for establishing associations (the equivalent of TCP
connections) that makes it immune to the attack that floods a host with datagrams
carrying the SYN flag. SCTP uses a four-step mechanism instead of the three steps
used by TCP. This allows servers to authenticate the source IP address of the
initiating datagram (the INIT, SCTP's counterpart of the TCP SYN) before reserving
any resource, which makes this attack impossible.
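The idea of validating the peer before reserving resources can be sketched in a few lines. This is only an illustration under stated assumptions, not the cookie format of RFC 2960: the names `SECRET`, `make_cookie` and `check_cookie`, the HMAC-SHA-256 signature, the fields placed in the cookie and the 60-second lifetime are all hypothetical. The server signs the initiator's address into a cookie, stores nothing, and only accepts the association when the same cookie comes back from that address.

```python
import hashlib
import hmac
import os

# Hypothetical per-server secret; a real stack would rotate it periodically.
SECRET = os.urandom(32)
TAG_LEN = 32  # length of an HMAC-SHA-256 digest


def make_cookie(addr: str, port: int, timestamp: int) -> bytes:
    """Build a signed cookie. The server keeps no state after sending it."""
    payload = f"{addr}|{port}|{timestamp}".encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return tag + payload


def check_cookie(cookie: bytes, addr: str, port: int, now: int,
                 lifetime: int = 60) -> bool:
    """Accept the echoed cookie only if it is authentic, fresh, and
    comes from the address and port it was issued to."""
    tag, payload = cookie[:TAG_LEN], cookie[TAG_LEN:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False  # forged or corrupted cookie
    c_addr, c_port, c_time = payload.decode().split("|")
    return (c_addr == addr and int(c_port) == port
            and 0 <= now - int(c_time) <= lifetime)
```

An attacker who spoofs the source address never receives the cookie, so it cannot echo it back, and the server has allocated nothing in the meantime.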
In TCP, a connection can only be established from one IP address to another IP
address. A TCP connection is identified by the IP address and port of both the client
and the server. Thus, if a machine has several network cards, each with its own IP
address, it can use only one of them to establish a TCP connection with another
machine. In SCTP, an association is identified by a set of IP addresses and a port on
the client, together with the set of IP addresses of the server and its port. In this way,
if one of the IP addresses stops working, any of the others can still be used.
Another innovation over TCP is that SCTP can avoid head-of-line blocking through
the use of streams. This blocking occurs in TCP when several independent messages,
split into datagrams, are sent over a single connection. In this situation, even if a
message has arrived completely at the receiver, it cannot be passed to the user until all
the previous messages have also arrived completely. SCTP allows the use of streams,
which are sub-connections within an SCTP association, so that datagrams addressed to
different streams are handled independently. In addition, SCTP delimits the individual
messages within the byte flow, so the user does not need to insert its own markers.
Messages can even be sent in such a way that the receiver passes them to the user as
soon as they are received, without preserving the order in which they were sent.
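A small simulation may help to picture head-of-line blocking. It is a sketch under assumptions: the function `deliver_in_order`, the tuple layout `(stream, seq, data)` and the per-stream sequence numbers starting at zero are hypothetical names and conventions chosen for the example, not the wire format of SCTP. The receiver releases a message only when every earlier message of the same stream has arrived.

```python
from collections import defaultdict


def deliver_in_order(arrivals):
    """Return, for each arriving message, the list of messages that can be
    handed to the user at that moment.

    `arrivals` is a sequence of (stream, seq, data) tuples. Ordering is
    enforced per stream only, so a gap in one stream never holds back the
    messages of another stream.
    """
    next_seq = defaultdict(int)   # next sequence number expected per stream
    pending = defaultdict(dict)   # out-of-order messages held per stream
    log = []
    for stream, seq, data in arrivals:
        pending[stream][seq] = data
        released = []
        while next_seq[stream] in pending[stream]:
            released.append(pending[stream].pop(next_seq[stream]))
            next_seq[stream] += 1
        log.append(released)
    return log


# One ordered flow (TCP-like): message A is delayed and blocks B and C.
single = deliver_in_order([(0, 1, "B"), (0, 2, "C"), (0, 0, "A")])

# Two streams: A travels on stream 0, B and C on stream 1.
streams = deliver_in_order([(1, 0, "B"), (1, 1, "C"), (0, 0, "A")])
```

With a single ordered flow nothing is delivered until the delayed message A arrives, whereas with two streams B and C reach the user immediately.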
SCTP uses several IP addresses (multihoming) at both the client and the server.
However, only one of them, the primary address, is used to send data. The rest are
kept in reserve and are used only if the primary address fails. To learn the state of
these backup IP addresses, SCTP has the so-called heartbeat mechanism. It consists of
sending messages to the IP addresses that are not being used to send data. These
messages, or heartbeats, must be answered, so that receiving the answer shows that
those addresses are still active.
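The bookkeeping behind this mechanism can be sketched as follows. The classes `Path` and `Association`, the method names and the threshold of three consecutive unanswered probes are assumptions made for the illustration; the text above only states that idle addresses are probed and that an answer shows the address is still active.

```python
from dataclasses import dataclass


@dataclass
class Path:
    address: str
    active: bool = True
    missed: int = 0  # consecutive probes that went unanswered


@dataclass
class Association:
    paths: list
    primary: int = 0      # index of the primary address in `paths`
    max_missed: int = 3   # hypothetical threshold before a path is inactive

    def idle_addresses(self):
        """Addresses that carry no data and therefore need heartbeats."""
        return [p.address for i, p in enumerate(self.paths)
                if i != self.primary]

    def report(self, address, answered):
        """Record whether a probe sent to `address` was answered."""
        path = next(p for p in self.paths if p.address == address)
        if answered:
            path.missed, path.active = 0, True
        else:
            path.missed += 1
            if path.missed >= self.max_missed:
                path.active = False

    def pick_destination(self):
        """Use the primary while it is active, else the first active backup."""
        if self.paths[self.primary].active:
            return self.paths[self.primary].address
        for path in self.paths:
            if path.active:
                return path.address
        return None  # no reachable address is left
```

Because the backups are probed while idle, the sender already knows which of them is usable at the moment the primary address fails.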
One of the main problems of TCP is that it is very hard to extend. When a new
feature is to be added to TCP, the limited space that was reserved for future use when
TCP was designed often makes this impossible. SCTP is a very open protocol that has
been designed to be extensible by nature. SCTP contains a set of basic functions, and it
has been conceived so that any additional feature needed in the future can be added
with great ease.
In addition, a machine that has an SCTP association with another can send it error
messages, so that certain errors at the transport protocol level can be resolved without
affecting the user. These error messages also serve to negotiate the use of optional
functions: older versions of SCTP that do not support a new function can signal this by
sending the appropriate error message.
TCP has been the reliable transport protocol par excellence for the last two
decades. For this reason, many of the features of SCTP have been taken directly from
TCP. Most of the extensions written for TCP have been included in the basic version of
SCTP. Among them are the use of selective acknowledgements, the ability to report
the receipt of duplicate datagrams, and support for Explicit Congestion Notification
(ECN). Moreover, SCTP uses the same algorithms as TCP to avoid congestion. In this
way, when applications using SCTP coexist with applications using TCP, the
bandwidth allotted to an SCTP association and to a TCP connection is the same.
To keep the move from TCP to SCTP from being drastic, a sockets interface has
been defined that is as similar as possible to that of TCP. The changes needed to make
an application use SCTP instead of TCP are therefore minimal.
Although SCTP is a new protocol, and a new SCTP specification is even expected
to be published in the future because of the flaws found in RFC 2960, there are already
numerous public implementations on the Internet. This will give application
programmers easy access to SCTP and let them start using it as soon as possible.
Although SCTP is not a simple protocol, there are implementations that take up less
than 100 Kbytes, which makes it usable even in small devices.
The author of this project has been working with the engineers who designed SCTP
and its later extensions since September 1999, taking an active part in its design. He
has also programmed an implementation of SCTP and attended several interoperability
sessions between different implementations. During this time the author has acquired a
broad view of the process of designing a reliable transport protocol. Accordingly, this
project discusses the evolution of the design of SCTP. It tries to explain the reasons
behind the different aspects of SCTP. Furthermore, since SCTP is an evolution of
MDTP, it attempts to trace some aspects of SCTP back to their initial design in MDTP.
And since TCP is the transport protocol with which SCTP will have to compete, the
behavior of SCTP and TCP in similar situations is compared.
The author hopes that this collection of data, gathered over two and a half years,
will be of use to future designers of similar protocols.
Keywords: Internet, Internet Protocol (IP), reliable transport protocol, Stream Control
Transmission Protocol (SCTP), Signaling System #7 (SS7), Signaling Transport
(SIGTRAN), Transmission Control Protocol (TCP).
PREFACE

The work of this Master's Thesis was carried out at the Communication Systems
Laboratory of the Nokia Research Centre located in Helsinki. It was supervised, however,
by the Networking Laboratory of the Department of Electrical and Communications
Engineering, in the Helsinki University of Technology.
If somebody had told me that I was still going to be living in Helsinki almost
two and a half years after I arrived here, I would have simply laughed. I came to Helsinki
in September 1999 with an Erasmus grant, initially for six months. I had already extended
my stay here, in this supposedly cold city, by three more months before Christmas. And after
that time, another year, and then another one. I do not know if I have done good work
writing this Master's Thesis, but I am sure that I did my best, as I have never been in a
hurry to finish it quickly. Who would like to do so being in this wonderful city, Helsinki,
and working in this wonderful company, Nokia?
I would like to make good use of this page to thank all those people that helped me to
become an engineer. There are so many that I do not know how to start.
Let us start with Ramón Francisco Alfonso Pujante, the great guy who not only
influenced me to choose the career I studied, but also helped me so much to
continue studying in Madrid while living in Barcelona. Thank you very much for sending
me all those notes you took during those lessons I had to miss. Thank you for all those
formalities you had to do for me at the university. And thanks for being my friend. Without
your help, I probably would never have finished my studies.
I would also like to thank all those people from the Barcelona branch of the National
Bank of Spain. They did so much to make it easier for me to study while
working hard there, and constantly encouraged me to continue. Thanks, I spent really good
years living in Barcelona, the enemy of Madrid.
Thanks to all the people I met in Helsinki. I have spent such a lovely time here that
when I think about it and remember all those nice moments, it is even difficult to believe.
So many people... Marghe, Gusi, Martina, Pepelu, Albert, Willy, Óscar, Paulito, Santi,
Gaizka... I could continue for several pages if I tried to mention them all. However, I
would especially like to thank Javichu, who has been my flatmate, workmate, wonderful
friend (and cook!) for almost two years. Thanks to all of you. You are the main reason why
I have never been in a hurry to finish this work.
I would also like to thank the people in charge of the Socrates/Erasmus related issues,
especially Ángel Álvarez, who really takes care of all of us. He is, however, planning to
resign soon after being the Socrates/Erasmus Coordinator for many years now. It is really a
pity; the new generations will lose a great and competent guy.
Thanks also to the people here at the Nokia Research Centre that make it a lovely
place to work. There is a guy here at the NRC called John Loughney. They say he is my
boss, but I do not think so. In reality, he is my problem solver, and it looks like he is
always happy with what I do, which really makes me feel good. Thanks to him, coming to
work is one of the enjoyable things I do every day. If I am ever the boss of somebody, I
will try to do it as you do. Thanks a lot!

I would also like to thank Raimo Kantola, my supervisor at the Helsinki University of
Technology. This busy guy really knows how to do his work. He really takes care of what
he does, so if he tells you that something will be better in another way, believe him; I
would. Thanks for your advice, Raimo.
And last but not least, I would like to thank my parents and siblings for everything
they have done during these last 25 years. Hei people, I am becoming an engineer!

Helsinki, 12th of February, 2002

Iván Arias Rodríguez

CONTENTS

ABSTRACT OF MASTER'S THESIS
SUMMARY OF THE FINAL DEGREE PROJECT (RESUMEN DEL PROYECTO DE FIN DE CARRERA)
PREFACE
CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS AND ABBREVIATIONS
1. INTRODUCTION
2. BACKGROUND
2.1 TELEPHONY SIGNALING: A LITTLE BIT OF HISTORY
2.2 THE SS7 NETWORK: WHAT IS THAT?
2.2.1 Functional Architecture of SS7
2.2.1.1 The Service Switching Point (SSP)
2.2.1.2 The Signal Transfer Point (STP)
2.2.1.3 The Service Control Point (SCP)
2.2.1.4 The Signaling Links
2.2.2 Protocol Architecture of SS7
2.2.2.1 The Message Transfer Part (MTP)
2.2.2.2 The Signaling Connection Control Part (SCCP)
2.2.2.3 The Transaction Capabilities Application Part (TCAP)
2.2.2.4 The ISDN User Part (ISUP)
2.3 THE LARGEST COMPUTER NETWORK: THE INTERNET
2.3.1 A quick history of the Internet: From military use to worldwide business tool
2.3.2 The basis of the Internet: The internals of the Internet Protocol (IP)
2.4 A MARRIAGE OF CONVENIENCE: REASONS FOR SS7 AND IP NETWORKS INTEGRATION
2.4.1 Voice over IP
2.4.2 The 3rd Generation Mobile Telephony
2.5 THIS IS WHAT WE WERE LOOKING FOR
2.5.1 The need of a new transport protocol
2.5.2 A proposal that IETF could not refuse
3. THE DESIGN OF SCTP: DATAGRAM STRUCTURE
3.1 SHAPE OF SCTP DATAGRAMS: AN EVOLUTION FROM MDTP
3.1.1 Common header and internal structure of MDTP
3.1.2 Common header and internal structure of SCTP
3.2 SCTP ASSOCIATION MANAGEMENT: THE STATE DIAGRAM
4. AN ASSOCIATION'S BIRTH: FROM A TWO-WAY TO A FOUR-WAY HANDSHAKE
4.1 THE EVOLUTION OF THE ESTABLISHMENT PHASE
4.2 COOKIES AGAINST THE ATTACKERS
4.3 THE FIRST TWO LEGS: THE INIT AND THE INIT ACK CHUNKS
4.3.1 The parameters
4.3.1.1 What is your address?
4.3.1.2 The king of the parameters: The State Cookie
4.3.1.3 Other parameters
4.4 THE LAST TWO LEGS: THE COOKIE ECHO AND COOKIE ACK CHUNKS
5. DOING THE HARD WORK: TRANSMISSION OF DATA
5.1 BASIC DATA TRANSMISSION
5.2 SOME SOLUTIONS TO AVOID CONGESTION
5.3 SEVERAL CONNECTIONS INSIDE A SINGLE ASSOCIATION: THE USE OF STREAMS
5.4 SIZE MATTERS: MTU DISCOVERY
5.5 I WILL WAIT FOR YOU: RTO CALCULATION
5.6 THE IDEAS LEFT ON THE WAY
6. IT IS NOT ALL PLAIN DATA
6.1 ARE YOU ALIVE? THE PATH HEARTBEAT MECHANISM
6.2 YOU ARE WRONG: THE OPERATIONAL ERROR CHUNK
7. THIS IS THE END: THE SHUTDOWN AND ABORT ALGORITHMS
7.1 TERMINATING ASSOCIATIONS IN MDTP
7.2 A HARD END FOR AN ASSOCIATION'S LIFE: ABORTING AN ASSOCIATION IN SCTP
7.3 I AM DONE, COULD YOU FINISH AS WELL? THE SHUTDOWN PROCEDURE
8. AND NOW? SCTP EXTENSIONS AND SCTP USERS
8.1 THE SCTP EXTENSIONS
8.1.1 This is my new address: Adding and deleting addresses, and per stream flow control
8.1.2 Can I trust you? Reliable and unreliable streams
8.1.3 Be ready to adapt to your environment: The adaptive Fast Retransmit algorithm
8.2 IS ANYBODY USING SCTP? SOME APPLICATIONS THAT USE SCTP
9. CHANGES TO BE MADE IN RFC 2960
9.1 THE CHECKSUM DILEMMA
9.1.1 The good old days: Letting others protect the data integrity
9.1.2 The quest for a stronger scheme: The Cyclic Redundancy Check
9.1.3 From a 16-bit to a 32-bit checksum
9.1.4 The Adler-32 Checksum: We have a problem
9.1.5 Going back to the roots: Using the CRC-32 as the checksum
9.2 ERRATA: THE IMPLEMENTORS GUIDE
10. CONCLUSIONS
APPENDIX A: CONTENTS OF THE CD-ROM
APPENDIX B: OTHER SOURCES OF INFORMATION ABOUT SCTP
BIBLIOGRAPHY
INDEX
LIST OF FIGURES

FIGURE 2-1: EVOLUTION OF TELEPHONE NETWORK
FIGURE 2-2: FUNCTIONAL ARCHITECTURE OF SS7
FIGURE 2-3: SS7 PROTOCOL ARCHITECTURE
FIGURE 2-4: INTERNET'S GROWTH (1981-2001)
FIGURE 2-5: WORLDWIDE INTERNET POPULATION (AUGUST 2001)
FIGURE 2-6: THE IP HEADER
FIGURE 2-7: SOME MEMBERS OF THE INTERNET PROTOCOL SUITE
FIGURE 2-8: SIGTRAN FUNCTIONAL MODEL
FIGURE 3-1: MDTP DATAGRAM STRUCTURE IN ITS FIRST VERSION
FIGURE 3-2: STRUCTURE OF SCTP DATAGRAMS
FIGURE 3-3: SCTP CONNECTION MANAGEMENT FINITE STATE MACHINE
FIGURE 4-1: ESTABLISHMENT PROCEDURE IN MDTP
FIGURE 4-2: SYN ATTACK IN TCP
FIGURE 4-3: ESTABLISHMENT PHASE IN SCTP (FIRST TWO LEGS)
FIGURE 4-4: TRANSMISSION OF 64 KILOBYTES FROM MADRID TO HELSINKI
FIGURE 4-5: BASIC NAT OPERATION
FIGURE 4-6: ESTABLISHMENT PHASE IN SCTP (LAST TWO LEGS)
FIGURE 5-1: BASIC DATA TRANSMISSION
FIGURE 5-2: TWO CAUSES OF CONGESTION
FIGURE 5-3: EVOLUTION OF CWND WITH AND WITHOUT PACKET LOSSES
FIGURE 5-4: USE OF FAST RETRANSMIT IN [STE2000] AND [STE2002B]
FIGURE 5-5: HEAD OF LINE BLOCKING
FIGURE 5-6: IP FRAGMENTATION
FIGURE 5-7: PROBABILITY DENSITY OF ACKNOWLEDGEMENT ARRIVAL TIMES
FIGURE 6-1: THE PATH HEARTBEAT MECHANISM IN SCTP
FIGURE 6-2: THE ERROR CHUNK IN SCTP
FIGURE 7-1: THE ABORT PROCEDURE IN SCTP
FIGURE 7-2: THE SHUTDOWN PROCEDURE IN SCTP
FIGURE 7-3: THE TWO-ARMY PROBLEM
FIGURE 8-1: EVOLUTION OF THE ADDIP DRAFT
FIGURE 8-2: SS7-IP ADAPTATION LAYERS
FIGURE 9-1: HARDWARE IMPLEMENTATION OF CRC-CCITT

LIST OF TABLES

TABLE 2-1: DIFFERENCES BETWEEN THE TELEPHONE AND IP NETWORKS
TABLE 5-1: SOME MTUS FOUND IN THE INTERNET
TABLE 9-1: ERROR-DETECTION CAPABILITIES OF SEVERAL CHECKSUMS
TABLE 9-3: CALCULATION TIME CONSUMED BY SEVERAL CHECKSUMS

LIST OF ACRONYMS AND ABBREVIATIONS

Numerics

1G 1st Generation Mobile Telephony
2G 2nd Generation Mobile Telephony
3G 3rd Generation Mobile Telephony
3GPP 3G Partnership Project

A

A Access links
ABORT Abort
ACM Association for Computing Machinery
AH Authentication Header
AIMD Additive Increase Multiplicative Decrease
ALG Application Level Gateway
AMPS Advanced Mobile Phone System
AMR Adaptive Multi Rate
ANS Advanced Network and Services
ANSI American National Standards Institute
API Application Programming Interface
ARPA Advanced Research Projects Agency
ARIB Association of Radio Industries and Businesses
ASE Application Service Element
AT&T American Telephone and Telegraph
ATM Asynchronous Transfer Mode

B

B Bridge links (SS7)
B Beginning fragment flag (SCTP)
BISDN Broadband ISDN
BISUP Broadband ISDN User Part
BSD Business Services Database

C

C Cross links
CCITT Comité Consultatif International Télégraphique et Téléphonique
CCITT International Telegraphy and Telephony Consultative Committee
CCS Common Channel Signaling
CDMA Code Division Multiple Access
CERN Conseil Européen pour la Recherche Nucléaire
CERN European Council for Nuclear Research
CERT Computer Emergency Response Team
CMSDB Call Management Services Database
COOKIE ACK Cookie Acknowledgement
COOKIE ECHO State Cookie
COPS Common Open Policy Service
CRC Cyclic Redundancy Check
CRC-16 Cyclic Redundancy Check of 16 bits
CRC-32 Cyclic Redundancy Check of 32 bits
CRC-32c Cyclic Redundancy Check of 32 bits studied by Castagnoli
CRC-CCITT Cyclic Redundancy Check standardized by the CCITT
CSIP Connectionless SCCP over IP Adaptation Layer
CSL Component Sublayer
CTP Common Transport Protocol
cwnd Congestion Window
CWR Congestion Window Reduced
CWTS Chinese Wireless Telecommunication Standard

D

D Diagonal links (SS7)
D Delay flag (IP)
DATA Payload Data
DF Don't Fragment flag
DiffServ Differentiated Services
DNS Domain Name System
DoD Department of Defense
DPC Destination Point Code
DSCP Differentiated Services Codepoint
DTMF Dual Tone Multi-Frequency
DUP Data User Part

E

E Extended links (SS7)
E Ending fragment flag (SCTP)
E2E End-to-End
ECN Explicit Congestion Notification
ECNE Explicit Congestion Notification Echo
EDGE Enhanced Data rates for GSM Evolution
ERROR Operation Error
ESP Encapsulating Security Payload
ETSI European Telecommunications Standards Institute

F

F Fully associated links
FDDI Fiber Distributed Data Interface
FTP File Transfer Protocol

G

GNU GNU's Not Unix
GPRS General Packet Radio Service
GSM Global System for Mobile Communications

H

HEARTBEAT Heartbeat Request
HEARTBEAT ACK Heartbeat Acknowledgement
HLR Home Location Register
HMAC Keyed-Hashing algorithm for Message Authentication
HOL Head-of-line
HSCSD High Speed Circuit Switched Data
HTML Hypertext Markup Language
HTTP Hypertext Transfer Protocol

I

IANA Internet Assigned Numbers Authority
ICMP Internet Control Message Protocol
ICMPv6 Internet Control Message Protocol for IPv6
ICV Integrity Check Value
IEEE Institute of Electrical and Electronics Engineers
IETF Internet Engineering Task Force
IHL Internet Header Length
IKE Internet Key Exchange
IMT-2000 International Mobile Telecommunications 2000
IN Intelligent Network
INIT Initiation
INIT ACK Initiation Acknowledgement
IP Internet Protocol
IPsec IP Security Protocol
IPv4 Internet Protocol version 4
IPv6 Internet Protocol version 6
ISC Internet Software Consortium
ISDN Integrated Services Digital Network
ISO International Organization for Standardization
ITSP Internet Telephony Service Provider
ISUP ISDN User Part
ITU International Telecommunication Union
ITU-D ITU Development Sector
IUA ISDN Q.921-User Adaptation Layer

L

LAN Local Area Network
LAPD Link Access Procedures on the D-channel
LIDB Line Information Database
LNP Local Number Portability

M

M2PA MTP2-User Peer-to-Peer Adaptation Layer
M2UA MTP2-User Adaptation Layer
M3UA MTP3-User Adaptation Layer
MAC Message Authentication Code
MAC Medium Access Control
MD5 Message Digest 5
MDTP Multi-Network Datagram Transmission Protocol
MF Multi-Frequency
MF More Fragments flag (IP)
MG Media Gateway
MGC Media Gateway Controller
MIB Management Information Base
MMUSIC Multiparty Multimedia Session Control
MPLS Multiprotocol Label Switching Architecture
MSS Maximum Segment Size
MTP Message Transfer Part
MTP1 MTP Level 1
MTP2 MTP Level 2
MTP3 MTP Level 3
MTU Maximum Transmission Unit

N

NCSA National Center for Supercomputing Applications
NFS Network File System
NIF Nodal Interworking Function
NMT Nordic Mobile Telephone
NREN National Research and Education Network
NSF National Science Foundation
NSP Network Service Part
NUP National User Part

O

OOTB Out Of The Blue
OPC Origination Point Code
OSI Open Systems Interconnection
OSPF Open Shortest Path First

P

PCR Preventive Cyclic Retransmission
PDC Personal Digital Cellular
POTS Plain Old Telephone Service
PSTN Public Switched Telephone Network

Q

QoS Quality of Service

R

R Reliability flag
RAP Resource Allocation Protocol
RFC Request For Comments
RSVP Resource Reservation Protocol
RTCP RTP Control Protocol
RTO Retransmission Time-Out
RTP Real-time Transport Protocol
RTSP Real Time Streaming Protocol
RTT Round Trip Time
RTTVAR Round Trip Time Variation
RUDP Reliable UDP

S

SACK Selective Acknowledgement
SAP Session Announcement Protocol
SCCP Signaling Connection Control Part
SCN Switched Circuit Network
SCP Service Control Point
SCTP Stream Control Transmission Protocol
SDP Session Description Protocol
SF Single Frequency
SG Signaling Gateway
SHA-1 Secure Hash Algorithm 1
SHUTDOWN Shutdown
SHUTDOWN ACK Shutdown Acknowledgement
SHUTDOWN COMPLETE Shutdown Complete
SIGTRAN Signaling Transport
SIO Service Indicator Octet
SIP Session Initiation Protocol
SMTP Simple Mail Transfer Protocol
SNMP Simple Network Management Protocol
SP Signaling Point
SRTT Smoothed Round Trip Time
SS7 Signaling System #7
SSCOP Service Specific Connection-Oriented Protocol
SSN Subsystem Number
SSN Stream Sequence Number
SSP Service Switching Point
ssthresh Slow Start Threshold
SSTP Simple SCCP Tunneling Protocol
STP Signal Transfer Point
SUA SCCP-User Adaptation Layer

T

T TCB Missing flag (SCTP)
T Throughput flag (IP)
T1 Standardization Committee T1-Telecommunications
TACS Total Access Communication System
T/UDP UDP for TCAP
TCAP Transaction Capabilities Application Part
TCB Transmission Control Block
TCP Transmission Control Protocol
TDM Time Division Multiplexing
TDMA Time Division Multiple Access
TFTP Trivial File Transfer Protocol
TLS Transport Layer Security
TLV Type-Length-Value
TOS Type of Service
TSL Transaction Sublayer
TSN Transmission Sequence Number
TSVWG Transport Area Working Group
TTA Telecommunications Technology Association
TTC Telecommunication Technology Committee
TUP Telephone User Part

U

U Unordered flag
UDP User Datagram Protocol
UMTS Universal Mobile Telecommunications System
URI Uniform Resource Identifier
URL Uniform Resource Locator

V

VLR Visitor Location Register
VoIP Voice over IP

W

WATS Wide Area Telephone Service
WWW World Wide Web

1. INTRODUCTION

Our society is quite used to telephones. One simply picks up a telephone set, dials a
number using a rotary dial or a keypad, or just says the name of the person to be called,
and within a few seconds is speaking with the desired person. That person could have a
mobile phone and be anywhere in the world, but it does not matter. He may be speaking
with someone else at that moment, but he can be notified of our call and answer it, or we
can even join the other conversation if he wishes. He may not be available at the moment,
but then we can leave a voice mail message, or our call can be redirected to wherever he
currently is. All this seems such a simple thing to do, but beneath it lie the joint
work of many people and the constant evolution of the technologies used to carry voice
from one point to another.
What makes all this possible is telephony signaling. This term refers to the
information transferred inside the telephone network that is used to establish, monitor and
terminate a telephone call. In a broader sense, it applies to any data flow related to the
management of the telephone network's internal elements or databases. This is what
makes possible services such as billing, roaming of mobile phones, toll-free numbers,
televoting or calling card validation.
Telephony signaling has existed since the very beginning of the history of the telephone,
and it is as important as voice transport itself, if not more so, as the whole operation of
the telephone network relies on it. It has evolved continuously, especially during the last
25 years, when the marriage between telephone and computer took effect and the
differences between computer and telephone networks started to disappear.
The computer is another tool that is becoming more common. It is not as widespread
as the telephone because it is a newer invention and it is more expensive than a telephone
(or at least it was until recently). Many people work using a computer, and it is becoming
harder and harder to imagine a computer that is not connected to any computer
network. People are getting used to sending e-mails or surfing the web in the same way they
write letters or read the newspaper. But again, those applications that simply work are the
fruit of many years of study and constant evolution.
This Master's Thesis deals with the effort made (and still to be made) to join the
telephone and computer networks, and the steps lately taken (and the ones that should be
taken in the future) to achieve that objective.
In the next chapter we will give a background on telephony signaling. We will speak
not only about telephony signaling networks but also about computer networks and the
advantages of joining both types of networks.
The subsequent chapters of this Master's Thesis will be devoted to one of the key
protocols that will make that merger possible, the Stream Control Transmission Protocol
(SCTP), which is the main topic of this paper. We will try to give a historical perspective
on the design of SCTP, discussing how it evolved from a TCP-like protocol to its final
specification.
In chapter 3 we will show the structure of SCTP datagrams and the finite state
machine that models its behavior.
In chapter 4 we will look in detail at the establishment procedure of SCTP. We will
show how it differs from the way TCP sets up a connection, and the main advantages of
SCTP's scheme over TCP's.
Chapter 5 is one of the most important. It discusses how SCTP performs its main
task, the transmission of data. Apart from basic data transmission, we will speak about
the mechanism SCTP uses to avoid congestion in the network, and how SCTP calculates the
Maximum Transmission Unit (MTU) and the Retransmission Time-Out (RTO). We will also
explain what streams are and how they are used. Finally, we will comment on some ideas
that were discarded during the design phase.
Chapter 6 is dedicated to the information transferred between two hosts that is not user
data but internal SCTP messages that help in the management of an SCTP association¹.
We will show in chapter 7 the different ways to tear down an association.
In chapter 8 we will show the SCTP extensions defined so far and we will briefly
discuss applications that use SCTP as their transport protocol.
Chapter 9 summarizes the changes that are going to be made in the SCTP
specifications. A new version of the SCTP specifications including these changes will be
released within the next few months.
Finally, in chapter 10 we will show our conclusions about SCTP, and what we think
about its future.

¹ The term association identifies, in SCTP, one transport session between two peers. It is equivalent to
the term connection used in TCP.

2. BACKGROUND

In this chapter we will first quickly review the history of telephony signaling. Then we
will explain the main characteristics of the biggest telephony signaling network
in use today, Signaling System #7 (SS7).
We will also review its equivalent in the computer network world, the Internet: a
mixture of heterogeneous computer networks that use a common protocol that acts as the
glue that keeps them together, the Internet Protocol (IP).
At the end of the chapter, we will discuss the reasons for and benefits of merging
both networks into a single one, and what is needed to do so.

2.1 Telephony signaling: A little bit of history

Alexander Graham Bell patented the telephone in 1876, and immediately there was a huge
demand for the new invention. Initially, phone usage was very simple. There was no such
thing as a telephone company; instead, telephone sets were sold in pairs
(much like present-day walkie-talkies) and the happy owner was in charge of
physically establishing the line by stringing a single wire between them (the earth
acted as ground, so just one wire was needed). The telephones did not even have a ringer,
and the way to set up a call was simply to shout into the microphone and hope that
the other party would be close enough to his phone to hear the call.
This gave the telephone owner the possibility of speaking with just one other
customer. One needed as many telephone sets as there were people one wanted to speak
with. Figure 2-1 (a) shows this situation for 9 people who want to be connected to one
another.

Figure 2-1: Evolution of the telephone network: (a) fully interconnected network,
(b) centralized switch, (c) two-level hierarchy

Within one year the cities were covered with wires passing over houses and trees in a
wild jumble, and it became obvious that this model of connection was not going to work.
Taking advantage of this, Bell created the Bell Telephone Company and opened the first
switching office in 1878. The company ran a wire to each customer's house or office.
When customers wanted to use the telephone, they had to lift the receiver, allowing DC
current to flow through the telephone and back through the return of the circuit, turning
on a lamp on the operator's switchboard. Usually the subscriber had to crank the phone to
make a ringing sound in the telephone company office to attract the attention of the
operator, who connected him to the callee using a jumper cable. This way, a customer could
speak with all the other customers connected to the same switching office with just a
single telephone set (now equipped with a ringer) and a single line (now a balanced,
insulated twisted pair). This model is illustrated in Figure 2-1 (b).
As the telephone started to be increasingly popular, people wanted to make long
distance calls between cities, so the switching offices were interconnected. But then, the
same problem of interconnecting all the offices arose again, and a second level of offices
was created, as shown in Figure 2-1 (c). Eventually the hierarchy grew to five levels.
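The scaling problem behind Figure 2-1 can be checked with a little arithmetic. The
following minimal sketch counts the wires needed to connect a group of subscribers
pairwise and through a single switching office. The function names are ours and are
chosen only for illustration.

```python
def full_mesh_links(subscribers: int) -> int:
    """Wires needed when every pair of subscribers has its own line."""
    return subscribers * (subscribers - 1) // 2


def central_office_links(subscribers: int) -> int:
    """Wires needed when each subscriber has one line to a switching office."""
    return subscribers


# Compare both models for a few group sizes.
for n in (9, 100, 10_000):
    print(n, full_mesh_links(n), central_office_links(n))
```

For the 9 subscribers of Figure 2-1 (a), the fully interconnected model needs 36 wires,
against 9 for the centralized switch of Figure 2-1 (b). The first figure grows
quadratically with the number of subscribers, the second only linearly.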
The DC current flow and the ringer were the first type of telephony signaling
ever used to establish and terminate phone calls, although that was done mostly manually
by the operator. Signaling has since evolved: today it carries much more information than
this early method could, and it has reduced human intervention to a minimum.
Telephony signaling was initially limited by the fact that the same circuit was used
to carry both the voice and the signaling, a method called in-band signaling. Moreover,
telephony signaling was analog and had a small number of possible states, so little
information could be handled, making operator intervention necessary most of the time.
To make things worse, with in-band signaling the circuit used for the
telephone call was busy from the very moment the caller started dialing until the caller
went on-hook. Thus, telephone companies were quickly running out of circuits to meet
the demand, as customers came to be counted in millions and created an
enormous amount of traffic.
On the one hand, telephone companies needed a new way of managing calls that
would save the substantial investments otherwise needed to add new facilities². On the
other hand, they needed methods to support the new services that subscribers
were demanding.

² Part of those facilities was the human operator. A story is told that the American Telephone and
Telegraph (AT&T), in the early 1930s, predicted that by the mid-1950s, every woman of working age in the
USA would be employed by them as an operator, due to the expected increase in call volume and the
available technology.

In the early sixties, the European telephone companies started to digitize their
networks. One of the first steps taken was to stop using the voice network for signaling
and to use instead a separate network dedicated solely to that purpose, a practice known
as Common Channel Signaling (CCS). This new approach immediately brought some benefits.
For example, the setup and teardown procedures could be done more quickly and they were
less error prone. Digitization of phone lines not only improved the quality of the calls
(especially long distance ones) but also made equipment cheaper.
CCS is in wide use today, SS7 being the protocol and architecture presently
used in this relatively new network. Nevertheless, in the history of telephony signaling,
many other methods have been used:

DC signaling: This was the first type of signaling used. When a subscriber went
off-hook, DC current flowed from the central office through the telephone and
back to the office. A DC current detector provided a dial tone, and the subscriber
dialed the number using a rotary dial, which used a relay to interrupt the current,
creating pulse bursts (10 pulses per second). The central office determined the
number dialed and established a circuit to the callee. The callee was alerted by the
ringer of his telephone, and the caller meanwhile received a calling tone. When
the distant party answered, the tone was interrupted and then the circuit carried the
voice. The circuit was released when either party hung up.
The limitations of this system are obvious: the signaling is limited to seizing
circuits, call supervision and disconnect.
In-band signaling: This way of signaling relies on the use of tones at certain
frequencies instead of DC current. The tones are transmitted over the same
circuit as the voice, and thus they must be within the voice band (0 to 4 kHz). The
tones are designed to minimize the possibility of voice frequencies duplicating
the signaling tones, but the scheme is not 100% reliable.
The tones sent can be Single Frequency (SF) tones, still used in some parts
of the telephone network for interoffice trunks; or Multi-Frequency (MF) or
Dual Tone Multi-Frequency (DTMF) tones, mostly used to send dialed digits through
the telephone network to the destination end office.
Apart from the possibility of speech being misinterpreted as signaling
tones, this method uses expensive tone detectors and it is still limited in the
number of different values it can handle.
Out-of-band signaling: It is much the same as in-band signaling, with the
difference that the analog voice carried in the circuit is limited to 3.5 kHz and the
band between that frequency and 4 kHz is left for signaling tones. It has the same
problems as in-band signaling, except that there is no risk of false signaling.
Digital signaling: One of the techniques used for signaling when the telephone
network was digitized was to use certain bits in the voice trunk for signaling (a bit
was robbed from certain frames). This practice did not hurt the quality of the
digitized speech, or at least not enough to be detected by the human ear.
It is more cost effective than the other methods discussed so far, but still
limited regarding the type of signaling it can provide, as it is not message-based.
Common channel signaling: It is digital as well, but its main property is that it
places the signaling information in a time slot or channel separate from the voice
and data it is related to, so the voice or data trunks are used only to carry speech or
user data.
This method is presently used in SS7 and Integrated Services Digital
Network (ISDN). It is capable of sending and receiving messages that can have
unlimited values and thus can be extended to support new functionality.
Moreover, it can not only control the state of telephone calls but also make
queries and fetch data from remote databases to support special services.
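As an illustration of the tone-based methods listed above, the sketch below represents a
DTMF digit as the sum of two sine waves. The frequency pairs are the standard DTMF keypad
assignment, which this chapter does not list. The function names, the 8 kHz sampling rate
and the 50 ms tone duration are assumptions made only for this example.

```python
import math

# Standard DTMF keypad: each key combines one low-group (row) tone
# with one high-group (column) tone.
LOW_GROUP_HZ = (697, 770, 852, 941)
HIGH_GROUP_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) frequency pair in Hz for one keypad symbol."""
    if len(key) == 1:
        for row, symbols in enumerate(KEYPAD):
            col = symbols.find(key)
            if col != -1:
                return LOW_GROUP_HZ[row], HIGH_GROUP_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_s: float = 0.05, rate_hz: int = 8000) -> list[float]:
    """Generate the two-tone signal for one key, scaled to the range [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = round(duration_s * rate_hz)
    return [
        0.5 * (math.sin(2 * math.pi * low * n / rate_hz)
               + math.sin(2 * math.pi * high * n / rate_hz))
        for n in range(count)
    ]
```

All eight frequencies lie inside the 0 to 4 kHz voice band, which is what allows the tones
to travel over the same circuit as the voice, and also why speech can occasionally be
mistaken for signaling.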

Among all these signaling methods, the most important one is the last one, CCS, the
only one that will be discussed further. In the next section we give an overview of SS7
and briefly discuss its functional and protocol architectures.

2.2 The SS7 network: What is that?

CCS is more flexible and powerful than in-channel signaling and it is well suited to
support the requirements of integrated digital networks. The culmination of the transition
of network control signaling from an in-channel to a common-channel approach is SS7,
first issued by the Comité Consultatif International Télégraphique et Téléphonique
(CCITT)³ in 1980, with revisions every four years.
SS7 is designed to be an open-ended common-channel signaling standard that can be
used over a variety of digital circuit-switched networks. The overall purpose of SS7 is to
provide an internationally standardized general-purpose common-channel signaling system
with the following primary characteristics:

Optimized for use in digital telecommunication networks in conjunction with
digital stored program-control exchanges, using 64 kbps digital channels.
However, it is also suitable for operation over analog channels and at speeds
below 64 kbps.
Designed to meet present and future information transfer requirements for call
control, remote control, management, and maintenance.
Provides a reliable means for the transfer of in-sequence information without loss
or duplication.
Suitable for use on point-to-point terrestrial and satellite links.

The scope of SS7 is large, since it must cover all aspects of control signaling for
complex digital networks. The fact that SS7 specifications consist of 53 ITU-T
Recommendations in the Q.7XX series gives an idea of how complex the standard is.
However, the first usage of SS7 was not for call setup and teardown, but for accessing
remote databases. In the 1980s, some telephone companies started to offer a new service
called Wide Area Telephone Service (WATS) that used a common 800 area code regardless of
the destination of the call. But all the telephone-switching equipment at that time
relied on the area code to make its routing decisions through the Public Switched
Telephone Network (PSTN). That problem was solved by assigning a second normal
number to every 800 number, which would have a real area code and thus could be used
for routing. However, the quantity of 800 numbers grew rapidly and it was necessary to
store all of them in a central database that could be accessed by all the central offices.
Therefore, the SS7 network started to be used to fetch routing and billing information from
that central database by making queries inside message packets.
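The translation step described above can be sketched as a simple lookup. In the real
network the query travels inside an SS7 message to the central database. Here a Python
dictionary stands in for that database, and all numbers, names and the function
`translate_toll_free` are hypothetical.

```python
# Hypothetical central database: 800 number -> (routable number, billed party).
TOLL_FREE_DB = {
    "800-555-0100": ("212-555-0147", "Example Airlines"),
    "800-555-0199": ("415-555-0123", "Example Bank"),
}


def translate_toll_free(dialed: str) -> str:
    """Return the number a switch should actually route on."""
    if not dialed.startswith("800-"):
        # Ordinary number: its real area code can be used for routing directly.
        return dialed
    record = TOLL_FREE_DB.get(dialed)
    if record is None:
        raise LookupError(f"no translation for {dialed}")
    routable_number, _billed_party = record
    return routable_number


print(translate_toll_free("800-555-0100"))
```

The second element of each record hints at why the same query can also return billing
information: the party that pays for the call is stored next to the routable number.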
Later, the SS7 network was expanded to provide other services,
including call setup and teardown. Local Number Portability (LNP) is another feature of
the telephone network achieved thanks to SS7, which allows customers to change their
telephone company while keeping the same number they previously used. LNP also
avoids the number change when upgrading the service from Plain Old Telephone Service
(POTS) to ISDN. This service requires the use of a database that is much the same as the
one used for 800 numbers.
SS7 can provide much more than routing and billing information. It provides the
means for switching equipment to communicate with other switching equipment at remote
sites. As an example, if the called number is busy the caller can use a feature known as
automatic callback. Then, when the callee's number becomes available, the network will
ring the caller's telephone. As soon as the caller answers, the called party's telephone
will be rung. This feature relies on the capabilities of SS7 to send messages from one
switch to another switch, allowing the two systems to invoke features within each switch
without setting up a circuit between the two systems.
Seamless roaming is a service of the cellular network that relies on the SS7 protocol.
Cellular providers store their customers' information in databases called Home Location
Registers (HLR), and they share that information with other cellular providers with whom
they have signed agreements. This way, customers no longer have to register with other
service providers when traveling abroad; the visited network is selected automatically.
Today, SS7 is deployed by almost all independent telephone companies and
interexchange carriers. All those subnetworks, owned by telephone companies, cellular
service providers and long distance carriers, are linked together thanks to the SS7 protocol.
This makes SS7 the world's largest data communications network. In the next subsections
we are going to make a quick review of the internal structure of this network. The
interested reader should consult the books from which most of the information
on SS7 included in this paper has been taken: [Rus1998], chapter 10 of [Kes1998]
and chapter 10 of [Sta1995]. Among the many documents on the Internet containing
information about SS7, [PT2000] and [Mod1992] deserve special mention.

³ The CCITT (the International Telegraphy and Telephony Consultative Committee in English) was
renamed in 1993. It is now the Telecommunication Standardization Sector (ITU-T) of the
International Telecommunication Union (ITU), and deals with telephone and data communication
systems.

2.2.1 Functional Architecture of SS7

With common-channel signaling, control messages are routed through the network to
perform call management (setup, maintenance, and termination) and network management
functions. Those control messages are short packets that must be routed through the
network to their final destination. Thus, even if the network being controlled is a
circuit-switched network (the voice trunks), the control signaling is implemented using
packet-switching⁴ technology. In effect, a packet-switched network is overlaid on a
circuit-switched network in order to operate and control the circuit-switched network.
SS7 defines the functions that are performed in the packet-switched network but does
not dictate any particular hardware implementation. For example, all of the SS7 functions
could be implemented in the circuit switching nodes as additional functions; this approach
is the so-called associated signaling mode. Alternatively, there can be separate switching
points that carry only the control packets and are not used for carrying circuits, the
nonassociated signaling mode. Even in the second case, the circuit-switching nodes would
need to implement portions of SS7 so that they could receive the control signals. Today,
the telephone switches used in many exchange offices perform signaling functions. This is
usually done by using an adjunct computer that is connected to the network through a
digital link. Those computers are called Signaling Points (SP). They are in charge of
switching messages through the network using transfer points to route those messages from
one end office to another one, and they also provide access to databases.
All nodes in the SS7 network are called signaling points. A signaling point has the
ability to perform message discrimination (read the address and determine if the message
is for that node), as well as to route SS7 messages to another SP. When using SS7 to
support the Intelligent Network (IN) service we can find three different types of SPs:

Service Switching Point (SSP).
Signal Transfer Point (STP).
Service Control Point (SCP).

⁴ A circuit-switched network is one in which a circuit is reserved and uniquely dedicated for transferring
data between two endpoints. Once reserved, that circuit cannot be used by any other endpoint, even if it
remains idle. In a packet-switched network, the resources are shared and used by all the endpoints with no
dedicated circuits.

SPs provide access to the SS7 network, provide access to databases used by switches
inside and outside of the SS7 network, and transfer SS7 messages to other SPs within the
network. They are all connected together by signaling links that provide the speed
necessary for SS7 message delivery⁵. This functional architecture is shown in Figure 2-2.

Figure 2-2: Functional architecture of SS7
[Diagram of SSPs, STPs and SCPs joined by signaling links. Legend: Access links (A),
Bridge links (B), Cross links (C), Diagonal links (D), Extended links (E), Fully
associated links (F); SSP = Service Switching Point, STP = Signal Transfer Point,
SCP = Service Control Point]

Both SPs and signaling links are always deployed in pairs for redundancy and
diversity. SS7 makes sure that the network is always operational by providing alternate
paths in the event of failures. This ensures that messages can always reach their
destinations.
The network is deployed at two distinct levels, or planes. There is an international
plane, using the ITU-T standard of the SS7 protocol, and there is the national plane. The
national plane uses whatever standard exists within the country in which it is deployed.
For example, in the United States, the American National Standards Institute (ANSI)
version is the standard for the national plane, and it is the version of SS7 that will be
discussed in this paper. In other nations, there may be one or several different versions
of national protocols for SS7 and, while similar, they have fundamental differences. Yet
all countries are capable of communicating with one another through gateways that convert
the national version of the SS7 protocol to the international version of the SS7 protocol.
This ensures that all nations can interwork with the rest, while still addressing the
requirements of their own distinct networks.
In the next subsections we will take a closer look at the different SPs and signaling
links.

⁵ ITU-T specifies a bit rate of 64 kbps, used almost everywhere in the world. The U.S. and Japan are
exceptions to this model, using 56 kbps and 4.8 kbps respectively. The 64 and 56 kbps links are usually
single DS0 channels of the digital signal hierarchy. Future broadband networks might use T1 facilities at
1.536 Mbps.

2.2.1.1 The Service Switching Point (SSP)

The SSP is the local exchange in the telephone network. An SSP can be a combination
of a voice switch and an SS7 switch, or an adjunct computer connected to the local
exchange's voice switch.
The SSP must convert signaling from the voice switch into SS7 signaling messages,
which can then be sent to other exchanges through the SS7 network. The exchange will
typically send messages related to its voice circuits to the exchanges with a direct
connection to it. In the case of database access, the SSP will be sending database queries
through the SS7 network to computer systems located centrally to the network.
The SSP's function is to use the information provided by the calling party (such as
dialed digits) to determine how to connect the call using its routing tables. It will send
an SS7 message to the appropriate adjacent exchange requesting a circuit connection.
The adjacent exchange acknowledges the request, granting permission to connect this
trunk. This same procedure is repeated, connecting trunks between several adjacent
exchanges until the final destination is reached.
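The hop-by-hop procedure just described can be sketched as a loop over routing tables.
This is only a toy model under stated assumptions: the exchange names, the three-digit
prefixes, the `LOCAL` marker and the function `connect_call` are all hypothetical, and the
request and acknowledgement messages are reduced to a comment.

```python
LOCAL = "LOCAL"  # marker: the callee is attached to this exchange

# Hypothetical routing tables: exchange -> {dialed prefix: adjacent exchange}.
ROUTING_TABLES = {
    "A": {"212": "B", "415": "C"},
    "B": {"212": LOCAL},
    "C": {"415": "D"},
    "D": {"415": LOCAL},
}


def connect_call(origin: str, dialed_digits: str, max_hops: int = 10) -> list[str]:
    """Return the chain of exchanges whose trunks are connected for the call."""
    prefix = dialed_digits[:3]
    path = [origin]
    current = origin
    for _ in range(max_hops):
        next_exchange = ROUTING_TABLES[current].get(prefix)
        if next_exchange is None:
            raise LookupError(f"exchange {current} has no route for prefix {prefix}")
        if next_exchange == LOCAL:
            return path  # final destination reached
        # A real SSP would now send an SS7 message to next_exchange requesting
        # a circuit, and wait for the acknowledgement before connecting the trunk.
        path.append(next_exchange)
        current = next_exchange
    raise RuntimeError("too many hops, possible routing loop")


print(connect_call("A", "4155550123"))
```

Each exchange only knows its adjacent exchanges, so the path is built one trunk at a time,
which is the behaviour the text attributes to the SSPs.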
Many SSP functions are accomplished by adding a computer adjunct to existing
switches. This computer receives signals from the voice switch that are used to trigger the
transmission of specific SS7 messages. Using adjuncts allows telephone companies to
upgrade their SS7 SPs without replacing expensive switches, providing a modular
approach to networking. Upgrades are typically limited to software loads.
An SSP must be able to send messages using the ISDN User Part (ISUP)
protocol and the Transaction Capabilities Application Part (TCAP) protocol (see section
2.2.2).

2.2.1.2 The Signal Transfer Point (STP)

All SS7 packets travel from one SSP to another through at least one STP. The STP
acts as a router in the SS7 network and does not usually originate or terminate messages.
An STP is also typically an adjunct to a voice switch; only rarely is it a stand-alone
system built for the sole purpose of STP functionality. There are three levels of STPs:

A national STP exists within a national network and is capable of transferring
messages using the same national standard protocol. Messages may be passed to
another level of STP but the national STP has no capability of converting
messages into another version or format.
An international STP works the same as the national STP, but it operates in the
international network. The international network provides interconnectivity
between worldwide networks using the ITU-T standards. All nodes connecting to
the international STP must use the ITU-T protocol standard.
The gateway STP provides protocol conversion from a national standard to the
ITU-T standard or some other standard, and vice-versa. A gateway STP is often
used as an access to the international network.

The gateway STP serves as the interface into another network. Long distance service
providers may have access into the local telephone company's database for subscriber
information, or the local service provider may need access into the long distance service
provider's database. In any case, this access is accomplished through a gateway STP.
Gateway STPs use screening features to maintain network security. Screening is the
capability to examine all incoming and outgoing packets and allow only those that are
authorized.
When considering the network constraints in terms of performance, one STP level
seems preferable. However, considerations of reliability and availability dictate a solution
with more than one level. The following guidelines are suggested by ITU-T:

In a hierarchical signaling network with a single STP level:
  Each SP that is not also an STP is connected to at least two STPs.
  The meshing of STPs is as complete as possible.
In a hierarchical signaling network with two STP levels:
  Each SP that is not an STP is connected to at least two upper level STPs.
  Each STP in the lower level is connected to at least two upper level STPs.
  The STPs in the upper level are fully meshed (footnote 6).

In Figure 2-2 we see an example of a hierarchical network with two STP levels. The
four STPs at the lower part of the figure could be national STPs, while the other two STPs
could be gateway STPs (or international STPs if the national SS7 is the ITU-T standard).
Apart from the basic routing tasks, the STP performs measurements. There are two basic
types of measurements: traffic measurements and usage measurements.
Traffic measurements provide peg counts and statistical information regarding the type
of messages entering and leaving the network. For maintenance purposes, network events
are also recorded (such as link out-of-service duration, local processor outage, etc.).
Because of the speed of the network and the speed with which SS7 entities respond to
problems, traffic measurements are the best way for maintenance personnel to keep track
of what is happening in the network and to prevent network failures.
Usage measurements are always peg counts and record the number of messages by
message type that enter and leave the network. These peg counts are aggregated by a
collection process and stored on magnetic tape. The tape is then used to create invoices
for the network's customers.
In the local SS7 network, the STP receives messages from the SSP. These packets are
related either to call connections or to database queries. Database access is provided through
another SS7 entity, the SCP (see next section). If the SSP does not know the address of the
destination SCP, the STP must provide the address. In this case, the SSP sends a database
query directed to the local STP. The STP will look at the dialed digits (the so-called global
title digits) and determine, through its translation tables, the address of the database. This is
referred to as global title translation.
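The translation step described above can be illustrated with a small lookup. This is only a minimal sketch: the table contents, the use of longest-prefix matching, and the function name translate_global_title are assumptions made for illustration, and the returned address is modeled as a (point code, subsystem number) pair.

```python
def translate_global_title(digits, table):
    """Return the (point_code, subsystem_number) of the longest matching
    digit prefix in the translation table, or None if nothing matches.

    Longest-prefix matching is an assumption of this sketch.
    """
    for length in range(len(digits), 0, -1):
        entry = table.get(digits[:length])
        if entry is not None:
            return entry
    return None


# Hypothetical translation table: digit prefix -> (point code, subsystem number)
example_table = {
    "800": (1001, 254),
    "800555": (1002, 254),
}

print(translate_global_title("8005551234", example_table))
print(translate_global_title("2125551234", example_table))
```

An SSP that does not know the address of the SCP would hand the dialed digits to the STP, which performs a lookup of this kind and routes the query to the resulting address.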
The STP is the most versatile of all the SS7 entities, providing a wide array of services
to the users of the network.

6. We say that we have a full mesh when every STP has a direct link to every other STP.

2.2.1.3 The Service Control Point (SCP)

The SCP serves as an interface to telephone company databases. The SCP does not
necessarily store the information, but acts as an interface to the mainframe or
minicomputer system that houses the information. These databases are used to store
information regarding subscribers' services, routing of special service numbers or calling
card validation and fraud protection.
The SCP is usually a computer used as a front end to the database system. This
database system is usually linked to the SCP through X.25 links, but in an integrated
STP/SCP, the database is resident in the SCP. The SCP can perform protocol conversion
from SS7 to X.25, or it can provide an interface to access the database directly. The
protocol used to access and interface to the databases is TCAP (see section 2.2.2.3).
The type of database depends on the network. Each service provider has different
requirements, and their databases will differ. The databases most commonly used within
either of these networks are:

Call Management Services Database (CMSDB): Provides routing instructions for
special service numbers (such as 800, 976 or 900 numbers) and billing
information. It also provides routing instructions to avoid congested nodes.
Local Number Portability (LNP): This database contains the necessary
information that allows subscribers to be able to change telephone companies
without having to change their telephone numbers. As the office code portion of a
telephone number can no longer be used to identify the destination, this database
provides the needed information to route the call.
Line Information Database (LIDB): It provides information regarding
subscribers, such as calling card service, third-party billing instructions, and
custom calling features such as call forwarding and speed dialing.
Business Services Database (BSD): The purpose of this database is to allow
subscribers to store call processing instructions, network management procedures,
and other data relevant only to their own private network.
Home Location Register (HLR): This kind of database appears in cellular
networks. The HLR stores information regarding billing, services allowed, as well
as the current location of the cellular telephone.
Visitor Location Register (VLR): It is used to store the current locations for the
visiting subscribers when they roam outside of their home areas.

As seen, each database contains information for a specific application. Each database
is also given an address, called a subsystem number, used in routing queries from SSPs
through the SS7 network to the actual database entity.

2.2.1.4 The Signaling Links

Links are bi-directional and full-duplex, working at speeds varying from 4.8 kbps to
1.536 Mbps, depending on the national SS7 network standard.
Links are placed into groups, called linksets. All the links in a linkset must have the
same adjacent node. The switching equipment will alternate transmission across all the
links in a linkset to ensure equal usage of all facilities. Up to 16 links can be assigned to
one linkset. In the common case that a node has links to a mated STP pair, the links are
assigned to two linksets, one linkset per node. Both linksets can then be configured as a
combined linkset. Combined linksets are used for load sharing, where the sending SP can
send messages to both STPs of the pair, spreading the traffic load evenly across the links.
Inside the SS7 network, alternate linksets are used to provide alternate paths for
messages. An alternate linkset is used when congestion conditions occur over the primary
links, thus taking advantage of the available diversity of paths to overcome congestion.
Links must remain available for SS7 traffic at all times, with minimal downtime (a
maximum of 10 minutes downtime per year is allowed for any one linkset). When a link
fails, the other links within its linkset must take the traffic. Likewise, if an SS7 entity (such
as an STP) fails, its mate must assume the load. This means links can suddenly be
burdened with more traffic than they can handle. For this reason, SS7 entities are restricted
to loading any link to less than 40% of its capacity. In case of a failure, any link can suddenly be
responsible for the failed link's traffic too. Even at 80%, the links still have enough capacity to
carry SS7 network management messages as well as the extra traffic.
If the average message length is 40 bytes, i.e. 320 bits, and we consider the ANSI
specifications of SS7 with 56 kbps links, working at 40% gives 22.4 kbps of available
capacity, which can carry up to 70 messages per second. This simple formula is used to
dimension the network.
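The dimensioning rule above can be written as a short calculation. The following is a minimal sketch; the function name and its parameters are illustrative and do not come from any SS7 specification.

```python
def messages_per_second(link_rate_bps, max_load, avg_message_bytes):
    """Message rate a link can carry at the given engineered load fraction."""
    usable_bps = link_rate_bps * max_load        # capacity used in normal operation
    bits_per_message = avg_message_bytes * 8     # 40 bytes -> 320 bits
    return usable_bps / bits_per_message


# ANSI 56 kbps link, 40% engineered load, 40-byte average message
rate = messages_per_second(56_000, 0.40, 40)
print(round(rate))  # 70
```

Changing the link rate or the average message length shows how sensitive the link budget is to each parameter.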
As seen in Figure 2-2, signaling links are labeled according to their function. There are
six different types of links used in SS7:

Access links (A): They are used between the SSP and the STP, or SCP and STP.
These links provide access into the network and to databases through the STP.
There are always at least two A links, one to each of the home STP pairs (except
in the highly unusual case that STPs are not deployed in pairs). The maximum
number of A links connecting an SSP to one STP is 16. A links can be configured
in a combined linkset, with 16 links to each STP, providing 32 links to the mated
pair.
Bridge links (B): B links are used to connect mated STPs to other mated STPs at
the same hierarchical level. B links are deployed in a quad fashion, as seen in
Figure 2-2. A maximum of eight B links can be deployed between mated STPs.
Cross links (C): The C links connect an STP to its mate STP. Normal SS7 traffic
is not routed over these links, except in congestion conditions or when a node
becomes isolated and the only available path is over the C links. The only
messages that travel between mated STPs during normal conditions are network
management messages. At most eight C links can be used between STP pairs.
Diagonal links (D): D links are used to connect mated STP pairs at a primary
hierarchical level to another STP mated pair at a secondary hierarchical level.
Otherwise, their characteristics are identical to those of C links.
Extended links (E): They are used to connect to remote STP pairs from an SSP.
They are used as an alternate route for SS7 messages in the event that congestion
occurs within the home STP pairs. A maximum of 16 E links may be used
between any remote STP pairs.
Fully associated links (F): F links are used when a large amount of traffic may
exist between two SSPs, or when it is not economical to provide a direct
connection between an SSP and an STP. When traffic is particularly heavy
between two end offices, the STP may be bypassed altogether. Only call setup and
teardown procedures would be sent over this linkset.


There is no intrinsic difference between the various links; only the way in which the links are
used during message transfer and their interaction with network management differ.

2.2.2 Protocol Architecture of SS7

So far, we have been discussing SS7 architecture in terms of the way in which
functions are organized to create a packet-switching control network. The term architecture
can also be used to refer to the structure of protocols that specify SS7. Like the Open
Systems Interconnection (OSI) model (footnote 7), the SS7 standard is a layered architecture. The
term level in SS7 is used in the same context as layer in the OSI model. Figure 2-3 shows
the current structure of SS7 (in its ANSI version) and relates it to OSI.

Figure 2-3: SS7 Protocol Architecture

Some of the functions called for in the OSI model have no purpose in the SS7 network
and are, therefore, undefined. It should also be noted that the functions in the SS7 protocol
have been refined over the years and tailored for the specific requirements of the SS7
network. For this reason, there are many discrepancies between the two protocols and their
corresponding functions. Regardless of the differences, the SS7 protocol has proven to be a
highly reliable packet-switching protocol, providing all of the services and functions
required by the telephone service providers. SS7 continues evolving to adapt to bigger
networks and new services provided by telephone companies.
The lowest three levels of the SS7 architecture, referred to as the Message Transfer
Part (MTP), provide a reliable but connectionless (datagram style) service for routing
messages through the SS7 network.

7. The OSI model was developed and published in 1982 by the International Organization for
Standardization (ISO). Its name comes from the fact that it deals with connecting open systems, that is, systems that are open
for communication with other systems. It has seven layers, each of them performing a well-defined task, and
it is often used to describe the kind of functions that a protocol provides. For a good yet short introduction to
OSI read section 1.4.1 of [Tan1996].
[Figure 2-3 content: the SS7 protocol stack (MTP Level 1, MTP Level 2, MTP Level 3, SCCP, TCAP, ISUP) drawn alongside the OSI layers (1 Physical, 2 Data Link, 3 Network, 4 Transport, 5 Session, 6 Presentation, 7 Application).]

MTP does not provide the complete set of functions and services specified in the OSI
layers 1-3, most notably in the areas of addressing and connection-oriented service. In the
1984 version of SS7, an additional module was added, which resides in level four of SS7,
known as the Signaling Connection Control Part (SCCP). The SCCP and MTP together
are referred to as the Network Service Part (NSP). SCCP defines a variety of different
network-layer services to meet the needs of various users of NSP. The remaining
modules of SS7 are considered to be at level four and comprise the various users of NSP.
NSP is simply a message delivery system; the remaining parts deal with the actual contents
of the messages. The ISDN User Part (ISUP) provides for the control signaling needed in
an ISDN to deal with ISDN subscriber calls and related functions, mostly to set up and tear
down telephone connections between end offices. ISUP was derived from the Telephone
User Part (TUP) (which is the ITU-T equivalent to ISUP and not used in the ANSI SS7
specifications). Apart from TUP functionality, ISUP offers the added benefit of supporting
IN functions and ISDN services. The Transaction Capabilities Application Part (TCAP),
first introduced in 1988, provides the mechanisms for transaction-oriented (as opposed to
connection-oriented) applications and functions.
There are some other protocols that, like TUP, are part of the SS7 family but do not
appear in Figure 2-3, for example the Broadband ISDN User Part (BISUP), which is used
for setting up and tearing down Broadband ISDN (BISDN) circuits and is still
being refined. Another protocol that is not present in Figure 2-3 is the Data User Part
(DUP), designed to provide data transmission capabilities for circuit-mode data networks.
Unlike ISUP, DUP is not intended for ISDN; it is already obsolete and is not currently in use
in North American SS7 networks.
In the next subsections we will discuss the user and application parts of primary
importance to SS7.

2.2.2.1 The Message Transfer Part (MTP)

The MTP protocol is at the lowest level in the SS7 protocol stack, and it is a transport
protocol used by all the other members of the SS7 suite. It is actually divided into three
different levels with the same functionality as layers one, two and three of the OSI model.
MTP Level 1 (MTP1) allows the use of any digital-type interface. Common interfaces
in most SS7 networks today include E1 (2.048 Mbps; 32 channels of 64 kbps), DS1 (1.544
Mbps; 24 channels of 64 kbps), V.35 (64 kbps), DS0 (64 kbps), and DS0A (56 kbps). To be
compatible with some older versions of SS7 (such as the one deployed in Japan), SS7 can operate
at speeds as low as 4.8 kbps, although that can cause unacceptably long delays. The most
common interface in the U.S. is DS0A, but there already exist some preliminary standards
on the usage of a full DS1 facility at 1.544 Mbps as a signaling link, which will reduce the
number of multiplexers used in the network (saving one level of multiplexing, from one
DS1 channel to 24 DS0 channels).
MTP Level 2 (MTP2) provides the functions necessary for basic error detection and
correction. This protocol is concerned only with the reliable delivery of signal units
between two exchanges or SPs; it does not consider anything outside of the signaling link and
has no knowledge of the final destination.
This level provides flow control functionality and sequence numbering of the signal
units sent across this point-to-point signaling link.
Another function of level two is error correction. There are two types of error
correction procedures. Basic error correction is used for links with a delay under 15ms. It
uses Go-Back-N retransmission, where a bad frame (lost or corrupted) and all
subsequently transmitted frames are retransmitted by the sender. The Preventive Cyclic
Retransmission (PCR) scheme is used in links such as satellite signaling links with longer
delay. In PCR the transmitted signal units are retransmitted automatically during idle
periods until they are acknowledged.
MTP2 also performs two important signaling link error rate monitoring functions. The
signal unit error rate monitor counts signal unit errors using a leaky bucket scheme: a
counter is incremented by one whenever a signal unit with errors is detected, and is
decremented by one after every 256 signal units received (as long as the counter is
positive). If the counter reaches 64, an indication is sent to MTP3.
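The leaky bucket scheme of the signal unit error rate monitor can be sketched as follows. This is an illustration built only from the figures given above (increment per errored signal unit, decrement every 256 signal units, threshold of 64); the class name, the method name, and the order in which increment and decrement are applied within one signal unit are assumptions.

```python
class SignalUnitErrorRateMonitor:
    """Leaky-bucket error counter (hypothetical class, for illustration only)."""

    def __init__(self, threshold=64, leak_interval=256):
        self.threshold = threshold          # counter value that triggers the indication
        self.leak_interval = leak_interval  # signal units between decrements
        self.counter = 0
        self.units_since_leak = 0

    def on_signal_unit(self, has_error):
        """Process one received signal unit.

        Returns True when an indication should be sent to MTP3.
        """
        if has_error:
            self.counter += 1
        self.units_since_leak += 1
        if self.units_since_leak == self.leak_interval:
            # The bucket leaks by one, but never below zero.
            self.units_since_leak = 0
            if self.counter > 0:
                self.counter -= 1
        return self.counter >= self.threshold
```

With these values, an isolated error every 256 signal units never accumulates, whereas a burst of 64 errored signal units in a row reaches the threshold.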
The alignment error rate monitor is used to ensure that signal unit alignment is
maintained: alignment is considered to be lost if more than 6 consecutive one bits are
received (footnote 8) or if a signal unit is received that is greater than the allowed maximum size. A
counter is incremented after the receipt of every 16 octets until alignment is reestablished.
If the counter crosses a threshold, an appropriate indication is passed to MTP3.
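The bit stuffing rule that makes this alignment check possible (described in footnote 8) can be sketched as follows. Bits are modeled as strings of "0" and "1" for readability, and the function names are hypothetical; a real implementation would operate on the bit stream in hardware or on packed bytes.

```python
FLAG = "01111110"  # opening and closing flag of a signal unit


def stuff_bits(bits):
    """Insert a 0 after every run of five consecutive 1s (sender side)."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                out.append("0")  # stuffed bit
                ones = 0
        else:
            ones = 0
    return "".join(out)


def unstuff_bits(bits):
    """Delete the bit that follows five consecutive 1s (receiver side).

    Assumes the input is a correctly stuffed payload, so that bit is a 0.
    """
    out = []
    ones = 0
    skip_next = False
    for b in bits:
        if skip_next:
            skip_next = False
            ones = 0
            continue
        out.append(b)
        if b == "1":
            ones += 1
            if ones == 5:
                skip_next = True
        else:
            ones = 0
    return "".join(out)


payload = "01111110"
print(stuff_bits(payload))  # 011111010
```

Because a stuffed payload never contains six consecutive 1s, the flag pattern cannot appear inside a signal unit, and a longer run of 1s at the receiver indicates that alignment has been lost.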
The MTP Level 3 (MTP3) protocol has the responsibility of transporting messages
between SPs. This layer performs two broad categories of functions: network
management and message handling. Network management is in charge of providing
reconfiguration of the signaling network in the case of link or SP failures. It also controls
traffic in case of congestion. The signaling network management functions are:

Signaling link management: Activates new links and reinitializes or removes
from operation failed signaling links (following MTP2 indications).
Signaling link management only informs the adjacent SP at the other end of the
problematic link about the problem. Therefore, signaling link
management is a local function. Traffic rerouting due to link failures is not a task
of signaling link management.
Another feature provided by some SSPs that is the responsibility of signaling link
management is automatic allocation, which consists of taking voice circuits out of
service to use them as SS7 signaling links, and vice versa.
Signaling traffic management: Performs a task similar to that of signaling
link management, since it also deals with signaling link replacement. However, it
deals with signaling links that have suffered a complete malfunction (for example,
when a backhoe has dug up a link facility). The messages used to remove the signaling link
that caused the trouble are sent through a different path. So, the basic difference
between signaling traffic management and signaling link management lies in the
mechanism used to inform the adjacent SP about the failure. Signaling link
management is then the one in charge of taking the link out of service.
Signaling route management: It is used to advise other SPs about the inability of
one SP to reach another SP. Therefore, when an SP realizes that it cannot
communicate with an adjacent SP, it tells the other SPs to avoid sending signal
units to the unreachable SP.

Inside the message handling category we can find these three major functions:

Message discrimination: Determines whether an MTP2 message belongs to this SP
or another based upon the message's routing label. If the routing label contains the
address of the local signaling point, the message is handed off to message
distribution. Otherwise, it is passed to message routing.
Message distribution: If the message belongs to this SP, the message is passed to
the appropriate MTP user (ISUP or TCAP) or MTP3 function.
Message routing: If the MTP2 message received is to be relayed to another SP, or
if the message originated at this SP, it must be forwarded through another
signaling link chosen using the information provided by the routing table.

8. MTP2 uses the fixed bit pattern 01111110 as the opening and closing flag of the signal units. As
a result, the sender must apply bit stuffing, inserting a 0 after every five consecutive 1s. At the
receiver, any 0 following five consecutive 1s is deleted.
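The interplay of the three message handling functions can be sketched as a single dispatch routine. This is a simplified illustration: the dictionary-based message, the field names dpc and user, the handler table, and the routing table format are all assumptions, not the actual MTP3 message layout.

```python
def handle_message(message, local_point_code, routing_table, users):
    """Discriminate an incoming message, then distribute or route it.

    message          -- dict with a destination point code ("dpc") and a user part ("user")
    local_point_code -- point code of this SP
    routing_table    -- hypothetical mapping: destination point code -> outgoing link
    users            -- hypothetical mapping: user part name -> handler function
    """
    # Message discrimination: does the routing label address this SP?
    if message["dpc"] == local_point_code:
        # Message distribution: hand the message to the local MTP user.
        handler = users[message["user"]]
        return ("delivered", handler(message))
    # Message routing: choose the outgoing signaling link from the routing table.
    return ("forwarded", routing_table[message["dpc"]])


users = {"ISUP": lambda m: "handled by ISUP"}
routing_table = {2: "linkset-A"}

print(handle_message({"dpc": 1, "user": "ISUP"}, 1, routing_table, users))
print(handle_message({"dpc": 2, "user": "ISUP"}, 1, routing_table, users))
```

A message that originates at this SP would enter the routing branch directly, since its destination point code differs from the local one.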

As seen, a large part of the signaling network functional specification is concerned
with procedures for overcoming link failures and congestion. Procedures are specified for
quickly determining when a link has failed, removing it from service, rerouting traffic, and
bringing the link back into service after repair. There is an overriding concern for network
reliability, the goal being 99.998% availability. This goal is achieved in SS7 by both
equipment redundancy and the network's dynamic reconfiguration and rerouting functions.
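To relate this availability goal to the downtime allowance of about 10 minutes per year mentioned for linksets in section 2.2.1.4, the conversion can be sketched as follows. The function name is illustrative, and a non-leap year of 365 days is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability):
    """Yearly downtime implied by an availability fraction (e.g. 0.99998)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


print(round(downtime_minutes_per_year(0.99998), 1))  # 10.5
```

An availability of 99.998% therefore corresponds to roughly ten and a half minutes of downtime per year, consistent with the figure quoted earlier.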

2.2.2.2 The Signaling Connection Control Part (SCCP)

The MTP was originally designed to meet the real-time requirements of telephone
network signaling and, for that reason, provides a connectionless network service. Some
applications, however, require a connection-oriented transfer capability and a larger, more
complete address space than the MTP makes available.
The MTP provides both the Origination Point Code (OPC) and the Destination Point
Code (DPC), each 14 bits long. In both cases, the point code is from a node-to-node
perspective. Moreover, MTP has a limited distribution capability at the node using a 4-bit
indicator in the Service Information Octet (SIO) field of a signal unit. This addressing
capability is adequate only for a very limited set of services.
One major enhancement provided by the SCCP is its expanded addressing
functionality. The SCCP supplements MTP addressing by defining an additional field
called the Subsystem Number (SSN), which consists of local addressing information used
to identify SCCP users at each node. The combination of OPC plus SSN forms the calling
party address, and the DPC plus SSN is the called party address.
Another SCCP enhancement is its ability to use global titles as addresses. A global
title is a special address, such as an 800 number, that does not provide information usable
for routing. SCCP is the protocol that performs the global title translation.
In practice, SCCP is used only with TCAP, although the standards also provide for its use with ISUP (as
shown in Figure 2-3). This could in theory allow ISUP messages associated with an
already established connection to be routed using end-to-end routing, as with TCAP
messages. However, that functionality has not been implemented in SS7 networks.

2.2.2.3 The Transaction Capabilities Application Part (TCAP)

TCAP provides a general purpose, remote operation function for SS7. It provides the
capability for an application at one node to invoke the execution of an operation at another
node and to receive the results from that remote process. TCAP was originally designed to
support queries into databases, although its role can include additional functions.
TCAP comprises two protocol sublayers called the Transaction Sublayer (TSL) and
the Component Sublayer (CSL). The TSL is the lower TCAP sublayer and it defines how
the transaction or dialogue will take place, that is, what will be the context in which the
remote operation will take place. There are two types of dialogues, the unstructured
dialogue, which is a one-way communication in which the remote peer processes our
message but does not send any response back, and the structured dialogue, which is
analogous to a virtual connection where queries produce responses.
The CSL is the upper TCAP sublayer, and defines the actual messages, called
components, that are contained in the TSL messages. There are four types of CSL
components: invoke (to request a remote operation), return result (containing the response
of the requested operation), return error (indicating some kind of error), and reject
(indicating some kind of syntax error). Both invoke and return result have single-message and
multiple-message versions (in case a single message is not enough).
The TCAP services are provided to an upper user application which is called the
Application Service Element (ASE), responsible for providing the information that a
specific application needs, such as translating an 800 number into a routable number or
obtaining a billing number from a telephone calling card.

2.2.2.4 The ISDN User Part (ISUP)

ISUP is a circuit-related protocol, used to set up, manage and release trunks carrying
voice and data calls over the PSTN. It is used for both ISDN and non-ISDN calls and it
was adopted by the ANSI SS7 to replace TUP, which did not support data transmission or
digital circuits. However, ISUP does not support broadband technologies. These new
technologies will be addressed by a new version of ISUP called BISUP, which is still under
development by the ITU-T.
ISUP may use the transport services provided by either the MTP or SCCP as can be
seen in Figure 2-3. However, the interface between ISUP and SCCP has not been
implemented yet. MTP services are used for the transport of call-related signaling
messages between ISDN central offices, while the SCCP may be employed for additional
connectivity services as well as end-to-end signaling.
ISUP is compatible with the ISDN protocol, which was developed as an extension of
SS7 to the subscriber. The purpose of the ISDN compatibility is to allow subscriber
switches to send signaling information to remote subscribers. This can be used to support
called-invoked features such as conference calling or automatic callback.
Not all SS7 networks use ISUP as the basis for ISDN services. Most of the
European countries, North America and Japan use ISUP, but the United
Kingdom, for example, uses the National User Part (NUP), developed in the early 1980s and largely
based on TUP because ISUP was not yet available.

2.3 The largest computer network: The Internet

How many people have not heard about the Internet? In the developed countries, not
that many. In the last few years the Internet and the services it offers have become a mass
phenomenon that is quickly gaining more and more popularity. In the near future
having an Internet connection at home will be as normal as having a TV set or a telephone
line is today. Behind this growing use there is a quite old protocol, the Internet Protocol
(IP) [Pos1981a]. IP has proven to be a very robust network protocol that can face the
introduction of new technologies with minimal changes, being still valid about 30 years
after its initial design. This is because, unlike most older network layer protocols, it was
designed from the beginning with internetworking in mind.
The number of users of the Internet has always been growing; however, it was not
until recently that the Internet user community became a significant group. This
happened thanks to new kinds of applications that make use of the Internet
and that make a new era of communication possible.
In the next subsection we briefly review the history of the Internet, from its
beginning in the late 1960s to the new millennium. Then we will briefly discuss the
structure of IP and the protocols that use its services to communicate through the Internet.

2.3.1 A quick history of the Internet: From military use to worldwide
business tool

In the mid-1960s, at the height of the Cold War, the Department of Defense (DoD) of
the U.S. wanted to have a network that could survive a nuclear war (knowing who would
make use of that network after the nuclear attack was another issue). As traditional
circuit-switched networks were considered too vulnerable because the loss of a single node or line
would terminate all the communications using it and could even split the network, the DoD
turned to its research arm, the Advanced Research Projects Agency (ARPA) (footnote 9), to
investigate a new network using the then-radical idea of packet switching. With a
datagram subnet, if some lines or nodes were destroyed the messages could be
automatically rerouted along alternative paths.
ARPA gave some grants to universities to investigate this topic, and finally in
December 1969, a packet switching network with four nodes was born, the ARPANet.
ARPANet grew rapidly, and a few years later, experiments showed that the existing
ARPANet protocols were not suitable for running over multiple networks. This
observation led to more research on protocols, culminating in the invention of the
Transmission Control Protocol (TCP) [Pos1981c], and the TCP/IP model in 1974. TCP/IP
was specifically designed to handle communication over internetworks.
By 1983 ARPANet was stable and successful, with more than 200 networks and
hundreds of hosts, TCP/IP being the only standard protocol used. The Domain Name
System (DNS) [Moc1987] was created during the 1980s to organize machines into
domains and map hostnames into IP addresses. In the 1980s the ARPANet was connected
to several nodes outside the U.S., mostly in Europe and Japan, but the real growth and
evolution of the Internet was happening in North America.
By 1990, the ARPANet had been overtaken by newer networks that it itself had
spawned, so it was shut down and dismantled. But already in the late 1970s the National
Science Foundation (NSF) of the U.S. had recognized the deep impact that the ARPANet
had on the universities and research centers, as it was a very good means to share ideas and
projects. The main problem was that universities wanting to join the ARPANet had to
have a research contract with its owner, the DoD. This was not always the case, so the NSF
began designing a high-speed successor to the ARPANet, open to all the universities, and
the result of this research was the NSFNet, founded in the mid-1980s using the same
hardware as the ARPANet.
NSFNet was an instantaneous success. A few years later it connected thousands of hosts
placed in universities, research laboratories, libraries and museums, including the
computers connected to the ARPANet.

9. ARPA was founded in 1958, after the Soviet Union launched Sputnik 1 into Earth orbit in 1957, with the mission of
applying state-of-the-art technology to U.S. defense and avoiding being surprised by technological advances
of the enemy, i.e. the Soviet Union.

NSFNet became a victim of its own success, and in the subsequent years the links used for the
backbone had to be upgraded from 56 kbps links at its foundation to 1.5 Mbps links in
1990. However, these upgrades were not free of charge, and it became obvious that the
government could not finance networking forever. So that same year some companies
formed a nonprofit corporation called Advanced Networks and Service (ANS), this being
the first step toward the commercialization of the NSFNet. ANS took over the NSFNet
and upgraded its links to 45 Mbps. By this time the Internet connected around 200,000
computers in about 3,000 networks.
In 1991, the U.S. Congress approved the creation of the National Research and
Education Network (NREN), the successor of the NSFNet, designed to run at gigabit
speeds. During the early 1990s commercial companies started to deploy their own IP-based
networks, so the NSFNet backbone was no longer needed. It was sold to America Online
in 1995, and since then the Internet as a whole has no longer been maintained by the U.S.
federal or local governments.
Until the early 1990s the traditional services provided by the Internet were e-mail (the
most popular application since the ARPANet times), news (international discussion forums
on the most varied topics), remote login (normally using the Telnet [Pos1983]
protocol to access remote computers) and file transfer (copying files with the
File Transfer Protocol (FTP) [Pos1985] or the Trivial File Transfer Protocol (TFTP)
[Sol1992]). Those services were mostly used by academic, government and industrial
researchers. But in 1990 Tim Berners-Lee, a scientist working at the Conseil Européen
pour la Recherche Nucléaire (CERN)¹⁰, created the Hypertext Transfer Protocol (HTTP)
[Ber1996], the language computers would use to communicate hypertext documents¹¹ over
the Internet. He also designed a scheme to give documents addresses on the Internet,
the Uniform Resource Identifier (URI) [Ber1994]. At the end of 1990 he created a server
of hypertext documents and a client program (browser) to retrieve and view those
documents. He called this application the World Wide Web (WWW).
The next year, 1991, he made his web server and client software publicly available on
the Internet, and what we know today as the Web started to take off. Berners-Lee's browser
was specifically designed for the personal computer he was using, so others, mostly
students, started to write their own web browsers. Among those early web browsers
was Erwise, written by students of the Helsinki University of Technology¹², which
ran on UNIX machines. The first browser with multimedia support was Mosaic,
written at the National Center for Supercomputing Applications (NCSA) in 1993. After
that, developments came too quickly to follow in detail.
The WWW was the killer application that attracted millions of new,
nonacademic users to the net and made it so popular. Its first use was to
make the sharing of documents among scientists and researchers easier, but nowadays its
use is mostly commercial. Virtually every well-known company has its own web
page selling its products, and governments maintain web pages where many bureaucratic
procedures can be completed.

¹⁰ In English, the European Council for Nuclear Research.
¹¹ A hypertext document is a document containing text and embedded sound, images and even video
(in which case it is usually called hypermedia), including links to other hypertext documents. Hypertext
documents are formatted using the Hypertext Markup Language (HTML), also invented by Berners-Lee,
whose latest version is published in [W3C1999].
¹² After a visit from Robert Cailliau, a close colleague of Tim Berners-Lee, a group of students at
Helsinki University of Technology joined together to write a web browser as a master's project. Since the
acronym of their department was "OTH", they called the browser "erwise", as a joke on the word
"otherwise". The final version was released in April 1992 and included several advanced features, but it was
not developed further after the students graduated and went on to other jobs.

Figure 2-4 shows the exponential growth of the Internet over the last twenty years as
published by the Internet Software Consortium (ISC). The latest measurement of the number of
hosts connected to the Internet was taken in January 2001, when there were about
110 million hosts. If the growth rate continues, by the end of 2001
there will be about 175 million hosts, and the first billion hosts will be reached at some
point during 2005.
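The extrapolation above can be checked with a little arithmetic. The following sketch assumes that the two quoted figures (110 million hosts in January 2001 and 175 million at the end of 2001) are about one year apart and that the same multiplicative growth continues unchanged. Both the constant-growth model and the name years_to_reach are our own illustrative assumptions, not part of the ISC data.

```python
import math

HOSTS_JAN_2001 = 110e6   # figure quoted in the text
HOSTS_END_2001 = 175e6   # projection quoted in the text

# Assumption: the two figures are roughly one year apart and growth is a
# constant multiplicative factor per year.
annual_factor = HOSTS_END_2001 / HOSTS_JAN_2001


def years_to_reach(target, start=HOSTS_JAN_2001, factor=annual_factor):
    """Years of constant exponential growth needed to go from start to target."""
    return math.log(target / start) / math.log(factor)


if __name__ == "__main__":
    print("annual growth factor: %.2f" % annual_factor)
    print("years from January 2001 to one billion hosts: %.1f" % years_to_reach(1e9))
```

Under these assumptions the billion-host mark falls a little less than five years after January 2001, which agrees with the estimate of "some point during 2005".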

Figure 2-4: Internet's growth (1981-2001)

The methods used to measure the number of hosts connected to the Internet
differ, and the results vary considerably from one source to another. Moreover,
the number of hosts is not the only figure that can give us an idea
of the Internet's tremendous success. As an example, the company Nua publishes on its web pages
[Nua2001] an estimate of 513 million Internet users worldwide in August 2001. That
estimate is shown in Figure 2-5.

Figure 2-5: Worldwide Internet Population (August 2001)

[Figure 2-4 plot: number of hosts (in thousands, axis from 0 to 120 000) per year, 1981 to 2001.]
[Figure 2-5 chart: Internet users by region, August 2001. U.S. and Canada: 180.7 million. Europe: 154.6 million. Asia/Pacific Rim (including Australia): 144 million. South America: 25.3 million. Middle East: 4.7 million. Africa: 4.2 million.]

The times when North America was almost the sole owner of the Internet are
gone. However, only the developed countries have a significant number of Internet
users so far. North America, the European Community, Japan, South Korea and Australia have
more than 77% of all Internet users worldwide, while their combined population is about 14% of
the world total. Among those countries Sweden stands out: with
a population of about 8.8 million, it has 5.6 million Internet users, 63.5%
of its population, which gives it the highest Internet penetration in the world.
However, IP was not ready for such an incredible success. Owing to the new applications
that make the Internet interesting to the general public, the number of online users has been growing
exponentially since the mid-1990s, and it is expected to keep growing in the coming
years. Furthermore, millions of people with wireless portables may use them to keep in
contact with their home base, and with the convergence of the computer, communication
and entertainment industries it may not be long before every television or mobile phone in
the world is an Internet node. This growth brings two problems. On the one hand, IP addresses are
32-bit numbers, which gives a theoretical maximum of about 4 billion addressable hosts,
but the practice of organizing the address space into classes to help routing wastes millions
of them. With the enormous growth of the Internet, IP addresses have therefore become a scarce
commodity. On the other hand, such a huge number of hosts makes the routing
algorithms inefficient, so that routing becomes both slower and more resource-consuming.
Under these circumstances, it became apparent that IP had to evolve and become more
flexible. So, more than ten years ago, in 1990, the Internet Engineering Task Force
(IETF), the international organization that produces the Internet standards,
started to work on a new version of IP. The main characteristic of this new version was to
be a larger address space, so that it would never run out of addresses, but at the same
time it had to solve a variety of other problems [Hui1998]:

- Reduce the size of the routing tables.
- Simplify the protocol, allowing routers to process datagrams faster.
- Provide better security (authentication and privacy) than the former version of IP.
- Pay more attention to type of service, particularly for real-time data.
- Aid multicasting by allowing scopes to be specified.
- Make it possible for a host to roam without changing its address.
- Make the protocol open enough to evolve in the future.
- Permit both versions of IP to coexist for years.

The IETF issued a call for proposals and received 21 responses. By December 1992, seven
serious options were on the table, ranging from simple patches to IP to completely different
protocols. The next year, the three best proposals were chosen out of those seven: the one
by Deering, the one by Francis, and the Katz and Ford proposal. The
protocol finally chosen was a modified, combined version of the Deering and Francis proposals,
and it was given the designation Internet Protocol version 6 (IPv6) [Dee1998].
IPv6 is not fully deployed yet, and virtually the only protocol used in the Internet is
still the previous version of IP, now called IPv4. However, there are already several complete
IPv6 implementations that should start working alongside IPv4 within the next few years.

2.3.2 The basis of the Internet: The internals of the Internet Protocol (IP)

We have already seen how the Internet became what it is today, and we have also
mentioned that the whole Internet is possible thanks to the protocol that rules it: IP.
But why is IP so important? IP is what holds all the networks together; it is the
language that all the computers connected to the Internet must speak to be able to
communicate with one another. IP is a network protocol that provides a best-effort way to carry
pieces of information called datagrams from our computer to any remote one and vice
versa, whether or not the two machines are on the same network.
Communication on the Internet works as follows. A protocol operating at a higher level
splits the data it wants to transfer to another host into pieces. The address of the remote host must
be provided to the IP layer along with the data itself. IP then carries that data to the
receiver, probably crossing different networks on its way and possibly
fragmenting the data further into smaller units. When all the pieces reach their destination, the IP
layer at the receiver side reassembles them into the original datagram and identifies which
upper-level protocol originated it, passing the datagram to the right receiving process.
Let us take a closer look at the structure of both IPv4 and IPv6. Figure 2-6 shows
their message headers. The first field in both IPv4 and IPv6 is the Version field. It keeps
track of which version of the protocol the datagram belongs to, which allows the
transition between versions to take years. Obviously, it is set to 4 in IPv4 and to 6 in IPv6.

Figure 2-6: The IP header

(a) Internet Protocol version 4 (IPv4), one row per 32-bit word (bits 00 to 31)

Version (4 bits) | IHL (4) | DSCP (6) | ECN (2) | Total Length (16)
Identification (16) | Flags: unused, DF, MF (3) | Fragment Offset (13)
Time to Live (8) | Protocol (8) | Header Checksum (16)
Source Address (32)
Destination Address (32)
Options (0-40 bytes)

(b) Internet Protocol version 6 (IPv6), one row per 32-bit word (bits 00 to 31)

Version (4 bits) | DSCP (6) | ECN (2) | Flow Label (20)
Payload Length (16) | Next Header (8) | Hop Limit (8)
Source Address (128)
Destination Address (128)

The next field in IPv4 is the Internet Header Length (IHL), necessary only in IPv4
since, as can be seen in Figure 2-6 (a), the IPv4 header has a variable length, while the IPv6
header has a fixed length. IPv4 can include Options in its header, and this is the reason for
the existence of the IHL field. The options allow the datagram sender to indicate the path
the datagram should follow, to create a log of the routers visited on its way, or even to state
how secret the information carried in the datagram is (something only valuable to spies).
However, the space left for the options, just 40 bytes, is insufficient. Moreover, most
routers ignore the options, so having to check the header
length just makes datagram processing slower. That is why IPv6 headers have a
fixed length of 40 bytes.
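As a minimal sketch of how the Version and IHL fields share the first byte of the IPv4 header, the following function splits that byte and converts the IHL, which counts 32-bit words, into a length in bytes. The function name is ours and purely illustrative. Since the IHL has four bits, the header can be at most 15 words (60 bytes) long, which is where the 40-byte limit on options comes from once the 20 fixed bytes are subtracted.

```python
def ipv4_version_and_header_length(first_byte):
    """Return (version, header length in bytes) from the first IPv4 header byte."""
    version = first_byte >> 4        # upper four bits
    ihl_words = first_byte & 0x0F    # lower four bits, counted in 32-bit words
    if ihl_words < 5:
        raise ValueError("IHL below 5 words: shorter than the 20 fixed header bytes")
    return version, ihl_words * 4


if __name__ == "__main__":
    # 0x45 is the usual first byte: version 4, five words, no options.
    print(ipv4_version_and_header_length(0x45))
    # 0x4F carries the largest possible header: 60 bytes, 40 of them options.
    print(ipv4_version_and_header_length(0x4F))
```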
To provide for extensions, however, IPv6 has the Next Header field. The IPv6
header was simplified as much as possible, but to allow capabilities to be added to the protocol
there can be additional extension headers. They are located right after the main header and
identified by the Next Header field. Each of them in turn has its own Next Header field, so that
several extension headers can be included in the same datagram, forming a daisy chain. The last
Next Header field is set to the identifier of the payload protocol carried by the IPv6 datagram, an
identifier that in IPv4 is located in the Protocol field.
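The daisy chain of extension headers can be followed with a short loop. The sketch below is a simplification under stated assumptions: it only knows the Hop-by-Hop Options (0), Routing (43), Fragment (44) and Destination Options (60) headers as numbered in the IPv6 specification, and it treats anything else, including AH and ESP, as the payload protocol. The function name walk_ipv6_chain is ours.

```python
# Extension header types handled by this sketch (values from the IPv6 specification).
HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS = 0, 43, 44, 60


def walk_ipv6_chain(first_next_header, data):
    """Follow the Next Header chain over data, the bytes after the 40-byte main header.

    Returns (extension header types seen, payload protocol, payload offset in data).
    """
    chain = []
    next_header, offset = first_next_header, 0
    while next_header in (HOP_BY_HOP, ROUTING, FRAGMENT, DEST_OPTS):
        if offset + 8 > len(data):
            raise ValueError("truncated extension header")
        chain.append(next_header)
        following = data[offset]                      # first byte: Next Header
        if next_header == FRAGMENT:
            length = 8                                # Fragment header has a fixed size
        else:
            length = (data[offset + 1] + 1) * 8       # Hdr Ext Len: 8-byte units after the first
        next_header, offset = following, offset + length
    return chain, next_header, offset


if __name__ == "__main__":
    hop_by_hop = bytes([ROUTING, 0, 0, 0, 0, 0, 0, 0])   # points to a Routing header
    routing = bytes([6, 0, 0, 0, 0, 0, 0, 0])            # points to TCP (protocol 6)
    print(walk_ipv6_chain(HOP_BY_HOP, hop_by_hop + routing + b"tcp segment"))
```

In the usage example the main header would carry Next Header 0, and the walk ends at protocol 6 (TCP) with the payload starting 16 bytes after the main header.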
The byte occupied by the Differentiated Services Codepoint (DSCP) and the Explicit
Congestion Notification (ECN) fields has had quite an unstable history. In the
initial specification of IPv4 [Pos1981a] it was called the Type of Service (TOS) byte. The
first three bits were the so-called Precedence field, which indicated the importance of the
information carried by the IPv4 datagram. The next three bits were three flags, Delay (D),
Throughput (T) and Reliability (R), allowing the host to specify what it cared most about.
The last two bits were set to zero. In practice, routers ignored the TOS field altogether.
Later, the seventh bit of the Type of Service field was converted into another flag like
the D, T and R bits, used to indicate a preference for low monetary cost
[Alm1992]. Still, the TOS field was not popular enough for routers to bother with, and they usually
paid no attention to its value.
Similarly, in the first specification of IPv6, bits 4-7 (the second half of the first byte) of the
main header formed a field named Priority, whose use was similar to that of the Precedence
field in IPv4. In the final specification of IPv6 [Dee1998] that field was enlarged to
one byte and became the Traffic Class field, but its use was not defined.
Taking advantage of this unsettled situation, another modification was made to this byte in
both versions of IP. As per [Nic1998], the first six bits became the DSCP field and the last
two bits were left unused. Those six bits define 64 codepoints that are mapped to specific
behaviors in the routers along the path that the datagram follows.
However, this was not the end of the story. Later, in [Ram2001], the last two unused
bits became the ECN field. These two bits are used so that packet drops in
routers are not the only indication of congestion. Instead, when a router becomes congested, it
uses this field to indicate that it is suffering from congestion. The data receiver then uses
the acknowledgements of the transport protocol to report the congestion to the
data sender, which should decrease its sending rate.
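The current layout of this byte, six DSCP bits followed by two ECN bits, can be illustrated with two small helper functions. The names are ours, and the example value 46 is just an arbitrary codepoint chosen for the illustration.

```python
def split_tos_byte(tos):
    """Split the former TOS / Traffic Class byte into (DSCP, ECN)."""
    if not 0 <= tos <= 0xFF:
        raise ValueError("expected a single byte")
    return tos >> 2, tos & 0b11      # first six bits, last two bits


def join_tos_byte(dscp, ecn):
    """Build the byte from a 6-bit DSCP and a 2-bit ECN value."""
    if not (0 <= dscp < 64 and 0 <= ecn < 4):
        raise ValueError("DSCP must fit in 6 bits and ECN in 2 bits")
    return (dscp << 2) | ecn


if __name__ == "__main__":
    print(split_tos_byte(0xB8))          # codepoint 46, ECN bits clear
    print(hex(join_tos_byte(46, 3)))     # same codepoint with both ECN bits set
```

Because the two fields do not overlap, a router can set the ECN bits to signal congestion without disturbing the codepoint chosen by the sender.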
The Total Length field of IPv4 gives the total length of the datagram, header included. Its
IPv6 counterpart is the Payload Length field, which does not include the length of the
main header. Theoretically the biggest IP datagrams are about 64 Kbytes long, but in
practice they are around 1,500 bytes because of the limitations of the data link layer.
The Identification field is used when a datagram is fragmented further in the network.
All the fragments of a datagram carry the same Identification value. This field, together
with the Fragment Offset field, which indicates the position of the fragment within the
original datagram (in multiples of 8 bytes), and the More Fragments (MF) flag, which is set in every
fragment except the last one, makes reassembly at the receiver possible.
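The interplay of the Fragment Offset field and the MF flag can be sketched as follows. The function is a simplified illustration of our own: it assumes a 20-byte header without options on every fragment and only computes the offset, size and MF flag of each fragment, without building real packets.

```python
def fragment(payload_len, mtu, header_len=20):
    """Split a datagram payload into fragments that fit within mtu bytes.

    Returns a list of (offset_field, data_len, more_fragments) tuples, where
    offset_field is expressed in 8-byte units, as in the Fragment Offset field.
    """
    # Every fragment except the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while offset < payload_len:
        size = min(max_data, payload_len - offset)
        more = offset + size < payload_len   # MF is cleared only on the last fragment
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments


if __name__ == "__main__":
    # A 4,000-byte payload crossing a link with a 1,500-byte MTU.
    for frag in fragment(4000, 1500):
        print(frag)
```

For a 4,000-byte payload and a 1,500-byte MTU this yields three fragments with offsets 0, 185 and 370 (in 8-byte units), the last one carrying the remaining 1,040 bytes with MF cleared.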
The Don't Fragment (DF) flag is an order to the routers not to fragment the
datagram because the destination is incapable of putting the pieces back together again. In
IPv6 the main header has no fragmentation fields, simply because
routers are not allowed to fragment datagrams. Instead, there is a fragmentation
extension header that can be used only by the sender and the receiver. Routers simply discard datagrams
that are too big and send back an error message, which simplifies their job. We will discuss
fragmentation further in section 5.4.
The Time to Live field keeps datagrams from wandering around forever because of
corrupted routing tables. Initially it was designed to count seconds, but in practice it is
decremented by one at each router, so it just counts hops. In IPv6 the Hop Limit field
does the same job, but this time the name reflects its real function.
The Header Checksum protects just the header. It is assumed that if the upper-layer user
wants to protect the data carried by IP, it will provide its own error detection scheme.
However, as fields such as Time to Live change at every hop, the checksum must be
recalculated in every router. IPv6 does not include any checksum in its main header, but
there are two extension headers available to compensate for this. The Authentication
Header (AH) [Ken1998b] provides connectionless integrity and data origin authentication,
and the Encapsulating Security Payload (ESP) [Ken1998c] provides encryption.
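To make the cost of the IPv4 Header Checksum concrete, the sketch below computes it as the ones' complement of the ones' complement sum of the 16-bit words of the header, and then shows the work a router has to repeat when it decrements the Time to Live. The function names and the sample header are illustrative assumptions of ours. The sketch handles a 20-byte header without options and recomputes the checksum from scratch, whereas real routers may update it incrementally.

```python
import struct


def internet_checksum(header):
    """Ones' complement of the ones' complement sum of the 16-bit words in header."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                        # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decrement_ttl(header):
    """Return a copy of an IPv4 header with TTL reduced by one and a fresh checksum."""
    h = bytearray(header)
    if h[8] == 0:
        raise ValueError("TTL already zero: the datagram must be discarded")
    h[8] -= 1                                 # Time to Live is the ninth byte
    h[10:12] = b"\x00\x00"                    # checksum field counts as zero while summing
    h[10:12] = struct.pack("!H", internet_checksum(bytes(h)))
    return bytes(h)


if __name__ == "__main__":
    sample = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
    print(internet_checksum(sample) == 0)     # a valid header sums to zero
    print(decrement_ttl(sample).hex())
```

A receiver verifies a header by running the same sum over all its bytes, checksum included, and checking that the result is zero.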
The Source Address and Destination Address fields, present in both versions of IP, indicate
the origin and the final destination of the IP datagram. While IPv4 addresses are 4-byte
numbers, in IPv6 they are 16-byte values. Deering originally proposed 8-byte addresses,
but during the review process people felt that with 8-byte addresses IPv6 would have the
same problem as IPv4 within a few decades. People then suggested 20-byte addresses,
or even variable-length ones, and after much discussion it was decided that fixed-length 16-byte
addresses were the best compromise. To get an idea of the number
of addresses, we can calculate that, if they were evenly distributed over the earth, there
would be about 6.7 × 10²³ addresses per square meter (a little more than the Avogadro
number). Nonetheless, part of those 16 bytes carries routing information, allowing routers
to work efficiently in such a big network [Hin1998].
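The figure of addresses per square meter can be reproduced with a few lines of arithmetic. The sketch assumes a spherical earth with a mean radius of 6,371 km, an approximation of ours, and counts the whole surface, oceans included.

```python
import math

EARTH_RADIUS_M = 6.371e6                               # assumed mean radius of the earth
earth_surface_m2 = 4 * math.pi * EARTH_RADIUS_M ** 2   # about 5.1e14 square meters

ipv4_addresses = 2 ** 32      # 4-byte addresses: about 4.3 billion
ipv6_addresses = 2 ** 128     # 16-byte addresses: about 3.4e38

addresses_per_m2 = ipv6_addresses / earth_surface_m2

if __name__ == "__main__":
    print("IPv4 addresses: %d" % ipv4_addresses)
    print("IPv6 addresses: %.2e" % ipv6_addresses)
    print("IPv6 addresses per square meter: %.2e" % addresses_per_m2)
```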
Finally, we have the Flow Label field in the IPv6 main header, whose function has not
been strictly defined in the IPv6 specifications. It is still an experimental field that may be
used by a source to label sequences of packets for which it requests special handling by the
IPv6 routers, such as non-default quality of service or real-time service.
Nevertheless, even though IP is the most important protocol in the Internet because it
makes possible communication among remote networks, it is just a tiny part of the whole
suite of protocols that the Internet uses. Figure 2-7 shows some of those protocols and their
relation with the OSI reference model.
At the lowest protocol levels, the Internet can use any of the available technologies,
from Token Ring or Token Bus to high-performance Local Area Networks (LANs)
such as FDDI (the ones represented in Figure 2-7 are just a few of them). However, one of
the most widely deployed LAN technologies in IP networks is Ethernet¹³ (working at any of
the speeds available). In the Internet architecture, the Physical and Data Link layers
are joined together in a single level, which is usually called the Host to Network layer.

¹³ As a result, the de facto standard length of IP datagrams is 1,500 bytes, which is the
Maximum Transmission Unit (MTU) of Ethernet networks.

At the next level we have IP, trying to make a single network out of
a huge number of dispersed LANs. In the Internet architecture IP works at the Internet
level, which is the equivalent of the Network layer in the OSI model.
On top of IP we have the Transport level, in which we usually find
only two protocols: the User Datagram Protocol (UDP) [Pos1980] and TCP. UDP is very
simple and is used for connectionless applications or unreliable connection-oriented
applications. TCP is a connection-oriented protocol that provides reliability and congestion
control. They have been the most widely used transport protocols of the last 20 years, but
nowadays we also have SCTP, which is close to TCP in the functionality it provides but improves on it.
SCTP is the main topic of this Master's Thesis, so we will say a lot more about it later.
Transport protocols are not the only ones on top of IP. Some control protocols operate
directly over IP without any other protocol in between. Examples are
the Internet Control Message Protocol (ICMP) ([Pos1981b] for IPv4 and [Con1998] for
IPv6), which is used to report errors at the IP level, and some routing protocols such as Open
Shortest Path First (OSPF) [Moy1998], which is used to calculate the routing tables.

Figure 2-7: Some members of the Internet protocol suite
[Figure 2-7 diagram: protocols placed against the OSI layers. Layers 1-2 (Physical, Data Link): Ethernet, Token Ring. Layer 3 (Network): IP. Layer 4 (Transport): SCTP, TCP, UDP. Layer 7 (Application): Telnet, FTP, SMTP.]

In the Internet model we do not have the Session or Presentation layers. Instead,
directly on top of the Transport layer we have the Application layer. The protocols shown
in Figure 2-7 are just a very small portion of the total. Apart from Telnet (used for remote
login), FTP (used for file transfer) and the Simple Mail Transfer Protocol (SMTP)
[Kle2001] (used for e-mail transport) shown in the figure, there are many more protocols
in use in the Internet, such as the Network File System (NFS) [She2000] (used to provide
transparent file access for client applications) or the Simple Network Management Protocol
(SNMP) [Cas1990] (used to manage remote systems through the network). Most of those
protocols can use either TCP or UDP, but usually one of them is preferred.
All in all, IP networks provide a very flexible, well-known packet-switched network
that is able to carry a multitude of protocols designed for the widest range of uses.

2.4 A marriage of convenience: reasons for SS7 and IP networks
integration

The SS7 network and the Internet have been two independent networks that perform
different tasks and provide different services. SS7 is used for telephony signaling and the
Internet for data transfer in packet-switched networks. However, in recent years the
services they offer have been merging. On the one hand, telephone users are demanding new
services that involve access to the Internet. On the other hand, the
services provided by IP networks have developed to the point where they can assure certain
levels of quality. This makes IP networks more suitable for the transport of delay-sensitive
data such as speech or telephony signaling.
In the next subsections we review some of the most important reasons why it is
convenient to merge SS7 and IP networks.

2.4.1 Voice over IP

During the last five years, IP Telephony or Voice over IP (VoIP) has become a hot
topic. Its share of the global number of telephone calls is increasing, and it is becoming
more and more popular. This popularity comes from the fact that it makes better use of
the resources that carry the voice stream, so the price that companies offer for IP
Telephony can be lower, especially for long distance calls. However, its biggest problem
is that it cannot offer the same quality levels as the PSTN.
As we have seen, the traditional telephone system is circuit-switched. When someone
makes a telephone call, a dedicated circuit is reserved for that specific call and
remains open for its whole duration.
IP telephony, in contrast, does not reserve any resource exclusively but
handles the call over the network as just another stream of data. Normally, an IP telephony
user dials a toll-free number to connect to the IP telephony gateway and then enters the
necessary information, such as his account number and the destination telephone number.
The gateway bridges the public telephone network and the IP network providing the
service. It is in charge of receiving the voice stream, compressing the speech and
transporting it through a public or private IP network to the receiving gateway. That
gateway connects to the called party over the local telephone network, decompresses
the speech carried in the IP packets and passes the voice stream to the desired subscriber.
The main difference between these two schemes is that the long distance carrier is
replaced by an IP network, so a long distance call is converted into two local calls plus long
distance IP transport. Thus the IP Telephony Service Provider (ITSP) can offer a lower
price to its customers¹⁴. The costs of transporting speech over an IP network are lower
than those of a long distance carrier, as all the facilities are shared among all the users
and there are no dedicated channels. A dedicated full-duplex circuit carrying a
telephone conversation is poorly used, as most of the time at least one of the
parties will be silent (at least that is the idea) and its channel unused.
Initially, Internet telephony meant software that was able to
establish a phone call with another computer also connected to the Internet. Those calls
were free but offered poor quality at the beginning, as the programs were mostly the
products of someone's hobby. The first of such programs appeared in 1995, and since then
interworking trials between IP networks and the PSTN have been carried out. In 1997 the first
phone-to-phone service was launched. Presently, many ITSPs offer long distance calls.
However, the PSTN and the Internet have very different characteristics that presently
make it difficult to use IP networks for transporting voice. Some of those differences are
summarized in Table 2-1.

¹⁴ This is phone-to-phone IP telephony. If, instead of connecting two telephone subscribers, one or
both of the users involved in the call are using a computer connected to the Internet and running the right
software, the call can even be free of charge.

Description           PSTN                                Internet
Designed for          Voice only                          Packetized data
Bandwidth Assignment  64 kbps (dedicated)                 Full-line bandwidth over a period of time
Delivery              Guaranteed                          Not guaranteed
Delay                 5-40 ms (distance-dependent)        Not predictable (usually more than PSTN)
Cost for the Service  Per-minute charges: long distance   Monthly flat rate for access
                      Monthly flat rate: local access¹⁵
Voice Quality         Toll quality                        Depends on customer equipment
Quality of Service    Real-time delivery                  Not real-time delivery

Table 2-1: Differences between the telephone and IP networks

One of the biggest differences between them is the Quality of Service (QoS)¹⁶ they
provide. While the PSTN has been designed to be a highly reliable network in which a
packet is rarely lost or delayed, the Internet is just a best-effort network that
every now and then loses a packet¹⁷ and that cannot even guarantee a delay bound, which can
severely damage speech quality. The QoS offered by VoIP is highly dependent on network
congestion, degrading as the available bandwidth decreases. This problem can be tackled by
simply adding bandwidth, but that would be only a temporary solution. More appropriate
network-based mechanisms must be used to guarantee the necessary bandwidth to
services such as VoIP and to help carriers minimize their costs while still achieving a
satisfactory QoS.
Nevertheless, the future in this respect does not seem hopeless. The IETF has
developed several technologies to add QoS features to IP networks that could address the
problems raised by IP telephony and the transport of real-time media in general. Among
those efforts we can highlight these:

- The Resource Reservation Protocol (RSVP) [Bra1997] is used by hosts to request
  specific QoS from the network for application data streams. The routers use
  RSVP to communicate QoS requests to all nodes along the path of the flow, and
  to establish and maintain state. RSVP requests normally result in resources being
  reserved in each node along the data path. The desired level of QoS is assured by
  reserving the resources beforehand.
- The Resource Allocation Protocol (RAP) [Yav2000] is a protocol used by
  RSVP-capable routers to communicate with policy servers within the
  network. When there are not enough resources to satisfy all the RSVP requests,
  the policy servers determine who will be granted network
  resources and which requests will have priority.
- The Common Open Policy Service (COPS) [Dur2000] is the base protocol for
  communicating policy information between policy servers and routers within the
  RAP framework.

¹⁵ In the U.S. local calls are free (included in the monthly rate), unlike in most other countries.
¹⁶ The QoS is specified in quantitative or statistical terms of throughput, delay, jitter and/or loss, or may
otherwise be specified in terms of some relative priority of access to network resources.
¹⁷ In fact, TCP needs packet loss, which it uses as feedback for congestion control.

- The Real-time Transport Protocol (RTP) [Sch1996] is a protocol specially designed to
  carry real-time data. It operates on top of UDP and can be used in media-on-demand
  or VoIP. It consists of two parts. The data part, called RTP Data
  Transfer Protocol, is a thin protocol that provides timing reconstruction, loss
  detection, security and content identification. The control part, or RTP Control
  Protocol (RTCP), checks the quality of the transmission and controls the state of
  the participants. RTP itself does not provide any mechanism to ensure timely
  delivery or other QoS guarantees, but relies on lower-layer services such
  as RSVP to do so.
- The Real Time Streaming Protocol (RTSP) [Sch1998] is a control extension to
  RTP. It adds functions such as rewind, fast forward and pause.
- The Session Initiation Protocol (SIP) [Han1999] is an application-layer control
  protocol that can be used to set up, manage and terminate multimedia sessions
  (including VoIP). SIP can be used to establish multi-party sessions, and Internet
  telephony gateways that connect PSTN parties can also use SIP to set up calls
  between them. It also defines new types of Uniform Resource Locators (URLs)
  that help to translate phone numbers into IP addresses and back again (those URLs
  were revised in [Vh2000], where we can find the tel, fax and modem URL schemes).
- The Session Description Protocol (SDP) [Han1998] was defined to
  describe multimedia sessions for the purposes of session announcement,
  session invitation and other forms of multimedia session initiation. The session
  announcement itself is made using the Session Announcement Protocol (SAP)
  [Han2000] by multicasting an announcement that contains the description of the
  session.
- Differentiated Services (DiffServ) [Bla1998] [Nic1998] enable
  service providers to classify packets with various priorities using the DSCP field
  of the IP header. Routers throughout a network are expected to recognize
  those priority labels and give packets certain throughput privileges according to
  them.
- The Multiprotocol Label Switching Architecture (MPLS) [Ros2001a] essentially
  imposes a kind of circuit switching on an IP network. Packets can be
  grouped by tagging them with a common label, which permits expedited passage
  through the network. The labels not only inform the routers about the QoS to be
  applied but also supersede the routing decisions, which need to be made just once and
  are then applied to the whole group of packets.
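Returning to RTP in the list above, the way its data part supports timing reconstruction, loss detection and content identification can be seen in the fixed header that starts every RTP packet. The sketch below assumes the 12-byte fixed header layout of the RTP specification [Sch1996] (version, padding, extension, CSRC count, marker, payload type, sequence number, timestamp and SSRC) and ignores CSRC entries and header extensions. The function names and the sample values are ours.

```python
import struct


def parse_rtp_header(packet):
    """Decode the 12-byte fixed RTP header at the start of packet."""
    if len(packet) < 12:
        raise ValueError("RTP fixed header needs at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": bool(b1 >> 7),
        "payload_type": b1 & 0x7F,     # content identification
        "sequence_number": seq,        # loss detection
        "timestamp": timestamp,        # timing reconstruction
        "ssrc": ssrc,                  # identifies the source of the stream
    }


def packets_missing(prev_seq, seq):
    """Packets lost between two consecutively received 16-bit sequence numbers."""
    return (seq - prev_seq - 1) % 65536


if __name__ == "__main__":
    sample = struct.pack("!BBHII", 0x80, 0, 7, 160, 0x1234) + b"voice samples"
    print(parse_rtp_header(sample))
    print(packets_missing(65535, 1))   # the sequence number wraps around at 65536
```

A receiver that sees sequence numbers 65535 and then 1 can conclude that exactly one packet (number 0) is missing, while the timestamps tell it when each packet should be played out.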

However, the IETF is not the only standardization organization that has published
standards to help the development of VoIP. ITU-T has published its
recommendation H.323 [ITU2000], which deals with multimedia communication services
(especially audio) over packet-switched networks that may not provide the necessary QoS
(such as the Internet). Since the ratification of H.323 in 1998 and its subsequent revisions,
this recommendation has been widely adopted to provide interoperability between VoIP
products over local and wide area networks.
However, once a reasonable quality for the speech transported by IP networks has
been achieved, there are still other tasks to solve. One of the biggest issues with IP
telephony has been its limited ability to interoperate with the PSTN. VoIP gateways are able
to provide a means for the transport of a raw voice stream, but many of the services
provided by the PSTN come from the signaling network it uses: the SS7 network.
Background
29

The functionality provided by the SS7 network to carriers includes a wide range of
features, from simple caller identification to more complicated IN-based features. Only
when proper interworking between IP networks and SS7 is provided will VoIP services
be widely adopted by customers. As a simple example, without a complete SS7
interconnection, the ITSPs have to continue with their cumbersome multi-stage dialing
practices (the subscribers must first dial the gateway, then their customer ID and
finally the number of the callee).
Moreover, a true integration of the voice (SS7) and data (IP) networks would
bring the long-run benefits of VoIP supporting multimedia and multi-service
applications, something that today's telephone system cannot compete with. Beyond
replacing the circuit-switched network, VoIP has the potential of making phone service as
flexible and programmable as email and web services, speeding up the availability of
multimedia communications, and integrating phone service with existing common Internet
services. Today, most of the interest in VoIP comes from its lower cost for long distance
calls. However, in the future, the real benefits of VoIP will come from the possibility of
offering these new services.
And it is not only a matter of the service offered. An integration of the voice and data
networks would allow more standardization and would reduce the total equipment needed.
Also, having a single network both for voice and data would make its management easier.

2.4.2 The 3rd

Once the telephone had been invented and electromagnetic wave propagation had been
studied enough to become a means of communication (first using the Morse
code, which was the first digital code ever used), the next step forward was freeing the
telephone from its wire boundaries and making it wireless.
The first mobile telephone service was provided in the U.S. in the 1940s, and in the early
1950s in Europe. These were analog car phones, restricted in their mobility and number of
subscribers. They were bulky and expensive, very susceptible to interference, with very
high power consumption and poor speech quality. In the early 1980s there were about one
million subscribers worldwide.
In the late 1970s and early 1980s, the introduction of cellular systems was a quantum
leap in mobile communications. Thanks to semiconductors and microprocessors, new
lighter, smaller and more sophisticated phones became a reality. These early cellular
systems, which were only able to transmit analog voice, are known as the 1st Generation
Mobile Telephony (1G), the most prominent ones being the Advanced Mobile Phone
System (AMPS) in America, part of Europe and Russia, Australia and part of Asia; the
Nordic Mobile Telephone (NMT) in the Nordic countries; and the Total Access
Communication System (TACS) in Great Britain. There were about 20 million customers
of 1G by 1990 and it is still in use.
The 2nd Generation Mobile Telephony (2G) is the one we use today. It is digital, thus
providing a new range of services such as fax, short messages and data transmission, even
with the possibility of encryption. Moreover, it provides advanced mobility services
(roaming) that make it possible for customers to move to areas where different telephone
companies operate while still having service (as long as they use the same technology). The
most successful of the 2G cellular standards is Global System for Mobile Communications
(GSM), born in 1991 [18] and supporting about 66% of the some 860 million users of mobile
telephony in July 2001 [GSM2001]. GSM is used mostly in Europe, but it is spreading to
the urban areas of the U.S. and is present in about 170 countries worldwide, reaching
penetrations of up to 80% in countries such as Finland. Some other important 2G systems
are Code Division Multiple Access (CDMA), mostly used in the Asia-Pacific region; Time
Division Multiple Access (TDMA), still in use in the U.S.; and Personal Digital Cellular
(PDC), serving the customers in Japan.

[18] The first public GSM call was made on the 1st of July 1991 in a city park of Helsinki, Finland.

But 2G networks are far from perfect. There are several incompatible standards, which
makes mobile terminals useless in areas or countries that use a
different technology; the bit rate for data transmission (9.6 kbps in GSM) is far too slow;
the speech quality is good but could be improved; and users are demanding new
services, such as multimedia applications, that do not fit in the 2G networks. While a new
generation of mobile phones is being developed, some new technologies have been added
to the GSM networks, such as High Speed Circuit Switched Data (HSCSD), which provides
up to 57.6 kbps by opening several circuit-switched channels; the General Packet Radio
Service (GPRS), allowing data transfer at up to 171.2 kbps using IP packets (but that bit
rate is not available yet); Enhanced Data rates for GSM Evolution (EDGE), with two versions
based on HSCSD and GPRS respectively, which should provide 384 kbps (but is still under
development); or Adaptive Multi Rate (AMR) to optimize speech quality. GSM, together
with these new technologies, is often known as 2G+.
In this environment, the 3rd Generation Mobile Telephony (3G) will soon be a reality.
The 3G networks are being specified by the worldwide 3rd Generation Partnership Project
(3GPP) [19] with the main aim of making a global mobile communication system that provides
multimedia services. The 3G networks (collectively known as International Mobile
Telecommunications 2000 (IMT-2000), and as Universal Mobile Telecommunications System
(UMTS) in Europe) started to be designed in the mid 1990s and were supposed to be ready
by 2000. However, the first 3G license was granted in 1999 to a Finnish operator, and it is
expected that 3G networks will start to provide service during 2002.
UMTS is backward compatible with GSM and also uses SS7 networks for signaling.
However, UMTS places more emphasis on packet switching than GSM, using it not only for
signaling but also for user data. UMTS offers up to 1,920 kbps (under certain
circumstances) to be used for multimedia applications, such as videoconferencing.
Dedicating a 1,920 kbps channel to each user would be too costly, so the resources must be
shared among all users, which calls for a packet-switched network.
In the first release of the UMTS specifications, 3GPP R99, Asynchronous Transfer
Mode (ATM) was the packet-switched network chosen. This was mostly because it
provides a way of assuring the necessary QoS, and also because its addressing space was
big enough. However, the latest advances regarding QoS in IP and the development of IPv6,
with its much wider range of addresses, made things change. After all, most of the servers
containing the data that will be transferred to the UMTS users are expected to be located in
the Internet. Thus it made sense to use IP to transfer that data to the terminals (which will
be IPv6 hosts) through the UMTS network. Moreover, IP networks are cheaper.
But once we have an IP network transporting the user data, it is desirable that the
signaling network is also IP-based so we can use the same network for both purposes.
There have been two more releases of UMTS, 3GPP R4 and 3GPP R5, where the role of
IP has been enlarged. In 3GPP R5, also known as the All IP release, the transport network
utilizes IP networking as much as possible. IP and overlying protocols will be used in
network control too, and the user data flows are also expected to be mainly IP based. In
other words, the mobile network implemented according to the 3GPP R5 specifications
will be an end-to-end packet-switched cellular network using IP as the transport protocol
instead of SS7. But the IP-based network should still support circuit-switched services, and
UMTS must be compatible with GSM. This means that we will still need a way to use the
SS7 protocols in our IP network. As a result, the new 3G networks needed
a way of carrying SS7 messages over IP.
The interested reader can find a good introduction to 3G networks in [Kaa2001].

[19] 3GPP is a joint venture of several standardization bodies: the European Telecommunications
Standards Institute (ETSI), the Standardization Committee T1-Telecommunications (T1) from the U.S.,
the Association of Radio Industries and Businesses/Telecommunication Technology Committee
(ARIB/TTC) from Japan, the Telecommunications Technology Association (TTA) from South Korea and
the Chinese Wireless Telecommunication Standard (CWTS).

2.5 This is what we were looking for

While people in the Multiparty Multimedia Session Control (MMUSIC) working
group of the IETF were in charge of providing the necessary means to improve the QoS
capabilities of the IP networks, a new working group, Signaling Transport (SIGTRAN), was
founded on November 23rd, 1998, with the mission of addressing the transport of
packet-based PSTN signaling over IP networks.
The approach taken was to keep both the SS7 and IP stacks and to define an
interface that would make it possible to transport both the voice stream and the SS7
signaling data through IP networks. So, as stated in the SIGTRAN working group page:

"The primary purpose of this working group will be to address the transport of
packet-based PSTN signaling over IP Networks, taking into account functional
and performance requirements of the PSTN signaling."

The first step was to produce an informational document [Ong1999], published in
October 1999, identifying the functionality and performance requirements to support
telephony signaling over IP. Signaling messages have very stringent loss and delay
requirements, and security and resilience must also be addressed. That document, among
other things, defines the architectural model shown in Figure 2-8.
In that figure we can see the gateways that connect the SS7 and IP networks, where the
red lines represent voice channels and the black lines represent signaling links. We can
identify the following three elements:

- A Media Gateway (MG) terminates PSTN media streams, packetizes the voice
  and delivers the packets to the IP network. At the receiver side, it performs the
  reverse function.
- The Signaling Gateway (SG) is a signaling agent that receives the native
  signaling and translates it to send it then through the IP network, and vice-versa.
- The Media Gateway Controller (MGC) handles the registration and management
  of resources at the MG, with the possibility of authorizing resource usage based
  on local policy.

The SS7-IP gateways not only translate and transport the SS7 signaling through the IP
network, but can also receive management messages directly addressed to them. They
provide transparent transport of message-based signaling protocols over IP networks. In
this way, both the media data and the signaling can traverse the IP network and reach the
destination, providing the same kind of services that the PSTN offers while making better
use of the network that carries the voice stream.

Figure 2-8: SIGTRAN functional model (labels in the figure: SSP, MG, MGC, SG, SS7 Network, IP Network)

2.5.1 The need for a new transport protocol

Even before the architecture was completely defined, people from SIGTRAN started
to define the protocols to be used to provide such translation from SS7 messages (see
section 8.2). Obviously a transport protocol was needed, but there was no agreement
about which one should be used, and they referred to it simply as the Common Transport
Protocol (CTP). There was an initial attempt to avoid complicating the whole issue even
more and just use either TCP or UDP. Apart from the fact that both TCP and UDP were
implemented in almost every operating system, they had gone through years of review,
criticism and adjustment, and had been very successful. The functionality that the CTP
was expected to support was the following [Ong1999]:

- Transport of a variety of Switched Circuit Network (SCN) protocol types, such as
  MTP3, ISUP, SCCP, TCAP, etc., with a way to identify the specific SCN protocol
  being transported.
- Provide a common base protocol defining header formats, security extensions and
  procedures for signaling transport, and support extensions to add individual SCN
  protocols if needed.
- Together with IP, provide the relevant functionality as defined by the appropriate
  SCN lower layer. That relevant functionality may include:
  - Flow control.
  - In-sequence delivery of signaling messages within a control stream.
  - Error detection.
  - Recovery from failure of components in the transit path.
  - Retransmission and other error-correcting methods.
  - Detection of unavailability of peer entities.
- Support the ability to multiplex several higher layer SCN sessions on one single
  signaling transport session. In general, in-sequence delivery is required for
  signaling messages within a single control stream, but is not necessarily required
  for messages that belong to different control streams. The protocol should, if
  possible, take advantage of this property to avoid blocking delivery of messages in
  one control stream due to a sequence error within another control stream.
- Be able to transport complete messages of greater length than the underlying SCN
  segmentation/reassembly limitations.
- Allow for a range of suitably robust security schemes to protect signaling
  information being carried across networks. Signaling transport shall be able to
  operate over proxyable sessions, and be able to be transported through firewalls.
- Provide means for congestion avoidance and reaction to network congestion.

UDP was not even considered, and initially TCP was suggested as the candidate to
become the CTP. However, after some detailed analysis it was shown that TCP had some
deficiencies that made it unsuitable for PSTN signaling transport across IP networks.
Among them, we can identify the following [Ste2000]:

- TCP is a transport protocol that provides both reliable data transfer and strict
  order-of-transmission delivery of data. This is what is normally desired, but
  some applications need reliable transfer without sequence maintenance, or
  with only partial ordering of the data. Such an application suffers
  from the head-of-line (HOL) [20] blocking that TCP produces, which causes an
  unnecessary and undesirable delay.
- TCP is stream oriented, which can also be inconvenient for some
  applications, since they usually have to insert their own markers into the stream
  so that the beginning and end of their messages can be identified. In addition, they
  must explicitly use the push facility to ensure that the complete
  message is transferred in a reasonable time.
- TCP was never designed to be multihomed [21]. The limited scope of TCP
  sockets makes it difficult to design any data transfer mechanism in
  which a multihomed host could use several network cards at the same time. Such a
  mechanism would provide the high availability that some applications need.
- TCP does not scale well, since the maximum number of simultaneous TCP
  connections depends on kernel limitations. This is because TCP is generally
  implemented at the operating system level.
- TCP offers no timer control: it generally does not allow
  application control over its initialization, shutdown and retransmission timers.
- TCP is relatively vulnerable to denial of service attacks. This kind of attack tries
  to make a service unavailable, commonly by exhausting the resources it
  uses. One such well-known attack is the so-called SYN attack (more about this
  attack in section 4.2).
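
To illustrate the stream-orientation issue listed above, the following minimal Python sketch shows the kind of message marking an application has to add on top of a byte stream such as the one TCP offers. The 4-byte length prefix and the names frame and Deframer are assumptions made for this example only; they are not taken from TCP, MDTP or SCTP.

```python
import struct


def frame(message: bytes) -> bytes:
    """Prefix a message with its length (4 bytes, network byte order)."""
    return struct.pack("!I", len(message)) + message


class Deframer:
    """Recover whole messages from arbitrary slices of a byte stream."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, data: bytes) -> list:
        """Add received bytes and return every message completed so far."""
        self._buffer += data
        messages = []
        while len(self._buffer) >= 4:
            (length,) = struct.unpack_from("!I", self._buffer, 0)
            if len(self._buffer) < 4 + length:
                break  # the rest of this message has not arrived yet
            messages.append(bytes(self._buffer[4:4 + length]))
            del self._buffer[:4 + length]
        return messages


# Three messages written back to back into the same byte stream.
stream = frame(b"IAM") + frame(b"ACM") + frame(b"ANM")

# The receiver sees the stream in slices that ignore message boundaries.
deframer = Deframer()
received = []
for start in range(0, len(stream), 5):
    received += deframer.feed(stream[start:start + 5])

print(received)
```

A message-oriented transport preserves these boundaries itself, so this bookkeeping would not be needed in the application.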

[20] TCP manages messages as a single string of bytes without internal structure. Thus, if we use a single
TCP connection to send several unrelated messages, the receiver will deliver them to their upper user in the
same order they were sent. If a datagram is lost, this will affect all the subsequent messages, which will be
retained until the lost message arrives. This is the HOL blocking.
[21] A multihomed host is one that has several network cards, and can make use of a number of IP
addresses at the same time.
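
The HOL blocking effect can be made concrete with a small simulation. The Python sketch below is only an illustration under assumed conditions: four messages belonging to two independent flows (called A and B here), the first of which is lost and arrives last as a retransmission. The tuple layout and the function name delivery_steps are hypothetical. It compares a single ordering domain (the TCP-like case) with one ordering domain per flow.

```python
from collections import defaultdict

# (global sequence number, flow id, per-flow sequence number, payload)
sent = [
    (0, "A", 0, "A0"),
    (1, "B", 0, "B0"),
    (2, "A", 1, "A1"),
    (3, "B", 1, "B1"),
]

# A0 is lost on its first transmission and arrives last.
arrival_order = [sent[1], sent[2], sent[3], sent[0]]


def delivery_steps(arrivals, key):
    """Return {payload: arrival step at which it reaches the upper user}.

    key(message) gives (ordering domain, sequence number in that domain);
    inside one domain, messages are delivered strictly in sequence.
    """
    expected = defaultdict(int)   # next sequence number per domain
    waiting = defaultdict(dict)   # out-of-order messages per domain
    delivered_at = {}
    for step, message in enumerate(arrivals, start=1):
        domain, seq = key(message)
        waiting[domain][seq] = message
        while expected[domain] in waiting[domain]:
            ready = waiting[domain].pop(expected[domain])
            delivered_at[ready[3]] = step
            expected[domain] += 1
    return delivered_at


# One ordering domain for everything, as in a single TCP connection.
single = delivery_steps(arrival_order, key=lambda m: ("all", m[0]))

# One ordering domain per flow: flow B never waits for flow A.
per_flow = delivery_steps(arrival_order, key=lambda m: (m[1], m[2]))

print(single)
print(per_flow)
```

With a single ordering domain, every message is held until step 4, when the retransmitted A0 arrives. With per-flow ordering, B0 and B1 are delivered as soon as they arrive and only A1 waits for A0.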

The transport of PSTN signaling across an IP network is one kind of application for
which all of these limitations of TCP are relevant. There was an initial attempt to
modify or enhance TCP to meet those requirements. However, the idea was
discarded, mostly because other similar IETF investigations on transport issues had
already pointed out how hard it would be. Therefore, the working group decided to design
a new, suitable transport protocol that would operate on top of UDP. Apart from the
necessary functionality that the CTP was expected to provide, some other features were
identified as desirable [Ste2000]:

- Ability to discover the Maximum Transmission Unit (MTU) of the path used from the
  IP sender address to the IP receiver address, and possibility to fragment user data
  to conform to the discovered MTU.
- Possibility of sending user messages within multiple streams inside the same
  association. Sequenced delivery of the user messages sent through the same
  stream, and possibility of order-of-arrival delivery of individual user messages.
- Possibility of bundling multiple user messages into a single packet.
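
As an illustration of the bundling feature listed above, the following Python sketch packs user messages, in order, into packets that never exceed a given size. It is a simplified sketch and not the procedure of any specification: the function name bundle, the greedy strategy and the sizes used in the example (12 bytes of packet header, 16 bytes of overhead per message, 1,480 bytes available per packet) are assumptions chosen for the example.

```python
def bundle(messages, mtu, header_size, chunk_overhead):
    """Greedily pack messages, in order, into packets of at most mtu bytes.

    Each packet pays header_size once and chunk_overhead per message.
    Messages too large for a packet of their own are rejected here;
    a real protocol would fragment them first.
    """
    packets = []
    current, used = [], header_size
    for message in messages:
        size = chunk_overhead + len(message)
        if header_size + size > mtu:
            raise ValueError("message does not fit in one packet")
        if used + size > mtu:
            # No room left: close this packet and start a new one.
            packets.append(current)
            current, used = [], header_size
        current.append(message)
        used += size
    if current:
        packets.append(current)
    return packets


# Hypothetical user messages of 600, 700, 300 and 1,400 bytes.
messages = [b"a" * 600, b"b" * 700, b"c" * 300, b"d" * 1400]
packets = bundle(messages, mtu=1480, header_size=12, chunk_overhead=16)
sizes = [[len(m) for m in packet] for packet in packets]
print(sizes)
```

In this example the first two messages share one packet and therefore pay the packet header only once, which is the saving that bundling is meant to provide.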

With these objectives fixed, people at SIGTRAN started to work on the design of a
new protocol that could overcome TCP's problems.

2.5.2 A proposal that IETF could not refuse

Late in 1998, at the Orlando IETF meeting, several authors submitted proposals of
protocols that totally or partially met those requirements. One of them was called Reliable
UDP (RUDP) [Bov1999], which supported acknowledged data and retransmissions, but it
provided neither support for multihoming nor any congestion avoidance
algorithm, so it was finally abandoned.
Another proposal was UDP for TCAP (T/UDP) [Ma1998], which included flow control
and reliable data transfer, but was abandoned as well. Yet another protocol with similar
characteristics was the Simple SCCP Tunneling Protocol (SSTP) [Sn1999] (an evolution of
the Connectionless SCCP over IP Adaptation Layer (CSIP) [Sn1998] protocol), this one
being able to run on top of UDP or TCP, but again, after two versions the idea was
discarded. The PURDET [Ton1999] protocol was another option, using UDP and
supporting sequencing, flow control, protocol identification, error retransmission and link
loss detection. However, after its first version it was forgotten, like the rest.
People at SIGTRAN were not only looking for new protocols but they also took into
account some ITU-T protocols that could be valid ones, such as Service Specific
Connection-Oriented Protocol (SSCOP) [ITU1994] or H.323 Annex E [ITU1999] or
even RTP, but none of them was considered suitable for the purposes of SIGTRAN.
Nevertheless, there was a proposal submitted by Randall R. Stewart and Qiaobing Xie,
the Multi-Network Datagram Transmission Protocol (MDTP) [Ste1998], which attracted
the attention of the SIGTRAN working group. MDTP started to be designed in 1997,
independently of the SIGTRAN work, as a solution for some of TCP's weaknesses. After
getting most of the general concepts together and having a working implementation, the
authors decided to submit it to the IETF for consideration in summer 1998. In its
preliminary design, MDTP was an application-level protocol working on top of UDP that
incidentally met most of the requirements imposed by SIGTRAN on the CTP. This
proposal was the only one that supported multihoming and avoided HOL blocking,
and there was even an available implementation working with a performance similar to
TCP's. These were good reasons to choose MDTP to become the CTP, and during the next
10 months it was improved and eight more versions were written. However, it never
became a Request For Comments (RFC) and it was abandoned as well, the reason being
that it was deeply modified and used as the basis of SCTP.
The acronym by then stood for Simple Control Transport Protocol, but later on the
designers realized that it was not that simple and that it was not limited to control
messages. So the first intention was to change its name to Signaling Common Transport
Protocol, but that name was never used and the protocol was finally renamed, in its 9th
version, to the present Stream Control Transmission Protocol.
The change from MDTP to SCTP involved not only a change in the name but also a
deep revision of the protocol itself. It was then that the protocol datagram header and its
internal structure were almost completely modified (see section 3.1) so that it became highly
extensible; the cookie mechanism was adopted in the initialization to avoid denial of
service attacks similar to the well-known SYN attack on TCP (see section 4.2); the TCP
congestion control features [All1999] were included in SCTP (discussed in section 5.2);
and some other features such as stream negotiation (see section 5.3), message bundling and
data fragmentation were also changed (as explained in section 5.1).
Later on, in January 2000, another big change was introduced: the working group
revised the protocol stack to run SCTP directly on top of IP. This change was a very
controversial one because it implicitly meant that SCTP should be implemented inside the
operating system kernel. Thus, SCTP implementations would not be ready within the next
few years, having to wait until the operating system vendors had it available in their
products. Moreover, having SCTP inside the kernel would make it more difficult (if not
impossible) to control the values of the timers and some parameters in order to adapt SCTP
to different environments. However, the benefits of locating SCTP in its architecturally
right place in the IP stack outweighed all these problems. Moving SCTP on top of IP and
giving it its own port number space opened the way for SCTP to become a major transport
protocol, at the same level as TCP, making SCTP useful for a wider range of applications
than telephony signaling transport.
The first version of the Internet-Draft specifying SCTP was submitted in September
1999, and many modifications were made until late October 2000. Then, the
14th version of the SCTP Internet-Draft was raised to RFC status and was published by
the IETF as RFC 2960, a Proposed Standard. During these almost 14 months of
work, the design of the SCTP protocol was discussed daily in a distribution list that
had more than 1,000 members at some stages of the design, who proposed changes and
highlighted specification errors in more than 4,000 messages. SCTP evolved so much
during its design that today it would be hard to say that MDTP was its predecessor.
SCTP was initially designed to be a transport protocol for telephony signaling. It was
not the original idea to design a protocol that could compete with TCP. In fact, in the
second version of MDTP the following paragraph could be read in its Introduction section:

Comparing to traditional TCP [3], MDTP design is more tuned towards a
special set of applications, that is the time critical fault tolerant
applications using redundant LANs. It is not designed to replace TCP as a
general purpose transmission protocol.

However, this paragraph was deleted seven months later, in April 1999, in the 5th
version of MDTP, when the authors realized that they should not limit the scope of
application of what they were designing.

During its long design time, many features were added to the original sketch, most of them
trying to solve problems that had already been noticed when using TCP/IP, even if they were
not that important for its main use, PSTN signaling transport across private IP networks.
In February 2001 the discussion about SCTP was moved from SIGTRAN to the
Transport Area Working Group (TSVWG), another working group of the Transport Area
of the IETF. This effectively meant that the SIGTRAN working group had been very successful
in designing SCTP to be useful to a wide range of applications, and thus it started to be
thought of as a general-purpose transport protocol rather than a signaling-specific one.
Since October 2000 some editorial and technical defects have been found, and it is
planned that in the near future a new version of the SCTP specifications will be released,
making the present one obsolete (more about this in chapter 9). Moreover, many
people already know about SCTP and are writing extensions, so some valuable or
needed features can be added to make SCTP more suitable to work in different
environments. The extensibility of SCTP makes this a relatively easy job,
and the present extensions will be discussed in section 8.1.
SCTP could be used as the main transport protocol in the Internet, replacing TCP in the
future. For this reason SCTP can be thought of as a renewed version of TCP with
extended capabilities, and, used together with IPv6, it is expected to change the
way in which information is sent and received in the Internet.
The Design of SCTP: Datagram structure
37

3. THE DESIGN OF SCTP: DATAGRAM STRUCTURE

During its long design time, SCTP gained many features. People on the
distribution list where its design was debated sent valuable comments that
shaped the structure of the protocol itself. All the modifications to the protocol were made
after rough consensus was reached. Thus, the final specification of SCTP (contained in
RFC 2960) is the result of the joint work of many people specialized in areas varying from
checksum protection to IP routing. They contributed their ideas and advice, pointed
out errors or tested their own implementations during the test sessions, discovering possible
points of failure and improvements.
In this chapter we will take a look at the internal structure of SCTP. As in the
following chapters, we will not only explain what is written in the SCTP specifications, but
also what is not there. We will discuss the motivations that led the designers to choose
some specific solutions and not others, and the reasons why certain features were
included while others seem to be absent. We will explain the design of SCTP from a
historical point of view, highlighting the main pitfalls discovered in its evolution and the
design errors that were later corrected.
However, in this chapter we will just show the shape of the SCTP datagram, quickly
reviewing the function of its fields. We will also briefly introduce the state diagram that
represents the behavior of SCTP. In the next chapters we will go further, explaining the
way SCTP performs its tasks.

3.1 Shape of SCTP datagrams: An evolution from MDTP

The internal shape of SCTP datagrams has completely changed since the first version
of MDTP. Its features have been greatly improved and many mistakes have been corrected.
But the basic ideas remained, and the final design of SCTP is internally much closer to
MDTP than it might seem at first sight.
As SCTP is an evolution from MDTP, we will speak first about MDTP's header and
internal structure as it was in its first version. Then we will discuss its final design in SCTP
as published in [Ste2000].

3.1.1 Common header and internal structure of MDTP

When the first MDTP version was submitted to the IETF editor in August 1998, the
datagram structure was TCP-like and looked like Figure 3-1 (the numbers on top of the
figure indicate the bit positions).
As we can see from Figure 3-1, the datagram format resembles that of TCP, revealing
which protocol was the basis to start designing MDTP. TCP is nowadays the most
successful of the transport protocols used in the Internet, and so it was the best reference
to start with. Later, with the evolution of MDTP and SCTP, the similarities with TCP
remained more in the internal behavior than in the external shape.
We are not going to explain in detail the meaning of the fields in the first version of
the MDTP protocol, but we can note the following similarities and differences with TCP:

Figure 3-1: MDTP datagram structure in its first version

Every MDTP datagram had an overhead of 8 bytes containing the identifier of
MDTP itself, while TCP does not have anything like that. The reason for this is
that MDTP was designed as an application protocol running on top of UDP, so
the IP header field that identifies the protocol carried would always be set to
17, identifying the UDP protocol. But it is important for proxies, firewalls and
even routers to know the protocol that is carried in the IP datagrams,
because then they can better decide what to do with them. So, these 8 bytes
of overhead were the lesser evil.
The MDTP Protocol Identifier 1 and MDTP Protocol Identifier 2 fields were
originally set to the hexadecimal numbers F7873072 and 17074012 respectively.
As is not uncommon when designing a new protocol, those values were
chosen randomly and used because it was highly improbable that any data carried
by UDP would start with these 8 bytes.
Later on, it was decided that this way of identifying the MDTP protocol was
not clean, as one would have to dig inside the application data carried in the UDP
datagram to know what was being transported. So, it was accepted that certain
UDP port numbers would be used when sending MDTP datagrams to help
identify them. However, to allow protocol multiplexing (so other protocols can
share the same UDP port) the 8-byte identifier was not immediately removed.
First, the MDTP Protocol Identifier 2 field disappeared in the 7th version of
MDTP, and then the MDTP Protocol Identifier 1 field was reduced from 32 to 28
bits in the next version. Later, in the 9th version, it was made optional (sharing its
28 bits with an optional Cyclic Redundancy Code (CRC) instead). Finally, the
MDTP identifier was completely removed when the first version of SCTP was
released, as its identification inside UDP datagrams relied exclusively on the use of
a reserved port (a port that, by the way, was never specified).
But that was not a good solution either. Having the port number as the only
way of identifying SCTP datagrams encapsulated in UDP would eventually limit
the number of associations between two single-homed hosts to just one.
This problem was finally eliminated when SCTP started to run on top of IP
and was given the protocol number 132 as its identifier.
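
The identification scheme described above can be sketched in a few lines of Python. This is only an illustration: the two constants come from the text, but the function name looks_like_mdtp, the assumption that Identifier 1 precedes Identifier 2, and the use of network byte order are choices made for this example and are not confirmed by the text.

```python
import struct

# Values given in the text for the first MDTP version.
MDTP_PROTOCOL_IDENTIFIER_1 = 0xF7873072
MDTP_PROTOCOL_IDENTIFIER_2 = 0x17074012

# Assumption: Identifier 1 comes first, both in network byte order.
MDTP_IDENTIFIER = struct.pack(
    "!II", MDTP_PROTOCOL_IDENTIFIER_1, MDTP_PROTOCOL_IDENTIFIER_2
)


def looks_like_mdtp(udp_payload: bytes) -> bool:
    """Tell whether a UDP payload starts with the 8-byte MDTP identifier."""
    return udp_payload[:8] == MDTP_IDENTIFIER


print(looks_like_mdtp(MDTP_IDENTIFIER + b"rest of the datagram"))
print(looks_like_mdtp(b"some other application data"))
```

The sketch also shows why the approach was considered unclean: a firewall or proxy has to inspect the UDP payload itself, instead of reading a single field of the IP header.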

[Figure 3-1, labels recovered from the drawing: bit positions 00 to 31; flag bits NOG, NOV,
WIN, ISB, FIR, RES, DAT, ACK, BRO, SHU, WNR, RE1, RE2, BUN, GAR, UNR; fields Version,
In Queue, Data Size, Part, Of, Sequence Number (Send), Acknowledgement Number (Seen),
MDTP Protocol Identifier 2, Data]

The Acknowledgement Number and Sequence Number were completely
equivalent to the fields of TCP with the same name. The change in their relative
position is anecdotal.
The Data Size field was included at the beginning in the MDTP header. TCP has
the Data Offset field, which gives only the length of the header. This is enough
since the IP layer tells how large the entire TCP datagram is. This did not work
with MDTP because it was designed to add padding (bytes set to 0) at the end of
the datagram to make its length a multiple of 4 bytes, and those bytes are not part
of the user data. This padding (originally optional) was added because most of the
computers used nowadays read data in pieces of 32 bits or larger. So, it is
effectively quicker to read 4 bytes in a row even if only one of them is useful.
This field was moved from the header to the chunks [22] in the first version of
SCTP, as explained later in section 3.1.2.
MDTP was message oriented, which is quite a big difference from TCP. TCP just
manages a stream of bytes, and it is the application that delimits the different
messages carried in the same flow of bytes and cuts that stream into pieces. In
MDTP you sent messages, identified by the Part and Of fields. If the datagram
contained a whole message, then Part was set to 0 and Of was set to 1 (note that a
single datagram could contain more than one message, all bundled together to
make the transmission more efficient, saving the bytes of the header). Otherwise,
Of told the receiver the number of fragments of the message and Part indicated
the position of the fragment (from 0 to Of - 1) so the message could be correctly
reassembled.
As a result, the largest message that could be transmitted through MDTP
was 255 times the MTU of the network used to transmit the datagrams minus the
IP and MDTP headers. If we are using an Ethernet, whose MTU is 1,500 bytes,
and IPv4 as the network protocol, with a typical header without options of 20
bytes, and an MDTP header of 24 bytes, that makes
255 x (1,500 - 20 - 24) = 371,280 bytes. Even though this value
should be more than enough for a single message, the mechanism was later
modified in SCTP, making the maximum length of a message technically unlimited.
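
The Part and Of mechanism and the size limit it implies can be illustrated with a
short sketch. It is not taken from any MDTP implementation: the function names
fragment and reassemble are hypothetical, and the constants are simply the ones
used in the Ethernet and IPv4 example above.

```python
# Illustrative sketch of MDTP-style fragmentation with the Part and Of fields.
# Function names are hypothetical; constants come from the example in the text.

MTU = 1500           # Ethernet MTU in bytes
IPV4_HEADER = 20     # IPv4 header without options
MDTP_HEADER = 24     # MDTP header size used in the calculation above
MAX_FRAGMENTS = 255  # largest number of fragments per message given in the text

PAYLOAD = MTU - IPV4_HEADER - MDTP_HEADER  # user bytes per datagram
MAX_MESSAGE = MAX_FRAGMENTS * PAYLOAD      # largest message that can be sent


def fragment(message: bytes, payload: int = PAYLOAD):
    """Split a message into (part, of, data) tuples, Part running from 0 to Of - 1."""
    of = max(1, -(-len(message) // payload))  # ceiling division, at least one piece
    if of > MAX_FRAGMENTS:
        raise ValueError("message does not fit in 255 fragments")
    return [(part, of, message[part * payload:(part + 1) * payload])
            for part in range(of)]


def reassemble(fragments):
    """Rebuild the message from its fragments, whatever order they arrived in."""
    ordered = sorted(fragments, key=lambda f: f[0])
    of = ordered[0][1]
    if [f[0] for f in ordered] != list(range(of)):
        raise ValueError("missing or duplicated fragment")
    return b"".join(f[2] for f in ordered)
```

A message that fits in one datagram yields a single tuple with Part set to 0 and Of
set to 1, as described above.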
There were 16 flag bits grouped in two bytes called Flags and Mode. This was
similar to TCP, with the difference that the added functionality in MDTP needed
more bits to perform correctly (only two of those bits were free, the RE1 and RE2
ones). Those bits were used as a negotiation during the establishment phase to ask
for optional services, and also to tell the receiver which was the internal structure
of the Data field (that could contain several kinds of information as well as user
data).
The two free bits were already used in the 4th version of MDTP, limiting the
possibilities of extension of MDTP. Moreover, having such a large number of flags
was somewhat ugly and difficult to manage, making datagram processing quite
clumsy. In the 8th version, the MDTP datagram was heavily transformed, including
a Control Parameter Part and a Data Part area. The flags were replaced by 2 bits
indicating whether the control and/or data areas were present, plus a 6-bit
identifier of the control parameter. In addition, 8 bits were reserved for future use.
The Version field represented the version number of the protocol. This field was
reduced to 4 bits in the 8th release of the MDTP specifications and finally
discarded in the 6th version of SCTP. The way SCTP was designed made it so
easily extendable that a Version field did not make much sense. If you cannot
extend SCTP to include the feature you want to add, you probably need a
new protocol, not a new version of SCTP.

Footnote 22: A chunk is a unit of information within an SCTP packet, consisting of a
chunk header and chunk-specific content.

The In Queue field contained the number of messages the sender of the datagram
had in its incoming queue, waiting to be read by the application. It was used for
flow control purposes. This field was equivalent to the Window field in TCP, with
the difference that it indicated unread messages, not bytes.
There was a long discussion about the use of this field, and it was deleted from
the header in the 8th version of MDTP. It was agreed that the information about
data sent but not yet acknowledged (referred to as outstanding data) was enough
for the congestion avoidance algorithms, and so this field was a waste of space in
the header.
But as soon as the reference implementation of MDTP was updated, it was
noted that the knowledge about the outstanding data did not provide the necessary
information about the state of the receiver's incoming buffer. It is clear that if the
receiver acknowledges the receipt of certain datagrams but not the previous ones,
and if they have to be delivered in order to the upper user, the received data must
be occupying space in the MDTP buffer, but the opposite is not true at all. Even
when there was no outstanding data, the receiver's buffer could be full if the upper
user did not retrieve the data received. If the buffer is full, all the incoming data
will be discarded, and we waste network resources. The point is that there are two
different problems to be addressed: congestion control (which is related to the
network) and buffer control (which happens at the receiver side).
The In Queue field was a useful hint to the data sender about the state of the
receiver's buffer. However, it would be more useful if the information carried in
that field were expressed in bytes, not in messages. So the final decision was
that the receiver's buffer size would be exchanged during the establishment phase
and that a similar field would be used again (called Advertised Receiver Window),
this time not in the header but in the acknowledgement chunks. This change was
made in the first version of SCTP.
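
The difference between the two problems can be made concrete with a minimal
sketch of how a sender might combine both limits. This is a deliberate
simplification and not the algorithm of the specification: the function name
sendable_bytes and its parameters are hypothetical, cwnd stands for the congestion
window and peer_rwnd for the last Advertised Receiver Window received.

```python
# Simplified sketch: a sender is limited both by the network (congestion
# control) and by the receiver's buffer (flow control). Names are hypothetical.

def sendable_bytes(cwnd: int, peer_rwnd: int, outstanding: int) -> int:
    """Return how many more bytes the sender may transmit right now."""
    congestion_room = cwnd - outstanding    # limit imposed by the network
    buffer_room = peer_rwnd - outstanding   # limit imposed by the receiver's buffer
    return max(0, min(congestion_room, buffer_room))
```

With no outstanding data but a full receiver buffer, for example
sendable_bytes(4380, 0, 0), the result is 0, which is exactly the situation that
the knowledge of the outstanding data alone could not detect.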
As its name suggests, the Data field carried the user data, but this was not
the whole story. Apart from data, and depending on the value of the Flags and
Mode fields, it could carry MDTP's control information related to its internal
behavior. That made the Data field a kind of wildcard field that could be used for
almost anything. This was not a neat design, and it was changed in the 8th version of
MDTP, which differentiated a Control Parameter Part and a Data Part. This
structure evolved and was finally converted into control chunks and data chunks.

3.1.2 Common header and internal structure of SCTP

We have just reviewed the initial structure of MDTP. About 27 months later, SCTP
looked as shown in Figure 3-2. As we can see there, SCTP's common header is
completely different from MDTP's, and it is far less complex. That makes SCTP
datagrams easier to process. Nonetheless, the internal structure is somewhat more
elaborate, as an SCTP datagram contains several structures at different
levels.

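Figure 3-2: Structure of SCTP datagrams

[Figure: layouts drawn as 32-bit rows (bits 00 to 31).
SCTP datagram: Common Header, followed by Chunks.
Common Header: Source Port Number (16 bits), Destination Port Number (16 bits);
Verification Tag (32 bits); Checksum (32 bits).
Chunk: Chunk Type (8 bits), Chunk Flags (8 bits), Chunk Length (16 bits);
Fixed Fields; Parameters or Error Causes.
Parameter: Parameter Type (16 bits), Parameter Length (16 bits); Parameter Value.
Error Cause: Cause Code (16 bits), Cause Length (16 bits); Cause Value.]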

Every single SCTP datagram has a common header of 12 bytes, followed by one or
more structures called chunks. The common header has the following elements:

In January 2000 SCTP became a transport protocol running on top of IP.
Therefore, the information carried in the UDP header had to be moved to the
SCTP header. The Source Port Number and the Destination Port Number had to
appear in the SCTP header.
However, placing SCTP at the same level as protocols such as TCP or UDP
also meant that SCTP had to deal with the interactions with other protocols
running on top of IP. One of those protocols, ICMP [Pos1981b], is the one in
charge of telling the IP users about problematic situations such as lack of buffer
space in a router, an unreachable address, or inefficient routing tables. ICMP was
initially defined for IPv4, but it has an IPv6 version (ICMPv6) [Con1998].
When any situation in the IP network triggers the transmission of
an ICMP message, that message includes the beginning of the IP packet that
originated the anomalous situation. The ICMPv6 messages include as many bytes
of the original IP datagram as will fit without the ICMPv6 packet exceeding the
minimum IPv6 MTU (1,280 bytes), while the ICMP messages contain only the
IP header plus the first 8 bytes of the data carried in the IP datagram. As the
Source Port Number and the Destination Port Number fields are vital to identify
which SCTP association triggered the sending of the ICMP message, those two
fields had to be located in the first 8 bytes of the SCTP datagram.
When SCTP was modified to run directly on top of IP, it was also agreed that
all the TCP ports used for well-known applications would be automatically
reserved in the SCTP port address space. That would not only ease the
migration of applications from TCP to SCTP, but would also avoid a lot of
bureaucratic work related to the Internet Assigned Numbers Authority
(IANA) (see footnote 23).
The Verification Tag field is the evolution of the Acknowledgement Number and
the Sequence Number fields in MDTP. It plays the same role as those fields in the
establishment phase (together with the Initiate Tag field of the INIT chunks as we
will see in chapter 4), but it also gives protection against blind attacks. It is a
32-bit integer, randomly chosen and exchanged in the establishment phase. It is used
without modification throughout the whole life of the association to validate the
incoming datagrams as being really sent by the peer endpoint and not by an
attacker. Otherwise, it would be really easy for an attacker to forge one of our
peer's addresses and, for example, abort the association.
As happened with the Source Port Number and the Destination Port Number,
the position of the Verification Tag became an important issue when it was
decided that SCTP should run directly on top of IP. Some ICMP and ICMPv6
messages can be used to attack TCP, and they could also be used to
attack SCTP in the same way. One of these types of messages is the ICMP
Source Quench message (note that ICMPv6 does not have this kind of message).
That message is typically sent by a router that suffers from lack of buffer space, or
that is receiving IP datagrams at a higher rate than it can process them, so it has to
discard those datagrams. The arrival of this kind of message at a TCP sender
causes the congestion window (cwnd) to be set to one segment, initiating slow
start (see section 21.10 of [Ste1994]). As a result, the sender is
allowed to have less outstanding data and so the speed of data transmission may
decrease. This is because the sender must wait for the acknowledgement of the
data already sent before transmitting again. If the Round Trip Time (RTT) is
bigger than the time the TCP sender takes to send cwnd bytes, it will stay idle part
of the time waiting for the acknowledgements.
Other messages that can be abused are the ones used by the MTU discovery
algorithm, which is described in [Mog1990] for IPv4, and in
[McC1996] for IPv6. It relies on the use of the ICMP Destination Unreachable
message (with a code meaning Fragmentation Needed and DF Set) and the
ICMPv6 Packet Too Big message. The former is sent by a router to the
sender of the IPv4 packet when the Don't Fragment (DF) bit in the header of the
IPv4 datagram is set (thus meaning that this IPv4 packet cannot be fragmented,
see section 2.3.2) but it should traverse a network whose MTU is smaller than the
size of the packet, and so it must be discarded. In IPv6, an ICMPv6 Packet Too
Big message must be sent by a router in response to a packet that it cannot
forward because the packet is larger than the MTU of the outgoing link (note that
in IPv6 the routers never fragment IPv6 datagrams). When either of these two
messages is received, TCP decreases the segment size and retransmits the segment
that triggered the sending of the ICMP message (see [Ste1994], section 24.2). The
smaller the segment size, the bigger the overhead of the IP and TCP headers,
making the transmission less efficient, so the segment size should not be reduced
unless it is unavoidable.

Footnote 23: IANA is the organization that assigns the port numbers, protocol
identifiers, IP addresses, domain names, and basically every number that identifies
something in the Internet.

An attacker could easily send us any of those ICMP messages to reduce our
throughput. Therefore, the SCTP header was
designed so that the Verification Tag is present in the ICMP messages. When
processing a received ICMP packet, the association affected is determined by
using the Source Address and Destination Address fields (carried in the IP header
inside the ICMP message), and the Source Port Number and Destination Port
Number (carried in the first 4 bytes of the SCTP header also included in the ICMP
message). Then, to check that this ICMP packet was not sent by an attacker, the
SCTP Verification Tag also carried in the ICMP message is compared with the
one used by that association.
This field appeared in the first version of SCTP. When it was decided that SCTP
would run directly on top of IP, in the 6th version, it was agreed that this field
should be placed within the first 8 bytes of the header, for the reasons stated above.
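
The validation of an incoming ICMP message described above can be sketched as
follows. This is only an illustration under stated assumptions: the function
validate_icmp and the dictionary of associations are hypothetical, and the
dictionary is assumed to map the addresses and ports of each association to the
Verification Tag that we place in the datagrams sent to that peer.

```python
# Sketch of ICMP validation using the first 8 bytes of the SCTP header that
# are echoed back inside the ICMP message. All names are hypothetical.
import struct
from typing import Dict, Optional, Tuple

# (source address, destination address, source port, destination port)
AssociationKey = Tuple[str, str, int, int]


def validate_icmp(associations: Dict[AssociationKey, int],
                  src_addr: str, dst_addr: str,
                  sctp_first8: bytes) -> Optional[AssociationKey]:
    """Return the affected association, or None if the message must be ignored."""
    if len(sctp_first8) < 8:
        return None  # the Verification Tag is not available, nothing to check
    # Source Port (16 bits), Destination Port (16 bits), Verification Tag (32 bits)
    src_port, dst_port, tag = struct.unpack("!HHI", sctp_first8[:8])
    key = (src_addr, dst_addr, src_port, dst_port)
    expected_tag = associations.get(key)
    if expected_tag is None or expected_tag != tag:
        return None  # unknown association or forged message
    return key
```

An attacker who does not know the 32-bit tag has to guess it, so a forged Source
Quench or Packet Too Big message is simply dropped by this check.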
The Checksum field has proved to be quite a controversial one. As we have seen,
the first versions of MDTP did not have any kind of checksum because, as it was
transported by UDP, it seemed that the checksum UDP provides would be enough.
Later, in the last MDTP version, an optional Cyclic Redundancy Check of 16 bits
(CRC-16) like the one standardized by the ITU-T (see [ITU1996], section 8.1.1.6.1)
was added to the header. This approach was maintained in the first five versions
of SCTP.
Then, in the 6th version of SCTP it was modified again to be a 32-bit
checksum, the Adler-32 checksum [Deu1996], and this is the one that has been
finally included in the SCTP specifications. But later on it was agreed that Adler-32
provides weak error detection for small packets, and
taking into account that telephony-specific messages typically use packets of less
than 128 bytes, another kind of check should be used. The discussion has been
long, but it is agreed that the Cyclic Redundancy Check of 32 bits (CRC-32)
standardized by ITU-T (see section 8.1.1.6.2 of [ITU1996]) will be used. This
checksum issue will be further discussed in section 9.1.
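
One way to see why Adler-32 is weak for small packets is to look at the lower half
of the checksum, which is one plus the sum of all the bytes, modulo 65,521. The
sketch below uses the adler32 function of Python's standard zlib module; the helper
adler32_halves is a hypothetical name introduced only for this illustration.

```python
# Illustration of the limited range of Adler-32 on short messages.
import zlib


def adler32_halves(data: bytes):
    """Return the upper and lower 16-bit halves of the Adler-32 checksum."""
    value = zlib.adler32(data)
    return value >> 16, value & 0xFFFF


# Worst case for a 128-byte message: every byte has the largest value, 255.
worst_case = bytes([0xFF]) * 128
upper, lower = adler32_halves(worst_case)

# The lower half is 1 + 128 * 255 = 32,641, which is still below 2**15.
# For any message of up to 128 bytes the top bit of this half is therefore
# always zero, so part of the 32 bits of the checksum is never used.
```

The checksum of a short message thus takes far fewer distinct values than a 32-bit
field allows, which reduces its ability to detect errors.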

After the common header, there must be at least one chunk. A chunk is an independent
structure with its specific identifier and meaning but with a common Type-Length-Value
(TLV) structure. Chunks are used in a broad sense to send requests to the peer endpoint and
receive the answers to those requests. The chunks initially defined in [Ste2000] are the
following:

The Initiation (INIT), Initiation Acknowledgement (INIT ACK), State Cookie
(COOKIE ECHO) and Cookie Acknowledgement (COOKIE ACK) chunks.
They are used in the establishment phase.
The Payload Data (DATA) and the Selective Acknowledgement (SACK)
chunks. They are used for the data transfer.
The Heartbeat Request (HEARTBEAT) and Heartbeat Acknowledgement
(HEARTBEAT ACK) chunks. These chunks are used to track the state of the
different network interfaces used in the association.
The Operation Error (ERROR) chunk, which is used to report a non-fatal error.
The Shutdown (SHUTDOWN), Shutdown Acknowledgement (SHUTDOWN
ACK) and Shutdown Complete (SHUTDOWN COMPLETE) chunks. They are
the ones used during the graceful termination of the association.
The Abort (ABORT) chunk, which reports a fatal error and terminates the
association.
The Explicit Congestion Notification Echo (ECNE) and the Congestion
Window Reduced (CWR) chunks. However, these two chunks are not really defined in
[Ste2000]; only their identifiers have been reserved. The reason for this is that
the Explicit Congestion Notification (ECN) mechanism was still being studied
(and there has not been a great advance in its application for SCTP so far). The
work on ECN was finally published in [Ram2001].

This structure is what makes SCTP most different from TCP. As we have seen before,
the initial design of MDTP was quite close to TCP, with lots of flags and fixed length
fields. SCTP structure tried to avoid two problems present in TCP:

TCP has a major problem with its extensibility. TCP has only 40
bytes of space to include options in its header, and six free bits (with the extension
of TCP for ECN [Ram2001] only four bits remain unused). This makes TCP
hard to extend (and that is exactly why a new protocol was needed), so the
designers of SCTP tried to avoid being in this same situation in the future.
The initial design including many flags was abandoned. When TCP was designed,
one of the key design principles was to make it as efficient as possible in terms of
the overhead produced by its header. If something made the processing of a TCP
datagram somewhat harder, but saved one byte of its header, then it was a good
choice. There are even standards such as the one described in [Jac1990], which
defines a method to compress TCP's header from its typical 20 bytes to an average
of 3 bytes. This is possible due to the similarity between TCP headers in segments
belonging to the same connection.
These efforts can be easily understood if we take into account that in the
1970s and 1980s, when TCP started to be used, the data links could go as fast as
several tens of kilobytes per second in the best case. But this is not the case
anymore. Nowadays it is quite common that your computer is connected to the
Internet through a 100 Mbps Ethernet card, more than 1,000 times faster than the
connections used about 30 years ago. So, the key question for a transport protocol
today is not whether its overhead is a little bigger or smaller, but whether it is simple,
easy to extend, and its datagrams are fast to process (see footnote 24). Some tests made by
Randall R. Stewart, one of the parents of SCTP, showed that it was quicker to
process TLV structures than to check the value of specific bits inside a byte.
Moreover, the extensibility possibilities reached with this model (which is the
same one used in IPv6) are excellent. Some more bytes of overhead are worth
these features, especially if there is even the opportunity of bundling several of
those structures inside a single datagram, thus saving the space of the common
header.

Footnote 24: However, when not using a fast LAN to send the TCP/IP packets, but a
radio link of small capacity, compressing the headers can greatly improve the
performance. Compression not only lowers the header overhead, allowing the use of
small packets for delay-sensitive low data-rate traffic and improving the interactive
response time, but also reduces the packet loss rate over lossy links because fewer
bits are sent per packet. One of the latest efforts to compress TCP/IP headers (both
IPv4 and IPv6, including the IPv6 extension headers) is published in [Pri2001].

Therefore, the first version of SCTP was kind of a revolution in this aspect as it was
deeply modified from the last MDTP version. As can be seen in Figure 3-2, all the chunks
follow a basic guideline in their definition. They have the following fields:

A chunk first has a one-byte field, the Chunk Type, which distinguishes one
chunk from another and tells the receiver what to do with it. As we have
stated before, there are 15 values defined for the Chunk Type field in
[Ste2000] (all the values from 0 to 14), and the rest are reserved by the IETF.
Originally, the value 254 was used for vendor-specific chunk extensions.
That was designed when SCTP was running over UDP and when every company
interested in SCTP would implement its own version. But as SCTP evolved,
it was moved to run on top of IP, and moreover, there were plans to include its
code inside the kernel of the operating system (this has already been done for
UNIX and Linux). So it would be more likely that companies interested in SCTP
would directly buy it from their operating system provider. This would make it
expensive for a single company to ask for a specific extension to SCTP. Also,
there was an increasing feeling that the possibility of letting vendors define and
use their own chunks would lead to future interoperability problems. But that was
not necessarily true, and it has been shown in the past that not having such a
vendor-specific extension field in other standards did not stop a company from
defining non-interoperable versions of an IETF protocol and forcing the industry
to adapt to it. A vendor extension is only useful between two hosts that understand
the extension (normally from the same vendor), in which case the vendor could
introduce its new functionality in any way it wanted.
In a discussion on the distribution list about the issue, it was agreed that
defining this kind of non-interoperable extension would tempt people to use it,
and so this possibility was removed from the 11th version of SCTP. It did not add
any value to the protocol, and vendors could still propose extensions to the IETF
meeting their needs (after all, SCTP was defined by several vendors discussing
on a distribution list).
As chunks can be created to fulfill the needs of any new feature,
the receiver can find new chunk types that it may or may not understand. The
design group could have easily chosen to simply discard the chunks that are
unknown, but this does not always work. Sometimes the sender of the chunk must
know if the receiver understood it or not, and some other times, the processing of
that chunk can be of vital importance to continue processing the rest of the
datagram. So, the first two bits of the Chunk Type field tell the receiver what to
do in case it does not recognize the chunk. Depending on their value the following
actions should be performed:

00: The receiver should discard the whole datagram, not processing any
further chunks within it.
01: Same as with 00 but also reporting to the sender that it did not recognize
the chunk.
10: The receiver should discard this chunk, but continue processing the rest.
11: Same as with 10 but also reporting to the sender that it did not recognize
the chunk.

With this convention, the negotiation phase to request new services is
simpler, because if the receiver does not have that feature it can always be forced
to send a negative answer to our request. This idea was taken from IPv6, where
the first three bits of an unknown option type are used to tell the receiver whether
it has to discard the whole datagram or just skip the option, whether it has to send back
an ICMPv6 message, and whether the option can be modified en route.
The next field is the Chunk Flags field. The structure of flags was abandoned as
we have explained before, but it did not mean that it was completely useless.
Usually, some of the fields of the chunk can be expressed as a Boolean value, and
for those the flag structure is still valid. The meaning of the flags is chunk
dependent and only three chunks defined in [Ste2000] have flags: the DATA,
SHUTDOWN COMPLETE and ABORT chunks.
As the length of the chunks can be variable, it is necessary to include a Chunk
Length field.
Not all the chunks have Fixed Fields. Some of them do not need to provide any
more information than their own Chunk Type. Nevertheless, most of them have
some other fields to give extra information. Of course, the structure of the Fixed
Fields is chunk dependent.
If the chunk has to include any optional or variable length data, it can carry
parameters.
Finally, if the Chunk Length is not a multiple of four bytes, some padding bytes
must be appended at the end. They are compulsory, and are included because
nowadays most computers use at least 32-bit buses and read the data in pieces of 4
bytes or more, and padding makes their buffer management easier.
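
The fields listed above are enough to walk through the chunks of a datagram. The
following sketch is not taken from any SCTP implementation: the function names
unknown_chunk_action and iter_chunks are hypothetical, the Chunk Type values are
the ones assigned in RFC 2960, and the code assumes that the Chunk Length counts
the 4 bytes of the chunk header but not the padding.

```python
# Sketch of a chunk walker based on the Type-Length-Value structure.
import struct
from typing import Iterator, Tuple

# Chunk Type values 0 to 14 as assigned in RFC 2960.
CHUNK_NAMES = {
    0: "DATA", 1: "INIT", 2: "INIT ACK", 3: "SACK", 4: "HEARTBEAT",
    5: "HEARTBEAT ACK", 6: "ABORT", 7: "SHUTDOWN", 8: "SHUTDOWN ACK",
    9: "ERROR", 10: "COOKIE ECHO", 11: "COOKIE ACK", 12: "ECNE",
    13: "CWR", 14: "SHUTDOWN COMPLETE",
}


def unknown_chunk_action(chunk_type: int) -> Tuple[bool, bool]:
    """Return (continue_processing, report_to_sender) for an unknown chunk."""
    continue_processing = bool(chunk_type & 0x80)  # first bit: 1 means skip only this chunk
    report_to_sender = bool(chunk_type & 0x40)     # second bit: 1 means report it
    return continue_processing, report_to_sender


def iter_chunks(body: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (type, flags, value) for each chunk after the common header."""
    offset = 0
    while offset + 4 <= len(body):
        chunk_type, flags, length = struct.unpack_from("!BBH", body, offset)
        if length < 4 or offset + length > len(body):
            raise ValueError("malformed Chunk Length")
        yield chunk_type, flags, body[offset + 4:offset + length]
        offset += (length + 3) & ~3  # jump over the padding to a 4-byte boundary
```

A receiver would look up each type in CHUNK_NAMES and, when the type is missing,
fall back on unknown_chunk_action to decide whether to go on with the next chunk.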

The parameters are similar to the chunks but at a lower level. They were created to be
able to include optional or variable length information inside the chunks, and to provide
more possibilities of extension (it should be noted, however, that chunk types 63, 127, 191
and 255 are reserved for IETF-defined chunk extensions). As can be seen in Figure 3-2
they have the following structure:

The parameters were designed to make the possibilities of expansion of SCTP
virtually unlimited. Therefore, they have a Parameter Type field of two bytes.
These 65,536 different parameters should be more than enough. As a hint, only 8
parameters have been defined in [Ste2000], and another one has been reserved for
ECN (some others have been defined in some Internet-Drafts specifying SCTP
extensions, see section 8.1).
As in the case of the chunks, the first two bits of the Parameter Type tell the
receiver what to do in case it does not understand it. The behavior is basically the
same one as described above, but, as we are in a lower level, when the first bit of
the Parameter Type is set to zero the receiver should not discard the whole
datagram but the whole chunk instead.
As the parameters have variable length, there is a Parameter Length field that
tells the receiver about the size of the parameter.
The Parameter Value contains the actual information (note that the Parameter
Value is optional, the Parameter Type alone can be enough in some cases).
As in the case of the chunks, there are also padding bytes at the end of the
parameter if its length is not a multiple of four bytes.

A parameter can normally be sent inside only one type of chunk, but this is not
mandatory, so the Parameter Type field must be unique across all the chunks.
There is yet another structure used inside SCTP datagrams, as Figure 3-2 shows. It
is the so-called Error Cause. Error causes are basically the same as parameters. They have
exactly the same structure, with Cause Code, Cause Length and Cause Value fields,
which are completely equivalent to the Parameter Type, Parameter Length and Parameter
Value fields. The main differences between the error causes and the parameters are:

Error causes are only included inside the ERROR and ABORT chunks (more
about them in sections 6.2 and 7.2 respectively). They inform about some
problematic situation, such as the receipt of an unrecognized chunk or parameter,
lack of resources to open a new association, or a type of address that cannot be
managed. These two types of chunks cannot carry parameters.
The first two bits of the Cause Code field do not have the meaning they have in
the Parameter Type value. This way of specifying how to act upon the receipt of
an unknown error cause is not necessary for two main reasons. First, no error
cause should be originated by another error cause (this is basically the same idea
as that no ICMP message is sent about ICMP messages). Second, error causes
other than the ones already defined in [Ste2000] will be sent only in response
to a chunk or parameter defined in an SCTP extension. If the sender of the chunk
or parameter that triggered the sending of the error cause knows about that
extension, then it should know about the error cause as well. Thus, in theory a
host could never receive an unknown error cause (unless either of the two endpoints
involved has bugs in its implementation).

As we can see, this design of SCTP is what makes possible one of its key features: its
extensibility. A simple example that shows that lots of different needs will appear is that
SCTP was created to transport several different telephony signaling protocols with
different requirements, and later on its range of use became wider, turning it into a
possible replacement for TCP. Thanks to the chunk structure of an SCTP datagram, it is
really easy to design new chunks and parameters that provide a new feature, and to define
new error causes that inform about problems originated by the use of that new extension.
This new feature will always be backwards compatible with implementations of SCTP that
do not support it since, as we have seen, the chunk sender may require to be informed if the
receiver is not able to understand that chunk. The chunk sender will eventually decide if it
can continue without the features provided by the SCTP extension or if it tears down the
association.
There is complete freedom for the design and use of new SCTP chunks as well as
parameters and error causes. The only problem with doing this is that the new feature
should then be presented to the IETF to be accepted as a valid SCTP extension, and the
time involved in this process can be significant. This is because an agreement about its use
and design should be reached inside the proper IETF working group (SCTP was first part
of the SIGTRAN working group, and then it became a matter of TSVWG). As we have
seen with the design of SCTP, it can take a long time.
Nowadays, several extensions of SCTP are being designed. We will deal with them in
chapter 8.

3.2 SCTP association management: The state diagram

As in TCP, the steps required to establish and release associations can be represented
as a finite state machine. In SCTP it has 8 states instead of the 11 that TCP has, as
represented in Figure 3-3. In the figure we can clearly identify the 8 different states as
rounded rectangles (note that the CLOSED state appears twice, and that the state in the
upper part of the diagram labeled with Any State is not another state but means any of the 8
states). The two representations of computers identify the other host from or to which we
receive or send datagrams.
As shown in the legend in the bottom right area of Figure 3-3, there are three
types of arrows meaning different things:

The notched arrows with associated text in bold letters mean that the upper user
makes a primitive call, namely Associate (to start an association), Shutdown (to
gracefully terminate an association) and Abort (to abort an association).
The arrows with text in italics over them represent the control chunks sent to the
peer or received from it. They go from any of the rectangles representing a state to
the other host, and vice versa.
Finally, the arrows without any adjacent text represent changes in our internal
state. They go from one rectangle representing a state to another one.

The diagram is colorful and many arrows have different starting and ending colors.
This not only has to do with the author's preferences, but it also helps to understand better
what the diagram tells us. An upper user primitive call or an incoming chunk usually
triggers a state change and quite often a chunk is sent as well, so the key to identify which
output is related to a specific input is the color. The response to a given primitive call or
incoming chunk is the arrow that has the same starting color as the ending color of the
arrow representing the primitive call or incoming chunk. This applies not only to our state,
but also to the answers of the other host to our outgoing control chunks.
As an example, we see that the incoming INIT chunk arriving in the CLOSED state
appears as an arrow ending in pink. Thus, we have to follow the pink outgoing arrow from
the CLOSED state and we will see that the response to that chunk is that we stay in the same
state and we send an INIT ACK chunk back to the host. Moreover, the arrow representing
the INIT ACK chunk ends in pink when it reaches the host, and so its answer to the
INIT ACK chunk is the outgoing arrow from the host whose initial color is also pink, that
is, it sends us back a COOKIE ECHO chunk. There are two state changes, marked with
the * symbol, that are not produced by any represented incoming chunk or primitive call,
but occur when the peer acknowledges all the outstanding data we could
have.

Figure 3-3: SCTP connection management finite state machine

[Figure: state diagram with the states CLOSED (shown twice), COOKIE-WAIT,
COOKIE-ECHOED, ESTABLISHED, SHUTDOWN-PENDING, SHUTDOWN-SENT,
SHUTDOWN-RECEIVED and SHUTDOWN-ACK-SENT, plus the Any State marker; the user
primitive calls ASSOCIATE, SHUTDOWN and ABORT; and the control chunks INIT,
INIT ACK, COOKIE ECHO, COOKIE ACK, SHUTDOWN, SHUTDOWN ACK, SHUTDOWN COMPLETE
and ABORT exchanged with the other host.
Legend: User Primitive Call; State; State change; Control chunk sent or received.
* The state is changed and the signal is sent when there are no more outstanding
DATA chunks.]

Another hint for understanding the figure is that its right part represents the
actions taken when we play the active part in the establishment and termination of the
association (for example, when we act as a client connecting to the server and then we
close the connection), while the left part is just the opposite (when we act as a server
waiting for a client to connect to us, and then the client is the one who releases the
association). This rule is broken in only one case, the SHUTDOWN chunk sent
by the right host (this is done to keep the diagram clear).
At first sight, one could think that SCTP's finite state machine representation is
quite similar to TCP's. This would not be a surprise, not only because TCP is one of
the ancestors of SCTP but also because most transport protocols have roughly the
same finite state machine representation. But there are big differences between them, the
most important ones being these:

SCTP uses a four-way handshake for the establishment phase, while TCP uses a
three-way one. This has to do with the so-called cookie mechanism, used to avoid
an attack similar to the well-known SYN attack in TCP (described in section 4.2). The
establishment phase will be shown in detail in the next chapter.
The termination of an association is simpler. In addition, SCTP has no
concept of half-open connections. The termination of an association
is discussed further in chapter 7.

While the initiation process has been accepted as a great improvement over TCP, the
termination phase has been widely criticized for its lack of the half-open connection
concept. We will see more about these phases in their respective chapters.

4. AN ASSOCIATION'S BIRTH: FROM A TWO-WAY TO A FOUR-WAY HANDSHAKE

In this chapter we will take a close look at the establishment phase in SCTP. The way
SCTP sets up a new association is one of its major improvements over TCP.
The establishment procedure was modified several times during the design of
SCTP, and the final design is very robust (providing protection against one of the most
typical attacks against TCP) while still allowing data transmission to start
quickly.
In the next sections we will explain in detail how SCTP associations are formed and
the main advantages of this approach. We will also describe how this procedure
evolved and the reasons behind the changes made.

4.1 The evolution of the establishment phase

Initially, the establishment of an association was really simple. MDTP was first
designed to be used for telephony signaling transport. In the telecommunications business,
signaling means not only call establishment and shutdown, but also billing, and when
money comes into play no amount of precaution is ever enough. So it was taken for
granted that MDTP would be used inside the private IP networks of telecommunication
companies, without any connection to the Internet, and so little effort was put into avoiding
attacks, as no hacker would be able to get into the network (and if you have attackers inside
your own company then you really have a problem).
Moreover, those IP networks would be properly engineered, meaning that they would
have more bandwidth than they were expected to need. So they would never be
congested and they would not lose any message, making IP almost reliable (although some
failures could still happen in the network, causing routers to misbehave).
In addition, the main objective was to make this establishment phase as fast as
possible, so that the hosts involved in the association could start sending data with as
little delay as possible. As a result, MDTP used the simple two-way handshake connection
algorithm shown in Figure 4-1.

Figure 4-1: Establishment procedure in MDTP
[Figure: two MDTP datagrams, each with the header fields Sequence Number, Acknowledgement
Number, Version, In Queue, Data Size and Part Of, followed by Data. First datagram:
Sequence Number = Tag A, Acknowledgement Number = 0. Second datagram: Sequence Number =
Tag B, Acknowledgement Number = Tag A.]

In the environment we were dealing with, it was assumed that packets were not lost in
the network. The fastest way to start an association is then simply to send data packets to
the peer endpoint with which the association is to be started, allowing the receiver to
send data back as soon as it receives the first data packet. This would mean no
delay at all, but this design could cause some problems because one cannot completely
trust an unreliable network (one should not even completely trust a reliable one). Among
other reasons, if a router fails there is still the possibility of having, for example, old
delayed packets inside the network. If the receiver of one such packet immediately opens a
connection, it will probably start waiting for more data that will never come, and
that connection will stay open forever (or at least until some other mechanism gets rid of
it), wasting resources.
So, before properly opening a connection, one should ask the sender of the packet to
confirm whether it really wants to open it, to avoid these problems (this is exactly one of
the reasons for the existence of the three-way handshake in TCP). Finally, it was decided
that a three-way handshake was not necessary, but that there should be at least a minimal
establishment phase.
As can be seen in Figure 4-1, the procedure used to open a new association was quite
simple. The initiator of the association sent an MDTP datagram with several flags set to
indicate that a new association was to be established. It also used the Sequence Number
field of the MDTP datagram (see Figure 3-1) to send a key number to the initiatee. That
number had to be sent back inside the Acknowledgement Number field of a datagram in
which the initiatee also sent its own key number in the Sequence Number field.
This is exactly what TCP does; even the names of the fields in the header are the
same. The difference is that, once this was done, the MDTP association was open, and there
was no need to acknowledge the receipt of the second message. The initiatee could start
sending data from the moment it sent the answer to the initiator, who could start sending
data right after receiving the acknowledging message. That made the initiatee able to send
data two round-trip times earlier than it would in a normal TCP connection (actually, it
would be perfectly legal in TCP to send data inside the three initial segments, but this is
never done because, using the standard sockets interface defined for TCP, the upper user
must first open the connection before it can send any data).
As explained in section 3.1.1, the Data field of MDTP was also used to exchange other
kinds of information, such as the receiver buffer size, the number of streams (see section
5.3) or the valid IP addresses that could be used in the association. So MDTP was not
allowed to include data in the datagrams sent during the establishment phase.
However, the more the protocol evolved, the clearer it became that it should not be
restricted to signaling transport. That meant that the designers were focusing their efforts
on creating a protocol that some day could even compete with TCP in its present role as the
main Internet transport protocol. As a consequence, they started to regard external
attackers as a real threat, and so they could not use this initiation scheme anymore.

4.2 Cookies against the attackers

This initiation procedure was kept until the last version of MDTP. During the IETF
meeting in Oslo in July 1999 a new and revolutionary establishment phase started to be
sketched. Shortly afterwards, in a designers' meeting in Santa Clara, Randall R. Stewart
explained the main idea of this new mechanism (that almost mythical moment is
remembered as the birth of a cookie). It completely removed the problem of the so-called
SYN attack in TCP. This attack is very simple and can affect any system connected to the
Internet providing TCP-based network services (such as an HTTP, FTP or mail server).
There is a very good description of this attack in [CER1996]. Let us see in short how
this basic attack is performed. In TCP, the connection phase consists of a three-way
handshake, the first two legs being exactly the same as they were in MDTP (Figure 4-1).
The third one is simply the acknowledgement of the second message exchanged. These
three packets are usually called SYN (from Synchronize, as it has the SYN flag set, which is
used only during the establishment), SYN-ACK (it has both the SYN and ACK flags set) and
ACK (a simple acknowledgement message with the ACK flag set). The problem is
that the receiver of the SYN not only sends back the SYN-ACK but also keeps some
information about the packet received while waiting for the ACK message (a server in this
state is said to have a half-open connection). The memory space used to keep the
information of all pending connections is of finite size, and it can be exhausted by
intentionally creating too many half-open connections. This makes the attacked system
unable to accept any new incoming connections and thus causes a denial of service to
other users wanting to connect to the server. There is a timer that removes half-open
connections from memory when they have been in this state for too long, which will
eventually let the system recover, but nothing will change if the attacker continues
sending SYN messages.
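
To make the exhaustion of the pending-connection memory more concrete, the following Python
sketch models it as a toy simulation. It is not taken from any real TCP implementation: the
class HalfOpenTable, its capacity of four entries and its 75-second timeout are all
assumptions chosen only for illustration.

```python
from collections import OrderedDict


class HalfOpenTable:
    """Toy model of a server's fixed-size table of half-open connections."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity          # hypothetical number of slots
        self.timeout = timeout            # hypothetical lifetime in seconds
        self.pending = OrderedDict()      # (address, port) -> arrival time

    def expire(self, now):
        """Remove entries that have been half-open for too long."""
        while self.pending:
            source, arrived = next(iter(self.pending.items()))
            if now - arrived < self.timeout:
                break                     # oldest entry is still valid
            del self.pending[source]

    def on_syn(self, source, now):
        """Store state for a SYN; return False if the table is full."""
        self.expire(now)
        if source in self.pending:
            return True
        if len(self.pending) >= self.capacity:
            return False                  # denial of service for this client
        self.pending[source] = now
        return True

    def on_ack(self, source):
        """The third leg completes the handshake and frees the slot."""
        return self.pending.pop(source, None) is not None


table = HalfOpenTable(capacity=4, timeout=75.0)

# The attacker sends SYNs from spoofed sources and never completes them.
for i in range(4):
    table.on_syn(("10.0.0.%d" % i, 1024), now=0.0)

# A legitimate client is refused while the table is full...
legit_refused = not table.on_syn(("192.0.2.7", 40000), now=1.0)

# ...and is accepted only after the timer has removed the stale entries.
legit_accepted_later = table.on_syn(("192.0.2.7", 40000), now=80.0)
```

In this model the legitimate client is served again as soon as the stale entries expire, but
an attacker that keeps sending SYN messages would simply refill the table.
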
There is no generally accepted solution to this attack. Packet filtering that
discards the IP datagrams coming from the attacker would solve the problem if the
attacker were not able to forge its IP source address, something commonly referred to as IP
spoofing [CER1995]. This practice, which not only makes packet filtering useless but is
also very effective in hiding the identity of the attacking machines, is trivial to carry out
under any of the various UNIX-like operating systems. Fortunately, in a fluke of laziness
(or good judgement?) that has saved the Internet from untold levels of disaster, Microsoft's
engineers never fully implemented the complete UNIX Sockets specification in any version
of Windows prior to Windows 2000. As a consequence, Windows machines, which are
the most widespread among Internet users, have been blessedly limited in their ability to
generate deliberately invalid Internet packets (compared to UNIX machines). It is
impossible for an application running under any version of Windows 3.x/95/98/ME or NT
to spoof its source IP address or generate malicious TCP packets such as the ones used to
produce SYN floods (see footnote 25).
The attack works as represented in Figure 4-2 below:

Figure 4-2: SYN attack in TCP
[Figure: the attacker sends SYN segments carrying fake IP source addresses A, B, ..., Z to
the attacked server, which answers with SYN-ACK segments sent to IP destination addresses
A, B, ..., Z.]

As we see, the attacker uses IP spoofing, which makes it unable to receive the SYN-ACK
segments produced; this is not a problem, since it would never answer them anyway. All those
SYN-ACK segments will be lost unless there is a host with a TCP service listening on the
port and address used as the source of the SYN segment. In that case that host will answer
with a segment carrying the RST (Reset) flag, and the attacked system will delete
the information for that specific half-open connection.

Footnote 25: For the interested reader, there is a very good account of a SYN attack causing
denial of service directed at the Gibson Research Corporation in [Gib2001].

It seems that with the release of the latest versions of Windows (Windows 2000 and
Windows XP), which give access to raw IP sockets allowing the programmer to
modify the whole IP header, this kind of basic attack could become much
more common. So servers relying on TCP as their transport protocol could be
in danger. Or maybe not, but in any case SCTP, with its cookie mechanism, gives this kind
of attack no chance of success. When the designers of SCTP started to think about
how to deal with SYN flooding, they quickly saw that two things were necessary in order
not to create a new transport protocol with this same weakness:

The server (the initiatee of a new association) should not use even a byte of
memory until the association is completely established.
There must be a way to recognize that the client (the initiator of the association) is
using its real IP address.

Usually, to meet the second requirement, the server sends some kind of key number to
the client, which will only receive that information if the source address used in its IP
datagram is the real one. Once the client has that information, it can send a
confirmation to the server using that key number, thus proving that it was telling the truth.
This means that the server also needs to save that key number somewhere, so that
it can verify that the key number received is the right one. But then comes the problem of
being forced to store that value and use some memory resources while
waiting for an answer that might never come.
Therefore, the idea was: instead of storing that information in our system, why not
make it stay all the time in the network or in the client's memory? Of course, one
immediately thinks that if a datagram coming from the client is the one that is going to
provide us with the information to check against the client's answer, we have done nothing
but make the situation worse. The client could tell us whatever it wants and then
completely open an association by sending us a single message.
But this is not necessarily true if we manage to convert the two problems into another
one: the server has to sign the information sent to the client with a secret key. So, when it
receives that information back from the client, it can recognize, thanks to the signature and
the secret key, that it did send exactly that information and that it has not been modified,
and so we can be as confident in it as if it had never left the server's buffers. And that is
the cookie mechanism: apparently (and truly) so simple, but at the same time powerful enough
to defeat the flooding attack described above. In any case, that mechanism was basically the
same as the one used in Photuris (a session-key management protocol specified in
[Kar1999]).
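
The signing idea can be sketched in a few lines of Python. This is only an illustration
under stated assumptions, not the cookie format used by SCTP: the function names, the fields
stored in the cookie, the use of HMAC-SHA-256 as the signature and the 60-second lifetime
are all hypothetical choices.

```python
import hashlib
import hmac
import os
import struct

SECRET_KEY = os.urandom(32)       # known only to the server
COOKIE_LIFETIME = 60.0            # seconds; illustrative value
STATE_FORMAT = "!dII"             # creation time, client tag, server tag
STATE_SIZE = struct.calcsize(STATE_FORMAT)
MAC_SIZE = hashlib.sha256().digest_size


def make_cookie(client_address, client_tag, server_tag, now):
    """Build a signed cookie; the server keeps no copy of it."""
    state = struct.pack(STATE_FORMAT, now, client_tag, server_tag)
    state += client_address.encode()
    signature = hmac.new(SECRET_KEY, state, hashlib.sha256).digest()
    return signature + state


def check_cookie(cookie, client_address, now):
    """Return (client_tag, server_tag) if the cookie is ours and still valid."""
    signature, state = cookie[:MAC_SIZE], cookie[MAC_SIZE:]
    expected = hmac.new(SECRET_KEY, state, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None               # forged or modified on the way
    created, client_tag, server_tag = struct.unpack(
        STATE_FORMAT, state[:STATE_SIZE])
    if state[STATE_SIZE:].decode() != client_address:
        return None               # echoed from a different address
    if now - created > COOKIE_LIFETIME:
        return None               # too old
    return client_tag, server_tag


cookie = make_cookie("192.0.2.7", 0x1111, 0x2222, now=1000.0)
accepted = check_cookie(cookie, "192.0.2.7", now=1010.0)

# Flipping a single bit of the echoed cookie invalidates the signature.
tampered = cookie[:-1] + bytes([cookie[-1] ^ 1])
rejected = check_cookie(tampered, "192.0.2.7", now=1010.0)
```

The server allocates nothing between sending the cookie and receiving it back; everything
it needs in order to continue is recovered from the echoed cookie once the signature has
been checked.
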

4.3 The first two legs: The INIT and the INIT ACK chunks

So, let us look at the establishment phase in SCTP, represented in Figure 4-3, where
the datagrams exchanged in the first two legs of the four-way handshake are enlarged to
show their internal structure:


Figure 4-3: Establishment phase in SCTP (first two legs)

As we can see, the client first sends a datagram to the server containing the INIT
chunk, and the server answers by sending back an INIT ACK chunk. These two chunks are
very similar and, apart from the Chunk Type (which is set to 1 in the INIT chunk and 2 in
the INIT ACK chunk), Chunk Flags (which are not used and are reserved for future use)
and Chunk Length fields, they carry the following information:

The Initiate Tag in the INIT chunk plays the same role as the Sequence Number
field of the TCP header. It matches the INIT chunk sent to the server (equivalent
in this case to the SYN segment) with the expected INIT ACK chunk (which
would be the SYN-ACK segment counterpart). The big difference is that in TCP
only this first exchange is protected with this key number, while in SCTP the
number is kept and all the datagrams exchanged during the whole life of an
association must be tagged with this value. The randomly chosen value contained
in the Initiate Tag field will be included inside the Verification Tag field of the
common header of the datagrams sent by the server as a validity check: the client
will never accept a datagram coming from the server if it does not have the
Verification Tag set to the right value (except for some special cases as explained
in chapter 7).
[Figure 4-3 content: the datagram carrying the INIT chunk has Verification Tag = 0 in its
common header, followed by the Checksum. The INIT chunk (Chunk Type = 1) carries Chunk Flags
(reserved), Chunk Length, Initiate Tag = Tag A, Advertised Receiver Window Credit, Number of
Outbound Streams, Number of Inbound Streams, Initial TSN and Parameters. The datagram
carrying the INIT ACK chunk has Verification Tag = Tag A. The INIT ACK chunk (Chunk Type = 2)
carries Chunk Flags (reserved), Chunk Length, Initiate Tag = Tag Z, Initial TSN and the
State Cookie plus other parameters. The figure also shows a chunk of type 10 carrying the
received cookie, in a datagram with Verification Tag = Tag Z, and a chunk of type 11, in a
datagram with Verification Tag = Tag A.]

The Verification Tag field was included to avoid blind attacks (see footnote 26). In our
case, a blind attacker would not know the value of the Verification Tag, and so its
datagrams would be rejected by the receiver. With the 32-bit
Verification Tag, blind attacks are made drastically harder, as the attacker would
need to send on average 2^31 datagrams before one of them is accepted. This would
take a very long time, and long before such a quantity of datagrams with a wrong
Verification Tag had arrived some alarm bells should already be ringing.
However, for stronger protection against attacks one should use the procedures
defined in [Ken1998a], which involve a tradeoff between the security of the
association and the time consumed processing the datagrams.
Of course, that average number of 2^31 attempts will only be a reality if we make the
Verification Tag as random as possible. Otherwise, as the Verification
Tag is a basic defense against blind attacks, there will be the possibility of
suffering attacks similar to the so-called Sequence Number Attack described in
[Bel1996], in which the power of the attack lies in the possibility of guessing
the value of a new pseudorandom number if the attacker knows the ones
generated during a short period of time. Random numbers are hard to produce in
a computer, but the hints given in [Eas1994] can be helpful to achieve the desired
level of randomness.
As the datagram containing the INIT chunk is the first one of an association,
it has its Verification Tag field set to zero. As seen in Figure 4-3 the datagram
containing the INIT ACK chunk already uses the Initiate Tag included in the INIT
chunk received. The INIT ACK chunk itself also contains an Initiate Tag that will
be used by the client as the Verification Tag of its subsequent datagrams directed
to the server.
The Advertised Receiver Window Credit tells the receiver of the chunk how much buffer
space, in bytes, the sender of the chunk has reserved to store incoming data. This
field changed a lot during the evolution of MDTP and SCTP (as already
discussed in section 3.1).
In the first version of SCTP, the In Queue field of the MDTP header evolved.
A new field, called the Receiver Window Credit, was included both in the INIT
and in the INIT ACK chunks. This information told the receiver of the INIT or
INIT ACK how many outstanding messages it could have. But this did not help
that much: the information was still given in number of messages instead of
number of bytes, and again that information was related to the number of
outstanding data messages and not to the real state of the receiver's buffer. In
the next version it was changed to express the value in number of bytes instead of
messages, but the conceptual error was still there, as there was no direct
relation between the outstanding bytes and the buffer space at the receiver.
Finally the mistake was fixed in the 6th version of SCTP by introducing the
Advertised Receiver Window Credit field, also included in the acknowledgement
chunks, which allows the data sender to track the buffer space at the receiver side.
This was in a way going back to the roots, as the Window field of TCP's header
performs exactly this same function. The main difference is that the Window field
is 16 bits long, while the Advertised Receiver Window Credit uses 32 bits. When
the Receiver Window Credit field was first used in SCTP it was 16 bits long as
well. But when it was changed to express the value in octets instead of messages,
it was immediately upgraded to 32 bits to avoid a problem that TCP has
with its Window field.

Footnote 26: In a blind attack the attacker is not able to read a datagram that is not
directed to it, and it does not have access to the data exchanged between the peers involved
in an association.

TCP has a 16-bit Window field that can report at most 64 Kbytes. That
quantity, while enough when TCP was designed, quickly became too small. As
described in [Jac1992], TCP performance depends not upon the transfer rate itself,
but rather upon the product of the transfer rate and the round-trip delay. This
Bandwidth*Delay product measures the amount of data that has already been sent
but has not yet reached its destination (the bits that are still on the way). It is
the buffer space required at the receiver to obtain maximum throughput on the
TCP connection over the path, i.e., the amount of unacknowledged data that TCP
must handle in order to keep the pipeline full.
As networks evolve to become Gigabit networks, the small Window field of
TCP causes performance problems, especially in long-distance connections.
Let us consider, for example, that we are transmitting data from Madrid to Helsinki,
that the receiver has a buffer of 64 Kbytes, and that the link used
transmits at 1 Gbps through fiber. The example is represented in Figure 4-4:

Figure 4-4: Transmission of 64 kilobytes from Madrid to Helsinki
[Figure panels: (a) At t = 0; (b) After 500 µs; (c) After 15 ms; (d) After 30 ms]

In the example, Figure 4-4 (a) shows the initial state, just before the host in
Madrid starts sending data. Let us make some rough calculations that will show
how big the problem can be. The 64 Kbytes (524,288 bits) of the Window are sent
in about 500 µs, as shown in Figure 4-4 (b). If we consider that the speed of light
inside the fiber is about 200,000 km/s, the datagrams will take about 15 ms to
cover the 3,000 km distance between the two cities (Figure 4-4 (c) shows the
moment when the first datagram sent reaches Helsinki). At that moment, the
acknowledgements start to be sent, and they reach Madrid 15 ms later, as
represented in Figure 4-4 (d). The arrival of those acknowledgements at the data
sender allows it to send more data. Meanwhile, it must stay idle waiting for the
answer, and so it is sending data at the 1 Gbps rate only during 500 µs every 30 ms.
This is less than 2% of the time, turning our excellent 1 Gbps link into a poor
20 Mbps link.
Fortunately, this problem was solved in [Jac1992] by adding a new Window
Scale option to TCP that allows the Window field to be shifted up to 14 bits to the left,
thus allowing windows of up to 2^30 bytes. SCTP will not suffer from this problem, at
least not for a very long time, as the Advertised Receiver Window Credit can describe
a buffer of up to 4 Gbytes.
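
The rough calculation of the Madrid to Helsinki example can be reproduced with the short
Python sketch below. The function name and its parameters are ours; it assumes, as the
example does, that the sender transmits one full window and then stays idle until the first
acknowledgement comes back, one round trip after it started sending.

```python
def window_limited_transfer(window_bytes, link_bps, distance_km,
                            fiber_km_per_s=200_000):
    """Return (send_time_s, round_trip_s, utilisation, effective_bps)."""
    window_bits = window_bytes * 8
    send_time = window_bits / link_bps             # time to push one window
    round_trip = 2 * distance_km / fiber_km_per_s  # data out, acks back
    # One window is sent per cycle; a cycle cannot be shorter than the
    # time needed to transmit the window itself.
    cycle = max(round_trip, send_time)
    utilisation = send_time / cycle
    effective_bps = window_bits / cycle
    return send_time, round_trip, utilisation, effective_bps


send_time, round_trip, utilisation, effective_bps = window_limited_transfer(
    window_bytes=64 * 1024, link_bps=1e9, distance_km=3000)
```

With the figures of the example this gives a sending time of about 524 µs, a round trip of
30 ms, a utilisation of about 1.7% and an effective rate of about 17.5 Mbps, which the text
rounds to 20 Mbps.
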
The Number of Outbound Streams and Number of Inbound Streams are used to
negotiate the quantity of streams (see footnote 27) used in the association by each
endpoint. Every SCTP association is composed of at least one outbound stream going from
each host. So, every host has at least one outbound stream to send data to the other
host, and one inbound stream to receive data from the other host.
During the initialization phase, the client sends inside the INIT chunk the
information about how many inbound streams it is willing to accept, and how
many outbound streams it would like to open. The server also includes this
information in its answer, so for each direction the smaller of the number of outgoing
streams requested by the sender and the number of incoming streams the peer can manage
will be chosen.
The streams feature first appeared in the 6th version of MDTP (where streams were first
called flows; the name was later changed to avoid confusion, as that term was used for other
purposes in other protocols somehow related to MDTP). It was a compulsory
feature, but it added complexity to the protocol, so it was decided that it should be
optional in the next revision of MDTP. However, as streams were a convenient
remedy for head-of-line (HOL) blocking (as explained in section 5.3), they
became compulsory again in the following version, as they are in SCTP.
In the 6th version of MDTP the streams had to be opened one at a time, using
a special stream opening procedure. When opening a large quantity of streams,
this procedure was long and inconvenient. So, the last version of MDTP
included the possibility of opening several streams during the establishment
phase. When SCTP came into play with its extensibility possibilities, only this
initial opening was kept, and the possibility of opening and closing streams during
the life of the association was completely removed. This was done because, if this feature
proves to be desirable in the future, one can always make an easy
extension to SCTP to provide it. Meanwhile, it is better not to add
features to the protocol that may never be used.
In the early stage of design, stream 0 was reserved for control purposes. This
was elegant in a way, but a few weeks were enough for the designers to realize that
streams were only related to upper user data transmission, and thus it was at
least paradoxical to use one for control purposes (it should be the upper user that
specifies the use of the different streams). Anyway, stream 0 still kept its special
status, as it was always implicitly open when an association was established
(presently, stream 0 must be explicitly and compulsorily opened).

Footnote 27: The term stream is used in SCTP to refer to a sequence of user messages that
are to be delivered to the upper-layer protocol in order with respect to other messages
within the same stream. This is in contrast to its usage in TCP, where it refers to a
sequence of bytes. The use of streams will be further explained later in section 5.3.

The value of the first Transmission Sequence Number (TSN) must be included in
the Initial TSN field of both the INIT and INIT ACK chunks. The TSN is a
number included in every DATA chunk to allow the receiving SCTP endpoint to
acknowledge its receipt and detect duplicate deliveries (thus with a functionality
equivalent to that of the Sequence Number in TCP). The Initial TSN is simply the
value of the TSN that the INIT or INIT ACK sender will include in the first DATA
chunk it sends. It is usually set to the same value as the Verification Tag.
The last part of the INIT and INIT ACK chunks contains parameters. We will
deal with them in the next sections.
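
The fixed part of the INIT and INIT ACK chunks described in the list above can be sketched
as follows. The field order follows Figure 4-3; the 16-bit width of the two stream counters,
the helper names and the rule of never choosing 0 as an Initiate Tag are assumptions made
for this illustration and should be checked against the SCTP specification.

```python
import secrets
import struct

INIT, INIT_ACK = 1, 2

# Chunk Type, Chunk Flags, Chunk Length, Initiate Tag,
# Advertised Receiver Window Credit, Number of Outbound Streams,
# Number of Inbound Streams, Initial TSN (network byte order).
FIXED_FORMAT = "!BBHIIHHI"
FIXED_SIZE = struct.calcsize(FIXED_FORMAT)
FIELD_NAMES = ("type", "flags", "length", "initiate_tag", "a_rwnd",
               "outbound_streams", "inbound_streams", "initial_tsn")


def new_tag():
    """Return a random 32-bit value, avoiding 0 (assumption: 0 is kept for
    the Verification Tag of the datagram that carries the INIT chunk)."""
    return secrets.randbelow(2**32 - 1) + 1


def build_chunk(chunk_type, initiate_tag, a_rwnd, out_streams, in_streams,
                initial_tsn, parameters=b""):
    """Pack the fixed fields followed by the already encoded parameters."""
    length = FIXED_SIZE + len(parameters)
    fixed = struct.pack(FIXED_FORMAT, chunk_type, 0, length, initiate_tag,
                        a_rwnd, out_streams, in_streams, initial_tsn)
    return fixed + parameters


def parse_chunk(chunk):
    """Unpack a chunk built by build_chunk() into a dictionary."""
    values = struct.unpack(FIXED_FORMAT, chunk[:FIXED_SIZE])
    fields = dict(zip(FIELD_NAMES, values))
    fields["parameters"] = chunk[FIXED_SIZE:fields["length"]]
    return fields


def negotiate_streams(requested_outbound, peer_inbound):
    """Streams usable in one direction: the smaller of both numbers."""
    return min(requested_outbound, peer_inbound)


tag_a = new_tag()
init = build_chunk(INIT, tag_a, a_rwnd=65536, out_streams=10, in_streams=5,
                   initial_tsn=tag_a)
received = parse_chunk(init)

# The client asked for 10 outbound streams; a server willing to accept
# only 8 inbound streams limits that direction to 8.
client_to_server_streams = negotiate_streams(received["outbound_streams"], 8)
```

Using the secrets module for the tag reflects the earlier remark that the Verification Tag
should be as random as possible; any other good source of randomness would do.
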

When the client sends the INIT chunk requesting the establishment of the association,
it creates a data structure that keeps the information needed to manage that association,
the Transmission Control Block (TCB). The TCB will be used during the whole lifetime of
the association, keeping the information about timers, received and sent TSNs, and all the
data needed to keep the association up and running. It is important to note that the server
will not create the TCB until it receives the answer to the INIT ACK it sent.

4.3.1 The parameters

All the parameters defined in the basic SCTP specifications are meant to be used
during the first two legs of the establishment phase. Thus, only the INIT and INIT ACK
chunks are able to carry parameters so far. The ERROR and ABORT chunks can carry
error causes, which are syntactically the same, but with different semantics (more about
this in sections 6.2 and 7.2). Also the HEARTBEAT and HEARTBEAT ACK chunks can
carry a similar TLV structure, but its internal structure is implementation-specific (see
section 6.1). Some SCTP extensions use new parameters, but they have not been
standardized yet (as discussed in chapter 8).
All the INIT and INIT ACK parameters that appear in the SCTP specifications are
discussed in the next sections.

4.3.1.1 What is your address?

The IP Address parameters, in the INIT chunk, list the valid IP addresses that the
client will use as the source of its datagrams and that the server can use as the destination
of its datagrams (and vice versa in the INIT ACK chunk). Unlike TCP, an SCTP
association can take advantage of a multihomed host using all the IP addresses the host
owns. This feature is one of the most important ones in SCTP, as it gives some network
redundancy that is really valuable when dealing with telephony signaling. As seen in
section 2.2.1, in the SS7 world everything is duplicated, and the idea of losing a TCP
connection due to the failure of one of the network cards was one of the major problems
that made SCTP necessary.
Initially, multihoming was also used for load sharing. The idea was to use the
available destination addresses in a round-robin fashion, sending 1/n of the
traffic to each of the n available destinations and thus avoiding congestion. This idea
was quickly discarded, as SS7 links are engineered to be loaded to at most 40% of their
capacity and so they should never be congested. Moreover, transmitting datagrams while
selecting the destination address in a round-robin fashion actually means that if any one of
the network cards suddenly stops working, the association will not be lost but there will be
undesired retransmissions, delaying the transmission of the information (so we are
multiplying by a factor of n the probability of suffering some kind of network failure).
Furthermore, every change in the address used would likely cause datagrams to arrive out of
order at the receiver. This is generally undesirable, as it produces extra buffer
consumption, the sending of more acknowledgements and even retransmission of packets
(more about this in section 5.2). So, in SCTP only one address is used, the Primary
Address, while the rest are left as a backup in case the Primary Address becomes
unavailable.
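
The resulting policy (always use the Primary Address and fall back to another address only
when it becomes unavailable) can be sketched as below. The class and method names are
hypothetical, the first address listed is simply assumed to be the primary one, and how an
address is detected as unavailable is deliberately left out.

```python
class DestinationAddresses:
    """Destination addresses of a peer, one of them being the primary."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("at least one destination address is needed")
        self.addresses = list(addresses)
        self.primary = self.addresses[0]      # assumption: first one listed
        self.available = {address: True for address in self.addresses}

    def set_available(self, address, is_available):
        """Record the outcome of whatever reachability check is in use."""
        self.available[address] = is_available

    def pick_destination(self):
        """Primary Address if usable, else the first usable backup, else None."""
        if self.available[self.primary]:
            return self.primary
        for address in self.addresses:
            if self.available[address]:
                return address
        return None


peer = DestinationAddresses(["192.0.2.1", "198.51.100.1", "203.0.113.1"])
first_choice = peer.pick_destination()       # the Primary Address
peer.set_available("192.0.2.1", False)       # e.g. a network card fails
backup_choice = peer.pick_destination()      # first available backup
peer.set_available("192.0.2.1", True)        # the primary comes back
restored_choice = peer.pick_destination()
```

Unlike the round-robin scheme, a failure of one of the backup addresses has no effect on the
traffic as long as the Primary Address keeps working.
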
Another discarded idea regarding the use of multihoming was sending duplicates of the
datagrams to all the destination addresses. This idea was abandoned since it would multiply
the load by the number of destination addresses, and there were doubts about the gain it
could provide.
At the beginning, only IPv4 addresses were considered. This was really a very
shortsighted design, which fortunately was modified in the first version of SCTP after the
Oslo IETF meeting in July 1999. However, this was not the last addition to the IP Address
parameter suite. Listing inside the body of the SCTP datagrams the addresses that are
going to be used, instead of only using the one that appears as the source address in the IP
header, produces some operational problems when dealing with a Network Address
Translator (NAT) [Sri2001].
NATs are a special kind of router that was created as a short-term solution to IPv4
address depletion. The 32-bit field for IPv4 addresses yields a total of 4,294,967,296
addresses. This quantity would be enough to address most of the people in the whole
world. However, only about 20-30% of those addresses can actually be used, because
addresses are allocated hierarchically to keep routing efficient.
NATs are a lesser evil that is lasting longer than expected (IPv6 with its 128-bit
addresses is the long-term solution that will make NATs obsolete). The arguments in
favor of and against NATs frequently take on religious tones, with each side passionate
about its position. The author is on the side of the people against the use of NATs.
NATs are always placed at the borders of stub domains (footnote 28) and they take
advantage of the fact that only a small percentage of the hosts in a stub domain are
communicating outside of the domain at any given time (indeed, many hosts never
communicate outside of their stub domain). Because of this, only a subset of the IP
addresses inside a stub domain needs to be translated into globally unique IP addresses
when outside communication is required. Meanwhile, the addresses used inside the domain
can be reused in several different stub domains. So, one globally unique Class C IPv4
network (more about IPv4 network classes in [Tan1996, section 5.5.2]) can be used by more
than 254 hosts in the whole world (usually one of them acting as a router). The basic
operation of a NAT is shown in Figure 4-5.
The figure represents two stub domains, each of them having a NAT that connects a
LAN to the Internet. Let us call Stub A the one that externally uses the globally unique
Class C IPv4 address block of 195.217.176.0/24 and is connected to the Internet through
the NAT with IPv4 address 195.217.176.1 (the one on the left of Figure 4-5). Let us call
Stub Z the one that externally uses the globally unique Class C IPv4 address block of
195.17.34.0/24 and is connected to the Internet through a NAT whose IPv4 address is
195.17.34.1 (on the right in the figure). As we see, both NATs use Class C IPv4 addresses,
globally unique, while both stub domains use Class A IPv4 network addresses inside their
domains (both use network 10.0.0.0/8). This kind of network can contain more than 16
million hosts. However, these Class A addresses can be used only inside the domain itself,
as they are not globally unique.

Footnote 28: A stub domain is a domain, such as a corporate network, that only handles
traffic originated by or destined to hosts in the domain.

Figure 4-5: Basic NAT operation

In Figure 4-5 the NAT operation is explained with an example of an IPv4 packet
traversing two NATs (a situation normally referred to as Twice NAT). A host in Stub A,
internally addressed as 10.114.206.48, sends a packet to the IPv4 address 195.17.34.9 (the
destination address is learned through an Application Level Gateway (ALG) [Sri1999] that
returns the right answer to a DNS query, but we will not discuss that earlier phase). The
IPv4 packet will have 10.114.206.48 as its source address and 195.17.34.9 as its
destination address. When that IPv4 packet reaches the NAT router, the router translates
the source address, changing it from 10.114.206.48 to 195.217.176.131. In fact, the router
could have chosen any address of its Class C network (from 195.217.176.2 to
195.217.176.254) that is not being used at that moment by any host inside Stub A for
external communications (that is, an address that is not part of any connection between a
host in Stub A and a host outside the stub domain). Then, if the IPv4 datagram is the
first datagram of a connection (for example, a TCP SYN segment), the NAT reserves that
address (195.217.176.131) as the address used by host 10.114.206.48 outside Stub A. So,
IPv4 packets that arrive at the router of Stub A destined to 195.217.176.131 will be
forwarded internally to host 10.114.206.48.
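
The address binding described above can be sketched in a few lines of Python. This is
only a toy model under stated assumptions: the class name BasicNat, its methods and the
contents of the address pool are hypothetical, connection tracking and binding timeouts
are ignored, and real NATs are considerably more involved.

```python
class BasicNat:
    """Toy one-to-one address translator for a stub domain (illustrative only)."""

    def __init__(self, pool):
        self.free = list(pool)  # global addresses not lent to any host yet
        self.out = {}           # internal address -> global address
        self.back = {}          # global address -> internal address

    def translate_outgoing(self, src, dst):
        """Rewrite the source address of a packet leaving the stub domain."""
        if src not in self.out:
            if not self.free:
                raise RuntimeError("no global address left in the pool")
            global_addr = self.free.pop(0)
            # Reserve the binding so that replies can find their way back.
            self.out[src] = global_addr
            self.back[global_addr] = src
        return self.out[src], dst

    def translate_incoming(self, src, dst):
        """Rewrite the destination address of a packet entering the stub domain."""
        if dst not in self.back:
            raise KeyError("no binding for " + dst)
        return src, self.back[dst]


# The NAT of Stub A; the pool content is an arbitrary choice for the example.
nat_a = BasicNat(["195.217.176.131", "195.217.176.132"])
print(nat_a.translate_outgoing("10.114.206.48", "195.17.34.9"))
print(nat_a.translate_incoming("195.17.34.9", "195.217.176.131"))
```

The first call reserves 195.217.176.131 for host 10.114.206.48, and the second call uses
that reservation to deliver a reply to the internal host.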
This change means, for example, that the IPv4 Header Checksum has to be
recalculated, and depending on the type of information carried by the IPv4 packet some
more changes have to be made (for example, if it carries a TCP segment, the TCP
Checksum must also be recalculated, or if it carries an ICMP message, the source address of
the IPv4 header inside the ICMP message must also be modified).
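
To make the checksum recalculation concrete, the following sketch builds a minimal
20-byte IPv4 header, rewrites its source address as a NAT would, and recomputes the
Header Checksum with the usual one's complement sum of 16-bit words. The helper names and
the header field values (TTL, identification, payload length) are assumptions made for
the example; only the source and destination addresses come from the text.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """One's complement of the one's complement sum of the 16-bit words in data."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, protocol: int = 132) -> bytes:
    """Minimal option-less IPv4 header; protocol 132 is SCTP."""
    header = bytearray(struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + payload_len, 0, 0, 64, protocol, 0,
        socket.inet_aton(src), socket.inet_aton(dst)))
    struct.pack_into("!H", header, 10, internet_checksum(bytes(header)))
    return bytes(header)


def rewrite_source(header: bytes, new_src: str) -> bytes:
    """What a NAT does to an outgoing header: new source address, new checksum."""
    fields = bytearray(header)
    fields[12:16] = socket.inet_aton(new_src)
    fields[10:12] = b"\x00\x00"  # the checksum field counts as zero while summing
    struct.pack_into("!H", fields, 10, internet_checksum(bytes(fields)))
    return bytes(fields)


original = build_ipv4_header("10.114.206.48", "195.17.34.9", payload_len=100)
translated = rewrite_source(original, "195.217.176.131")
# A header with a correct checksum sums to zero when verified.
print(internet_checksum(original), internet_checksum(translated))
```

If only the address were replaced and the old checksum kept, the verification sum would
no longer be zero and the next router would discard the packet.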
The IPv4 packet is then sent to the Internet, where it will be routed to the NAT at the
border of Stub Z (195.17.34.1). That NAT will use an internal table to know that packets
directed to 195.17.34.9 must actually be sent to host 10.170.8.92, so it will make the
necessary changes to the incoming packet and forward it, so that finally the right host will
receive the packet. The answer to that packet will undergo the same changes on its way back
to host 10.114.206.48, but in the reverse order.
As we see, NATs make it possible to have more than 254 hosts while using a
Class C network. This is helpful, as the 16,382 Class B networks (with space for up to
65,534 hosts each) were almost exhausted, taken by companies that normally have more than
254 hosts in their network but far fewer than 65,534, thus wasting lots of IPv4 addresses.
This feature cannot always be provided transparently to the hosts, because the NAT
does not always have all the information needed to translate the addresses (especially in
matters related to security, where IPv4 packets carry things such as digital signatures).
In any case, the solution has the disadvantage of breaking the End-to-End (E2E) principle
(footnote 29) inside an IP network, and of making up for it with increased state in the
network.
SCTP is also affected by the existence of NATs as, due to its multihoming capabilities,
the addresses used in the association are included inside parameters in the INIT and INIT
ACK chunks. If those addresses were not translated as well, the receiver of the INIT chunk
would mistakenly use those addresses, which are not globally unique. This problem forced
the choice of one of the following three solutions:

- Updating the software of existing NATs to look inside the SCTP datagrams and
  determine if some modifications should be done to their content, translating the
  addresses carried inside parameters.
- Not including any IP Address parameter (i.e., not using multihoming) if there is
  any NAT in between.
- Waiting until IPv6 is deployed and so there will not be any need for NATs any
  more and we will not have to worry about them.

As NATs are widely used and SCTP was expected to be deployed before IPv6 becomes a
reality in the Internet, none of these solutions could be seriously considered. So, after
some debate on the mailing list, a solution to the NAT traversal problem was found thanks
to the extensibility features of SCTP, and in its 9th release the Host Name Address
parameter was included. This parameter simply includes the host name of the sender of the
INIT or INIT ACK chunks, so the receiver can make the DNS query and the NATs can forget
about sniffing inside SCTP datagrams, thus making the whole operation with NATs easier.
However, the idea was almost discarded as it brought some potential security problems
(regarding the need for a DNS query) that were finally fixed.
The parameters also carry information about the Supported Address Types. This
parameter was introduced at the same time as the Host Name Address parameter. The problem
it solves is the following: if the INIT receiver sends us a Host Name Address parameter and
we are not able to resolve that kind of address, we will not even be able to answer the INIT
ACK chunk, and there will be no way to establish the association. Telling the peer that we
do not support Host Name Address parameters avoids this situation (provided that, apart
from host names, the peer can also send us IPv4 Address and/or IPv6 Address parameters).
Of course, this parameter is only useful inside the INIT chunk.

Footnote 29: The so-called End-to-End principle notes that certain functions can only be
performed in the endpoints, which are thus in control of the communication, and that the
network should be a simple datagram service that moves bits between these points. This
improves network reliability. A discussion of this model can be found in [Car2000].

The multihoming features of SCTP also pose some problems for the use of the IP
Security Protocol (IPsec) ([Ken1998a] defines the whole architecture of IPsec, and the
encryption and authentication algorithms, key management and security protocols are
specified in RFCs 2402 to 2412). This is because the whole IPsec model was designed with
connections in mind that did not make use of multihoming. Every source-destination pair
of addresses has to use a single key that must first be securely exchanged using a protocol
such as the Internet Key Exchange (IKE) [Har1998]. So, even though it is possible to
create and exchange a key for every source-destination pair of IP addresses, when the
number of IP addresses used by the endpoints is large, the whole process of maintaining
all those security associations becomes clumsy. The work regarding the use of SCTP with
IPsec is published in [Bel2001].
During the early stage of SCTP design, there existed the implicit feature of starting an
association with an endpoint on behalf of another one. Allowing this had quite a few
security implications (and there were few reasons to do it). So it was finally forbidden, and
the source address of the SCTP datagram carrying the INIT or INIT ACK chunk is always
part of the association (unless a Host Name Address parameter is used, but in that case the
resulting INIT ACK will be discarded if it is not directed to the INIT sender).
There is one Internet-Draft, [Coe2001], which compiles the issues raised by SCTP in
regard to multihoming on the Internet.

4.3.1.2 The king of the parameters: The State Cookie

The INIT ACK chunk always carries a special parameter, the State Cookie (normally
simply referred to as the Cookie). This is the parameter that makes it possible to get rid of
attacks similar to the SYN attack used against TCP and shown in section 4.2.
It was included in the first version of SCTP, being the basis of the whole establishment
phase. It does not really have any prescribed internal structure, as it must be transparently
echoed by the receiver of the INIT ACK chunk, for whom the Cookie is meaningless. However,
as the purpose of the Cookie is to move to the network and the client the task of storing
the information needed to open the association until the Cookie is echoed to its sender,
there must be a method to validate that it remained unmodified during its round trip
through the network. So it is highly recommended to include a Message Authentication
Code (MAC) in the Cookie. The currently recommended MAC is the Keyed-Hashing
algorithm for Message Authentication (HMAC) described in [Kra1997].
HMAC makes use of any iterative cryptographic hash function such as Message
Digest 5 (MD5) [Riv1992] or Secure Hash Algorithm 1 (SHA-1) [NBS1995] (which are
the two most widely used cryptographic hash functions nowadays), in combination with a
secret key. Thus, the INIT ACK sender should calculate the HMAC of the Cookie using
a secret key that is not known by anybody else (and that should be changed every now
and then). When the Cookie is echoed back and received by the INIT ACK sender, it
should recalculate the HMAC of the bytes of the Cookie, using again its secret key. If the
result is the same as the one contained in the Cookie, it means that nobody modified it (or
that a clever attacker somehow guessed the secret key).
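
As an illustration of this sign-and-verify cycle, here is a minimal sketch using the
hmac module of the Python standard library with SHA-1. The function names, the way the
MAC is appended to the state and the key length are assumptions made for the example;
the specification leaves the internal layout of the Cookie to the implementation.

```python
import hashlib
import hmac
import os

SECRET_KEY = os.urandom(32)  # known only to the INIT ACK sender; rotate periodically
MAC_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def make_cookie(state: bytes, key: bytes = SECRET_KEY) -> bytes:
    """Append an HMAC-SHA1 of the association state to form the Cookie."""
    return state + hmac.new(key, state, hashlib.sha1).digest()


def check_cookie(cookie: bytes, key: bytes = SECRET_KEY):
    """Return the state if the echoed Cookie is intact, otherwise None."""
    state, mac = cookie[:-MAC_LEN], cookie[-MAC_LEN:]
    expected = hmac.new(key, state, hashlib.sha1).digest()
    # compare_digest avoids leaking information through timing differences.
    return state if hmac.compare_digest(mac, expected) else None


cookie = make_cookie(b"tag-a|tag-z|initial-tsn")
tampered = bytes([cookie[0] ^ 0x01]) + cookie[1:]
print(check_cookie(cookie) is not None, check_cookie(tampered) is None)
```

Because the server can recompute the MAC from the echoed bytes alone, it does not need to
remember anything about the association between sending the INIT ACK and receiving the
Cookie back.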
During the first releases of SCTP, it was suggested that MD5 should be used for the
HMAC. Later on, that suggestion was withdrawn because MD5 is nowadays considered a
weak cryptographic function, as explained next. The strength of any one-way hash function
is defined by how well it can randomize an arbitrary message and produce a unique output.
One might think that it would take on the order of 2^m operations to subvert an m-bit
message digest, but in fact 2^(m/2) will often do using the Birthday Attack (footnote 30)
[Yuv1979]. It can be proven mathematically that if some function, when supplied with a
random input, returns one of k equally likely values, then by repeatedly evaluating the
function for different inputs we expect to obtain a repeated output after about 1.2 k^(1/2)
evaluations. MD5 generates a digest of 128 bits, so it would be expected that about 2^64
messages would have to be processed before we find two messages with the same digest
(using only for this purpose the latest supercomputer designed in the U.S., capable of
performing about 10^15 operations per second, it would still take about a year to
calculate that many MD5 digests). However, by studying the internal structure of MD5,
[Dob1996] described a way to find, in about 10 hours on a Pentium PC, two messages with
the same digest with a probability of about 0.05% (while this kind of attack does not yet
threaten practical applications of MD5, it comes rather close).
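
The birthday bound quoted above can be checked numerically. The sketch below computes the
exact probability of a repeated value among n random draws and compares it with the
1.2 k^(1/2) rule of thumb; the function names are chosen for the example only.

```python
import math


def collision_probability(n: int, k: int) -> float:
    """Probability that n draws from k equally likely values contain a repeat."""
    p_all_different = 1.0
    for i in range(n):
        p_all_different *= (k - i) / k
    return 1.0 - p_all_different


def expected_draws(k: float) -> float:
    """Rule of thumb from the text: about 1.2 * sqrt(k) draws until a repeat."""
    return 1.2 * math.sqrt(k)


# Birthdays: k = 365 possible values.
print(round(collision_probability(22, 365), 3))
print(round(collision_probability(23, 365), 3))
print(round(expected_draws(365), 1))

# A 128-bit digest such as MD5: k = 2^128, so about 1.2 * 2^64 digests.
print(expected_draws(2 ** 128) / 2 ** 64)
```

With 365 possible values the probability crosses 50% between 22 and 23 draws, which
agrees with the rule of thumb; for a 128-bit digest the same rule gives a number of
digests on the order of 2^64.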
[Ste2000] recommends that the Cookie should be as small as possible to avoid
fragmentation. A Cookie is usually smaller than 100 bytes. Apart from the MAC already
discussed, most SCTP implementations include the following fields in the Cookie:

- The information exchanged in the INIT and INIT ACK chunks: the Verification
  Tag of both the client and the server, the client's Advertised Receiver Window
  Credit and Initial TSN, the number of the incoming and outgoing streams and the
  valid IP addresses used by the client (or its hostname).
- The lifetime of the Cookie, so a hypothetical attacker would not have enough time
  to crack the MAC included.
- The Tie-Tags.

The Tie-Tags are two 32-bit values that are normally set to 0. However, if the
INIT is received when an association is already established (or is in its establishment
phase), they carry copies of both the client's and the server's Verification Tags at the
moment the INIT arrived at the server. This information, together with some rules
regarding the choice of the Verification Tag depending on the state of the receiver
of the INIT (see section 5.2 of [Ste2000]), helps to identify situations such as
initialization collision, restart of the peer, receipt of old or retransmitted datagrams and
false packets generated by attackers. The concept of the Tie-Tag was first included during
the last stage of SCTP design, in response to the impossibility of differentiating the
situations stated above in some cases.
As stated above, the lifetime of the Cookie is limited. Thus, if the delay between
the two hosts is large and the lifetime of the Cookie is too short, establishing an association
might become impossible. So, the INIT sender may ask for an extension of the Cookie
lifetime with a Cookie Preservative parameter, which simply carries the suggested Cookie
life-span increment. The receiver of this parameter may choose to ignore it for its own
security reasons.
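
A possible server-side policy for the Cookie lifetime is sketched below. The default
lifetime, the cap on the extension and the function names are assumptions made for the
example; a real implementation chooses its own values and may ignore the request
entirely.

```python
DEFAULT_LIFETIME = 60.0  # seconds; illustrative value
MAX_EXTENSION = 30.0     # seconds; hypothetical local policy limit


def cookie_expiry(created_at, requested_increment=0.0, honor_request=True):
    """Time at which a Cookie created at created_at becomes stale."""
    extra = 0.0
    if honor_request:
        # Grant the suggested increment only up to the local policy limit.
        extra = max(0.0, min(requested_increment, MAX_EXTENSION))
    return created_at + DEFAULT_LIFETIME + extra


def is_stale(expiry, now):
    """A Cookie echoed after its expiry time must not open an association."""
    return now > expiry


expiry = cookie_expiry(created_at=100.0, requested_increment=10.0)
print(expiry, is_stale(expiry, now=165.0), is_stale(expiry, now=175.0))
```

Capping the increment reflects the point made above: the longer a Cookie stays valid, the
more time an attacker has to work on its MAC.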

4.3.1.3 Other parameters

SCTP capabilities can be extended by creating new chunks and/or parameters. As the
sender might need no answer to the new chunks or parameters, there exists an ambiguity
between a receiver that actually processes the chunk or parameter, acts as it is supposed to
and does not send back any answer, and a receiver that simply discards the received
information because it does not know how to handle it. In the latter case, depending on the
Chunk Type or Parameter Type (as explained in section 3.1.2) the receiver may send back
an Unrecognized Parameters parameter inside the INIT ACK, or an ERROR chunk (more
about this in section 6.2). The receiver of such a parameter may decide to set up the
association without the extended functionality, or abort the establishment procedure.

Footnote 30: The name of this attack comes from the answer to the question "How many
people do you need before the probability of having two or more of them with the same
birthday exceeds 50%?". The answer is that only 23 people are needed. Taking into account
that with 23 people one can make (23 x 22)/2 = 253 pairs, each of them with a probability
of 1/365 of being a hit, it is not really so surprising.
The last defined parameter is the ECN Capable parameter. Its internal format has not
been specified, and just its Parameter Type has been reserved for future use of ECN. This
parameter should indicate that the INIT or INIT ACK sender understands ECN messages.

4.4 The last two legs: The COOKIE ECHO and COOKIE ACK chunks

The last two legs of the whole four-way handshake are much simpler than the first two
ones. They are shown in Figure 4-6.

Figure 4-6: Establishment phase in SCTP (last two legs)

Basically, the receipt of the INIT ACK chunk triggers the sending of the COOKIE
ECHO chunk (Chunk Type 10), which carries the same Cookie received inside the INIT ACK
chunk. Of course, the datagram carrying that chunk must have its Verification Tag set to
the Initiate Tag value received in the INIT ACK chunk.
Upon the receipt of the COOKIE ECHO chunk, the server may open a new
association with the client (if it has resources and the received Cookie is valid and not yet
stale). It is at this moment that the server creates its TCB; before the receipt of the
COOKIE ECHO nothing is saved in the server about the association that is in its
establishment phase. Then, the server sends back the COOKIE ACK chunk (Chunk Type 11),
which does not really carry any extra information but tells the client that the new
association was successfully created.
As stated before, the initial goal of the establishment phase was to be able to send data
as soon as possible. The use of a four-way handshake initialization procedure instead of a
two-way one would seem to delay the sending of data by one Round Trip Time (RTT). But
this is not necessarily true, as the last two datagrams exchanged in the SCTP establishment
phase can carry any other chunk (including the DATA chunk) bundled with the COOKIE ECHO
or COOKIE ACK chunks. Therefore, when comparing SCTP's and MDTP's establishment
phases, we see that the client must wait for a single RTT before it can send any data, which
is the same amount of time in both protocols. The server must wait for an RTT between
the receipt of the INIT and the receipt of the COOKIE ECHO chunk. This means one extra
RTT of waiting when compared with MDTP. However, as the server usually cannot send any
data to the client before the client itself has made a request, in the normal case both the
client and the server suffer the same delay with the two-way handshake and the four-way
one, but the four-way handshake is much more secure.
The so-called Cookie Mechanism is a very neat solution to most of the problems
that SCTP has to deal with, and it is one of SCTP's greatest improvements over TCP.

5. DOING THE HARD WORK: TRANSMISSION OF DATA

The aim of any transport protocol is the transmission of data. In this respect, SCTP
evolved a lot from the first version of MDTP to the publication of the RFC. As the
designers of SCTP had complete freedom, they included almost all the features that TCP
gained through successive extensions (some of which cannot be used at the same time,
mostly due to space problems in the TCP options field).
In this chapter we will explain the evolution of data transmission in SCTP, and how
later additions to TCP's functionality fit inside SCTP, such as the congestion control
mechanism, selective acknowledgements and the reporting of the receipt of duplicate data.

5.1 Basic data transmission

The two chunks used for data transmission are the DATA chunk, used by the data
sender and the one that carries the user data, and the SACK chunk, used by the data
receiver and the one that carries the acknowledgement of the receipt of the DATA chunks.
In Figure 5-1 we see the normal way in which data transmission takes place.
As we can see from the figure, every DATA chunk is identified by its TSN. This value
plays the same role as the Sequence Number field of the TCP header, with a subtle
difference: the TSN counts the DATA chunks sent and not the bytes carried in them, as the
Sequence Number does. Therefore, two consecutive DATA chunks will have two
consecutive TSNs. During the first 6 releases of the MDTP specification, the MDTP and
TCP behaviors were exactly the same in this respect (even the fields had exactly the same
names, see Figure 3-1). But in one of the many design team discussions held in
April 1999 it was decided that packet marking instead of byte marking was more desirable
for signaling transport. In this way, SCTP makes somewhat better use of the 32 bits of the
TSN (but this is not a big deal, since one would need 2^31 bytes outstanding, 2 Gbytes,
before the difference could be of any help, and this is highly unlikely).
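
The difference between byte marking and packet marking can be seen with a small example.
The two helper functions below are hypothetical and only number the units as described
above, wrapping around at 2^32.

```python
MODULUS = 2 ** 32  # both the TCP Sequence Number and the TSN are 32-bit values


def tcp_sequence_numbers(initial_seq, segment_sizes):
    """Sequence Number of each segment: advances by the bytes carried."""
    numbers, seq = [], initial_seq
    for size in segment_sizes:
        numbers.append(seq)
        seq = (seq + size) % MODULUS
    return numbers


def sctp_tsns(initial_tsn, chunk_sizes):
    """TSN of each DATA chunk: advances by one per chunk, whatever its size."""
    return [(initial_tsn + i) % MODULUS for i in range(len(chunk_sizes))]


sizes = [100, 120, 80]  # three signaling messages, sizes in bytes
print(tcp_sequence_numbers(1000, sizes))
print(sctp_tsns(1000, sizes))
```

For the same three messages the TCP numbering consumes 300 values of the 32-bit space,
while the TSN consumes only 3.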
This packet marking can be done in SCTP because the user data is sent to the network
inside data blocks, the DATA chunks, each of which can be uniquely identified by its TSN,
and so can all the bytes included inside it. In TCP every byte is marked according to its
position in the byte stream being sent to the receiver, and bytes do not belong to any
higher-level structure. Thus, a TCP sender is free to rearrange the number of bytes of user
data it includes in a segment. Once the user data has been sent inside several TCP
segments (and thus fragmented into specific pieces), those segments can be joined or split
later on. So, for example in case of retransmission, the TCP data sender can include in a
single segment what was previously included in several different (and consecutive)
segments.
Joining DATA chunks is not a problem in SCTP either, due to its bundling ability
(present since the first version of MDTP). This means that more than one chunk can be
included in a single SCTP datagram. So in case of retransmissions, an SCTP data sender
can put together in a single datagram several DATA chunks previously sent each inside its
own datagram. However, a real limitation of SCTP is that once a DATA chunk has
been sent, the data carried inside it cannot be split later on and sent inside several smaller
DATA chunks. This can be a problem if the MTU decreases (see section 5.4).
As seen in the figure, when a DATA chunk arrives at the receiver of data, the receiver
must send back a SACK chunk reporting its receipt. The Cumulative TSN Ack is used in the
same way as the Acknowledgement Number in TCP. But again, it acknowledges the receipt of
all the previous TSNs up to and including the Cumulative TSN Ack, while in TCP bytes are
acknowledged, not the segments that carry them.
After the Cumulative TSN Ack, the SACK chunks carry the Gap Ack Blocks. They are
used to acknowledge data received out of order. The Cumulative TSN Ack acknowledges
all the DATA chunks received up to the TSN it states (acknowledging that TSN as well).
However, as the DATA chunks can arrive out of order at their destination, or some of them
may even be lost, we need a mechanism to tell the data sender that we have received some
TSNs out of order. Thus, if a TSN falls inside a Gap Ack Block it means that it has reached
its destination and the data sender does not have to retransmit it even if the Cumulative
TSN Ack does not acknowledge it. The Gap Ack Block Start and Gap Ack Block End are
16-bit numbers because they express TSNs as offsets relative to the Cumulative TSN Ack.
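
The following sketch shows how a receiver could derive the Cumulative TSN Ack and the
Gap Ack Blocks from the set of TSNs it has received. It is a simplification under stated
assumptions: the function name is hypothetical, TSN wraparound at 2^32 is ignored, and
the Duplicate TSN list is not modeled.

```python
def build_sack(cumulative_tsn, received):
    """Return (new Cumulative TSN Ack, list of (start, end) Gap Ack Block offsets)."""
    pending = sorted(tsn for tsn in set(received) if tsn > cumulative_tsn)

    # Advance the cumulative point while the next expected TSN has arrived.
    cum = cumulative_tsn
    while pending and pending[0] == cum + 1:
        cum = pending.pop(0)

    # Group what remains into runs, expressed as offsets from the cumulative point.
    blocks = []
    for tsn in pending:
        offset = tsn - cum
        if blocks and offset == blocks[-1][1] + 1:
            blocks[-1] = (blocks[-1][0], offset)  # extend the current run
        else:
            blocks.append((offset, offset))       # start a new run
    return cum, blocks


# TSNs up to 10 were already acknowledged; 13 and 16 are still missing.
print(build_sack(10, [11, 12, 14, 15, 17]))
```

In this example the Cumulative TSN Ack moves to 12 and two Gap Ack Blocks report TSNs 14
to 15 (offsets 2 to 3) and TSN 17 (offset 5), so the sender only needs to consider 13 and
16 for retransmission.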
TCP can only report the last byte received in order (using the Acknowledgement Number)
unless it uses the option for selective acknowledgement defined in [Mat1996] (footnote
31). As happened with some other TCP extensions, this ability was directly included in
SCTP in its basic specification.
We can also see in Figure 5-1 that the SACK chunk carries at the end a list of
Duplicate TSNs. The use of this list is not explained anywhere in the SCTP specification,
but this is not a mistake. This feature was added in the 8th version, when the congestion
control experts of the TSVWG suggested incorporating it. At that time they were
working on an extension to the Selective Acknowledgements for TCP that could also report
duplicate data segments received, work that was finally published in [Flo2000]. Again,
SCTP inherited this TCP functionality.
As expected, to provide reliability, if the acknowledgement of a certain TSN is not
received within an interval of time, the corresponding DATA chunk is retransmitted.
However, in SCTP we can play with a variable that TCP lacks: the set of addresses used by
the data receiver. A data loss might mean that the path to that IP address is congested,
that a router on the way is misbehaving and losing datagrams, or simply that the network
card of the receiver is broken. So, when several addresses are available, it is advised
that a DATA chunk be retransmitted to a different address than the one to which it was sent
the last time. In this way, the sender takes advantage of the multihoming capabilities of
SCTP to provide extended reliability (if any of the receiver addresses is working properly,
the data transfer will effectively take place). There are, however, some concerns about a
malicious use of multihoming to artificially enlarge the sending limits (thus using more
network resources than allowed) that will be explained in section 5.2.
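
The advice on choosing the destination of a retransmission can be expressed as a small
selection function. This is only one possible policy; the function name, the optional
set of addresses considered unusable and the example addresses are assumptions.

```python
def pick_retransmission_address(peer_addresses, last_used, failed=()):
    """Prefer a peer address other than the one used for the previous attempt."""
    for address in peer_addresses:
        if address != last_used and address not in failed:
            return address
    # No alternative is available: fall back to the address used before.
    return last_used


peer = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # documentation-range addresses
print(pick_retransmission_address(peer, last_used="192.0.2.1"))
print(pick_retransmission_address(peer, last_used="192.0.2.1", failed={"192.0.2.2"}))
```

A single-homed peer simply gets the retransmission on the same address, which is the TCP
behavior.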
The DATA chunks only send data, and the SACK chunks only acknowledge data. In
TCP, if both ends are transmitting data, a data segment can also acknowledge data
received. This is a nice feature since it saves the bandwidth consumed by
acknowledgements. It can also be achieved in SCTP by bundling DATA chunks with a
SACK chunk.

Footnote 31: Both in SCTP and in TCP the acknowledgement of data arrived out of order is
taken as advisory only. User data is not considered fully delivered until it is
acknowledged by the Cumulative TSN Ack or the Acknowledgement Number respectively. This is
because the data receiver can drop received data that has not been delivered to the upper
user yet (although this should be done only in extreme circumstances such as buffer
shortage).

Figure 5-1: Basic data transmission

As seen before, MDTP could send piggybacked acknowledgements as TCP does
without further problems, so bundling was initially designed for another reason. The reason
is that TCP transports a simple stream of bytes, and it is the task of the upper user to insert
the proper marks inside the user data so the receiver can identify the individual data units
inside a single byte string received. As SCTP was initially designed to carry telephony
signaling packets, whose length is usually in the range of 100 bytes, sending every message
in its own SCTP datagram would cause a lot of overhead. So it was one of the design goals
that an SCTP endpoint could send several small messages inside a single datagram to reduce
the header overhead, and that is the reason why bundling was necessary. Messages inside the
byte string received were initially identified in MDTP thanks to the Part and Of fields (see
Figure 3-1). In SCTP they are identified by their Stream Sequence Number (SSN). The
use of the SSN field and the streams is discussed below in section 5.3.
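
The gain obtained from bundling small messages can be estimated with the sketch below. It
packs DATA chunks with the field sizes of the published SCTP specification (a 16-byte
DATA chunk header and a 12-byte common header), which is an assumption with respect to
the drafts discussed here. The function names and port numbers are hypothetical, and the
checksum is left at zero because it is not needed for counting bytes.

```python
import struct

IPV4_HEADER_LEN = 20
COMMON_HEADER_LEN = 12      # ports, Verification Tag and checksum
DATA_CHUNK_HEADER_LEN = 16  # type, flags, length, TSN, stream id, SSN, payload protocol id


def pack_data_chunk(tsn, stream_id, ssn, ppid, user_data):
    """One unfragmented DATA chunk (B and E flags set), padded to a 4-byte boundary."""
    length = DATA_CHUNK_HEADER_LEN + len(user_data)  # the padding is not counted
    header = struct.pack("!BBHIHHI", 0, 0x03, length, tsn, stream_id, ssn, ppid)
    return header + user_data + b"\x00" * (-length % 4)


def bundle(src_port, dst_port, verification_tag, chunks):
    """One SCTP datagram carrying several chunks; checksum not computed here."""
    return struct.pack("!HHII", src_port, dst_port, verification_tag, 0) + b"".join(chunks)


def wire_bytes(message_len, count, bundled):
    """Bytes sent for count messages, bundled together or one per datagram."""
    chunk_len = DATA_CHUNK_HEADER_LEN + message_len
    chunk_len += -chunk_len % 4
    if bundled:
        return IPV4_HEADER_LEN + COMMON_HEADER_LEN + count * chunk_len
    return count * (IPV4_HEADER_LEN + COMMON_HEADER_LEN + chunk_len)


chunks = [pack_data_chunk(1000 + i, 0, i, 0, b"x" * 100) for i in range(5)]
datagram = bundle(2905, 2905, 0x1234, chunks)
print(len(datagram), wire_bytes(100, 5, bundled=True), wire_bytes(100, 5, bundled=False))
```

For five messages of 100 bytes, bundling saves four IPv4 headers and four common headers,
that is, 128 of the 740 bytes that five separate datagrams would need.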
SCTP was designed to be able to carry a number of signaling protocols (the adaptation
layers defined so far are mentioned in section 8.2). Since the beginning of the existence of
the SIGTRAN working group, it was accepted that one of the features that SCTP should
support was the identification of the upper protocol it was transporting as its payload (see
section 2.5). However, no protocol identifier field was included anywhere in SCTP until its
6th version.
There were several possible options to identify the protocol. One of the easiest ones
was to simply use different SCTP well-known ports for the different protocols carried by
SCTP (in the same way that TCP uses port 80 when carrying HTTP or 21 when FTP is the
payload protocol). This way, it would be very easy for middle boxes such as proxies or
firewalls to know which protocol is being transported by SCTP and act accordingly.
However, this had the problem that only one SCTP association transporting a given type of
protocol could be established between two endpoints. Moreover, if a firewall relies on the
SCTP port to decide whether to discard a datagram, this barrier can be bypassed by simply
using some other port.
There was also the possibility of adding a protocol identifier field in the common
header as IPv4 and IPv6 do (with their Protocol and Next Header fields respectively). This
would have the same advantages as the well-known ports approach, and none of its
drawbacks.
But there was a feeling that, since the messages handled by the signaling protocols are
quite short, it would be nice to have the possibility of bundling several messages of
different protocols in the same SCTP datagram. So finally it was decided to add a Payload
Protocol Identifier field inside the DATA chunks, as seen in Figure 5-1. Initially that field
was going to be only one byte long, but finally it was decided that it should be a 32-bit
value (not only because one byte might not be enough, but also because a 32-bit
value fitted perfectly in the existing DATA chunk). Of those 32 bits, 16 bits would be used
for the protocol identifier, 8 bits for the variant, and the last 8 bits for the version.
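
The 16/8/8 split described above can be illustrated as follows. The function names and
the example values are hypothetical, and placing the protocol identifier in the most
significant bits is an assumption based on the order in which the text lists the fields.

```python
import struct


def pack_ppid(protocol, variant, version):
    """Build the 32-bit field: 16 bits protocol, 8 bits variant, 8 bits version."""
    if not (0 <= protocol < 2 ** 16 and 0 <= variant < 2 ** 8 and 0 <= version < 2 ** 8):
        raise ValueError("field value out of range")
    return struct.pack("!HBB", protocol, variant, version)


def unpack_ppid(raw):
    """Split the 4-byte field back into (protocol, variant, version)."""
    return struct.unpack("!HBB", raw)


raw = pack_ppid(protocol=3, variant=1, version=2)
print(raw.hex(), unpack_ppid(raw))
```

Since the field travels in every DATA chunk, chunks of different upper protocols can be
told apart even when they are bundled in the same datagram.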
One nice feature of TCP that avoids sending too many acknowledgement segments
without data is the so-called Delayed ACK Algorithm, described in [All1999]. Basically
it consists of sending an acknowledgement for every second received datagram containing
data, and never delaying the acknowledgement of a segment by more than a fixed amount of
time (at most 500 milliseconds). As happened with almost every nice feature of TCP,
SCTP also inherited it.
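The rule can be sketched as a small state machine. This is a minimal illustration under stated assumptions: the class name DelayedAckReceiver and its methods are hypothetical, time is passed in explicitly as seconds, and the 0.5 second bound is the one quoted above.

```python
class DelayedAckReceiver:
    """Decides when an acknowledgement must be sent (illustrative only)."""

    def __init__(self, max_delay=0.5):
        self.max_delay = max_delay      # upper bound on the delay, in seconds
        self.unacked = 0                # data packets received but not yet acked
        self.first_unacked_time = None  # arrival time of the oldest unacked packet

    def on_data(self, now):
        """Called for each packet carrying data; True means 'send an ACK now'."""
        self.unacked += 1
        if self.unacked == 1:
            self.first_unacked_time = now
        if self.unacked >= 2:           # acknowledge every second packet
            self._reset()
            return True
        return False

    def on_timer(self, now):
        """Called periodically; True means the delay bound has been reached."""
        if self.unacked and now - self.first_unacked_time >= self.max_delay:
            self._reset()
            return True
        return False

    def _reset(self):
        self.unacked = 0
        self.first_unacked_time = None
```

A real receiver would also acknowledge immediately when it detects a gap in the incoming data; that case is left out of this sketch.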

5.2 Some solutions to avoid congestion

The behavior described in the previous section is the way in which data transfer should
be done in case nothing goes wrong. Unfortunately, it is hard to find such a perfect network,
especially when dealing with the transmission of data through the Internet. In a real
situation packets are reordered on their way, and some of them are discarded or even
duplicated. The probability of this actually happening is related to the network usage: if
the network is asked to carry more packets than it is prepared for, all these problems arise.
So, when dealing with data transfer, one of the typical problems is designing
algorithms that help the data sender to know the state of the network, and also the
processing capabilities of the receiver. The goal of such algorithms is something which is
not really easy: not sending more traffic than the network and the receiver can handle (so
the retransmissions of lost packets are kept to a minimum), and avoiding unnecessary
retransmissions (so we only retransmit those packets that were really lost).
Normally, congestion in the network produces packet loss, which in turn triggers
retransmissions (usually causing the duplicate receipt of several packets), which leads to
more congestion. Therefore, the best cure for congestion is prevention, as once it
occurs it is hard to deal with.
Packet loss due to congestion has two origins. These two different problems are
usually illustrated hydraulically as it is done in Figure 5-2.

Figure 5-2: Two causes of congestion

In Figure 5-2 (a) we see a tap pouring water on a funnel. The tap would represent the
data sender, and the water drops would be the equivalent to the SCTP packets. The funnel
and the pipe could be considered as the Internet, the glass receiving the water drops would
be the memory buffer of the data receiver. Eventually there will be someone drinking that
water, who would play the role of the upper user of SCTP processing the received
information and freeing the buffer space.
So, in case (a), the pipe is thick (thus the bandwidth is large and the network is not
congested), but the receiver has a small capacity (a small buffer). So if we open the tap too
much (we send lots of datagrams), the receiver will be flooded and it will lose part of
the water (data) sent before anybody can drink it (before it is passed to the upper user).
This waste of water (datagrams dropped) could be avoided if we simply knew the
capacity of the glass (the buffer space) and opened the tap accordingly.

[Figure 5-2 panels: (a) Congestion at the receiver; (b) Congestion in the network]

This kind of congestion is relatively easy to manage. We have already discussed in
section 3.1.1 how MDTP solved this problem with its In Queue field. SCTP addresses this
problem through the Advertised Receiver Window Credit carried in the INIT and INIT ACK
chunks (already seen in section 4.3), as well as in the SACK chunks. This is basically
what TCP does. Every time the data receiver sends a SACK, it tells the data sender
about the state of its buffers at that moment, so the data sender should not send more data
than the receiver can buffer. When the SACK reaches the data sender the buffer space at
the receiver might be different, but as the receiver also reports the TSNs seen so far, the
data sender can easily calculate how much outstanding data it is allowed to have.
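The calculation mentioned in the last sentence can be sketched as follows. This is a simplified illustration: the function name sendable_bytes and its parameters are hypothetical, Gap Ack Blocks are ignored, and TSN wrap-around is not handled.

```python
def sendable_bytes(a_rwnd, outstanding_sizes, cum_tsn_ack):
    """Estimate how many more bytes the receiver can buffer.

    a_rwnd:            Advertised Receiver Window Credit taken from the SACK
    outstanding_sizes: dict mapping each sent TSN to its payload size in bytes
    cum_tsn_ack:       Cumulative TSN Ack reported in the same SACK
    """
    # Data sent after the point the receiver reported is still in flight and
    # will consume buffer space that the SACK could not yet account for.
    in_flight = sum(size for tsn, size in outstanding_sizes.items()
                    if tsn > cum_tsn_ack)
    return max(0, a_rwnd - in_flight)
```

For instance, with an advertised window of 4,000 bytes and two 1,000-byte chunks sent after the reported Cumulative TSN Ack, the sender may still send 2,000 bytes.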
In Figure 5-2 (b) the capacity of the receiver is not a problem, as instead of a
glass we have a whole bucket, but we still have problems. As the thin pipe (the congested
network) can drain less water (data) than the tap (data sender) is pouring, there will be
a moment in which the water level in the funnel (data travelling in the Internet) grows
so much that, again, water is lost. This problem is much more difficult to
address, as the width of the pipe is not really known; at most it can be guessed from some
other information. This is where the congestion avoidance algorithms come into play.
MDTP dealt with congestion in the network by having a variable limit on the number of
outstanding datagrams. In its initial specification, a simple table said how to decrease or
increase that limit when given numbers of datagrams were lost or acknowledged, but
this was a very primitive basis for what was finally used.
From the first version of SCTP the same congestion algorithms used in TCP were
adopted, with several variations but with the same Additive Increase Multiplicative
Decrease (AIMD) behavior. These algorithms are published in [All1999], and were first
devised by Van Jacobson in [Jac1988]. They have been used for a long time now by most
TCP implementations, and they were chosen not only because they work but because
it is convenient to have the same sending capabilities as TCP. Otherwise, if SCTP used
algorithms that made it more congestion-sensitive than TCP, TCP flows would outcompete
SCTP flows for capacity, and vice versa.
There are basically four intertwined algorithms that will be quickly described below.
They use two variables, the Congestion Window and the Slow Start Threshold (normally
called cwnd and ssthresh). The first one limits the number of outstanding bytes that the
data sender can have, and the second one helps to use the right algorithm at the right
moment.
It is worth noting that in TCP there is one such pair of variables for the whole TCP
connection, while in SCTP there is one pair per destination address. The cwnd variable of a
specific destination address indicates the number of bytes that can be outstanding towards
that particular address at a given time. So, the bigger a cwnd is, the more data is allowed
to be injected into the network destined to that address.
There is an open debate about the possibility of using a single cwnd for the whole
association instead of one per destination address. Having several of them could allow the
data sender to have more outstanding data than it is meant to, without breaking any rule of
the protocol, just by sharing the load among the interfaces. This is not the idea of
multihoming, which is supposed to be used only as a backup in case the Primary Address
fails. However, as different interfaces usually mean different paths, and different states
of congestion, there should be a way of applying different congestion variables to different
destination addresses. This is one of the problems of multihoming: it had never been
seriously tested before the creation of SCTP, and the consequences of its use are not
completely known yet.
When the data transmission starts, or when no data has been sent for a long time, SCTP
uses the slow start algorithm. The initial name for this algorithm was soft start, which does
not really give a better idea of what it is about, since it is neither really slow nor soft.
Slow start is used to probe the network to determine the available capacity, so
cwnd is initially set to at most twice the MTU of the destination address. However,
the network is usually able to carry much more than that without major effort.
So, during the slow start phase, when a SACK chunk is received, the value of cwnd is
increased by the total size of the acknowledged DATA chunks (limiting this increase to
one MTU worth of bytes if more data has been acknowledged). The result is that cwnd
increases exponentially, doubling every RTT. The complete rules are a little more
complicated; the interested reader can check section 7.2.1 of [Ste2000].
When cwnd reaches the value of ssthresh, SCTP changes its behavior to the
congestion avoidance algorithm. In this phase, the cwnd is increased by at most one MTU
per RTT, so it grows linearly. Again, the complete rules are written in section 7.2.2 of
[Ste2000].
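The two growth phases can be sketched as follows. This is a simplified illustration of the rules summarized above, not the complete procedure of [Ste2000]: the class name CongestionState is hypothetical, and the byte counter used to approximate "one MTU per RTT" is an assumption of this sketch.

```python
class CongestionState:
    """cwnd growth for one destination address (illustrative only)."""

    def __init__(self, mtu=1500, ssthresh=16 * 1500):
        self.mtu = mtu
        self.cwnd = 2 * mtu            # initial window: at most twice the MTU
        self.ssthresh = ssthresh
        self.partial_bytes_acked = 0   # bytes acked since the last linear increase

    def on_sack(self, bytes_acked):
        """Update cwnd for a SACK that newly acknowledges bytes_acked bytes."""
        if self.cwnd < self.ssthresh:
            # Slow start: grow by the acknowledged bytes, capped at one MTU.
            self.cwnd += min(bytes_acked, self.mtu)
        else:
            # Congestion avoidance: add one MTU once a full window has been
            # acknowledged, which is roughly once per RTT.
            self.partial_bytes_acked += bytes_acked
            if self.partial_bytes_acked >= self.cwnd:
                self.partial_bytes_acked -= self.cwnd
                self.cwnd += self.mtu
```

With an MTU of 1,500 bytes and ssthresh at 16 MTUs, fourteen SACKs that each acknowledge one full chunk take cwnd from 2 to 16 MTUs; after that, sixteen more are needed for the next increase.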
If cwnd continues growing, we will reach a point at which the network starts
losing packets. A packet loss is always considered a symptom of congestion because
with modern technology it is quite unusual for a packet to be dropped due to
corruption when traversing a noisy channel. Therefore, unless there is a reasonable doubt
(if we are using satellite links, for example), network congestion is always held
responsible for packet losses. So, if a DATA chunk is not acknowledged within a
certain period of time (this time is called the Retransmission Time-Out (RTO) and we will
deal with it later, in section 5.5), it is retransmitted. But this has almost catastrophic
consequences for the flow of data, as cwnd is reduced to one MTU to avoid congestion,
starting again with the slow start algorithm. To help recover from this situation,
ssthresh is set to one half of the old value of cwnd (so it takes few RTTs to recover
one half of the sending capability we had before), but in any case the overall loss
is quite big. To see this graphically, let us take a look at Figure 5-3.
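The reaction to an expired retransmission timer can be written in a couple of lines. The function name on_rto_expiry is hypothetical, and the sketch follows only the rule stated in the text above, with all values expressed in bytes.

```python
def on_rto_expiry(cwnd, mtu):
    """Return (new_cwnd, new_ssthresh) after a retransmission time-out."""
    new_ssthresh = cwnd // 2   # one half of the old value of cwnd
    new_cwnd = mtu             # back to a single MTU, so slow start begins again
    return new_cwnd, new_ssthresh
```

For a cwnd of 16 MTUs this returns a cwnd of one MTU and an ssthresh of 8 MTUs.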
In Figure 5-3 (a) we see what should be the normal progression of a data transmission
if there are no packet losses (the normal case sometimes even happens for small data
transfers). For the sake of simplicity we measure both cwnd and ssthresh in MTUs (it is
assumed that all the DATA chunks carry the maximum allowed quantity of bytes) as
shown on the left axis, and the time is measured in RTTs. The values of cwnd and ssthresh
appear as solid lines (blue and pink respectively). As we see, initially cwnd is set to 2 and
ssthresh to 16 (as an example). The green circles represent the DATA chunks sent (whose
TSN is the one that appears on the right axis), and the red squares are the SACK chunks
(their height indicates the value of the Cumulative TSN Ack, measured on the right axis).
We also assume that the data receiver is using the Delayed ACK Algorithm and that the RTT is
about 30 times the time needed to put a whole MTU-sized packet on the line. That means that
if we are using a 10 Mbps Ethernet with a 1,500-byte MTU, the RTT would be 36
milliseconds. Finally, we also make the unrealistic assumption of having an RTO that is set
to 3 RTTs, which is convenient to keep the graph from becoming very large.
As we see, during the first RTTs cwnd increases exponentially and in about 5
RTTs it reaches the chosen value of ssthresh (we go from 2 MTUs to 16 MTUs in 5 RTTs,
which is quite a fast increase). Then cwnd starts growing linearly, being increased by
one MTU every RTT. We see that at the end, after slightly more than 17 RTTs (about 600
milliseconds in our example), cwnd is set to 27 MTUs and we have sent 287 TSNs (which
would mean more than 400 Kbytes in the environment described), 260 of them already
acknowledged. As no packets were lost, ssthresh was not modified at all.
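The figures quoted for the example environment can be checked with a few lines of Python. The variable names are illustrative; the inputs (10 Mbps, 1,500-byte MTU, an RTT of 30 packet times, 17 RTTs, 287 TSNs) are the ones given in the text.

```python
MTU_BYTES = 1500
LINK_BITS_PER_SECOND = 10_000_000

# Time needed to put one MTU-sized packet on the line: 12,000 bits at 10 Mbps.
packet_time_s = MTU_BYTES * 8 / LINK_BITS_PER_SECOND   # 1.2 milliseconds

rtt_s = 30 * packet_time_s        # 36 milliseconds
elapsed_s = 17 * rtt_s            # 612 milliseconds, i.e. "about 600"
bytes_sent = 287 * MTU_BYTES      # 430,500 bytes, i.e. "more than 400 Kbytes"
```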

Figure 5-3: Evolution of cwnd with and without packet losses

We can see the devastating influence of a single packet loss on the whole transmission
in Figure 5-3 (b). The beginning of the transmission is exactly the same as in Figure 5-3
(a), but right after reaching the congestion avoidance phase, TSN 34 is lost. The sender
continues sending normally, but as the incoming SACKs all carry the same Cumulative TSN
Ack, cwnd is not increased during 3 RTTs. Then the timer expires and TSN 34 is
resent, ssthresh is set to 8 MTUs (one half of cwnd), and cwnd is set to a single MTU. This
drastically lowers the sending speed. As we see, it takes about 5 RTTs to leave the slow
start phase, and then cwnd continues growing slowly. After 17 RTTs, cwnd is set to 11
MTUs, ssthresh to 8 MTUs and 129 TSNs have been sent, 118 of them already
acknowledged.
Summarizing, a single lost packet roughly halves the throughput of an association
(see footnote 32).
However, we are not the first ones to notice this, and luckily some fixes to this behavior
have already been made, so this is not exactly the way in which things really work. To
mitigate the effects of a single packet drop another algorithm, called fast retransmit, is
used. The heart of the algorithm is to retransmit a DATA chunk as soon as the SACKs show that
several other DATA chunks sent later than that DATA chunk have already arrived at the
destination, while the DATA chunk itself is still unacknowledged. In this way we can avoid
waiting for the time-out of the retransmission timer.

Footnote 32: Although the figures depend entirely on how much data we have to send and when
the retransmission happens.

[Figure 5-3 panels: (a) No Packet Loss; (b) One Packet Lost. Axes: Time (RTTs), Congestion Window (MTUs), TSNs. Legend: cwnd, ssthresh, TSN sent, TSN acknowledged.]

In TCP, a data segment is fast retransmitted upon the arrival of 3 duplicate ACKs (4
consecutive ACKs with the same Acknowledgement Number). Due to the use of Delayed
ACKs (only used when there are no gaps in the incoming data), a data segment is fast
retransmitted when the data receiver has received 3 or 4 later segments. This algorithm was
defined for TCP before its selective acknowledgement option was widely
deployed. In SCTP, due to its compulsory use of Gap Ack Blocks, the algorithm is
slightly different: if a TSN is not acknowledged in 4 consecutive received SACKs while
some newer TSN is acknowledged in a Gap Ack Block of those 4 SACKs, the TSN
must be retransmitted. Moreover, both the cwnd and ssthresh variables are set to one half of
the value of cwnd at the moment of the fast retransmission. In practice this should work
pretty well, but the SCTP specification has a bug related to this fast retransmit procedure
that makes it work only when there are few TSNs outstanding. Otherwise the same
procedure is applied several times and the final result is sometimes even worse than when
the fast retransmit procedure is not used. As stated before, the SCTP specification is being
reviewed and some changes have been found necessary so far, one of them being this fast
retransmit issue. Those changes are published in [Ste2002b], and Figure 5-4 shows the
differences between the use of fast retransmit in [Ste2000] and [Ste2002b].
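The counting rule, and the effect of applying it more than once to the same TSN, can be sketched as follows. This is an illustration under stated assumptions: the class FastRetransmitTracker is hypothetical, the retransmit_once flag is only a simplified stand-in for the revised behavior (the exact wording of [Ste2002b] is not reproduced here), the SACKs are not checked for being strictly consecutive, and cwnd is expressed in MTUs.

```python
class FastRetransmitTracker:
    """Counts how many SACKs report a TSN missing while newer TSNs are acked."""

    THRESHOLD = 4  # number of SACKs that must report the TSN as missing

    def __init__(self, cwnd, retransmit_once):
        self.cwnd = cwnd
        self.ssthresh = cwnd
        self.retransmit_once = retransmit_once  # assumption: models the revision
        self.miss_count = {}
        self.fast_retransmitted = set()

    def on_sack(self, outstanding, cum_tsn_ack, gap_acked):
        """Process one SACK and return the list of TSNs to fast retransmit.

        outstanding: TSNs that have been sent
        gap_acked:   set of TSNs covered by the Gap Ack Blocks of this SACK
        """
        highest_acked = max(gap_acked, default=cum_tsn_ack)
        to_retransmit = []
        for tsn in sorted(outstanding):
            if tsn <= cum_tsn_ack or tsn in gap_acked or tsn > highest_acked:
                continue  # acknowledged, or no newer TSN has been acknowledged
            self.miss_count[tsn] = self.miss_count.get(tsn, 0) + 1
            if self.miss_count[tsn] % self.THRESHOLD != 0:
                continue
            if self.retransmit_once and tsn in self.fast_retransmitted:
                continue
            self.fast_retransmitted.add(tsn)
            to_retransmit.append(tsn)
        if to_retransmit:
            # Both variables are set to one half of the current cwnd.
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.ssthresh
        return to_retransmit
```

If TSN 34 is missing and eight SACKs arrive before its retransmission is acknowledged, the tracker with retransmit_once=False halves a cwnd of 16 twice (down to 4), while the one with retransmit_once=True halves it once (down to 8).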

Figure 5-4: Use of fast retransmit in [Ste2000] and [Ste2002b]
[Figure 5-4 panels: (a) Using Fast Retransmit as defined in [Ste2000]; (b) Using Fast Retransmit as defined in [Ste2002b]. Axes: Time (RTTs), Congestion Window (MTUs), TSNs.]

The main problem with fast retransmit in the present specification of SCTP is that it
allows the same TSN to be fast retransmitted several times (on every fourth received SACK
that does not acknowledge it while acknowledging subsequent TSNs). So when there are several
TSNs and their acknowledgements in flight, the same algorithm is applied repeatedly,
causing cwnd and ssthresh to decrease much more than would be desirable. This behavior
is shown in Figure 5-4 (a), while the revised one appears in Figure 5-4 (b). We can see that
there is a subtle difference between them.
There is another problem with the fast retransmit procedure for SCTP. The data
receiver should stop using the Delayed ACK algorithm when it finds any gap in the
incoming sequence of TSNs. So, if the datagrams are reordered in the network, and one
TSN arrives at the destination before a number n of other TSNs sent earlier (at least 4
TSNs), there will be n - 3 TSNs that will be retransmitted due to the fast retransmit
algorithm. This issue is also solved in [Ste2002b] by only triggering a fast retransmission
of a TSN upon the receipt of 4 SACKs that not only acknowledge other TSNs sent later, but
also do not advance the Cumulative TSN Ack. In other words, as long as the acknowledgements
arrive in sequence no fast retransmission will be issued (of course the first SACK
containing the gap breaks the sequence, but we would need 3 more in which the Cumulative TSN
Ack is not advanced).
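The n - 3 figure can be reproduced with a small simulation. The function name and the scenario encoding are assumptions of this sketch: TSN n+1 overtakes TSNs 1 to n, the receiver sends one SACK per arriving packet, no SACK is lost, and a TSN counts as fast retransmitted once 4 SACKs have reported it missing below a newer acknowledged TSN.

```python
def spurious_fast_retransmits(n, threshold=4):
    """Count the TSNs needlessly fast retransmitted under the original rule."""
    arrival_order = [n + 1] + list(range(1, n + 1))  # TSN n+1 arrives first
    received = set()
    miss_count = {tsn: 0 for tsn in range(1, n + 2)}
    for tsn in arrival_order:
        received.add(tsn)
        highest = max(received)
        # The SACK sent now reports every hole below the highest TSN received.
        for missing in range(1, highest):
            if missing not in received:
                miss_count[missing] += 1
    return sum(1 for count in miss_count.values() if count >= threshold)
```

In this scenario TSN i is reported missing by exactly i SACKs before it arrives, so only TSNs 4 to n reach the threshold, which gives n - 3 retransmissions.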
Finally, the fourth algorithm used for congestion control is called fast recovery, also
defined in [All1999] and used right after a fast retransmission. TCP without the Selective
Acknowledgement option cannot inform the data sender about anything but the last
data segment received in order. As a fast retransmission is triggered by the
acknowledgements generated by data received out of order, even if the data is making it to
the receiver, there is no possibility of sending anything but a duplicate
acknowledgement (the use of NAKs by the data receiver, described in [Fox1989], which
make the sender send a specific data segment, is also interesting, but this option has never
been widely implemented).
In the typical case, the data sender receives several duplicate acknowledgements and
suddenly, when the retransmitted data reaches the destination, an acknowledgement
segment with a big advance in the Acknowledgement Number is received (this can be seen
for the SCTP case in Figure 5-4). As this is the expected behavior, we can anticipate it
by already increasing cwnd while the duplicate acknowledgements are still arriving, and
this is basically the fast recovery algorithm. SCTP, however, does not need that algorithm
due to its use of Gap Ack Blocks, so the problem is elegantly solved.
An SCTP data sender should follow these guidelines so as not to flood the network with
excessive traffic, but as is usually recommended, any other option that is less aggressive
(in the sense of injecting fewer packets into the network) is always acceptable.
Finally, the Duplicate TSNs at the end of the SACK chunks can be used for congestion
avoidance purposes. Although there is no standard algorithm that takes
advantage of this information, there are some guidelines about its use. The idea is that this
information can help to recognize when an unnecessary retransmission was made, so that we
can then take the appropriate actions. It is generally accepted that the Duplicate TSNs can
be useful to create an adaptive fast retransmit algorithm as discussed in section 8.1.3. Some
Internet-Drafts regarding this issue have already been published.

5.3 Several connections inside a single association: the use of
streams

A stream, as defined in [Ste2000], is:

o Stream: A uni-directional logical channel established from one
to another associated SCTP endpoint, within which all user
messages are delivered in sequence except for those submitted
to the unordered delivery service.

Therefore, in a way, a stream is a kind of subconnection inside an SCTP association.
The number of streams used is negotiated during the association establishment as already
shown in section 4.3.
UDP packets do not carry any sequencing information that allows the receiver to order
them upon arrival. Some applications do not need such ordering, but it is often desirable (if
not necessary) that the messages are delivered to the upper user in the same order as they
were sent to the network. Telephony signaling applications are of this kind.
Applications that need to keep the order of the data sent through the network
may choose TCP as their transport protocol. TCP is ordered, but strict
order-of-transmission data delivery is also a restriction for many applications. TCP is
byte-stream-oriented, which means that it has no way to recognize the beginning and
end of individual messages. So the whole flow of bytes is managed in the same way,
and the data must be delivered to the upper user in the same order it was sent to the
network, because a TCP receiver cannot know that several parts of the
continuous byte stream are unrelated. This way of transferring data causes the so-called
Head-Of-Line (HOL) blocking, which is illustrated with an example in Figure 5-5.
If we make a typical request to an HTTP server to download a web page when surfing
the Internet, normally we will receive several different files, containing text, graphics or
sound. In Figure 5-5 the right side represents the client, and the left side is the server side,
which sends three small files (differentiated by the color of the packets containing them)
upon the request of the client. For the sake of clarity, we suppose that each file is
contained in two datagrams. To illustrate the problem we suppose that the first part of the
first file transferred is lost. So, in Figure 5-5 (a) we see what happens when the three files
are sent using a single TCP connection: as the first datagram is lost, even though the
second and third files arrived entirely at the client they cannot be delivered to the upper
user, as all the data sent by the server must be passed to the user strictly in order. This is
the HOL blocking problem.
Normally, when dealing with HTTP transfers, things do not work like this. Usually the
client opens several different TCP connections, one independent connection per file, and
closes each one once the file is completely transferred. This is shown in Figure 5-5 (b), and
as we see, this approach is not affected by HOL blocking, as the files can be delivered to
the user as soon as they arrive (all of the files but the first one, of course, whose
beginning is still missing). However, we still suffer from the delay involved in opening and
closing the TCP connections, and what is worse, we are wasting resources by having several TCP
connections open at the same time between the same two endpoints. As servers have a
limit on the number of TCP connections they can have open at the same time, using
several of them for the same client lowers the overall number of clients that can be served
simultaneously.
In Figure 5-5 (c) we see the way SCTP could handle this problem: using a single
association with different streams for different files. This way, the server can save
resources, not having more than one association per client. But there is another advantage
of using a single SCTP association with several streams instead of using several TCP
connections. Apart from saving resources and avoiding delay establishing those TCP
connections, as all the streams belong to the same association, they all share the same
congestion avoidance mechanisms discussed in section 5.2. In Figure 5-5 (b), the different
TCP connections have different congestion avoidance parameters, and that can give the
client an excessive share of bandwidth. When using different TCP connections, each
connection does not know about the existence of the others, and behaves as if it was the
only one. So, in our example, the client would use three times the bandwidth of a single
TCP connection because at the server side these three connections are managed
independently. This would hurt other clients having a single TCP connection, and makes
congestion avoidance a harder issue.

Figure 5-5: Head of Line Blocking
[Figure 5-5 panels: (a) A single TCP connection; (b) One TCP connection per file; (c) A single SCTP association with several Streams; (d) A single SCTP association sending unordered user messages]

As seen in Figure 5-1, the DATA chunks carry two sequence numbers: the TSN and
the SSN. The TSN is global for the whole association, and it is used to detect packet
losses, as a DATA chunk is uniquely identified by its TSN. No matter to which stream it
is directed, the TSN of a new DATA chunk will always be set to the last used TSN
incremented by one (see footnote 33). The acknowledgements also use this number. In
Figure 5-5 (c) the TSN is the number that appears first in the datagrams, and globally
identifies them.
In SCTP the TSNs mainly do the same work as the Sequence Numbers in TCP. So
they are nothing new, even if TSNs identify DATA chunks and Sequence Numbers
identify single bytes in the overall flow of data. What is new is the SSN and the Stream
Identifier that appear in the DATA chunks. The Stream Identifier, as can be guessed from its
name, identifies the stream to which this DATA chunk is directed. In Figure 5-5 (c) the
server makes use of three streams, but as the Stream Identifier is 2 bytes long, it could use
up to 65,536 different streams. This number was thought to be a reasonable compromise
between stream capabilities and overhead during the design of SCTP. The SSN identifies a
message sent to a given stream, so that all the pieces of a user message have the same SSN
and they carry consecutive TSNs.
As the user messages can be bigger than one MTU worth of data, there is a need for
Fragmentation, so bigger messages can be chopped and included in several SCTP DATA
chunks. Fragmentation is a new feature in SCTP, as TCP does not have any need for it
because it manages every single byte as an independent entity identified by its Sequence
Number.
We have already seen in section 3.1.1 how MDTP was able to fragment user messages
of up to 255 times the MTU minus the space of the headers (up to 371,280 bytes in the
typical case of using IPv4 and having a MTU of 1,500 bytes). SCTP uses another way to
fragment virtually any message no matter its length.
The mechanism used is similar to the one used by IPv4 to fragment datagrams when
the MTU of the next network the datagram must traverse is smaller than the size of the
datagram itself. In IPv4, the Fragment Offset field indicates where in the whole original
IPv4 datagram this fragment is located, and the More Fragments flag is set to 0 if this
fragment is the last one. Thanks to them (and the length of the fragments) the receiver can
organize the pieces received and determine when it has received all of them. In SCTP the
DATA chunks use two flags, the B (Beginning Fragment) flag and the E (Ending
Fragment) flag. In the DATA chunk containing the first part of a user message the B bit
is set, while the E bit is set only in the last one. So, an unfragmented message carried
inside a single DATA chunk will have both flags set, and an intermediate fragment of a
user message will have both flags unset. Moreover, all the DATA chunks containing
fragments originated by a single user message will have the same SSN and their TSNs will
be consecutive. In this way, the only limitation we have is the TSN, and so there is a
possibility of sending fragmented messages of up to 2^31 times the MTU (in Ethernet this is
more than 2 Terabytes). This message size is far bigger than any buffer space expected
within the next decades.
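The use of the B and E flags can be sketched as follows. This is a minimal illustration: the functions fragment and reassemble are hypothetical, DATA chunks are modelled as plain dictionaries rather than in their wire format, and TSN wrap-around is not handled.

```python
def fragment(message, max_payload, first_tsn, ssn):
    """Split a non-empty user message into DATA chunks with B/E flags set."""
    pieces = [message[i:i + max_payload]
              for i in range(0, len(message), max_payload)]
    last = len(pieces) - 1
    return [
        {"tsn": first_tsn + i,   # consecutive TSNs
         "ssn": ssn,             # same SSN for every fragment
         "B": i == 0,            # set only in the first fragment
         "E": i == last,         # set only in the last fragment
         "data": piece}
        for i, piece in enumerate(pieces)
    ]


def reassemble(chunks):
    """Rebuild the user message from its fragments, in any arrival order."""
    ordered = sorted(chunks, key=lambda c: c["tsn"])
    if not ordered or not ordered[0]["B"] or not ordered[-1]["E"]:
        raise ValueError("first or last fragment is missing")
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["tsn"] != prev["tsn"] + 1:
            raise ValueError("a middle fragment is missing")
    return b"".join(c["data"] for c in ordered)
```

A message that fits in one chunk comes out with both flags set, and a 10-byte message split into 4-byte payloads yields three chunks whose flags are (B), (none) and (E).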
This way, the datagrams sent to a specific stream will be delivered to the user in the
same order they were put into the network (and so the client will have a clean copy of the
files the HTTP server sent). But as the files were sent using different streams, the order of
delivery of the files will not have to be necessarily the same in which the server transferred
them. So, we avoid the HOL blocking and the overhead of having one open SCTP
association per file transferred (the streams are cheap to manage).
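The per-stream delivery rule can be sketched as follows. This is a simplified illustration: the class StreamReceiver is hypothetical, every user message is assumed to be already reassembled, SSNs are assumed to start at 0 in each stream, and SSN wrap-around is not handled.

```python
from collections import defaultdict


class StreamReceiver:
    """Delivers messages in SSN order within each stream, independently."""

    def __init__(self):
        self.next_ssn = defaultdict(int)   # next SSN expected on each stream
        self.pending = defaultdict(dict)   # stream -> {ssn: buffered message}

    def on_message(self, stream, ssn, message):
        """Store one message and return those that can be delivered now."""
        self.pending[stream][ssn] = message
        delivered = []
        # Deliver as long as the next expected SSN of THIS stream is present;
        # holes in other streams are irrelevant.
        while self.next_ssn[stream] in self.pending[stream]:
            delivered.append(self.pending[stream].pop(self.next_ssn[stream]))
            self.next_ssn[stream] += 1
        return delivered
```

If the first message on stream 0 is delayed, messages arriving on streams 1 and 2 are still delivered immediately; only the later messages of stream 0 wait in the buffer.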
As the final user of a web page is a human who will read it, the problem described
in this HTTP context does not seem to be that horrible (but it is still quite nice to be
able to start reading text before all the images have been downloaded, for example).
However, we should think about some more critical applications such as telephony
signaling. We could for example use one SCTP stream to carry the signaling information
of a specific phone call, and even though several calls will be managed inside a single
SCTP association, they will be internally treated as different flows of data, and so delays or
losses in one of them will not affect the others. Other applications that could make use of
streams are real time multimedia applications. Thinking about teleconferences, we could
send voice and image through different streams, and so, in the typical case that the link
does not have enough bandwidth to carry the images (at least not all of them on time), we
could still hear what is happening independently of the image.

Footnote 33: The TSN is a 32-bit number and so, once it reaches the value ffffffff
(hexadecimal), it wraps and the next value is 0. However, in the early versions of MDTP the
32-bit Sequence Number field could use numbers from 0 to 7fffff000 (hexadecimal). Before
reaching that value, a special procedure to reset the Sequence Number had to be carried out.
This scheme did not last long.

But this is not the only feature of SCTP regarding the ordering of data. As seen in
Figure 5-1, the DATA chunk uses three flags: the B and E flags already discussed, and
the U (Unordered) flag. An unordered user message is delivered as soon as it arrives (all
its fragments) at the destination. All the DATA chunks of an unordered message have the
U flag set. DATA chunks of this kind do not use the SSN field (the Stream Identifier is
still carried), but they can still be reassembled thanks to the TSN and the B and E flags.
Messages of this kind could also help to avoid the HOL blocking problem, as one could simply
send unordered messages that would be delivered to the upper user as soon as they arrive.
This would in fact provide a functionality similar to UDP, with the possibility of sending
fragmented messages that would be reassembled in the right way at the destination. This
possibility is shown in Figure 5-5 (d) and as we see, it also solves the HOL blocking
problem.
So, as we have seen, SCTP DATA chunks are ordered at three levels:

At the user message level: All the DATA chunks containing fragments of a bigger
user message are always ordered at the destination, so the user messages are
always reconstructed in the right way. This level of ordering is always present and
it is provided by the B and E flags, as well as the TSN.
At the stream level: The user messages contained in the DATA chunks are also
ordered inside streams. However, DATA chunks directed to different streams are
unrelated and there is no required ordering among them, and the unordered
messages are always delivered upon arrival, without any attempt at ordering. The
SSN and the Stream Identifier provide this level of sequencing.
At the association level: All the DATA chunks sent inside an association are
sequenced so they can be unambiguously acknowledged. The TSN carries the
information to make this ordering possible.

Thanks to these three levels of ordering, the user messages can be sent to the receiver
either in the same order as they were sent by the data sender, partially ordered (using
streams), or completely unordered (sending unordered DATA chunks), even if the
messages must be fragmented. These three possibilities seem to be sufficient for almost
any ordering scheme.

5.4 Size matters: MTU discovery

We have already spoken in previous sections about the MTU and fragmentation, both
for TCP and SCTP, but we have not explained yet why this has to be done, and how the
MTU is calculated. In this section we will try to answer both questions.
Neither TCP nor SCTP have any Total Length field in their headers, so, apparently,
they do not have any need for fragmentation. If the user gives us 1 Mbyte of data, why not
simply make a datagram containing that 1 Mbyte in the user data field and send it to the
receiver? In TCP it would be as easy as that, and in SCTP we should make several DATA
chunks of up to 65,535 bytes long each including the DATA chunk header, and bundle
them in a single SCTP datagram.
In fact, this cannot be done due to the limitations of the network layer. In IP there is
an upper limit for the length of the datagram of 65,535 bytes (in IPv4 this also includes the
header, in IPv6 it does not). Thus TCP or SCTP should provide datagrams to the IP layer
that are small enough to fit in an IP datagram. But again, the IP datagrams must be sent
using the physical network that connects the sender to the Internet, and the 65,535 bytes
limit of IP is too much for any existing network. So that is not the real limit for IP
datagrams, and one cannot see IP datagrams of 64 Kbytes going around in the Internet. The
physical network used to transmit the IP datagrams has a maximum size for its frames, the
MTU, varying from some hundreds of bytes up to several thousands. There are several
reasons that explain this limitation in the maximum size of packets:

The Medium Access Control (MAC) layer protocol itself. If it has for example a
Length field occupying 2 bytes, then the biggest packet will be at most 65,535
bytes long.
Hardware limitations. If for example we are applying Time Division Multiplexing
(TDM), then the speed of the network limits the size of the biggest packet.
The power of the checksum decreases as the length of the packet grows.
Therefore, if we want to achieve a certain level of protection against corruption,
packets must have a maximum size limit.

Table 5-1 shows some of the most typical MTU values found in the Internet, taken
from [Mog1990]. As can be seen, there is a big difference among the MTUs of different
networks. In real life nobody cares about MTUs smaller than 576 bytes and some of the
values in the table are only rarely used (the 1,500 bytes MTU of Ethernet is by far the most
used one). Fortunately, there are some groups of similar MTUs called plateaus (the
difference between the biggest and smallest MTU in each group is stated in the Variation
column of Table 5-1). This makes things easier for the algorithm that discovers the MTU,
as explained below.
What will happen if the sender gives to the IP layer a datagram that fits in a Token Bus
frame (IEEE 802.4) and the receiver is located inside an Ethernet network? The answer (in
IPv4) is that a router in between will eventually fragment the datagram into pieces that fit
the MTU of the second network and send them. The IP layer of the receiver side will
reassemble the received fragments and deliver the original datagram to the TCP or SCTP
engine as if it had never been fragmented. The same process would take place if
both the sender and receiver were located in their own Token Bus networks, connected by
an Ethernet network in between.
So, if the IP layer is able to fragment the datagrams, why not simply make TCP or
SCTP datagrams that fit in the biggest IP datagram and let that layer do all the
fragmentation needed? There are several reasons why this is not the best way. Let
us look at the example in Figure 5-6.
In the example, a host sends a datagram that fits the local network. That big datagram
reaches the router, which must forward it through another network with a smaller MTU. The
big datagram is too large for that network, so the router fragments it into, let us say, three
smaller pieces and sends them. All those fragments have to traverse the Internet before
they arrive at the final router, and they can follow different paths. In the example shown in
the figure, the third fragment is lost somewhere in the Internet, so even if the other two
pieces arrive correctly at the receiver, they will simply be stored there at the IP layer
waiting for the third piece, which will never arrive. After some time, the whole original
datagram will be retransmitted.

MTU (bytes)   Type of network                    Variation (%)
65,535        Official maximum MTU               0.00
65,535        Hyperchannel
17,914        16 Mb IBM Token Ring               0.00
8,166         IEEE 802.4                         0.00
4,464         IEEE 802.5 (4 Mb Max)              2.57
4,352         FDDI (Revised)
2,048         Wideband Network                   2.30
2,002         IEEE 802.5 (4 Mb Recommended)
1,536         Exp. Ethernet Nets                 2.95
1,500         Ethernet Networks
1,500         Point-to-Point
1,492         IEEE 802.3
1,280         Official Minimum IPv6 MTU          0.00
1,006         SLIP                               0.00
1,006         ARPANET
576           X.25 Networks                      13.36
544           DEC IP Portal
512           NETBIOS
508           IEEE 802/Source-Rt Bridge
508           ARCNET
296           Point-to-Point (low delay)         0.00
68            Official Minimum IPv4 MTU          0.00

(The variation is given once per plateau, on its first row; rows without a value belong
to the plateau started by the nearest row above that has one.)

Table 5-1: Some MTUs found in the Internet

So, the result of sending a big message is that in case any of the pieces is lost (and
having several pieces makes losing one of them more likely) the whole datagram, and not
only the missing pieces, will be retransmitted. Moreover, fragmentation complicates the
receiver's operation. Things are somewhat harder if several pieces must be assembled
before the datagram can be delivered to the IP user. The reassembly algorithms are not
very efficient, and some effort has been made to design the best possible algorithm, such
as the one specified in [Cla1982].
Also, having to keep the fragments in memory until the last one arrives
unnecessarily wastes resources at the IP layer when one of the fragments has
really been lost. When dealing with lossy channels, fragmentation can cause a severe loss
of throughput, as it is harder to ensure the arrival of several smaller fragments than of
one single big datagram. In an extreme case where every datagram is fragmented into, for
example, 10 pieces, and the channel loses on average 1 of every 10 packets that traverse it,
we can effectively have zero throughput if every datagram happens to lose one of its
fragments. Moreover, routers need more time to handle an
IPv4 datagram if they find out that it must be fragmented. So it is not surprising that in
case of congestion they start dropping datagrams that exceed the MTU of the next hop. A
discussion about the problems of IP fragmentation and how to overcome them can be
found in [Cha1998].
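The effect of fragmentation on a lossy channel can be illustrated with a small calculation. The sketch below is only an approximation under an assumption that the text does not make: every fragment is lost independently with the same probability. The function name is ours and purely illustrative.

```python
def delivery_probability(loss_rate: float, fragments: int) -> float:
    """Probability that all the fragments of one datagram arrive.

    Assumption: each fragment is lost independently with probability
    loss_rate, and losing a single fragment loses the whole datagram.
    """
    return (1.0 - loss_rate) ** fragments


# One unfragmented datagram versus the same data split into 10 fragments,
# over a channel that loses 1 packet out of 10.
whole = delivery_probability(0.10, 1)
fragmented = delivery_probability(0.10, 10)
print(f"unfragmented: {whole:.3f}, 10 fragments: {fragmented:.3f}")
```

Under this independence assumption roughly one fragmented datagram out of three survives, against nine out of ten unfragmented ones. The zero throughput mentioned above corresponds to the worst case, in which the losses are spread so that every datagram loses one fragment.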

Figure 5-6: IP fragmentation

Finally, there is an even stronger reason why limiting packets to the MTU
boundary is convenient: IPv6 routers do not fragment large datagrams that do not fit in the
next network's MTU. Instead, they send back to the datagram sender an ICMPv6
Packet Too Big message including the MTU of the network that was unable to carry such a
big IPv6 datagram.
So, not surpassing the MTU threshold is convenient, but being as close as possible to
that limit is also important. If we simply send small IP datagrams to avoid any problems
with MTUs, including only a few bytes of user data in each packet, we waste network
resources because the datagrams carry little information and a lot of overhead due to the
IP header.
In a TCP connection, during the establishment phase, both endpoints exchange the
Maximum Segment Size (MSS) option. It carries the value of the maximum segment size
that the network of the sender of this option can manage. Basically it is set to the MTU of
the network minus the length of the IP and TCP headers (thus in a 1,500 bytes MTU
Ethernet, and using IPv4, the MSS would be set to 1,460). This establishes an upper limit
that must not be surpassed, but it does not guarantee that segments of that size will get
through (there can be networks in the path from one peer to the other with a smaller MTU
value). SCTP does not have anything like that.
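The arithmetic behind the MSS can be sketched as follows. The header sizes assume no IP or TCP options; the SCTP line is our own addition for comparison and uses the 12 byte common header and 16 byte DATA chunk header of [Ste2000]. The function names are illustrative only.

```python
IPV4_HEADER = 20         # bytes, without options
IPV6_HEADER = 40         # bytes, without extension headers
TCP_HEADER = 20          # bytes, without options
SCTP_COMMON_HEADER = 12  # bytes
SCTP_DATA_HEADER = 16    # bytes, header of a single DATA chunk


def tcp_mss(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one IP datagram of size mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def sctp_max_user_data(mtu: int, ipv6: bool = False) -> int:
    """Largest user data of a single unbundled DATA chunk for this mtu."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - SCTP_COMMON_HEADER - SCTP_DATA_HEADER


print(tcp_mss(1500))             # Ethernet with IPv4
print(sctp_max_user_data(1500))  # the same link, one DATA chunk
```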
As we see, there are several reasons why transport protocols such as TCP or SCTP
should implement the so-called Path MTU Discovery algorithm. This algorithm is
specified for IPv4 in [Mog1990] and for IPv6 in [McC1996]; both versions share the same
basic idea, with some differences.
The IPv4 header has a flag called Don't Fragment (DF). This flag, when set, means
that routers must not fragment this IPv4 datagram (thus behaving as in IPv6). This flag
was meant to advise routers that the receiver might not be able to reassemble fragments.
So, in case the MTU of the next network is smaller than the size of the datagram, the router
discards it and sends back the ICMP Fragmentation Needed and DF Set message (including
also the MTU of the network if the router is [Mog1990] compliant). Therefore, the main idea
of the Path MTU Discovery algorithm is to start by sending IP datagrams at most as big as the local
hop allows (and also smaller than the received MSS in TCP), with the DF bit set. Then, as
soon as we receive an ICMP message telling us that the packet was too large, we start
using the next lower value for the MTU in Table 5-1 (or the MTU value inside the
ICMP message, if it includes that information). The values in the table are grouped in the
so-called plateaus, which help to converge to the MTU value more quickly. In this way
we notice when the MTU decreases.
To notice when the MTU grows, the data sender periodically increases the
value of the MTU, also following Table 5-1. If the datagram was too big and the sender
receives back the ICMP message, it uses the previous MTU value again (or the MTU value
inside the ICMP message, if it includes that information). The lost packet will have to be
retransmitted (either retransmitting the same datagram with the DF bit unset, fragmenting
the IP datagram at the source, or, for TCP, including the user data in several smaller
datagrams). Retransmitting one datagram every so often is considered a lesser evil than
sending smaller packets. Using this method we will discover the smallest MTU of the
networks involved in the path from the sender to the receiver, which is exactly what we are
looking for.
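The decreasing half of the algorithm can be sketched with a toy simulation. This is only an illustration under our own assumptions: the plateau list takes the lowest MTU of each group in Table 5-1, the path is modelled as a plain list of link MTUs, and the function names are hypothetical. Periodic probing for a larger MTU is left out.

```python
# Lowest MTU of each plateau in Table 5-1, in descending order
# (an illustrative choice made for this sketch).
PLATEAUS = [65535, 17914, 8166, 4352, 2002, 1492, 1280, 1006, 508, 296, 68]


def next_lower_plateau(size: int) -> int:
    """Largest plateau value strictly below size (68 at the bottom)."""
    for plateau in PLATEAUS:
        if plateau < size:
            return plateau
    return PLATEAUS[-1]


def on_too_big(current: int, reported_mtu=None) -> int:
    """New estimate after an ICMP Fragmentation Needed and DF Set message."""
    if reported_mtu is not None and reported_mtu < current:
        return reported_mtu             # the router told us the exact limit
    return next_lower_plateau(current)  # otherwise fall to the next plateau


def discover(path_mtus, local_mtu: int, routers_report_mtu: bool = False) -> int:
    """Shrink the estimate until a datagram with DF set crosses every link."""
    estimate = local_mtu
    while estimate > PLATEAUS[-1]:
        # First link on the path that cannot carry a datagram of this size.
        blocking = next((m for m in path_mtus if m < estimate), None)
        if blocking is None:
            break  # the datagram reached the receiver
        estimate = on_too_big(estimate, blocking if routers_report_mtu else None)
    return estimate


print(discover([4352, 1500, 576], 4352))                           # plateaus only
print(discover([4352, 1500, 576], 4352, routers_report_mtu=True))  # MTU in ICMP
```

With plateaus only, the estimate ends at the first plateau that fits, which can be somewhat below the real path MTU (508 instead of 576 in this example). When the routers include the MTU in the ICMP message, the estimate matches the bottleneck exactly.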
In IPv6 basically everything is the same, except for two subtle differences. First, there
is no DF bit at all, so packets will never be fragmented in a router. And second, when the
packet is too big for the next network the router sends back an ICMPv6 Packet Too Big
message that always includes the size limit of that network. So Table 5-1 is not used,
because the ICMPv6 message we receive always carries the exact MTU of
the next hop. When we want to test for a bigger MTU value, it is enough to send
a datagram as big as the MTU of the local network and then use the information in the
ICMPv6 message received, if any. Again, the packet that triggered the ICMPv6 message
will be lost, and so the information it carries must be retransmitted. This time there is no
DF bit to unset, and we should either fragment the IPv6 datagram at the source, or use smaller
TCP segments.
There are some problems with the IPv4 implementation of the MTU Discovery, as
discussed in [Lah2000]. The most important one is the so-called Black Hole Detection. A
Black Hole is a router that discards an IPv4 datagram due to its size, but for some reason
the datagram sender never receives the corresponding ICMP message. This can be
caused simply by bugs in the router software, or by firewalls that filter those ICMP
messages so they never reach their destination. This kind of problem is hard to find, and
usually leads to time-outs and, finally, to the connection being aborted.
There are some other practical problems. If we test a new MTU in a moment of
heavy data transfer, several IP datagrams will be lost before we receive the ICMP
message and restore the old value of the MTU. Moreover, once a DATA chunk has
been created in SCTP and a TSN has been assigned to it, the TSN series must be followed,
and so there is no possibility of dividing the already created DATA chunk into two chunks
with different TSNs (something that in TCP is possible).
Among the possible solutions to these problems, the author of this Master's Thesis has
decided to use another approach for MTU discovery in his SCTP implementation. Instead
of enlarging the MTU and sending the normal SCTP packets with the new bigger value, a
HEARTBEAT chunk of the desired length is sent (more about HEARTBEAT
chunks in section 6.1). This datagram will have the DF bit set, like every datagram sent.
Since the internal structure of the HEARTBEAT chunk is not defined in [Ste2000], it is
easy to make a HEARTBEAT chunk of the desired size. If we receive the corresponding
HEARTBEAT ACK, it means that the MTU tested is valid, and the MTU can be enlarged up
to that value. If instead of the HEARTBEAT ACK we receive an ICMP message telling us
that the HEARTBEAT sent was too big, we do not have to modify anything as the MTU
was never increased. Only if the ICMP message contains an MTU value should we modify
our MTU, in case it differs from the one included in the ICMP message. If we do not receive
anything, it might mean that we are dealing with a Black Hole, that the network is
congested and the HEARTBEAT was lost, or that the destination is down.
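The three possible outcomes of such a probe can be summarized in a few lines. This is a minimal sketch of the decision described above, not the code of the implementation itself; the enumeration and the function name are assumptions made for the example.

```python
from enum import Enum


class ProbeResult(Enum):
    HEARTBEAT_ACK = 1  # the padded HEARTBEAT was answered
    ICMP_TOO_BIG = 2   # a router discarded it and told us so
    NOTHING = 3        # no answer before the timer expired


def mtu_after_probe(current_mtu, probe_size, result, icmp_mtu=None):
    """Return the MTU to use after probing with a HEARTBEAT of probe_size."""
    if result is ProbeResult.HEARTBEAT_ACK:
        # The tested size crossed the whole path, so we can enlarge the MTU.
        return max(current_mtu, probe_size)
    if result is ProbeResult.ICMP_TOO_BIG and icmp_mtu is not None:
        # Only when the ICMP message carries an MTU do we adopt that value.
        return icmp_mtu
    # ICMP without an MTU, Black Hole, congestion or a dead peer:
    # the MTU was never increased, so nothing has to be undone.
    return current_mtu


print(mtu_after_probe(1006, 1492, ProbeResult.HEARTBEAT_ACK))
print(mtu_after_probe(1006, 1492, ProbeResult.ICMP_TOO_BIG, icmp_mtu=1280))
print(mtu_after_probe(1006, 1492, ProbeResult.NOTHING))
```

The advantage of probing with a HEARTBEAT is visible in the last branch: as no DATA chunk was sent with the larger size, a failed probe never forces a retransmission of user data.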

5.5 I will wait for you: RTO calculation

The time the datagrams take to reach the receiver side and their acknowledgement to
arrive back at the sender is a very important thing the data sender should know. This
measure of time is called Round Trip Time (RTT). Knowing the RTT is important
because it is the measure used to set the value of the
Retransmission Time-Out (RTO), which is used for the retransmission timers.
One could think that once we have the RTT, the calculation of the RTO should be
easy, but that is not completely true. This same problem has been studied for a long time in
TCP, and many algorithms have been tested and used. The RTO calculation is one of the
properties SCTP has inherited from TCP, with the main
difference that SCTP keeps an RTO for every destination address in use.
The RTT calculation is quite straightforward. It is as simple as saving
the time when one TSN was sent and, when the acknowledgement is received,
calculating how long it took to arrive. There are, however, some things that must be taken
into account. Of course when measuring the RTT, one has to use chunks that are
acknowledged upon receipt, such as INIT, COOKIE ECHO, DATA or HEARTBEAT
chunks (normally DATA chunks are used for the RTT measure of the Primary Address and
the HEARTBEAT chunks are used for the rest of the addresses). When using the
HEARTBEAT chunks there is no problem, as one can always include inside the chunk
itself the time when it was sent. That information will come back inside the HEARTBEAT
ACK chunk (see section 6.1) and we will use it to make our measure. However, if DATA
chunks are used instead, we must take care, because if a DATA chunk was retransmitted
one never knows which transmission of the DATA chunk triggered the acknowledgement, so
retransmitted chunks must not be used to measure the RTT.
This apparently simple rule has its own name: Karn's algorithm (Karn's rule also says
that the RTO should be doubled every time a retransmission is issued). It is not surprising
that SCTP uses it. Some TCP implementations avoid that problem with the retransmissions
by using TCP's Timestamps option, as defined in [Jac1992].
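The sampling part of Karn's rule can be sketched as follows. The class and method names are hypothetical, and the bookkeeping is reduced to the minimum needed to show the idea: a TSN that has been sent more than once never produces an RTT measure.

```python
class RttSampler:
    """Collects RTT measures from DATA chunks following Karn's rule."""

    def __init__(self):
        # TSN -> (time of the last transmission, was it retransmitted?)
        self._in_flight = {}

    def on_send(self, tsn, now):
        """Record a transmission; a second call for a TSN marks it as retransmitted."""
        retransmitted = tsn in self._in_flight
        self._in_flight[tsn] = (now, retransmitted)

    def on_ack(self, tsn, now):
        """Return an RTT measure, or None if the acknowledgement is ambiguous."""
        sent_at, retransmitted = self._in_flight.pop(tsn)
        if retransmitted:
            return None  # we cannot tell which transmission was acknowledged
        return now - sent_at


sampler = RttSampler()
sampler.on_send(tsn=1, now=0.0)
print(sampler.on_ack(tsn=1, now=0.25))  # usable measure

sampler.on_send(tsn=2, now=1.0)
sampler.on_send(tsn=2, now=4.0)         # retransmission of the same TSN
print(sampler.on_ack(tsn=2, now=4.25))  # ambiguous, so no measure
```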
Another thing to take into account is that, due to the use of delayed SACKs, the TSN
received is not immediately acknowledged, and that affects the accuracy of the RTT
measure. And it is precisely the variation in the value of the RTT (due to multiple reasons)
that makes the calculation of the RTO harder. If the RTO is set to too small a value, then
there is the possibility of making retransmissions when they are not needed. In the opposite
case, if the RTO is too large, we can delay the retransmission of a lost packet for too long.
Therefore, the problem is finding the right RTO, taking into account that its value depends
on the state of the network, which changes over time.
As has been already said, at the beginning MDTP was only intended to be used for
telephony signaling transport. That meant that it would be used in well-behaved networks,
which would not be congested and would always be under our supervision. In such an
environment, the RTT was not expected to change, and it was likely to be quite small. So
the designers chose a fixed value for RTO of 160 ms. Then they changed it to be 160 ms
plus the last calculated RTT. Finally, in the last versions of MDTP, RTO was set to 160
ms, plus the maximum RTT measured ever, plus the maximum time an acknowledgement
could be delayed (in MDTP that last value was negotiated beforehand in the establishment
phase).
The algorithm used was still too simple, and with the first version of SCTP things
changed. As the designers were expecting to use it in the Internet instead of private
networks, the problem posed by the variation of RTT was completely different. The
expected probability density of acknowledgement arrival times changed from something
similar to Figure 5-7 (a), to something like Figure 5-7 (b).

Figure 5-7: Probability density of acknowledgement arrival times, (a) in a private network and (b) in the Internet

As can be seen in the figure, in a private network it is not difficult to calculate
the RTO. But in the Internet the RTT measured can vary rapidly, making the choice of
the RTO a more difficult task. So more elaborate algorithms must be used to calculate the
RTO.
TCP has always used a more complicated algorithm. As defined in [Pos1981c], a TCP
implementation should calculate the Smoothed Round Trip Time (SRTT) by using the
low-pass filter:

$SRTT = \alpha \cdot SRTT + (1 - \alpha) \cdot RTT$

Then, the RTO is calculated as $\beta \cdot SRTT$. The value of $\alpha$ was typically set to 7/8, and
$\beta$ was always set to 2. But in 1988 Van Jacobson showed that the fixed value of $\beta$ made it
fail to respond when the variance went up, being able to adapt to loads of at most 30%
[Jac1988]. To improve this he proposed to use the mean deviation of the values of RTT as
an easy to calculate approximation to the standard deviation. That algorithm was finally
published as an RFC in [Pax2000] and basically adds another calculation previous to the
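one of SRTT, which is the calculation of the Round Trip Time Variation (RTTVAR) with
the formula:

$RTTVAR = (1 - \beta) \cdot RTTVAR + \beta \cdot |SRTT - R|$

where R is the new RTT measure and $\beta$ is now the gain of this filter (1/4 in [Pax2000]),
not the multiplier of the older RTO formula. Finally, the estimation of the RTO is modified
to be:

$RTO = SRTT + 4 \cdot RTTVAR$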

Moreover, after every retransmission time-out the RTO value must be doubled. There
is a lower and an upper limit for RTO, usually set to 1 and 60 seconds respectively.
As happened with many other TCP features, the SCTP designers did not surprise us:
SCTP inherited this scheme from TCP, and it is the one that appears in [Ste2000].
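The whole estimation can be sketched in a few lines. The gains (1/8 for SRTT, 1/4 for RTTVAR), the handling of the first measure and the initial RTO of 3 seconds follow [Pax2000]; the class and method names are our own, and a real SCTP endpoint would keep one such estimator per destination address.

```python
class RtoEstimator:
    ALPHA = 1 / 8      # gain of the SRTT filter (weight of the new measure)
    BETA = 1 / 4       # gain of the RTTVAR filter
    RTO_INITIAL = 3.0  # seconds, used before any measure is available
    RTO_MIN = 1.0      # seconds
    RTO_MAX = 60.0     # seconds

    def __init__(self):
        self.srtt = None
        self.rttvar = None
        self.rto = self.RTO_INITIAL

    def on_measurement(self, r):
        """Update SRTT, RTTVAR and RTO with a new RTT measure r (seconds)."""
        if self.srtt is None:
            # First measure: there is no history to smooth yet.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR must be updated first, using the old value of SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        raw = self.srtt + 4 * self.rttvar
        self.rto = min(max(raw, self.RTO_MIN), self.RTO_MAX)

    def on_timeout(self):
        """Back off: double the RTO after a retransmission time-out."""
        self.rto = min(self.rto * 2, self.RTO_MAX)


estimator = RtoEstimator()
for measure in (2.0, 4.0):
    estimator.on_measurement(measure)
    print(estimator.srtt, estimator.rttvar, estimator.rto)
estimator.on_timeout()
print(estimator.rto)
```

Note that an SRTT gain of 1/8 is the same filter as the older one with a weight of 7/8 on the previous SRTT. With RTTs of some tens of milliseconds the computed value stays below the lower limit, so in practice the RTO is often just 1 second.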

5.6 The ideas left on the way

During the design of SCTP, people proposed several modifications to the data
transmission scheme described above that never had the support of the community and were
discarded. One of them was the use of a special chunk called the CANCEL chunk. The
aim of this chunk was to avoid retransmitting stale data when SCTP was used as the
transport protocol of real-time applications. The proposal was to send this special chunk
(basically a DATA chunk containing only the TSN and no data) instead of the
original one if the sender already knew that the receiver would discard the retransmitted
packet because the information would arrive too late. In this way the otherwise wasted
bandwidth could be used for other purposes, and this chunk would also prevent the
retransmission of an old chunk from delaying the delivery of other, more important packets.
Finally this proposal was discarded, mainly for two reasons. First, it would add
more complexity to the protocol (especially when dealing with fragmented messages) for
what finally seemed to be little gain. And second, because when using CANCEL chunks the
data sender would not be sure of what the receiver really got (and thus we would convert a
reliable transport protocol into an unreliable one) or of the state of its buffer. To
avoid the second problem, it was suggested that the SACK chunk could carry a list of
cancelled chunks, but this made the first problem worse. It was then proposed to send the
same DATA chunk simply removing the data field. However, following a standard
Berkeley socket Application Programming Interface (API), a zero length read would
mean the end of the connection.
The final decision was to avoid sending zero length DATA chunks and to leave the
issue open so that the feature could be added in the future. That is the nice thing about
having a protocol with the extensibility possibilities of SCTP: if you are not sure that
something will work, you can always leave the problem to future generations.
For some time, the idea of creating and destroying streams on demand was also
considered, and an Internet draft was about to be written. However, people in the
distribution list agreed that the extended functionality was not worth the added complexity
to the protocol, because there was a rough consensus that this ability was not really useful
and that it would be seldom used. One could always tear down an established association
and open a new one with the necessary number of streams. It was also pointed out that
there was the possibility of opening the maximum number of outbound streams. After all,
you can always program your SCTP implementation in a way that a stream only consumes
resources if it has been used at least once (the only problem with this is that the SCTP data
receiver is the one that must be programmed in this way). Finally the proposal was
forgotten.

6. IT IS NOT ALL PLAIN DATA

Once we have established a new connection most of the SCTP datagrams will contain
either DATA or SACK chunks (or both). But they are not the only ones that are exchanged
by the peers involved in the association.
SCTP has a mechanism to verify that the peer endpoint is up and running even when there
is no data transfer under way. This procedure helps to keep track of the state of
associations that are seldom used. Moreover, as SCTP peers can be multihomed, normally
only one of their addresses, the Primary Address, is used, while the others remain as a
backup in case the Primary Address fails. But if we are not sending data to those other
addresses, we need some other way to know their state. This is the so-called path heartbeat
mechanism, discussed in the next section.
However, using the path heartbeat mechanism, we can only learn about a complete
malfunction of one of the peer's addresses, or of the whole peer itself. So, SCTP also has a
way to tell the other host that something is going wrong at our side, even though it does not
necessarily prevent us from continuing to work. This information may help the peer to
adapt better to our needs, or simply to know why things are not working as expected. We
will speak about this in section 6.2.

6.1 Are you alive? The path heartbeat mechanism

As has been already said, one of the main features of SCTP is its use of multihoming.
But when the peer endpoint has several different IP addresses in use, one of them is
considered to be the Primary Address and is the one to which the datagrams are normally
directed. The rest are kept as a backup and are only used if the Primary Address fails. This
is somewhat problematic, because there must be a way to know the state of an address
that is only rarely used.
Knowing the state of unused addresses is vital to make the right choice when the
Primary Address goes down. TCP has the controversial keepalive mechanism specified in
[Bra1989], which basically consists in sending data that is outside the window, which should
trigger the sending of an acknowledgement. Upon receipt of that acknowledgement, we
conclude that the peer is still alive, but if we do not receive anything it could mean either
that the peer is down or that the packet was lost in the network, and so we should try again.
It is a controversial mechanism because it can tear down an otherwise perfectly good
connection if we are facing congestion in the network (and thus the packets are being lost).
Moreover, it normally consumes unnecessary bandwidth (if the connection is not being used,
who cares if it is still in good condition?) that would even cost money on an Internet path
that charges for packets. For these reasons, the keepalive mechanism should only be invoked
in server applications that might otherwise hang indefinitely and consume resources
unnecessarily if a client crashes or aborts a connection during a network failure.
The equivalent algorithm in SCTP is the path heartbeat mechanism. It was added in
the early stage of design of MDTP, in the 4th version, because the designers were
concerned about not having a way of keeping track of the state of unused addresses (both
their reachability and their RTT). It has never been criticized as strongly as TCP's
keepalive mechanism, because it solves a similar yet different problem.
In SCTP we use only one destination address to send data, the Primary Address, and if
that address fails we must use one of the others. But if there has been a failure in one of
the addresses, the probability that some other address is not working either is higher. This
is because normally the addresses are physically placed in the same host, and it is highly
probable that datagrams directed to any of those addresses will share part of their path to
the peer. As the idea consists in using one of the backup addresses to quickly solve the
problem, we must be quite sure that the new address used is in good condition. In any
case, some people think that this feature is in a way useless, so the path
heartbeat mechanism can be disabled if the upper user decides so.
Initially, the heartbeat type of datagram used in MDTP had a fixed format, with 8
bytes to include the time at which the datagram was sent. Upon receipt of the heartbeat
datagram, that information should be included in the answer directed to the source address
of the received datagram, so the heartbeat sender could make an RTT measure. The
sending frequency was initially set to one heartbeat sent every 4 seconds to any address
that stayed idle (see footnote 34) during that time. Later, it was made adaptive by adding
the last measured RTT to those 4 seconds. But this late change did not really make much
difference, as the value of the RTT is usually in the order of some tens of milliseconds. If a
certain number of heartbeat datagrams were unanswered, the destination address was
considered unreachable.
In the first version of SCTP the same structure was kept (using the HEARTBEAT
chunk and its acknowledgement, the HEARTBEAT ACK chunk). Nevertheless, to avoid
flooding the network with HEARTBEAT chunks, only a single HEARTBEAT chunk
could be sent (to any of the idle addresses) every 4 seconds plus the last RTT measured. In
the 5th version of the specifications of SCTP the path heartbeat mechanism was deeply
modified. These were the main changes:

Both the HEARTBEAT and HEARTBEAT ACK chunks were modified. Instead
of having 8 bytes to save the time at which the HEARTBEAT chunk was sent, they
included an opaque TLV structure of undetermined size that should be copied into
the HEARTBEAT ACK chunk. This was done because some SCTP
implementations were unable to choose the source address of their datagrams. So
upon the receipt of the HEARTBEAT ACK chunk it could be difficult to find out
to which address the HEARTBEAT chunk was sent. Having an opaque structure
gave more freedom to the implementations to include whatever they wanted.
The designers considered that being able to have only one unanswered
HEARTBEAT chunk per association at a time was not enough. So they undid the
previous change, managing every address independently of the rest.
The period of heartbeating was also modified, being set to the RTO of the address
to which the HEARTBEAT chunk was sent. That value was actually the smallest
period for heartbeating, because the upper user could define any heartbeating
period as long as it was bigger than the RTO. But usually all the RTOs are set to
the minimum value of 1 second, and so to avoid sending the HEARTBEAT
chunks in bursts, they should be sent once per RTO with jittering of +/- 50%, and
with exponential back-off of the RTO if the previous HEARTBEAT chunk was
unanswered.

34 An address is considered to be idle during a period of time if no chunk eligible to measure the RTT
(INIT, COOKIE, DATA or HEARTBEAT) has been sent during that period of time.

The discussion about whether only one HEARTBEAT chunk should be in flight per
destination address or per association continued. The final decision was to choose the latter
option, because with lots of destination addresses the overhead produced by the
heartbeat algorithm was considered too high. So in the 10th version of the SCTP
specification this feature was modified again, allowing only one unanswered HEARTBEAT
chunk per association.
Until the 9th version of the SCTP specifications, a HEARTBEAT chunk was
considered lost if it was unanswered one RTO (with jittering of +/- 50%) after it was sent.
But the designers wanted to give more freedom to the implementors to adjust this time, so
they created the Heartbeat Interval concept. The Heartbeat Interval is simply an amount of
time configurable by the upper user. When a HEARTBEAT chunk is sent to a specific
address, it is considered to be lost after the RTO of that address
(with jittering of +/- 50%) plus the value of the Heartbeat Interval.
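The resulting time-out can be written as a one-line computation. The sketch below assumes that the jitter is drawn uniformly in the +/- 50% range, which the text does not specify; the function and parameter names are illustrative.

```python
import random


def heartbeat_timeout(rto, heartbeat_interval, rng=None):
    """Seconds to wait before a HEARTBEAT sent to an address is considered lost.

    rto is the RTO of the destination address and heartbeat_interval is the
    value configured by the upper user, both in seconds. The RTO is jittered
    by +/- 50% (uniformly here, which is our assumption).
    """
    rng = rng or random.Random()
    jitter = rng.uniform(-0.5, 0.5)
    return rto * (1.0 + jitter) + heartbeat_interval


# With the minimum RTO of 1 second and a Heartbeat Interval of 30 seconds,
# the time-out falls somewhere between 30.5 and 31.5 seconds.
print(heartbeat_timeout(1.0, 30.0))
```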

Figure 6-1: The path heartbeat mechanism in SCTP

This was the last change in the heartbeat algorithm. Figure 6-1 shows the internal
structure of the HEARTBEAT and HEARTBEAT ACK chunks. The Heartbeat
Information field typically carries the IP address to which the HEARTBEAT chunk was
directed, as well as the time when it was sent. So upon the receipt of the HEARTBEAT
ACK we can make the necessary measure of the RTT to be able to calculate the RTO (see
section 5.5). However, as the internal structure of the Heartbeat Information field is
completely undefined, one can use the heartbeat algorithm even to measure the
MTU (more about this in section 5.4).
[Figure 6-1 field layout. HEARTBEAT chunk: Chunk Type = 4 (HEARTBEAT), Chunk Flags (Reserved), Chunk Length; Heartbeat Info Type = 1, Heartbeat Info Length; Sender-specific Heartbeat Info. HEARTBEAT ACK chunk: Chunk Type = 5 (HEARTBEAT ACK), Chunk Flags (Reserved), Chunk Length; Heartbeat Info Type = 1, Heartbeat Info Length.]
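The chunk layout of Figure 6-1 can be sketched with a few lines of code. The chunk types (4 and 5) and the Heartbeat Info type (1) come from the figure; the content of the sender-specific part (an IPv4 address followed by an 8 byte timestamp) is only one possible choice, since the field is opaque, and the function names are ours.

```python
import socket
import struct

HEARTBEAT = 4
HEARTBEAT_ACK = 5
HEARTBEAT_INFO = 1


def build_heartbeat(dest_ip: str, sent_at: float) -> bytes:
    """Build a HEARTBEAT chunk carrying the destination address and send time."""
    # Sender-specific information: opaque to the peer, chosen by us.
    info = socket.inet_aton(dest_ip) + struct.pack("!d", sent_at)
    # Heartbeat Info TLV: type, length (including its 4 byte header), value.
    param = struct.pack("!HH", HEARTBEAT_INFO, 4 + len(info)) + info
    # Chunk header: type, flags (reserved, zero), length including the header.
    return struct.pack("!BBH", HEARTBEAT, 0, 4 + len(param)) + param


def build_heartbeat_ack(heartbeat: bytes) -> bytes:
    """The receiver copies the Heartbeat Info unchanged and swaps the type."""
    return struct.pack("!B", HEARTBEAT_ACK) + heartbeat[1:]


def parse_heartbeat_ack(chunk: bytes):
    """Recover the address and send time we stored in the HEARTBEAT."""
    chunk_type, _flags, _length = struct.unpack("!BBH", chunk[:4])
    if chunk_type != HEARTBEAT_ACK:
        raise ValueError("not a HEARTBEAT ACK chunk")
    _info_type, info_length = struct.unpack("!HH", chunk[4:8])
    info = chunk[8:4 + info_length]
    dest_ip = socket.inet_ntoa(info[:4])
    (sent_at,) = struct.unpack("!d", info[4:12])
    return dest_ip, sent_at


heartbeat = build_heartbeat("192.0.2.1", 12.5)
ack = build_heartbeat_ack(heartbeat)
dest_ip, sent_at = parse_heartbeat_ack(ack)
print(len(heartbeat), dest_ip, sent_at)
```

When the HEARTBEAT ACK comes back, subtracting the recovered send time from the current time gives the RTT measure for that destination address. Making the sender-specific part longer is what allows the same chunk to be used as an MTU probe.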

6.2 You are wrong: the Operational Error chunk

When designing a protocol, one always specifies how things should be done.
However, there are many circumstances that might make things go in a different way,
from simple implementation bugs or hardware failures, to corruption of packets in the
networks or even external attacks.
SCTP is quite a complicated protocol and many problems can appear. Some of them
can even be solved by the SCTP implementation itself if it knows what is happening. If the
problem is so serious that it needs a fix outside the SCTP protocol itself, one can
always take a look at the packet traces taken from a protocol analyzer. However, quite
often one would need to know about the state of the other peer to really understand
what is going on. ICMP (and ICMPv6) is a protocol exclusively designed to report errors
in the processing of IP datagrams and to give some diagnostic tools to the network
manager. It is used among other things to verify the existence of a path going to a specific
IP address, to report congestion in a router, to indicate the impossibility of delivering a
specific datagram, or even to implement the Neighbor Discovery algorithm.
ICMP not only serves to debug problems at the IP layer, but it is also used for example
by TCP to implement the MTU Discovery algorithm (making use of the Packet Too Big
message) as explained in section 5.4, or to modify its sending rate when the Source
Quench message is received (more about this in section 3.1.2). But there are some
problems that are too specific to the transport layer to be solved with ICMP.
Thus, it is interesting to have a mechanism that reports errors at the TCP or SCTP level.
TCP does not have any method to report errors. It faces transmission errors such as
received datagrams apparently not directed to the host that received them by responding
with a datagram that has its RST flag on. The receipt of such a datagram will abort the
connection, not having the TCP implementation any possibility of fixing any problem.
MDTP was not initially a very complicated protocol (in fact, the 6th version of MDTP
was considered already too complicated by the designers, who cut part of its
functionality in the next version). Thus it did not have any way to report any error to the
peer endpoint. During its evolution it got more complicated and when SCTP was born the
designers decided to include a mechanism to notify certain error conditions to the peer
endpoint. This design idea was translated into the inclusion of a certain chunk, the ERROR
chunk, whose shape has not changed at all during the whole design phase of SCTP. In
Figure 6-2 we can see what it looks like:

Figure 6-2: The ERROR chunk in SCTP

[Figure: format of the ERROR chunk. Labels shown: Checksum, Chunk Type = 9 (ERROR), Chunk Flags (Reserved), Chunk Length, Error Causes.]

The ERROR chunk contains one or more error causes. As shown in Figure 3-2, the
error causes have been TLV structures since the 2nd version of the SCTP specifications. In the
final version, 10 different types of error cause have been defined, and some other ones
have been defined in the extensions to SCTP. Let us take a closer look at the error causes
that are present in [Ste2000]:

The Invalid Stream Identifier error cause is sent when the peer sends us a DATA
chunk directed to a nonexistent stream. Normally this means that the peer is
broken and there is not that much to do, but the receipt of this error would help to
fix the implementation bug that originated it.
In case a mandatory parameter is missing in a received INIT or INIT ACK chunk
the Missing Mandatory Parameter error cause should be sent in response. This
error cause was defined in the first version of SCTP when some variable length
mandatory parameters were expected to be defined in the future. The reality is that
the only such parameter is the State Cookie of the INIT ACK chunk and so the use
of this error cause is very limited. It probably means that the INIT ACK sender
(the server) is not working properly.
As explained in chapter 4 the State Cookie included in the INIT ACK chunk has a
limited lifetime. If the server is too restrictive and sets that life span to a very
small value, or if there are long delays in the path joining the two hosts involved
in the association, it can happen that when the COOKIE ECHO chunk reaches the
server, the State Cookie is already stale. In that case, the server should send an
error cause of the Stale Cookie Error type, giving a hint to the client about the
problem that aborted the establishment of the association. This error cause
includes the value in microseconds of how late the State Cookie arrived. Normally
this will trigger another attempt to establish the association including the Cookie
Preservative parameter in the INIT chunk to try to enlarge the lifetime of the State
Cookie.
Another cause to abort the establishment phase in SCTP is not having enough
resources to be able to open a new association. In that case, the peer lacking
memory should send the Out of Resource error cause, so the initiator of the
association can try to establish the association later.
The INIT chunk can include three types of parameters specifying destination
addresses to be used by the server: an IPv4 Address, IPv6 Address or Host Name
Address parameter. The server might not support some of these address types and
so it should send in response the Unresolvable Address error cause including
those addresses that can not be used. The client might simply give up or try again
not including those address types.
The SCTP protocol has been designed to be easily extensible. However, this
means that in the future new chunks and parameters would be defined, and those
implementations that do not know about those extensions will not understand
them. Obviously the sender of those chunks or parameters might want to know if
they caused the desired effect. So both the Chunk Type and Parameter Type have
one bit that pushes the receiver to send back the Unrecognized Chunk Type or
Unrecognized Parameters error cause in case it is not compatible with such
extension (see section 3.1.2).
A broken implementation can set any of the parameters of the INIT or INIT ACK
to an invalid value. The receiver of such an invalid chunk should send back the
Invalid Mandatory Parameter error cause to help fix the bug.
The receipt of a DATA chunk that does not include any data is a symptom that the
data sender has some problem. The No User Data error cause is thus sent in
response to such a DATA chunk to help fix the bug in the data sender.
If a COOKIE ECHO chunk is received showing that the peer has restarted, we
should set up a new association. However, if the receiver of such a chunk is in the
SHUTDOWN-ACK-SENT state, it means that the peer crashed when trying to
shut down the association. In that case it makes no sense to establish a new
association. So the receiver of the COOKIE ECHO chunk must send a datagram
with an ERROR chunk containing the Cookie Received While Shutting Down
error cause, bundled with a SHUTDOWN ACK chunk (more about this chunk in
section 7.3). The receiver of that datagram should answer with a SHUTDOWN
COMPLETE chunk, and probably it will not try to re-establish the association.
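
To make the TLV structure of the error causes more concrete, the following is a minimal sketch of how the body of an ERROR chunk could be split into its error causes. The cause codes are the ones assigned in RFC 2960; the function names parse_error_causes and stale_cookie_cause are only illustrative and do not come from any real SCTP implementation. The sketch assumes that each cause starts with a 16-bit cause code and a 16-bit length (which counts the 4-byte header) and is padded to a multiple of 4 bytes.

```python
import struct
from typing import List, Tuple

# Cause codes as assigned in RFC 2960.
CAUSE_NAMES = {
    1: "Invalid Stream Identifier",
    2: "Missing Mandatory Parameter",
    3: "Stale Cookie Error",
    4: "Out of Resource",
    5: "Unresolvable Address",
    6: "Unrecognized Chunk Type",
    7: "Invalid Mandatory Parameter",
    8: "Unrecognized Parameters",
    9: "No User Data",
    10: "Cookie Received While Shutting Down",
}


def parse_error_causes(body: bytes) -> List[Tuple[str, bytes]]:
    """Split the body of an ERROR chunk into (cause name, cause value) pairs."""
    causes = []
    offset = 0
    while offset + 4 <= len(body):
        # Every cause starts with a 16-bit code and a 16-bit length.
        code, length = struct.unpack_from("!HH", body, offset)
        if length < 4 or offset + length > len(body):
            raise ValueError("malformed error cause")
        value = body[offset + 4 : offset + length]
        causes.append((CAUSE_NAMES.get(code, "Unknown (%d)" % code), value))
        # Skip the padding that aligns the next cause to a 4-byte boundary.
        offset += (length + 3) & ~3
    return causes


def stale_cookie_cause(staleness: int) -> bytes:
    """Build a Stale Cookie Error cause carrying a 32-bit measure of staleness."""
    return struct.pack("!HHI", 3, 8, staleness)
```

For instance, parsing stale_cookie_cause(1500) followed by an empty Out of Resource cause yields two entries, the first one carrying the 4-byte staleness value and the second one carrying no value at all.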

Some of those error causes help the SCTP implementations to solve a problem that
might be transitory. But some others are normally included inside an ABORT chunk (see
section 7.2) instead of an ERROR chunk. This is because they are sent in response to a
datagram that proves that the peer has some important bug and then the association must
be finished. However, they are always useful and help to find problems that otherwise
would be more difficult to fix.


7. THIS IS THE END: THE SHUTDOWN AND ABORT ALGORITHMS

Releasing a connection is always easier than establishing it. But in any case, one can
find more difficulties than expected, and so the final design is the result of an evolution in
which the pitfalls that appeared were solved. The final procedure as appears in [Ste2000]
will be slightly modified, but in any case let us see which were those problems and how
they were managed.
As can be seen from the state diagram of Figure 3-3 there are two ways to end an
association, the graceful shutdown procedure, and the abortion of the association, but this
has not always been like that. In the next sections we will explain how the terminating
process evolved from a simple one-way procedure in MDTP to the abort and shutdown
procedures in SCTP. These two mechanisms to terminate an association will be discussed in
separate sections.

7.1 Terminating associations in MDTP

In the initial versions of MDTP there were two ways of finishing an association. One
of them was the so-called Endpoint Drain, which basically consisted of sending a special
message to the peer endpoint of an association. That message did not need to be
acknowledged, and the association was simply terminated, erasing any information about it
in the sender side as soon as the message was sent, and the same in the receiver side as
soon as it was received.
The other way of finishing associations was the so-called Termination of an
Endpoint. When this procedure was called, all the associations were terminated by sending
another special message to all the peer endpoints. In the end, it was much the same as
the Endpoint Drain, sending a message that was not acknowledged to terminate the
association. The only difference was that all the associations were terminated and not just
one.
There was no acknowledged way of shutting down an association. The explanation of
this is the same as for some other early MDTP properties: MDTP was meant to be run in
an environment in which packet losses should be a really rare event. As the delivery of the
packets was assured by the reliability of the network itself, the acknowledgement did not
seem to be necessary. Moreover, one of the initial design principles was that associations
should be established and terminated as quickly as possible. Thus, having to wait for an
acknowledgement was considered a waste of time.
This was the scheme used in the first 7 versions of MDTP. Again, the protocol was
gaining popularity and starting to be looked at as a much more general protocol than a simple
telephony signaling transport, and this change in purpose had to be translated into changes in
its design. This rather innocent terminating process had to be changed, as anybody forging
the peer's address could tear down an association.
Several changes were made. The Termination of an Endpoint procedure (which was
meant to be used only rarely, when the endpoint had serious problems) was left as it was,
just changing the format of the datagram sent, which was then called an Abort
datagram. But the Endpoint Drain procedure was modified and renamed to Graceful
Shutdown of an Association. The new mechanism was, in any case, still quite simple.
The main improvement was the birth of the Verification Tag concept. However, during
that time, it was not located in the Common Header in every datagram. The Verification
Tag was inserted only when some sensitive information was carried inside the datagram,
such as establishment datagrams, stream management datagrams, and terminating
datagrams (basically all the datagrams but the ones that simply carried data or
acknowledgements). So, the shutdown initiator sent a special datagram (the Shutdown
datagram) carrying the peer's Verification Tag and the last in-order TSN received. But as
the peer at that point might still have some data to send, it could continue sending data
until all of it was acknowledged. After this it should erase all the information about the
terminated association and reply with the Shutdown Acknowledgement datagram, carrying
the shutdown initiator's Verification Tag. At that moment the shutdown initiator also
erased the information about the association and the whole process was finished.

7.2 A hard end for an association's life: Aborting an association
in SCTP

When SCTP came into play, this same scheme was used. Again, there was an abort
procedure, in which the party wanting to abort the association simply sent an ABORT
chunk and deleted the information about the association. And there was also a shutdown
procedure, in which one peer sent the SHUTDOWN chunk, which was answered with a
SHUTDOWN ACK chunk just as stated above.
The abort procedure was kept mostly the same as it was in the last versions of MDTP.
But then the members of the mailing list started to think about what would happen if the
ABORT or SHUTDOWN ACK chunk was lost when running SCTP in lossy
environments. In that case, one peer would be terminated while the other would still think
that it was up and running. Following the normal procedure of aborting an association
when a maximum number of consecutive data retransmissions had been issued, it could
take even minutes to consider the peer as unreachable. If the other peer is not sending data
or does not have the heartbeat mechanism enabled (see section 6.1) it could be that the
resources allocated for that association would never be freed. Therefore, the concept of the
Out Of The Blue (OOTB) datagrams arose. An OOTB datagram is one that seems to be
valid but that is not directed to any of the open associations (due to a bug in the sending
party, or because we crashed and have just recovered). In case a host received an OOTB
datagram it should reply with an ABORT. But as the host did not know the peer's
Verification Tag it should use the one carried in the incoming OOTB datagram instead
(sent with the Reverse Verification Tag, as it is said in the SCTP jargon).
There are some exceptions to the management of OOTB datagrams: the INIT and
COOKIE ECHO chunks (they fit in the OOTB datagram definition, but obviously, when
somebody is trying to establish a new association we do not know anything about it in
advance), the ABORT chunk (that should not be answered at all to avoid a datagram
storm), and the SHUTDOWN chunk (which should be answered with a SHUTDOWN
ACK instead; see footnote 35). Having the OOTB datagram concept, as soon as one datagram was sent to
an already terminated host, we would receive an ABORT chunk back, thus quickly closing
our side of the association.

35 In the initial designs the SHUTDOWN ACK chunk should carry an all zeros Verification Tag, but
this was modified so it carries a copy of the Verification Tag of the received datagram as in the case of the
ABORT chunk.

Not only was the way of using the ABORT chunk modified, but also the ABORT chunk
itself. Initially the ABORT chunk did not have any body at all. It only had the compulsory
chunk header and nothing else. To be able to tell the peer something about the cause of the
error that originated the abortion of the association, one had to bundle an ERROR chunk
with the ABORT chunk (with the ABORT chunk being the last one in the datagram, otherwise
the ERROR chunk would not be read). This was considered to be a clumsy thing to do, and
so finally the ABORT chunk was modified to be able to carry the same error causes used
in the ERROR chunks as explained in section 6.2.
Some time later, due to the obligation of sending an ABORT chunk in response to an
OOTB datagram, another modification was made. One of the reserved flags in the ABORT
chunk was renamed to be the T (TCB Missing) flag. This flag is set in case the ABORT
chunk is sent in response to an OOTB datagram, meaning that no Transmission Control
Block (TCB) was found belonging to this association. As not having the TCB means that
we do not know the peer's Verification Tag, a datagram carrying an ABORT chunk with
the T flag set has its Verification Tag field set to the same value as the Verification Tag of
the received OOTB datagram (i.e., it carries the Reverse Verification Tag). The receipt of
an ABORT chunk with its T flag set normally means that the peer has restarted.
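
The handling of OOTB datagrams described in this section can be summarized in a small decision function. This is only a sketch of the rules as they have been told above, not of the final wording of the specification. The chunk type values are the ones assigned in RFC 2960, while the Reply class and the respond_to_ootb function are hypothetical names introduced just for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Chunk type values as assigned in RFC 2960.
DATA, INIT, ABORT, SHUTDOWN, SHUTDOWN_ACK, COOKIE_ECHO = 0, 1, 6, 7, 8, 10


@dataclass
class Reply:
    """What is sent back in response to an OOTB datagram (illustrative only)."""
    chunk_type: int
    verification_tag: int
    t_flag: bool = False


def respond_to_ootb(chunk_type: int, received_tag: int) -> Optional[Reply]:
    """Decide the reply to a datagram that matches no open association.

    Returns None when no OOTB reply is generated.
    """
    if chunk_type in (INIT, COOKIE_ECHO):
        # Establishment chunks are handled by the normal set-up procedure.
        return None
    if chunk_type == ABORT:
        # An ABORT is never answered, to avoid a datagram storm.
        return None
    if chunk_type == SHUTDOWN:
        # There is no TCB, so the tag of the received datagram is copied.
        return Reply(SHUTDOWN_ACK, received_tag)
    # Anything else: ABORT with the T flag and the Reverse Verification Tag.
    return Reply(ABORT, received_tag, t_flag=True)
```

Note that in every reply the Verification Tag is simply copied from the incoming datagram, because without a TCB there is no other value available.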
The final ABORT procedure was set to be as shown in Figure 7-1 below:

Figure 7-1: The abort procedure in SCTP
[Figure: format of the ABORT chunk. Labels shown: Checksum, Chunk Type = 6 (ABORT), Reserved, T, Chunk Length, Error Causes.]

As can be seen there, the abort procedure is really simple, but it still gives the receiver
of the ABORT enough information to at least figure out the reason for its receipt. In any case, the
abort procedure should be rarely used, and any peer wanting to tear down an association
must always use the graceful shutdown mechanism explained in the next section. Only
when that procedure fails, or if the host has some internal problems, should the ABORT
chunk be sent.

7.3 I am done, could you finish as well? The shutdown procedure

We have already commented that the last versions of MDTP already had a way to
gracefully shut down an association. In the first version of SCTP, the shape of the datagram
changed considerably, and thus the shutdown procedure was also modified. In any case, the
basis of the process remained the same: the closing side sends a SHUTDOWN chunk, that
has to be answered by the peer with a SHUTDOWN ACK chunk once it has received the
acknowledgement of all the data it sent.
However, when SCTP went to the final revision (at least one of the first 6 final
revisions) a problem related to the graceful closing of the associations was highlighted.
When a host that had sent the SHUTDOWN chunk at least twice received a SHUTDOWN
ACK chunk with the Reverse Verification Tag, there was no way to differentiate between the
following two situations. It could be that a previous SHUTDOWN chunk made it to the peer
endpoint but the corresponding SHUTDOWN ACK with the right tag was lost. Or it could
be as well that the peer endpoint simply restarted (possibly sending us an ABORT chunk
that was lost) and so it directly replied to our SHUTDOWN chunk with the
SHUTDOWN ACK chunk carrying the Reverse Verification Tag. The problem is that the
SHUTDOWN chunk sender does not really know if the SHUTDOWN ACK chunk was
lost, or if the peer crashed (probably losing some data).
In TCP this situation is somewhat mitigated by the existence of the TIME WAIT
state. This state basically consists of keeping the information about a connection for some
time (common implementation values are 30 seconds, 1 minute or 2 minutes [Ste1994])
after sending the final acknowledgement, just in case it is lost and we have to send another
one later.
There was another issue, the difference between TCP's and SCTP's shutdown
procedure. In TCP there is the concept of the half-closed connection (see footnote 36). TCP treats every
single duplex connection as two simplex ones that must be closed independently. So you
can tell the peer that you are done with your data and you are not sending anything else,
thus closing your part of the connection, while the peer is still sending you data (so the
overall connection is just half-closed). This means that TCP's closing procedure is a 4-way
handshake, in which one of the peers has to send a datagram with the FIN
(from Finalization) flag set, telling the other that it will not send any more data. Then it
receives the acknowledgement of that datagram (as a normal data acknowledgement, since
the FIN segment occupies one byte in the sequence space), and finally the procedure is
repeated on the other side. Normally this procedure is shortened by also setting the FIN flag
in the datagram that acknowledges the first FIN segment, and so half-closed connections
are not so common.
However, half-closed connections are really useful for a commonly used application
[Ste1994]: the Remote Shell (RSH). This application is used in the UNIX environment
and executes a command on another remote system. For example, if we are in a host called
helsinki and we type the command:

helsinki % rsh madrid sort < datafile

the sort command will be executed on the host madrid (which has an rshd server) with
standard input for the rsh command being read from the file named datafile. At that
moment rsh creates a TCP connection between itself (in the helsinki host) and the
program being executed on the madrid host (sort in this case). The rsh program copies
standard input (datafile) to the established connection, and then copies from the
connection to standard output (our terminal). On the madrid server, the rshd server
executes the sort command so that it takes the standard input from the TCP connection
and copies the standard output to the TCP connection created. But the sort program, like
many other programs, cannot generate any output until all the input has been read (in other
words, until the end-of-file in the input is reached). Therefore, the sort program will
only start sending back the results of its action to helsinki once we close the
outgoing flow of data from helsinki to madrid (thus providing the end-of-file mark
required). That is the reason why half-closed connections are sometimes valuable in TCP
(the same result could have been obtained using two TCP connections, but using a single
one with half-close is better).

36 Although it is mostly a matter of taste (is the bottle half-full or half-empty?), a half-closed connection
is one in which only one direction of data flow has been closed, while a half-open association is one in which
only one side of the connection thinks that it is open (see section 4.2). Sometimes the term half-open is used
in both cases.

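
The behaviour of the rsh example can be reproduced with a few lines of socket code. The sketch below is not how rsh or rshd are implemented; it only assumes a toy server on the loopback interface that, like sort, produces no output until it has read the end-of-file. The names sort_server and remote_sort are made up for this illustration. The key call is shutdown(SHUT_WR), which closes only the outgoing direction of the TCP connection.

```python
import socket
import threading


def sort_server(listener: socket.socket) -> None:
    """Accept one connection, read until end-of-file, reply with the sorted lines."""
    conn, _ = listener.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the client has half-closed its side
                break
            chunks.append(data)
        lines = b"".join(chunks).splitlines()
        conn.sendall(b"\n".join(sorted(lines)) + b"\n")


def remote_sort(payload: bytes) -> bytes:
    """Send the payload, half-close, and read the result on the same connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the system pick a free port
        listener.listen(1)
        worker = threading.Thread(target=sort_server, args=(listener,))
        worker.start()
        with socket.create_connection(listener.getsockname()) as client:
            client.sendall(payload)
            # Close only our sending direction; we can still receive.
            client.shutdown(socket.SHUT_WR)
            result = b""
            while True:
                data = client.recv(4096)
                if not data:
                    break
                result += data
        worker.join()
    return result
```

Without the call to shutdown the server would wait forever for more input and the client would wait forever for the result, which is exactly the deadlock that the half-close avoids.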
So, after some deliberation, the closing procedure was modified in the 11th version of
the SCTP specification. In Figure 7-2 we can see the chunks that are involved in the whole
procedure. As can be seen in the figure, the first two chunks were not modified: the
SHUTDOWN chunk carrying the Cumulative TSN Ack, and its reply, the SHUTDOWN
ACK chunk. However, another new chunk was added to the whole procedure, the
SHUTDOWN COMPLETE chunk. The TCB is erased at the initiator side as soon as it
receives the SHUTDOWN ACK, and the other side deletes its TCB when it receives the
SHUTDOWN COMPLETE chunk.
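
The three-legged exchange can be sketched as a tiny state machine. The state names are the ones used in RFC 2960, but the Endpoint class, its methods and the outstanding counter are assumptions made only for this illustration; losses, retransmissions and timers are left out on purpose.

```python
from typing import Optional


class Endpoint:
    """Toy endpoint that only tracks what matters for the graceful shutdown."""

    def __init__(self, outstanding: int = 0) -> None:
        self.state = "ESTABLISHED"
        self.outstanding = outstanding  # DATA chunks sent but not yet acknowledged

    def start_shutdown(self) -> str:
        """The upper layer asks to close: send the first leg."""
        self.state = "SHUTDOWN-SENT"
        return "SHUTDOWN"

    def receive(self, chunk: str) -> Optional[str]:
        """Process one control chunk and return the reply, if any."""
        if chunk == "SHUTDOWN" and self.state == "ESTABLISHED":
            self.state = "SHUTDOWN-RECEIVED"
            return self._try_ack()
        if chunk == "SHUTDOWN ACK" and self.state == "SHUTDOWN-SENT":
            self.state = "CLOSED"  # the initiator erases its TCB here
            return "SHUTDOWN COMPLETE"
        if chunk == "SHUTDOWN COMPLETE" and self.state == "SHUTDOWN-ACK-SENT":
            self.state = "CLOSED"  # the other side erases its TCB here
        return None

    def data_acknowledged(self) -> Optional[str]:
        """All our outstanding data has been acknowledged by the peer."""
        self.outstanding = 0
        return self._try_ack()

    def _try_ack(self) -> Optional[str]:
        # The SHUTDOWN ACK may only be sent once nothing is left unacknowledged.
        if self.state == "SHUTDOWN-RECEIVED" and self.outstanding == 0:
            self.state = "SHUTDOWN-ACK-SENT"
            return "SHUTDOWN ACK"
        return None
```

If the receiver of the SHUTDOWN still has unacknowledged data, receive returns nothing and the SHUTDOWN ACK is only produced later, when data_acknowledged is called.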

Figure 7-2: The shutdown procedure in SCTP
[Figure: formats of the three chunks used in the shutdown procedure. SHUTDOWN (Chunk Type = 7): Chunk Flags (Reserved), Chunk Length, Cumulative TSN ACK. SHUTDOWN ACK (Chunk Type = 8): Chunk Flags (Reserved), Chunk Length. SHUTDOWN COMPLETE (Chunk Type = 14): Reserved, T, Chunk Length.]

In addition, there was another modification. In case a SHUTDOWN ACK chunk is
received and there is no TCB belonging to that association (i.e., the SHUTDOWN ACK is
an OOTB datagram), the receiver will in any case answer by sending back a SHUTDOWN
COMPLETE chunk. But as the sender of the SHUTDOWN COMPLETE does not have
any knowledge about the association, it uses the Reverse Verification Tag. Like the
ABORT chunk, the SHUTDOWN COMPLETE also has a T flag, which must be set in
these cases.
In any case the problem about losing the SHUTDOWN COMPLETE chunk and
having one side with the association open is still there. But now there is a difference. Even
if we had to retransmit the SHUTDOWN ACK chunk and then we received a
SHUTDOWN COMPLETE with the T flag set, we know that the peer was done with its
data as it started the shutdown procedure sending us first the SHUTDOWN chunk. And we
also know that the peer received all our data since it had to acknowledge it before we were
able to send the SHUTDOWN ACK chunk. So no matter if the peer restarted or not, the
final result would have been the same.
However, the peer who sent the SHUTDOWN COMPLETE chunk can not be sure
that the other one received it and closed the connection. So, why not add a fourth leg to
the procedure so we can wait for the acknowledgement of the SHUTDOWN COMPLETE
and close the association being sure that the peer did the same?
Unfortunately that does not work either. There is a famous problem regarding this
issue that is called the two-army problem (see section 6.2.3 of [Tan1996]). Imagine that
there is a Russian army in the middle of a valley, surrounded by two Finnish armies, one on
each of the two hills beside the valley. Each of the two Finnish armies is smaller than the
Russian army, so if either of them attacks alone, it will be defeated by the Russians.
This situation is graphically shown in Figure 7-3.

Figure 7-3: The two-army problem

However, the two Finnish armies together are bigger than the Russian one. Therefore,
the Finns will only be victorious if they attack the Russians simultaneously with their two
armies. The point is that they have to agree on a date to do that attack. In the very
improbable case that none of the Finns in the two armies has a mobile phone or any other
way of communication with the other army, they should send one of their soldiers across
the valley to pass to the other army the information about the day of the attack. This way,
once both armies know the date, they can attack at the same moment and defeat the
Russian army. Let us imagine that the left Finnish army sends one of its men to the right
side to tell them to attack on December the 6th. But what would happen if the soldier were
captured on his way? Then, the right army would not know about the agreed date of the
attack, and they would not move, and so the left army would be defeated. Thinking about
this possibility, the left army probably will not attack either.
To avoid this situation, they tell the soldier to ask the right army to send another
soldier back, so they can be sure that their soldier made it to the other side of the valley.
But now, the right army is in a similar situation: they know that the left army is willing to
attack on December the 6th, but how can they be sure that their man will arrive safely at the
left hill? As there is the possibility that the soldier is captured, they can not take the risk of
charging into battle as possibly the left army will not do it either.
Let us improve the process by sending a third soldier from the left hill to tell the
right army that their brave soldier arrived with the confirmation of the attack on
December the 6th. But then, how will the left army know that the right army knows that the
left army knows that the right army knows about the date? Adding a fourth trip will not
help. In fact it can be easily proven that there is no perfect way of doing the expected work.
Let us imagine that there is a perfect procedure. Then the arrival of the last soldier is either
necessary or not. If it is not necessary, do not send him and check if the previous soldier
was necessary or not, and so on. This way we end up having a procedure in which the last
soldier has to reach the other side of the valley or the whole procedure will fail. So, what
would happen if that last soldier was captured? The other army will not attack. In
consequence, the army that sent the soldier, knowing about this possibility, will not attack
either. Even if the soldier got through, the army that received the last soldier would know
that the other army can not be sure that they know about the date, so they will not attack.
If the soldiers are replaced by SCTP datagrams, the Finnish armies are replaced by
two hosts having an SCTP association between them, and the valley with the Russian army
is replaced by a lossy channel such as the Internet, we have exactly the same problem as when
closing the association and trying to be sure that the other host also closed it. As having a
half-open association is not as serious as being defeated in a war, a three-way
handshake is usually good enough for our purposes.
With this modification the shutdown mechanism became a three-way handshake, and so it
was closer to the one used in TCP (when the second and third legs are joined, as explained
above). But still, the scheme is asymmetrical, because one end forces the other one to stop
sending data: as soon as the SHUTDOWN chunk arrives, the upper user is told not to pass
any new data to SCTP. There was a proposal of using one of the flags in the SHUTDOWN
ACK to either simply mean that the SHUTDOWN chunk was received, or to also indicate
that the host has sent all its data and it is waiting for the SHUTDOWN COMPLETE. This
would mean that TCP's semantics would be kept in SCTP unmodified. But finally this idea
was discarded.
After some discussion there was a consensus about not keeping TCP's half-close
semantics. The reason for that is that there are badly behaved clients that never close their
flow of data and so the TCP connection is never released at the server, which ends up flooded
with open connections. To avoid this problem, some TCP implementations start a timer
while in the FIN-WAIT-2 state (after receiving the acknowledgement of the first FIN
segment sent). When that timer expires, they close the connection and so they do not keep
waiting for a datagram with the FIN flag set.
Yet another issue related to this has recently appeared. The peer receiving the
SHUTDOWN chunk will not send the SHUTDOWN ACK chunk until it has finished
sending its data. Meanwhile it will send its DATA chunks, and as a result the
peer wanting to close will not terminate the association as it is still receiving data.
Therefore, an implementation could decide not to close an association by simply accepting
new data from the upper user, or by sending duplicate DATA chunks. To avoid this, one of
the future modifications to the SCTP specification that appears in [Ste2002b] is the
introduction of a guard shutdown timer that is started right after sending the first
SHUTDOWN chunk. When that timer expires, we close the association no matter if the
peer is still sending us data.


8. AND NOW? SCTP EXTENSIONS AND SCTP USERS

The Internet is a changing world. New technologies appear almost daily and a good
transport protocol should be able to adapt to new environments. TCP's possibilities of
extension are limited to 6 flag bits (reduced to 4 bits by the ECN extension for TCP
described in [Ram2001]) and 40 bytes for options. This is far from enough to make new
versions of TCP which are backwards compatible and that include the features needed in
some fields. Precisely this was one of the reasons why a new transport protocol for
telephony signaling started to be designed instead of enhancing TCP.
Avoiding this same limitation in the future was one of the design principles of SCTP.
The extensibility possibilities of SCTP are practically unlimited due to its internal
structure, graphically shown in Figure 3-2. The only problem found with this architecture
is related with its fixed common header. In the first versions of SCTP there were some
reserved bits in the common header that could be used in the future to indicate a special
processing of the whole datagram including the header itself. However, that reserved field
disappeared when the checksum was enlarged from 16 to 32 bits.
The designers felt this lack of spare bits in the common header right after the
final publication of the SCTP specifications, during a long discussion about whether it was better to
use a strong and expensive checksum, or a weaker but cheaper one. It would have been
easier having several checksum schemes and having a flag in the common header telling
which one had been used. It is true that an equivalent result could have been achieved
negotiating the use of one or another checksum during the establishment phase, or simply
using a weak checksum and including a new chunk carrying the strong checksum when
necessary, but they would have been less efficient (more about the checksum problem in
section 9.1).
Apart from this problem with the common header, we can say that the extensibility
possibilities of SCTP are excellent, and there is even one section in RFC 2960 that deals
precisely with this: how the protocol can also be extended through IANA. IANA is not
only in charge of the Chunk Types, Parameter Types and Error Cause Types, but also port
numbers and Payload Protocol Identifiers.
On October 31st, 2000, the RFC containing the SCTP specifications was released.
SCTP is not widely deployed yet and the existing implementations are still experimental
ones (there were 19 different SCTP implementations tested in the third interoperability
session organized in April 2001 in Nice, France). Nevertheless, there have already been
quite a few attempts to extend its features. Some of them were made even before SCTP was
completely finished. Also, there are some applications that are interested in using SCTP
as their transport protocol.
In the following sections we will take a look at the main extensions to SCTP, and we
will also briefly discuss some of the Internet-Drafts and RFCs that document
applications that use SCTP.

8.1 The SCTP extensions

RFC 2960 took about 27 months of work to be written (counting from the first version of
MDTP), which is quite a long time. The main problem was that many people wanted to
modify many things, and there was always a desire to reword the text and to add more and
more features to the basic SCTP specification.
The authors (mainly Randall R. Stewart and Qiaobing Xie) did excellent work
trying to reach consensus whilst avoiding unnecessary changes that would delay
the publication of the RFC even more. One of their best weapons against changes was
precisely the ease with which SCTP can be extended. So the tactic used was not to add to
the basic specification of SCTP any fancy feature that could be helpful only in restricted
environments, but to write an extension document instead. As a result, the complete
SCTP specification will be spread across many RFCs, but that is better than not
having any of them at all.
The most important extensions to SCTP are described in the next subsections.

8.1.1 This is my new address: Adding and deleting addresses, and per
stream flow control

One of the main features of SCTP is its ability to use several origin and destination IP
addresses in a single association. However, one of the biggest problems that this feature
has is that it is not flexible at all. The addresses to be used are negotiated during the
association establishment and they are not changed at all, unless the association is restarted
(which is not a clean way to do it).
There were some reasons why the ability of changing the IP addresses in use was
important. The first idea was to simply be able to plug or unplug the network cards of one
host and add or delete the corresponding IP address to or from the association. This would
not only help to remove on the fly a broken card and replace it with another one (having a
different IP address assigned than the old one), but also would provide the same type of
services that exist in the SS7 world that allow a link set to add an additional link without
interference with the operation of the link set.
Another problem that this new extension could solve was related to the renumbering
feature of IPv6. In IPv6 it is possible for a site to renumber all of its nodes, for example
when it switches to a new network service provider. This already causes some problems for
TCP connections, which must be terminated before the renumbering takes place (see section
4.1 of [Tho1998]). A TCP implementation can at most tell the upper user that one address is
about to be changed. Also, as the new address should be available in advance, most of the
TCP connections should already be using the new address at the moment the old one is
released. But for long-lasting connections this will not help either.
The Internet-Draft containing the extension to add or delete IP addresses is called (for
obvious reasons) the AddIP draft [Ste2002c]. It has evolved a lot since its first release,
which was published even before the SCTP specification was ready as an RFC. Figure 8-1
shows this evolution graphically.
Initially the AddIP draft included two different extensions. One was obviously the
possibility of adding and deleting our source IP addresses, but as the addition or deletion
request should be acknowledged, another new feature was added in the same draft. The
authors of the extension thought that there would be some other extensions that would
make requests that should be reliably acknowledged. Therefore they specified a general
way to send parameters inside a new control chunk that should be acknowledged (basically
making use of serial numbers for both the chunk and the parameters), the Reliable Request
Procedure. Moreover, this new feature was designed not to interfere in the congestion
control mechanism defined in SCTP, so the reliable requests were treated as if they were
DATA chunks from the congestion control point of view.
At the same time, largely the same authors as those of the AddIP draft wrote another one that
added the possibility of applying flow control on a per stream basis, called the Srwnd draft
(from Stream Receiver Window), which used the Reliable Request Procedure. As happens
with the use of multihoming, the avoidance of head-of-line blocking by using several
streams is one of the basic features of SCTP (see section 5.3). But so far flow control is
performed both on a per association and per address basis (as explained in section 5.2). So,
there is still the possibility that one single stream uses all the resources, exhausting the
buffer capacity of the receiver. Basically this extension proposes dividing the Receiver
Window space among the streams in use. As it is expected that a single SCTP association
will carry the signaling data of several telephone calls, one per stream, this new extension
was warmly welcomed as a very valuable one.
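
As a rough illustration of the idea of dividing the Receiver Window among the streams, the following Python sketch splits an association-wide window and checks whether a given stream may still send. The function names, the even split and the numbers are assumptions made only for this example; the Srwnd draft defines its own rules.

```python
def split_rwnd(a_rwnd: int, num_streams: int) -> list[int]:
    """Divide the association receiver window among the streams.

    The even split is an assumption of this sketch; leftover bytes are
    given to the lowest-numbered streams.
    """
    if num_streams <= 0:
        raise ValueError("num_streams must be positive")
    share, extra = divmod(a_rwnd, num_streams)
    return [share + (1 if i < extra else 0) for i in range(num_streams)]


def can_send(stream_rwnd: list[int], outstanding: list[int],
             stream_id: int, size: int) -> bool:
    """A stream may send only while it stays inside its own share."""
    return outstanding[stream_id] + size <= stream_rwnd[stream_id]


windows = split_rwnd(a_rwnd=32768, num_streams=4)
outstanding = [8000, 0, 0, 0]   # stream 0 has almost used up its share

print(windows)                                   # [8192, 8192, 8192, 8192]
print(can_send(windows, outstanding, 0, 1000))   # False: stream 0 must wait
print(can_send(windows, outstanding, 1, 1000))   # True: stream 1 is unaffected
```

With a single association-wide window, stream 0 could have kept sending until the whole receiver buffer was exhausted; here it is stopped while the other streams can still make progress.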

Figure 8-1: Evolution of the AddIP draft

Draft information shown in the figure:

Draft Name        N. of Versions   Date First Vers.   Date Last Vers.
stewart-srwnd     2                Sep 11 2000        Nov 3 2000
stewart-addip     2                Sep 7 2000         Nov 15 2000
sigtran-srwnd     1                Jan 31 2001        Jan 31 2001
sigtran-relreq    2                Feb 2 2001         Feb 23 2001
sigtran-addip     2                Feb 2 2001         Feb 23 2001
tsvwg-addip       5                May 7 2001         Jan 29 2002

The legend of the figure distinguishes two relations between drafts: draft evolution, and
an upper draft using a lower draft.

As time went by, the AddIP draft was divided into two: the AddIP draft itself and
the RelReq draft. This was quite a straightforward move, as the Reliable Request
Procedure was a very general one that had nothing to do with its specific use to add or
delete addresses. Apart from this change, another nice feature was added to the AddIP
draft: the possibility of recommending to the peer which address should be its Primary
Address, a valuable suggestion when the Primary Address is about to be deleted. In
parallel, the Srwnd draft continued evolving.
In the following months, the RelReq draft was modified to better provide the functionality
needed by both the AddIP and Srwnd drafts. Thus, the initial aim of having a very general
way of reliably transferring control chunks was being lost. So, after some comments on
the list and some IETF meetings, it was decided that the RelReq draft would be discarded
and its functionality would be added to the AddIP draft, adapting it better to its needs. As
the Srwnd draft also used the RelReq draft, it was merged with the new AddIP draft, and
some modifications were made, such as the possibility of limiting the flow of a stream to a
number of bytes as well as to a number of user messages.
Over the whole life of the AddIP draft, 14 versions have been issued. The last AddIP draft
written so far was published in November 2001. It is expected to become an RFC soon,
and some SCTP implementations already include its functionality. It is expected to be first
tested in the next interoperability session, which will be held in San José, California,
during March 2002.

8.1.2 Can I trust you? Reliable and unreliable streams

By definition, as it appears in its specifications, "SCTP is a reliable transport
protocol". That means that the data sent to the peer using SCTP is guaranteed to reach its
destination (unless the network or the hosts are not working at all) by retransmitting the
data in case it is not acknowledged.
When transporting telephony signaling this seems to be the right thing to do, but SCTP
has a wider range of operation and so there are some applications that do not really want
this. Examples are receiving the broadcast of a radio station over the Internet after
joining a multicast group, or using any application that transmits
digitized speech over IP. In these cases, it is usually desirable not to retransmit the lost
packets and not to delay the transmission of the new ones. As a consequence, the
listener will notice some cuts and interruptions when packets are lost.
But the data can be consumed at the receiver at most as quickly as it is produced (if we are,
for example, listening to uncompressed audio at a fixed rate of 64 kbps, there is no way we can
hear a minute of radio broadcast in less than one minute). Thus, retransmitting old packets
while holding back the transmission of the new ones that are being created will also cause
interruptions, and the sending queue (and the receiver's buffer) will grow fuller
and fuller.
In some other applications, data simply expires. So retransmitting it when it is already
stale not only makes subsequent data more likely to arrive late to the receiver (since its
transmission must be delayed while the previous data is retransmitted), but also floods the
network with useless packets that will be discarded when they reach their destination.
However, data that arrives late or never arrives is not the only cause of problems. In our
example of real-time voice transmission over IP, it is also preferable to listen to a
slightly corrupted broadcast than to hear nothing at all. If someone is listening to
speech, a few corrupted bits will surely not make a big
difference to the listener's ears, while interruptions are much worse. So discarding a
datagram that arrived corrupted at its destination is not always the best option.
UDP already solves these two problems to some extent. In UDP there are no
acknowledgements, and thus no retransmissions at all, but that also means that there is no
congestion control. Moreover, the checksum can be turned off by simply setting it to all 0,
so this alleviates the second problem, but leaves unprotected not only the data carried
inside the UDP datagram but also the whole UDP header. Thus UDP is not precisely the
best possible solution.
So, some of the authors of the SCTP specification started to write a draft called Usctp
(from Unreliable SCTP) [Xie2001a]. There they defined a new parameter used to set some
outbound streams as unreliable, so that a DATA chunk sent to any of those streams will
never be retransmitted (this concept evolved to a limited number of retransmissions). In
turn, when the retransmission timer expires, a special chunk used to advance the peer's
Cumulative TSN Ack was sent (note that a similar feature, the CANCEL chunk, was about
to be included in the SCTP specifications and finally was discarded, as explained in section
5.6). This way, one could have a single association in which some streams could be used to
transmit unreliable data which is not likely to be retransmitted (for example real time
multimedia traffic), and some other streams to transfer reliable information (such as data
files). This draft started to be written even before the final specifications of SCTP were
published as an RFC.
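
The behavior described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the class and function names are hypothetical, streams without a configured limit are treated as fully reliable, and no real chunk formats from the Usctp draft are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    tsn: int
    stream: int
    retransmissions: int = 0


def on_timer_expiry(chunk: Chunk, max_rtx: dict[int, int]) -> str:
    """Decide what to do with an unacknowledged chunk when the timer expires.

    max_rtx maps a stream id to its retransmission limit. Streams that are
    absent from the map are treated as fully reliable (assumption).
    """
    limit: Optional[int] = max_rtx.get(chunk.stream)
    if limit is not None and chunk.retransmissions >= limit:
        # Give up on this chunk and ask the peer to move its
        # Cumulative TSN Ack past the abandoned TSN.
        return f"advance Cumulative TSN Ack past {chunk.tsn}"
    chunk.retransmissions += 1
    return f"retransmit TSN {chunk.tsn}"


limits = {1: 0}   # stream 1 is unreliable: never retransmit

print(on_timer_expiry(Chunk(tsn=10, stream=0), limits))   # retransmit TSN 10
print(on_timer_expiry(Chunk(tsn=11, stream=1), limits))   # advance Cumulative TSN Ack past 11
```

A limit of zero corresponds to the original "never retransmit" idea, while a positive limit corresponds to the later concept of a limited number of retransmissions.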
This was the only new feature contained in the first version of the Usctp draft. In
the next release, the problem of corrupted data stated above was also addressed
by including a special kind of DATA chunk that was only partly covered by the Adler-32
Checksum. But this brought another problem: before checksumming an incoming
datagram, one had to parse it to see whether any of those special DATA chunks
was present, and then calculate the Adler-32 Checksum over the right bytes. So in a way
the receiver was being too trusting, assuming that the data inside the datagram was not
corrupted before verifying it; and if so, why calculate the checksum at all?
After some discussion in the list about a way of avoiding this problem, the proposed
solutions were just making things too complicated. So, this feature about including data not
covered by the checksum was finally dropped. The point was that the checksum
verification procedure would consume too much processing time, and in any case we could
simply accept SCTP datagrams having a wrong Adler-32 Checksum if we want (however,
the advantage of the Usctp extension is that it would protect the headers).
Further discussion at the IETF meeting led to the whole draft finally being
withdrawn after its 6th version and more than one year of work. Things were
getting too hard while the advantages of having this draft were shrinking. Some of the
problems discussed were these [Xie2001b]:

- Having new functionality always makes things more complicated. Feedback
  received from SCTP application designers was that things were already fairly hard
  and that there could be interoperability problems if a transport service is too
  complex.
- There were already some limitations in the Usctp draft. For example, unordered
  DATA chunks could not be used, and unreliable data could not be fragmented.
  That made the whole draft less useful than expected.
- TCP is the basis of SCTP. So the designers were a little bit scared of going so far
  away with SCTP. All the data should get through, and the receiving application
  could use unordered DATA chunks to deal with datagrams that arrive too late.
- Canceling a piece of data sent but not yet acknowledged was quite a new feature,
  and the sender would not really know if the receiver got the data or not. Even if
  people wanted this feature, more experience was needed.
- There should be many easier ways to send data unreliably than using SCTP. Most
  of its complexity comes from its reliability, and so making things harder just to
  avoid using that feature does not make much sense.

There was a long discussion about whether the draft should be abandoned or not, but after a
couple of weeks of mail exchange on the distribution list, it was accepted that the draft was
no longer of interest, and so it expired without any new release. But this does not
necessarily mean that SCTP will never have an unreliable data transfer mode. This was
already the second attempt to include such functionality, and this
possibility will surely be further studied and developed in the future.

8.1.3 Be ready to adapt to your environment: The adaptive Fast Retransmit
algorithm

Fast retransmit is an algorithm, already discussed in section 5.2, that helps
avoid a retransmission time-out by quickly retransmitting a certain TSN
when subsequent TSNs are arriving at the peer. It was mostly copied from [All1999] and
modified to adapt it to SCTP's characteristics (mostly due to the existence of Gap Ack
Blocks in the SACK chunks).
It has proven to be a valuable algorithm that improves throughput. But SCTP has a
nice feature that has not been used to make the fast retransmit algorithm even better: the
Duplicate TSNs at the end of the SACK chunk. It was included after a proposal made by
an Internet congestion expert, but it is not presently used. It was left there to be used in the
future after some studies show the way it could be used.
According to [Ste2000], we must receive four consecutive SACK chunks reporting one TSN
as missing before we fast retransmit that TSN. Why four and not some other number? Simply
because it seems to be a reasonable number: small enough not to give the retransmission
timer time to expire, but large enough to avoid unnecessary fast retransmissions.
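
The counting rule can be sketched as follows. This is a simplified Python illustration, not the procedure of [Ste2000] itself: the function name and data structures are our own, Gap Ack Blocks are given as a plain set of acknowledged TSNs, and TSN wrap-around is ignored.

```python
FAST_RTX_THRESHOLD = 4   # number of missing reports quoted in the text


def process_sack(miss_count: dict[int, int], outstanding: list[int],
                 cum_tsn_ack: int, gap_acked: set[int]) -> list[int]:
    """Update the miss counters from one SACK and return TSNs to fast retransmit.

    A TSN counts as reported missing when it is above the Cumulative TSN Ack,
    is not covered by a Gap Ack Block, and a higher TSN has been acknowledged.
    """
    highest_acked = max(gap_acked, default=cum_tsn_ack)
    to_retransmit = []
    for tsn in outstanding:
        if tsn <= cum_tsn_ack or tsn in gap_acked:
            miss_count.pop(tsn, None)          # acknowledged, forget its counter
        elif tsn < highest_acked:
            miss_count[tsn] = miss_count.get(tsn, 0) + 1
            if miss_count[tsn] == FAST_RTX_THRESHOLD:
                to_retransmit.append(tsn)
    return to_retransmit


# TSN 5 is lost while TSNs 6 to 9 arrive one after another.
counters: dict[int, int] = {}
sent = [5, 6, 7, 8, 9]
results = [process_sack(counters, sent, 4, gaps)
           for gaps in ({6}, {6, 7}, {6, 7, 8}, {6, 7, 8, 9})]
print(results)   # [[], [], [], [5]]
```

Only the fourth SACK reporting TSN 5 as missing triggers its fast retransmission.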
But different networks behave differently, and what seems to be a
reasonable trade-off in one of them may not be so in another. The reordering of packets in
the network is one of the worst enemies of the fast retransmit algorithm. It can trigger
unnecessary fast retransmissions, which not only waste network resources but also reduce
the throughput, as already seen in Figure 5-4. And, contrary to what was generally thought,
reordering is not such a rare event. The study in [Ben1999] shows that, under
certain network load, more than 90% of TCP connections suffered from reordering.
The receiver of a duplicate TSN must report it by including a Duplicate
TSN inside a SACK chunk. That tells the data sender that it made an unnecessary
retransmission. So it could undo the last changes in the congestion avoidance variables
(namely the values of the cwnd and ssthresh variables), which would get the data sender back
to the state previous to the retransmission.
This was the basic idea that the author of this Master's Thesis, the main author of
SCTP (Randall R. Stewart) and one expert in Internet congestion, co-author of TCP's
congestion avoidance algorithms (Mark Allman), used in [Ari2001] to modify the current
SCTP's fast retransmit algorithm. The main procedure was creating, every time a fast
retransmission was issued, a record containing the TSNs retransmitted, the cwnd and the
ssthresh values. If some time later, the data sender receives a SACK chunk containing as
Duplicate TSNs all those TSN that were retransmitted, it would mean that the whole fast
retransmission was unnecessary and then ssthresh should be set to the old value of cwnd,
so SCTP could exponentially reach again that value in few RTTs.
That would undo the damage done in the sender's transmission capabilities, but that is
not all. In SCTP, one needs exactly four chunks reporting a TSN as missing before a fast
retransmission is issued. If due to the stated algorithm we notice that some of the
retransmissions are spurious because of reordering in the network, that number of SACK
chunks could be increased. On the other hand, if the retransmission timer expires and some
of the TSNs to be retransmitted were already reported as missing several times, it might
well mean that our fast retransmission threshold is quite high, and so it should be
diminished. In this way the fast retransmission algorithm becomes adaptive.
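
The two mechanisms just described (undoing the congestion response and adapting the threshold) can be sketched together. The following Python code is only an illustration under our own assumptions: the class and method names are hypothetical, the congestion response is simplified, the MTU value is assumed, and the threshold is changed in steps of one, which is not necessarily what [Ari2001] specifies.

```python
from dataclasses import dataclass, field

MTU = 1500   # assumed path MTU in bytes


@dataclass
class Sender:
    cwnd: int
    ssthresh: int
    threshold: int = 4                      # SACKs needed before a fast retransmit
    records: list = field(default_factory=list)

    def fast_retransmit(self, tsns: set[int]) -> None:
        """Record the current state, then apply a simplified congestion response."""
        self.records.append((frozenset(tsns), self.cwnd, self.ssthresh))
        self.ssthresh = max(self.cwnd // 2, 2 * MTU)
        self.cwnd = self.ssthresh

    def on_duplicate_tsns(self, dup_tsns: set[int]) -> None:
        """If every TSN of a record came back as a duplicate, it was spurious."""
        for record in list(self.records):
            tsns, old_cwnd, _old_ssthresh = record
            if tsns <= dup_tsns:
                self.ssthresh = old_cwnd    # slow start can regain the old cwnd
                self.threshold += 1         # be more tolerant of reordering
                self.records.remove(record)

    def on_timeout(self, miss_counts: dict[int, int], min_reports: int = 2) -> None:
        """A time-out on TSNs already reported missing suggests a high threshold."""
        if self.threshold > 1 and any(n >= min_reports for n in miss_counts.values()):
            self.threshold -= 1


sender = Sender(cwnd=20000, ssthresh=65536)
sender.fast_retransmit({7})
print(sender.cwnd, sender.ssthresh)         # 10000 10000
sender.on_duplicate_tsns({7})
print(sender.ssthresh, sender.threshold)    # 20000 5
sender.on_timeout({9: 3})
print(sender.threshold)                     # 4
```

Note that cwnd itself is not restored directly: as explained above, only ssthresh is set to the old value of cwnd, so that the sender can grow back to it exponentially.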
Finally, the authors decided that some real testing was needed before the draft could
be published, to prove that the whole algorithm really worked. What is more, the draft
actually covered two different problems. One of them was what to do when realizing that a
spurious fast retransmission was issued, and the other was how to know that the
retransmission was bogus. The receipt of the Duplicate TSNs was a very neat way to know
it, but not the only one. For example, a surprisingly quick acknowledgement of a
retransmitted TSN might also mean that the SACK was sent due to the receipt of a
previous copy, and so the retransmission was unnecessary anyway. Even a specialized
mechanism could be created, such as an extension to include time stamps that would tell
the data sender if a SACK was triggered by the last transmission of a TSN or by a previous
one.
Finally the document was divided into two. [Bla2001a] describes the algorithm to
detect spurious retransmissions either by inspecting the Duplicate TSNs in SCTP, or
by using the TCP extension to report the receipt of duplicate data segments documented in
[Flo2000]. The other document, [Bla2001b], discusses the algorithm that both restores the
congestion control state previous to the fast retransmission (modifying the values of cwnd
and ssthresh) and modifies the fast retransmit threshold. It also allows some
delay to be introduced before the fast retransmission is made, instead of making it right
after the fast retransmit threshold is reached (if we realize that we made an unnecessary
retransmission).

8.2 Is anybody using SCTP? Some applications that use SCTP

SCTP was born about one year ago and it is not widely known yet. Nevertheless,
there are already some applications that use SCTP as their transport protocol. Most of
them are, however, new protocols related to telephony signaling transport, which was
the field for which SCTP was initially designed. Let us first discuss those adaptation
protocols.
To make SS7 signaling transport over IP networks possible, an SS7-IP gateway must
provide the means for translating SS7 messages into IP datagrams, and vice-versa.
However, that translation can be done at several layers. Even though there is no need to
provide translation at all levels in the SS7 stack, authors are writing adaptation modules
that can translate SS7 signaling at the SCCP level, as well as at the MTP3 and MTP2 (there
are even two proposals for MTP2).
The SCCP-User Adaptation Layer (SUA) [Lou2002] is a protocol designed to
transport any SCCP-User signaling (such as TCAP) over IP using SCTP, in a seamless
way. SUA can be used between a Signaling Gateway (SG) and an IP signaling endpoint (a
Service Switching Point (SSP) or Service Control Point (SCP)), but can also provide
transport of SCCP user information directly between IP endpoints rather than through a
SG. The SG is needed only to assure interoperability with SS7 signaling in the switched-
circuit network.
SUA is able to support both SCCP unordered and in-sequence connectionless services,
as well as bi-directional connection-oriented services, either with or without flow control
and detection of message loss and out-of-sequence errors (i.e., SCCP protocol classes 0
through 3).
As seen in Figure 2-3, there is an interface defined between ISUP and SCCP.
However, it has not been implemented yet and thus SUA will not be able to carry ISUP
messages until that interface becomes available. The first release of SUA was submitted in
March 2000, and after 11 versions it is expected to become an RFC soon.
The MTP3-User Adaptation Layer (M3UA) [Sid2002] works at a lower layer than
SUA. It directly replaces MTP3, and it provides support for the transfer of all SS7 MTP3-
User Part messages, such as ISUP or SCCP over IP using SCTP.
M3UA can be used between an SG and a Media Gateway Controller (MGC) or IP
telephony database. M3UA extends access to MTP-3 services at the SG to remote IP
endpoints. In case the IP endpoint is connected to several SGs, the M3UA layer at the IP
endpoint keeps track of the status of configured SS7 destinations and routes messages
depending on the availability and congestion status of the routes to these destinations via
each SG.
M3UA provides accommodation of larger blocks than the 272-bytes limit of MTP2,
without the need of segmentation and re-assembly at the upper layer. At the SG, the
M3UA layer provides interworking with MTP3 management functions to support seamless
operation of signaling between the SS7 and IP networks. The M3UA layer at an IP
endpoint keeps the state of the routes to remote SS7 destinations and may request the state
of remote SS7 destinations from the M3UA layer at the SG. The M3UA layer at an IP
endpoint may also indicate to the signaling gateway that M3UA at an IP endpoint is
congested.
The definition of M3UA started more than two and a half years ago and, like SUA, it is
expected to become an RFC soon.

Figure 8-2: SS7-IP adaptation layers
[Figure: protocol stacks (MTP1, MTP2, MTP3, SCCP, TCAP on the SS7 side; IP, SCTP and the
adaptation layer on the IP side) for (a) adaptation with SUA, (b) adaptation with M3UA,
(c) adaptation with M2UA and (d) adaptation with M2PA]

At the MTP2 level we have two different protocols that translate SS7 into IP. One of
them is the MTP2-User Adaptation Layer (M2UA) [Mor2002], and the other is the MTP2-User
Peer-to-Peer Adaptation Layer (M2PA) [Geo2001]. They both replace the MTP2
protocol, adapting the MTP3 protocol to the SCTP/IP stack.
M2UA provides its users with functionality equivalent to that which MTP2 provides to MTP3. It
is used between an SG and an MGC. The SG keeps the availability state of all MGCs to
manage signaling traffic flows across active SCTP associations.
M2PA also provides the same functionality as MTP2. However, unlike M2UA,
M2PA supports complete MTP3 message handling and network management between any
two SS7 nodes communicating over an IP network. IP signaling points (IP SPs) work as normal
SS7 nodes, using the IP network instead of the SS7 network. Every IP signaling point has
an SS7 point code and thus they are SS7 nodes.
M2PA makes the integration of SS7 and IP networks easier by allowing nodes in the
SS7 networks to access IP telephony databases and other nodes in IP networks making use
of SS7 signaling. In turn, M2PA makes it possible for IP telephony applications to access
SS7 databases.
Both M2UA and M2PA are still Internet-Drafts and, like SUA and M3UA, they are
expected to become RFCs within the next few months.
The differences among these four adaptation layers can be seen in Figure 8-2. The
figure represents the case in which an SG connects one STP in the SS7 network with an
SCP that is located in an IP network, and shows the protocol stack used when the STP
sends TCAP queries to the database. In case of (c), we observe that there is a new protocol
layer that we have not mentioned yet, the Nodal Interworking Function (NIF). Basically,
the NIF serves as an interface between MTP2 and M2UA within the SG.
In [Mor2001] another adaptation layer is specified, the ISDN Q.921-User Adaptation
Layer (IUA). The ITU-T recommendation Q.921 defines the data link level protocol used
in ISDN signaling, also known as the Link Access Procedures on the D-channel (LAPD).
IUA replaces Q.921 and uses SCTP as the transport layer, and provides transparent
adaptation to Q.921 users, such as Q.931.
However, SCTP has also been proposed as the transport protocol for
protocols not related to telephony signaling. There are proposals to run SIP and SDP
over SCTP ([Ros2001b] and [Fai2001] respectively). In [Jun2001], we can read a
description of the usage of the Transport Layer Security (TLS) [Die1999] protocol using
SCTP. The TLS protocol provides communications privacy over the Internet and allows
client/server applications to communicate in a way that is designed to prevent
eavesdropping, tampering, or message forgery.

9. CHANGES TO BE MADE IN RFC 2960

SCTP was designed taking great care over every change, listening to every
dissenting voice and consulting specialists in fields such as congestion
avoidance algorithms and Internet security, and even the creators of IPv6, so that there
would not be unexpected problems in the future.
In addition, two interoperability sessions were organized before the publication of
RFC 2960, to check empirically that there were no major hidden problems in the
SCTP specification. Those test sessions revealed some weak points of SCTP, which were
corrected. However, all the care taken turned out not to be enough: after the publication of
the SCTP specification as an RFC, another interoperability session was organized and
some more errors were found. Simple debate on the distribution list also raised some
other issues.
All those defects of an editorial or technical nature that appear in RFC 2960 are
documented in several Internet-Drafts. All those drafts documenting the changes to be
made to the present specification of SCTP will be merged with RFC 2960 itself to
produce a new, revised RFC in the future, as happened for example with the
specifications of IPv6. In the next sections we will discuss those changes.

9.1 The checksum dilemma

The history of the checksum that appears in the common header of every SCTP
datagram has been quite eventful. We can distinguish several stages in this evolution.
At the very beginning of the SCTP design, no checksum was used at all. Then, weak
checksums such as the ones used in IP and TCP were used. Later on, when the designers
started to realize that SCTP would not be confined to SS7 networks, they looked
for stronger data integrity protection. In the end, however, the Adler-32 Checksum was
chosen, which eventually proved to be weaker than expected.
Several months after RFC 2960 was published, the designers decided to modify the
checksum scheme used. This is the biggest modification that will be made to the SCTP
specification and, unlike all the others, this change is not backwards compatible.
In the next sections we will follow the steps taken in the choice of the different
checksum mechanisms used during the design of SCTP, and the reasons behind those
changes.

9.1.1 The good old days: Letting others protect the data integrity

Initially, MDTP datagrams did not carry any kind of checksum, as shown in Figure
3-1. As with many other initial features of MDTP, the reason behind this was that the
designers were designing something for an ideal world, as SS7 networks are meant to be.
The detection of corrupted data was delegated to the communication links and platforms
that carried the MDTP packets.
Some time later it was noticed that approaching the problem in this way was not the
right thing to do. As the telephony networks are going digital (the local loop, however,
continues to be mostly an analog twisted copper pair), corruption of packets due to noisy
channels is infrequent. Much of the corruption happens not during data transmission, but
during buffering in switches, when data is copied. This kind of data corruption at the
network layer cannot be detected at lower layers, so some kind of protection against it
should be supplied at a higher level.
At the beginning, MDTP frames were supposed to be carried inside UDP datagrams,
which protects the data with the so-called TCP Checksum [Pos1980], also known as
Internet Checksum. The TCP Checksum is simply the 16-bit one's complement of the
one's complement sum of a pseudo header containing information from the IP header (the
source address, the destination address, the protocol, and the UDP length), the UDP
header, and the data, padded with zero octets at the end (if necessary) to make a multiple of
two octets.
This sum is used by IP (not including any kind of pseudo header in its calculation),
UDP and TCP. It catches any 1-bit error in the data, and over uniformly distributed values
of data it is expected to let other kinds of errors pass undetected at a rate proportional
to 1 in $2^{16}$.
However, it has two major limitations [Pat1995]: the sum of a series of 16-bit values is
the same, regardless of the order in which the values appear, and the value of the checksum
is unaffected by the addition or deletion of zeros.
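
Both limitations are easy to reproduce. The Python sketch below computes the 16-bit one's complement checksum over a plain byte string; it is a minimal illustration that leaves out the pseudo header, and the function name and test data are our own.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                             # pad to a multiple of two octets
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)    # fold the carry back in
    return ~total & 0xFFFF


original = bytes.fromhex("12345678")
one_bit_error = bytes.fromhex("12345679")   # last bit flipped
swapped = bytes.fromhex("56781234")         # same 16-bit words in another order
zero_added = bytes.fromhex("123400005678")  # a zero word inserted

print(hex(internet_checksum(original)))                                # 0x9753
print(internet_checksum(original) != internet_checksum(one_bit_error)) # True: detected
print(internet_checksum(original) == internet_checksum(swapped))       # True: not detected
print(internet_checksum(original) == internet_checksum(zero_added))    # True: not detected
```

The single flipped bit changes the checksum, while the reordered words and the inserted zeros leave it untouched.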
The TCP Checksum is used in the Internet because it offers a good trade-off between
performance and error detection capabilities, but in the SS7 world there is a need for
stronger protection against corrupted messages. MTP calls for less than one undetected
error every $10^9$ received packets (see footnote 37), and the TCP Checksum was not
sufficient to meet this
requirement. Studies published in section 3.3 of [Pax1997] show that, on average, one in
every 5,000 TCP packets arrives corrupted at the destination. This high level of corruption
in TCP packets is mostly due to router bugs and not to problems in the
transmission lines. As a 16-bit checksum is expected to miss 1 corrupted packet
in every 65,536 packets received with errors, the final result is that on average about 1 packet
out of $3 \times 10^8$ packets arrives corrupted and is accepted as a valid one. So, some other
scheme should be used to meet the SS7 requirements.
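
The figure of about $3 \times 10^8$ follows directly from the two rates quoted above, as this short Python calculation shows (the variable names are ours; the numbers are the ones given in the text):

```python
packets_per_corrupted = 5_000     # one corrupted packet in every 5,000
corrupted_per_missed = 2 ** 16    # one corrupted packet in every 65,536 goes undetected

packets_per_undetected_error = packets_per_corrupted * corrupted_per_missed
mtp_requirement = 10 ** 9         # at most one undetected error in this many packets

print(packets_per_undetected_error)                     # 327680000
print(packets_per_undetected_error >= mtp_requirement)  # False: requirement not met
```

One undetected error in roughly 328 million packets is about three times more frequent than the limit of one in $10^9$ packets.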

9.1.2 The quest for a stronger scheme: The Cyclic Redundancy Check

Since the TCP Checksum could not meet the SS7 requirements, it was proposed to make
the use of the IPsec Authentication Header (AH) [Ken1998b] compulsory in the IP
packets carrying MDTP datagrams. As the AH includes a strong error check (an Integrity
Check Value (ICV) using by default HMAC with either MD5 or SHA-1 as defined in
[Mad1998a] and [Mad1998b] respectively), its use would reduce the number of
undetected errors. But AH was not a cheap solution in terms of computation
time. Moreover, the multihoming capabilities of MDTP would make things even
worse, since the keys used are valid only for a given pair of source and destination IP
addresses. So, the use of AH was never recommended.
Some other solutions were examined and finally it was decided to include a 16-bit Cyclic
Redundancy Check (CRC) to protect data from corruption. After some debate, it
was decided that the checksum would protect both the data and the MDTP header, and that
it would be calculated over the MDTP datagram itself, not including any pseudo header
containing IP parameters, as is done for both TCP and UDP. Although the CRC was also
a 16-bit checksum, the main difference between it and the TCP Checksum is that, due to
the way the CRC is calculated, it provides specific protection against some common errors.
Section 3.2.2 of [Tan1996] discusses the internal structure of the CRC, and we will also show
its basic properties here.

Footnote 37: The ANSI specification of SS7 allows at most one undetected error every $10^9$
received packets. The ITU-T limit is more restrictive, calling for less than one undetected
error every $10^{10}$ packets.

A CRC is a Polynomial Code. A Polynomial Code treats a bit string as a
representation of a polynomial with the only coefficients of 0 and 1. Therefore, if we have
a bit string of length k it represents the degree k-1 polynomial
$b_{k-1}x^{k-1} + b_{k-2}x^{k-2} + \dots + b_{1}x + b_{0}$, where $b_{n}$ represents the value
(1 or 0) of the bit in position n of the bit string. As an example, the 8-bit string 11010011
would represent the degree 7 polynomial $x^{7} + x^{6} + x^{4} + x + 1$. As the values of the
coefficients can only be 1 or 0, the polynomial arithmetic is done modulo 2, according to the
rules of algebraic field theory. So, in modulo 2, subtractions and additions are both
equivalent to the EXCLUSIVE OR logic operator, without carries for addition or borrows for
subtraction. A long division is carried out the same way as it is in binary except that the
subtraction is done modulo 2.
If a Polynomial Code is used, then the sender and the receiver agree in advance on the
use of a Generator Polynomial, G(x), of degree g (thus represented by a string containing
g + 1 bits). For that polynomial, both the high and low order bits must be 1. The whole
idea is that, if we have a datagram of m + 1 bits (m must be bigger than g) that represents
the polynomial M(x) of degree m, we append to the end of M(x) a g-bit long checksum
so that the whole datagram, including the checksum, is divisible by G(x). The algorithm
for computing the checksum is as follows:

1. Append g zero bits to the low-order end of the datagram, so that it now contains
   m + g + 1 bits and corresponds to the degree m + g polynomial $x^{g}M(x)$.
2. Divide the obtained polynomial $x^{g}M(x)$ by G(x) using modulo 2 division.
3. Subtract the remainder of that division (which is always g or fewer bits) from
   $x^{g}M(x)$ using modulo 2 subtraction. The result is the checksummed frame that
   will be transmitted. We will call C(x) the polynomial it represents, which will be
   divisible by G(x).

When the receiver gets C(x), it only needs to divide it by the same G(x); if the
remainder is not 0, the received frame is corrupted.
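
The three steps and the receiver check can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: bit strings are held in Python integers (so leading
zero bits of a message are not represented), and the message and generator values are
hypothetical examples chosen for the sketch, not taken from MDTP or SCTP.

```python
def mod2_remainder(value: int, generator: int) -> int:
    """Remainder of value / generator, both read as polynomials modulo 2."""
    g = generator.bit_length() - 1              # degree of G(x)
    while value.bit_length() > g:               # degree of value is still >= g
        shift = value.bit_length() - generator.bit_length()
        value ^= generator << shift             # modulo 2 subtraction is XOR
    return value


def add_checksum(message: int, generator: int) -> int:
    """Return C(x): x^g M(x) minus the remainder of x^g M(x) / G(x)."""
    g = generator.bit_length() - 1
    shifted = message << g                            # step 1: append g zero bits
    remainder = mod2_remainder(shifted, generator)    # step 2: modulo 2 division
    return shifted ^ remainder                        # step 3: subtract the remainder


# Hypothetical example values: a 10-bit message and G(x) = x^4 + x + 1.
message = 0b1101011011
generator = 0b10011
frame = add_checksum(message, generator)

print(format(frame, "b"))                                 # 11010110111110
print(mod2_remainder(frame, generator))                   # 0: frame accepted
print(mod2_remainder(frame ^ (1 << 5), generator) != 0)   # True: flipped bit caught
```

The receiver side is just `mod2_remainder` applied to the received frame; a non-zero result
marks the frame as corrupted.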
One can show mathematically the kinds of errors this checksum can identify. If the
receiver, instead of C(x), receives a frame with errors, that frame can be represented by
C(x) + E(x). Due to the use of modulo 2 algebra, the coefficients of E(x) with a non-zero
value represent the bits that arrived corrupted. When the receiver applies the method and
divides [C(x) + E(x)] by G(x), the remainder is the same as the remainder of E(x) / G(x),
since C(x) is divisible by G(x). Only the errors for which E(x) is a multiple of G(x) will
go undetected; all the rest will be caught.
In the case that E(x) is composed of a single term, so $E(x) = x^{i}$, where i is the
position of the errored bit, if G(x) has more than one term, it will never divide E(x) so all
the single bit errors will be detected using this method.
If there have been two isolated errors, then $E(x) = x^{i} + x^{j}$ where i > j, and so E(x)
can be represented as $E(x) = x^{j}(x^{i-j} + 1)$. As the coefficient of G(x) for $x^{0}$ is 1
(its low order bit must be 1 by definition), G(x) is not divisible by x, and so in this case
G(x) will not divide E(x) (and so it will discover the double error) unless it divides
$(x^{i-j} + 1)$. There are simple low degree polynomials such as $x^{15} + x^{14} + 1$ that will
not divide any term of the form $(x^{k} + 1)$ for any k below 32,768 (thus giving protection
against any double error in frames shorter than 32,768 bits, that is, 4 Kbytes). This is a
major improvement over the TCP Checksum, which can fail to detect a double bit error when
the errored bits are separated by a multiple of 16 bits.
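
The weakness of a 16-bit one's complement sum against bit errors that are 16 bits apart can be
reproduced with a short Python sketch. It assumes a plain one's complement checksum of the
kind TCP uses, computed over the data only (no pseudo header), and the four data bytes are
hypothetical example values.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum over data (no pseudo header)."""
    if len(data) % 2:
        data += b"\x00"                     # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                      # fold the carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes([0x12, 0x34, 0x56, 0x78])
# Bit 3 is flipped in both 16-bit words, so the two errors are 16 bits apart:
# it goes from 0 to 1 in the first word and from 1 to 0 in the second one.
corrupted = bytes([0x12, 0x3C, 0x56, 0x70])

print(internet_checksum(original) == internet_checksum(corrupted))   # True
```

The two flips cancel in the sum because they hit the same column in opposite directions; when
both bits flip the same way, the sum does change and the error is caught.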

Another interesting mathematical property of polynomials is that there is no
polynomial with an odd number of terms that has (x + 1) as a factor in the modulo 2
system. This is easy to show: if E(x) contained (x + 1) as a factor it could be expressed as
(x + 1)F(x), and then, as 1 + 1 = 0 in modulo 2, evaluating at x = 1 would give
E(1) = (1 + 1)F(1) = (0)F(1) = 0. This is not compatible with having an odd
number of terms, as in that case substituting 1 for x we would have an addition modulo 2
of an odd number of 1 terms, so the result would be 1 and not 0. This means that if G(x)
contains (x + 1) as a factor, it will detect all the errors with an odd number of flipped bits.
Finally, a Polynomial Code with a generator polynomial G(x) of degree g will detect
all burst errors (footnote 38) of length g or less. A burst error of length k is represented as
$E(x) = x^{i}(x^{k-1} + \dots + 1)$, where i determines how far from the right-hand end of the
received frame the burst is located. If G(x) contains an $x^{0}$ term, it will not have $x^{i}$
as a factor, and so if the degree of the parenthesized expression is less than the degree of
G(x), the remainder can never be zero. In case the burst length is g + 1, the remainder of the
division by G(x) will be zero if and only if the burst is identical to G(x). By the definition
of a burst error and G(x), the first and last bit will be in both cases 1, so the burst will
match G(x) if the g - 1 intermediate bits match. If all combinations are regarded as equally
likely, the probability of such an incorrect frame being accepted as valid is $(1/2)^{g-1}$.
For any longer burst error, the probability of a bad frame getting through unnoticed is
$(1/2)^{g}$, assuming that all bit patterns are equally likely.
Therefore, due to its much better capabilities of catching transmission errors, the
designers decided to include a 16-bit CRC checksum in the header of MDTP. They chose
as the generator polynomial the one standardized by the ITU-T,
$x^{16} + x^{12} + x^{5} + 1$, as can be found in section 8.1.1.6.1 of [ITU1996], called
CRC-CCITT. It contains (x + 1) as a prime factor and so it catches all single and double
errors, all errors with an odd number of bits, all burst errors of length 16 or less, 99.9969%
of 17-bit error bursts, and 99.9985% of 18-bit and longer bursts.
Some people were against this decision, as they claimed that it takes too long to
calculate the CRC-CCITT compared to the TCP Checksum. It is true that in software, the
calculation of the TCP Checksum is faster. This is because processors are efficient at
additions, and because the implementation of the TCP Checksum has been enhanced
in many ways over its long life, for example by the use of incremental update when it is
used in IP, as shown in [Rij1994]. However, when implemented in hardware, a CRC checksum
can be built as a simple shift register circuit with some XOR gates. This makes it
much faster than any implementation of the TCP Checksum (hardware implementation of the
TCP Checksum has also been studied, for example in [Tou1996]). In practice this circuit is
almost always present (especially for the calculation of the CRC-32 in Ethernet and Token
Ring LANs), and looks like the one in Figure 9-1 for the case of CRC-CCITT:

Figure 9-1: Hardware implementation of CRC-CCITT (shift register with stages X15 down to
X0; input: Message to be Checksummed)

Footnote 38: A burst error of length k is an error in which all the errored bits are contained
in a fragment of the original frame at most k bits long, the first and the last bits of that
fragment being in error, irrespective of the value of the bits between the first and last one.

Of course, the circuit shown in Figure 9-1 is a simplified one, but it is still valid. The
bits (what we noted as $x^{g}M(x)$ with g equal to 16) enter from the right side, and every
time a new bit enters the circuit, the shift registers move the bits they contain to the left
and take in a new one from the right (the one generated by the XOR gate). When the last bit
of the message has entered the circuit, the value of the bits in the registers is the checksum.
When sending a message, that value should be subtracted (modulo 2) from $x^{g}M(x)$ so that
when the receiver gets the checksummed message and applies this same algorithm to it,
all the bits in the registers will be set to 0 at the end of the process.
With this method any leading 0 bit would leave the whole circuit in the same state (all
zeros), so a truncated message that starts with zeros would still have a valid checksum. For
this reason, the CRC-CCITT sets all the bits of the registers to 1 before starting the process.
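
The behaviour of the shift register, including the effect of the initial register value, can
be imitated in software. The sketch below is an assumption-laden illustration rather than the
MDTP code: it processes the most significant bit of each byte first and applies no final
inversion, a common software convention that the text does not specify.

```python
POLY = 0x1021  # x^16 + x^12 + x^5 + 1, with the x^16 term left implicit


def crc_ccitt(data: bytes, init: int = 0xFFFF) -> int:
    """Feed the message one bit at a time through a 16-bit shift register."""
    reg = init
    for byte in data:
        for i in range(7, -1, -1):              # most significant bit first
            incoming = (byte >> i) & 1
            outgoing = (reg >> 15) & 1          # bit about to leave the register
            reg = (reg << 1) & 0xFFFF
            if incoming ^ outgoing:             # feedback through the XOR gates
                reg ^= POLY
    return reg


message = b"123456789"
print(hex(crc_ccitt(message)))                  # 0x29b1

# Registers starting at zero: leading zero bytes go unnoticed.
print(crc_ccitt(b"\x00\x00" + message, init=0) == crc_ccitt(message, init=0))  # True
# Registers starting at all ones: the same change alters the checksum.
print(crc_ccitt(b"\x00\x00" + message) == crc_ccitt(message))                  # False
```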
The software implementation is not that simple. However, there are quite a few
enhancements that make things less painful. A table of 256 words of 2 bytes each
allows the CRC to be calculated one byte at a time (instead of one bit at a time), which makes
the whole calculation much faster. Normally bigger tables are not used (one of 65,536
elements, for example) because then the cache memory does not help that much and the
whole process is actually slower. In [Wil1993] there is a very complete discussion of
high-performance implementations of different CRCs.
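
A byte-at-a-time calculation with a 256-entry table might look like the following Python
sketch. It assumes the same conventions as a plain most-significant-bit-first CRC-CCITT
with the registers initialized to all ones; the function names are made up for this example.

```python
POLY = 0x1021  # CRC-CCITT generator, x^16 term implicit


def build_table() -> list:
    """One 16-bit entry per byte value: the effect of shifting that byte through."""
    table = []
    for byte in range(256):
        reg = byte << 8
        for _ in range(8):
            reg = ((reg << 1) ^ POLY) if reg & 0x8000 else (reg << 1)
            reg &= 0xFFFF
        table.append(reg)
    return table


TABLE = build_table()


def crc_ccitt_table(data: bytes, init: int = 0xFFFF) -> int:
    """Update the 16-bit register once per byte using the precomputed table."""
    reg = init
    for byte in data:
        reg = ((reg << 8) & 0xFFFF) ^ TABLE[(reg >> 8) ^ byte]
    return reg


print(len(TABLE))                               # 256
print(hex(crc_ccitt_table(b"123456789")))       # 0x29b1
```

The table costs 512 bytes of memory in a language with 16-bit words, which matches the
256 words of 2 bytes mentioned above, and replaces eight shift-and-XOR steps with one lookup.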
In any case, especially for small devices, the CRC calculation could be very costly,
and so it was made optional, using a flag in the MDTP header. When that flag was set, it
meant that the datagram was protected by the use of CRC-CCITT. This mechanism was
used during the first five versions of SCTP too.

9.1.3 From a 16-bit to a 32-bit checksum

The CRC-CCITT, with its 16 bits and its error detection capabilities, is usually limited
to use with messages less than 4 Kbytes long (there are enough ways to corrupt
messages larger than 4 Kbytes that catching only 99.9985% of them is considered
inadequate). Initially, SCTP was designed for telephony signaling transport, where the
messages are usually a few hundred bytes long, so 4 Kbytes was not a big limitation.
Nevertheless, there was an increasing feeling that SCTP would be capable of broader
applications. Therefore, the designers started to look for a 32-bit checksum.
Some checksums were proposed. Firstly, there was an idea of simply extending the
TCP Checksum to 32 bits (TCP-32 Checksum). The problem was that it would keep the
same kind of problems as the TCP Checksum. Maybe the overall rate of undetected
errors would go from 1 in $2^{16}$ to 1 in $2^{32}$, but such a checksum had never been
tested, and this idea was quickly discarded.
Some people suggested the use of the Fletcher Checksum. This checksum was first
published in 1982, and it has been studied for a long time. It was even proposed for use
in TCP in [Zwe1990] by means of a TCP option, but in practice TCP has never used any
checksum other than the TCP Checksum.
There are two flavors of the Fletcher Checksum, the 8-bit Fletcher Checksum (which
results in a 16-bit checksum) and the 16-bit Fletcher Checksum (which in turn gives a
32-bit checksum); the second one was considered as a possible candidate. Basically, the
16-bit Fletcher Checksum treats the message to be checksummed as a list of 16-bit
fragments, from F[1] to F[n], and uses two 16-bit accumulators, A and B, initially set
to zero. The main loop calculates (with i ranging from 1 to n):


A = A + F[i]
B = B + A

The additions are performed in 16-bit one's complement arithmetic (although some versions
use two's complement instead). So, at the end of the loop, A contains the 16-bit one's
complement sum of all the 16-bit fragments of the message, and B will contain
(n)F[1] + (n-1)F[2] + ... + F[n]. Those two 16-bit accumulators are joined at the end of the
process to form the checksum (A × 65,536 + B).
When performed in two's complement, the 16-bit Fletcher Checksum detects all single
bit errors, a single error of less than 16 bits in length, and all double bit errors separated by
16 bits or less [Pat1995]. Unlike in the TCP Checksum, in the 16-bit Fletcher Checksum
the information about the order in which the bytes appear in the original message is
reflected in the value of the checksum, so if a message is corrupted in a way that some
bytes are reordered, the checksum should catch the error. The major known failing of the
checksum is that it is unaffected by zeros being added or deleted from the front of a packet
[Pat1995].
But still, the Fletcher algorithm was stronger than the TCP Checksum, and it had been
studied for almost two decades (and used in the ITU-T X.224 / ISO 8073 standard), so in the
end a variation of it was chosen, the so-called Adler-32 Checksum [Deu1996]. It is an
extension and improvement of the 16-bit Fletcher Checksum. The differences are that the A
and B accumulators are still 16 bits long, but the F[n] fragments are only 8 bits long.
Moreover, the additions are done without any kind of one's or two's complement, and they are
done modulo 65,521 (the biggest prime number smaller than 65,536). The A accumulator is
initially set to 1 (thus avoiding the leading zeros addition or deletion problem stated above)
and B is set to 0. Finally, the result is stored in 32 bits as B × 65,536 + A.
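
The description above translates almost directly into code. The following Python sketch is
written for illustration only; it is compared against `zlib.adler32` from the Python standard
library, which implements the same Adler-32 algorithm.

```python
import zlib

MOD_ADLER = 65521  # biggest prime number smaller than 65,536


def adler32(data: bytes) -> int:
    """Adler-32 as described in the text: byte-wise sums modulo 65,521."""
    a, b = 1, 0                      # A starts at 1, B starts at 0
    for byte in data:
        a = (a + byte) % MOD_ADLER   # A: running sum of the bytes
        b = (b + a) % MOD_ADLER      # B: running sum of the values of A
    return b * 65536 + a             # stored as B x 65,536 + A


sample = b"Wikipedia"
print(hex(adler32(sample)))                      # 0x11e60398
print(adler32(sample) == zlib.adler32(sample))   # True
```

For this 9-byte sample A ends at 920 and B at 4,582, far below 65,521, so neither accumulator
comes close to using its 16 bits.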

9.1.4 The Adler-32 Checksum: We have a problem

The Adler-32 Checksum was the one finally chosen, and it is the one that appears in
[Ste2000]. However, this is not the end of the story: several months later, some researchers
took a look at the SCTP checksum, and they complained. The problem with the Adler-32
Checksum is that, for short packets, it is noticeably weaker than the alternatives. Since
the primary application of SCTP is signaling transport, and call signaling typically
uses packets less than 200 bytes long, this is a major problem. As the accumulators in the
Adler-32 Checksum add up bytes, it is unlikely (or even impossible) for small packets to
make those accumulators wrap, and so the checksum is guaranteed to give poor
coverage of the available bits. The resulting checksum is not random enough, having a
high correlation with the number of bytes of the packet (when that packet is small enough).
The A accumulator is a simple addition of the values of the bytes in the message to be
checksummed. As the maximum value for an 8-bit unsigned number is 255, and A is
initially set to 1, it will never wrap if the message contains fewer than
(65,521 - 1) / 255, that is, about 257 bytes, which is the normal case. If we look deeper
and consider that obviously not all the bytes will be set to 255, and that the value 0 is
quite popular, the results are even worse.
The B accumulator is the addition of the successive values of A. In the best case, all the
bytes of the packet are set to 255, and so, as A is initialized to 1, for a message of n bytes
we have
$B = (1 + 255) + (1 + 2 \cdot 255) + \dots + (1 + n \cdot 255) = n + 255(1 + 2 + \dots + n) = n + 255\,\frac{n(n + 1)}{2}$.
The B accumulator wraps when it reaches 65,521, so solving the equation for B = 65,521, we
find that for packets of up to 22 bytes it is impossible for B to wrap. If we take 128 as the
value of the bytes instead of 255, that limit grows to about 32 bytes. So, B will almost
always wrap (the smallest SCTP datagram is 32 bytes long), but not A. So we are almost
wasting the 16 bits of the A value.
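
The limits derived above can be checked numerically. The sketch below simply evaluates the
closed-form expressions for A and B when every byte has the same value and searches for the
first message length at which each accumulator can reach 65,521; the helper names are made
up for this example.

```python
MOD_ADLER = 65521


def peak_values(n: int, byte_value: int = 255):
    """Unreduced A and B after n bytes that all equal byte_value."""
    a = 1 + n * byte_value
    b = n + byte_value * n * (n + 1) // 2
    return a, b


def first_length_that_can_wrap(which: str, byte_value: int = 255) -> int:
    """Smallest n for which accumulator 'A' or 'B' reaches 65,521."""
    index = "AB".index(which)
    n = 1
    while peak_values(n, byte_value)[index] < MOD_ADLER:
        n += 1
    return n


print(first_length_that_can_wrap("A", 255))   # 257
print(first_length_that_can_wrap("B", 255))   # 23
print(first_length_that_can_wrap("B", 128))   # 32
```

So with all bytes at 255, A cannot wrap in messages of 256 bytes or less and B cannot wrap
in messages of 22 bytes or less; with bytes of value 128, B cannot wrap below 32 bytes.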
This problem started a discussion that lasted about three months (and which has not
really finished). People on the list were debating whether it was worthwhile to modify the
checksum (because that would be backwards incompatible), or whether another kind of method
should be used (such as an initial negotiation). Finally, it was accepted that, as not too
many implementations of SCTP had been done yet, and in any case they could be easily changed
(as they were implemented in software), it was better to use a single checksum algorithm
than to allow several, as that design choice usually causes
interoperability problems.

9.1.5 Going back to the roots: Using the CRC-32 as the checksum

Once it was decided that the checksum should be changed, the problem was which one
to use, and there were several proposals. The simplest one was to modify
the Adler-32 Checksum algorithm to make the additions two bytes at a time.
That would make the accumulators always wrap. Some other proposals that had
already been discarded in the past, such as the TCP-32 Checksum or the 16-bit Fletcher
Checksum, were studied again. But the one that finally seemed to win the competition was the
CRC-32, which uses the same principle as any CRC-16 but produces a checksum twice as long.
The main objection to CRC-32 was that people thought that it would be too
time consuming to calculate. Small devices with low speed (such as mobile phones)
would especially suffer from this problem. But the error detection capabilities of a CRC-32
were much better than those of any other checksum. To illustrate this, let us see Table
9-1 with figures taken from [Oti2001a]:

                   Volume 1, 42 GB                  Volume 2, 40 GB
                   (28,283,456 passes)              (26,992,768 passes)
Checksum Used      Failures (%)  Failures           Failures (%)  Failures           Bytes
                                 (relative to                     (relative to
                                 16-bit Fletcher)                 16-bit Fletcher)
16-bit Fletcher    2.084752      1.00               2.090849      1.00               5.58
Fletcher-Adler     0.183160      11.38              0.145491      14.37              9.25
Adler-32           0.000025      84,234.29          0.000007      282,189.00         22.55
CRC-32             0.000000      -                  0.000000      -                  ~32.00

Table 9-1: Error-Detection capabilities of several checksums

Four error detection algorithms were used. Three of them have already been discussed;
the fourth is the so-called Fletcher-Adler Checksum. This checksum is basically the
Adler-32 Checksum, but it initializes the A accumulator to 21,845 instead of 1, making
the A and B accumulators more likely to wrap (which did not really improve the randomness
of its value).
The test consisted of transferring the data stored on two hard drives using SCTP as the
transport protocol. The files were chopped into pieces of up to 1,452 bytes (an MTU of
1,500 bytes was assumed, and the overhead of the IPv4 header, SCTP common header and DATA
chunk header was subtracted from it). Then the SCTP common header and the DATA
chunk header were added, and the checksum calculated. As most errors do not
originate in the network lines (due to noise) but when datagrams are copied inside the
buffers of routers, a stuck bit error was simulated. In the simulation the value of a specific
bit was changed every 4 bytes, simulating an error in a 32-bit memory cell with one
of its 32 bits damaged. The results are quite striking: the 16-bit Fletcher Checksum
performs much worse than any other checksum, and even though the Adler-32 Checksum was much
better than the 16-bit Fletcher Checksum, it still failed to find the error many times. The
last column of the table, Bytes, is obtained as follows: supposing that an n-bit checksum
should miss just one error in every $2^{n}$, the number shown is the value of n (in bits,
despite the column name) for which the measured proportion of missed errors equals one in
$2^{n}$.
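
The relation between the failure percentages and the last column of Table 9-1 can be written
as a one-line formula. The sketch below assumes the interpretation given in the text, namely
that a checksum missing one error in every 2^n behaves like an n-bit checksum; the function
name is made up for this example.

```python
import math


def effective_bits(failure_percent: float) -> float:
    """Value of n such that one error in every 2^n goes undetected."""
    return math.log2(100.0 / failure_percent)


# 16-bit Fletcher Checksum, Volume 1 of Table 9-1: 2.084752 % of the errors missed.
print(round(effective_bits(2.084752), 2))   # 5.58
```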
So, once people were convinced that CRC-32 was much better than any other, they
were still unsure whether the extra time involved in calculating it would be a major problem.
As explained above, when the CRC is implemented in hardware, it is much quicker to
calculate than any other checksum, and when implementation enhancements such
as the use of tables (to calculate the CRC more than one bit at a time) are applied, the
difference in calculation time in software is drastically reduced. So, Randall R. Stewart
measured the time needed to calculate several checksums [Ste2001a], and
the results appear in Table 9-3 below:

Checksum Used        Minimum Calculation Time (s)   Maximum Calculation Time (s)
CRC-32               3                              128
Adler-32             2                              91
Modified Adler-32    40                             60
16-bits Fletcher     15                             50
TCP-32               3                              15

Table 9-3: Calculation time consumed by several checksums

The way this measurement was done can be consulted in [Ste2001b], but basically the
different algorithms were applied to the same pieces of 1,000 bytes of random data.
The Modified Adler-32 Checksum is the Adler-32 Checksum making the
additions two bytes at a time, instead of one. There were some later calculations with
improved algorithms that showed that the real overhead of calculating CRC-32 compared to
the Adler-32 Checksum would be about 5-10%, and so it was decided that it was worthwhile to
use CRC-32 instead. All in all, after 3 months of discussion, Moore's Law (footnote 39) says
that processors were already about 12% more powerful, so it did not really make any sense
to continue the discussion.
However, once it was decided that the checksum used by SCTP should be changed to
CRC-32 there were still more problems. As explained earlier in this section, the
calculation of the CRC basically consists of finding which bits should be put at the end of a
datagram so that the polynomial represented by all the bits in the datagram is divisible
(modulo 2) by the Generator Polynomial. So, it makes sense to place the CRC field at the end
of the datagram. In this way, when verifying the value of the CRC one only has to perform the
division and, if there is no remainder, assume that the datagram is not corrupted. Apart
from being a clean way to calculate the checksum, this would avoid having to save
the value of the checksum, set those 32 bits to zero, and make a comparison at the end
of the process. At the time of writing, it seems that the 32 bits of the checksum will not
be moved from their present position.

Footnote 39: Moore's Law dates from 1965, when Gordon E. Moore predicted that the number of
transistors per integrated circuit would double every 18 months [Moo1965]. Moore's Law still
holds true.

Moreover, there is not a single CRC-32 but several, so another decision had to be made
regarding the checksum. The most popular CRC-32 is the one standardized by the ITU
[ITU1996] and used in Ethernet, Token Ring or Fiber Distributed Data Interface (FDDI)
networks among others, which is
$x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$.
Another one is the one studied by Castagnoli in [Cas1993], which is
$x^{32} + x^{28} + x^{27} + x^{26} + x^{25} + x^{23} + x^{22} + x^{20} + x^{19} + x^{18} + x^{14} + x^{13} + x^{11} + x^{10} + x^{9} + x^{8} + x^{6} + 1$.
This second polynomial produces checksummed frames that have a bigger Hamming
Distance (footnote 40) for messages of up to 1 Kbyte (and so it is better for low-noise
binary channels). Apparently, the one studied by Castagnoli (CRC-32c) is the one that will be
chosen.
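
The two generator polynomials can be compared with a small bit-at-a-time Python sketch. The
conventions used here (least significant bit first, registers initialized to all ones, final
inversion) are those of the usual software CRC-32 and are an assumption of this example, not
something stated in the text; with them, the first polynomial reproduces `zlib.crc32` from
the Python standard library.

```python
import zlib

# Exponents of the two generator polynomials quoted in the text.
CRC32_EXPONENTS = [32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0]
CRC32C_EXPONENTS = [32, 28, 27, 26, 25, 23, 22, 20, 19, 18,
                    14, 13, 11, 10, 9, 8, 6, 0]


def reflected_constant(exponents) -> int:
    """Generator as a 32-bit constant in reversed bit order (x^32 term dropped)."""
    value = 0
    for e in exponents:
        if e < 32:
            value |= 1 << (31 - e)
    return value


def crc32_bitwise(data: bytes, poly_reflected: int) -> int:
    """Bit-at-a-time CRC-32, least significant bit first."""
    reg = 0xFFFFFFFF                        # registers start at all ones
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ poly_reflected if reg & 1 else reg >> 1
    return reg ^ 0xFFFFFFFF                 # final inversion


crc32_poly = reflected_constant(CRC32_EXPONENTS)
crc32c_poly = reflected_constant(CRC32C_EXPONENTS)
data = b"123456789"

print(hex(crc32_poly), hex(crc32c_poly))                     # 0xedb88320 0x82f63b78
print(crc32_bitwise(data, crc32_poly) == zlib.crc32(data))   # True
print(hex(crc32_bitwise(data, crc32c_poly)))                 # 0xe3069283
```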
The document that describes this checksum change in SCTP is [Ste2002a], but it is
not the only proposal submitted. There are two other Internet-Drafts regarding the
checksum change. In one of them, [Ahm2001], there is a proposal to use CRC-32 instead
of Adler-32 Checksum but still having the possibility to interact with old SCTP
implementations. The algorithm is the easiest possible: apply CRC-32 in the first received
packet containing the INIT chunk and if it does not work, then apply Adler-32 Checksum
and keep applying the one that worked for the whole life of the association. This allows
establishing associations with old implementations, but only in the case we are not the
initiator (otherwise, our datagram containing the INIT chunk will use CRC-32 and the peer
will discard the whole packet). The proposal specified in [Oti2001b] goes further,
removing the checksum field from the common header and defining a new CHECKSUM
chunk that should also be the first one in every SCTP datagram, which can then use
several different checksums. Basically, this is the same as extending the common header to
contain an identifier of the checksum we are using. It is likely that the simple change from
Adler-32 Checksum to CRC-32c documented in [Ste2002a] will be the chosen option.
Two excellent discussions of the different checksum algorithms covered in this
section, with their advantages and disadvantages, appear in the Internet-Drafts [Cav2001]
and [She2001].

9.2 Errata: The Implementors Guide

The change in the checksum is the most important one to be made, but by no means the
only one. After about a year of inspecting the SCTP specifications it seems that they contain
plenty of mistakes. Fortunately, most of them are just minor typos of an editorial nature
caused by the well-known cut and paste habit. However, there are a few that definitely
have to be changed, and so the designers of SCTP proposed a second version of RFC 2960,
called the RFC 2960 Bis. This second version basically only included the changes related
to the checksum (surprisingly proposing the use of the 16-bit Fletcher Checksum), but
was expected to evolve as RFC 2960 itself did, to include all the necessary changes.

Footnote 40: If we have a group of codewords (in our case the ones formed by checksummed
messages), the Hamming Distance between two codewords is the number of bit positions in which
they differ. The Hamming Distance of the code is the minimum of such Hamming Distances among
every possible pair of codewords. If the Hamming Distance of a code is H, then at least H
single-bit errors are needed for a corrupted frame to be accepted.

However, this is not the way of working in the IETF nowadays. Instead of writing a
second version of the RFC, all the proposed changes are compiled in a separate Internet
draft called Implementors Guide [Ste2002b]. This document is co-authored by the author
of this Master's Thesis.
One of the main changes is related to the restart process. In the normal scenario, the
crashed host sends the INIT chunk again, the receiver of that chunk identifies it as
belonging to an already established association and sends back an INIT ACK chunk
containing a State Cookie with the Tie-Tags set to the old Verification Tag values (instead
of setting them to 0 as usual). When, later on, the COOKIE ECHO chunk is received
containing those Tie-Tag values, the receiver of that chunk recognizes that the peer has
restarted, and then the association is reset. However, there is a big security problem with
this mechanism. If an attacker sends an INIT chunk using a fake source address, setting it
to a valid source address of one of the already established associations (including also its
own IP address so it can receive the INIT ACK chunk), once the whole procedure is
finished, the old association would have been unnecessarily restarted. If we add to this the
possibility of using the AddIP extension (see section 8.1.1) to delete the fake used address,
the association will have been completely hijacked.
Moreover, as the Tie-Tags are sent as plain text, it would be easy for an attacker to
guess their value (for example, sending INIT chunks to both peers and comparing the State
Cookies received). As the Tie-Tags are set to the Verification Tag values, the attacker
would be able to send us valid datagrams.
To avoid this, the Implementors Guide states that a restart attempt will not be accepted
if the INIT chunk contains any new IP address that was not part of the old association.
Also, a new error cause is defined to indicate this situation, so the crashed host can restart
the association with fewer addresses (and eventually tear it down to be able to reinitiate and
use the whole set of addresses, if that is necessary).
Another interesting change is related to the fast retransmit algorithm. The present
SCTP specifications state that once a TSN is reported as missing (i.e., the TSN is
unacknowledged while any subsequent TSN is acknowledged inside a Gap TSN ACK) in 4
consecutive SACK chunks, the TSN should be fast retransmitted. This causes three major
problems:

1. There is no limitation on how many times a TSN can be fast retransmitted. In
   the normal case, and especially in a network with a high bandwidth-delay product, at any
   given time there will be several DATA and SACK chunks in flight. So, immediately
   after issuing the fast retransmission the old SACK chunks (and the ones produced
   by the DATA chunks in flight) will still be arriving and reporting that TSN as
   missing, thus triggering another unnecessary fast retransmission of the same TSN.
2. If a TSN is reordered in the network and arrives at the receiver before a number n
   of TSNs, it will trigger the sending of n SACK chunks containing missing TSNs.
   So, if n is bigger than or equal to 4, there will be n - 3 TSNs that will be
   unnecessarily fast retransmitted because they were not lost but simply reordered.
3. If the data receiver is waiting for a specific TSN to fill a gap in its TSN
   sequence, that TSN will be delaying the delivery of all the subsequent TSNs
   (assuming the data must be delivered in order). Once the gap is filled, the data
   receiver will suddenly have a large amount of data to deliver to its upper user and it
   will free its buffer. If the data sender is waiting for the receiver's buffer to empty,
   the arrival of the SACK chunk acknowledging the receipt of the retransmitted
   TSN with an updated Advertised Receiver Window Credit will suddenly allow the
   data sender to send a large amount of data. This would produce an excessive burst
   of traffic that could flood the network.

Avoiding the first problem is quite easy, and so the Implementors Guide simply allows
a TSN to be sent only once via the fast retransmit algorithm.
The second problem is a little bit harder to solve. The main problem is that once a
TSN has arrived out of order, no matter in which order the other TSNs sent arrive, all the
unacknowledged ones will be considered as missing. So, let us say that we sent DATA
chunks from TSN 1 to TSN 6, and the order of arrival is 1, 6, 2, 3, 4 and 5. When TSN 6
arrives, TSNs from 2 to 5 are reported as missing, which is right. But as per RFC 2960,
when TSN 2 arrives, TSNs 3, 4 and 5 are also reported as missing because there is a later
TSN that is acknowledged in a Gap Ack Block. So, even though TSNs from 2 to 5 arrive
in order, in the end TSN 5 will be fast retransmitted. This can be avoided in a simple and
neat fashion, which is to consider as missing only those unacknowledged TSNs previous
to the highest newly acknowledged (footnote 41) TSN in the received SACK chunk. This is what
the Implementors Guide proposes.
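
The example with TSNs 1 to 6 can be replayed with a small Python sketch. It is a simplified
model written for illustration, not the algorithm text of the Implementors Guide: it assumes
that the receiver sends one SACK chunk per arriving DATA chunk, that no SACK is lost, and
that a TSN is fast retransmitted after 4 miss indications; the function and parameter names
are made up for this example.

```python
def miss_indications(arrival_order, sent, only_below_newly_acked):
    """Count how many SACK chunks report each TSN as missing."""
    received = set()
    acked_before = set()
    misses = {tsn: 0 for tsn in sent}
    for tsn in arrival_order:
        received.add(tsn)
        acked = set(received)                 # TSNs covered by this SACK chunk
        newly_acked = acked - acked_before    # acknowledged for the first time
        if only_below_newly_acked:
            limit = max(newly_acked)          # highest newly acknowledged TSN
        else:
            limit = max(acked)                # any later acknowledged TSN counts
        for t in sent:
            if t not in acked and t < limit:
                misses[t] += 1
        acked_before = acked
    return misses


sent = [1, 2, 3, 4, 5, 6]
arrival = [1, 6, 2, 3, 4, 5]

old = miss_indications(arrival, sent, only_below_newly_acked=False)
new = miss_indications(arrival, sent, only_below_newly_acked=True)

print([t for t in sent if old[t] >= 4])   # [5]: TSN 5 is fast retransmitted
print([t for t in sent if new[t] >= 4])   # []: no unnecessary retransmission
```

Under the modified rule, TSNs 2 to 5 are each reported as missing only once (by the SACK
chunk triggered by TSN 6), so none of them reaches the threshold.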
The third problem is not hard to avoid either. A new protocol parameter is introduced
called Max.Burst that limits the maximum size of a burst of traffic (and its recommended
value is 4).
Some other minor problems covered in the Implementors Guide are related to the
path heartbeat mechanism (when to start and finish, and how an unacknowledged
HEARTBEAT chunk should be treated), to the shutdown procedure (how long to wait
for the SHUTDOWN ACK chunk), and to some editorial defects, as well as clarifications of
implicit features of SCTP that have traditionally caused problems to people on the
distribution list.

Footnote 41: A newly acknowledged TSN in an incoming SACK chunk is a TSN that has been
acknowledged for the first time in the received SACK chunk.

10. CONCLUSIONS

We have just seen the history of SCTP so far. At this point, three and a half years after
the first version of what was then called MDTP, SCTP is still a practically unknown
protocol and we could say that it is still under development. There is no practical
application that uses it yet, because there is no commercial implementation of SCTP
available on the market. Despite all these problems, there is a strong feeling that it will
succeed.
People at SIGTRAN just needed a relatively simple protocol, with specific
requirements for signaling transport, but things got complicated. The authors of SCTP
could have simply designed the protocol that was needed, leaving out all those features that
people on the distribution list were constantly proposing. But in the end, even if at some
stages it seemed that SCTP would never be ready and that the delay in its publication as an
RFC would lead companies to develop their own solutions, all the
care taken in its design and all the time spent were worthwhile. Now we have a new transport
protocol that not only offers the necessary support for signaling transport, but also could
compete with one of the giants of the Internet, TCP.
SCTP has several features that make it more suitable than TCP for common Internet
applications. One of the weak points of TCP is the famous SYN attack explained in section
4.2. That attack was hard to launch from computers running the Microsoft Windows
operating system (at least without being discovered), but the new Windows 2000 and XP
versions make things easier for attackers. Now, a transport protocol such as SCTP, which is
immune to that attack thanks to its cookie mechanism, is needed more than ever, and this
simple fact can speed up the deployment of SCTP.
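The idea behind the cookie mechanism can be sketched as follows. This is only an illustration of the principle (the server keeps no state until the peer returns a cookie that carries a valid message authentication code), not the cookie format of any real implementation; the field layout, the function names and the 60-second lifetime used here are assumptions made for the example.

```python
import hashlib
import hmac
import os
import struct
import time

SECRET = os.urandom(16)   # secret key known only to the server
COOKIE_LIFETIME = 60.0    # seconds; value chosen for illustration
MAC_LEN = 20              # length of an HMAC-SHA1 digest
HEADER_LEN = 10           # 8-byte timestamp plus 2-byte port


def make_cookie(peer_addr, peer_port, now=None):
    """Build a signed cookie instead of storing state for the new peer."""
    now = time.time() if now is None else now
    body = struct.pack("!dH", now, peer_port) + peer_addr.encode()
    mac = hmac.new(SECRET, body, hashlib.sha1).digest()
    return mac + body


def check_cookie(cookie, now=None):
    """Accept the cookie only if it is authentic and not stale."""
    now = time.time() if now is None else now
    if len(cookie) < MAC_LEN + HEADER_LEN:
        return False
    mac, body = cookie[:MAC_LEN], cookie[MAC_LEN:]
    expected = hmac.new(SECRET, body, hashlib.sha1).digest()
    if not hmac.compare_digest(mac, expected):
        return False
    created, _port = struct.unpack_from("!dH", body)
    return 0 <= now - created <= COOKIE_LIFETIME
```

Because resources are allocated only after `check_cookie` succeeds, a flood of initial requests with forged source addresses costs the server one MAC computation each and no stored state.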
The use of streams makes SCTP particularly suitable for HTTP servers. Presently, when
we download a web page, a TCP connection must be set up for every graphic element it
contains, as well as for sound or video. For small graphics that occupy a few Kbytes, the
five datagrams that must be sent to establish and tear down a TCP connection are a
considerable overhead. Using SCTP, the server could simply open as many streams as
needed to transport those pictures and send the information for each one on an
independent stream. Moreover, the congestion control algorithms are there to provide a
means of sharing the resources of the Internet equally, and if a host has many established
connections with a server, all of them are considered independent and each is given a
portion of those resources. A practical effect of this is that if there are, for example, 11
users accessing an HTTP server, 10 of them downloading pages containing just text (thus
using a single TCP connection each), and one of them asking for a page containing 9
pictures (which would require 10 TCP connections), this last user will consume as much of
the HTTP server bandwidth and processing time as the other 10 users altogether. Using
SCTP, the 11 users would receive the same portion of resources.
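The arithmetic of this example can be checked with a short sketch. It assumes, as the paragraph does, an idealized bottleneck that congestion control divides equally among connections (or SCTP associations); the function name is hypothetical.

```python
def share_per_user(connections_per_user):
    """Fraction of the bottleneck each user gets when every connection
    (or association) receives an equal share."""
    total = sum(connections_per_user)
    return [count / total for count in connections_per_user]


# TCP: ten users with one connection each, one user with ten connections.
tcp_shares = share_per_user([1] * 10 + [10])
# SCTP: each user has a single association, whatever the number of streams.
sctp_shares = share_per_user([1] * 11)
```

With TCP there are 20 connections in total, so the user with ten connections obtains 10/20 of the resources, exactly as much as the other ten users together; with SCTP each of the 11 users obtains 1/11.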
Before SCTP, there was no transport protocol in the Internet able to take advantage of
multihomed hosts. The use of several network cards is quite common nowadays,
especially for servers that have high traffic demand. With SCTP, a multihomed host not
only ensures that a data connection will not be closed if any of its cards stops working,
but also makes it possible to divert traffic away from congested paths. If the multihomed
host has its cards connected to different networks, it can distribute the data flow among
them or change the one it is using as soon as it experiences congestion. In TCP, once a
connection has been established there is no option about which card to use (see
footnote 42), as only one can be used.
SCTP is message-oriented, as UDP is. TCP does not have any kind of message
concept, and what it transports is seen as a simple flow of bytes. As a result,
applications must provide their own marks to separate the different messages sent through
a TCP connection, or use UDP instead. But UDP is unreliable and does not offer many of
the features that TCP does, such as congestion control. In SCTP the user messages are
identified by their Stream Sequence Number (SSN), and that makes it possible to identify
specific portions of the whole data transfer. Applied to our previous example of an HTTP
server, the different parts of a web page could be transferred as different messages, which
would make them easier to identify at the receiver side even if a single stream is used.
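As an illustration of the "marks" that an application has to add on top of a byte stream such as TCP, the following sketch uses a 4-byte length prefix before each message. The framing scheme and the function names are assumptions made for this example; with a message-oriented transport this extra layer is not needed.

```python
import struct


def frame(message):
    """Prefix a message with its length so it can be found in a byte stream."""
    return struct.pack("!I", len(message)) + message


def unframe(buffer):
    """Extract every complete message from buffer.

    Returns (messages, leftover), where leftover holds the bytes of a
    message that has not been completely received yet.
    """
    messages = []
    offset = 0
    while len(buffer) - offset >= 4:
        (length,) = struct.unpack_from("!I", buffer, offset)
        if len(buffer) - offset - 4 < length:
            break  # the rest of this message has not arrived yet
        start = offset + 4
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]


stream = frame(b"index.html") + frame(b"logo.gif")
complete, pending = unframe(stream[:-3])  # the last 3 bytes are still in transit
```

The receiver obtains the first message intact and must keep the partial second one until the remaining bytes arrive, which is the bookkeeping that a byte stream forces on every application.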
TCP relies on ICMP to report problems such as a server that is not listening on a
specific port or an unreachable peer. The problems reported by ICMP are always at the IP
level, and TCP itself does not have any way to report problems at the transport level.
SCTP, however, can use ERROR or ABORT chunks to notify the peer of certain error
conditions. Thus, an SCTP endpoint can tell the other one, for example, that it is out of
resources or that the received cookie was stale. The other peer can then act more
consistently than if it simply noticed that for some reason the association was not
established.
But SCTP does not only have new features; it is heavily inspired by TCP, and that is a
good thing, since TCP has proven to be a very robust protocol over many years of use. The
congestion control algorithms were taken directly from those of TCP, and some other
features that are optional in TCP became compulsory in any SCTP implementation.
Among them we could mention the use of selective acknowledgements, the ability to
report the receipt of duplicate TSNs, the support for ECN and the path heartbeat mechanism.
SCTP has much better extensibility than TCP. In TCP, the restricted space available for
options makes them virtually useless. The few bits that are reserved in the TCP header
are a very scarce resource, and any new feature added to TCP that has to make use of
those reserved bits must be designed to use as few of them as possible. This usually
complicates the design or may even make the whole feature impossible to add. In SCTP,
adding a new feature is easy and the designers do not have to worry about the available
space for extensions: they just define new chunks or new parameters and include in them
as much information as needed. The number of undefined chunk and parameter types is
large enough to ensure that we will not run out of them in the future.
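The reason why extensions are cheap can be seen in the generic chunk layout: an 8-bit type, 8 bits of flags, a 16-bit length that covers the header and the value, and a value padded with zeros to a multiple of 4 bytes. The sketch below encodes and decodes chunks following that layout; the function names and the chunk type values 200 and 201 are hypothetical and used only for illustration.

```python
import struct


def encode_chunk(chunk_type, flags, value):
    """Encode one chunk: type, flags, length, value, zero padding."""
    length = 4 + len(value)   # the length covers the 4-byte header and the value
    padding = (-length) % 4   # pad to a 4-byte boundary (not counted in length)
    header = struct.pack("!BBH", chunk_type, flags, length)
    return header + value + b"\x00" * padding


def decode_chunks(data):
    """Split a sequence of chunks into (type, flags, value) tuples."""
    chunks = []
    offset = 0
    while offset + 4 <= len(data):
        chunk_type, flags, length = struct.unpack_from("!BBH", data, offset)
        if length < 4 or offset + length > len(data):
            raise ValueError("malformed chunk")
        chunks.append((chunk_type, flags, data[offset + 4:offset + length]))
        offset += length + (-length) % 4  # skip the padding as well
    return chunks


packet_body = encode_chunk(200, 0, b"hello") + encode_chunk(201, 0, b"")
```

A receiver can step over a chunk it does not understand simply by using the length field, so a new chunk type does not disturb the parsing of the chunks that follow it.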
The number of applications that use TCP is huge, and it would take a long time to
modify them to use SCTP instead. However, this is alleviated by a socket interface very
similar to the TCP one, which is presently being defined [Ste2002d]. For simple
applications that would use a single stream, the changes needed in the code to use SCTP
instead of TCP are so minimal that basically one just has to manage the socket in the old
way, specifying when it is created that SCTP should be used instead of TCP or UDP. This
would make things much easier and would facilitate the quick deployment of SCTP.
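A minimal sketch of what this looks like for a single-stream application is given below. It assumes a sockets API in which SCTP is selected through the protocol argument of the socket call, and it falls back to the protocol number 132 when the constant is not defined; the helper name is hypothetical. Whether the SCTP socket can actually be created depends on the operating system, so the function returns None when it cannot.

```python
import socket

# Protocol number 132 is the one assigned to SCTP; some platforms do not
# define the constant, hence the fallback.
IPPROTO_SCTP = getattr(socket, "IPPROTO_SCTP", 132)


def open_stream_socket(use_sctp):
    """Create a stream socket over SCTP or TCP; None if unsupported."""
    proto = IPPROTO_SCTP if use_sctp else socket.IPPROTO_TCP
    try:
        return socket.socket(socket.AF_INET, socket.SOCK_STREAM, proto)
    except OSError:
        return None  # for example, no SCTP support in this kernel
```

The only difference between the two cases is the protocol argument; the calls that follow (connect, send, receive, close) stay the same for such a simple application.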
Moreover, there are several open source SCTP implementations that can be downloaded
freely from the Internet, so nobody who would like to use SCTP really has to write his
own implementation. The so-called reference implementation (written by the creators of
SCTP to help them find errors in the specifications of MDTP and SCTP) has been
available since the times of MDTP and is constantly updated. Even though SCTP is not a
simple protocol, there are some implementations that occupy less than 100 Kbytes,
making SCTP suitable to run on small devices with memory limitations.

Footnote 42: Note that it is also possible to use another network card while still using the same
source and destination IP addresses. However, usually an IP address is assigned to each network card in
such a way that IP datagrams sent through a specific network card always have the same IP source address.

Some tests [Jun2000] showed that SCTP performance was not only no worse than that of
TCP, but that the throughput achieved by SCTP was even better than that of TCP under
some circumstances. Moreover, SCTP and TCP implementations share resources equally
(as they have the same congestion avoidance algorithms). This behavior is highly desirable
because it eases the co-existence of both protocols and so facilitates a gradual conversion
of applications from TCP to SCTP.
On the whole, SCTP has many advantages over TCP and very few drawbacks, and we
can expect that, apart from being used for signaling transport, SCTP will replace TCP in
the Internet in the future. However, that will not happen overnight. As an example we can
cite IPv6, whose design process was relatively similar to that of SCTP. IPv6 was chosen
among several proposals about 10 years ago, it took some years to finish its design, and
the specification was finally revised in 1998. Today, more than three years later, IPv6 is
not yet widely deployed, but it will be in the future (otherwise the Internet will collapse).
SCTP is in the phase of being revised, and possibly within this year we will see another
RFC containing the new specification of SCTP.
We do not know how many years it will take, but very probably, in the future, the
TCP/IP architecture will be replaced by another similar architecture, SCTP/IPv6.

APPENDIX A: CONTENTS OF THE CD-ROM

At the end of this Master's Thesis you will find a CD-ROM. That CD-ROM includes
most of the documents cited in the Bibliography section that are available in electronic
format and some other files related to SCTP.
All the documents publicly available in the Internet appear in the /bibliography
folder. This includes all the documents available in the IETF pages (RFCs and
Internet-Drafts) as well as some other papers that can be freely distributed. Basically all
the titles of the bibliography are in the CD-ROM except those that are books or magazines
or those published by the ITU-T, the Institute of Electrical and Electronics Engineers
(IEEE) or the Association for Computing Machinery (ACM), which are not free of charge.
For these documents, the link in the Bibliography section can only be accessed by those
having the needed subscription for those publications. All the documents included in the
CD-ROM appear in the Bibliography section with their reference name written in bold letters.
The names of the files in the /bibliography folder correspond to the names that
appear inside the square brackets in the Bibliography section. All those documents are
written in English and are saved in .txt, .htm or .pdf format.
The CD-ROM also includes all the previous MDTP and SCTP versions. The folder
/mdtp contains the old MDTP Internet-Drafts (from mdtp-00.txt to mdtp-08.txt),
and /sctp includes all the previous releases of RFC 2960 and the RFC itself
(from sctp-00.txt to sctp-14.txt).
In the /extras folder there is a compilation of Internet-Drafts published by the IETF
that are related to SCTP and not included in the bibliography. These are the latest releases
of the documents describing how applications not mentioned in the Master's Thesis
can use SCTP as their transport protocol, the Management Information Base (MIB) and
some other documents. They are all saved in the CD-ROM under the original name
with which they were published by the IETF.
The /rfcs folder contains all the IETF RFCs from RFC 1 to RFC 3238 available in
the IETF pages. They are all in English and saved in .txt format.
Inside the /mail folder there is an extensive mail archive of both SIGTRAN and
TSVWG. They are included there as a .pst file, so they should be opened using
Microsoft Outlook. The archive includes all the messages sent to SIGTRAN from
November 1999 to January 2002, both months included, and the messages sent to the
TSVWG distribution list from February 2001 to January 2002. There are about 13,000
messages altogether.
The CD-ROM also includes four publicly available implementations of SCTP inside
the /implementations folder. One of them is the so-called Reference
Implementation written by Randall R. Stewart and Qiaobing Xie, the two primary
designers of SCTP. It is a user-space implementation that runs on Linux, FreeBSD,
NetBSD, Lynx O/S, Solaris and in general most UNIX-like systems that provide a classic
sockets API and a method for sending raw IP datagrams. The archives of release 4.0.5
were taken from the CD-ROM included in their newly published book about SCTP,
[Ste2001c], and are located in the /reference subfolder.
There is a kernel implementation for Linux based on the Reference Implementation,
called lksctp. It is a SourceForge project and has been developed by a team of programmers
from Motorola, Cisco, IBM and Intel. Release 2.4.1-0.3.2 appears in the CD-ROM
inside the /lksctp subfolder.
The CD-ROM includes another public implementation published under the GNU
public license called sctplib, a cooperative work of the University of Essen and Siemens
AG, Munich. It is a user-space implementation programmed by Andreas Jungmaier,
Herbert Hölzlwimmer, Achim Weber and Michael Tüxen that runs under Linux, FreeBSD,
Solaris and Mac OS X. Release 1.0.0-pre14 is included in the /sctplib subfolder.
The last public implementation of SCTP appears in the /strsctp subfolder. It is
release 0.7.6 of the kernel implementation for Linux called STREAMS SCTP, which has
been developed by OpenSS7.
The specific functionality provided by each of those implementations appears in the
README files included in each subfolder. They are not completely compliant with the
SCTP specifications and some are beta versions that might include bugs. New releases
appear every now and then (see Appendix B).
There is a nice network protocol analyzer that supports SCTP. Its name is Ethereal
and it is included in the /ethereal folder of the CD-ROM. Ethereal is a free network
protocol analyzer for Windows, Unix and Unix-like operating systems. It allows you to
examine data from a live network or from a capture file on disk. You can interactively
browse the capture data, viewing summary and detail information for each packet. Ethereal
has several powerful features, including a rich display filter language. The CD-ROM
includes the Windows version of Ethereal 0.8.20 (including the necessary WinPcap packet
capture driver that must be installed before Ethereal can be used) ready to be installed,
inside the /windows subfolder. The /ethereal folder also contains another subfolder,
/others, which includes the source code of Ethereal for Linux, Solaris,
FreeBSD, Sequent PTX v4.4.5, Tru64 UNIX (formerly Digital UNIX), Irix, AIX and
Windows as well. In order to work properly, Ethereal needs GTK+ and GLib (a graphical
user interface library) and libpcap (a packet capture and filtering library), all included in
the same subfolder. Perl is also required to build the documentation, and the zlib library
allows Ethereal to read gzip-compressed files on the fly. Both are also included in the
CD-ROM.
SCTP support was added to tcpdump by Jerry Heinz (Temple University), John Fiore
(University of Pennsylvania), and Armando Caro (University of Delaware). It is available
starting with version 3.7. Tcpdump, which is published under the BSD software license, is
the de facto standard among packet sniffing tools. It comes pre-packaged with most major
UNIX/Linux distributions. Its source code, together with libpcap and tcpslice,
is included in the /tcpdump folder of the CD-ROM.
This CD-ROM also contains an SCTP module as a patch for NS-2 (release 2.1b8),
published under the BSD software license. This module was developed by Armando Caro
and Janardhan Iyengar of the University of Delaware. NS-2 is a discrete event simulator
and it is the most commonly used network simulator in the research community today.
The NS-2 simulator itself and the patch have been included in the /ns-2 folder, as well as
a manual for the NS-2 simulator. This patch currently supports most of the features in
sections 6 and 7 of the SCTP specifications.
Finally, in the /thesis folder we can find this document in electronic format, both
in .ps and .pdf formats.

APPENDIX B: OTHER SOURCES OF INFORMATION ABOUT SCTP

SCTP is still a very young protocol. However, there are already quite a few sources of
information about it, mostly in the Internet. The only book published so far about SCTP is
[Ste2001c], on the shelves since November 2001. It is written by the two primary designers
of SCTP, Randall R. Stewart and Qiaobing Xie, and it is definitely worth reading. It should
be seen as a companion to the SCTP specification, including lots of examples that help
the reader understand its difficult parts.
Possibly the most complete web page about SCTP can be found at
http://www.sctp.de/. There, we can find not only links to many other SCTP
resources in the Internet, including RFCs and Internet-Drafts, but also the
latest releases of the sctplib implementation of SCTP.
The latest version of the Reference Implementation is located at http://www.sctp.org/.
There you can find information about SCTP extensions as well.
In http://sourceforge.net/projects/lksctp/ you will find the last version of the SCTP
kernel implementation lksctp.
Another web page, http://www.openss7.org/, contains the last versions of the strsctp
implementation of SCTP.
In http://playground.sun.com/sctp/ there is one kernel implementation of SCTP
publicly available for the Solaris(TM) Operating Environment. However, due to U.S.
export laws it cannot be downloaded from several countries.
In http://www.ethereal.com/ you can find all the information about Ethereal and all the
downloads for the different platform versions.
The NS-2 page is located in http://www.isi.edu/nsnam/ns/. The last releases can be
obtained there.
The web page http://www.watersprings.org/ contains an impressively extensive
collection of IETF Internet-Drafts (both expired and up-to-date ones) and RFCs. The
official IETF page of the TSVWG is
http://www.ietf.org/html.charters/tsvwg-charter.html, and that of the SIGTRAN working
group is http://www.ietf.org/html.charters/sigtran-charter.html. They contain most of the
RFCs and current Internet-Drafts related to SCTP. All the RFCs can be accessed from the
IETF web page, http://www.ietf.org.
If you want to dive into the mail archives of both SIGTRAN and TSVWG to
discover for yourself the reasons behind some design decisions, you can go either to
ftp://ftp.ietf.org/ietf-mail-archive/sigtran/ (here you can find only the archive since
March 2001) or to http://www17.nortelnetworks.com/archives/sigtran.html for the
SIGTRAN archives, or to ftp://ftp.ietf.org/ietf-mail-archive/tsvwg/ for the TSVWG
archives. To participate in those mailing lists you can send your messages to
sigtran@standards.nortelnetworks.com or tsvwg@ietf.org for SIGTRAN or TSVWG
respectively. The instructions on how to subscribe appear in the official pages of those
IETF working groups, http://www.ietf.org/html.charters/sigtran-charter.html for the
SIGTRAN working group and http://www.ietf.org/html.charters/tsvwg-charter.html for
TSVWG.

BIBLIOGRAPHY

[Ahm2001] AHMED, H., and BOFFA, S.: SCTP Dynamic Checksum Selection,
Internet-Draft, August 2001. Work in progress.
http://www.watersprings.org/pub/id/draft-ahmed-tsvwg-sctpdsum-00.txt

[All1999] ALLMAN, M., PAXSON, V., and STEVENS, W. R.: TCP Congestion
Control, RFC 2581, April 1999.
http://www.ietf.org/rfc/rfc2581.txt

[Alm1992] ALMQUIST, P.: Type of Service in the Internet Protocol Suite, RFC 1349,
July 1992.

[Ari2001] ARIAS-RODRÍGUEZ, I., STEWART, R. R., and ALLMAN, M.: SCTP
Adaptive Fast Retransmit, Internet-Draft, June 2001. Work in progress.

[Bel1996] BELLOVIN, S. M.: Defending Against Sequence Number Attacks, RFC
1948, May 1996.

[Bel2001] BELLOVIN, S. M., IOANNIDIS, J., KEROMYTIS, A. D., and
STEWART, R. R.: On the use of SCTP with IPsec, Internet-Draft, October
2001. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-ipsec-sctp-02.txt

[Ben1999] BENNETT, J. C. R., PARTRIDGE, C., and SHECTMAN, N.: Packet
Reordering is not Pathological Network Behavior, IEEE/ACM Transactions on
Networking, Vol. 7, Issue 6, December 1999.
http://ieeexplore.ieee.org/iel5/90/17613/00811445.pdf

[Ber1994] BERNERS-LEE, T.: Universal Resource Identifiers in WWW: A Unifying
Syntax for the Expression of Names and Addresses of Objects on the Network
as used in the World-Wide Web, RFC 1630, June 1994.

[Ber1996] BERNERS-LEE, T., FIELDING, R. T., and FRYSTYK, H.: Hypertext
Transfer Protocol -- HTTP/1.0, RFC 1945, May 1996.

[Bla1998] BLAKE, S., BLACK, D. L., CARLSON, M. A., DAVIES, E., WANG, Z.,
and WEISS, W.: An Architecture for Differentiated Services, RFC 2475,
December 1998.

[Bla2001a] BLANTON, E., and ALLMAN, M.: Using TCP DSACKs and SCTP
Duplicate TSNs to Detect Spurious Retransmissions, Internet-Draft, August
http://www.watersprings.org/pub/id/draft-blanton-dsack-use-01.txt

[Bla2001b] BLANTON, E., and ALLMAN, M.: Adjusting the Duplicate ACK
Threshold to Avoid Spurious Retransmits, Internet-Draft, July 2001. Work in
progress.
http://www.watersprings.org/pub/id/draft-blanton-dupack-thresh-adjust-00.txt

[Bov1999] BOVA, T., and KRIVORUCHKA, T.: Reliable UDP Protocol, Internet-
Draft, expired August 1999.
http://www.watersprings.org/pub/id/draft-ietf-sigtran-reliable-udp-00.txt

[Bra1989] BRADEN, R. (editor): Requirements for Internet Hosts -- Communication
Layers, RFC 1122, October 1989.

[Bra1997] BRADEN, R. (editor), ZHANG, L., BERSON, S., HERZOG, S., and
JAMIN, S.: Resource Reservation Protocol (RSVP) -- Version 1 Functional
Specification, RFC 2205, September 1997.

[Car2000] CARPENTER, B. E.: Internet Transparency, RFC 2775, February 2000.

[Cas1990] CASE, J., FEDOR, M., SCHOFFSTALL, M., and DAVIN, J.: A Simple
Network Management Protocol, RFC 1157, May 1990.

[Cas1993] CASTAGNOLI, G., BRÄUER, S., and HERRMANN, M.: Optimization of
Cyclic Redundancy-Check Codes with 24 and 32 Parity Bits, IEEE
Transactions on Communications, Vol. 41, Issue 6, June 1993.
http://ieeexplore.ieee.org/iel1/26/5993/00231911.pdf

[Cav2001] CAVANNA, V., and WAKELEY, M.: iSCSI Digest, CRC or Checksum?,
Internet-Draft, expired September 2001.
http://www.watersprings.org/pub/id/draft-cavanna-iscsi-crc-vs-cksum-01.txt

[Cla1982] CLARK, D. D.: IP Datagram Reassembly Algorithms, RFC 815, July 1982.

[CER1995] CERT: IP Spoofing Attacks and Hijacked Terminal Connections, CERT
Advisory CA-1995-01, January 1995.
http://www.cert.org/advisories/CA-1995-01.html

[CER1996] CERT: TCP SYN Flooding and IP Spoofing Attacks, CERT Advisory CA-
1996-21, September 1996.
http://www.cert.org/advisories/CA-1996-21.html

[Cha1998] CHANDRANMENON, G. P., and VARGHESE, G.: Reconsidering
Fragmentation and Reassembly, Proceedings of the 17th annual ACM
Symposium on Principles of Distributed Computing, pages 21-29, July 1998.
http://www.acm.org/pubs/articles/proceedings/podc/277697/p21-chandranmenon/p21-chandranmenon.pdf

[Con1998] CONTA, A., and DEERING, S. E.: Internet Control Message Protocol
(ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification, RFC
2463, December 1998.

[Coe2001] COENE, L., TÜXEN, M., VERWIMP, G., LOUGHNEY, J.,
STEWART, R. R., XIE, Q., HOLDREGE, M., BELINCHÓN, M. C.,
JUNGMAIER, A., and ONG, L.: Multihoming issues in the Stream Control
Transmission Protocol, Internet-Draft, November 2001.
http://www.watersprings.org/pub/id/draft-coene-sctp-multihome-01.txt

[Dee1998] DEERING, S. E., and HINDEN, R. M.: Internet Protocol, Version 6 (IPv6)
Specification, RFC 2460, December 1998.

[Deu1996] DEUTSCH, L. P., and GAILLY, J. L.: ZLIB Compressed Data Format
Specification version 3.3, RFC 1950, May 1996.

[Die1999] DIERKS, T., and ALLEN, C.: The TLS Protocol Version 1.0, RFC 2246,
January 1999.

[Dob1996] DOBBERTIN, H.: The Status of MD5 After a Recent Attack, RSA
Laboratories' CryptoBytes, Volume 2, Number 2, Summer 1996.
ftp://ftp.rsasecurity.com/pub/cryptobytes/crypto2n2.pdf

[Dur2000] DURHAM, D. (editor), BOYLE, J., COHEN, R., HERZOG, S., RAJAN,
R., and SASTRY, A.: The COPS (Common Open Policy Service) Protocol,
RFC 2748, January 2000.

[Eas1994] EASTLAKE, D. E., CROCKER, S. D., and SCHILLER, J. I.:
Randomness Recommendations for Security, RFC 1750, December 1994.

[Fai2001] FAIRLIE-CUNINGHAME, R.: Guidelines for specifying SCTP-based
media transport using SDP, Internet-Draft, May 2001. Work in progress.
http://www.watersprings.org/pub/id/draft-fairlie-mmusic-sdp-sctp-00.txt

[Flo2000] FLOYD, S., MAHDAVI, J., MATHIS, M., and PODOLSKY, M.: An
Extension to the Selective Acknowledgement (SACK) Option for TCP, RFC
2883, July 2000.

[Fox1989] FOX, R.: TCP Big Window and Nak Options, RFC 1106, June 1989.

[Geo2001] GEORGE, T., DANTU, R., KALLA, M., SCHWARZBAUER, H. J.,
SIDEBOTTOM, G., and MORNEAULT, K.: SS7 MTP2-User Peer-to-Peer
Adaptation Layer, Internet-Draft, July 2001. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-sigtran-m2pa-03.txt

[Gib2001] GIBSON, S.: The Strange Tale of the Denial of Service Attack against
GRC.COM, June 2001.
http://media.grc.com/files/grcdos.pdf

[GSM2001] GSM World: Member Statistics, December 2001.
http://www.gsmworld.com/membership/mem_stats.html

[Han1998] HANDLEY, M., and JACOBSON, V.: SDP: Session Description Protocol,
RFC 2327, April 1998.

[Han1999] HANDLEY, M., SCHULZRINNE, H., SCHOOLER, E., and
ROSENBERG, J.: SIP: Session Initiation Protocol, RFC 2543, March
1999.

[Han2000] HANDLEY, M., PERKINS, C., and WHELAN, E.: Session Announcement
Protocol, RFC 2974, October 2000.

[Har1998] HARKINS, D., and CARREL, D.: The Internet Key Exchange (IKE), RFC
2409, November 1998.

[Hin1998] HINDEN, R. M., and DEERING, S. E.: IP Version 6 Addressing
Architecture, RFC 2373, July 1998.

[Hui1998] HUITEMA, C.: IPv6: The new Internet Protocol, Second Edition, Prentice-
Hall International, 1998.

[ITU1996] ITU-T: Error-correcting procedures for DCEs using asynchronous-to-
synchronous conversion, Recommendation V.42, October 1996.
http://www.itu.int/rec/recommendation.asp?type=items&lang=e&parent=T-REC-V.42-199610-I

[ITU1994] ITU-T: B-ISDN ATM adaptation layer - Service specific connection oriented
protocol (SSCOP), Recommendation Q.2110, July 1994.
Q.2110-199407-I

[ITU1999] ITU-T: Packet-based multimedia communications systems. Annex E:
Framework and wire-protocol for multiplexed call signaling transport,
Recommendation H.323 Annex E, May 1999.
H.323-199905-S!AnnE

[ITU2000] ITU-T: Packet-based multimedia communications systems, Recommendation
H.323, November 2000.
H.323-200011-P

[Jac1988] JACOBSON, V.: Congestion Avoidance and Control, Computer
Communication Review, Vol. 18, No. 4, pages 314-329, August 1988.
http://www.acm.org/pubs/articles/proceedings/comm/52324/p314-jacobson/p314-jacobson.pdf

[Jac1990] JACOBSON, V.: Compressing TCP/IP Headers for Low-Speed Serial Links,
RFC 1144, February 1990.

[Jac1992] JACOBSON, V., BRADEN, R., and BORMAN, D.: TCP Extensions for
High Performance, RFC 1323, May 1992.

[Jun2000] JUNGMAIER, A., SCHOOP, M., and TÜXEN, M.: Performance
Evaluation of the Simple Control Transmission Protocol (SCTP), Proceedings
of the IEEE Conference on High Performance Switching and Routing, June
2000.
http://tdrwww.exp-math.uni-essen.de/pages/forschung/atm2000.pdf

[Jun2001] JUNGMAIER, A., RESCORLA, E., and TÜXEN, M.: TLS over SCTP,
Internet-Draft, November 2001. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-tsvwg-tls-over-sctp-00.txt

[Kaa2001] KAARANEN, H., AHTIAINEN, A., LAITINEN, L., NAGHIAN, S., and
NIEMI, V.: UMTS Networks. Architecture, Mobility and Services, First
Edition, John Wiley & Sons, 2001.

[Kar1999] KARN, P., and SIMPSON, W. A.: Photuris: Session-Key Management
Protocol, RFC 2522, March 1999.

[Ken1998a] KENT, S., and ATKINSON, R.: Security Architecture for the Internet
Protocol, RFC 2401, November 1998.

[Ken1998b] KENT, S., and ATKINSON, R.: IP Authentication Header, RFC 2402,
November 1998.

[Ken1998c] KENT, S., and ATKINSON, R.: IP Encapsulating Security Payload, RFC
2406, November 1998.

[Kes1998] KESSLER, G. C., and SOUTHWICK, P. V.: ISDN, Signature Edition,
McGraw-Hill, 1998.

[Kle2001] KLENSIN, J. (editor): Simple Mail Transfer Protocol, RFC 2821, April
2001.

[Kra1997] KRAWCZYK, H., BELLARE, M., and CANETTI, R.: HMAC: Keyed-
Hashing for Message Authentication, RFC 2104, February 1997.

[Lou2002] LOUGHNEY, J., SIDEBOTTOM, G., MOUSSEAU, G., LORUSSO, S.,
COENE, L., VERWIMP, G., KELLER, J., ESCOBAR, F., SULLY, W.,
FURNISS, S., and BIDULOCK, B.: SS7 SCCP-User Adaptation Layer
(SUA), Internet-Draft, January 2002. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-sigtran-sua-11.txt

[Ma1998] MA, G.: T/UDP: UDP for TCAP, Internet-Draft, expired May 1999.
http://www.watersprings.org/pub/id/draft-ma-tudp-00.txt

[Mad1998a] MADSON, C. (editor), and GLENN, R. (editor): The use of HMAC-MD5-
96 within ESP and AH, RFC 2403, November 1998.

[Mad1998b] MADSON, C. (editor), and GLENN, R. (editor): The use of HMAC-SHA-
1-96 within ESP and AH, RFC 2404, November 1998.

[Mat1996] MATHIS, M., MAHDAVI, J., FLOYD, S., and ROMANOW, A.: TCP
Selective Acknowledgement Options, RFC 2018, October 1996.

[McC1996] McCANN, J., DEERING, S. E., and MOGUL, J.: Path MTU Discovery for
IP Version 6, RFC 1981, August 1996.

[Moc1987] MOCKAPETRIS, P.: Domain Names - Concepts and Facilities, RFC 1034,
November 1987.

[Mod1992] MODARRESSI, A. R., and SKOOG, R. A.: An Overview of Signaling
System No. 7, Proceedings of the IEEE, Vol. 80, No. 4, April 1992.
http://ieeexplore.ieee.org/iel1/5/3687/00135382.pdf?isNumber=3687

[Mog1990] MOGUL, J., and DEERING, S. E.: Path MTU Discovery, RFC 1191, November
1990.

[Moo1965] MOORE, G. E.: Cramming More Components onto Integrated Circuits,
Electronics, Volume 38, Number 8, April 1965.
http://www.intel.com/research/silicon/moorespaper.pdf

[Mor2001] MORNEAULT, K., RENGASAMI, S., KALLA, M., and SIDEBOTTOM,
G.: ISDN Q.921-User Adaptation Layer, RFC 3057, February 2001.

[Mor2002] MORNEAULT, K., DANTU, R., SIDEBOTTOM, G., GEORGE, T.,
BIDULOCK, B., and HEITZ, J.: SS7 MTP2-User Adaptation Layer,
Internet-Draft, January 2002. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-sigtran-m2ua-13.txt

[Moy1998] MOY, J.: OSPF Version 2, RFC 2328, April 1998.

[NBS1995] NATIONAL BUREAU OF STANDARDS: Secure Hash Standard, Federal
Information Processing Standards Publication 180-1, April 1995.
http://csrc.nist.gov/publications/fips/fips180-1/fip180-1.txt

[Nic1998] NICHOLS, K., BLAKE, S., BAKER, F., and BLACK, D.: Definition of
the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers,
RFC 2474, December 1998.

[Nua2001] Nua: How Many Online?, Nua Internet Surveys, 2001.
http://www.nua.ie/surveys/how_many_online/index.html

[Ong1999] ONG, L., RYTINA, I., HOLDREGE, M., COENE, L., GARCÍA, M. A.,
SHARP, C., JUHASZ, I., LIN, H. P., and SCHWARZBAUER, H. J.:
Framework Architecture for Signaling Transport, RFC 2719, October 1999.

[Oti2001a] OTIS, D.: RE: [TSVWG] SCTP and Checksums, email sent to the TSVWG
distribution list, 14th May 2001.
ftp://ftp.ietf.org/ietf-mail-archive/tsvwg/2001-05.mail

[Oti2001b] OTIS, D.: Integrity-Authentication Digest for SCTP, Internet-Draft, June
http://www.watersprings.org/pub/id/draft-otis-sctp-digest-02.txt

[Pat1995] PARTRIDGE, C., HUGHES, J., and STONE, J.: Performance of
Checksums and CRCs over Real Data, Proceedings of SIGCOMM '95
Conference, ACM, pages 68-76, August 1995.
http://www.acm.org/pubs/articles/proceedings/comm/217382/p68-partridge/p68-partridge.pdf

[Pax1997] PAXSON, V.: End-to-End Internet Packet Dynamics, Proceedings of
SIGCOMM '97 Conference, ACM, pages 139-152, September 1997.
http://www.acm.org/pubs/articles/proceedings/comm/263105/p139-paxson/p139-paxson.pdf

[Pax2000] PAXSON, V., and ALLMAN, M.: Computing TCP's Retransmission Timer,
RFC 2988, November 2000.

[Pos1980] POSTEL, J. (editor): User Datagram Protocol, RFC 768, August 1980.

[Pos1981a] POSTEL, J. (editor): Internet Protocol, RFC 791, September 1981.

[Pos1981b] POSTEL, J. (editor): Internet Control Message Protocol, RFC 792,
September 1981.

[Pos1981c] POSTEL, J. (editor): Transmission Control Protocol, RFC 793, September
1981.

[Pos1983] POSTEL, J., and REYNOLDS, J. K.: Telnet Protocol Specification, RFC
854, May 1983.

[Pos1985] POSTEL, J., and REYNOLDS, J. K.: File Transfer Protocol (FTP), RFC
959, October 1985.

[Pri2001] PRICE, R., HANCOCK, R., McCANN, S., WEST, M. A., SURTEES, A.,
OLLIS, P., ZHANG, Q., LIAO, H., ZHU, W., and ZHANG, Y.: TCP/IP
Compression for ROHC, Internet-Draft, November 2001. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-rohc-tcp-epic-02.txt

[PT2000] PERFORMANCE TECHNOLOGIES: SS7 Tutorial, 2000.
http://www.pt.com/tutorials/ss7_tutorial_05_07_01.pdf

[Ram2001] RAMAKRISHNAN, K. K., FLOYD, S., and BLACK, D. L.: The Addition
of Explicit Congestion Notification (ECN) to IP, RFC 3168, September 2001.

[Rij1994] RIJSINGHANI, A. (editor): Computation of the Internet Checksum via
Incremental Update, RFC 1624, May 1994.

[Riv1992] RIVEST, R. L.: The MD5 Message-Digest Algorithm, RFC 1321, April
1992.

[Ros2001a] ROSEN, E., VISWANATHAN, A., and CALLON, R.: Multiprotocol
Label Switching Architecture, RFC 3031, January 2001.

[Ros2001b] ROSENBERG, J., SCHULZRINNE, H., and CAMARILLO, G.: SCTP as
a transport for SIP, Internet-Draft, November 2001. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-sip-sctp-01.txt

[Rus1998] RUSSELL, T.: Signaling System #7, Second Edition, McGraw-Hill, 1998.

[Sán1998] SÁNCHEZ, D.: Connectionless SCCP over IP Adaptation Layer (CSIP),
Internet-Draft, expired May 1999.
http://www.watersprings.org/pub/id/draft-sanchez-CSIP-v0r0-00.txt

[Sán1999] SÁNCHEZ, D.: A Simple SCCP Tunneling Protocol (SSTP), Internet-Draft,
expired July 1999.
http://www.watersprings.org/pub/id/draft-sanchez-garcia-SSTP-v1r0-00.txt

[Sch1996] SCHULZRINNE, H., CASNER, S., FREDERICK, R., and JACOBSON,
V.: RTP: A Transport Protocol for Real-Time Applications, RFC 1889,
January 1996.

[Sch1998] SCHULZRINNE, H., RAO, A., and LANPHIER, R.: Real Time Streaming
Protocol (RTSP), RFC 2326, April 1998.

[She2000] SHEPLER, S., CALLAGHAN, B., ROBINSON, D., THURLOW, R.,
BEAME, C., EISLER, M., and NOVECK, D.: NFS Version 4 Protocol,
RFC 3010, December 2000.

[She2001] SHEINWALD, D., SATRAN, J., THALER, P., CAVANNA, V., and
WAKELEY, M.: iSCSI CRC/Checksum Considerations, Internet-Draft, May
http://www.watersprings.org/pub/id/draft-sheinwald-iscsi-crc-00.txt

[Sid2002] SIDEBOTTOM, G., PASTOR-BALBAS, J., RYTINA, I., MOUSSEAU,
G., ONG, L., SCHWARZBAUER, H. J., GRADISCHNIG, K.,
MORNEAULT, K., KALLA, M., GLAUDE, N., BIDULOCK, B., and
LOUGHNEY, J.: SS7 MTP3-User Adaptation Layer (M3UA), Internet-
Draft, January 2002. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-sigtran-m3ua-11.txt

[Sol1992] SOLLINS, K. R.: The TFTP Protocol (Revision 2), RFC 1350, July 1992.

[Sri1999] SRISURESH, P., and HOLDREGE, M.: IP Network Address Translator
(NAT) Terminology and Considerations, RFC 2663, August 1999.

[Sri2001] SRISURESH, P., and EGEVANG, K. B.: Traditional IP Network Address
Translator (Traditional NAT), RFC 3022, January 2001.

[Sta1995] STALLINGS, W.: ISDN and Broadband ISDN with Frame Relay and ATM,
Third Edition, Prentice-Hall International, 1995.

[Ste1994] STEVENS, W. R.: TCP/IP Illustrated, Volume 1, First Edition, Addison-
Wesley Professional Computing Series, 1994.

[Ste1998] STEWART, R. R., and XIE, Q.: Multi-Network Datagram Transmission
Protocol, Internet-Draft, expired January 1999.
http://www.watersprings.org/pub/id/draft-stewart-xie-mdtp-00.txt

[Ste2000] STEWART, R. R., XIE, Q., MORNEAULT, K., SHARP, C.,
SCHWARZBAUER, H. J., TAYLOR, T., RYTINA, I., KALLA, M.,
ZHANG, L., and PAXSON, V.: Stream Control Transmission Protocol,
RFC 2960, October 2000.

[Ste2001a] STEWART, R. R.: [TSVWG] SCTP and Checksums, email sent to the
TSVWG distribution list, 4th May 2001.

[Ste2001b] STEWART, R. R.: Re: [TSVWG] sctp error check, again, email sent to the
TSVWG distribution list, 5th June 2001.

[Ste2001c] STEWART, R. R., and XIE, Q.: Stream Control Transmission Protocol
(SCTP), A Reference Guide, First Edition, Addison-Wesley, 2001.

[Ste2002a] STEWART, R. R., STONE, J., and OTIS, D.: SCTP Checksum Change,
Internet-Draft, January 2002. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-tsvwg-sctpcsum-02.txt

[Ste2002b] STEWART, R. R., ONG, L., ARIAS-RODRÍGUEZ, I., and POON, K.:
SCTP Implementors Guide, Internet-Draft, January 2002. Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-tsvwg-sctpimpguide-03.txt

[Ste2002c] STEWART, R. R., RAMALHO, M. A., XIE, Q., TUEXEN, M.,
RYTINA, I., and CONRAD, P.: SCTP Extensions for Dynamic
Reconfiguration of IP Addresses, Internet-Draft, January 2002. Work in
progress.
http://www.watersprings.org/pub/id/draft-ietf-tsvwg-addip-sctp-04.txt

[Ste2002d] STEWART, R. R., XIE, Q., YARROLL, L., WOOD, J., POON, K., and
FUJITA, K.: Sockets API Extensions for SCTP, Internet-Draft, January 2002.
Work in progress.
http://www.watersprings.org/pub/id/draft-ietf-tsvwg-sctpsocket-03.txt

[Tan1996] TANENBAUM, A. S.: Computer Networks, Third Edition, Prentice-Hall
International, 1996.

[Tho1998] THOMSON, S., and NARTEN, T.: IPv6 Stateless Address
Autoconfiguration, RFC 2462, December 1998.

[Ton1999] TONEY, K.: PURDET. Reliable Transport Extensions on UDP, Internet-
Draft, expired September 1999.
http://www.watersprings.org/pub/id/draft-toney-purdet-00.txt

[Tou1996] TOUCH, J., and PARHAM, B.: Implementing the Internet Checksum in
Hardware, RFC 1936, April 1996.

[Väh2000] VÄHÄ-SIPILÄ, A.: URLs for Telephone Calls, RFC 2806, April 2000.

[W3C1999] WORLD WIDE WEB CONSORTIUM: HTML 4.01 Specification, W3C
Recommendation, December 1999.
http://www.w3.org/TR/html401/html40.pdf.gz

[Wil1993] WILLIAMS, R. N.: A Painless Guide to CRC Error Detection Algorithms,
Third Version, August 1993.
ftp://ftp.rocksoft.com/papers/crc_v3.txt

[Xie2001a] XIE, Q., STEWART, R. R., SHARP, C., and RYTINA, I.: SCTP
Unreliable Data Mode Extension, Internet-Draft, expired October 2001.
http://www.watersprings.org/pub/id/draft-ietf-tsvwg-usctp-00.txt

[Xie2001b] XIE, Q.: [TSVWG] Not proceeding with U-SCTP, email sent to the TSVWG
distribution list, 11th September 2001.

[Yav2000] YAVATKAR, R., PENDARAKIS, D., and GUERIN, R.: A Framework for
Policy-based Admission Control, RFC 2753, January 2000.

[Yuv1979] YUVAL, G.: How to Swindle Rabin, Cryptologia Magazine, Vol. 3, pages
187-190, July 1979.

[Zwe1990] ZWEIG, J., and PARTRIDGE, C.: TCP Alternate Checksum Options, RFC
1146, March 1990.

INDEX

A
Adaptive fast retransmit algorithm, 107-8
Adding and deleting addresses, 103-5
Adler-32 Checksum, 116
Advertised receiver window credit, 56
Application Service Element, 17
ASE (see Application Service Element)
Associated signaling mode, 7
Automatic callback, 6
B
Birthday attack, 64
BISUP (see Broadband ISDN User Part)
Blind attack, 56
Broadband ISDN User Part, 14
Bundling, 67
Burst error, 114
C
CCS (see Common Channel Signaling)
Chunks, 39, 43-46
    ABORT chunk, 44
    CANCEL chunk, 87, 106
    CHECKSUM chunk, 119
    Chunk Flags, 46
    Chunk Length, 46
    Chunk Type, 45-46
    COOKIE ACK chunk, 43, 66
    COOKIE ECHO chunk, 43, 65
    CWR chunk, 44
    DATA chunk, 43, 67-88
    ECNE chunk, 44
    ERROR chunk, 44, 92-94
    Fixed Fields, 46
    HEARTBEAT ACK chunk, 44, 89-91
    HEARTBEAT chunk, 44, 89-91
    IETF-defined chunk extensions, 46
    INIT ACK chunk, 43, 55-65
    INIT chunk, 43, 55-65
    SACK chunk, 43, 67-88
    SHUTDOWN ACK chunk, 44
    SHUTDOWN chunk, 44
    SHUTDOWN COMPLETE chunk, 44, 100-101
    Vendor-specific chunks, 45
Circuit-switched network, 7
Common Channel Signaling, 4, 5
Common Open Policy Service, 27
Common Transport Protocol, 32
Congestion avoidance algorithm, 73
Connectionless SCCP over IP Adaptation Layer, 34
Cookie, 63-65
Cookie mechanism, 50, 54-66, 122
COPS (see Common Open Policy Service)
CRC (see Cyclic Redundancy Check)
CRC-16, 43, 112
CRC-32, 117
CRC-CCITT, 114
CSIP (see Connectionless SCCP over IP Adaptation Layer)
CTP (see Common Transport Protocol)
Cumulative TSN Ack, 68
D
Data User Part, 14
DC signaling, 4
Delayed ACK Algorithm, 70
Delayed SACKs, 85
Denial of service, 53
Differentiated Services, 28
DiffServ (see Differentiated Services)
Digital signaling, 5
DUP (see Data User Part)
Duplicate TSNs, 68, 107
E
Error causes, 47, 93-94, 97
F
Fast retransmit algorithm, 74, 107-8, 120
Fletcher Checksum, 115
Fletcher-Adler Checksum, 117
Fragmentation, 79
G
Generator polynomial, 113
H
H.323, 28
H.323 Annex E, 34
Half-closed connection, 98
Half-open connection, 50, 53
Hamming distance, 119
Head-of-line blocking, 33, 58, 77-78
Heartbeat interval, 91
HMAC (see Keyed-Hashing algorithm for Message Authentication)
HOL (see Head-of-line)
I
IANA (see Internet Assigned Numbers Authority)
Idle address, 90
Implementors guide, 119-21
In-band signaling, 4, 5
Initiate Tag, 55
Internet Assigned Numbers Authority, 42
Internet Checksum, 112
Internet Protocol
    ARPANet, 18
    Header, 22
    History, 18-21
    HTTP (see Hypertext Transfer Protocol)
    Hypertext Transfer Protocol, 19
    IP spoofing, 53
    NSFNet, 18
    SCTP over IP, 35
    Voice over IP, 25-31
    VoIP (see Voice over IP)
    World Wide Web, 19
    WWW (see World Wide Web)
Internet telephony, 25-31
Interoperability session, 37, 105, 111
IP (see Internet Protocol)
ISDN Q.921-User Adaptation Layer, 110
ISDN User Part, 17
ISUP (see ISDN User Part)
K
Karn's algorithm, 85
Keepalive mechanism, 89
Keyed-Hashing algorithm for Message Authentication, 63
L
LAPD (see Link Access Procedures on the D-channel)
Link Access Procedures on the D-channel, 110
LNP (see Local Number Portability)
Local Number Portability, 6
M
M2PA (see MTP2-User Peer-to-Peer Adaptation Layer)
M2UA (see MTP2-User Adaptation Layer)
M3UA (see MTP3-User Adaptation Layer)
MAC (see Message Authentication Code)
Maximum Transfer Unit
    Black hole detection, 84
    Discovery, 80-85
MDTP (see Multi-network Datagram Transmission Protocol)
Media Gateway, 31
Media Gateway Controller, 31
Message Authentication Code, 63
Message Digest 5, 63
Message Transfer Part, 14-16
MMUSIC (see Multiparty Multimedia Session Control)
Modified Adler-32 Checksum, 118
MPLS (see Multiprotocol Label Switching Architecture)
MTP (see Message Transfer Part)
MTP2-User Adaptation Layer, 109
MTP2-User Peer-to-Peer Adaptation Layer, 110
MTP3-User Adaptation Layer, 108
MTU (see Maximum Transfer Unit)
Multihoming, 103-5, 122
Multi-network Datagram Transmission Protocol, 34
    Acknowledgement Number, 39
    Biggest message, 39
    Data field, 40
    Data Size, 39
    Endpoint drain procedure, 95
    Establishment procedure, 51-52
    Flags, 39
    Header, 37-40
    In Queue field, 40
    Mode, 39
    Of field, 39
    Part field, 39
    Protocol Identifier field, 38
    Sequence Number, 39
    Termination of an endpoint procedure, 95
    Version field, 39
Multiparty Multimedia Session Control Working Group, 31
Multiprotocol Label Switching Architecture, 28
N
NAKs, 76
NAT (see Network Address Translator)
Network Address Translator, 60-62
Nonassociated signaling mode, 7
O
OOTB (see Out of the blue datagram)
Out of the blue datagram, 96
Out-of-band signaling, 5
P
Packet-switched network, 7
Padding, 46, 47
Parameters, 46-47
    INIT ACK parameters, 59-65
    INIT parameters, 59-65
    Parameter Length, 47
    Parameter Type, 46
Path heartbeat mechanism, 89, 121
Payload Protocol Identifier, 70
Per stream flow control, 103-5
Polynomial Code, 113
Primary Address, 104
Pseudo header, 112
PURDET, 34
Q
Q.921, 110
Q.931, 110
QoS (see Quality of Service)
Quality of Service, 27
R
RAP (see Resource Allocation Protocol)
Real Time Protocol, 28, 34
Real Time Streaming Protocol, 28
Reference implementation, 124
Reliable request procedure, 104
Reliable UDP, 34
Reordering of packets, 107
Resource Allocation Protocol, 27
Resource Reservation Protocol, 27
Retransmission Time-Out, 73, 85-87
Round Trip Time, 85-87
Round Trip Time Variation, 87
RSVP (see Resource Reservation Protocol)
RTO (see Retransmission Time-Out)
RTP (see Real Time Protocol)
RTSP (see Real Time Streaming Protocol)
RTT (see Round Trip Time)
RTTVAR (see Round Trip Time Variation)
RUDP (see Reliable UDP)
S
SCCP (see Signaling Connection Control Part)
SCCP-User Adaptation Layer, 108
Secure Hash Standard 1, 63
Sequence number attack, 56
Service Specific Connection-Oriented Protocol, 34
Session Announcement Protocol, 28
Session Description Protocol, 28
Session Initiation Protocol, 28
SHA-1 (see Secure Hash Standard 1)
Signaling Connection Control Part, 16
Signaling Gateway, 31
Signaling System #7
    Functional architecture, 7-13
    Global title translation, 10, 16
    International plane, 8
    Linkset, 11
    Message discrimination, 7
    National Plane, 8
    Protocol architecture, 13-17
    SCP (see Service Control Point)
    Screening, 10
    Service Control Point, 11
    Service Switching Point, 9
    Signaling Link, 8, 11-13
    Signaling Point, 7
    Signaling Transfer Point, 9-10
    SSP (see Service Switching Point)
    STP (see Signaling Transfer Point)
Signaling Transport Working Group, 31, 48, 70, 122
SIGTRAN (see Signaling Transport Working Group)
Simple SCCP Tunneling Protocol, 34
Slow start algorithm, 73
Smoothed Round Trip Time, 86
Socket interface, 123
SP (see Signaling Point)
SRTT (see Smoothed Round Trip Time)
SS7 (see Signaling System #7)
SSN (see Stream Sequence Number)
SSTP (see Simple SCCP Tunneling Protocol)
Stream Control Transmission Protocol
    Checksum, 43, 111-19
    Common header, 40-43
    Congestion avoidance algorithms, 70-76
    Cookie mechanism, 35
    Defects found, 36, 75, 111-21
    Destination Port Number, 41
    Establishment procedure, 51
    Extensibility features, 36, 47-48
    Primary address, 60
    Protocol number, 38
    Public implementation, 125, 127
    Source Port Number, 41
    State diagram, 48-50
    TCB (see Transmission Control Block)
    Transmission Control Block, 59
    Verification Tag, 42, 55
Stream Sequence Number, 70
Streams, 33, 58, 76-80, 87, 122
SUA (see SCCP-User Adaptation Layer)
T
T/UDP (see UDP for TCAP)
TCAP (see Transaction Capabilities Application Part)
TCP (see Transmission Control Protocol)
TCP-32 Checksum, 115
Telephone User Part, 14
Tie-Tags, 64, 120
TLV (see Type-Length-Value structure)
Transaction Capabilities Application Part, 16-17
Transmission Control Protocol
    Checksum, 112
    Extensibility problems, 44-45
    Problems to become the Common Transport Protocol, 33
    SYN attack, 33, 52-54
    TIME WAIT state, 98
    Timestamps, 85
Transmission Sequence Number, 67
Transport Area Working Group, 36, 48, 68
Transport Layer Security, 110
TSN (see Transmission Sequence Number)
TSVWG (see Transport Area Working Group)
TUP (see Telephone User Part)
Two-army problem, 100
Type-Length-Value structure, 43
U
UDP for TCAP, 34
Unreliable SCTP, 105
W
WATS (see Wide Area Telephone Service)
Wide Area Telephone Service, 6