
Efficient Implementation of Packet Scheduling Algorithm on Network Processor

Weidong Shi Xiaotong Zhuang Indrani Paul Karsten Schwan

Outline

- Motivation
- DWCS QoS Packet Scheduler
- Intel IXP Network Processor
- Design Challenges
- Hierarchically Indexed Linear Queue (HILQ)
- Results
- Conclusions

Motivation
A real-time media gateway must:

- support thousands of concurrent media streams
- schedule packets at wire speed, 100 Mbps or even 1000 Mbps
- exploit state-of-the-art architectural features to speed up scheduling throughput

DWCS: Dynamic Window-Constrained Scheduling

- a real-time packet scheduler that ensures QoS on a per-stream basis
- limits the number of late packets for each stream over a finite window of arrivals
- per-stream loss-tolerance constraint: in a window of y packets, at most x packets can be late or missing
- scheduling is feasible when certain conditions are met

DWCS Scheduler
while TRUE:
    Find stream i with highest priority (use a precedence table)
    Service packet at head of stream i
    Adjust loss-tolerance for i according to the rules
    Deadline(i) = Deadline(i) + Inter-packet gap(i)
    For each stream j missing its deadline:
        While deadline is missed:
            Adjust loss-tolerance for j according to the rules
            Drop head packet of stream j if droppable
            Deadline(j) = Deadline(j) + Inter-packet gap(j)
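The loop above can be sketched in Python. The Stream fields, the earliest-deadline tie-break standing in for "highest priority", and the drop policy are simplified placeholders for illustration, not the IXP microengine code:

```python
class Stream:
    """Minimal per-stream state for the DWCS service loop (illustrative)."""
    def __init__(self, name, deadline, gap, x, y):
        self.name = name          # stream identifier
        self.deadline = deadline  # deadline of the head packet
        self.gap = gap            # inter-packet gap
        self.x = x                # tolerated late/missing packets per window
        self.y = y                # window of arrivals
        self.queue = []           # pending packets

def service_one(streams, now):
    """Service the highest-priority stream, then handle missed deadlines."""
    # 1. Pick the stream with highest priority (simplified here to
    #    earliest deadline; the real scheduler uses a precedence table).
    i = min((s for s in streams if s.queue),
            key=lambda s: s.deadline, default=None)
    if i is None:
        return None
    packet = i.queue.pop(0)       # service packet at head of stream i
    i.deadline += i.gap           # Deadline(i) += Inter-packet gap(i)
    # 2. For every other stream that missed its deadline, drop the head
    #    packet (if droppable) and push the deadline forward.
    for j in streams:
        if j is i:
            continue
        while j.deadline < now and j.queue:
            j.queue.pop(0)        # drop head packet of stream j
            j.deadline += j.gap   # Deadline(j) += Inter-packet gap(j)
    return packet
```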

DWCS Packet Ordering Rules


Precedence among pairs of packets:

1. Earliest Deadline First (EDF).
2. Equal deadlines: order lowest window constraint (x/y) first.
3. Equal deadlines and zero window constraints: order highest window denominator (y) first.
4. Equal deadlines and equal non-zero window constraints: order lowest window numerator (x) first.
5. All other cases: First-Come-First-Serve.
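The five rules translate into a pairwise comparator. This is a hedged Python sketch; the packet tuple layout (deadline, x, y, arrival time) is an assumption for illustration:

```python
from functools import cmp_to_key

def dwcs_precedes(a, b):
    """Return a negative value when packet a should be served before b."""
    da, xa, ya, ta = a
    db, xb, yb, tb = b
    if da != db:                     # rule 1: earliest deadline first
        return -1 if da < db else 1
    wa, wb = xa / ya, xb / yb        # window constraints x/y
    if wa != wb:                     # rule 2: lowest x/y first
        return -1 if wa < wb else 1
    if wa == 0:                      # rule 3: zero constraints,
        if ya != yb:                 #         highest y first
            return -1 if ya > yb else 1
    elif xa != xb:                   # rule 4: equal non-zero x/y,
        return -1 if xa < xb else 1  #         lowest x first
    return -1 if ta <= tb else 1     # rule 5: first-come-first-serve
```

Sorting a batch of packets with `cmp_to_key(dwcs_precedes)` then yields the DWCS service order.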

Heap Based DWCS

Intel IXP Network Processor

- designed for software routers
- multiple RISC cores on a single chip
- simultaneous multithreading
- shared-memory architecture
- packet-level parallelism
- load/store architecture with large data transfer sizes

Design Challenges

- a QoS packet scheduler is hard to parallelize
- simultaneous multithreading is good for throughput but not for latency
- the heap-based implementation requires too many memory accesses per scheduled packet
- the heap-based implementation on the IXP shows poor scalability

(Diagram: receive threads → scheduler → transmit threads)

Latency Distribution of SRAM Access


(Chart: percentage of SRAM accesses vs. number of cycles, 19–35 cycles)

Hierarchically Indexed Linear Queue


(Diagram: circular queue of 1 ms segments, each with many entries, and a transmit pointer)

- one segment corresponds to a fixed time window
- a newly arrived packet is placed in a segment based on its deadline
- a transmit thread keeps a pointer to the entry whose packet should be put on the wire next
- the pointer sweeps through all entries of a segment and jumps to the next segment when its time comes
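Placing an arriving packet is then a deadline-to-segment mapping. A minimal sketch, assuming 1 ms segments (from the slide) and a circular array whose size is illustrative:

```python
SEGMENT_MS = 1      # each segment covers a fixed 1 ms time window (from slide)
NUM_SEGMENTS = 100  # illustrative ring size, not taken from the slides

def segment_index(deadline_ms):
    """Map a packet's deadline to its segment in the circular queue."""
    return (deadline_ms // SEGMENT_MS) % NUM_SEGMENTS
```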

Hierarchically Indexed Linear Queue


(Diagram: entries inside a segment, grouped into regions of similar loss tolerance x/y and ordered within each region by increasing window numerator x)

Inside a segment, the position of a packet is determined according to the DWCS rules.

Hierarchically Indexed Linear Queue


Example: assume the maximum possible x is 20, the maximum possible y is 30, and there are 50 loss-tolerance regions with 2000 cells in total.

Entry position: p = tab(x, y) + x

Each entry stores a pointer to a buffer of packet pointers.
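The entry-position formula can be tried out numerically with the example figures. The linear mapping of x/y onto the 50 regions inside `tab()` is an assumption for illustration; the real scheduler uses a precomputed lookup table:

```python
MAX_X, MAX_Y = 20, 30                # from the example on the slide
REGIONS = 50                         # loss-tolerance regions
CELLS = 2000                         # total cells in a segment
CELLS_PER_REGION = CELLS // REGIONS  # 40 cells per region

def tab(x, y):
    """Base offset of the loss-tolerance region holding constraint x/y.
    Linearly bins x/y in [0, 1] into REGIONS regions (an assumption;
    the talk uses a lookup table rather than this formula)."""
    region = min(int((x / y) * REGIONS), REGIONS - 1)
    return region * CELLS_PER_REGION

def entry_position(x, y):
    return tab(x, y) + x             # p = tab(x, y) + x  (from the slide)
```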

Hierarchically Indexed Linear Queue


Speed up the packet search with a multilevel bitmap.

(Diagram: level-1 bit vectors of 32 bits indexing level-2 bit vectors of 32 bits, which in turn index the SRAM linear queue)
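The bitmap search can be sketched with find-first-set over 32-bit words, matching the 32-entry vectors on the slide; the data layout and function names are illustrative:

```python
WORD = 32  # each bit vector on the slide has 32 entries

def ffs(word):
    """Index of the lowest set bit (find-first-set)."""
    return (word & -word).bit_length() - 1

def find_first_occupied(level1, level2):
    """Return the index of the first non-empty linear-queue entry.

    level1 is a 32-bit summary word; bit i is set when level2[i] is
    non-zero (an invariant maintained on enqueue/dequeue). Returns
    None when every entry is empty.
    """
    if level1 == 0:
        return None
    i = ffs(level1)                 # first non-empty group
    return i * WORD + ffs(level2[i])  # entry index in the linear queue
```

Two bit scans replace a walk over up to 1024 entries, which is what makes the sweep through sparse segments cheap.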

Hierarchically Indexed Linear Queue

(Diagram: level-0 segments 0–99, with the transmission pointer sweeping across them)

Results
Memory accesses per scheduled packet:

No. of active streams |   10  |   50  |  100  |  200  |   500  |  1000  |  2000
Heap                  | 45.86 | 73.73 | 85.73 | 97.73 | 113.59 | 125.59 | 137.58
HILQ                  | 19.8  | 14.36 | 13.68 | 13.34 | 13.135 | 13.068 | 13.034

Results
Scheduling cycle scalability:

(Chart: scheduling delay per stream in microengine cycles vs. number of active streams in the system, 0–2500 streams)

Results
Throughput scalability:

(Chart: throughput in Mbps vs. number of active streams, 0–600 streams, for 512-byte and 256-byte packets)

Conclusions

- HILQ-based DWCS significantly reduces memory accesses per scheduled packet compared with the heap-based implementation
- HILQ is able to service thousands of streams at high network speeds
- HILQ achieves its performance by optimizing the scheduler algorithm and exploiting specific architectural features
