
Efficient Implementation of Packet Scheduling Algorithm on Network Processor

Weidong Shi Xiaotong Zhuang Indrani Paul Karsten Schwan

Outline

- Motivation
- DWCS QoS Packet Scheduler
- Intel IXP Network Processor
- Design Challenges
- Hierarchically Indexed Linear Queue (HILQ)
- Results
- Conclusions

Motivation
A real-time media gateway must:

- support thousands of concurrent media streams
- schedule packets at wire speed, 100 Mbps or even 1000 Mbps
- exploit state-of-the-art architectural features to speed up scheduling throughput

DWCS: Dynamic Window-Constrained Scheduling

- a real-time packet scheduler that ensures QoS on a per-stream basis
- limits the number of late packets for each stream over a finite window of arrivals
- per-stream loss-tolerance constraint: in a window of y packets, at most x packets can be late or missing
- scheduling is feasible when certain conditions are met

DWCS Scheduler
while TRUE:
    Find stream i with highest priority (use a precedence table)
    Service packet at head of stream i
    Adjust loss-tolerance for i according to the rules
    Deadline(i) = Deadline(i) + Inter-packet gap(i)
    For each stream j missing its deadline:
        While deadline is missed:
            Adjust loss-tolerance for j according to the rules
            Drop head packet of stream j if droppable
            Deadline(j) = Deadline(j) + Inter-packet gap(j)
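The loop above can be sketched in Python. The Stream fields, the earliest-deadline tie-break standing in for "highest priority", and the drop policy are simplified placeholders for illustration, not the IXP microengine code:

```python
class Stream:
    """Minimal per-stream state for the DWCS service loop (illustrative)."""
    def __init__(self, name, deadline, gap, x, y):
        self.name = name          # stream identifier
        self.deadline = deadline  # deadline of the head packet
        self.gap = gap            # inter-packet gap
        self.x = x                # tolerated late/missing packets per window
        self.y = y                # window of arrivals
        self.queue = []           # pending packets

def service_one(streams, now):
    """Service the highest-priority stream, then handle missed deadlines."""
    # 1. Pick the stream with highest priority (simplified here to
    #    earliest deadline; the real scheduler uses a precedence table).
    i = min((s for s in streams if s.queue),
            key=lambda s: s.deadline, default=None)
    if i is None:
        return None
    packet = i.queue.pop(0)       # service packet at head of stream i
    i.deadline += i.gap           # Deadline(i) += Inter-packet gap(i)
    # 2. For every other stream that missed its deadline, drop the head
    #    packet (if droppable) and push the deadline forward.
    for j in streams:
        if j is i:
            continue
        while j.deadline < now and j.queue:
            j.queue.pop(0)        # drop head packet of stream j
            j.deadline += j.gap   # Deadline(j) += Inter-packet gap(j)
    return packet
```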

DWCS Packet Ordering Rules


Precedence among pairs of packets:

1. Earliest Deadline First (EDF).
2. Equal deadlines: order lowest window constraint (x/y) first.
3. Equal deadlines and zero window constraints: order highest window denominator (y) first.
4. Equal deadlines and equal non-zero window constraints: order lowest window numerator (x) first.
5. All other cases: First-Come-First-Serve.
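The five rules translate into a pairwise comparator. This is a hedged Python sketch; the packet tuple layout (deadline, x, y, arrival time) is an assumption for illustration:

```python
from functools import cmp_to_key

def dwcs_precedes(a, b):
    """Return a negative value when packet a should be served before b."""
    da, xa, ya, ta = a
    db, xb, yb, tb = b
    if da != db:                     # rule 1: earliest deadline first
        return -1 if da < db else 1
    wa, wb = xa / ya, xb / yb        # window constraints x/y
    if wa != wb:                     # rule 2: lowest x/y first
        return -1 if wa < wb else 1
    if wa == 0:                      # rule 3: zero constraints,
        if ya != yb:                 #         highest y first
            return -1 if ya > yb else 1
    elif xa != xb:                   # rule 4: equal non-zero x/y,
        return -1 if xa < xb else 1  #         lowest x first
    return -1 if ta <= tb else 1     # rule 5: first-come-first-serve
```

Sorting a batch of packets with `cmp_to_key(dwcs_precedes)` then yields the DWCS service order.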

Heap Based DWCS

Intel IXP Network Processor

- designed for software routers
- multiple RISC cores on a single chip
- simultaneous multithreading
- shared-memory architecture
- packet-level parallelism
- load/store architecture with large data transfer sizes

Design Challenges

- a QoS packet scheduler is hard to parallelize
- simultaneous multithreading is good for throughput but not for latency
- the heap-based implementation requires too many memory accesses per scheduled packet
- the heap-based implementation on the IXP shows poor scalability

(Diagram: receive threads → scheduler → transmit threads)

Latency Distribution of SRAM Access


(Chart: percentage of SRAM accesses vs. number of cycles, 19–35 cycles)

Hierarchically Indexed Linear Queue


(Diagram: circular queue of 1 ms segments, each with many entries, and a transmit pointer)

- one segment corresponds to a fixed time window
- a newly arrived packet is placed in a segment based on its deadline
- a transmit thread keeps a pointer to the entry whose packet should be put on the wire next
- the pointer sweeps through all entries of a segment and jumps to the next segment when its time comes
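Placing an arriving packet is then a deadline-to-segment mapping. A minimal sketch, assuming 1 ms segments (from the slide) and a circular array whose size is illustrative:

```python
SEGMENT_MS = 1      # each segment covers a fixed 1 ms time window (from slide)
NUM_SEGMENTS = 100  # illustrative ring size, not taken from the slides

def segment_index(deadline_ms):
    """Map a packet's deadline to its segment in the circular queue."""
    return (deadline_ms // SEGMENT_MS) % NUM_SEGMENTS
```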

Hierarchically Indexed Linear Queue


(Diagram: entries inside a segment, grouped into regions of similar loss tolerance x/y and ordered within each region by increasing window numerator x)

Inside a segment, the position of a packet is determined according to the DWCS rules.

Hierarchically Indexed Linear Queue


Example: assume the maximum possible x is 20, the maximum possible y is 30, and there are 50 loss-tolerance regions with 2000 cells in total.

Entry position: p = tab(x, y) + x

Each entry stores a pointer to a buffer of packet pointers.
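The entry-position formula can be tried out numerically with the example figures. The linear mapping of x/y onto the 50 regions inside `tab()` is an assumption for illustration; the real scheduler uses a precomputed lookup table:

```python
MAX_X, MAX_Y = 20, 30                # from the example on the slide
REGIONS = 50                         # loss-tolerance regions
CELLS = 2000                         # total cells in a segment
CELLS_PER_REGION = CELLS // REGIONS  # 40 cells per region

def tab(x, y):
    """Base offset of the loss-tolerance region holding constraint x/y.
    Linearly bins x/y in [0, 1] into REGIONS regions (an assumption;
    the talk uses a lookup table rather than this formula)."""
    region = min(int((x / y) * REGIONS), REGIONS - 1)
    return region * CELLS_PER_REGION

def entry_position(x, y):
    return tab(x, y) + x             # p = tab(x, y) + x  (from the slide)
```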

Hierarchically Indexed Linear Queue


Speed up the packet search with a multilevel bitmap.

(Diagram: level-1 bit vectors of 32 bits indexing level-2 bit vectors of 32 bits, which in turn index the SRAM linear queue)
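The bitmap search can be sketched with find-first-set over 32-bit words, matching the 32-entry vectors on the slide; the data layout and function names are illustrative:

```python
WORD = 32  # each bit vector on the slide has 32 entries

def ffs(word):
    """Index of the lowest set bit (find-first-set)."""
    return (word & -word).bit_length() - 1

def find_first_occupied(level1, level2):
    """Return the index of the first non-empty linear-queue entry.

    level1 is a 32-bit summary word; bit i is set when level2[i] is
    non-zero (an invariant maintained on enqueue/dequeue). Returns
    None when every entry is empty.
    """
    if level1 == 0:
        return None
    i = ffs(level1)                 # first non-empty group
    return i * WORD + ffs(level2[i])  # entry index in the linear queue
```

Two bit scans replace a walk over up to 1024 entries, which is what makes the sweep through sparse segments cheap.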

Hierarchically Indexed Linear Queue

(Diagram: level-0 segments 0–99, with the transmission pointer sweeping across them)

Results
Memory accesses per scheduled packet:

No. of active streams |   10  |   50  |  100  |  200  |   500  |  1000  |  2000
Heap                  | 45.86 | 73.73 | 85.73 | 97.73 | 113.59 | 125.59 | 137.58
HILQ                  | 19.8  | 14.36 | 13.68 | 13.34 | 13.135 | 13.068 | 13.034

Results
Scheduling cycle scalability:

(Chart: scheduling delay per stream in microengine cycles vs. number of active streams in the system, 0–2500 streams)

Results
Throughput scalability:

(Chart: throughput in Mbps vs. number of active streams, 0–600 streams, for 512-byte and 256-byte packets)

Conclusions

- HILQ-based DWCS significantly reduces memory accesses per scheduled packet compared with the heap-based implementation
- HILQ is able to service thousands of streams at high network speeds
- HILQ achieves its performance by optimizing the scheduler algorithm and exploiting specific architectural features
