Plan

- Motivation
- DWCS QoS Packet Scheduler
- Intel IXP Network Processor
- Design Challenges
- Hierarchically Indexed Linear Queue (HILQ)
- Results
- Conclusions
Motivation

A real-time media gateway must:

- support thousands of concurrent media streams
- schedule packets at wire speed, 100 Mbps or even 1000 Mbps
- exploit state-of-the-art architectural features to speed up scheduling throughput

A real-time packet scheduler ensures QoS on a per-stream basis:

- it limits the number of late packets for each stream over a finite window of arrivals
- per-stream loss-tolerance constraint: in a window of y packets, at most x packets can be late or missing
- scheduling is feasible when certain conditions are met
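The per-stream (x, y) loss-tolerance constraint above amounts to a sliding-window check. A minimal Python sketch, assuming one outcome is recorded per packet (all names here are illustrative, not from the talk):

```python
from collections import deque

def make_window_checker(x, y):
    """Track the last y packet outcomes of one stream; return a function
    that records an outcome and reports whether the (x, y) constraint
    (at most x late/lost packets per window of y) still holds."""
    history = deque(maxlen=y)  # True = on time, False = late or dropped

    def record(on_time):
        history.append(on_time)
        late = sum(1 for ok in history if not ok)
        return late <= x

    return record

check = make_window_checker(x=1, y=4)
print(check(True))    # on time             -> True
print(check(False))   # one late, tolerated -> True
print(check(False))   # two late in window  -> False
```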
DWCS Scheduler
while TRUE:
    Find stream i with highest priority (use a precedence table)
    Service packet at head of stream i
    Adjust loss-tolerance for i according to some rules
    Deadline(i) = Deadline(i) + Inter-packet gap(i)
    For each stream j missing its deadline:
        While deadline is missed:
            Adjust loss-tolerance for j according to some rules
            Drop head packet of stream j if droppable
            Deadline(j) = Deadline(j) + Inter-packet gap(j)
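One service step of the loop above can be sketched in Python. This is a single-threaded illustration only: the loss-tolerance adjustment rules are elided, and the stream fields and function names are assumptions, not the paper's data structures:

```python
def service_step(streams, now, priority_key):
    """One DWCS service step: pick the highest-priority stream (lowest
    key), service its head packet, advance its deadline, then let streams
    that missed their deadlines drop late head packets and catch up.
    Loss-tolerance adjustment is intentionally omitted."""
    ready = [s for s in streams if s["queue"]]
    if not ready:
        return None
    best = min(ready, key=priority_key)
    packet = best["queue"].pop(0)          # service head packet
    best["deadline"] += best["gap"]        # next deadline for this stream
    for s in streams:
        while s["deadline"] < now:         # stream missed its deadline
            if s["queue"]:
                s["queue"].pop(0)          # drop late head packet
            s["deadline"] += s["gap"]      # catch deadline up to now
    return packet

streams = [{"queue": ["a1", "a2"], "deadline": 10, "gap": 5},
           {"queue": ["b1"], "deadline": 20, "gap": 5}]
print(service_step(streams, now=0, priority_key=lambda s: s["deadline"]))  # -> a1
```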
1. Earliest Deadline First (EDF).
2. Equal deadlines: order lowest window-constraint (x/y) first.
3. Equal deadlines and zero window-constraints: order highest window-denominator (y) first.
4. Equal deadlines and equal non-zero window-constraints: order lowest window-numerator (x) first.
5. All other cases: First-Come-First-Serve.
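The five precedence rules above can be expressed as a comparator. A Python sketch, assuming each stream is summarized as a tuple (deadline, x, y, arrival time); the field layout is an assumption for illustration:

```python
from functools import cmp_to_key

def dwcs_compare(a, b):
    """Order two streams per the DWCS precedence rules.
    Each stream is (deadline, x, y, arrival); negative means a first."""
    da, xa, ya, ta = a
    db, xb, yb, tb = b
    if da != db:                          # 1. Earliest Deadline First
        return -1 if da < db else 1
    ca, cb = xa / ya, xb / yb             # window constraints x/y
    if ca != cb:                          # 2. lowest window constraint first
        return -1 if ca < cb else 1
    if ca == 0:                           # 3. both zero: highest y first
        if ya != yb:
            return -1 if ya > yb else 1
    elif xa != xb:                        # 4. equal non-zero: lowest x first
        return -1 if xa < xb else 1
    return -1 if ta < tb else (1 if ta > tb else 0)   # 5. FCFS

streams = [(5, 1, 4, 0.0), (5, 2, 4, 1.0), (3, 2, 4, 2.0)]
print(sorted(streams, key=cmp_to_key(dwcs_compare))[0])  # -> (3, 2, 4, 2.0)
```

In a real scheduler this comparison would be folded into the precedence table the pseudocode mentions rather than evaluated per pair.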
Intel IXP Network Processor

- designed for software routers
- multiple RISC cores on a single chip
- simultaneous multithreading
- shared-memory architecture
- packet-level parallelism
- load/store architecture with large data transfer sizes
Design Challenges
- a QoS packet scheduler is hard to parallelize
- simultaneous multithreading is good for throughput but not for latency
- a heap-based implementation requires too many memory accesses per scheduled packet
- a heap-based implementation on the IXP shows poor scalability
[Figure: scheduler pipeline (receive threads -> scheduler -> transmit threads); plot of per-packet scheduling cost, y-axis: number of cycles]
Hierarchically Indexed Linear Queue (HILQ)

- one segment corresponds to a fixed time window
- a newly arrived packet is placed into a segment based on its deadline
- a transmit thread keeps a pointer to the entry whose packet should be put on the wire next
- the transmit thread sweeps through all entries of a segment and jumps to the next segment when its time comes
Inside a segment, the position of a packet is determined according to the DWCS rules.

[Figure: HILQ structure with level-1 bit vectors, each indexing 32 entries (0..31)]
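The level-1 vectors in the figure suggest a two-level bitmap index over a segment's entries, so the transmit thread can find the next occupied entry with two find-first-set steps instead of a linear sweep. A Python sketch, assuming 32 level-1 bits each summarizing a 32-bit level-2 word (the 0..31 entries shown above); the structure is an illustration, not the paper's exact layout:

```python
WORD = 32
level1 = 0             # bit i set => level-2 word i has an occupied entry
level2 = [0] * WORD    # 32 words x 32 bits = 1024 entry slots

def mark(slot):
    """Mark an entry slot occupied in both bitmap levels."""
    global level1
    i, j = divmod(slot, WORD)
    level2[i] |= 1 << j
    level1 |= 1 << i

def first_occupied():
    """Return the lowest occupied slot via two find-first-set steps."""
    if level1 == 0:
        return None
    i = (level1 & -level1).bit_length() - 1      # lowest set bit, level 1
    j = (level2[i] & -level2[i]).bit_length() - 1
    return i * WORD + j

mark(700)
mark(45)
print(first_occupied())   # -> 45
```

On the IXP the two find-first-set steps map onto single hardware instructions, which is one way the design exploits architectural features.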
Results

Memory accesses per scheduled packet:

No. of active streams | Heap   | HILQ
10                    | 45.86  | 19.8
50                    | 73.73  | 14.36
100                   | 85.73  | 13.68
200                   | 97.73  | 13.34
500                   | 113.59 | 13.135
1000                  | 125.59 | 13.068
2000                  | 137.58 | 13.034
Results

Scheduling cycle scalability:

[Figure: scheduling delay per stream (microengine cycles) vs. number of active streams (0 to 2500)]
Results

Throughput scalability:

[Figure: throughput (Mbps, up to 1200) vs. packet size]
Conclusions
- HILQ-based DWCS significantly reduces memory accesses per scheduled packet compared with the heap-based implementation.
- HILQ is able to service thousands of streams at high network speeds.
- HILQ achieves its performance by optimizing the scheduler algorithm and exploiting certain architectural features.