
Platform Architecture Lab

USB Performance Analysis of Bulk Traffic

Brian Leete
brian.a.leete@intel.com

Introduction

Bulk Traffic
- Designed for reliable, highly variable data transfer
- No guarantees are made in the specification for throughput
- Scheduled last, after ISOC, Interrupt, and Control
- Throughput is dependent on many factors

Introduction

We will look at Bulk Throughput from the following aspects:
- Distribution of Throughput for Various Packet Sizes and Endpoints
- Low Bandwidth Performance
- Small Endpoint Performance
- NAK Performance
- CPU Utilization
- PCI Bus Utilization

Test Environment -- Hardware

- PII 233 (8522px) with 512 KB Cache
- Atlanta Motherboard with 440LX (PIIX4A) Chipset
- 32 MB Memory
- Symbios OHCI Controller (for OHCI Measurements)
- Intel Lava Card as Test Device

Test Environment -- Software

Custom Driver and Application
- Test started by IOCTL
- IOCTL allocates static memory structures, submits IRP to USBD
- Completion routine resubmits the next buffer
- All processing done at ring 0, IRQL_DISPATCH

Terminology

- A Packet is a single packet of data on the bus. Its size is determined by the Max Packet Size of the device. Valid values are 8, 16, 32, and 64 bytes.
- A Buffer is the amount of data sent to USBD in a single IRP. In this presentation buffers range from 8 bytes to 64K bytes.
- Unless otherwise specified, most data was taken at 64-byte Max Packet Size with 15 endpoints configured in the system.

Host Controller Operation (UHCI)

[Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (UHCI). Axes: Throughput (Bytes per Second) vs. Buffer Size (Bytes, 8 to 32767) and Number of Endpoints (1 to 15). Annotations: oscillations at 256- and 512-byte buffers; flat throughput at 512- and 1024-byte buffers; single-endpoint throughput and small-buffer throughput regions marked.]

Small Buffer Throughput

For buffer sizes < Max Packet Size:
- The Host Controller sends 1 buffer per frame
- There is no ability to look ahead and schedule another IRP, even though time remains in the frame

Why is this?
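The cost of one-buffer-per-frame is easy to quantify: throughput collapses to buffer size times the 1000 frames-per-second rate, and almost all of the frame's 960-byte bulk capacity (from the packets-per-frame table later in the deck) goes unused. A minimal sketch, assuming those numbers:

```python
FRAMES_PER_SECOND = 1000       # USB full speed: one frame per millisecond
FRAME_BULK_CAPACITY = 960      # 15 packets x 64 bytes, per the deck's table

def small_buffer_throughput(buffer_size: int) -> int:
    """One buffer per frame: throughput is just buffer size x frame rate."""
    return buffer_size * FRAMES_PER_SECOND

for size in (8, 16, 32, 64):
    wasted = 1 - size / FRAME_BULK_CAPACITY
    print(size, small_buffer_throughput(size), f"{wasted:.0%} of frame unused")
```

The computed values (8000, 16000, 32000, 64000 B/s) track the measured numbers in the table (8071, 16082, 32293, 64264 B/s) closely.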

Interrupt Delay

[Timing diagram: the last packet of buffer 'n' completes; a Start of Frame interrupt plus software latency elapse before the first packet of buffer 'n+1' is submitted, leaving an unused frame in between.]

Single Endpoint Graph

- Flat throughput at 1024- and 512-byte buffers
- Single-endpoint throughput for 64K-byte buffers is below the theoretical max of 1,216,000 bytes per second
- Both are explained by looking at the number of packets per frame

Maximum Packets per Frame

Buffer   Max Bytes per Frame   Frames to      Bytes   Total Frames  Expected     Measured
Size     (15 Packets @ 64      Transfer Bulk  Left    to Transfer   Throughput   Throughput
(Bytes)  Bytes per Packet)     of Data        Over    Data          (Bytes/s)    (Bytes/s)
8        960                   1              0       1             8000         8071
16       960                   1              0       1             16000        16082
32       960                   1              0       1             32000        32293
64       960                   1              0       1             64000        64264
128      960                   1              0       1             128000       129186
256      960                   1              0       1             256000       255667
512      960                   1              0       1             512000       512017
1024     960                   1              64      2             512000       515515
2048     960                   2              128     3             682666       682803
4096     960                   4              256     5             819200       819200
8192     960                   8              512     9             910222       910131
16384    960                   17             64      18            910222       910404
32768    960                   34             128     35            936228       936072
65536    960                   68             256     69            949797       948087
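The Expected Throughput column can be reproduced with a short calculation (a sketch, assuming the 960-byte-per-frame bulk capacity and the 1 ms frame period from the table):

```python
BYTES_PER_FRAME = 15 * 64      # 15 packets of 64 bytes = 960 bulk bytes per 1 ms frame
FRAMES_PER_SECOND = 1000

def expected_throughput(buffer_size: int) -> int:
    """Bytes/s when one buffer is transferred at a time, frame by frame."""
    full_frames, left_over = divmod(buffer_size, BYTES_PER_FRAME)
    total_frames = full_frames + (1 if left_over else 0)
    total_frames = max(total_frames, 1)  # a tiny buffer still occupies a whole frame
    return buffer_size * FRAMES_PER_SECOND // total_frames

for size in (512, 1024, 2048, 65536):
    print(size, expected_throughput(size))
```

For example, a 2048-byte buffer needs 3 frames (2 full frames plus one for the 128-byte remainder), giving 2048 x 1000 / 3 = 682,666 B/s, matching the table row.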

Throughput for Multiple Endpoints

[Chart: Throughput (Bytes per Second) vs. Number of Endpoints (1 to 15), 512-byte buffers (UHCI).]

512 Byte Buffers -- 1 Endpoint

[Frame diagram: SOF followed by endpoint 1's packets, with inter-packet delay and ending time shown in bit times.]

- 8 packets total per frame
- 8 packets x 64 bytes per packet x 1000 frames/s = 512,000 B/s expected
- 511,986 B/s measured

512 Byte Buffers -- 2 Endpoints

[Frame diagram: SOF followed by alternating packets from endpoints 1 and 2, with inter-packet delay and ending time shown in bit times.]

- 16 packets total per frame
- 16 packets x 64 bytes per packet x 1000 frames/s = 1,024,000 B/s expected
- 1,022,067 B/s measured
- Notice that interrupt delay is not a factor here!

512 Byte Buffer -- 3 Endpoints

[Frame diagrams: frame N carries 15 packets; frame N + 1 carries the remaining 9 packets.]

- 24 packets x 64 bytes / 2 frames x 1000 frames/s = 768,000 B/s expected
- 776,211 B/s measured
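The two-frame arithmetic above can be checked directly (a sketch; the 15-packet frame capacity comes from the Maximum Packets per Frame table):

```python
# Frame capacity and packet size from the Maximum Packets per Frame table.
FRAME_CAPACITY_PACKETS = 15
PACKET_BYTES = 64
FRAMES_PER_SECOND = 1000

endpoints = 3
packets_per_buffer = 512 // PACKET_BYTES                # 8 packets per 512-byte buffer
total_packets = endpoints * packets_per_buffer          # 24 packets per round
frames = -(-total_packets // FRAME_CAPACITY_PACKETS)    # ceiling division -> 2 frames
throughput = total_packets * PACKET_BYTES * FRAMES_PER_SECOND // frames
print(throughput)  # 768000 bytes per second
```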

Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints (OHCI)

[Chart: Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 64 to 32768) and Number of Endpoints (2 to 15). Annotations: single-endpoint throughput of 900,000 vs. 950,000 B/s; high-end throughput of 18 packets per frame vs. 17; flat throughput at 512- and 1024-byte buffers; oscillations at 256- and 512-byte buffers; small-buffer throughput region marked.]

Minimal Endpoint Configuration

[Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (UHCI). Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 64 to 32768) and Number of Endpoints. Annotation: higher single-endpoint throughput, 17 vs. 15 packets per frame.]

Host Controller Operation (UHCI)

[Chart: Throughput of a Single Endpoint in Single and Multiple Endpoint Configurations (UHCI). Throughput (Bytes per Second) vs. Buffer Size (8 to 65536 bytes), one series per configuration (Single, Multiple).]

Results

We are working with Microsoft to remove unused endpoints from the Host Controller data structures.

Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, Minimal Endpoint Configuration (OHCI)

[Chart: Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 32 to 32768) and Number of Endpoints (3 to 15). Annotations: higher single-endpoint throughput; more endpoints get 18 packets per frame.]

Distribution of Throughput across Endpoints

[Chart: Throughput by Endpoint vs. Number of Endpoints (UHCI), 64K-byte buffers. Throughput (Bytes per Sec) per endpoint number (1 to 15), for 1 to 15 endpoints.]

Results

We are working with Microsoft to get the Host Controller driver to start sending packets at the next endpoint rather than starting over at the beginning of the frame.
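A toy model shows why restarting at the first endpoint every frame skews bandwidth toward low-numbered endpoints, and why resuming where the previous frame left off evens it out. This is a sketch with invented names; it assumes every endpoint always has a packet ready and a fixed 15-packet frame budget:

```python
def schedule(endpoints: int, frames: int, capacity: int = 15, resume: bool = False):
    """Count packets delivered per endpoint when the HC fills each frame in list order.

    resume=False: restart at the first endpoint every frame (observed behavior).
    resume=True:  continue from where the previous frame left off (proposed fix).
    """
    counts = [0] * endpoints
    start = 0
    for _ in range(frames):
        for i in range(capacity):
            counts[(start + i) % endpoints] += 1
        if resume:
            start = (start + capacity) % endpoints
    return counts

print(schedule(4, 100))               # restart: [400, 400, 400, 300] -- last endpoint starved
print(schedule(4, 100, resume=True))  # resume:  [375, 375, 375, 375] -- even split
```

Total bandwidth is identical in both cases; only its distribution across endpoints changes, which matches the per-endpoint distribution charts.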

[Chart: Throughput by Endpoint vs. Number of Endpoints, 64K-byte buffers (OHCI). Throughput (Bytes per Sec) per endpoint number (1 to 15), for 1 to 15 endpoints.]

Limited Bandwidth Operation

[Chart: Throughput by Endpoint vs. Number of Endpoints with 1023 bytes/frame of isochronous traffic (UHCI). Throughput (Bytes per Second) per endpoint number (1 to 15), for 1 to 15 endpoints.]

[Chart: Throughput by Endpoint vs. Number of Endpoints with 768 bytes/frame of isochronous traffic (OHCI). Throughput (Bytes per Sec) per endpoint number (1 to 15), for 1 to 15 endpoints.]

Small Endpoint Performance

[Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8-byte Max Packet Size (UHCI). Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 8 to 32768) and Number of Endpoints (3 to 15).]

[Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 8-byte Max Packet Size (OHCI). Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 8 to 32768) and Number of Endpoints (3 to 15).]

[Chart: Total Throughput for a Single Endpoint for Various Packet Sizes (OHCI). Throughput (Bytes per Second) vs. Buffer Size (8 to 65536 bytes), one series per Max Packet Size (8, 16, 32, 64).]

[Chart: Throughput by Endpoint vs. Number of Endpoints, mixed 64- and 8-byte endpoints (UHCI). Throughput (Bytes per Second) per endpoint number (1 to 15), for 1 to 15 endpoints.]

If you care about throughput:
- Use 64-byte Max Packet Size endpoints
- Use large buffers

NAK Performance

[Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, with 1 endpoint NAKing 64 bytes OUT (OHCI). Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 32 to 32768) and Number of Endpoints (3 to 15).]

Single Endpoint Throughput with a 64 Byte Endpoint NAKing on the Bus (OHCI)

[Chart: Throughput (Bytes per Second) vs. Buffer Size (8 to 65536 bytes), No NAK vs. NAK series. Annotation: 45% drop in total throughput.]

[Chart: Total Throughput on All Endpoints vs. Buffer Size for Multiple Endpoints, 14 endpoints OUT, 1 endpoint NAKing IN (UHCI). Total Throughput (Bytes per Second) vs. Buffer Size (Bytes, 32 to 32768) and Number of Endpoints (3 to 15).]

Single Endpoint Throughput, One Endpoint NAKing IN

[Chart: Throughput (Bytes per Second) vs. Buffer Size (8 to 65536 bytes), NAK vs. No NAK series.]

CPU Utilization

- Idle process incrementing a counter in main memory, designed to simulate a heavily CPU-bound load
- Numbers indicate how much work the CPU could accomplish after servicing USB traffic; higher numbers are better
- Small buffers and large numbers of endpoints take more overhead (software stack navigation)
- Endpoint 0 is the control -- no USB traffic running
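The idle-counter methodology can be sketched in a few lines (illustrative only; the deck's actual measurement ran inside a custom ring 0 driver, not user-mode Python, and the function name here is invented):

```python
import time

def idle_count(duration_s: float = 0.1) -> int:
    """Count increments that fit in a fixed wall-clock window.

    Run alongside I/O traffic, a higher count means more CPU time was
    left over after servicing that traffic -- higher numbers are better.
    """
    count = 0
    deadline = time.perf_counter() + duration_s
    while time.perf_counter() < deadline:
        count += 1
    return count

baseline = idle_count()  # no traffic running: the "Endpoint 0" control case
```

Comparing `baseline` against the count measured while transfers are active gives the relative CPU cost of servicing the traffic.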

CPU Utilization (UHCI)

[Chart: Idle Count vs. Buffer Size (2048 to 65536 bytes) and Number of Endpoints (1 to 15).]

CPU Utilization (OHCI)

[Chart: Idle Count vs. Buffer Size (2048 to 65536 bytes) and Number of Endpoints (1 to 15).]

PCI Utilization

[Chart: PCI Utilization (UHCI). % Utilization vs. Buffer Size (2048 to 65536 bytes) and Number of Endpoints (1 to 15).]

PCI Utilization (UHCI) -- 15 Endpoint Configuration

- For low numbers of active endpoints, the Host Controller must poll memory for each unused endpoint, causing relatively high utilization.
- Removing unused endpoints will lower single-endpoint PCI utilization for this configuration.

Conclusions

UHCI Host Controller Driver needs a few tweaks:
- Need to get the Host Controller to start sending packets where it last left off rather than at endpoint 1
- Needs to remove unused endpoints from the list

Performance Recommendations:
- Use 64-byte Max Packet Size endpoints
- Large buffers are better than small buffers
- Reduce NAKed traffic
- Use fast devices if possible

Future Research Topics

- Multiple IRPs per Pipe
- USB needs to control throughput to slow devices
  - Small endpoints aren't good
  - Small buffers aren't good
  - NAKing isn't good
