Outline
Introduction to parallel I/O library
HDF5, netCDF, netCDF4
Parallel HDF5 and Parallel netCDF performance comparison
Parallel netCDF4 and Parallel netCDF performance comparison
Collective I/O optimizations inside HDF5
Conclusion
[Figure: processes P0 to P3 performing I/O through an I/O library to the file system. When all I/O goes through P0: P0 becomes bottleneck; may exceed system memory.]
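The two concerns noted for routing I/O through P0 (bottleneck, memory) can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names and the 512 MiB per-process size are hypothetical and not taken from the slides. It compares how much data P0 must buffer when every process ships its data to P0 with how much each process buffers when it writes its own portion.

```python
def root_buffer_bytes(num_procs: int, bytes_per_proc: int) -> int:
    """Bytes P0 must hold if every process sends its data to P0 before writing."""
    return num_procs * bytes_per_proc


def per_proc_buffer_bytes(bytes_per_proc: int) -> int:
    """Bytes each process holds when it writes only its own portion."""
    return bytes_per_proc


if __name__ == "__main__":
    procs = 4                  # P0..P3, as in the diagram
    local = 512 * 1024 ** 2    # hypothetical 512 MiB of data per process
    mib = 1024 ** 2
    print(root_buffer_bytes(procs, local) // mib, "MiB buffered on P0")
    print(per_proc_buffer_bytes(local) // mib, "MiB buffered per process")
```

The buffer on P0 grows linearly with the number of processes, while the per-process buffer in the parallel case stays constant.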
HDF5 VS netCDF
NetCDF
Overview of NetCDF4
Advantages:
Disadvantage:
[Figure: rows of data (Row 1, Row 2) divided among processes P0 to P3]
Previous Study:
NCAR Bluesky (Power4)
LLNL uP (Power5)
[Figure: bandwidth (MB/s) vs. number of processors for PnetCDF and HDF5 independent. Panels: Bluesky (Power 4) and uP (Power 5).]
[Figure: bandwidth (MB/s) vs. number of processors for HDF5 collective, PnetCDF, and HDF5 independent. Panels: Bluesky (Power 4) and uP (Power 5).]
ROMS
Regional Oceanographic Modeling System
Supports MPI and OpenMP
I/O in NetCDF
History file writer in parallel
Data:
60
[Figure: bandwidth (MB/s) vs. number of processors for PNetCDF collective and NetCDF4 collective, shown in two charts]
Improvements to collective IO support inside HDF5
Advanced HDF5 feature: non-regular selections
Performance optimizations: chunked storage
Improvement 1
HDF5 non-regular selections
2-D array with the IO in shaded selections
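To illustrate what a non-regular selection means for the file layer, the sketch below models a selection as a union of rectangular blocks in a 2-D array and flattens it into (offset, length) runs in a row-major layout. It is a pure-Python model, not the HDF5 API: the `Block` tuple layout and both function names are assumptions made for this example.

```python
from typing import List, Set, Tuple

# (row_start, col_start, n_rows, n_cols); hypothetical layout for this sketch
Block = Tuple[int, int, int, int]


def select_union(blocks: List[Block]) -> Set[Tuple[int, int]]:
    """Return the set of (row, col) cells covered by any of the blocks."""
    cells: Set[Tuple[int, int]] = set()
    for r0, c0, n_rows, n_cols in blocks:
        for r in range(r0, r0 + n_rows):
            for c in range(c0, c0 + n_cols):
                cells.add((r, c))
    return cells


def to_file_runs(cells: Set[Tuple[int, int]], array_cols: int) -> List[Tuple[int, int]]:
    """Flatten selected cells into contiguous (offset, length) runs, row-major."""
    offsets = sorted(r * array_cols + c for r, c in cells)
    runs: List[Tuple[int, int]] = []
    for off in offsets:
        if runs and runs[-1][0] + runs[-1][1] == off:
            # Extends the previous run by one element.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((off, 1))
    return runs


if __name__ == "__main__":
    # Two overlapping blocks in a 6-column array form a non-rectangular shape.
    shaded = select_union([(0, 0, 2, 2), (1, 1, 3, 3)])
    print(to_file_runs(shaded, array_cols=6))
```

Each run corresponds to one contiguous piece of the file, so an irregular shape turns into several separate pieces with differing lengths and gaps, which a single regular start/stride/count description cannot express.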
Performance issue:
Severe performance penalties with many small chunk IOs
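A rough way to see where many small chunk IOs come from is to count how many chunks a selection overlaps. The sketch below is a pure-Python model under the assumption that each overlapped chunk costs one separate IO operation; `chunks_touched` and its parameters are hypothetical names for this example, not part of HDF5.

```python
from typing import List, Tuple


def chunks_touched(start: Tuple[int, int],
                   count: Tuple[int, int],
                   chunk_shape: Tuple[int, int]) -> List[Tuple[int, int]]:
    """Return (chunk_row, chunk_col) indices overlapped by a rectangular selection.

    start       -- first selected (row, col)
    count       -- number of selected (rows, cols); both must be >= 1
    chunk_shape -- chunk size in (rows, cols)
    """
    (r0, c0), (n_rows, n_cols), (ch_r, ch_c) = start, count, chunk_shape
    first_r, last_r = r0 // ch_r, (r0 + n_rows - 1) // ch_r
    first_c, last_c = c0 // ch_c, (c0 + n_cols - 1) // ch_c
    return [(i, j)
            for i in range(first_r, last_r + 1)
            for j in range(first_c, last_c + 1)]


if __name__ == "__main__":
    # The same 8 x 64 selection with two hypothetical chunk shapes.
    small = chunks_touched((0, 0), (8, 64), (4, 4))
    large = chunks_touched((0, 0), (8, 64), (8, 64))
    print(len(small), "chunks with 4 x 4 chunks")
    print(len(large), "chunk with 8 x 64 chunks")
```

Under the one-operation-per-chunk assumption, the small chunk shape turns one logical request into dozens of small operations per process.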
[Figure: chunk 2, chunk 3, chunk 4; processes P0 and P1; MPI-IO collective view]
Improvement 3:
Multi-chunk IO Optimization
Problem
[Figure: processes P0 to P7, shown in two panels]
Improvement 4
Problem
Solution
[Flowchart: decision-making about collective IO. Outcomes: one linked-chunk IO (one collective IO call for all chunks), multi-chunk IO, independent IO.]
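The decision flow can be sketched as a small function. The slides name the outcomes (one linked-chunk IO, multi-chunk IO, collective or independent IO) but the decision criteria did not survive, so the two thresholds below (`min_chunks_per_proc`, `min_proc_fraction`), their default values, and the function name are assumptions for illustration, not HDF5's actual rules.

```python
from typing import Dict, List


def plan_chunk_io(chunk_access: Dict[int, List[int]],
                  num_procs: int,
                  min_chunks_per_proc: float = 2.0,   # hypothetical threshold
                  min_proc_fraction: float = 0.5      # hypothetical threshold
                  ) -> Dict[str, object]:
    """Choose an IO strategy from a map of chunk id -> ranks selecting data in it."""
    total_accesses = sum(len(set(ranks)) for ranks in chunk_access.values())
    avg_chunks_per_proc = total_accesses / num_procs

    # Assumed rule: if processes touch many chunks on average, link all chunks
    # together and issue one collective IO call for all of them.
    if avg_chunks_per_proc >= min_chunks_per_proc:
        return {"mode": "linked-chunk", "per_chunk": {}}

    # Otherwise handle chunks one at a time (multi-chunk IO), deciding per chunk
    # whether enough processes participate to make collective IO worthwhile.
    per_chunk: Dict[int, str] = {}
    for chunk, ranks in chunk_access.items():
        fraction = len(set(ranks)) / num_procs
        per_chunk[chunk] = "collective" if fraction >= min_proc_fraction else "independent"
    return {"mode": "multi-chunk", "per_chunk": per_chunk}


if __name__ == "__main__":
    # Four processes; chunk 0 is shared by all ranks, chunk 1 is used by rank 0 only.
    print(plan_chunk_io({0: [0, 1, 2, 3], 1: [0]}, num_procs=4))
```

The point of the sketch is the shape of the decision, a global choice followed by a per-chunk choice. The real thresholds would need to be taken from the HDF5 documentation or tuned for the target system.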
Conclusions
Acknowledgments