Outline
Hour 1: Introduction
Break
Hour 2: Using the Grid
Break
Hour 3: Ongoing Research
Q&A Session
Hour 1: Introduction
What is Grid Computing?
Who Needs It?
An Illustrative Example
Grid Users
Current Grids
Computational Grids
A network of geographically distributed resources including computers, peripherals, switches, instruments, and data. Each user should have a single login account to access all resources. Resources may be owned by diverse organizations.
Computational Grids
Grids are typically managed by gridware. Gridware can be viewed as a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance, availability).
Distributed Computing
People often ask: Is Grid Computing a fancy new name for the concept of distributed computing? In general, the answer is no. Distributed Computing is most often concerned with distributing the load of a program across two or more processes.
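As a minimal sketch of that idea (illustrative only, not taken from any particular system), the following splits one program's load across several worker processes:

```python
# Distribute a single computation across multiple processes: each
# worker sums one chunk of the range, and the results are combined.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def distributed_sum(n, workers=4):
    # Split [0, n) into roughly equal chunks, one per process.
    step = n // workers
    chunks = [(i * step, (i + 1) * step if i < workers - 1 else n)
              for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

Grid computing subsumes this pattern but adds cross-organization resource sharing, which plain distributed computing does not address.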
Peer-to-Peer Computing
Sharing of computer resources and services by direct exchange between systems. Computers can act as clients or servers depending on what role is most efficient for the network.
Distributed Supercomputing
Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer. Tackle problems that cannot be solved on a single system.
High-Throughput Computing
Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work.
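The essence of that scheduling model can be sketched as a bag of independent tasks drained by whatever workers are free. This is illustrative only; real gridware such as Condor adds matchmaking, checkpointing, and ownership policies on top of this loop:

```python
# High-throughput pattern: many loosely coupled tasks, pulled
# opportunistically from a shared queue by idle workers.
import queue
import threading

def run_bag_of_tasks(tasks, n_workers=4):
    work = queue.Queue()
    for t in tasks:
        work.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                f, arg = work.get_nowait()
            except queue.Empty:
                return            # no tasks left: this worker is done
            r = f(arg)            # tasks are independent: no ordering needed
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```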
On-Demand Computing
Uses grid capabilities to meet short-term requirements for resources that are not locally accessible. Models real-time computing demands.
Data-Intensive Computing
The focus is on synthesizing new information from data that is maintained in geographically distributed repositories, digital libraries, and databases. Particularly useful for distributed data mining.
Collaborative Computing
Concerned primarily with enabling and enhancing human-to-human interactions. Applications are often structured in terms of a virtual shared space.
Logistical Networking
Global scheduling and optimization of data movement. Contrasts with traditional networking, which does not explicitly model storage resources in the network. Called "logistical" because of the analogy it bears with the systems of warehouses, depots, and distribution channels.
An Illustrative Example
Tiffany Moisan, a NASA research scientist, collected microbiological samples in the tidewaters around Wallops Island, Virginia. She needed the high-performance microscope located at the National Center for Microscopy and Imaging Research (NCMIR), University of California, San Diego.
Example (continued)
She sent the samples to San Diego and used NPACI's Telescience Grid and NASA's Information Power Grid (IPG) to view and control the output of the microscope from her desk on Wallops Island. Thus, in addition to viewing the samples, she could move the platform holding them and make adjustments to the microscope.
Example (continued)
The microscope produced a huge dataset of images. This dataset was stored using a storage resource broker on NASA's IPG. Moisan was able to run algorithms on this dataset while watching the results in real time.
Grid Users
Grid developers
Tool developers
Application developers
End users
System administrators
Grid Developers
A very small group: implementers of grid protocols who provide the basic services required to construct a grid.
Tool Developers
Implement the programming models used by application developers. Implement basic services similar to conventional computing services:
User authentication/authorization
Process management
Data access and communication
Tool Developers
Also implement new (grid) services such as:
Resource location
Fault detection
Security
Electronic payment
Application Developers
Construct grid-enabled applications for end-users who should be able to use these applications without concern for the underlying grid. Provide programming models that are appropriate for grid environments and services that programmers can rely on when developing (higher-level) applications.
System Administrators
Balance local and global concerns. Manage grid components and infrastructure. Some tasks still not well delineated due to the high degree of sharing required.
DTF
The Distributed Terascale Facility (DTF) is currently being built by NSF's Partnerships for Advanced Computational Infrastructure (PACI). A collaboration: NCSA, SDSC, Argonne, and Caltech will work in conjunction with IBM, Intel, Qwest Communications, Myricom, Sun Microsystems, and Oracle.
DTF Expectations
A 40-billion-bits-per-second optical network (called the TeraGrid) is to link computers, visualization systems, and data at four sites. It will perform 11.6 trillion calculations per second and store more than 450 trillion bytes of data.
Globus
A collaboration of Argonne National Laboratory's Mathematics and Computer Science Division, the University of Southern California's Information Sciences Institute, and the University of Chicago's Distributed Systems Laboratory. Started in 1996 and is gaining popularity year after year.
Globus
A project to develop the underlying technologies needed for the construction of computational grids. Focuses on execution environments for integrating widely distributed computational platforms, data resources, displays, special instruments, and so forth.
Condor
The Condor project started in 1988 at the University of Wisconsin-Madison. The main goal is to develop tools to support High Throughput Computing on large collections of distributively owned computing resources.
Condor
Runs on a cluster of workstations to glean wasted CPU cycles. A Condor pool consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network. Condor pools can share resources by a feature of Condor called flocking.
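A minimal submit description file, in Condor's classic syntax, gives the flavor of how independent jobs enter a pool (executable and file names are placeholders):

```
# Queue ten independent runs of one program; $(Process) expands
# to 0..9 so each run gets its own input and output files.
universe   = vanilla
executable = my_analysis
arguments  = input.$(Process)
output     = out.$(Process)
error      = err.$(Process)
log        = job.log
queue 10
```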
Machines with Condor's resource-management software installed that can run jobs are called execute machines; machines from which jobs can be submitted are called submit machines. A machine can be both a submit and an execute machine simultaneously.
Condor-G
A version of Condor that uses Globus to submit jobs to remote resources. Allows users to monitor jobs submitted through the Globus toolkit. Can be installed on a single machine, so there is no need to have a Condor pool installed.
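A hedged sketch of what such a submit file looked like in early Condor-G: the globus universe routes the job through Globus to a remote resource (the gatekeeper host below is a placeholder, not a real machine):

```
# Submit one job to a remote Globus resource via Condor-G.
universe        = globus
globusscheduler = gatekeeper.example.edu/jobmanager
executable      = my_job
output          = my_job.out
log             = my_job.log
queue
```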
Legion
An object-based metasystem software project designed at the University of Virginia to support millions of hosts and trillions of objects linked together by high-speed links. Allows groups of users to construct shared virtual work spaces, collaborate on research, and exchange information.
Legion
An open system designed to encourage third party development of new or updated applications, runtime library implementations, and core components. The key feature of Legion is its object-oriented approach.
Harness
A Heterogeneous Adaptable Reconfigurable Networked System A collaboration between Oak Ridge National Lab, the University of Tennessee, and Emory University. Conceived as a natural successor of the PVM project.
Harness
An experimental system based on a highly customizable, distributed virtual machine (DVM) that can run on anything from a supercomputer to a PDA. Built on three key areas of research: Parallel Plug-in Interface, Distributed Peer-to-Peer Control, and Multiple DVM Collaboration.
IBP
The Internet Backplane Protocol (IBP) is a middleware for managing and using remote storage. It was devised at the University of Tennessee to support Logistical Networking in large scale, distributed systems and applications.
IBP
Named because it was designed to enable applications to treat the Internet as if it were a processor backplane. On a processor backplane, the user has access to memory and peripherals, and can direct communication between them with DMA.
IBP
IBP gives the user access to remote storage and standard Internet resources (e.g. content servers implemented with standard sockets) and can direct communication between them with the IBP API.
IBP
By providing a uniform, application-independent interface to storage in the network, IBP makes it possible for applications of all kinds to use logistical networking to exploit data locality and more effectively manage buffer resources.
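The usage pattern described above (allocate remote storage, then move bytes in and out through capabilities) can be modeled in miniature. This is a toy illustration only; the real IBP API is a C library with different names and semantics:

```python
# Toy model of a storage "depot": allocate() hands back a capability
# (here, an opaque key) that store() and load() use to address the
# allocated byte range, mirroring IBP's allocate/store/load pattern.
import uuid

class Depot:
    def __init__(self):
        self._buffers = {}

    def allocate(self, size):
        cap = str(uuid.uuid4())          # opaque capability
        self._buffers[cap] = bytearray(size)
        return cap

    def store(self, cap, offset, data):
        self._buffers[cap][offset:offset + len(data)] = data

    def load(self, cap, offset, length):
        return bytes(self._buffers[cap][offset:offset + length])
```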
NetSolve
A client-server-agent model. Designed for solving complex scientific problems in a loosely-coupled heterogeneous environment.
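The client-server-agent division of labor can be sketched as follows. Names and the least-loaded brokering policy are illustrative assumptions, not NetSolve's actual API:

```python
# Client asks the agent which server can handle a problem; the agent
# brokers among servers (here, by least load); the chosen server solves.
class Server:
    def __init__(self, name, problems):
        self.name, self.problems, self.load = name, problems, 0

    def solve(self, problem, data):
        self.load += 1
        return self.problems[problem](data)

class Agent:
    def __init__(self, servers):
        self.servers = servers

    def pick(self, problem):
        # Least-loaded server that implements the requested problem.
        candidates = [s for s in self.servers if problem in s.problems]
        return min(candidates, key=lambda s: s.load)

def client_call(agent, problem, data):
    return agent.pick(problem).solve(problem, data)
```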
Gridware Collaborations
NetSolve is using Globus' "Heartbeat Monitor" to detect failed servers. A NetSolve client that allows access to Globus is now in testing. Legion has adopted NetSolve's client-user interface to leverage its metacomputing resources. The NetSolve client uses Legion's data-flow graphs to keep track of data dependencies.
Gridware Collaborations
NetSolve can access Condor pools among its computational resources. IBP-enabled clients and servers allow NetSolve to allocate and schedule storage resources as part of its resource brokering. This improves fault tolerance.
GRID COMPUTING
BREAK
General Issues.
Open questions of interest to the entire research community
Motivation
Computer speed doubles every 18 months
Network speed doubles every 9 months
Graph from Scientific American (Jan. 2001) by Cleo Vilett; source: Vinod Khosla, Kleiner Perkins Caufield & Byers
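These doubling rates imply that network capacity outgrows CPU speed, which is the motivation for moving work onto the grid. A quick back-of-the-envelope check:

```python
# Over 54 months, CPUs double 3 times (every 18 months) while networks
# double 6 times (every 9 months), so the network/CPU ratio itself
# doubles 3 times: moving data becomes relatively ever cheaper.
def growth(months, doubling_period):
    return 2 ** (months / doubling_period)

cpu = growth(54, 18)   # 8x faster CPUs
net = growth(54, 9)    # 64x faster networks
ratio = net / cpu      # 8x shift in favor of the network
```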
Special Projects
The SInRG Project
Grid Service Clusters (GSCs)
Data Switches
Security
Types of Hardware
General-purpose hardware: can implement any function
ASICs: hardware that can implement only a specific application
FPGAs: reconfigurable hardware that can implement any function
The FPGA
FPGAs offer reprogrammability, allowing an optimal logic design for each function to be implemented. Hardware implementations offer acceleration over software implementations run on general-purpose processors.
Objectives
Evaluate utility of NetSolve gridware. Determine effectiveness of hardware acceleration in this environment. Provide an interface for the remote use of FPGAs. Allow users to experiment and gauge whether a given problem would benefit from hardware acceleration.
Sample Implementations
Fast Fourier Transform (FFT)
Data Encryption Standard algorithm (DES)
Image backprojection algorithm
A variety of combinatorial algorithms
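The first kernel above, in software form: a radix-2 FFT, cross-checked against the naive O(n^2) DFT it accelerates. This is an illustrative reference version, not the project's actual implementation; an FPGA version would pipeline the same butterfly structure in hardware:

```python
# Naive DFT versus recursive radix-2 FFT (input length must be a
# power of two). Both compute the same transform; the FFT does it
# in O(n log n) using the even/odd butterfly decomposition.
import cmath

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n)
                for k in range(n)) for j in range(n)]

def fft(x):
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddle = [cmath.exp(-2j * cmath.pi * k / n) * odd[k]
               for k in range(n // 2)]
    return ([even[k] + twiddle[k] for k in range(n // 2)] +
            [even[k] - twiddle[k] for k in range(n // 2)])
```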
Implementation Techniques
Two types of functions are implemented:
Software version: runs on the PC's processor
Hardware version: runs in the FPGA
To implement the hardware version of a function, VHDL code is needed.
Synthesis
(Diagram: the software programmer supplies a configuration file to the server administrator; the client sends a request to the NetSolve server and receives the results.)
Conclusions
Hardware acceleration is offered to both local and remote users. Resources are available through an efficient and easy-to-use interface. A development environment is provided for devising and testing a wide variety of software, hardware and hybrid solutions.
Unbridled Parallelism
Sometimes the overhead of gridware is unneeded. Well-known examples include SETI@home and Folding@home. We're currently building a Vertex Cover solver with multiple levels of acceleration.
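As a sketch of the problem such a solver targets (not our solver itself), here is the classic maximal-matching 2-approximation for Vertex Cover: repeatedly pick an uncovered edge and take both endpoints. The result covers every edge and is at most twice the optimum:

```python
# Greedy maximal-matching 2-approximation for Vertex Cover.
# For each edge with both endpoints uncovered, add both endpoints;
# every edge matched this way forces at least one endpoint into any
# optimal cover, giving the factor-2 guarantee.
def vertex_cover_2approx(edges):
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```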
Grid Security
Algorithm complexity theory
Verifiability
Concealment
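"Verifiability" here means checking a remote server's answer more cheaply than recomputing it. A standard example of this idea (offered as an illustration, not part of the course material) is Freivalds' randomized check that A*B = C in O(n^2) time per trial, instead of redoing the O(n^3) multiplication:

```python
# Freivalds' algorithm: multiply both sides by a random 0/1 vector r.
# If A*(B*r) != C*r for any trial, C is definitely wrong; if the check
# passes all trials, C is correct with probability >= 1 - 2**(-trials).
import random

def freivalds(A, B, C, trials=20):
    n = len(A)
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False   # definitely not the product
    return True            # correct with high probability
```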