Introduction
Say you've got a big computation task to perform. Perhaps you have found a way to cure cancer, or you want to look for aliens. All you need is a few supercomputers to work out some calculations, but you've only got the one PC on your desk. What to do? A popular solution is the "distributed computing" model, where the task is split up into smaller chunks and performed by the many computers owned by the general public. This guide shows you how.

Computers spend a lot of their time doing nothing. If you are reading this on your computer, that expensive CPU is most probably just sitting around waiting for you to press a key. What a waste!

Distributed computing has become popular with the rise of the Internet, but the technology behind it is older and is usually known as parallel computing. When people speak of parallel computing, it is usually in the context of a local group of computers, owned by the same person or organization, with good links between nodes.

The key issue here is that you are using computing power that you don't own. These computers are owned and controlled by other people, whom you would not necessarily trust. Both angels and demons populate the world, and unless you know them personally, you can't tell them apart.
Distributed Computing
General Definition:
Distributed computing is any computing that involves multiple computers, remote from each other, each of which plays a role in a computation problem or in information processing.
Definition
"Distributed Computing is a vague term incorporating the overlapping fields of: 1. Client/Server computing 2. Internet computing 3. Geographical distribution of computing over a wide area 4. Network peer-to-peer computing 5. Co-operative computing between workstations on a local area network
Distributed Computing Environment (DCE) Architecture
The DCE architecture provides two categories of services.

1. Fundamental distributed services provide the basic infrastructure of the distributed environment. They include:
o Remote Procedure Call, which provides portability, network independence, and secure distributed applications.
o Directory service, which provides full X.500 support and a single naming model to allow programmers and maintainers to identify and access distributed resources more easily.
o Time service, which provides a mechanism to monitor and track clocks in a distributed environment and accurate time stamps to reduce the load on system administrators.
o Security service, which provides the network with authentication, authorization, and user account management services to maintain the integrity, privacy, and authenticity of the distributed system.
o Thread service, which provides a simple, portable programming model for building concurrent applications.

2. Data-sharing services provide end users with capabilities built upon the fundamental distributed services. These services require no programming on the part of the end user and facilitate better use of information. They include:
o Distributed file system, which interoperates with the network file system to provide a high-performance, scalable, and secure file access system.
o Diskless support, which allows low-cost workstations to use disks on servers, possibly reducing the need for and cost of local disks, and provides performance enhancements to reduce network overhead.
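To make the Remote Procedure Call idea concrete, here is a minimal sketch using Python's standard xmlrpc module. DCE defines its own RPC mechanism; this example only illustrates the general pattern of invoking a function on a remote machine as if it were local. The function name and port number are arbitrary assumptions for illustration.

    # Server side: expose an ordinary function over the network.
    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):
        # Runs on the server; the caller never sees this code.
        return a + b

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(add, "add")
    server.serve_forever()

    # Client side: the remote function is called like a local one.
    from xmlrpc.client import ServerProxy

    proxy = ServerProxy("http://localhost:8000")
    print(proxy.add(2, 3))  # -> 5, computed on the server

The appeal of RPC is exactly what the client side shows: network plumbing is hidden behind an ordinary-looking function call.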
The DCE supports the international Open Systems Interconnection (OSI) standards, which are critical to global interconnectivity. It also implements ISO standards
such as Remote Operations Service Element (ROSE), Association Control Service Element (ACSE), and the ISO session and presentation services.
How It Works
In most cases today, a distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.

An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server, runs the software when it has spare CPU cycles, and sends the results back to the server. The application may run as a screen saver, or simply in the background, without impacting normal use of the computer. If the user of the client system needs to run his own applications at any time, control is immediately returned and processing of the distributed application package ends. This must be essentially instantaneous, as any delay in returning control will probably be unacceptable to the user.

The following steps, labeled in the components diagram, show the interaction between the different components.

1. When the Client is idle, it sends a request for an application package to the Network Manager. The application package consists of an Application Manager process and the application for which sub-jobs are to be run.
2. The Network Manager sends an application package to the client machine.
3. The Client software runs the Application Manager, and the Application Manager registers itself with the Job Manager, telling it that it is available to run sub-jobs for the corresponding application.
4. The Job Manager schedules a sub-job to be run on the client, and sends the Client the sub-job parameters and input files.
5. The Application Manager on the Client runs the sub-jobs with the corresponding inputs.
6. When the application has finished, the results from the sub-jobs are delivered to the Job Manager. The Job Manager can continue to schedule sub-jobs on the Client for that application until it no longer needs the client and releases it back to the Network Manager, or until the Network Manager reclaims the client for use elsewhere.
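The six steps above can be sketched in a few lines of Python. Everything here is a toy stand-in: the class names, the in-memory "network", and the squaring "application" are assumptions made for illustration, not part of any real agent framework.

    class NetworkManager:
        def get_application_package(self):
            # Steps 1-2: hand an application package to an idle client.
            return {"app": "demo", "manager": ApplicationManager()}

    class JobManager:
        def __init__(self):
            self.pending = [{"id": i, "input": i * 10} for i in range(3)]
        def next_subjob(self):
            # Step 4: schedule a sub-job with its parameters.
            return self.pending.pop() if self.pending else None
        def deliver(self, result):
            # Step 6: collect results from the client.
            print("result received:", result)

    class ApplicationManager:
        def run_subjob(self, subjob):
            # Step 5: run the sub-job with the supplied input.
            return {"id": subjob["id"], "output": subjob["input"] ** 2}

    def client_loop(network_mgr, job_mgr):
        package = network_mgr.get_application_package()   # steps 1-2
        app_mgr = package["manager"]                      # step 3: register
        while (subjob := job_mgr.next_subjob()) is not None:
            job_mgr.deliver(app_mgr.run_subjob(subjob))   # steps 4-6

    client_loop(NetworkManager(), JobManager())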
Security measures are necessary to prevent unauthorized access to systems and data within distributed systems that are meant to be inaccessible.
The drawback of centralization is that all information resides at the hub. The hub is thus a single point of failure: if the hub dies, then all client applications connected to the hub die with it. The hub is also a bottleneck to scalability and performance. While one can introduce redundant hardware and employ better or faster hardware at the hub, this only alleviates the problem and does not solve it completely. Even though the hub-and-spoke architecture has found widespread acceptance in database servers and web servers, its drawbacks in scalability and fault tolerance make it unsuitable for general-purpose distributed application deployment. Examples of systems conforming to this centralized topology include J2EE servers and most commercially available web servers and transaction processing monitors, including Microsoft's MTS.

Pure Peer-to-Peer Systems
A primary virtue of pure P2P systems is their scalability; any node can join a network and start exchanging data with any other node. Decentralized systems also tend to be fault tolerant, as the failure or shutdown of any particular node does not impact the rest of the system.
Hybrid Peer-to-Peer Systems

In a hybrid peer-to-peer system, the control information is exchanged through a central server, while the data flow takes place in a pure peer-to-peer manner as above. This architecture alleviates the manageability problems of pure P2P systems. The control server acts as a monitoring agent for all the other peers and ensures information coherence.
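The split between control and data can be approximated in a few lines: a central registry holds only the control information (who has what), while the data itself moves peer to peer. The class names and the in-process "peers" below are illustrative assumptions, not a real protocol.

    class Registry:
        """Central server: tracks which peer holds which item (control only)."""
        def __init__(self):
            self.index = {}
        def publish(self, key, peer):
            self.index[key] = peer
        def lookup(self, key):
            return self.index.get(key)

    class Peer:
        def __init__(self, name):
            self.name, self.store = name, {}
        def put(self, key, value, registry):
            self.store[key] = value
            registry.publish(key, self)      # control flows via the server
        def fetch(self, key, registry):
            owner = registry.lookup(key)     # ask the server who has it
            return owner.store[key]          # data flows peer to peer

    registry = Registry()
    alice, bob = Peer("alice"), Peer("bob")
    alice.put("report.pdf", b"...bytes...", registry)
    print(bob.fetch("report.pdf", registry))  # bob reads directly from alice

Note that if the Registry object disappeared, bob could still read anything he had already located; only new lookups would fail, which is exactly the failure behavior described next.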
Figure: Hybrid P2P systems

The drawbacks associated with centrally managed control still remain. If the central server goes down, the system loses the ability to effect changes in the data flow. However, existing applications are not affected by a failure of the central server, as the data flow between nodes continues regardless of whether the central server is functional. Peer-to-peer data routing allows a hybrid system to offer better scalability than a centralized system, but hybrid systems still suffer from scalability problems for the control information that flows through a single node. While hybrid systems are being
effectively used for mission-critical applications, the solutions are limited to relatively small-scale problems. An example of a commercial hybrid P2P system is Groove. Groove implements collaborative project management software in which a central synchronizing server controls all information being exchanged between peers.

Super-Peer Architecture

A new wave of peer-to-peer systems is advancing an architecture of centralized topology embedded in decentralized systems; such a topology forms a super-peer network.
Next Generation Distributed Computing Architecture

Combining the super-peer topology with the coarse-grained component model [1] enables a distributed computing platform for a whole new generation of distributed applications that are more flexible, scalable, and reliable than traditional applications.
Figure: The super-peer architecture

The super-peer architecture closely maps to real-world business processes. Each cluster maps to a business division. Super peers can have well-defined protocols for cross-cluster communication (acting as a firewall for this virtual internet). The adjoining figure illustrates a 2-redundant super-peer architecture
that alleviates the bottlenecks associated with a super peer being a single point of failure for its clients. In the following sections, we examine a real-world problem that represents a typical business process and discuss the implementation of this process over multiple software infrastructure system topologies.
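Before moving on, here is a toy sketch of the super-peer routing idea just described: ordinary peers talk only to their cluster's super peer, and super peers forward traffic across clusters, giving them a natural place to enforce policy. All names here are illustrative assumptions.

    class SuperPeer:
        def __init__(self, cluster):
            self.cluster = cluster      # e.g. a business division
            self.neighbors = {}         # other clusters' super peers

        def route(self, dest_cluster, dest_peer, message):
            if dest_cluster == self.cluster:
                print(f"{dest_peer} <- {message}")  # deliver inside cluster
            else:
                # Cross-cluster traffic goes super peer to super peer,
                # so the super peer can act as a policy point / firewall.
                self.neighbors[dest_cluster].route(dest_cluster,
                                                   dest_peer, message)

    sales, engineering = SuperPeer("sales"), SuperPeer("engineering")
    sales.neighbors["engineering"] = engineering
    engineering.neighbors["sales"] = sales
    sales.route("engineering", "build-server", "please rerun job 7")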
Tasks suited to distributed computing must divide into independent work units. The code and small blocks of data should be such that they can be processed effectively on a modern PC and report results that, when combined with other PCs' results, produce coherent output. And the individual tasks should be small enough to produce a result on these systems within a few hours to a few days. Several classes of application fit this model:
A query search against a huge database that can be split across lots of desktops, with the submitted query running concurrently against each fragment on each desktop.
Complex modeling and simulation techniques that increase the accuracy of results by increasing the number of random trials would also be appropriate, as trials could be run concurrently on many desktops, and combined to achieve greater statistical significance (this is a common method used in various types of financial risk analysis).
Exhaustive search techniques that require searching through a huge number of results to find solutions to a problem also make sense. Drug screening is a prime example.
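As a concrete sketch of this divide-run-combine pattern, here is a Monte Carlo estimate of pi in which independent batches of random trials run in separate processes and are merged at the end. A real deployment would ship the batches to volunteer machines rather than local processes; the batch sizes are arbitrary.

    import random
    from multiprocessing import Pool

    def run_trials(n):
        # One work unit: n independent random trials; can run anywhere.
        return sum(random.random() ** 2 + random.random() ** 2 <= 1.0
                   for _ in range(n))

    if __name__ == "__main__":
        batches = [100_000] * 8            # eight independent work units
        with Pool() as pool:
            hits = pool.map(run_trials, batches)
        # Combining more batches tightens the estimate
        # (greater statistical significance).
        print("pi ~", 4 * sum(hits) / sum(batches))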
Many of today's vendors, particularly Entropia and United Devices, are aiming squarely at the life sciences market, which has a sudden need for massive computing power. As a result of sequencing the human genome, the number of identifiable biological targets for today's drugs is expected to increase from about 500 to about 10,000. Pharmaceutical firms have repositories of millions of different molecules and compounds, some of which may have characteristics that make them appropriate for inhibiting newly found proteins. The process of matching all these "ligands" to their appropriate targets is an ideal task for distributed computing, and the quicker it's done, the quicker and greater the benefits will be. Another related application is the recent trend of generating new types of drugs solely on computers.
Complex financial modeling, weather forecasting, and geophysical exploration are on these vendors' radar screens as well, as are car-crash and other complex simulations.
Faulty clients

This isn't really an attack, but unintentional faults. Even if your client software is perfect, it has to run on hardware owned by the general public, and faulty hardware isn't unheard of. The floating-point co-processor is a popular place to find hardware glitches. If your project depends on accurate calculation, you should build in checks to confirm that each client is operating correctly.
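One cheap check, sketched below, is to salt the work stream with "probe" packages whose answers the server already knows; a client that gets a probe wrong is flagged for scrutiny. The probe IDs and answers are invented for illustration.

    KNOWN_PROBES = {"probe-17": 4181, "probe-23": 28657}  # precomputed answers

    def check_result(client_id, package_id, answer, suspect_clients):
        """Validate a returned package; probes have known answers."""
        expected = KNOWN_PROBES.get(package_id)
        if expected is not None and answer != expected:
            # Wrong answer to a known job: faulty hardware or a cheat.
            suspect_clients.add(client_id)
            return False
        return True

    suspects = set()
    check_result("client-42", "probe-17", 4000, suspects)
    print(suspects)   # {'client-42'} -> double-check its other work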
a "no". This also helps detecting a faulty client, once you have established which one is faulty. With this in place, for parasites to avoid detection, they would have to collaborate. This makes their job harder (but not impossible). For example, a collaboration protocol amongst parasites might work; "Did any other parasite do job #xyz?" "Yes, I did, and my randomly chosen checking total was 58." To better detect collaboration, always vary the pairings so that the same two people do not always check each other. Include quickly checkable proof of work done. If you could repeat every single completed work package yourself, you wouldn't need to enlist help. But what if you can quickly check that the client has completed the work package, by having it respond with proof where it came close. Say that you know that in any work package, a client would probably come close (say) eight times. A completed work package shows ten occurrences where indeed it did come close. This would show that most probably did the work. This method usually works where the project is a "search". Consider a search for large prime numbers. If a client tells the server that a given number turned out not to be prime at all, a simple check would be for the client to report a factor. The organizers can very quickly check this answer by performing a simple division. If that value is indeed a factor, it's proof that the larger number is indeed not-prime and you can strike it off the list. Don't distribute the source code. This is a controversial topic. So controversial, it's in it's own section. Only reward success.
Don't distribute the source code

This is a controversial topic. So controversial, it has its own section.

Only reward success

If the project is of a type where work packages fall into two groups, unsuccessful ones and successful ones, and the only way to find out which group a work package falls into is to actually do the work, how about recognizing only those who you can be sure did the work? Instead of recognizing unsuccessful effort, reward only successful effort. Ignore anyone who is unlucky enough to be given a work package that doesn't contribute to the effort, except perhaps for marking that work package as "done". Take the reward away, and the motivation for parasitic behavior goes with it. This strategy alone still leaves you open to spoiling attacks and faulty clients, and unless actual rewards will be frequent, you may find it limits the number of helpers. This is filed under social engineering rather than technical measures, and the effectiveness of social engineering is debatable.

Reject "impossibly fast" responses

Even modern computers have their limits. A parasite will want to churn through as many work packages as possible. Keep a secret threshold, and throw out (or at least double-check) any response that arrives before it. For good measure, tell the client (if such a message is normal) that the work package was accepted, and move the threshold around to keep attackers guessing. Under this scheme, a parasite would instead have to pretend to be a large number of computers: request enough work for 100 computers to do in a day, wait a day, and respond with forged completion responses as if they were from 100 different computers. This is much more difficult to detect, especially with firewalls being commonplace (so many computers legitimately appear behind one IP address). Perhaps mark "unusually busy" groups with suspicion and have a higher proportion of their work packages double-checked.
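A server-side sketch of the "impossibly fast" filter follows. The one-hour floor and the jitter range are invented values; the point is that the threshold is secret and moves around so probing cannot pin it down.

    import random, time

    MIN_SECONDS = 3600        # secret floor: honest work takes at least this
    issued_at = {}            # package id -> time it was handed out

    def issue(package_id):
        issued_at[package_id] = time.time()

    def accept(package_id):
        elapsed = time.time() - issued_at.pop(package_id)
        # Jitter the threshold each time so attackers can't learn it.
        threshold = MIN_SECONDS * random.uniform(0.8, 1.2)
        if elapsed < threshold:
            return "double-check"   # still tell the client "accepted"
        return "accepted"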
Benefits
1. Performance: parallel computing is a subset of distributed computing
2. Scalability
3. Resource sharing
Challenges
1. Heterogeneity
2. Latency: interactions between distributed processes have higher latency
3. Memory access: remote memory access is not the same as local memory access
4. Partial failure: applications need to adapt gracefully in the face of partial failure (a minimal retry sketch follows below). Lamport once defined a distributed system as "one on which I cannot get any work done because some machine I have never heard of has crashed."
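Point 4 above is the one that most shapes code: callers must assume any remote step can fail and recover instead of hanging. A minimal retry-with-backoff sketch, where remote_call stands in for any network operation that may raise TimeoutError:

    import time

    def call_with_retry(remote_call, attempts=3, timeout=2.0):
        """Tolerate partial failure: retry, then degrade instead of crashing."""
        for attempt in range(attempts):
            try:
                return remote_call(timeout=timeout)
            except TimeoutError:
                time.sleep(2 ** attempt)   # back off; the node may recover
        return None                        # degrade gracefully, don't hang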
Conclusion
1. These architectures are ideal for distributed business process composition (BPM) and for generic distributed computing applications such as compute-intensive scientific problems.
2. The coarse-grained component model leads to an extremely reliable, high-performance, and scalable platform for distributed computing.