Introduction
Say you've got a big computation task to perform. Perhaps you have found a way to cure cancer, or you want to look for aliens. All you need is a few supercomputers to work out some calculations, but you've only got the one PC on your desk. What to do? A popular solution is the "distributed computing" model, where the task is split up into smaller chunks and performed by the many computers owned by the general public. This guide shows you how.

Computers spend a lot of their time doing nothing. If you are reading this on your computer, that expensive CPU is most probably just sitting around waiting for you to press a key. What a waste!

Distributed computing has become popular with the rise of the Internet, but the technology behind it is older and is usually known as parallel computing. When people speak of parallel computing, it is usually in the context of a local group of computers, owned by the same person or organization, with good links between nodes.

The key issue here is that you are using computing power that you don't own. These computers are owned and controlled by other people, whom you would not necessarily trust. Both angels and demons populate the world, and unless you know them personally, you can't tell them apart.
Distributed Computing
General Definition:
Distributed computing is any computing that involves multiple computers, remote from each other, each of which plays a role in a computation problem or in information processing.
Definition
"Distributed Computing is a vague term incorporating the overlapping fields of: 1. Client/Server computing 2. Internet computing 3. Geographical distribution of computing over a wide area 4. Network peer-to-peer computing 5. Co-operative computing between workstations on a local area network
Distributed Computing Environment (DCE) Architecture
The DCE architecture provides two categories of services.

1. Fundamental distributed services provide the basic infrastructure of the distributed environment. They include:
o Remote Procedure Call, which provides portability, network independence, and secure distributed applications.
o Directory service, which provides full X.500 support and a single naming model to allow programmers and maintainers to identify and access distributed resources more easily.
o Time service, which provides a mechanism to monitor and track clocks in a distributed environment and accurate time stamps to reduce the load on system administrators.
o Security service, which provides the network with authentication, authorization, and user account management services to maintain the integrity, privacy, and authenticity of the distributed system.
o Thread service, which provides a simple, portable programming model for building concurrent applications.

2. Data-sharing services provide end users with capabilities built upon the fundamental distributed services. These services require no programming on the part of the end user and facilitate better use of information. They include:
o Distributed file system, which interoperates with the network file system to provide a high-performance, scalable, and secure file access system.
o Diskless support, which allows low-cost workstations to use disks on servers, possibly reducing the need for and cost of local disks, and provides performance enhancements to reduce network overhead.
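To make the Remote Procedure Call idea concrete, here is a minimal sketch using Python's standard xmlrpc module. DCE defines its own RPC mechanism; this example only illustrates the general pattern of invoking a function on a remote machine as if it were local. The function name and port number are arbitrary assumptions for illustration.

    # Server side: expose an ordinary function over the network.
    from xmlrpc.server import SimpleXMLRPCServer

    def add(a, b):
        # Runs on the server; the caller never sees this code.
        return a + b

    server = SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(add, "add")
    server.serve_forever()

    # Client side: the remote function is called like a local one.
    from xmlrpc.client import ServerProxy

    proxy = ServerProxy("http://localhost:8000")
    print(proxy.add(2, 3))  # -> 5, computed on the server

The appeal of RPC is exactly what the client side shows: network plumbing is hidden behind an ordinary-looking function call.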
The DCE supports the international Open Systems Interconnection (OSI) standards, which are critical to global interconnectivity. It also implements ISO standards
such as Remote Operations Service Element (ROSE), Association Control Service Element (ACSE), and the ISO session and presentation services.
How It Works
In most cases today, a distributed computing architecture consists of very lightweight software agents installed on a number of client systems, and one or more dedicated distributed computing management servers. There may also be requesting clients with software that allows them to submit jobs along with lists of their required resources.

An agent running on a processing client detects when the system is idle, notifies the management server that the system is available for processing, and usually requests an application package. The client then receives an application package from the server, runs the software when it has spare CPU cycles, and sends the results back to the server. The application may run as a screen saver, or simply in the background, without impacting normal use of the computer. If the user of the client system needs to run his own applications at any time, control is immediately returned and processing of the distributed application package ends. This must be essentially instantaneous, as any delay in returning control will probably be unacceptable to the user.

The following steps, labeled in the components diagram, show the interaction between the different components.

1. When the Client is idle, it sends a request for an application package to the Network Manager. The application package consists of an Application Manager process and the application for which sub-jobs are to be run.
2. The Network Manager sends an application package to the client machine.
3. The Client software runs the Application Manager, and the Application Manager registers itself with the Job Manager, telling it that it is available to run sub-jobs for the corresponding application.
4. The Job Manager schedules a sub-job to be run on the client, and sends the Client the sub-job parameters and input files.
5. The Application Manager on the Client runs the sub-jobs with the corresponding inputs.
6. When the application has finished, the results from the sub-jobs are delivered to the Job Manager. The Job Manager can continue to schedule sub-jobs on the Client for that application until it no longer needs the client and releases it back to the Network Manager, or until the Network Manager reclaims the client for use elsewhere.
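The six steps above can be sketched in a few lines of Python. Everything here is a toy stand-in: the class names, the in-memory "network", and the squaring "application" are assumptions made for illustration, not part of any real agent framework.

    class NetworkManager:
        def get_application_package(self):
            # Steps 1-2: hand an application package to an idle client.
            return {"app": "demo", "manager": ApplicationManager()}

    class JobManager:
        def __init__(self):
            self.pending = [{"id": i, "input": i * 10} for i in range(3)]
        def next_subjob(self):
            # Step 4: schedule a sub-job with its parameters.
            return self.pending.pop() if self.pending else None
        def deliver(self, result):
            # Step 6: collect results from the client.
            print("result received:", result)

    class ApplicationManager:
        def run_subjob(self, subjob):
            # Step 5: run the sub-job with the supplied input.
            return {"id": subjob["id"], "output": subjob["input"] ** 2}

    def client_loop(network_mgr, job_mgr):
        package = network_mgr.get_application_package()   # steps 1-2
        app_mgr = package["manager"]                      # step 3: register
        while (subjob := job_mgr.next_subjob()) is not None:
            job_mgr.deliver(app_mgr.run_subjob(subjob))   # steps 4-6

    client_loop(NetworkManager(), JobManager())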
Security measures are necessary to prevent unauthorized access to systems and data within distributed systems that are meant to be inaccessible.
The drawback of centralization is that all information resides at the hub. The hub is thus a single point of failure: if the hub dies, then all client applications connected to the hub die with it. The hub is also a bottleneck to scalability and performance. While one can introduce redundant hardware and employ better or faster hardware at the hub, this only alleviates the problem and does not solve it completely. Even though the hub-and-spoke architecture has found widespread acceptance in database servers and web servers, its drawbacks in scalability and fault tolerance make it unsuitable for general-purpose distributed application deployment. Examples of systems conforming to this centralized topology include J2EE servers and most commercially available web servers and transaction processing monitors, including Microsoft's MTS.

Pure Peer-to-Peer Systems
A primary virtue of pure P2P systems is their scalability; any node can join a network and start exchanging data with any other node. Decentralized systems also tend to be fault tolerant, as the failure or shutdown of any particular node does not impact the rest of the system.
Hybrid Peer-to-Peer Systems

In a hybrid peer-to-peer system, the control information is exchanged through a central server, while the data flow takes place in a pure peer-to-peer manner as above. This architecture alleviates the manageability problems of pure P2P systems. The control server acts as a monitoring agent for all the other peers and ensures information coherence.
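The split between control and data can be approximated in a few lines: a central registry holds only the control information (who has what), while the data itself moves peer to peer. The class names and the in-process "peers" below are illustrative assumptions, not a real protocol.

    class Registry:
        """Central server: tracks which peer holds which item (control only)."""
        def __init__(self):
            self.index = {}
        def publish(self, key, peer):
            self.index[key] = peer
        def lookup(self, key):
            return self.index.get(key)

    class Peer:
        def __init__(self, name):
            self.name, self.store = name, {}
        def put(self, key, value, registry):
            self.store[key] = value
            registry.publish(key, self)      # control flows via the server
        def fetch(self, key, registry):
            owner = registry.lookup(key)     # ask the server who has it
            return owner.store[key]          # data flows peer to peer

    registry = Registry()
    alice, bob = Peer("alice"), Peer("bob")
    alice.put("report.pdf", b"...bytes...", registry)
    print(bob.fetch("report.pdf", registry))  # bob reads directly from alice

Note that if the Registry object disappeared, bob could still read anything he had already located; only new lookups would fail, which is exactly the failure behavior described next.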
Figure: Hybrid P2P systems

The drawbacks associated with centrally managed control still remain. If the central server goes down, the system loses the ability to effect changes in the data flow. However, existing applications are not affected by a failure of the central server, as the data flow between nodes continues regardless of whether the central server is functional. Peer-to-peer data routing allows a hybrid system to offer better scalability than a centralized system, but hybrid systems still suffer from scalability problems for the control information that flows through a single node. While hybrid systems are being
effectively used for mission-critical applications, the solutions are limited to relatively small-scale problems. An example of a commercial hybrid P2P system is Groove. Groove implements collaborative project management software in which a central synchronizing server controls all information being exchanged between peers.

Super-Peer Architecture

A new wave of peer-to-peer systems is advancing an architecture of centralized topology embedded in decentralized systems; such a topology forms a super-peer network.
Next Generation Distributed Computing Architecture

Combining the super-peer topology with the coarse-grained component model [1] enables a distributed computing platform for a whole new generation of distributed applications that are more flexible, scalable, and reliable than traditional applications.
Figure: The super-peer architecture

The super-peer architecture closely maps to real-world business processes. Each cluster maps to a business division. Super peers can have well-defined protocols for cross-cluster communication (acting as a firewall for this virtual internet). The adjoining figure illustrates a 2-redundant super-peer architecture
that alleviates the bottlenecks associated with a super peer being a single point of failure for its clients. In the following sections, we examine a real-world problem that represents a typical business process and discuss the implementation of this process over multiple software infrastructure system topologies.
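Before moving on, here is a toy sketch of the super-peer routing idea just described: ordinary peers talk only to their cluster's super peer, and super peers forward traffic across clusters, giving them a natural place to enforce policy. All names here are illustrative assumptions.

    class SuperPeer:
        def __init__(self, cluster):
            self.cluster = cluster      # e.g. a business division
            self.neighbors = {}         # other clusters' super peers

        def route(self, dest_cluster, dest_peer, message):
            if dest_cluster == self.cluster:
                print(f"{dest_peer} <- {message}")  # deliver inside cluster
            else:
                # Cross-cluster traffic goes super peer to super peer,
                # so the super peer can act as a policy point / firewall.
                self.neighbors[dest_cluster].route(dest_cluster,
                                                   dest_peer, message)

    sales, engineering = SuperPeer("sales"), SuperPeer("engineering")
    sales.neighbors["engineering"] = engineering
    engineering.neighbors["sales"] = sales
    sales.route("engineering", "build-server", "please rerun job 7")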
Tasks suited to distributed computing must divide into independent work units. The code and small blocks of data should be such that they can be processed effectively on a modern PC and report results that, when combined with other PCs' results, produce coherent output. And the individual tasks should be small enough to produce a result on these systems within a few hours to a few days. Several classes of application fit this model:
A query search against a huge database that can be split across lots of desktops, with the submitted query running concurrently against each fragment on each desktop.
Complex modeling and simulation techniques that increase the accuracy of results by increasing the number of random trials would also be appropriate, as trials could be run concurrently on many desktops, and combined to achieve greater statistical significance (this is a common method used in various types of financial risk analysis).
Exhaustive search techniques that require searching through a huge number of results to find solutions to a problem also make sense. Drug screening is a prime example.
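As a concrete sketch of this divide-run-combine pattern, here is a Monte Carlo estimate of pi in which independent batches of random trials run in separate processes and are merged at the end. A real deployment would ship the batches to volunteer machines rather than local processes; the batch sizes are arbitrary.

    import random
    from multiprocessing import Pool

    def run_trials(n):
        # One work unit: n independent random trials; can run anywhere.
        return sum(random.random() ** 2 + random.random() ** 2 <= 1.0
                   for _ in range(n))

    if __name__ == "__main__":
        batches = [100_000] * 8            # eight independent work units
        with Pool() as pool:
            hits = pool.map(run_trials, batches)
        # Combining more batches tightens the estimate
        # (greater statistical significance).
        print("pi ~", 4 * sum(hits) / sum(batches))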
Many of today's vendors, particularly Entropia and United Devices, are aiming squarely at the life sciences market, which has a sudden need for massive computing power. As a result of sequencing the human genome, the number of identifiable biological targets for today's drugs is expected to increase from about 500 to about 10,000. Pharmaceutical firms have repositories of millions of different molecules and compounds, some of which may have characteristics that make them appropriate for inhibiting newly found proteins. The process of matching all these "ligands" to their appropriate targets is an ideal task for distributed computing, and the quicker it's done, the quicker and greater the benefits will be. Another related application is the recent trend of generating new types of drugs solely on computers.
Complex financial modeling, weather forecasting, and geophysical exploration are on these vendors' radar screens as well, as are car-crash and other complex simulations.
Faulty clients

This isn't really an attack, but unintentional faults. Even if your client software is perfect, it has to run on hardware owned by the general public, and faulty hardware isn't unheard of. The floating-point co-processor is a popular place to find hardware glitches. If your project depends on accurate calculation, you should build in checks to confirm that each client is operating correctly.
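One cheap check, sketched below, is to salt the work stream with "probe" packages whose answers the server already knows; a client that gets a probe wrong is flagged for scrutiny. The probe IDs and answers are invented for illustration.

    KNOWN_PROBES = {"probe-17": 4181, "probe-23": 28657}  # precomputed answers

    def check_result(client_id, package_id, answer, suspect_clients):
        """Validate a returned package; probes have known answers."""
        expected = KNOWN_PROBES.get(package_id)
        if expected is not None and answer != expected:
            # Wrong answer to a known job: faulty hardware or a cheat.
            suspect_clients.add(client_id)
            return False
        return True

    suspects = set()
    check_result("client-42", "probe-17", 4000, suspects)
    print(suspects)   # {'client-42'} -> double-check its other work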
a "no". This also helps detecting a faulty client, once you have established which one is faulty. With this in place, for parasites to avoid detection, they would have to collaborate. This makes their job harder (but not impossible). For example, a collaboration protocol amongst parasites might work; "Did any other parasite do job #xyz?" "Yes, I did, and my randomly chosen checking total was 58." To better detect collaboration, always vary the pairings so that the same two people do not always check each other. Include quickly checkable proof of work done. If you could repeat every single completed work package yourself, you wouldn't need to enlist help. But what if you can quickly check that the client has completed the work package, by having it respond with proof where it came close. Say that you know that in any work package, a client would probably come close (say) eight times. A completed work package shows ten occurrences where indeed it did come close. This would show that most probably did the work. This method usually works where the project is a "search". Consider a search for large prime numbers. If a client tells the server that a given number turned out not to be prime at all, a simple check would be for the client to report a factor. The organizers can very quickly check this answer by performing a simple division. If that value is indeed a factor, it's proof that the larger number is indeed not-prime and you can strike it off the list. Don't distribute the source code. This is a controversial topic. So controversial, it's in it's own section. Only reward success.
Don't distribute the source code

This is a controversial topic. So controversial, it has its own section.

Only reward success

If the project is of a type where work packages fall into two groups, unsuccessful ones and successful ones, and the only way to find out which group a work package falls into is to actually do the work, how about recognizing only those who you can be sure did the work? Instead of recognizing unsuccessful effort, reward only successful effort. Ignore anyone who is unlucky enough to be given a work package that doesn't contribute to the effort, except perhaps for marking that work package as "done". Take the reward away, and the motivation for parasitic behavior goes with it. This strategy alone still leaves you open to spoiling attacks and faulty clients, and unless actual rewards will be frequent, you may find it limits the number of helpers. This is filed under social engineering rather than technical measures, and the effectiveness of social engineering is debatable.

Reject "impossibly fast" responses

Even modern computers have their limits. A parasite will want to churn through as many work packages as possible. Keep a secret threshold, and throw out (or at least double-check) any response that arrives before it. For good measure, tell the client (if such a message is normal) that the work package was accepted, and move the threshold around to keep attackers guessing. Under this scheme, a parasite would instead have to pretend to be a large number of computers: request enough work for 100 computers to do in a day, wait a day, and respond with forged completion responses as if they were from 100 different computers. This is much more difficult to detect, especially with firewalls being commonplace (so many computers legitimately appear behind one IP address). Perhaps mark "unusually busy" groups with suspicion and have a higher proportion of their work packages double-checked.
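A server-side sketch of the "impossibly fast" filter follows. The one-hour floor and the jitter range are invented values; the point is that the threshold is secret and moves around so probing cannot pin it down.

    import random, time

    MIN_SECONDS = 3600        # secret floor: honest work takes at least this
    issued_at = {}            # package id -> time it was handed out

    def issue(package_id):
        issued_at[package_id] = time.time()

    def accept(package_id):
        elapsed = time.time() - issued_at.pop(package_id)
        # Jitter the threshold each time so attackers can't learn it.
        threshold = MIN_SECONDS * random.uniform(0.8, 1.2)
        if elapsed < threshold:
            return "double-check"   # still tell the client "accepted"
        return "accepted"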
Benefits
1. Performance: parallel computing is a subset of distributed computing
2. Scalability
3. Resource sharing
Challenges
1. Heterogeneity
2. Latency: interactions between distributed processes have higher latency
3. Memory access: remote memory access is not the same as local memory access
4. Partial failure: applications need to adapt gracefully in the face of partial failure (a minimal retry sketch follows below). Lamport once defined a distributed system as "one on which I cannot get any work done because some machine I have never heard of has crashed."
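Point 4 above is the one that most shapes code: callers must assume any remote step can fail and recover instead of hanging. A minimal retry-with-backoff sketch, where remote_call stands in for any network operation that may raise TimeoutError:

    import time

    def call_with_retry(remote_call, attempts=3, timeout=2.0):
        """Tolerate partial failure: retry, then degrade instead of crashing."""
        for attempt in range(attempts):
            try:
                return remote_call(timeout=timeout)
            except TimeoutError:
                time.sleep(2 ** attempt)   # back off; the node may recover
        return None                        # degrade gracefully, don't hang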
Conclusion
1. These architectures are ideal for distributed business process composition (BPM) and for generic distributed computing applications such as compute-intensive scientific problems.
2. The coarse-grained component model leads to an extremely reliable, high-performance, and scalable platform for distributed computing.