
DISTRIBUTED COMPUTING

A Technical Seminar Report submitted in partial fulfillment of the requirements for the award of the Bachelor of Technology Degree in

INFORMATION TECHNOLOGY
By B. PRAVEEN KUMAR REDDY
ROLL NO: 06S11A1225

MALLA REDDY INSTITUTE OF TECHNOLOGY AND SCIENCE


Maisammaguda, Dhulapally, Secunderabad - 500014

CONTENTS

1. ABSTRACT
2. INTRODUCTION
3. DISTRIBUTED COMPUTING
4. PARALLEL COMPUTING
5. DISTRIBUTED COMPUTER SYSTEM METRICS
6. ARCHITECTURE
7. HOW DOES DISTRIBUTED COMPUTING WORK?
8. FORMS OF COMMUNICATION
9. SECURITY ENFORCEMENT
10. SECURITY AND STANDARD CHALLENGES
11. TYPES OF APPLICATIONS
12. ADVANTAGES
13. CONCLUSION

ABSTRACT

Distributed computing can be defined in many different ways. Various vendors have created and marketed distributed computing systems for years, and have developed numerous initiatives and architectures to permit distributed processing of data and objects across a network of connected systems. One flavor of distributed computing has received a lot of attention lately, and it is the primary focus of this report: an environment in which the idle CPU cycles and storage space of tens, hundreds, or thousands of networked systems can be harnessed to work together on a particularly processing-intensive problem. The growth of such processing models has been limited, however, by a lack of compelling applications, by bandwidth bottlenecks, and by significant security, management, and standardization challenges.

Distributed computing offers researchers the potential to solve complex problems using many dispersed machines. The result is faster computation at potentially lower cost when compared with the use of dedicated resources. The term "distributed computation" has been used to describe the use of distributed computing for the sake of raw computation rather than, say, remote file sharing, storage, or information retrieval. Distributed computing also often involves competition with other distributed systems. This competition may be for prestige, or it may be a means of enticing users to donate processing power to a specific project. Distributed computing differs from cluster computing in that computers in a distributed computing environment are typically not exclusively running 'group' tasks, whereas clustered computers are usually much more tightly coupled. This difference makes distributed computing attractive because, when properly configured, it can use computational resources that would otherwise go unused. This paper examines what makes people turn to distributed computing.

INTRODUCTION

A process can be run faster by being divided into subtasks (threads) that run in parallel on two or more interconnected computers. The more computers available, the more of these subtasks can run at the same time. Although cluster-based computing has made very high-speed processing possible on a small budget, there are computational problems whose processing needs are so extensive that there is no reasonable way to fund them using dedicated machines. Grid computing hopes to overcome these budget and infrastructure constraints by using the spare CPU time of thousands or even millions of networked computers. When these computers are not in use or are operating under capacity, they can help solve big problems in small pieces. Distributed computation offers researchers an opportunity to distribute the task of solving complex problems onto hundreds, and in many cases thousands, of Internet-connected machines. Although the network itself is distributed, the research and end-user participants form a loosely bound partnership. The resulting partnership is not unlike a team or community: people band together to create and connect the resources required to achieve a common goal. A fascinating aspect of this continues to be humanity's willingness to transcend cultural barriers.

DISTRIBUTED COMPUTING:
Distributed computing is becoming an ever more common methodology for solving highly complex computing problems that would traditionally be solved using a supercomputer. It is used to process information more quickly and/or efficiently using available resources. Using a distributed operating system, a collection of computers can be interconnected through a network into a cluster. Distributed computing is based on the observation that most CPUs are not fully utilized and can be used to run tasks sent to them. It differs from cluster computing, as in a 'Beowulf cluster', in that machines in a distributed network are not dedicated to the tasks sent to them.

Factors causing the trend toward distributed computing:

1. Low-cost processors: Minicomputers and microcomputers have dropped in cost to a level where most end-user groups can afford one.

2. Better performance than phone lines: Because phone lines offer comparatively low bandwidth at high cost, distributed systems that keep processing close to the data are generally preferred.

3. Remote database access: Databases maintained at one location are valuable to users at other locations, so programs used at those other locations should be able to employ them.

4. Security: Distributed systems may give better security and reliability because they avoid putting all the eggs in one basket.

5. Centralization and decentralization: Distributed processing makes it possible to build systems which, to a large extent, achieve the advantages of both.

Fig: Distributed computing

Distributed Computing - The Poor Man's Supercomputer:


Distributed computing can be defined in many different ways. In the most general terms, it is a system that permits distributed processing of data and objects across a network of connected systems through the sharing of resources on those systems. Distributed computing can encompass desktop PCs, powerful workstations, servers, and even mainframes and supercomputers interconnected through a network. Many academic, commercial, and government efforts have developed numerous initiatives and architectures that take advantage of the power inherent in distributed computing.

PARALLEL COMPUTING

Distributed Programming:


The terms 'parallel processing', 'parallelization', and 'distributed programming' all refer to a system in which a complex task is broken up into many subtasks that run in parallel. Each subtask is then assigned to a CPU on the network and the results are combined. Distributed programming uses a collection of computers connected over a network to solve a single problem. Programming multi-computers requires models that differ from those used for conventional systems: the programmer must be able to transfer data between different parts of the program through a shared memory space, and to coordinate efforts through an inter-process communication system capable of communicating between interconnected CPUs.
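As a minimal, single-machine sketch of this model, Python's multiprocessing module can break a task into subtasks, assign each to a CPU, and combine the results (the chunking scheme and the sum-of-squares task are illustrative):

    from multiprocessing import Pool

    def subtask(chunk):
        # Each worker computes a partial result over its own chunk.
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))
        # Break the complex task into four subtasks, one per CPU.
        chunks = [data[i::4] for i in range(4)]
        with Pool(processes=4) as pool:
            partials = pool.map(subtask, chunks)  # run subtasks in parallel
        print(sum(partials))  # combine the partial results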

Distributed programs achieve the following:

a) Increased processing speed, by using more than one computer at a time.
b) Potential for improved reliability, when additional computers can compensate for the failure of one.
c) Allowance for some problems, like remote data acquisition, to succeed in a distributed environment.

To run a distributed application, several issues need to be addressed. To begin, it must be possible to start processes on remote computers, and the necessary data for these processes must be provided before they can do any work. Some mechanism for synchronizing these processes, such as inter-process semaphores, should be available, so that they know when to access the data and produce any results. Starting a program on another computer is not very hard using programs like 'telnet' or 'rsh'. Exchanging data and synchronizing, however, can be quite difficult and complicated. These problems can distract the programmer from the original project and can be the source of numerous bugs. Linux already has mechanisms for processes on the same computer to exchange data and synchronize with one another. This is called inter-process communication (IPC). One prominent example is System V IPC, first introduced in AT&T's System V UNIX.

Distributed computing first used machines connected in a finite physical network, typically PCs similar in both hardware and software. Most such networks, however, are not big enough to solve massive computational problems. Grid computing is the answer to this problem.
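A small sketch of the synchronization problem just described, using Python's multiprocessing primitives in place of System V IPC (the shared counter is illustrative):

    from multiprocessing import Process, Semaphore, Value

    def worker(sem, counter):
        # Acquire the inter-process semaphore before touching shared data.
        with sem:
            counter.value += 1  # critical section: update the shared result

    if __name__ == "__main__":
        sem = Semaphore(1)        # binary semaphore shared between processes
        counter = Value("i", 0)   # integer living in shared memory
        procs = [Process(target=worker, args=(sem, counter)) for _ in range(8)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(counter.value)      # 8: every update was synchronized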

DISTRIBUTED COMPUTER SYSTEM METRICS:


Latency: network delay before any data is sent.

Bandwidth: maximum channel capacity (analogue communication: Hz; digital communication: bps).

Granularity: relative size of the units of processing required. Distributed systems generally operate best with coarse-grained tasks, because communication is slow compared to processing speed.

Processor speed: MIPS, FLOPS.

Reliability: ability to continue operating correctly for a given time.

Fault tolerance: resilience to partial system failure.

Security: policy for dealing with threats to the communication or processing of data in a system.

Administrative/management domains: issues concerning the ownership of and access to distributed system components.
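These metrics suggest a rough back-of-the-envelope model of when distributing a task pays off: per-worker compute time shrinks as workers are added, while the communication overhead per work package does not. A Python sketch of this illustrative model, with purely made-up numbers:

    def distributed_time(work_s, n_workers, latency_s, data_bits, bandwidth_bps):
        # Rough estimate: ideal parallel compute plus communication overhead.
        compute = work_s / n_workers
        communicate = latency_s + data_bits / bandwidth_bps  # one work package
        return compute + communicate

    # Coarse-grained task: an hour of compute, little data to move -> big win.
    print(distributed_time(3600, 100, 0.05, 8e6, 1e6))  # ~44 s versus 3600 s

    # Fine-grained task: communication dominates -> distribution loses.
    print(distributed_time(0.5, 100, 0.05, 8e6, 1e6))   # ~8 s versus 0.5 s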

ARCHITECTURE:
The simplest organization of distributed systems is the client-server model, which has only two types of machines:

1. A client machine containing only the programs implementing the user-interface level.
2. A server machine containing the rest, that is, the programs implementing the processing and data levels.

Two kinds of architectures are possible with the client-server organization:

1. Multi-tiered architectures
2. Modern architectures

Multi-tiered Architectures:


One approach for organizing clients and servers is to distribute the programs in the application layers across different machines, leading to what is referred to as a two-tiered architecture. It may sometimes happen that a server needs to act as a client, leading to a (physically) three-tiered architecture. In this architecture, programs that form part of the processing level reside on a separate server, but may additionally be partly distributed across the client and server machines. A typical example of where a three-tiered architecture is used is transaction processing, in which a separate process, called the transaction monitor, coordinates all transactions across possibly different data servers.

Modern Architectures:

In modern architectures, it is often the distribution of the clients and servers that counts, which we refer to as horizontal distribution. In this type of distribution, a client or a server may be physically split into logically equivalent parts, with each part operating on its own share of the complete data set, thus balancing the load. Modern distributed systems are generally built by means of an additional layer of software on top of a network operating system. This layer, called middleware, is designed to hide the heterogeneity and distributed nature of the underlying collection of computers.


HOW DOES DISTRIBUTED COMPUTING WORK?


The two main entities in distributed computing are the server and the many clients. A central computer, the server, generates work packages, which are passed on to worker clients. A client performs the task detailed in the work-package data and, when it has finished, passes the completed work package back to the server. This is the minimal distributed computing model. A popular model is the three-tier model, which splits the server task into two layers: one to generate work packages and another to act as a broker, communicating with clients. This is simply a means to manage load and protect against failure. So that the server knows a potential client exists, a third kind of message is needed in addition to work packages and completed work packages, initiated by the client: a work request, which simply says, "Give me a work package."

Fig: 3-tier system

FORMS OF COMMUNICATION


Various forms are available for intercommunication in distributed systems. RPCs (remote procedure calls) and RMIs (remote method invocations) offer synchronous communication facilities, by which a client blocks until the server has sent a reply. It turns out that general-purpose, high-level message-oriented models are often more convenient. In message-oriented models, the issues are whether or not communication is persistent, and whether or not communication is synchronous. The essence of persistent communication is that a message submitted for transmission is stored by the communication system for as long as it takes to deliver it. Message-oriented middleware models generally offer persistent asynchronous communication, and are used where RPCs and RMIs are not appropriate. They are primarily used to assist the integration of collections of databases into large-scale information systems. Other applications include e-mail and workflow.

A completely different form of communication is streaming, in which the issue is whether or not two successive messages have a temporal relationship. In continuous data streams, a maximum end-to-end delay is specified for each message. In addition, messages may also be required to be sent subject to a minimum end-to-end delay. Typical examples of such continuous data streams are video and audio streams. Though various modes of communication are available, the message-oriented model often turns out to be the optimal form of communication.

The operating systems commonly used for distributed computing systems can be broadly classified into two types: network operating systems and distributed operating systems. Compared to a network operating system, a distributed operating system shows better transparency and fault-tolerance capability, and provides the image of a virtual uniprocessor to the users.
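A minimal sketch of the synchronous RPC style described above, using Python's standard xmlrpc library (the square procedure and port number are illustrative):

    from multiprocessing import Process
    from xmlrpc.server import SimpleXMLRPCServer
    import xmlrpc.client
    import time

    def serve():
        # The server registers a procedure that clients may call remotely.
        server = SimpleXMLRPCServer(("localhost", 8000), logRequests=False)
        server.register_function(lambda x: x * x, "square")
        server.serve_forever()

    if __name__ == "__main__":
        Process(target=serve, daemon=True).start()
        time.sleep(0.5)  # give the server a moment to start listening
        proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
        # The call blocks until the server has sent its reply (synchronous).
        print(proxy.square(7))  # -> 49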

The main issues involved in the design of a distributed operating system are transparency, reliability, flexibility, performance, scalability, and heterogeneity.

SECURITY ENFORCEMENT - KEY DESIGN ISSUES

Security enforcement can be taken care of during the design of a distributed system itself. Three important design issues to be considered in this context are:


1. The first design issue is whether to use only a symmetric cryptosystem or to combine it with a public-key system. Current practice shows the use of public-key cryptography for distributing short-term shared secret keys.

2. The second issue in a secure distributed system is access control, or authorization. Authorization deals with protecting resources in such a way that only processes with the proper access rights can actually access those resources. Access control always takes place after a process has been authenticated.

3. The third and final issue concerns management, especially key management and authorization management. Key management includes the distribution of cryptographic keys, for which certificates issued by trusted third parties play an important role. Important with respect to authorization management are attribute certificates and delegation.

Kerberos is a widely used security system based on shared secret keys. Special attention is often paid to the anonymity of a customer, as this distinguishes traditional cash-based systems from their electronic counterparts.
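As a sketch of the first design issue above, a short-term shared secret key can be distributed under a public key. The example below uses the third-party Python cryptography package; the key sizes and payload are illustrative assumptions:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import padding, rsa
    from cryptography.fernet import Fernet

    oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                        algorithm=hashes.SHA256(), label=None)

    # The server holds a long-term public/private key pair.
    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()

    # A client generates a short-term symmetric session key and ships it
    # to the server encrypted under the server's public key.
    session_key = Fernet.generate_key()
    wrapped = public_key.encrypt(session_key, oaep)

    # The server unwraps the session key; both sides now share a secret
    # and can switch to much faster symmetric encryption.
    channel = Fernet(private_key.decrypt(wrapped, oaep))
    print(channel.decrypt(Fernet(session_key).encrypt(b"work package payload")))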

SECURITY AND STANDARD CHALLENGES


The major challenges come with increasing scale. As soon as you move outside a corporate firewall, security and standardization challenges become quite significant. Most of today's vendors specialize in applications that stop at the corporate firewall, though Avaki, in particular, is staking out the global grid territory. Beyond spanning firewalls with a single platform lies the challenge of spanning multiple firewalls and platforms, which means standards.


Most of the current platforms offer strong encryption such as Triple DES. The application packages that are sent to PCs are digitally signed, to make sure a rogue application does not infiltrate a system. Avaki comes with its own PKI (public key infrastructure). Identical application packages are typically sent to multiple PCs and the results from each are compared; any set of results that differs from the rest becomes security-suspect. Even with encryption, data can still be snooped while a process is running in the client's memory, so most platforms create application data chunks so small that snooping them is unlikely to provide useful information. Avaki claims that it integrates easily with different existing security infrastructures and can facilitate communications among them, but this is obviously a challenge for global distributed computing.

Working out standards for communications among platforms is part of the typical chaos that occurs early in any relatively new technology. In the generalized peer-to-peer realm lies the Peer-to-Peer Working Group, started by Intel, which is looking to devise standards for communications among many different types of peer-to-peer platforms, including those used for edge services and collaboration. The Global Grid Forum is a collection of about 200 companies looking to devise grid computing standards. Then there are vendor-specific efforts such as Sun's open-source JXTA platform, which provides a collection of protocols and services that allows peers to advertise themselves to, and communicate with, each other securely. JXTA has a lot in common with JINI, but is not Java-specific (though the first version is Java-based). Intel recently released its own peer-to-peer middleware, the Intel Peer-to-Peer Accelerator Kit for Microsoft .NET, also designed for discovery and based on the Microsoft .NET platform.
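The redundancy check described above, comparing results returned for identical work packages and flagging outliers, can be sketched in a few lines (the client names and result values are hypothetical):

    from collections import Counter

    def flag_suspects(results):
        # results maps client id -> value returned for the same work package.
        majority, _ = Counter(results.values()).most_common(1)[0]
        # Any client whose result differs from the majority is suspect.
        return [client for client, value in results.items() if value != majority]

    print(flag_suspects({"pc1": 42, "pc2": 42, "pc3": 99, "pc4": 42}))  # ['pc3']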

TYPES OF APPLICATIONS:
The following scenarios are examples of types of application tasks that can be set up to take advantage of distributed computing.

A query search against a huge database that can be split across lots of desktops, with the submitted query running concurrently against each fragment on each desktop.

Exhaustive search techniques that require searching through a huge number of results to find solutions to a problem also make sense. Drug screening is a prime example.

Complex modeling and simulation techniques that increase the accuracy of results by increasing the number of random trials would also be appropriate, as trials could be run concurrently on many desktops and combined to achieve greater statistical accuracy.
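A sketch of this pattern, estimating pi by running independent random trials on several workers and combining them (the trial counts are illustrative):

    import random
    from multiprocessing import Pool

    def trials(n):
        # Count random points falling inside the unit quarter-circle.
        return sum(1 for _ in range(n)
                   if random.random() ** 2 + random.random() ** 2 <= 1)

    if __name__ == "__main__":
        per_worker = [250_000] * 4      # independent trials on each "desktop"
        with Pool(4) as pool:
            hits = sum(pool.map(trials, per_worker))  # combine trial results
        # More combined trials -> smaller statistical error in the estimate.
        print(4 * hits / sum(per_worker))  # approximately 3.14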

Complex financial modeling, weather forecasting, and geophysical exploration are on the radar screens of the vendors, as well as car crash and other complex simulations.

Many of today's vendors are aiming squarely at the life sciences market, which has a sudden need for massive computing power. Pharmaceutical firms have repositories of millions of different molecules and compounds, some of which may have characteristics that make them appropriate for inhibiting newly found proteins. The process of matching all these ligands to their appropriate targets is an ideal task for distributed computing, and the quicker it's done, the quicker and greater the benefits will be.


ADVANTAGES
Distributed computing has been proposed for various reasons, ranging from organizational decentralization and economical processing to greater autonomy. Its advantages include:
a) Management of distributed data with different levels of transparency.
b) Distribution or network transparency.
c) Replication transparency.
d) Increased reliability and availability.
e) Increased performance.

CONCLUSION

Scalability is also a great advantage of distributed computing. Though they provide massive processing power, supercomputers are typically not very scalable once they are installed. A distributed computing installation is almost infinitely scalable--simply add more systems to the environment. In a corporate distributed computing setting, systems might be added within or beyond the corporate firewall.

The inability to adequately address massive processing and data-volume issues has always hampered the potential of computer science. No matter how fast a CPU is or how high the data throughput rate, our imaginations come up with new applications that exceed the existing technology or budget. For today, however, the specific promise of distributed computing lies mostly in harnessing the system resources that lie within the firewall. It will take years before the systems on the Net share compute resources as effortlessly as they share information.

REFERENCES

BOOKS:
1. Distributed Systems by Andrew S. Tanenbaum and Maarten van Steen.
2. Distributed Operating Systems by Andrew S. Tanenbaum.

WEBSITES:
1. www.intel.com
2. http://www.onlamp.com/pub/a/onlamp/2003/05/22/distributed.html

PAPERS/SEMINARS:
1. www.computer.howstuffworks.com
2. http://www.bacchae.co.uk/docs/dist.html
3. Paper on Distributed Computing, University of Udine, Italy.
