

CS6601 DISTRIBUTED SYSTEMS

UNIT I

PART A

1. Define distributed systems.

A distributed system is a collection of independent computers that appears to its users as a single
coherent system. Equivalently, it is a system in which components located at networked computers
communicate and coordinate their actions only by passing messages.

2. List the characteristics of distributed system.


• Programs are executed concurrently
• There is no global time
• Components can fail independently

3. Mention the challenges in distributed system.


• Heterogeneity
• Openness
• Security
• Scalability
• Failure handling
• Concurrency
• Transparency

4. Define heterogeneity.
The Internet enables users to access services and run applications over a heterogeneous collection
of computers and networks. Heterogeneity (that is, variety and difference) applies to all of the
following:
• Networks;
• Computer Hardware;
• Operating Systems;
• Programming Languages;
• Implementations By Different Developers.

5. Why do we need openness?


The openness of a computer system is the characteristic that determines whether the system
can be extended and reimplemented in various ways. The openness of distributed systems is
determined primarily by the degree to which new resource-sharing services can be added and
be made available for use by a variety of client programs.

6. What are the types of transparency?

The types of transparency are as follows:
• Access transparency
• Location transparency
• Concurrency transparency
• Replication transparency
• Failure transparency
• Performance transparency
• Scaling transparency

7. Define transparency.
Transparency is defined as the concealment from the user and the application programmer of
the separation of components in a distributed system, so that the system is perceived as a
whole rather than as a collection of independent components. The implications of
transparency are a major influence on the design of the system software.

8. What are the advantages of Distributed Systems?


• Performance
• Distribution
• Reliability (fault tolerance)
• Incremental growth
• Sharing of data/resources
• Communication

9. Define Middleware.
The term middleware applies to a software layer that provides a programming abstraction as well
as masking the heterogeneity of the underlying networks, hardware, operating systems and
programming languages. In addition to solving the problems of heterogeneity, middleware
provides a uniform computational model for use by the programmers of servers and distributed
applications.

10. Name five reasons to build a distributed system.

The reasons for building distributed systems are resource sharing, computation speed-up,
reliability, communication and incremental growth.

11. What is the need for openness in a distributed system?

If the well-defined interfaces of a system are published, it is easier for developers to add new
features or replace sub-systems in the future. Example: Twitter and Facebook publish APIs that allow
developers to build their own software that interacts with these services.

12. Define Ubiquitous computing.



Ubiquitous networking, also known as pervasive networking, is the distribution of communications
infrastructure and wireless technologies throughout the environment to enable continuous
connectivity. That capacity is an essential component of pervasive computing.

13. Discuss the design issues in internet

• Performance issues
o Throughput
o Balancing computational loads
• Quality of service
• Use of caching and replication
• Dependability issues
• Fault tolerance
• Security

14. List the Limitations of distributed systems.

• Security problems arise because resources are shared
• Some messages can be lost in the network
• Bandwidth can be a bottleneck: if large volumes of data must be moved, the network
cabling may have to be replaced, which tends to be expensive
• Overloading is another problem in distributed operating systems
• If a database attached to a local system is accessed remotely by many users,
performance becomes slow
• Databases in a networked system are more difficult to administer than those on a
single-user system

15. Give examples of distributed systems.


• Financial trading
• Massively multiplayer online games (MMOGs)
• Web search
• Network of workstations
• Automatic banking (teller machine) system
• Automotive system (a distributed real-time system)
• Distributed Real-Time Systems
• Synchronization of physical clocks

UNIT II

16. State the advantages of overlay networks.

(1) Overlay networks allow both networking developers and application users to easily design
and implement their own communication environment and protocols on top of the Internet,
such as data routing and file sharing management.
(2) Data routing in overlay networks can be very flexible, quickly detecting and avoiding
network congestion by adaptively selecting paths based on different metrics, such as probed
latency.
(3) The end-nodes in overlay networks are highly connected to each other due to flexible
routing. As long as the physical network connections exist, one end-node can always
communicate with another end-node via the overlay network. Thus, scalability and robustness
are two attractive features of overlay networks.
(4) The high connectivity of the growing number of end-nodes joining overlay networks enables
effective sharing of the huge amount of information and resources available in the Internet.

17. Give the advantages in using name caches in the file system

• Better performance, since repeated accesses to the same information are handled without
additional network accesses and disk transfers. This is due to locality in file access patterns.
• It contributes to the scalability and reliability of the distributed file system, since data can
be cached on the client node rather than fetched from the remote server each time.

18. Name some services and examples of Middleware.

Services that can be regarded as middleware

• Enterprise application integration
• Data integration
• Message-oriented middleware (MOM)
• Object request brokers (ORBs)
• Enterprise service bus (ESB)

Examples of Middleware

• Database access technology, e.g. ODBC (Open Database Connectivity)
• Java's database connectivity API: JDBC
• Remote computation products, e.g. ONC RPC, OSF RPC and RMI (Java Remote Method
Invocation)
• Distributed Computing Environment (DCE) products, Common Object Request Broker
Architecture (CORBA), Distributed Component Object Model (DCOM)

19. What is meant by inter process Communication?

Interprocess communication (IPC) is a set of programming interfaces that allow a programmer to
coordinate activities among different program processes that can run concurrently in an operating
system. This allows a program to handle many user requests at the same time.

20. What is the difference between RMI and RPC?

• RPC is language neutral while RMI is limited to Java.


• RPC is procedural like in C, but RMI is object oriented.
• RPC supports only primitive data types while RMI allows objects to be passed as
arguments and return values.
• When using RPC, programmer must split any compound objects to primitive data
types.
• RMI is easier to program than RPC.
• RMI is slower than RPC since RMI involves execution of java byte code.
• RMI allows usage of design patterns due to the object oriented nature while RPC does
not have this capability.

21. What is the role of a proxy server and of mobile code?

Proxy server:
A proxy server is a dedicated computer, or a software system running on a computer, that acts as
an intermediary between an endpoint device, such as a computer, and another server from
which a user or client is requesting a service.
Mobile code:
• The term mobile code is used to refer to program code that can be transferred
from one computer to another and run at the destination.
• Code suitable for running on one computer is not necessarily suitable for
running on another because executable programs are normally specific both to
the instruction set and to the host operating system.
Example: Java applets.
22. Define Datagram.
The term ‘datagram’ refers to the similarity of this delivery mode to the way in which letters
and telegrams are delivered. The essential feature of datagram networks is that the delivery of
each packet is a ‘one-shot’ process; no setup is required, and once the packet is delivered the
network retains no information about it.

23. What is the use of UDP?


• UDP uses a simple connectionless transmission model with a minimum of protocol
mechanism.
• UDP provides checksums for data integrity and port numbers for addressing different
functions at the source and destination of the datagram.
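
As an illustration of the connectionless model described above, the following minimal sketch sends one datagram over the loopback interface using Python's standard socket module. The function name send_one_datagram, the loopback address and the 2-second timeout are assumptions chosen for the example, not part of the notes.

```python
import socket

def send_one_datagram(payload: bytes) -> bytes:
    """Send one UDP datagram to a local receiver and return what arrived."""
    # Receiver: bind to the loopback interface; port 0 lets the OS pick a port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)  # UDP gives no delivery guarantee, so do not wait forever
    address = receiver.getsockname()  # (host, port) that identifies the receiver

    # Sender: no connection setup; the datagram is simply addressed by host and port.
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sender.sendto(payload, address)
        data, _source = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()
```

Note that there is no connect/accept handshake anywhere in the sketch: the port number alone selects the receiving function, which is the addressing role mentioned above.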

24. What is the use of remote object references?


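A remote object reference is an identifier that can be used throughout a distributed system to
refer to a particular unique remote object. The RMI (Java Remote Method Invocation) system is a
mechanism that enables an object on one Java virtual machine to invoke methods on an object in
another Java virtual machine. Any object whose methods can be invoked in this way must implement
the java.rmi.Remote interface.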

25. What is meant by client server communication?


In a client-server network, a central computer, known as a server, hosts data and other
resources. Clients such as laptops and desktop computers contact the server and request to
use its data or share its other resources.

26. What is meant by group communication?


• A group is a collection of processes that act together in some system-specified or user-specified way.
• The key property that all groups have is that when a message is sent to the group itself, all the
members of the group receive it.
• It is a form of one-to-many communication.

27. What is remote method invocation?

Remote Method Invocation (RMI) is an API which allows an object to invoke a method on an object
that exists in another address space, which could be on the same machine or on a remote machine.
Through RMI, an object running in a JVM on one computer (client side) can invoke methods on
an object present in another JVM (Server side). RMI creates a public remote server object that
enables client and server side communications through simple method calls on the server object.

28. What is marshalling and unmarshalling?


Marshalling is the process of taking a collection of data items and assembling them into a form
suitable for transmission in a message. Unmarshalling is the process of disassembling them on arrival
to produce an equivalent collection of data items at the destination.
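
The two steps can be sketched with Python's standard struct module. The record layout below (a 32-bit id, a 64-bit float and a length-prefixed UTF-8 name) and the function names marshal and unmarshal are hypothetical, chosen only to illustrate the idea.

```python
import struct

# Hypothetical wire format: unsigned 32-bit id, 64-bit float balance,
# unsigned 16-bit name length, then the UTF-8 bytes of the name.
# "!" selects network (big-endian) byte order with no padding.
HEADER = "!IdH"

def marshal(account_id: int, balance: float, name: str) -> bytes:
    """Assemble the data items into a single byte string for transmission."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, account_id, balance, len(encoded)) + encoded

def unmarshal(message: bytes):
    """Disassemble a received byte string into the original data items."""
    header_size = struct.calcsize(HEADER)
    account_id, balance, length = struct.unpack(HEADER, message[:header_size])
    name = message[header_size:header_size + length].decode("utf-8")
    return account_id, balance, name
```

Fixing the byte order in the format string is what lets machines with different native representations agree on the message contents.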

29. What is the concept of remote procedure call?


The concept of remote procedure call (RPC) represents a major intellectual breakthrough in
distributed computing, with the goal of making the programming of distributed systems look similar,
if not identical, to conventional programming – that is, achieving a high level of distribution
transparency.

30. Define multicast communication.


Multicast is group communication where information is addressed to a group of destination
computers simultaneously.

UNIT III

31. Distinguish between physical clock and logical clock.


Physical clock
A physical clock is a physical process coupled with a method of measuring that process to
record the passage of time. For instance, the rotation of the Earth measured in solar days is a physical
clock. Most physical clocks are based on cyclic processes (such as a celestial rotation).

Logical clock

A logical clock is a mechanism for capturing the chronological and causal ordering of events in
a distributed system without reference to physical time. Logical clocks are useful in computation
analysis, distributed algorithm design, individual event tracking, and exploring computational
progress.
Some noteworthy logical clock algorithms are:
• Lamport timestamps, which are monotonically increasing software counters.
• Vector clocks, which allow for partial ordering of events in a distributed system.
• Version vectors, which order replicas according to updates in an optimistic replicated system.
• Matrix clocks, an extension of vector clocks that also contains information about other
processes' views of the system.
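
A Lamport timestamp counter can be sketched in a few lines. The class and method names below are illustrative assumptions; the update rules (increment before each local event or send, and take the maximum plus one on receipt) are the standard ones.

```python
class LamportClock:
    """A monotonically increasing software counter for one process."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        # A local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event too; the returned value travels with the message.
        return self.tick()

    def receive(self, message_time: int) -> int:
        # Jump past both the local counter and the sender's timestamp.
        self.time = max(self.time, message_time) + 1
        return self.time
```

If p1 performs one local event and then sends, the message carries timestamp 2, and a receiver whose counter was still 0 moves to 3, so the receive is ordered after the send.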

32. What is meant by directory services?

The directory service provides a mapping between text names for files and their UFIDs (unique
file identifiers). A client may obtain the UFID of a file by quoting its text name to the directory
service. The directory service provides the functions needed to generate directories, to add new
file names to directories and to obtain UFIDs from directories. It is a client of the flat file
service; its directories are stored in files of the flat file service. When a hierarchic file-naming
scheme is adopted, as in UNIX, directories hold references to other directories.

33. What is file replication?


In a file service that supports replication, a file may be represented by several copies of its contents at
different locations. This has two benefits: it enables multiple servers to share the load of providing a
service to clients accessing the same set of files, enhancing the scalability of the service, and it
enhances fault tolerance by enabling clients to locate another server that holds a copy of the file when
one has failed. Few file services support replication fully, but most support the caching of files or
portions of files locally, a limited form of replication.

34. Define indirect communication


Indirect communication is defined as communication between entities in a distributed system through
an intermediary with no direct coupling between the sender and the receiver(s).

35. Mention the Applications of publish-subscribe systems.

• financial information systems;
• other areas with live feeds of real-time data (including RSS feeds);
• support for cooperative working, where a number of participants need to be
informed of events of shared interest;
• support for ubiquitous computing, including the management of events emanating
from the ubiquitous infrastructure (for example, location events);
• a broad set of monitoring applications, including network monitoring in the Internet.

36. What is a tuple space?


Tuple space is an implementation of the associative memory paradigm for parallel/distributed
computing. It provides a repository of tuples that can be accessed concurrently.
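
A minimal in-memory sketch of the idea follows. The class TupleSpace and its write, read and take operations are illustrative names, and using None as a wildcard in a pattern is an assumption of this example; real tuple-space systems typically also offer blocking variants, which are omitted here.

```python
import threading

class TupleSpace:
    """A shared repository of tuples accessed by pattern (associative lookup)."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()  # allows safe concurrent access

    def write(self, tup):
        with self._lock:
            self._tuples.append(tuple(tup))

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern matches any value in that position.
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup))

    def read(self, pattern):
        """Return a matching tuple without removing it, or None."""
        with self._lock:
            for tup in self._tuples:
                if self._matches(pattern, tup):
                    return tup
        return None

    def take(self, pattern):
        """Remove and return a matching tuple, or None."""
        with self._lock:
            for index, tup in enumerate(self._tuples):
                if self._matches(pattern, tup):
                    return self._tuples.pop(index)
        return None
```

Tuples are located by their content rather than by an address, which is what "associative memory" means in the definition above.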

37. What is an IDL?


An interface description language, or interface definition language (IDL), is a specification language
used to describe a software component's application programming interface (API).

38. What is stateful server and stateless server?


A stateful server maintains information about all clients that are utilizing the
server to access a file.
A stateless server maintains no client information. Every request from a client must
therefore be self-contained, including specific information such as the file name and the operation.
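
The contrast can be sketched with two toy file servers. The class names, the in-memory files dictionary and the use of a byte offset in the stateless request are assumptions made for illustration only.

```python
class StatefulFileServer:
    """Remembers, per client, which file is open and the current read position."""

    def __init__(self, files):
        self.files = files
        self.sessions = {}  # client id -> [file name, position]

    def open(self, client, name):
        self.sessions[client] = [name, 0]

    def read(self, client, count):
        name, position = self.sessions[client]
        data = self.files[name][position:position + count]
        self.sessions[client][1] = position + len(data)
        return data


class StatelessFileServer:
    """Keeps no client information; each request carries everything needed."""

    def __init__(self, files):
        self.files = files

    def read(self, name, offset, count):
        return self.files[name][offset:offset + count]
```

Because the stateless server holds nothing between requests, it can be restarted after a crash without clients losing their place; the cost is that every request is longer.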

39. What is LDAP?

LDAP (Lightweight Directory Access Protocol) is a protocol that runs over TCP/IP. The LDAP
standard includes low-level network protocol definitions plus data representations and handling
functionality. A directory that is accessible through LDAP is commonly referred to as an LDAP directory.

40. Describe the characteristics of peer to peer system

1. Each computer contributes resources


2. All the nodes have the same functional capabilities and responsibilities
3. No centrally administered system
4. Offers a limited degree of anonymity

41. What is Tapestry?

Tapestry is a peer-to-peer overlay network which provides a distributed hash table, routing, and
multicasting infrastructure for distributed applications. The Tapestry peer-to-peer system offers
efficient, scalable, self-repairing, location-aware routing to nearby resources.

42. Give the types of routing overlays.

• Pastry
• Tapestry

43. What is peer to peer system?

A peer-to-peer (P2P) network is created when two or more PCs are connected and share resources
without going through a separate server computer. A P2P network can be an ad hoc connection, such as
a couple of computers connected via Universal Serial Bus (USB) to transfer files.

44. What are the main tasks of the Routing Overlay?


• Routing of requests to objects: A client wishing to perform some action on a particular object
must send that request, with the GUID attached, through the routing overlay.
• Insertion of objects: A node wishing to insert a new object must compute a new GUID for
that object and announce it to the routing overlay so that the object is available to all nodes.
• Deletion of objects: When an object is deleted, the routing overlay must make it unavailable
to other clients.
• Node addition and removal: Nodes may join and leave the service at will. The routing
overlay must arrange for new nodes to take over some of the responsibilities of other
(hopefully nearby) nodes. When a node leaves, the routing overlay must distribute its
responsibilities to the remaining nodes.
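
The four tasks can be illustrated with a deliberately simplified, single-process simulation. Everything here is an assumption made for the sketch: GUIDs are derived with SHA-1, node identifiers are plain integers, and the node "responsible" for a GUID is simply the one numerically closest to it. Real overlays such as Pastry and Tapestry route hop by hop using prefix tables rather than a global view.

```python
import hashlib

def guid(data: bytes) -> int:
    """Derive a GUID from an object's contents (SHA-1 is an assumed choice)."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big")

class RoutingOverlay:
    def __init__(self, node_ids):
        # Each node stores the objects it is responsible for: {guid: data}.
        self.nodes = {node_id: {} for node_id in node_ids}

    def _responsible(self, g):
        # Simplification: the numerically closest node owns the GUID.
        return min(self.nodes, key=lambda node_id: abs(node_id - g))

    def insert(self, data):
        g = guid(data)
        self.nodes[self._responsible(g)][g] = data
        return g

    def route(self, g):
        """Deliver a request for GUID g to the responsible node."""
        return self.nodes[self._responsible(g)].get(g)

    def delete(self, g):
        self.nodes[self._responsible(g)].pop(g, None)

    def remove_node(self, node_id):
        # Hand the departing node's objects to the remaining nodes.
        orphaned = self.nodes.pop(node_id)
        for g, data in orphaned.items():
            self.nodes[self._responsible(g)][g] = data
```

The client only ever names the GUID; which node actually holds the object is hidden by the overlay and can change as nodes leave.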

45. What are Naming Services?


In a distributed system, a naming service is a specific service whose aim is to provide consistent
and uniform naming of resources, thus allowing other programs or services to locate them and
obtain the metadata required for interacting with them.

UNIT IV

46. How will you make use of name space and DNS?


The naming system on which DNS is based is a hierarchical and logical tree structure called the
domain namespace. Organizations can also create private networks that are not visible on the
Internet, using their own domain namespaces. The figure shows part of the Internet domain namespace,
from the root domain and top-level Internet DNS domains, to the fictional DNS domain named
reskit.com that contains a host (computer) named Mfgserver.

47. Define nested transactions.


Several transactions may be started from within a transaction, allowing transactions to be
regarded as modules that can be composed as required.
The outermost transaction in a set of nested transactions is called the top-level transaction.
Transactions other than the top-level transaction are called subtransactions. For example, in the figure, T
is a top-level transaction that starts a pair of subtransactions, T1 and T2. The subtransaction T1
starts its own pair of subtransactions, T11 and T12. Also, subtransaction T2 starts its own
subtransaction, T21, which starts another subtransaction, T211.

48. What is clock drift rate?


• It is the rate at which a computer clock deviates from a perfect reference clock.
• The crystal-based clocks used in computers are subject to clock drift, which means that they
count time at different rates, and so diverge.
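
A short worked calculation shows what a drift rate means in practice. The rate of 1e-6 seconds per second used in the note below is an assumed, illustrative figure for a quartz crystal, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def accumulated_drift(drift_rate: float, elapsed_seconds: float) -> float:
    """Worst-case deviation from a perfect clock after elapsed_seconds."""
    return drift_rate * elapsed_seconds

def max_divergence(drift_rate: float, elapsed_seconds: float) -> float:
    """Two clocks drifting in opposite directions diverge twice as fast."""
    return 2 * accumulated_drift(drift_rate, elapsed_seconds)
```

With the assumed rate of 1e-6, accumulated_drift(1e-6, SECONDS_PER_DAY) is about 0.0864 seconds, so a single clock may be off by roughly a tenth of a second per day, and two such clocks may disagree by up to twice that.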

49. Define distributed mutual exclusion.

Mutual exclusion means that each process accessing the shared data excludes all others from doing
so simultaneously. In distributed mutual exclusion, this is achieved solely by message passing.

50. Why is clock synchronization necessary?

It is necessary to coordinate independent clocks. Even when initially set accurately, real clocks will
differ after some amount of time due to clock drift, caused by clocks counting time at slightly
different rates. There are several problems that occur as a result of clock rate differences and several
solutions, some being more appropriate than others in certain contexts.

51. What is file sharing?


File sharing is the practice of sharing or offering access to digital information or resources, including
documents, multimedia (audio/video), graphics, computer programs, images and e-books. It is the
private or public distribution of data or resources in a network with different levels of sharing
privileges.

File sharing can be done using several methods. The most common techniques for file storage,
distribution and transmission include the following:

• Removable storage devices
• Centralized file hosting server installations on networks
• World Wide Web-oriented hyperlinked documents
• Distributed peer-to-peer networks

52. Define happened-before relation.

The happened-before relation (denoted →) is a relation between the results of two
events, such that if one event should happen before another event, the result must reflect that, even if
those events are in reality executed out of order (usually to optimize program flow).
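
One common way to test the relation in practice is to compare vector timestamps, which is an assumption of this sketch rather than something the definition above prescribes. The function names are illustrative; each timestamp is a tuple with one counter per process.

```python
def happened_before(a, b) -> bool:
    """True if vector timestamp a precedes b: a <= b component-wise and a != b."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def concurrent(a, b) -> bool:
    """True if neither event happened before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

For example, (1, 0) precedes (2, 1), whereas (2, 0) and (0, 1) are concurrent because each is ahead in one component. This is why the relation is only a partial order.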

53. Define clock skew.


Computer clocks tend not to be in perfect agreement. The instantaneous difference between the
readings of any two clocks is called clock skew.

54. Define cut, consistent cut and inconsistent cut.


A cut of the system's execution is a subset of its global history that is a union of prefixes of
process histories:

C = h_1^{c_1} ∪ h_2^{c_2} ∪ ... ∪ h_N^{c_N}

where h_i^{c_i} denotes a prefix of the history of process p_i.

A cut C is consistent if, for each event it contains, it also contains all the events that
happened-before that event:

for all events e in C, f → e implies f in C

A cut that does not satisfy this condition is inconsistent.

Consider the events occurring at processes p1 and p2 shown in the figure. The figure shows two cuts,
one with frontier <e_1^0, e_2^0> and another with frontier <e_1^2, e_2^2>. The leftmost cut is inconsistent. This is
because at p2 it includes the receipt of the message m1, but at p1 it does not include the sending of
that message.
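
The consistency condition can be checked mechanically. In the sketch below, a cut is given as the number of events included from each process, and each message is recorded as a (send event, receive event) pair. The representation, the 1-based event indices and the sample message are assumptions for illustration, not the exact events of the figure. Because a cut is a union of prefixes, local ordering is respected automatically, so only message edges need checking.

```python
def is_consistent(cut, messages) -> bool:
    """Return True if no message is received inside the cut but sent outside it.

    cut: {process: number of its events included in the cut}
    messages: list of ((sender, send_index), (receiver, receive_index)),
              where indices count events from 1 within each process.
    """
    for (sender, send_index), (receiver, receive_index) in messages:
        received_in_cut = receive_index <= cut[receiver]
        sent_in_cut = send_index <= cut[sender]
        if received_in_cut and not sent_in_cut:
            return False
    return True
```

With a hypothetical message m1 sent as the second event of p1 and received as the first event of p2, the cut {"p1": 1, "p2": 1} is inconsistent (receipt without sending), while {"p1": 3, "p2": 3} is consistent.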

55. Define External synchronization.


In order to know at what time of day events occur at the processes in a distributed system, it
is necessary to synchronize the processes’ clocks, Ci, with an authoritative, external source of time.
This is external synchronization.

56. Define internal synchronization.


If the clocks Ci are synchronized with one another to a known degree of accuracy, then we
can measure the interval between two events occurring at different computers by appealing to their
local clocks, even though they are not necessarily synchronized to an external source of time. This is
internal synchronization.

57. When is an object considered to be garbage?


An object is considered to be garbage if there are no longer any references to it anywhere in
the distributed system. The memory taken up by that object can be reclaimed once it is known to be
garbage.

58. What is distributed deadlock?


A distributed deadlock occurs when each of a collection of processes waits for another process to
send it a message, and where there is a cycle in the graph of this ‘waits-for’ relationship.
The figure shows that processes p1 and p2 are each waiting for a message from the other, so this system
will never make progress.
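
Detecting such a cycle is straightforward when the whole ‘waits-for’ graph is available in one place, which is an assumption of this sketch: in a real distributed system the graph is spread across sites. The function name and the dictionary representation are illustrative.

```python
def has_cycle(waits_for) -> bool:
    """Depth-first search for a cycle in a waits-for graph.

    waits_for maps each process to the collection of processes it waits on.
    """
    UNVISITED, IN_PROGRESS, DONE = 0, 1, 2
    state = {}

    def visit(node) -> bool:
        state[node] = IN_PROGRESS
        for successor in waits_for.get(node, ()):
            successor_state = state.get(successor, UNVISITED)
            if successor_state == IN_PROGRESS:
                return True  # reached a node on the current path: a cycle
            if successor_state == UNVISITED and visit(successor):
                return True
        state[node] = DONE
        return False

    return any(state.get(node, UNVISITED) == UNVISITED and visit(node)
               for node in list(waits_for))
```

For the two-process situation described above, has_cycle({"p1": ["p2"], "p2": ["p1"]}) is true, so neither process can ever proceed.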

59. Define strict two phase locking.


A transaction that needs to read or write an object must be delayed until other transactions that wrote
the same object have committed or aborted. To enforce this rule, any locks applied during the
progress of a transaction are held until the transaction commits or aborts. This is called strict two
phase locking. The presence of the locks prevents other transactions reading or writing the objects.

60. Define Edge chasing.


A distributed approach to deadlock detection uses a technique called edge chasing or path pushing. In
this approach, the global wait-for graph is not constructed, but each of the servers involved has
knowledge about some of its edges.
The servers attempt to find cycles by forwarding messages called probes, which follow the edges of
the graph throughout the distributed system. A probe message consists of transaction wait-for
relationships representing a path in the global wait-for graph.
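
The probe-forwarding idea can be sketched as a single-process simulation. The assumptions are stated plainly: each server knows only its own local edges, each transaction waits for at most one other, and the server and transaction names are hypothetical. A real implementation would send the probe as a network message between servers rather than loop in one function.

```python
def chase_edges(edges_by_server, start):
    """Forward a probe along wait-for edges, starting from transaction `start`.

    edges_by_server: {server: {waiting_transaction: blocking_transaction}}
    Returns the cycle found as a list of transactions, or None.
    """
    def local_edge(transaction):
        # Only the server that holds the edge for this transaction can answer.
        for edges in edges_by_server.values():
            if transaction in edges:
                return edges[transaction]
        return None

    probe = [start]  # the path carried by the probe message
    while True:
        blocker = local_edge(probe[-1])
        if blocker is None:
            return None  # the path ends, so no deadlock was found
        if blocker in probe:
            # The probe reached a transaction already on its path: a cycle.
            return probe[probe.index(blocker):] + [blocker]
        probe.append(blocker)
```

No server ever sees the global graph; the cycle is discovered only because the probe accumulates the path as it travels.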

UNIT V

61. What is load balancing approach?


Load balancing in a grid is a technique that distributes workloads across multiple computing
nodes to achieve optimal resource utilization, minimize time delay, maximize throughput and avoid
overload.

62. Define the term thread.


A thread is a minimal software processor in whose context a series of instructions can be executed. Saving a
thread context implies stopping the current execution and saving all the data needed to continue the
execution at a later stage.

63. What is thrashing?


Thrashing is said to occur where the DSM runtime spends an inordinate amount of time invalidating
and transferring shared data compared with the time spent by application processes doing useful
work. It occurs when several processes compete for the same data item, or for falsely shared data
items.

64. What is threshold policy?


This policy selects a random node, checks whether the node is able to receive the process and then
transfers the process. If the node rejects it, another node is selected randomly. This continues until
the probe limit is reached.
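
The probing loop can be sketched as follows. The function name, the load table and the acceptance test (a node accepts if its load is below the threshold) are assumptions for the example; the notes only say that the node must be "able to receive the process".

```python
import random

def select_node(nodes, load, threshold, probe_limit, rng=random):
    """Probe randomly chosen nodes until one accepts or the limit is reached.

    Returns the accepting node, or None if every probe was rejected
    (in which case the process would stay where it is).
    """
    for _ in range(probe_limit):
        candidate = rng.choice(nodes)
        if load[candidate] < threshold:  # assumed acceptance rule
            return candidate
    return None
```

The probe limit bounds the cost of searching: when most nodes are busy, the sender gives up after a fixed number of attempts instead of polling indefinitely.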

65. What is thread pool?


A thread pool consists of a number of threads that wait in a pool for work. It is slightly faster to
service a request with an existing thread than to create a new thread. A pool also allows the number
of threads in the application to be bounded by the size of the pool.
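
Python's standard concurrent.futures module provides a ready-made pool, which the sketch below uses. The request handler and the pool size of 4 are placeholders chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(n: int) -> int:
    # Placeholder for real request processing.
    return n * n

def serve(requests, pool_size: int = 4):
    """Service all requests using at most pool_size worker threads."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map hands each request to an idle worker and keeps results in order.
        return list(pool.map(handle_request, requests))
```

However many requests arrive, no more than pool_size threads are ever created; excess requests simply wait in the executor's queue until a worker is free.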

66. Write any two advantages of Process Migration


• Load sharing
• Communications performance
• Availability
• Utilizing special capabilities

67. List the issues in designing load balancing algorithms


• Load estimation policy
• Process transfer policy
• Location policy
o Threshold method
o Bidding method
o Pairing
• State information exchange policy

68. Write any two advantages of Process Migration


• Load sharing
• Communications performance
• Availability
• Utilizing special capabilities

69. What is process migration?


• Process migration is a specialized form of process management whereby processes are
moved from one computing environment to another.
• This originated in distributed computing, but is now used more widely.
• Process migration happens as a standard part of process scheduling, and it is quite easy to
migrate a process within a given machine, since most resources (memory, files, sockets) do
not need to be changed, only the execution context.

70. What is task assignment approach?

In task assignment approach, each process submitted by a user for processing is viewed as a
collection of related tasks and these tasks are scheduled to suitable nodes so as to improve
performance.

71. What are the sub activities involved in process migration?


1. Selection of process that should be migrated
2. Selection of destination node to which the selected process should be migrated
3. Actual transfer of the selected process to the destination node
Steps 1 and 2 constitute the process migration policy, and step 3 the process migration
mechanism.

72. What are types of process migration?


There are two types of process migration
• Non-preemptive process migration
• Preemptive process migration

Non-preemptive process migration
Process migration that takes place before execution of the process starts. This type of process
migration is relatively cheap, since relatively little administrative overhead is involved.

Preemptive process migration
Process migration whereby a process is preempted, migrated and continues processing in a different
execution environment. This type of process migration is relatively expensive, since it involves
recording, migration and recreation of the process state as well as the reconstruction of any
interprocess communication channels to which the migrating process is connected.

73. List out the thread packages.


There are two thread packages:
• Static thread: The choice of how many threads there will be is made when the program is written or
when it is compiled. Each thread is allocated a fixed stack. This approach is simple, but inflexible.
• Dynamic thread: It allows threads to be created and destroyed on-the-fly during execution.

74. Define resource management


Resource management is the control of the assignment of resources to processes, carried out by a
resource manager. Resources can be logical (for example, a shared file) or physical (for example, a CPU).

75. What are the types of resource management?


There are three types
• Task assignment approach
• Load-balancing approach
• Load-sharing approach