Distribuidos Chapter I Introduction PDF

Chapter I: Introduction to
Distributed Systems
Distributed Systems
Systems Engineering Program
Universidad Politécnica Salesiana
Based on Distributed Systems of UPM
Original author: Sergio Arévalo
Rodrigo Tufiño
May 2016
Contents
1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model
Bibliography
Introduction to Reliable Distributed Programming. Rachid Guerraoui, Luís Rodrigues. Springer-Verlag, 2006. Chapters 1 and 2.

Motivation
• Distributed computing is about algorithms for a set of processes that cooperate.
• Moreover, some of the processes of a distributed algorithm might stop by crashing while others stay alive and keep operating.
• This possibility of partial failure differentiates a distributed system from a concurrent system.

Motivation (cont.)
• The challenge is for the processes that are still alive.
• They must continue cooperating with one another in a consistent way in spite of the failure of the other processes.
• The process cooperation must tolerate failures.
• Communication asynchrony and communication link failures make this cooperation very difficult.

Process cooperation
Motivation
• The most common form of process cooperation is client-server: a client sends requests to a server.
• Tolerating failures would mean that:
  • If the server fails, the client should send the request to another server.
  • If some clients fail, the server should continue offering services to the other clients.

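The first rule above (ask another server when one fails) can be sketched in a few lines of Python. This is only an illustration: servers are modelled as plain functions, and the names `ServerDown`, `request_with_failover`, `crashed_server` and `healthy_server` are hypothetical, not part of the course material.

```python
from typing import Callable, Sequence


class ServerDown(Exception):
    """Raised by a (hypothetical) server stub that has crashed."""


def request_with_failover(servers: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each server in turn and return the first successful reply."""
    for server in servers:
        try:
            return server(request)
        except ServerDown:
            continue  # this server failed; send the request to the next one
    raise ServerDown("no server answered the request")


def crashed_server(request: str) -> str:
    raise ServerDown("server crashed")


def healthy_server(request: str) -> str:
    return f"reply to {request}"


print(request_with_failover([crashed_server, healthy_server], "get file A"))
# prints: reply to get file A
```

The sketch assumes a crash is visible to the client as an exception. In a real asynchronous network the client usually cannot tell a crashed server from a slow one, which is one reason this kind of cooperation is hard.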
Multiparty
Motivation
• Another form of cooperation: multiparty interaction, or peer-to-peer interaction.
[Figure: peers P1, P2, P3 and P4 exchanging requests such as "get file A" and "get part 3 of file A"]
• Tolerating failures in this kind of interaction is more complex.

Uncertainties
Motivation
• Distributed computing means that processes might execute on different physical nodes.
• This implies two more uncertainties:
  • Processes might not share the same clock.
  • Processes might not share the same memory.

Clock
Motivation
If processes don't share a clock, they cannot easily time-order the events of the system.
W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, "Don't Settle for Eventual Consistency," Commun. ACM, vol. 57, no. 5, pp. 61–68, 2014.

Sharing memory
Motivation
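• Without shared memory there is no instantaneous global state.
[Figure: timelines of a Client and Servers 1 to 3. The client issues get global state() twice; the servers pass through states s1, s2, s3 and later s1', s2', s3', and a message m1 appears on Server 1's timeline.]
• State (s1, s2, s3) is not possible: it would have to be observed instantaneously.
• State (s1', s2', s3') is possible, but there are problems with m1.
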
Distributed abstractions
• To understand distributed systems we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• We will abstract the underlying physical system: basic abstractions.
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.

Basic abstractions
• Processes: abstract the active entities that perform computations (a computer, a processor, a thread of execution).
• Links: abstract the physical and logical network that supports communication among processes.

Application abstractions
• Reliable and efficient communication: Because there are failures and periods of asynchrony, abstractions that provide reliable and efficient links are needed.
• Logical clocks: Because there is no global clock, an abstraction to time-order the events of the distributed system is needed.

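One well-known way to realize a logical clock is Lamport's scheme: each process keeps a counter, increments it on every local event, attaches it to outgoing messages, and on receipt jumps past the timestamp it received. The slides do not say which logical clock the course uses, so the sketch below is an assumption, and the class and method names are hypothetical.

```python
class LamportClock:
    """Per-process logical clock (Lamport's scheme, used here as an example)."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Local event: advance the clock by one."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is an event; the returned value travels with the message."""
        return self.tick()

    def receive(self, message_time: int) -> int:
        """Receiving: move past both the local clock and the message timestamp."""
        self.time = max(self.time, message_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                  # p is at 1
timestamp = p.send()      # p is at 2; the message carries 2
print(q.receive(timestamp))
# prints: 3
```

The receive event gets a larger timestamp than the send event that caused it, which is the time-ordering the abstraction is meant to provide without any shared physical clock.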
Application abstractions (cont.)
• Distributed global states: Because there is no global state, an abstraction to obtain a consistent distributed global state is needed.
• Multicast primitives: Because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction, offering different qualities of service, is needed to communicate with groups of processes.

• Shared memory: Because there is no shared physical memory among processes, an abstraction that allows processes to share memory is needed.
• Consensus: In some applications a group of processes needs to reach consensus on some value in order to advance in their computation, so an abstraction to reach this consensus is needed.

• Failure detectors: System asynchrony creates uncertainty about which processes have failed, so an abstraction to detect failures is needed.
• Atomic commitment: A group of processes needs to execute some step only if all of them agree to do it; otherwise the step is not done. An abstraction to carry out this commitment is needed.

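The decision rule behind atomic commitment (do the step only if everyone agrees) can be written down directly. The sketch below shows only that rule, not a full protocol: collecting the votes over unreliable links is the hard part and is omitted. The function name and the use of `None` for a vote that never arrived are assumptions made for illustration.

```python
from typing import Optional, Sequence


def atomic_commit_decision(votes: Sequence[Optional[bool]]) -> str:
    """Return "commit" only if every participant voted yes.

    A vote of None stands for a participant whose vote never arrived
    (for example because it crashed); it is treated like a "no" vote.
    """
    if not votes:
        raise ValueError("at least one participant is required")
    return "commit" if all(vote is True for vote in votes) else "abort"


print(atomic_commit_decision([True, True, True]))   # prints: commit
print(atomic_commit_decision([True, None, True]))   # prints: abort
```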
• Leader election: A group of processes needs to elect a leader from among themselves when a previous leader fails, so an abstraction to elect a leader is needed.

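A simple election rule, given here only as a sketch, is to pick the process with the highest identifier among those not currently suspected of having failed. The rule itself, the function name `elect_leader` and the integer identifiers are assumptions; the slides do not prescribe a particular algorithm.

```python
from typing import Iterable


def elect_leader(processes: Iterable[int], suspected: Iterable[int]) -> int:
    """Pick the highest process identifier that is not suspected to have failed."""
    candidates = set(processes) - set(suspected)
    if not candidates:
        raise ValueError("every process is suspected; no leader can be elected")
    return max(candidates)


print(elect_leader([1, 2, 3, 4], suspected=[]))    # prints: 4
print(elect_leader([1, 2, 3, 4], suspected=[4]))   # prints: 3
```

All processes reach the same result only if they agree on the set of suspected processes, which is why leader election is usually built on top of a failure detector.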
Examples of distributed applications
• Information dissemination
• Process control applications
• Cooperative work
• Distributed databases
• Highly available services

Information dissemination
Examples of distributed applications
• Processes may produce information: publishers.
• Processes may consume information: subscribers.
• This is also called the publish-subscribe paradigm.
• If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed.
• An example is an RSS news channel.

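The publish-subscribe pattern can be illustrated with a small in-process sketch. It runs inside a single process and tolerates no failures, so it shows only the interaction pattern, not the reliable multicast a distributed version would need. The `Broker` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Broker:
    """In-memory stand-in for a notification service (illustration only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register a subscriber's callback for a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, notification: str) -> int:
        """Hand the notification to every subscriber of the topic; return how many."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(notification)
        return len(callbacks)


broker = Broker()
inbox_a: List[str] = []
inbox_b: List[str] = []
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
print(broker.publish("news", "new article"))
# prints: 2
```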
Process control applications
• Software processes must control the execution of a physical activity.
• They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, ...
• Some of the processes typically have a sensor attached. To tolerate process failures, a group of processes may reach consensus on their input sensor values in order to offer a reliable output value.

Cooperative work
• Internet users may cooperate in building common software or a common document, or in setting up a distributed dialogue.
• They can use a shared-space abstraction with read and write operations on it.
• This abstraction can be a distributed shared memory or a distributed file service.
• To maintain a consistent view of the shared space, processes must agree on the order of operations.

Distributed database
• In distributed systems several transaction managers might cooperate to service each transaction.
• When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
• A transaction manager might decide to abort the transaction if it detects a violation of database integrity, a deadlock, a disk error, etc.

Highly available services
• High availability is achieved using the state-machine replication approach.
• Several processes (replicas) execute the same code on different nodes (with independent probabilities of failure).
• They receive the same inputs (messages) in the same order through a total-order multicast.
• All replicas go through the same states if they run the same deterministic code.
• If one replica fails, the service is not interrupted, because the other replicas continue offering it.

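The core argument of state-machine replication (same deterministic code plus same inputs in the same order gives the same state) can be checked with a toy example. The `CounterReplica` class and its two commands are hypothetical; the total-order multicast is replaced by a shared Python list.

```python
from typing import List, Tuple

Command = Tuple[str, int]


class CounterReplica:
    """A deterministic state machine whose state is a single integer."""

    def __init__(self) -> None:
        self.value = 0

    def apply(self, command: Command) -> None:
        operation, amount = command
        if operation == "add":
            self.value += amount
        elif operation == "mul":
            self.value *= amount
        else:
            raise ValueError(f"unknown operation: {operation}")


def run(commands: List[Command]) -> int:
    """Feed a command sequence to a fresh replica and return its final state."""
    replica = CounterReplica()
    for command in commands:
        replica.apply(command)
    return replica.value


log: List[Command] = [("add", 2), ("mul", 5), ("add", 1)]
print([run(log) for _ in range(3)])   # prints: [11, 11, 11]
print(run(list(reversed(log))))       # prints: 7
```

Three replicas fed the same log end in the same state, while a replica that sees the same commands in a different order diverges. This is why the inputs must be delivered in total order.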
Model
•Distributed Computation
•Process
• Failure modes
•Communication links
•Timing assumptions
Model
Distributed Computation
• Processes are the units of computation.
• The system can be static or dynamic with respect to the set of processes.
• Processes might know the identifiers of the processes in the system (known membership) or not (unknown membership).
• Unless explicitly stated otherwise, it is assumed that the set is static and the membership is known.

Model
• No assumption is made on the mapping of processes to actual processors, processes or threads.
• Processes communicate by exchanging messages, and the messages are uniquely identified by a pair (proc_id, seq_num): the sender's process identifier and a sequence number.
• Messages are exchanged through communication links.
• A distributed algorithm is a collection of distributed automata, one per process.

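Unique message identifiers can be sketched as a pair of the sender's process identifier and a counter kept by that sender. The code below assumes the second component of the identifier is a per-sender sequence number and names it `seq_num`; `MessageId` and `Sender` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class MessageId:
    """Identifier made of the sender's id and a per-sender sequence number."""
    proc_id: int
    seq_num: int


class Sender:
    def __init__(self, proc_id: int) -> None:
        self.proc_id = proc_id
        self._sequence = count(1)  # 1, 2, 3, ... local to this sender

    def new_message_id(self) -> MessageId:
        return MessageId(self.proc_id, next(self._sequence))


p1, p2 = Sender(1), Sender(2)
ids = [p1.new_message_id(), p1.new_message_id(), p2.new_message_id()]
print(len(set(ids)))
# prints: 3
```

Because the dataclass is frozen it is hashable, so a receiver can keep the identifiers it has already seen in a set and discard duplicates.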
Model
• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event) and sending a message (global event).
• Only one process step occurs in the distributed system at a time (virtual global scheduler).
• Some of the events of a step can be "nil" (nothing is done).
• Unless specified otherwise we will consider deterministic algorithms.

Model
Process
• Unless it fails, a process is supposed to execute the algorithm assigned to it.
• The unit of failure is the process (atomic component).
• When it fails, all its components fail as well at the same time.
• Process abstractions differ according to the nature of the failures that are considered.

Model - Process
Failure modes
CRASHES
OMISSIONS
CRASHES & RECOVERY
ARBITRARY
Model - Process
Arbitrary failure mode
• It happens when a process's execution deviates arbitrarily from the algorithm assigned to it.
• It is the most general failure mode.
• A process can produce any output at any time.
• These failures are also called Byzantine failures or malicious failures.
• They are the most expensive to tolerate.

Model - Process
Omissions failure mode
• It happens when a process does not send (or receive) a message it is supposed to send (or receive) according to the algorithm.
• In general these faults are due to buffer overflows or network congestion.
• With omissions a process deviates from the algorithm assigned to it because messages are lost.

Model - Process
Crash failure mode
• It happens when a process stops executing after some time t.
• This is called a crash failure, and the corresponding process abstraction is called crash-stop.
• Algorithms typically assume up to F failures. This means that during the execution the actual number of process crashes will be less than or equal to F.

Model - Process
Crash-recovery failure mode
• In this mode a process can recover after a crash.
• Two options: with stable storage or without it.
• With the crash all the volatile memory is lost, but not the stable storage. After the recovery the stable storage can be read.
• Process classes: permanently up; eventually up; eventually down; permanently up and down.

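The role of stable storage in crash-recovery can be illustrated with a counter that writes its state to a file before acknowledging an update. This is a sketch under simplifying assumptions: a local file stands in for stable storage, discarding the Python object stands in for a crash, and the names `RecoverableCounter` and `state.json` are hypothetical.

```python
import json
import os
import tempfile


class RecoverableCounter:
    """Counter whose value survives a crash because it is kept in a file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.value = self._recover()

    def _recover(self) -> int:
        """Read the last saved value, or start from 0 if nothing was saved."""
        try:
            with open(self.path) as stored:
                return json.load(stored)["value"]
        except FileNotFoundError:
            return 0

    def increment(self) -> None:
        self.value += 1
        temporary = self.path + ".tmp"
        with open(temporary, "w") as out:
            json.dump({"value": self.value}, out)
            out.flush()
            os.fsync(out.fileno())        # push the bytes to the disk
        os.replace(temporary, self.path)  # atomically switch to the new state


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "state.json")
    counter = RecoverableCounter(path)
    counter.increment()
    counter.increment()
    del counter                           # "crash": volatile memory is lost
    print(RecoverableCounter(path).value)
    # prints: 2
```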
Model - Communication links
The link abstraction
• The link is used to represent the network components of the distributed system.
• Unless otherwise stated, every pair of processes is connected by a bidirectional link, providing full connectivity among processes.
• In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.

The link abstraction (cont.)
• Some algorithms do not consider a fully connected system.
• In this case the algorithm should route the messages by itself.
• Messages are uniquely identified.

Link failures
• Links can lose messages (omission) and delay messages (timing).
• A process can retransmit messages if they are lost.
• Using fair-loss links we can implement reliable links.

Link failures (cont.)
• The fair-loss link properties are:
  • Fair loss: If a process p sends an infinite number of messages to process q, then q delivers an infinite number of messages, provided p and q don't crash.
  • Finite duplication: If p sends a message m to q a finite number of times, m cannot be delivered to q an infinite number of times.
  • No creation: If m is delivered, then m was sent.

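How retransmission turns a lossy link into a reliable one can be seen in a small simulation. The sketch assumes each copy of a message is dropped independently with a fixed probability below 1, which is one simple way to satisfy the fair-loss property. The names `FairLossLink`, `reliable_send` and `deliver_once` are hypothetical, and acknowledgements travel over a second lossy link.

```python
import random
from typing import Hashable, List


class FairLossLink:
    """Simulated link that drops each copy with a fixed probability below 1."""

    def __init__(self, loss_probability: float, seed: int) -> None:
        if not 0.0 <= loss_probability < 1.0:
            raise ValueError("loss probability must be in [0, 1)")
        self._loss = loss_probability
        self._rng = random.Random(seed)  # seeded so runs are repeatable

    def send(self, message: object) -> bool:
        """Return True if this copy of the message got through."""
        return self._rng.random() >= self._loss


def reliable_send(message: Hashable, data_link: FairLossLink, ack_link: FairLossLink,
                  receiver_log: List[Hashable], max_attempts: int = 10_000) -> int:
    """Retransmit until an acknowledgement arrives; return the number of attempts."""
    for attempt in range(1, max_attempts + 1):
        if data_link.send(message):
            receiver_log.append(message)        # may be a duplicate copy
            if ack_link.send(("ack", message)):
                return attempt                  # sender learns it can stop
    raise RuntimeError("gave up after too many retransmissions")


def deliver_once(received: List[Hashable]) -> List[Hashable]:
    """Receiver side: drop duplicates, keeping the first copy of each message."""
    seen = set()
    delivered = []
    for message in received:
        if message not in seen:
            seen.add(message)
            delivered.append(message)
    return delivered


log: List[Hashable] = []
attempts = reliable_send("m1", FairLossLink(0.5, seed=1), FairLossLink(0.5, seed=2), log)
print(attempts, len(log), deliver_once(log))
```

Lost acknowledgements make the sender retransmit messages that already arrived, so the receiver may see duplicates. Filtering them with unique message identifiers restores exactly-once delivery to the application.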
Model – Timing assumptions
Types of timing systems
• The lack of a global clock and the uncertainty in the duration of communication delays produce different types of timing systems.
• These timing systems are:
  • Asynchronous
  • Synchronous
  • Partially synchronous

Asynchronous system
Timing assumptions
• Processes: There is no upper bound on processing delays.
• Communication links: There is no upper bound on message transmission delays.
• More realistic, like the Internet.
• Some algorithms are difficult or impossible to build: consensus, atomic broadcast, membership service.

Synchronous system
Timing assumptions
• Processes: There is a known upper bound on processing delays.
• Communication links: There is a known upper bound on message transmission delays.
• Less realistic; found only in real-time systems.
• It is easy to detect process failures reliably.

Partially Synchronous system
Timing assumptions
• Processes: There is an upper bound on processing delays, but it is unknown.
• Communication links: There is an upper bound on message transmission delays, but it is unknown.
• It is realistic.
• It is possible to detect process failures unreliably with adaptive timeouts.
• It is possible to implement consensus, atomic broadcast and membership services.

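Unreliable failure detection with adaptive timeouts can be sketched as follows: a process is suspected when its heartbeat is overdue, and whenever a suspicion turns out to be wrong the timeout is doubled. The doubling policy, the class name `AdaptiveTimeoutDetector` and the explicit `now` parameter (used instead of a real clock so the example is repeatable) are assumptions.

```python
from typing import Dict, Set


class AdaptiveTimeoutDetector:
    """Heartbeat-based failure detector that grows its timeout after mistakes."""

    def __init__(self, initial_timeout: float = 1.0) -> None:
        self.timeout = initial_timeout
        self.last_heartbeat: Dict[str, float] = {}
        self.suspected: Set[str] = set()

    def heartbeat(self, process: str, now: float) -> None:
        """Record a heartbeat; a heartbeat from a suspect reveals a false suspicion."""
        self.last_heartbeat[process] = now
        if process in self.suspected:
            self.suspected.discard(process)
            self.timeout *= 2  # the process was only slow, so wait longer next time

    def check(self, now: float) -> Set[str]:
        """Suspect every process whose last heartbeat is older than the timeout."""
        for process, seen in self.last_heartbeat.items():
            if now - seen > self.timeout:
                self.suspected.add(process)
        return set(self.suspected)


detector = AdaptiveTimeoutDetector(initial_timeout=1.0)
detector.heartbeat("p", now=0.0)
print(detector.check(now=1.5))    # prints: {'p'}
detector.heartbeat("p", now=1.6)  # p was alive after all
print(detector.timeout)           # prints: 2.0
print(detector.check(now=3.0))    # prints: set()
```

Since the real delay bound exists but is unknown, the timeout eventually grows beyond it and false suspicions of live processes stop, while a crashed process stays suspected forever.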
