
Chapter I: Introduction to Distributed Systems
Distributed Systems
Systems Engineering Program
Universidad Politécnica Salesiana
Based on Distributed System of UPM
Original Author: Sergio Arévalo

Rodrigo Tufiño
May 2016
Contents

1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model

Bibliography
Introduction to Reliable Distributed Programming.
Rachid Guerraoui, Luís Rodrigues. Springer-Verlag,
2006. Chapters 1 and 2.
2
Motivation

• Distributed computing deals with algorithms for a set of processes that cooperate.

• Moreover, some of the processes running the distributed algorithm might stop by crashing while others stay alive and keep operating.

• This partial failure is what distinguishes a distributed system from a concurrent system.
3
Motivation (cont.)

• The challenge falls on the processes that are still alive.

• They must keep cooperating with one another consistently despite the failure of the other processes.

• Process cooperation must tolerate failures.

• Communication asynchrony and communication link failures make this cooperation very difficult.

4
Process cooperation
Motivation

•The most common form of process cooperation is client-server.
[Figure: Client and Server]

•Tolerating failures would mean that:
  • If the server fails, the client should send the request to another server.
  • If some clients fail, the server should continue offering service to the other clients.
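
The client side of this failure-tolerance rule can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the `Server` class, the `ServerDown` exception, and `request_with_failover` are hypothetical names, and server crashes are simulated with a flag instead of a real network.

```python
class ServerDown(Exception):
    """Raised by a (simulated) server that has crashed."""


class Server:
    def __init__(self, name, alive=True):
        self.name = name
        self.alive = alive

    def handle(self, request):
        if not self.alive:
            raise ServerDown(self.name)
        return f"{self.name} served {request}"


def request_with_failover(servers, request):
    """Try each server in order and return the first successful reply."""
    for server in servers:
        try:
            return server.handle(request)
        except ServerDown:
            continue  # this server failed: redirect the request to the next one
    raise ServerDown("no server available")


servers = [Server("s1", alive=False), Server("s2")]
reply = request_with_failover(servers, "get file A")
```

Here the first server is down, so the request is answered by the second one. In a real system the client usually cannot tell a crashed server from a slow one, which is why timing assumptions matter.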
5
Multiparty
Motivation

•Another form of cooperation: multiparty interaction or peer-to-peer interaction.
[Figure: peers P1, P2, P3, and P4; "get file A" requests are served as "get part 1 of file A", "get part 2 of file A", and "get part 3 of file A"]

•Tolerating failures in this kind of interaction is more complex.

6
Uncertainties
Motivation

•Distributed computing means that processes might execute on different physical nodes.

•This implies two more uncertainties:
  • Processes might not share the same clock
  • Processes might not share the same memory

7
Clock
Motivation

If processes don’t share a clock, they cannot easily time-order the events of the system.

W. Lloyd, M. J. Freedman, M. Kaminsky, and D. G. Andersen, “Don’t Settle for Eventual Consistency,” Commun. ACM, vol. 57, no. 5, pp. 61–68, 2014.

8
Sharing memory
Motivation
• Without shared memory there is no instantaneous global state.
[Figure: timelines of Server 1, Server 2, Server 3, and Client, with two get global state() calls, states s1, s2, s3 and s1’, s2’, s3’, and message m1]

• State (s1,s2,s3) cannot be obtained (it is instantaneous)
• State (s1’,s2’,s3’) can be obtained, but there are problems with m1
9
Distributed abstractions

• To understand distributed systems we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• We will abstract the underlying physical system: basic abstractions.
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.

10
Basic abstractions
Distributed abstractions

•Processes: abstract the active entities that perform computations (a computer, a processor, a thread of execution).

•Links: abstract the physical and logical network that supports communication among processes.

11
Application abstractions
Distributed abstractions

•Reliable and efficient communication: Because there are failures and periods of asynchrony, abstractions that provide reliable and efficient links are needed.

•Logical clocks: Because there is no global clock, an abstraction to time-order the events of the distributed system is needed.
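
One well-known way to build such a logical clock is Lamport's scalar clock, sketched below in Python. The slides do not name a specific construction, so this choice, the `LamportClock` class, and its method names are assumptions made for illustration.

```python
class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event or send event: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, sender_time):
        """Receive event: jump past the timestamp carried by the message."""
        self.time = max(self.time, sender_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                          # p: local event, p.time becomes 1
sent_at = p.tick()                # p sends a message stamped 2
received_at = q.receive(sent_at)  # q was at 0 and moves to 3
```

The receive event always gets a larger timestamp than the matching send, so timestamps respect causality even though the two processes share no physical clock.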

12
Application abstractions (cont.)
Distributed abstractions

•Distributed global states: Because there is no global state, an abstraction to obtain a consistent distributed global state is needed.

•Multicast primitives: Because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction, offering different qualities of service, is needed to communicate with groups of processes.

13
Application abstractions (cont.)
Distributed abstractions

•Shared memory: Because there is no shared physical memory among processes, an abstraction that allows processes to share memory is needed.

•Consensus: Some groups of application processes need to reach consensus on some value to advance in their computation, so an abstraction to reach this consensus is needed.

14
Application abstractions (cont.)
Distributed abstractions

•Failure detectors: System asynchrony creates uncertainty about which processes have failed, so an abstraction to detect failures is needed.

•Atomic commitment: A group of processes needs to execute some step only if all of them agree to do it; otherwise the step is not done. An abstraction to perform this commitment is needed.
15
Application abstractions (cont.)
Distributed abstractions

•Leader election: A group of processes needs to elect a leader among themselves when a previous leader fails, so an abstraction to elect a leader is needed.
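
As a rough illustration of leader election, the Python sketch below picks the live process with the smallest identifier. The rule and the `elect_leader` function are assumptions for this example. It also assumes that every process sees the same set of crashed processes, which the slides do not state and which requires reliable failure detection.

```python
def elect_leader(processes, crashed):
    """Return the smallest identifier among processes not known to have crashed."""
    alive = [p for p in processes if p not in crashed]
    return min(alive) if alive else None


processes = [1, 2, 3]
leader_before = elect_leader(processes, crashed=set())
leader_after = elect_leader(processes, crashed={1})   # the previous leader failed
```

Because the rule is deterministic, all processes that agree on who has crashed also agree on the new leader.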

16
Examples of distributed applications

•Information dissemination
•Process control applications
•Cooperative work
•Distributed databases
•Highly Available Services

17
Information dissemination
Examples of distributed applications

•Processes that produce information are called publishers.
•Processes that consume information are called subscribers.
•This is also called the publish-subscribe paradigm.
•If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed.
•An example is an RSS news channel.
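
The publish-subscribe pattern can be sketched in a few lines of Python. The in-process `Broker` class below is a hypothetical stand-in for the multicast primitive. It runs in a single process and does not model failures or message loss, so it only shows the shape of the interaction.

```python
from collections import defaultdict


class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, notification):
        # Stand-in for a multicast: every subscriber of the topic is notified.
        for callback in self.subscribers[topic]:
            callback(notification)


inbox_a, inbox_b = [], []
broker = Broker()
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
broker.publish("news", "item 1")
```

Both subscribers receive the same notification, which is the property a reliable multicast must preserve when processes and links can fail.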
18
Process control applications
Examples of distributed applications

•Software processes must control the execution of a physical activity.
•They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, ...
•Some of the processes typically have a sensor attached. To tolerate process failures, a group of processes may reach consensus on their input sensor values in order to offer a reliable output value.
19
Cooperative work
Examples of distributed applications

• Internet users may cooperate in building common software or a common document, or in setting up a distributed dialogue.
• They can use a shared space abstraction with read and write operations on it.
• This abstraction can be a distributed shared memory or a distributed file service.
• To maintain a consistent view of the shared space, processes must agree on the order of operations.
20
Distributed database
Examples of distributed applications

• In distributed systems, several transaction managers might cooperate to service each transaction.
• When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
• A transaction manager might decide to abort the transaction if it detects a violation of database integrity, a deadlock, a disk error, etc.
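
The decision rule behind atomic commitment can be written down directly. The Python sketch below covers only that rule. The `decide` function and the vote dictionaries are hypothetical, and a real protocol must also collect the votes over the network and cope with crashed participants.

```python
def decide(votes):
    """Return 'commit' only if every transaction manager voted yes."""
    return "commit" if votes and all(votes.values()) else "abort"


votes_ok = {"tm1": True, "tm2": True, "tm3": True}
votes_bad = {"tm1": True, "tm2": False, "tm3": True}  # e.g. tm2 detected a deadlock

outcome_ok = decide(votes_ok)
outcome_bad = decide(votes_bad)
```

A single negative vote is enough to abort the transaction for everyone, which matches the "only if all agree" requirement.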

21
Highly available services
Examples of distributed applications

• This is achieved using the state-machine replication approach.
• Several processes (replicas) execute the same code on different nodes (with independent probabilities of failure).
• They receive the same inputs (messages) in the same order through a totally ordered multicast.
• All replicas go through the same states if they run the same deterministic code.
• If one replica fails, the service is not interrupted, because the other replicas continue offering it.
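
The replication idea above can be illustrated with a toy deterministic state machine in Python. The `CounterReplica` class and the `total_order_multicast` function are assumptions for this sketch. The multicast is simulated by a loop that hands every command to every live replica in the same order.

```python
class CounterReplica:
    """A deterministic state machine: same commands, same order, same state."""

    def __init__(self):
        self.value = 0
        self.alive = True

    def apply(self, command):
        op, amount = command
        if op == "add":
            self.value += amount
        elif op == "mul":
            self.value *= amount


def total_order_multicast(replicas, commands):
    # Stand-in for a totally ordered multicast: each live replica
    # receives every command in the same order.
    for command in commands:
        for replica in replicas:
            if replica.alive:
                replica.apply(command)


replicas = [CounterReplica() for _ in range(3)]
total_order_multicast(replicas, [("add", 2), ("mul", 5)])
replicas[0].alive = False                       # one replica crashes
total_order_multicast(replicas, [("add", 1)])
```

The two surviving replicas end in the same state and can keep serving requests. Delivering "mul" before "add" at only one replica would make the states diverge, which is why the order must be total.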

22
Model

•Distributed Computation

•Process
• Failure modes

•Communication links

•Timing assumptions
23
Model
Distributed Computation

•Processes are the units of computation.
•The system can be static or dynamic with respect to the set of processes.
•Processes might know the identifiers of the processes in the system (known membership) or not (unknown membership).
•Unless explicitly stated otherwise, it is assumed that the set is static and the membership is known.

24
Model
Distributed Computation

• No assumption is made on the mapping of processes to actual processors, processes or threads.
• Processes communicate by exchanging messages, and each message is uniquely identified by the pair (proc_id, sec_num): the sender's process identifier and a sequence number.
• Messages are exchanged through communication links.
• A distributed algorithm is a collection of distributed automata, one per process.
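
A minimal Python sketch of how a process could generate such identifiers is shown below. The `Process` class and the dictionary layout of a message are illustrative assumptions. Only the (proc_id, sec_num) pair comes from the slides.

```python
import itertools


class Process:
    def __init__(self, proc_id):
        self.proc_id = proc_id
        self._next = itertools.count(1)   # per-process sequence numbers

    def new_message(self, payload):
        # The pair (proc_id, sec_num) identifies the message system-wide.
        return {"id": (self.proc_id, next(self._next)), "payload": payload}


p1, p2 = Process("p1"), Process("p2")
m1 = p1.new_message("hello")
m2 = p1.new_message("hello")
m3 = p2.new_message("hello")
```

The three messages carry the same payload but different identifiers, so a receiver can tell a retransmission of m1 from the new message m2.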
25
Model
Distributed Computation

• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event), and sending a message (global event).
• Only one process step takes place in the distributed system at a time (virtual global scheduler).
• Some of the events of a step can be “nil” (nothing is done).
• Unless specified otherwise, we will consider deterministic algorithms.
26
Model
Process

•Unless it fails, a process is supposed to execute the algorithm assigned to it.
•The unit of failure is the process (atomic component).
•When it fails, all its components fail as well at the same time.
•Process abstractions differ according to the nature of the failures that are considered.

27
Model - Process
Failure modes

CRASHES

OMISSIONS
CRASHES & RECOVERY
ARBITRARY
28
Model - Process
Arbitrary failure mode

•It happens when a process deviates arbitrarily from the algorithm assigned to it.
•It is the most general failure mode.
•A process can produce any output at any time.
•These failures are also called Byzantine failures or malicious failures.
•They are the most expensive to tolerate.
29
Model - Process
Omissions failure mode

•It happens when a process does not send (or receive) a message it is supposed to send (or receive) according to the algorithm.
•In general, these faults are due to buffer overflows or network congestion.
•With omissions, a process deviates from its assigned algorithm by losing messages.

30
Model - Process
Crash failure mode

•It happens when a process stops executing after some time t.
•This is called a crash failure, and the corresponding process abstraction is called crash-stop.
•Algorithms typically assume up to F failures. This means that during the execution the actual number of process crashes will be less than or equal to F.

31
Model - Process
Crash-recovery failure mode

•In this mode a process can recover after a crash.
•Two options: with stable storage or without it.
•A crash loses all volatile memory but not the stable storage. After recovery the stable storage can be read.
•Processes: permanently up; eventually up; eventually down; permanently up & down
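
The role of stable storage in recovery can be sketched in Python with a file standing in for the stable storage. The `RecoverableCounter` class and the JSON file layout are assumptions for this example. A production implementation would also need to flush writes to disk and replace the file atomically, which the sketch omits.

```python
import json
import os
import tempfile


class RecoverableCounter:
    def __init__(self, path):
        self.path = path      # stable storage (a file)
        self.count = 0        # volatile memory
        self.recover()

    def recover(self):
        """After a restart, reload the last state saved to stable storage."""
        if os.path.exists(self.path):
            with open(self.path) as f:
                self.count = json.load(f)["count"]

    def increment(self):
        self.count += 1
        with open(self.path, "w") as f:
            json.dump({"count": self.count}, f)


path = os.path.join(tempfile.mkdtemp(), "state.json")
process = RecoverableCounter(path)
process.increment()
process.increment()
del process                            # simulated crash: volatile memory is gone
recovered = RecoverableCounter(path)   # recovery reads the stable storage
```

The recovered process resumes from the last saved count. Without the file it would restart from zero and lose everything it had done before the crash.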

32
Model - Communication links
The link abstraction
• The link abstraction represents the network components of the distributed system.

• Unless otherwise stated, every pair of processes is connected by a bidirectional link, providing full connectivity among processes.

• In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.
33
Model - Communication links
The link abstraction (cont.)

•Some algorithms do not assume a fully connected system.

•In this case the algorithm must route the messages itself.

•Messages are uniquely identified.

34
Model - Communication links
Link failures

•Links can lose messages (omission) and delay messages (timing).

•A process can retransmit messages if they are lost.

•Using fair-loss links we can implement reliable links.
35
Model - Communication links
Link failures (cont.)

•The fair-loss link properties are:
  • Fair loss: If a process p sends an infinite number of messages to process q, then q delivers an infinite number of messages, provided neither p nor q crashes.
  • Finite duplication: If p sends a message m to q a finite number of times, then m cannot be delivered to q an infinite number of times.
  • No creation: If m is delivered, then m was sent.
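
The way a more reliable link is built on top of a fair-loss link, by retransmitting and filtering duplicates, can be sketched in Python as follows. The `LossyLink`, `Receiver`, and `send_with_retransmission` names are hypothetical, and the loss pattern (only every third transmission gets through) is a deterministic stand-in for an unpredictable network.

```python
class LossyLink:
    """Simulated fair-loss link: drops all but every third transmission."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.transmissions = 0

    def send(self, message):
        self.transmissions += 1
        if self.transmissions % 3 == 0:
            self.receiver.on_receive(message)


class Receiver:
    def __init__(self):
        self.seen = set()      # identifiers already delivered
        self.delivered = []

    def on_receive(self, message):
        msg_id, payload = message
        if msg_id not in self.seen:   # filter duplicates caused by retransmission
            self.seen.add(msg_id)
            self.delivered.append(payload)


def send_with_retransmission(link, message, attempts):
    # Without acknowledgements the sender cannot know what got through,
    # so it simply retransmits a fixed number of times.
    for _ in range(attempts):
        link.send(message)


receiver = Receiver()
link = LossyLink(receiver)
send_with_retransmission(link, (("p", 1), "hello"), attempts=7)
```

The message crosses the link twice out of seven transmissions, yet the receiver delivers it exactly once thanks to the unique message identifier. An unbounded sender would retransmit forever and rely on the fair-loss property to guarantee eventual delivery.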

36
Model – Timing assumptions
Types of timing systems

•The lack of a global clock and the uncertainty in communication delays give rise to different types of timing systems.

•These timing systems are:
  • Asynchronous
  • Synchronous
  • Partially synchronous
37
Asynchronous system
Timing assumptions

•Processes: There is no upper bound on processing delays.
•Communication links: There is no upper bound on message transmission delays.
•More realistic, like the Internet.
•Some algorithms are difficult or impossible to build: consensus, atomic broadcast, membership service.

38
Synchronous system
Timing assumptions

•Processes: There is a known upper bound on processing delays.
•Communication links: There is a known upper bound on message transmission delays.
•Less realistic. Only real-time systems.
•Process failures are easy to detect reliably.

39
Partially Synchronous system
Timing assumptions
• Processes: There is an upper bound on processing delays, but it is unknown.
• Communication links: There is an upper bound on message transmission delays, but it is unknown.
• It is realistic.
• Process failures can be detected unreliably with adaptive timeouts.
• It is possible to implement consensus, atomic broadcast, and membership services.
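
A failure detector with adaptive timeouts can be sketched in Python as below. The `AdaptiveFailureDetector` class, the heartbeat scheme, and the doubling policy are assumptions chosen for illustration. Time is passed in explicitly as `now` so the example does not depend on a real clock.

```python
class AdaptiveFailureDetector:
    def __init__(self, initial_timeout):
        self.timeout = initial_timeout
        self.last_heartbeat = {}
        self.suspected = set()

    def heartbeat(self, process, now):
        self.last_heartbeat[process] = now
        if process in self.suspected:
            # False suspicion: the process was only slow, so be more patient.
            self.suspected.discard(process)
            self.timeout *= 2

    def check(self, now):
        for process, seen in self.last_heartbeat.items():
            if now - seen > self.timeout:
                self.suspected.add(process)
        return set(self.suspected)


fd = AdaptiveFailureDetector(initial_timeout=1.0)
fd.heartbeat("q", now=0.0)
first = fd.check(now=1.5)      # q is late: suspected
fd.heartbeat("q", now=2.0)     # q was alive after all: timeout doubles to 2.0
second = fd.check(now=3.5)     # 1.5 elapsed, within the new timeout
```

The first check wrongly suspects q, which is why detection is called unreliable. Because the unknown bound exists, the timeout grows past it after finitely many mistakes and false suspicions stop.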

40
