# Michel Raynal - FAULT-TOLERANT DISTRIBUTED SERVICES IN MESSAGE-PASSING SYSTEMS
## Related

To understand the theory behind Failure Detectors: __T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,” J. ACM, vol. 43, no. 2, pp. 225–267, 1996.__
## Definitions

Fault-Tolerance: the service remains uninterrupted even if some components in the network fail.

Distributed System: a collection of computers (or nodes) that communicate amongst themselves [...] to perform a given task.

Distributed Computing: the use of a Distributed System to solve a computational problem.

Static system: the system composition is fixed.

Dynamic system: nodes may enter, leave, or move within the system over time.

FLP impossibility result: no deterministic algorithm can solve consensus in an asynchronous system if even one node may crash.

ADD (Average Delayed/Dropped): a channel model used to describe the network realistically.

Data structures:

- Linearizability: a data structure is said to be linearizable if it guarantees that every operation appears to take effect at a single point in time between its invocation and its response.
- Shared Register: [a data structure] that stores a value and has two operations: read [...] and write.
- Fault-Tolerant Register: a linearizable (atomic) shared register.

Failure types:

- crash: a node halts, but worked correctly until it halted.
- omission: a node fails to receive incoming messages or to send outgoing messages.
- timing: a node's message delivery falls outside the specified delivery time interval.
- Byzantine: arbitrary behavior, covering malicious attacks, operator mistakes, software errors, and conventional crash faults.
- churn: changes in system composition due to nodes entering and leaving.

Useful terms:

- shared memory/message-passing model
- synchronous/asynchronous systems
- static/dynamic systems

Shared register algorithms:

- RAMBO
- DynaStore
- Baldoni et al.
## Chapter 1

He begins by defining the terms of distributed systems and their possible use cases.
He defines synchronous message-passing systems as giving the strongest guarantees, as opposed to asynchronous message-passing systems.
### Failure Detectors

He defines the concept of a Failure Detector as an oracle able to identify failed nodes, and explains how Failure Detectors can be used to circumvent the FLP impossibility result.
In practice, Failure Detectors need a certain level of synchrony to work. Two lines of research are proposed to address this: the first is to implement Failure Detectors on increasingly weaker system models, and the second is to find the weakest Failure Detector.
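The dependence on synchrony can be illustrated with a timeout-based detector. This is only a minimal sketch, not the algorithm from the text. It assumes that nodes send periodic heartbeats and that message delays eventually become bounded. The class name, method names, and the timeout-doubling rule are all hypothetical choices made for the example.

```python
class HeartbeatFailureDetector:
    """Suspects a node when its heartbeat is overdue (illustrative sketch)."""

    def __init__(self, nodes, initial_timeout=1.0):
        self.timeout = {n: initial_timeout for n in nodes}
        self.last_heartbeat = {n: 0.0 for n in nodes}
        self.suspected = set()

    def on_heartbeat(self, node, now):
        """Record a heartbeat received from `node` at time `now`."""
        self.last_heartbeat[node] = now
        if node in self.suspected:
            # False suspicion: the node was only slow, so stop suspecting it
            # and wait longer before suspecting it again.
            self.suspected.discard(node)
            self.timeout[node] *= 2

    def check(self, now):
        """Suspect every node whose last heartbeat is older than its timeout."""
        for node, last in self.last_heartbeat.items():
            if now - last > self.timeout[node]:
                self.suspected.add(node)
        return set(self.suspected)


fd = HeartbeatFailureDetector(["a", "b"], initial_timeout=1.0)
fd.on_heartbeat("a", now=0.5)
fd.on_heartbeat("b", now=0.5)
fd.on_heartbeat("a", now=1.4)
print(sorted(fd.check(now=1.8)))  # ['b']: the heartbeat from "b" is overdue
fd.on_heartbeat("b", now=2.0)     # "b" was only slow; its timeout doubles
print(sorted(fd.check(now=2.2)))  # []
```

A crashed node stops sending heartbeats and ends up suspected forever. A correct but slow node stops being suspected once its timeout has grown past the real message delay, which can only happen if such a bound exists. In a fully asynchronous system no timeout is ever guaranteed to be long enough.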
### Fault-Tolerant Register

He defines a "shared register" and explains why implementing one is complicated when nodes may be faulty. He then presents the solution, the Fault-Tolerant Register. He also presents the "linearizability" property and how it is used to define the Fault-Tolerant Register.
Finally, he introduces two implementations of the Fault-Tolerant Register: one that is crash-tolerant and one that is Byzantine-tolerant.
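To make the linearizability property concrete for a read/write register, here is a brute-force checker. It is a sketch for illustration only. The `Op` record and the function names are hypothetical, and the search tries every ordering, so it only suits very small histories.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    kind: str         # "write" or "read"
    value: object     # value written, or value returned by the read
    invoked: float    # time of invocation
    responded: float  # time of response


def respects_real_time(order):
    # If b responded before a was invoked, b must not be ordered after a.
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.responded < a.invoked:
                return False
    return True


def is_legal_register(order, initial):
    # Sequential register semantics: a read returns the latest written value.
    current = initial
    for op in order:
        if op.kind == "write":
            current = op.value
        elif op.value != current:
            return False
    return True


def is_linearizable(history, initial=None):
    """True if some total order is legal and respects real-time precedence."""
    return any(
        respects_real_time(order) and is_legal_register(order, initial)
        for order in permutations(history)
    )


# A read that overlaps the write may return either the old or the new value.
overlapping = [Op("write", 1, 0, 2), Op("read", 0, 1, 3)]
# A read that starts after the write completed must not return the old value.
stale = [Op("write", 1, 0, 2), Op("read", 0, 3, 4)]
print(is_linearizable(overlapping, initial=0))  # True
print(is_linearizable(stale, initial=0))        # False
```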
## Chapter 2

He specifies the context of the implementation: an arbitrary, partitionable network composed of Average Delayed/Dropped (ADD) channels.
Failure detectors can be characterized by their accuracy and completeness, such that:

- Strong completeness is satisfied if the failure detector of each node eventually suspects all nodes that are crashed.
- Eventual strong accuracy is satisfied if the failure detector of every node eventually stops suspecting all nodes that are correct.

He describes his algorithm.
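The two properties above can be checked against a recorded run. This is a rough sketch under stated assumptions, not part of the algorithm in the text. A trace is assumed to be a list of snapshots, each mapping a node to the set of nodes its detector currently suspects. "Eventually" is approximated by "from some snapshot until the end of the trace". All names are hypothetical.

```python
def stabilization_index(trace, holds):
    """First index from which `holds` is true up to the end of the trace, else None."""
    start = None
    for i, snapshot in enumerate(trace):
        if holds(snapshot):
            if start is None:
                start = i
        else:
            start = None
    return start


def strong_completeness(trace, crashed):
    # Every detector in the snapshot suspects every crashed node.
    return stabilization_index(
        trace, lambda snap: all(crashed <= suspects for suspects in snap.values())
    )


def eventual_strong_accuracy(trace, correct):
    # No detector in the snapshot suspects a correct node.
    return stabilization_index(
        trace, lambda snap: all(not (suspects & correct) for suspects in snap.values())
    )


# Detector outputs of the correct nodes "a" and "b"; node "c" has crashed.
trace = [
    {"a": set(), "b": {"a"}},        # "c" not yet suspected, "a" wrongly suspected
    {"a": {"c"}, "b": {"a", "c"}},
    {"a": {"c"}, "b": {"c"}},
    {"a": {"c"}, "b": {"c"}},
]
print(strong_completeness(trace, crashed={"c"}))            # 1
print(eventual_strong_accuracy(trace, correct={"a", "b"}))  # 2
```

A finite trace can only suggest that a property holds. The definitions quantify over the whole infinite execution.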
## Chapter 3.1

He proposes a new Fault-Tolerant Register that is crash-tolerant and churn-proof.
The algorithm tolerates nodes that crash or leave the system.
There is no hierarchy between the nodes, and the algorithm emulates a shared memory using the message-passing model.
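To give an idea of how a shared register can be emulated over message passing, here is a sketch in the style of the classic majority-quorum emulation by Attiya, Bar-Noy and Dolev (ABD). It is not the churn-tolerant algorithm summarized above. It assumes a fixed set of replicas and crash failures only. Message exchange is simulated by direct method calls in a single process, and all class and method names are hypothetical.

```python
import random


class Replica:
    """One node's local copy of the register, tagged with a timestamp."""

    def __init__(self):
        self.timestamp = (0, "")  # (counter, writer id), compared lexicographically
        self.value = None
        self.crashed = False

    def query(self):
        return self.timestamp, self.value

    def store(self, timestamp, value):
        # Keep only the most recent value.
        if timestamp > self.timestamp:
            self.timestamp, self.value = timestamp, value


class QuorumRegister:
    def __init__(self, replicas):
        self.replicas = replicas
        self.majority = len(replicas) // 2 + 1

    def _quorum(self):
        # Stands in for "wait for replies from a majority": any majority of
        # non-crashed replicas may answer.
        alive = [r for r in self.replicas if not r.crashed]
        if len(alive) < self.majority:
            raise RuntimeError("no majority of replicas is reachable")
        return random.sample(alive, self.majority)

    def _latest(self):
        return max((r.query() for r in self._quorum()), key=lambda pair: pair[0])

    def write(self, client_id, value):
        (counter, _), _ = self._latest()       # phase 1: learn the highest timestamp
        timestamp = (counter + 1, client_id)
        for replica in self._quorum():         # phase 2: store at a majority
            replica.store(timestamp, value)

    def read(self):
        timestamp, value = self._latest()      # phase 1: find the latest value
        for replica in self._quorum():         # phase 2: write it back to a majority
            replica.store(timestamp, value)
        return value


replicas = [Replica() for _ in range(5)]
register = QuorumRegister(replicas)
register.write("client-1", "x")
replicas[0].crashed = True
replicas[1].crashed = True
print(register.read())  # x
```

Any two majorities share at least one replica, so a read always reaches a replica that holds the latest completed write. With five replicas the register survives two crashes, and no replica plays a special role.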
## Chapter 3.2

He proposes a new Fault-Tolerant Register that is crash-tolerant, churn-proof, and Byzantine-proof.
The model adds the notion of servers to the previous model (which had only clients), along with an asymmetric signature scheme.
He also proves that, in this model, it is impossible to determine the number of Byzantine servers as a fraction of the total number of servers.