# Michel Raynal - FAULT-TOLERANT DISTRIBUTED SERVICES IN MESSAGE-PASSING SYSTEMS

## Related

Understand the theory behind the Failure Detector.

__T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable distributed systems,” J. ACM, vol. 43, no. 2, pp. 225–267, 1996.__

## Definition

Fault tolerance: The service remains uninterrupted even if some components in the network fail.

Distributed System: A collection of computers (or nodes) that communicate amongst themselves [...] to perform a given task.

Distributed Computing: The use of a Distributed System to solve a computational problem.

Static system: The system composition is fixed.

Dynamic system: Nodes may enter, leave or move in the system over time.

FLP impossibility result: In an asynchronous system, no deterministic algorithm can guarantee consensus if even a single node may crash.

ADD (Average Delayed/Dropped): model used to describe the network realistically.

Data structures:

- Linearizability: a data structure is said to be linearizable if it guarantees that all operations appear to happen at a single point in time between the invocation and response of the operation.
- Shared Register: [a data structure] that stores a value and has two operations: read [...] and write.
- Fault-Tolerant Register: Linearizable (atomic) Shared Register.

Failures and churn:

- crash: a node halts, but was working correctly until it halted.
- omission: a node fails to receive incoming messages or to send outgoing messages.
- timing: a node's message delivery lies outside of the specified delivery time interval.
- Byzantine: malicious attacks, operator mistakes, software errors and conventional crash faults.
- churn: change in system composition due to nodes entering and leaving.

Useful terms:

- shared memory/message-passing model
- synchronous/asynchronous systems
- static/dynamic systems

Shared register algorithms:

- RAMBO
- DynaStore
- Baldoni et al.
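
To make the shared register definition above concrete, here is a minimal single-process sketch in Python. The class name `AtomicRegister` and the use of a lock are assumptions for illustration; they are not taken from the text. Each operation takes effect at one instant inside the critical section, which plays the role of the "single point in time" in the linearizability definition.

```python
import threading


class AtomicRegister:
    """Single-process stand-in for a linearizable read/write register.

    Hypothetical illustration: a distributed register has no shared lock
    and needs a message-passing protocol to obtain the same behaviour.
    """

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def write(self, value):
        # The assignment inside the critical section is the point at which
        # the write appears to take effect (its linearization point).
        with self._lock:
            self._value = value

    def read(self):
        # A read returns the value of the latest write that took effect
        # before the read's own linearization point.
        with self._lock:
            return self._value


reg = AtomicRegister(0)
reg.write(5)
print(reg.read())
```

The lock only works because all callers share one memory. The algorithms listed above address the harder case in which the register is spread over nodes that communicate by messages.
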
## Chapter 1

He begins by defining the terms of distributed systems and their possible use cases. He describes synchronous message-passing systems as giving the best guarantees, as opposed to asynchronous message-passing systems.

### Failure Detectors

He defines the concept of a Failure Detector as an oracle able to identify failed nodes, and shows how it can be used to circumvent the FLP impossibility result. In practice, Failure Detectors need a certain level of synchrony to work. Two lines of research address this problem: the first implements the Failure Detector on increasingly weaker system models, and the second looks for the weakest Failure Detector.

### Fault-Tolerant Register

He defines a "shared register" and explains why implementing one is complicated by the possibility of faulty nodes. He presents the Fault-Tolerant Register as the solution. He also presents the "linearizability" property and how it is used to define the Fault-Tolerant Register. Finally, he introduces two implementations of the Fault-Tolerant Register: one that is crash-tolerant and one that is Byzantine-tolerant.

## Chapter 2

He specifies the context of the implementation: an arbitrary, partitionable network composed of Average Delayed/Dropped (ADD) channels.

Failure detectors can be characterized by their accuracy and completeness, such that:

- Strong completeness is satisfied if the failure detector of each node eventually suspects every crashed node.
- Eventual strong accuracy is satisfied if the failure detector of every node eventually stops suspecting every correct node.

He then describes his algorithm.

## Chapter 3.1

He proposes a new Fault-Tolerant Register that is crash-tolerant and churn-proof. The algorithm tolerates nodes that crash or leave the system. There is no hierarchy between the nodes. The algorithm emulates a shared memory on top of the message-passing model.
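
As a rough illustration of what "emulating a shared memory using the message-passing model" can look like, the sketch below simulates a majority-quorum register in the style of the classic ABD construction. This is a swapped-in, well-known technique for a static set of replicas, not the churn-tolerant algorithm from this chapter. The names `Replica` and `QuorumRegister` are hypothetical, and messages are replaced by direct method calls.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Replica:
    """One node holding a timestamped copy of the register (hypothetical)."""
    ts: int = 0
    value: Any = None
    crashed: bool = False

    def store(self, ts: int, value: Any) -> None:
        # Keep only the most recent value seen so far.
        if ts > self.ts:
            self.ts, self.value = ts, value


class QuorumRegister:
    """In-process simulation of a majority-quorum register (assumption:
    static membership, crash faults only, operations run one at a time)."""

    def __init__(self, n: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(n)]
        self.majority = n // 2 + 1

    def _reachable(self) -> List[Replica]:
        # Stand-in for "wait for replies": crashed replicas never answer.
        alive = [r for r in self.replicas if not r.crashed]
        if len(alive) < self.majority:
            raise RuntimeError("fewer than a majority of replicas replied")
        return alive

    def _query(self) -> Tuple[int, Any]:
        # Phase 1: find the freshest (timestamp, value) among the replies.
        freshest = max(self._reachable(), key=lambda r: r.ts)
        return freshest.ts, freshest.value

    def _update(self, ts: int, value: Any) -> None:
        # Phase 2: push (timestamp, value) to the replicas that reply.
        for r in self._reachable():
            r.store(ts, value)

    def write(self, value: Any) -> None:
        ts, _ = self._query()
        # Multi-writer versions break timestamp ties with a writer id.
        self._update(ts + 1, value)

    def read(self) -> Any:
        ts, value = self._query()
        self._update(ts, value)  # write-back so later reads see it too
        return value


reg = QuorumRegister(5)
reg.write("a")
reg.replicas[0].crashed = True
reg.replicas[1].crashed = True
print(reg.read())  # still readable with 3 of 5 replicas alive
```

Any two majorities of the same replica set intersect, so a read always reaches at least one replica that stored the latest completed write. Once nodes can join and leave, the replica set itself changes, which is the difficulty the churn-tolerant algorithm has to handle.
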
## Chapter 3.2

He proposes a new Fault-Tolerant Register that is crash-tolerant, churn-proof and Byzantine-proof. The model adds the notion of servers to the previous model (which had only clients), along with an asymmetric signature scheme. He also proves that, in this model, it is impossible to determine the number of Byzantine servers as a fraction of the total number of servers.
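
To give a feel for why Byzantine servers make reads harder, here is a small sketch of a generic reply-filtering rule: a client trusts a (timestamp, value) pair only if at least `f + 1` servers report it, where `f` is an assumed upper bound on the number of Byzantine servers. This is a standard masking technique swapped in for illustration. It does not use signatures and it is not the algorithm from this chapter. The function name `byzantine_read` and the reply format are hypothetical.

```python
from collections import Counter
from typing import Any, List, Optional, Tuple

# (timestamp, value) as reported by one server; values must be hashable.
Reply = Tuple[int, Any]


def byzantine_read(replies: List[Reply], f: int) -> Optional[Any]:
    """Return the freshest value vouched for by at least f + 1 servers.

    With at most f Byzantine servers, any pair reported f + 1 times was
    reported by at least one correct server. Returns None if no pair
    reaches that threshold.
    """
    counts = Counter(replies)
    vouched = [pair for pair, c in counts.items() if c >= f + 1]
    if not vouched:
        return None
    # Among the vouched pairs, pick the one with the highest timestamp.
    return max(vouched, key=lambda pair: pair[0])[1]


replies = [(2, "b"), (2, "b"), (9, "evil"), (1, "a"), (2, "b")]
print(byzantine_read(replies, f=1))
```

With signatures, as in the model described above, a server cannot forge a value that a client never wrote, which is one reason signature-based designs can get by with weaker agreement among replies than this counting rule needs.
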