\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{paralist} %% needed for compact lists
\usepackage[normalem]{ulem} %% needed by strike
\usepackage[urlcolor=blue,colorlinks=true,breaklinks]{hyperref}
\usepackage[utf8x]{inputenc} %% char encoding
\usepackage{framed} %% frame multipages
\usepackage{fullpage}
\usepackage{a4wide}
\usepackage{mathpazo} %% math & rm
\linespread{1.05} %% Palatino needs more leading (space between lines)
\usepackage[scaled]{helvet} %% ss
\usepackage{courier} %% tt
\normalfont
\usepackage[T1]{fontenc}
\usepackage[english]{babel} %% in English
\usepackage{xspace} %% handles spacing after a macro
\usepackage{listings}
\lstset{breaklines}
\lstset{language=java}
\lstset{escapechar=§}
\usepackage{xcolor}
\usepackage{comment} %%%% comment env
%%%%%%%%%%%%%%
%% fancy headers and draft mode
%% Date at the top of the page
%% To be commented out for the final version
\usepackage[margin=2.5cm]{geometry}
\usepackage{fancyhdr} %% Header and footer
\fancyhf{} %%clear head and footer
\fancyhead[C]{\thepage} %%draft
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancypagestyle{premiere}{%% first page
  \fancyhf{} %%clear head and footer
  \fancyfoot[L]{\textbf{LIF}}
  \renewcommand{\headrulewidth}{0pt}
  \renewcommand{\footrulewidth}{2pt}
  \fancyfoot[C]{\textsc{SUJETCOURT}}
  \fancyhead[C]{}%%\includegraphics[scale=0.25]{logo-lif.png}} %%UFR
}
\fancypagestyle{notete}{%% first page
  \fancyhf{} %%clear head and footer
  \renewcommand{\headrulewidth}{0pt}
  \renewcommand{\footrulewidth}{2pt}
  \fancyfoot[C]{\textsc{Sujet}}
}
\newcommand{\myversion}{\textit{version of \today{}}}
\pagestyle{plain}
\title{Weak Consistency for zero-trust cloud}
\author{Research Subject}
\begin{document}
\date{Emmanuel Godard (LIS) -- Corentin Travers (LIS)\\emmanuel.godard@lis-lab.fr and corentin.travers@lis-lab.fr}
\maketitle

\textbf{Keywords:} Cloud, Security by design, Distributed Structures and Algorithms, Weak Consistencies,
Byzantine systems

\section*{Summary}

Real-time collaborative applications are increasingly used in the context of remote work systems. These applications often rely on centralized client-server architectures, which pose security and privacy challenges. Data is stored on a centralized server, requiring users to trust a third party with their data management. Additionally, these architectures are often vulnerable to denial-of-service attacks and do not ensure data confidentiality.

To address these issues, we propose exploring information exchange solutions based on zero-trust and/or peer-to-peer architectures that eliminate the need for trusted third parties. These solutions would offer high-level security while ensuring system resilience. To maintain strong performance, especially in high availability scenarios, weak consistency models are frequently employed. In this context, we propose studying weak consistency properties applied to cloud-related challenges.

Initially, we will conduct a state-of-the-art review of Byzantine fault-tolerant solutions without cryptographic primitives, along with existing implementations (WP1). A second step will involve proposing more efficient solutions using cryptographic primitives (WP2). Finally, a proof-of-concept will be developed for a key-value storage solution using the algorithms selected in the previous stages (WP3).

\pagebreak

\section*{Problem Statement}

Since the pioneering work in the 1980s by Lamport \cite{LamportInterprocess1986} and Misra \cite{MisraAxioms1986}, replication management has been central to digital developments in terms of high availability. One of the fundamental challenges is to provide application developers with an abstraction of replicated memory that is both easy to use and enables flexible and fault-tolerant utilization of distributed resources.
This line of research has led to the concept of \textit{data consistency}, with its various forms tailored to suit the best compromises in usage and specificities of each application. The current trend towards cloud-based deployment of software applications entails significant changes in usage patterns and development approaches for new applications. With the advent of user-friendly cloud services where infrastructure maintenance is outsourced to a provider, there's a noticeable centralization of resources. This reintroduces classic security issues, such as the need for trust/sovereignty or the risk of a \textit{single point of failure} (SPOF). In response, new approaches termed \textit{zero-trust} have been proposed to continue using cloud resources without depending on any specific provider. These approaches require both multi-provider architectures and advanced cryptographic techniques. \medskip From a programmer's perspective, it's often advantageous to consider cloud-based applications as a single centralized system. This requires that the data structures used exhibit a property known as \textit{strong consistency}. In real-world conditions, servers may have to endure very challenging operating conditions. It is well-known to both theorists and practitioners, through the CAP theorem (Consistency, Availability, Partition tolerance), that operational compromises are often necessary. Specifically, if strong consistency is desired, the computation time is proportional to the latency of \textbf{the entire} network, which in practice reduces availability. Referring to the CAP theorem, applying strong consistency makes it impossible to implement a highly resilient system while providing a highly available application. Yet, both of these aspects can be essential in building a collaborative application. The peer-to-peer approach indeed implies significant system resilience against failures. 
Replicas may become disconnected from one another and experience significant and uneven latency differences. The lack of control over the client's system and execution environment compels us to envision systems capable of withstanding the worst possible scenarios. In the context of real-time collaboration applications, the need for high availability is intimately tied to the requirement of enabling different replicas to access the same shared data for real-time work. It would therefore be unacceptable to introduce significant latencies between two modifications. Given the impossibility of fully satisfying both strong consistency and high availability, we turn to the study of weak consistencies, specifically focusing on convergence. We define a system as convergent if it adheres to the following property: If replicas cease to propose modifications, then these same replicas must eventually reach a consistent state. Convergence (or Eventual Consistency) has been extensively studied, leading to the development of various distributed data structures that aim to uphold convergence. However, convergence alone does not resolve our problem. This property does not guarantee behaviors during execution, where inconsistency within the system is permissible due to convergence. Simply achieving eventual consistency in a document does not suffice to make it a satisfactory collaborative editing application. We also need mechanisms to resolve conflicts, which are inevitable in collaborative approaches. This conflict resolution must be carried out optimally to maximize the preservation of the meaning intended by each modifying replica. These issues have indeed been extensively studied, and the solutions proposed, particularly suitable in our context, are the \textit{Replicated Data Types} (RDTs). There are two classes of RDTs: Commutative Replicated Data Types (CmRDTs): Operations on these types yield the same result regardless of the order of their local executions. 
Convergent Replicated Data Types (CvRDTs): Replicas of these types hold states that only grow (for example, a set to which elements are only added) and that converge towards a common maximal state once merged. Both classes fall under the umbrella term of Conflict-free Replicated Data Types (CRDTs) and are actually equivalent to each other \cite{ShapiroConflictFree2011}.

CRDTs provide a powerful framework for building distributed applications that require high availability and eventual consistency. By ensuring that operations are commutative and can be merged across replicas without conflicts, CRDTs enable efficient conflict resolution and convergence of data across distributed systems. The study of CRDTs has significantly advanced our ability to design collaborative and resilient distributed applications, offering a practical approach to dealing with the challenges posed by real-time collaboration over unreliable and latency-prone networks.

\medskip

Furthermore, to provide truly secure solutions in a zero-trust context, the most challenging operational conditions to consider are when servers or participating clients have been compromised and do not strictly adhere to the protocol. In the literature, this is referred to as Byzantine behavior. Given these difficult constraints of availability and security, ensuring strong consistency can be very costly in both computation and time. Application requirements are sometimes not compatible with such operational conditions. Therefore, it becomes necessary to consider data with properties of so-called \textit{weak consistency}.

Weak consistency models, such as eventual consistency offered by CRDTs, become valuable in such scenarios. These models prioritize availability and partition tolerance while allowing for some degree of inconsistency that can be resolved over time. They are designed to cope with the challenges of distributed systems operating under non-ideal conditions, including the presence of Byzantine faults.
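
\medskip

To make the state-based (CvRDT) notion discussed above more concrete, the following is a minimal sketch in the standard style of \cite{ShapiroConflictFree2011}. The notation ($S$, $\sqcup$, $\sqsubseteq$, $E$) and the grow-only set example are illustrative assumptions and are not part of this proposal. Replica states range over a set $S$ equipped with a merge operation $\sqcup$ such that, for all $x, y, z \in S$,
\[
\begin{array}{rcll}
x \sqcup y & = & y \sqcup x & \mbox{(commutativity)}\\
(x \sqcup y) \sqcup z & = & x \sqcup (y \sqcup z) & \mbox{(associativity)}\\
x \sqcup x & = & x & \mbox{(idempotence)}
\end{array}
\]
Writing $x \sqsubseteq y$ when $x \sqcup y = y$, every local update $u$ is required to be inflationary, that is, $x \sqsubseteq u(x)$. For a grow-only set over a set of elements $E$, one may take $S = 2^{E}$, $\sqcup = \cup$ and $\mathit{add}_e(x) = x \cup \{e\}$. Two replicas starting from $\emptyset$ that respectively add $a$ and $b$ hold $\{a\}$ and $\{b\}$; after exchanging and merging their states, in either order and possibly several times, both hold
\[
\{a\} \cup \{b\} = \{b\} \cup \{a\} = \{a, b\},
\]
which is the convergence property in this simple case. This sketch assumes that all replicas follow the protocol.
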
In zero-trust environments where malicious behaviors are a constant threat, adopting weak consistency models can strike a balance between functionality, security, and operational feasibility. They provide pragmatic solutions for building resilient and secure distributed applications that can withstand the challenges posed by compromised nodes and unreliable network conditions.

\section*{State of the art}

The landscape of weak consistency properties is relatively complex, with three major families of weak consistencies identified \cite{Raynal18,MPBook}:
\begin{itemize}
\item Serializability
\item Causal Consistency
\item Eventual Strong Consistency
\end{itemize}
While eventual strong consistency is typically desired for collaborative applications, it is particularly costly to achieve. Serializability, on the other hand, is simpler to implement but may result in transactions that do not complete, requiring application-level error handling. Causal consistency maintains the causal order perceived by each process and generally allows for the efficient implementation of higher-level data structures.

For a comprehensive overview of these weak consistency models, readers can refer to M. Perrin's detailed mapping \cite{MPBook}. Each of these models offers a different trade-off between consistency guarantees, implementation complexity, and operational efficiency, making them suitable for different use cases and application requirements. Understanding and selecting the appropriate weak consistency model is crucial for designing effective and robust distributed systems, especially in the context of collaborative applications operating in dynamic and unreliable environments.

\subsection*{Algorithmic Results}

The earliest work on secure collaborative tools in a high availability context dates back to 2009; however, more systematic research on weak consistency security is quite recent. In 2009, Singh et al.
introduced the Zeno system, which was the first to propose a Byzantine algorithm favoring availability over strong consistency. It provides Byzantine fault tolerance with eventual consistency \cite{SinghZeno2009}. The algorithm experimentally demonstrated better availability performance compared to classical Byzantine algorithms.

Currently, there are primarily partial studies and solutions for causal consistency \cite{TsengDistributed2019,VanDerLindePractical2020}. Tseng et al. present exact computability bounds within a Byzantine framework on the one hand and, on the other hand, provide an algorithm whose performance is compared with that of the Google Compute platform. Van Der Linde et al. introduce a peer-to-peer system resilient to Byzantine attacks that offers causal consistency guarantees. Their evaluation suggests that despite a peer-to-peer architecture, performance, especially in terms of latency, is very good compared to a traditional client-server architecture. In addition to these algorithms, Misra and Kshemkalyani demonstrated in \cite{MisraByzantine2021} that in an asynchronous context, it is not possible to achieve causal consistency even with a single Byzantine participant.

One of the notable features of \cite{VanDerLindePractical2020} is its exploration of Byzantine failures within the context of weak consistencies. A peer-to-peer system like that in \cite{MisraByzantine2021} prompts new considerations where a participant leverages information from lower layers of replication to create attacks at the application level.

Applying weak consistency criteria alone does not fully address the scope of our concerns. The cloud context raises significant questions regarding data centralization and governance, with a market dominated by a few major players to whom users must blindly entrust their data, posing substantial challenges to privacy and data sovereignty.
In this context, integrating the notion of a zero-trust cloud is essential, anchoring our discussions in a relevant approach from both industrial and regulatory perspectives. Zero-trust, as defined by NIST in SP 800-207 \cite{RoseZero2020}, is a security model that trusts no one and makes no assumptions about network security. It helps guard against malicious behaviors by intermediaries, reducing the attack surface and confining Byzantine behaviors solely to clients who have access to the data.

Data-centric security must be considered alongside communication security. Adopting ``Data-Centric'' approaches involves treating data itself as a dynamic entity within the system, assigning it processes for access control and monitoring \cite{BayukDatacentric2009}. These issues represent growing concerns and are addressed by state and inter-state actors, exemplified by NATO's stance on these matters through STANAG 4774 and 4778. These topics have been extensively studied since the 2000s with works such as \cite{GoyalAttributebased2006, MullerDistributed2009} defining solutions for attribute-based encryption, issuing encryption keys based on rights to establish security policies. Other works like \cite{YanFlexible2017} propose cloud-adapted solutions based on more flexible architectures with finer granularity in defining rights.

However, concerning zero-trust and data-centric security aspects, there is currently no academic consensus on the formalization of these notions. These terms are subject to various interpretations, necessitating a formal specification to understand which properties need to be satisfied to achieve weak consistency within a zero-trust context.

\subsection*{Existing Implementations}

Currently, there are ongoing projects aimed at implementing weak consistency protocols for real-time collaborative applications. One notable project is yjs \cite{Yjs2023}, which implements the YATA protocol \cite{NicolaescuRealTime2016}.
This protocol ensures strong convergence (or SEC, in the terminology of \cite{MPBook}) through a CRDT (Conflict-free Replicated Data Type) system. On the other hand, older projects like Etherpad use simpler conflict resolution solutions, also ensuring strong convergence but employing operations that are more costly in terms of memory and computation time compared to CRDTs \cite{AppJetEtherpad2011}.

\section*{Goals}

The objectives of this thesis encompass studying the three types of weak consistency in a Byzantine setting and defining efficient Byzantine algorithms for their implementation. Given that causal consistency is already well-studied, the main focus of this thesis will be on the other two types of weak consistency.

The first stage (WP1) will involve studying Byzantine solutions without cryptographic primitives or with reasonably cost-effective primitives, specifically excluding homomorphic computation. An analysis of existing implementations will be conducted to determine the guarantees provided by these solutions within the vocabulary of weak consistencies.

The second stage (WP2) will focus on developing more efficient solutions using cryptographic primitives that require advanced secret-sharing and/or homomorphic computation.

A final stage (WP3) will involve producing a proof-of-concept key/value storage solution using the algorithms selected in the preceding stages.

\section*{Methodology and Planning}

A detailed review of distributed computing models, particularly focusing on solutions for causal consistency, will be conducted to establish the set of theoretical and practical assumptions underlying these solutions. Concurrently, in collaboration with Parsec, a list of attacks on weakly consistent peer-to-peer architectures will be compiled. The emphasis will be on generating new knowledge, including novel solutions compared to the current state of the art, as well as identifying new attack vectors.
The algorithms will undergo formal validation initially, followed by the development of a proof of concept. WP1 will take place in 2024, WP2 in 2025, and WP3 in 2026.

\section*{Monitoring and Exchange Terms}

The doctoral student will take part in Parsec's weekly follow-up meetings. The partners will meet every three months to review the progress of the work. The student will also attend the company's in-person meetings every six months.

\section*{Material resources}

The PhD student will participate in Parsec's weekly progress meetings. Additionally, partners will convene every three months for project status updates. Furthermore, the student will attend in-person meetings at the company every six months.

\section*{Expected Benefits}

On the LIS laboratory side, the expected outcomes include the following scientific publications:
\begin{compactitem}
\item State-of-the-art review and synthesis concerning Byzantine fault tolerance in weak consistencies.
\item Proposals and proofs of new algorithms within the zero-trust context.
\end{compactitem}
For Parsec, the expected deliverables comprise a mini-model of cloud synchronization and collaboration, a proof of concept for the aforementioned algorithms, and consultancy and expertise in the scientific development of products created by Parsec.

\section*{Team}

\subsection*{Distributed Algorithmics Team (DALGO)}

The Distributed Algorithms team, led by Arnaud Labourel, is part of the Laboratory of Computer Science and Systems (LIS CNRS UMR 7020). This research team is internationally recognized at the highest level, comprising 8 permanent members whose interests span from reliable distributed algorithms and confidentiality in distributed systems to communication networks, graph algorithms, mobile agents, and IoT (Internet of Things).

\subsection*{Supervisors}

\textbf{Emmanuel Godard} is a professor at Aix-Marseille University.
His research interests primarily focus on understanding and maximizing decentralization (in a broad sense) in distributed systems. He is an expert in distributed algorithms and computability.

\textbf{Corentin Travers} is an Associate Professor at Aix-Marseille University. His research interests focus on robust and efficient distributed algorithms for shared-memory systems or distributed networks. He is an expert in distributed algorithmics and complexity.

\textbf{Marcos Medrano} is an R\&D engineer at Parsec. He holds a research master's degree in computer science and applied mathematics. Marcos is responsible for the development strategy of the Parsec product and facilitates collaboration between engineers and academic stakeholders.

\subsection*{Candidate Choice}

The DALGO team is involved in the ``Reliability and Computer Security'' Master's program at Aix-Marseille University. This master's track is certified as \textit{SecNumEdu} by ANSSI (National Cybersecurity Agency of France). In autumn 2022, a project in collaboration with the company Parsec was presented to all master's students. Following this call for applications, Mr. Amaury Joly was selected for a preliminary 6-month research internship on the topic of weak consistency at the LIS laboratory.

Mr. Amaury Joly has achieved excellent academic results, graduating from the master's program with honours (\textit{mention bien}). Additionally, he possesses a strong dual theoretical and technical profile, with a keen motivation for research activities related to cloud security. He is the ideal candidate for such a research topic.

{\footnotesize
\nocite{*}
\bibliography{sujet-cifre.bib}
\bibliographystyle{alpha}
}

% LaTeX2e code generated by txt2tags 3.4 (http://txt2tags.org)
% cmdline: txt2tags -t tex sujet-cifre.t2t
\end{document}