hard push

2024-12-17 14:57:43 +01:00
parent e51d7de452
commit ab70a09cbf
38 changed files with 2570 additions and 1 deletions
--- a/docs/sujetThese/sujet-cifre.tex
+++ b/docs/sujetThese/sujet-cifre.tex
@ -0,0 +1,380 @@
+\documentclass[11pt]{article}
+\usepackage{graphicx}
+\usepackage{paralist} %% needed for compact lists
+\usepackage[normalem]{ulem} %% needed by strike
+\usepackage[urlcolor=blue,colorlinks=true,breaklinks]{hyperref}
+\usepackage[utf8x]{inputenc}  %% char encoding
+\usepackage{framed} %% frame multipages
+\usepackage{fullpage}
+\usepackage{a4wide}
+\usepackage{mathpazo} %% math & rm
+\linespread{1.05}        %% Palatino needs more leading (space between lines)
+\usepackage[scaled]{helvet} %% ss
+\usepackage{courier} %% tt
+\normalfont
+\usepackage[T1]{fontenc}
+\usepackage[english]{babel} %% English
+\usepackage{xspace} %% handles spacing after a macro
+\usepackage{listings}
+\lstset{breaklines}
+\lstset{language=java}
+\lstset{escapechar=§}
+\usepackage{xcolor}
+
+
+\usepackage{comment} %%%% comment env
+
+%%%%%%%%%%%%%%
+%% fancy headers and draft mode
+%% Date at the top of the page
+%% Comment out for the final version
+\usepackage[margin=2.5cm]{geometry}
+\usepackage{fancyhdr}
+%% Header and footer
+\fancyhf{} %%clear head and footer
+\fancyhead[C]{\thepage} %%draft
+\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
+\fancyfoot[C]{\textsc{SUJETCOURT}}
+\fancypagestyle{premiere}{%% first page
+\fancyhf{} %%clear head and footer
+\fancyfoot[L]{\textbf{LIF}}
+\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
+\fancyfoot[C]{\textsc{SUJETCOURT}}
+\fancyhead[C]{}%%\includegraphics[scale=0.25]{logo-lif.png}} %%UFR
+}
+\fancypagestyle{notete}{%% page without header
+\fancyhf{} %%clear head and footer
+\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
+\fancyfoot[C]{\textsc{Sujet}}
+}
+
+\newcommand{\myversion}{\textit{version of \today{}}}
+
+\pagestyle{plain}
+
+\title{Weak Consistency for zero-trust cloud}
+\author{Research Subject}
+\begin{document}
+
+\date{Emmanuel Godard (LIS) -- Corentin Travers (LIS)\\emmanuel.godard@lis-lab.fr and corentin.travers@lis-lab.fr}
+\maketitle
+
+\textbf{Keywords:} Cloud, Security by design, Distributed Structures and Algorithms, Weak Consistencies, Byzantine systems
+
+\section*{Summary}
+
+Real-time collaborative applications are increasingly utilized in the context 
+of remote work systems. These applications often rely on centralized client-server 
+architectures, which pose security and privacy challenges. Data is stored on a 
+centralized server, requiring users to trust a third party with their data management. 
+Additionally, these architectures are often vulnerable to denial-of-service attacks 
+and do not ensure data confidentiality.
+
+To address these issues, we propose exploring information exchange solutions based 
+on zero-trust and/or peer-to-peer architectures that eliminate the need for trusted 
+third parties. These solutions would offer high-level security while ensuring system 
+resilience. To maintain strong performance, especially in high availability scenarios, 
+weak consistency models are frequently employed.
+
+In this context, we propose studying weak consistency properties applied to 
+cloud-related challenges. Initially, we will conduct a state-of-the-art review of 
+Byzantine fault-tolerant solutions without cryptographic primitives, along with 
+existing implementations (WP1). A second step will involve proposing more efficient 
+solutions using cryptographic primitives (WP2). Finally, a proof-of-concept will be 
+developed for a key-value storage solution using the algorithms selected in the 
+previous stages (WP3).
+
+\pagebreak
+
+\section*{Problem Statement}
+
+Since the pioneering work in the 1980s by Lamport \cite{LamportInterprocess1986} 
+and Misra \cite{MisraAxioms1986}, replication management has been central to digital 
+developments in terms of high availability. One of the fundamental challenges is 
+to provide application developers with an abstraction of replicated memory that is 
+both easy to use and enables flexible and fault-tolerant utilization of distributed resources.
+
+This line of research has led to the concept of \textit{data consistency}, whose various forms 
+are each tailored to the best compromise between usage constraints and the specific needs of an application.
+
+The current trend towards cloud-based deployment of software applications entails significant 
+changes in usage patterns and in development approaches for new applications. With the advent 
+of user-friendly cloud services whose infrastructure maintenance is outsourced to a provider, 
+resources are becoming noticeably centralized. This reintroduces classic security issues, 
+such as the need for trust and sovereignty, or the risk of a \textit{single point of failure} (SPOF).
+
+In response, new approaches termed \textit{zero-trust} have been proposed to continue using 
+cloud resources without depending on any specific provider. These approaches require both 
+multi-provider architectures and advanced cryptographic techniques. 
+
+\medskip
+
+From a programmer's perspective, it is often advantageous to treat a cloud-based application 
+as a single centralized system. This requires the underlying data structures to exhibit a 
+property known as \textit{strong consistency}.
+
+In real-world conditions, servers may have to endure very challenging operating conditions. 
+It is well-known to both theorists and practitioners, through the CAP theorem 
+(Consistency, Availability, Partition tolerance), that operational compromises are often 
+necessary. Specifically, if strong consistency is desired, the computation time is proportional 
+to the latency of \textbf{the entire} network, which in practice reduces availability.
+
+According to the CAP theorem, a system that enforces strong consistency cannot both tolerate 
+network partitions and remain highly available. Yet resilience and availability can both be 
+essential when building a collaborative application.
+
+The peer-to-peer approach requires the system to be highly resilient to failures. 
+Replicas may become disconnected from one another and experience large, uneven latencies. 
+The lack of control over the client's system and execution environment compels 
+us to design systems capable of withstanding the worst possible scenarios. 
+
+In the context of real-time collaboration applications, the need for high availability is 
+intimately tied to the requirement of enabling different replicas to access the same 
+shared data for real-time work. It would therefore be unacceptable to introduce significant 
+latencies between two modifications.
+
+Given the impossibility of fully satisfying both strong consistency and high availability, 
+we turn to the study of weak consistencies, specifically focusing on convergence. We define 
+a system as convergent if it adheres to the following property:
+
+If replicas cease to propose modifications, then these same replicas must eventually 
+reach a consistent state.
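+
+As a hedged formalization in our own notation (the symbols below are assumptions introduced for 
+illustration and are not taken from the cited works), let $s_i(t)$ denote the state of replica $i$ at 
+time $t$, and let $U$ be the set of modifications proposed during an execution. The property above can 
+then be sketched as:
+\[
+  |U| < \infty \;\Longrightarrow\; \exists\, t_0,\ \forall\, t \geq t_0,\ \forall\, i, j:\quad s_i(t) = s_j(t),
+\]
+where $i$ and $j$ range over the replicas considered and $t_0$ may depend on the execution.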
+
+Convergence (or eventual consistency) has been extensively studied, leading to the development 
+of various distributed data structures that uphold it. However, convergence 
+alone does not solve our problem. The property constrains only the final state: it guarantees nothing 
+about intermediate states, in which replicas are allowed to disagree. Simply achieving 
+eventual consistency on a document does not make for a satisfactory collaborative 
+editing application. We also need mechanisms to resolve conflicts, which are inevitable in 
+collaborative approaches. This conflict resolution must preserve, as far as possible, 
+the intent of each modifying replica.
+
+These issues have been extensively studied. The solutions best suited to our context are 
+the \textit{Replicated Data Types} (RDTs), which fall into two classes:
+
+\begin{itemize}
+  \item \textbf{Commutative Replicated Data Types (CmRDTs):} replicas exchange operations, and concurrent 
+  operations commute, so the result is the same regardless of the order in which each replica executes them locally.
+  \item \textbf{Convergent Replicated Data Types (CvRDTs):} replicas exchange states that only ever grow 
+  (for example, a grow-only set) and merge them, so that all replicas converge towards a common maximal structure.
+\end{itemize}
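+
+As an illustration (a minimal sketch in our own notation; the symbols $S$ and $\sqcup$ and the counter 
+example are assumptions introduced here, in the spirit of \cite{ShapiroConflictFree2011}), the states of a 
+CvRDT can be modelled as elements of a set $S$ equipped with a merge operation $\sqcup$ that is commutative, 
+associative and idempotent:
+\[
+  a \sqcup b = b \sqcup a, \qquad (a \sqcup b) \sqcup c = a \sqcup (b \sqcup c), \qquad a \sqcup a = a.
+\]
+For a grow-only counter shared by $n$ replicas, a state is a vector $v \in \mathbf{N}^n$ in which $v[i]$ 
+counts the increments issued by replica $i$. Merging takes the pointwise maximum, and the value of the 
+counter is the sum of the entries:
+\[
+  (v \sqcup w)[i] = \max(v[i], w[i]), \qquad \mathrm{value}(v) = \sum_{i=1}^{n} v[i].
+\]
+For instance, with $n = 2$, merging $(2, 0)$ and $(1, 3)$ gives $(2, 3)$, of value $5$, whichever replica 
+performs the merge and however many times the two states are exchanged.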
+
+Both classes fall under the umbrella term of Conflict-free Replicated Data Types (CRDTs) and are 
+actually equivalent to each other \cite{ShapiroConflictFree2011}.
+CRDTs provide a powerful framework for building distributed applications that require high availability 
+and eventual consistency. By ensuring that operations are commutative and can be merged across 
+replicas without conflicts, CRDTs enable efficient conflict resolution and convergence of data 
+across distributed systems.
+The study of CRDTs has significantly advanced our ability to design collaborative and resilient 
+distributed applications, offering a practical approach to dealing with the challenges posed by real-time 
+collaboration over unreliable and latency-prone networks.
+
+\medskip
+
+Furthermore, to provide truly secure solutions in a zero-trust context, the most challenging operational
+conditions to consider are when servers or participating clients have been compromised and do not 
+strictly adhere to the protocol. In the literature, this is referred to as Byzantine behavior.
+
+Given these difficult constraints of availability and security, ensuring strong consistency can be 
+very costly in both computation and time. Application requirements are sometimes incompatible with 
+such operating conditions. It therefore becomes necessary to consider data with 
+so-called \textit{weak consistency} properties.
+
+Weak consistency models, such as eventual consistency offered by CRDTs, become valuable in such scenarios. 
+These models prioritize availability and partition tolerance while allowing for some degree of 
+inconsistency that can be resolved over time. They are designed to cope with the challenges of distributed 
+systems operating under non-ideal conditions, including the presence of Byzantine faults.
+
+In zero-trust environments where malicious behaviors are a constant threat, adopting weak consistency models 
+can strike a balance between functionality, security, and operational feasibility. They provide pragmatic 
+solutions for building resilient and secure distributed applications that can withstand the challenges posed 
+by compromised nodes and unreliable network conditions.
+
+\section*{State of the art}
+
+The landscape of weak consistency properties is relatively complex, with three major families of weak 
+consistency identified \cite{Raynal18, MPBook}:
+
+\begin{itemize}
+  \item Serializability
+  \item Causal Consistency
+  \item Strong Eventual Consistency
+\end{itemize}
+
+While strong eventual consistency is typically desired for collaborative applications, it is particularly 
+costly to achieve. Serializability, on the other hand, is simpler to implement but may result in transactions 
+that do not complete, requiring application-level error handling.
+
+Causal consistency maintains the causal order perceived by each process and generally allows for the efficient 
+implementation of higher-level data structures.
+
+For a comprehensive overview of these weak consistency models, readers can refer to M. Perrin's detailed 
+mapping \cite{MPBook}. Each of these models offers a different trade-off between consistency guarantees, 
+implementation complexity, and operational efficiency, making them suitable for different use cases and 
+application requirements. Understanding and selecting the appropriate weak consistency model is crucial for 
+designing effective and robust distributed systems, especially in the context of collaborative applications 
+operating in dynamic and unreliable environments.
+
+\subsection*{Algorithmic Results}
+
+The earliest work on secure collaborative tools in a high availability context dates back to 2009; however, 
+more systematic research on the security of weak consistency is quite recent. In 2009, Singh et al. introduced the 
+Zeno system, the first to propose a Byzantine fault-tolerant algorithm favoring availability over strong consistency: 
+it provides Byzantine fault tolerance with eventual consistency \cite{SinghZeno2009}. Experimentally, the algorithm 
+demonstrated better availability than classical Byzantine algorithms.
+
+Currently, there are primarily partial studies and solutions for causal consistency 
+\cite{TsengDistributed2019, VanDerLindePractical2020}. Tseng et al. establish exact computability bounds 
+in a Byzantine setting and provide an algorithm whose performance is compared with that of the Google Compute 
+platform. Van Der Linde et al. introduce a peer-to-peer system resilient to Byzantine attacks that offers causal 
+consistency guarantees. Their evaluation suggests that, despite the peer-to-peer architecture, performance, especially 
+in terms of latency, is very good compared to a traditional client-server architecture.
+
+In addition to these algorithms, Misra and Kshemkalyani demonstrated in \cite{MisraByzantine2021} that in an 
+asynchronous context, it is not possible to achieve causal consistency even with a single Byzantine participant.
+
+One of the notable features of \cite{VanDerLindePractical2020} is its exploration of Byzantine failures within 
+the context of weak consistency. A peer-to-peer system like that in \cite{MisraByzantine2021} prompts new 
+considerations, in which a participant leverages information from the lower replication layers to mount attacks at 
+the application level.
+
+Applying weak consistency criteria alone does not fully address the scope of our concerns. The cloud context 
+raises significant questions regarding data centralization and governance, with a market dominated by a few 
+major players to whom users must blindly entrust their data, posing substantial challenges to privacy and data 
+sovereignty.
+
+In this context, integrating the notion of a zero-trust cloud is essential, anchoring our discussions in a 
+relevant approach from both industrial and regulatory perspectives. Zero-trust, as defined by NIST in SP 800-207 
+\cite{RoseZero2020}, is a security model that trusts no one and makes no assumptions about network security. It 
+helps guard against malicious behaviors by intermediaries, reducing the attack surface and confining Byzantine 
+behaviors solely to clients who have access to the data.
+
+Considering data-centric security alongside communication security is also crucial. Adopting 
+``data-centric'' approaches involves treating data itself as a dynamic entity within the system, assigning it 
+processes for access control and monitoring \cite{BayukDatacentric2009}. These issues are a growing concern 
+and are addressed by state and inter-state actors, as exemplified by NATO's stance on these matters through 
+STANAG 4774 and 4778. These topics have been extensively studied since the 2000s, with works such as 
+\cite{GoyalAttributebased2006, MullerDistributed2009} defining solutions for attribute-based encryption, 
+which issue encryption keys based on rights in order to enforce security policies. Other works such as \cite{YanFlexible2017} 
+propose cloud-adapted solutions based on more flexible architectures with finer granularity in defining rights.
+
+However, concerning zero-trust and data-centric security aspects, there is currently no academic consensus 
+on the formalization of these notions. These terms are subject to various interpretations, necessitating a 
+formal specification to understand which properties need to be satisfied to achieve weak consistency within 
+a zero-trust context.
+
+\subsection*{Existing Implementations}
+
+Several ongoing projects aim to implement weak consistency protocols for real-time collaborative 
+applications. One notable project is Yjs \cite{Yjs2023}, which implements the YATA protocol \cite{NicolaescuRealTime2016}. 
+This protocol ensures strong convergence (SEC in Perrin's terminology \cite{MPBook}) by means of a CRDT. 
+
+On the other hand, older projects like Etherpad use simpler conflict resolution schemes, which also ensure strong 
+convergence but are more costly than CRDTs in memory and computation time 
+\cite{AppJetEtherpad2011}.
+
+\section*{Goals}
+
+The objectives of this thesis encompass studying the three types of weak consistency in a Byzantine setting and 
+defining efficient Byzantine algorithms for their implementation. Given that causal consistency is already well-studied, 
+the main focus of this thesis will be on the other two types of weak consistency.
+
+The first stage (WP1) will involve studying Byzantine solutions without cryptographic primitives or with reasonably
+cost-effective primitives, specifically excluding homomorphic computation. An analysis of existing implementations will
+be conducted to determine the guarantees provided by these solutions within the vocabulary of weak consistencies.
+
+The second stage (WP2) will focus on developing more efficient solutions using cryptographic primitives that require 
+advanced secret-sharing and/or homomorphic computation.
+
+A final stage (WP3) will involve producing a proof-of-concept key/value storage solution using the algorithms 
+selected in the preceding stages.
+
+\section*{Methodology and Planning}
+
+A detailed review of distributed computing models, particularly focusing on solutions for causal consistency, 
+will be conducted to establish the set of theoretical and practical assumptions underlying these solutions. 
+Concurrently, in collaboration with Parsec, a list of attacks on weakly consistent peer-to-peer architectures 
+will be compiled. The emphasis will be on generating new knowledge, including novel solutions compared to the 
+current state of the art, as well as identifying new attack vectors.
+
+The algorithms will undergo formal validation initially, followed by the development of a proof of concept.
+
+WP1 will take place in 2024, WP2 in 2025, and WP3 in 2026.
+
+\section*{Monitoring and Exchange Terms}
+
+The PhD student will take part in Parsec's weekly progress meetings. The partners will meet every three
+months to review the progress of the work.
+
+The student will also attend the company's in-person meetings every six months.
+
+\section*{Material resources}
+
+The PhD student will participate in Parsec's weekly progress meetings. Additionally, partners will convene 
+every three months for project status updates.
+
+Furthermore, the student will attend in-person meetings at the company every six months.
+
+\section*{Expected Benefits}
+
+On the LIS laboratory side, the expected outcomes include the following scientific publications:
+
+\begin{compactitem}
+\item State-of-the-art review and synthesis concerning Byzantine fault tolerance in weak consistencies.
+\item Proposals and proofs of new algorithms within the zero-trust context.
+\end{compactitem}
+
+For Parsec, the expected deliverables comprise a mini-model of cloud synchronization and collaboration, 
+a proof of concept for the aforementioned algorithms, and consultancy and expertise in the scientific 
+development of products created by Parsec.
+
+\section*{Team}
+
+\subsection*{Distributed Algorithmics Team (DALGO)}
+
+The Distributed Algorithms team, led by Arnaud Labourel, is part of the Laboratory of Computer 
+Science and Systems (LIS CNRS UMR 7020). This research team is internationally recognized at the 
+highest level, comprising 8 permanent members whose interests span from reliable distributed 
+algorithms and confidentiality in distributed systems to communication networks, graph algorithms, 
+mobile agents, and IoT (Internet of Things).
+
+\subsection*{Supervisors}
+
+\textbf{Emmanuel Godard}  is a professor at Aix-Marseille University. His research interests 
+  primarily focus on understanding and maximizing decentralization (in a broad sense) in 
+  distributed systems. He is an expert in distributed algorithms and computability.
+
+\textbf{Corentin Travers}  is an Associate Professor at Aix-Marseille University. His research 
+  interests focus on robust and efficient distributed algorithms for shared-memory systems or 
+  distributed networks. He is an expert in distributed algorithmics and complexity.
+
+\textbf{Marcos Medrano} is an R\&D engineer at Parsec. He holds a master's degree in research 
+  in computer science and applied mathematics. Marcos is responsible for the development 
+  strategy of the Parsec product and facilitates collaboration between engineers and academic stakeholders.
+
+\subsection*{Candidate Choice}
+
+The DALGO team is involved in the "Reliability and Computer Security" Master's program at Aix-Marseille 
+University. This master's track is certified as \textit{SecNumEdu} by ANSSI 
+(National Cybersecurity Agency of France). In autumn 2022, a project in collaboration with the company 
+Parsec was presented to all master's students. Following this call for applications, Mr. Amaury Joly 
+was selected for a preliminary 6-month research internship on the topic of weak consistency at the 
+LIS laboratory.
+
+Mr. Amaury Joly has achieved excellent academic results, graduating from the master's 
+program with honours (\textit{mention bien}). He also has a strong dual theoretical and technical profile and a keen 
+motivation for research activities related to cloud security. He is the ideal candidate for such 
+a research topic.
+
+{\footnotesize
+  \nocite{*}
+
+  \bibliography{sujet-cifre}
+  \bibliographystyle{alpha}
+}
+
+% LaTeX2e code generated by txt2tags 3.4 (http://txt2tags.org)
+% cmdline: txt2tags -t tex sujet-cifre.t2t
+\end{document}