\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{paralist} %% needed for compact lists
\usepackage[normalem]{ulem} %% needed by strike
\usepackage[urlcolor=blue,colorlinks=true,breaklinks]{hyperref}
\usepackage[utf8x]{inputenc} %% char encoding
\usepackage{framed} %% frame multipages
\usepackage{fullpage}
\usepackage{a4wide}
\usepackage{mathpazo} %% math & rm
\linespread{1.05} %% Palatino needs more leading (space between lines)
\usepackage[scaled]{helvet} %% ss
\usepackage{courier} %% tt
\normalfont
\usepackage[T1]{fontenc}
\usepackage[english]{babel} %% in English
\usepackage{xspace} %% handles spacing after a macro
\usepackage{listings}
\lstset{breaklines}
\lstset{language=java}
\lstset{escapechar=§}
\usepackage{xcolor}

\usepackage{comment} %%%% comment env

%%%%%%%%%%%%%%
%% fancy and draft
%% Date at top of page
%% Comment out for the final version
\usepackage[margin=2.5cm]{geometry}
\usepackage{fancyhdr}
%% Header and footer
\fancyhf{} %%clear head and footer
\fancyhead[C]{\thepage} %%draft
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancypagestyle{premiere}{%% first page
\fancyhf{} %%clear head and footer
\fancyfoot[L]{\textbf{LIF}}
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancyhead[C]{}%%\includegraphics[scale=0.25]{logo-lif.png}} %%UFR
}
\fancypagestyle{notete}{%% first page
\fancyhf{} %%clear head and footer
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{Sujet}}
}

\newcommand{\myversion}{\textit{version of \today{}}}

\pagestyle{plain}

\title{Weak Consistency for zero-trust cloud}
\author{Research Subject}
\begin{document}

\date{Emmanuel Godard (LIS) -- Corentin Travers (LIS)\\emmanuel.godard@lis-lab.fr and corentin.travers@lis-lab.fr}
\maketitle

\textbf{Keywords:} Cloud, Security by design, Distributed Structures and Algorithms, Weak Consistencies, Byzantine systems

\section*{Summary}

Real-time collaborative applications are increasingly used for remote work.
They often rely on centralized client-server architectures, which raise security
and privacy concerns: data is stored on a central server, so users must trust a
third party to manage it. These architectures are also often vulnerable to
denial-of-service attacks and do not ensure data confidentiality.

To address these issues, we propose to explore information-exchange solutions based
on zero-trust and/or peer-to-peer architectures, which remove the need for a trusted
third party. Such solutions would offer a high level of security while keeping the
system resilient. To preserve performance, especially when high availability is
required, they frequently rely on weak consistency models.

In this context, we propose to study weak consistency properties applied to
cloud-related challenges. We will first review the state of the art of
Byzantine fault-tolerant solutions that do not use cryptographic primitives, along
with existing implementations (WP1). We will then propose more efficient
solutions that use cryptographic primitives (WP2). Finally, we will develop a
proof of concept of a key-value store built on the algorithms selected in the
previous stages (WP3).

\pagebreak

\section*{Problem Statement}

Since the pioneering work of Lamport \cite{LamportInterprocess1986}
and Misra \cite{MisraAxioms1986} in the 1980s, replication management has been central
to building highly available digital systems. A fundamental challenge is
to give application developers an abstraction of replicated memory that is
easy to use and that allows flexible, fault-tolerant use of distributed resources.

This line of research led to the concept of \textit{data consistency}, which comes in
several forms, each offering a different trade-off suited to the usage and specifics of a given application.

The current trend toward cloud deployment of software changes both usage patterns
and the way new applications are developed. With user-friendly cloud services whose
infrastructure maintenance is outsourced to a provider, resources are noticeably
centralized. This reintroduces classic security issues, such as the need for trust
and sovereignty and the risk of a \textit{single point of failure} (SPOF).

In response, \textit{zero-trust} approaches have been proposed so that cloud resources
can still be used without depending on any specific provider. These approaches require both
multi-provider architectures and advanced cryptographic techniques.

\medskip

From a programmer's perspective, it is often convenient to treat a cloud-based application
as a single centralized system. This requires the underlying data structures to satisfy a
property known as \textit{strong consistency}.

In practice, servers may have to operate under very adverse conditions.
Theorists and practitioners alike know from the CAP theorem
(Consistency, Availability, Partition tolerance) that operational compromises are often
necessary. Specifically, if strong consistency is required, the time to complete an operation
is proportional to the latency of \textbf{the entire} network, which in practice reduces availability.

By the CAP theorem, enforcing strong consistency makes it impossible to build
a system that is both highly resilient and highly available. Yet both
properties can be essential for a collaborative application.

The peer-to-peer approach demands strong resilience to failures.
Replicas may become disconnected from one another and experience large, uneven
latencies. Since we control neither the client's system nor its execution environment,
we must design systems that withstand the worst possible scenarios.

In real-time collaborative applications, high availability is
closely tied to the need for different replicas to access the same
shared data while working in real time. Introducing significant
latency between two modifications would therefore be unacceptable.

Since strong consistency and high availability cannot both be fully satisfied,
we turn to weak consistency, focusing on convergence. We call
a system convergent if it satisfies the following property:

\begin{quote}
If replicas stop proposing modifications, then these replicas must eventually
reach a consistent state.
\end{quote}

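The property above can be written more formally. The statement below is one possible
formalization, given as a sketch: the notation is introduced here for illustration and
does not come from the cited works. $R$ denotes the set of replicas, $s_r(t)$ the state
of replica $r$ at time $t$, and ``consistent state'' is read as ``identical states''.
\[
\bigl(\exists t_0 : \mbox{no replica proposes a modification after } t_0\bigr)
\;\Longrightarrow\;
\bigl(\exists t_1 \geq t_0 \;\; \forall t \geq t_1 \;\; \forall r, r' \in R : \; s_r(t) = s_{r'}(t)\bigr)
\]
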
Convergence, also called eventual consistency, has been studied extensively and has led to
various distributed data structures designed to guarantee it. However, convergence
alone does not solve our problem. It says nothing about behavior during an execution:
replicas may be inconsistent with one another at any point, as long as they eventually agree.
A document that is merely eventually consistent does not make a satisfactory collaborative
editing application. We also need mechanisms to resolve the conflicts that are inevitable in
collaborative work, and this resolution should preserve as much as possible
the intent of each replica's modifications.

These issues have been studied extensively. The solutions best suited to our context
are \textit{Replicated Data Types} (RDTs), which fall into two classes:

\begin{itemize}
\item \textbf{Commutative Replicated Data Types (CmRDTs):} operations on these types yield the same result
regardless of the order in which each replica applies them.
\item \textbf{Convergent Replicated Data Types (CvRDTs):} replica states converge toward a maximal structure,
for example in a system where data only ever grows.
\end{itemize}

Both classes fall under the umbrella term Conflict-free Replicated Data Types (CRDTs) and are
in fact equivalent \cite{ShapiroConflictFree2011}.
CRDTs provide a powerful framework for building distributed applications that require high availability
and eventual consistency. Because operations commute and states can be merged across
replicas without conflict, CRDTs resolve conflicts efficiently and make data converge
across the system.
The study of CRDTs has significantly advanced our ability to design collaborative, resilient
distributed applications, and offers a practical way to handle real-time
collaboration over unreliable, high-latency networks.

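To make the CvRDT class concrete, the listing below sketches a grow-only counter, a
classical textbook example of a state-based CRDT. It is an illustration only, not an
algorithm proposed in this document: the class and method names (\texttt{GCounter},
\texttt{increment}, \texttt{merge}, \texttt{value}) are hypothetical, and the code
assumes honest replicas, that is, no Byzantine behavior.

\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative state-based CRDT (CvRDT): a grow-only counter.
 * All names are assumptions made for this sketch.
 */
final class GCounter {
    private final String replicaId;
    // One monotonically increasing entry per replica.
    private final Map<String, Long> counts = new HashMap<>();

    GCounter(String replicaId) {
        this.replicaId = replicaId;
    }

    /** Local update: a replica only ever increases its own entry. */
    void increment() {
        counts.merge(replicaId, 1L, Long::sum);
    }

    /** Merge is the entry-wise maximum of the two states. */
    void merge(GCounter other) {
        other.counts.forEach((id, n) -> counts.merge(id, n, Long::max));
    }

    /** The counter value is the sum of all per-replica entries. */
    long value() {
        long sum = 0;
        for (long n : counts.values()) {
            sum += n;
        }
        return sum;
    }

    public static void main(String[] args) {
        GCounter a = new GCounter("a");
        GCounter b = new GCounter("b");
        a.increment();
        a.increment();   // two updates on replica a
        b.increment();   // one concurrent update on replica b
        a.merge(b);      // states are exchanged in both directions
        b.merge(a);
        b.merge(a);      // merging the same state again changes nothing
        System.out.println(a.value() + " " + b.value()); // prints "3 3"
    }
}
\end{lstlisting}

Because \texttt{merge} is commutative, associative, and idempotent, replicas may exchange
states in any order and any number of times and still reach the same value.
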
\medskip

Furthermore, to provide truly secure solutions in a zero-trust context, we must consider the most
adverse operating conditions: servers or participating clients that have been compromised and do not
strictly follow the protocol. The literature calls this Byzantine behavior.

Under such availability and security constraints, ensuring strong consistency can be
very costly in computation and time, and application requirements are sometimes incompatible with
this cost. It then becomes necessary to consider data that satisfy only
\textit{weak consistency} properties.

Weak consistency models, such as the eventual consistency offered by CRDTs, are valuable in these scenarios.
They prioritize availability and partition tolerance and allow some
inconsistency that is resolved over time. They are designed for distributed
systems operating under non-ideal conditions, which in our setting include Byzantine faults.

In zero-trust environments, where malicious behavior is a constant threat, weak consistency models
can balance functionality, security, and operational feasibility. They offer a pragmatic
basis for resilient and secure distributed applications that withstand
compromised nodes and unreliable networks.

\section*{State of the art}

The landscape of weak consistency properties is fairly complex. Three major families of weak
consistency have been identified \cite{Raynal18,MPBook}:

\begin{itemize}
\item Serializability
\item Causal Consistency
\item Strong Eventual Consistency
\end{itemize}

Strong eventual consistency is typically what collaborative applications need, but it is particularly
costly to achieve. Serializability is simpler to implement, but some transactions
may fail to complete, which requires error handling at the application level.

Causal consistency preserves the causal order perceived by each process and generally allows
higher-level data structures to be implemented efficiently.

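As an illustration of what preserving causal order means in practice, the listing below
uses vector clocks, a classical mechanism for tracking causality. It is a sketch under
stated assumptions and is not taken from the works cited here: the class and method names
are hypothetical, the number of replicas is fixed, and all replicas are assumed honest
(no Byzantine behavior).

\begin{lstlisting}
import java.util.Arrays;

/** Illustrative vector clock for a fixed number of replicas (hypothetical names). */
final class VectorClock {
    private final int[] entries;

    VectorClock(int n) {
        entries = new int[n];
    }

    private VectorClock(int[] entries) {
        this.entries = entries;
    }

    /** Called by replica sender before a broadcast; returns the timestamp to attach. */
    VectorClock tick(int sender) {
        entries[sender]++;
        return new VectorClock(entries.clone());
    }

    /**
     * Standard causal-delivery test: the message must be the next one from its
     * sender, and everything it depends on must already be delivered locally.
     */
    boolean canDeliver(VectorClock msg, int sender) {
        for (int i = 0; i < entries.length; i++) {
            if (i == sender) {
                if (msg.entries[i] != entries[i] + 1) return false;
            } else if (msg.entries[i] > entries[i]) {
                return false;
            }
        }
        return true;
    }

    /** Records the delivery of a message (entry-wise maximum). */
    void deliver(VectorClock msg) {
        for (int i = 0; i < entries.length; i++) {
            entries[i] = Math.max(entries[i], msg.entries[i]);
        }
    }

    @Override
    public String toString() {
        return Arrays.toString(entries);
    }

    public static void main(String[] args) {
        VectorClock r0 = new VectorClock(3);
        VectorClock r1 = new VectorClock(3);
        VectorClock r2 = new VectorClock(3);
        VectorClock m1 = r0.tick(0);   // replica 0 broadcasts m1
        r1.deliver(m1);                // replica 1 delivers m1
        VectorClock m2 = r1.tick(1);   // then broadcasts m2, which depends on m1
        System.out.println(r2.canDeliver(m2, 1)); // false: m1 has not reached replica 2
        r2.deliver(m1);
        System.out.println(r2.canDeliver(m2, 1)); // true
    }
}
\end{lstlisting}

A replica that receives a message too early simply buffers it until \texttt{canDeliver}
holds, so every replica observes causally related updates in the same order.
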
For a comprehensive overview of these weak consistency models, see M. Perrin's detailed
map \cite{MPBook}. Each model offers a different trade-off between consistency guarantees,
implementation complexity, and operational efficiency, which makes it suitable for different use cases and
application requirements. Selecting the appropriate model is crucial to
designing effective and robust distributed systems, especially collaborative applications
that run in dynamic and unreliable environments.

\subsection*{Algorithmic Results}

The earliest work on secure collaborative tools in a high-availability context dates back to 2009, but
more systematic research on the security of weak consistency is quite recent. In 2009, Singh et al. introduced
Zeno, the first system to propose a Byzantine fault-tolerant algorithm that favors availability over strong consistency:
it provides Byzantine fault tolerance with eventual consistency \cite{SinghZeno2009}. Experiments
showed better availability than classical Byzantine algorithms.

For causal consistency, mostly partial studies and solutions exist so far
\cite{TsengDistributed2019,VanDerLindePractical2020}. Tseng et al. establish exact computability bounds in a Byzantine
setting and provide an algorithm whose performance is compared with that of the Google Compute
platform. Van Der Linde et al. introduce a peer-to-peer system that is resilient to Byzantine attacks and offers causal
consistency guarantees. Their evaluation suggests that, despite the peer-to-peer architecture, performance is very good
compared with a traditional client-server architecture, especially in terms of latency.

In addition to these algorithms, Misra and Kshemkalyani showed \cite{MisraByzantine2021} that, in an
asynchronous setting, causal consistency cannot be achieved even with a single Byzantine participant.

A notable feature of \cite{VanDerLindePractical2020} is its exploration of Byzantine failures in
the context of weak consistency. A peer-to-peer system like that of \cite{MisraByzantine2021} raises new
concerns, since a participant can use information from the lower replication layers to mount attacks at
the application level.

Applying weak consistency criteria alone does not cover all of our concerns. The cloud context
raises significant questions about data centralization and governance: the market is dominated by a few
major players to whom users must blindly entrust their data, which poses substantial challenges for privacy and data
sovereignty.

It is therefore essential to integrate the notion of a zero-trust cloud, which anchors our work in an
approach that is relevant from both industrial and regulatory perspectives. Zero-trust, as defined by NIST in SP 800-207
\cite{RoseZero2020}, is a security model that trusts no one and makes no assumptions about network security. It
helps guard against malicious intermediaries, which reduces the attack surface and confines Byzantine
behavior to the clients that have access to the data.

Data-centric security must be considered alongside communication security.
``Data-centric'' approaches treat the data itself as a dynamic entity within the system and attach
access-control and monitoring processes to it \cite{BayukDatacentric2009}. These concerns are growing
and are addressed by state and inter-state actors, as illustrated by NATO's
STANAG 4774 and 4778. They have been studied extensively since the 2000s: works such as
\cite{GoyalAttributebased2006, MullerDistributed2009} define attribute-based encryption schemes,
which issue encryption keys according to access rights in order to enforce security policies. Other works, such as
\cite{YanFlexible2017}, propose solutions adapted to the cloud, based on more flexible architectures with finer-grained rights.

However, there is currently no academic consensus on how to formalize zero-trust and data-centric
security. The terms are interpreted in various ways, so a
formal specification is needed to determine which properties must hold to achieve weak consistency in
a zero-trust context.

\subsection*{Existing Implementations}

Several ongoing projects implement weak consistency protocols for real-time collaborative
applications. A notable one is Yjs \cite{Yjs2023}, which implements the YATA protocol \cite{NicolaescuRealTime2016}.
This protocol ensures strong convergence (SEC in Perrin's terminology \cite{MPBook}) by means of a CRDT.

Older projects such as Etherpad use conceptually simpler conflict-resolution schemes. They also ensure strong
convergence, but at a higher cost in memory and computation time
than CRDTs \cite{AppJetEtherpad2011}.

\section*{Goals}

This thesis aims to study the three types of weak consistency in a Byzantine setting and
to define efficient Byzantine fault-tolerant algorithms that implement them. Since causal consistency is already well studied,
the thesis will focus mainly on the other two types.

The first stage (WP1) will study Byzantine fault-tolerant solutions that use no cryptographic primitives, or only reasonably
inexpensive ones, which specifically excludes homomorphic computation. Existing implementations will
be analyzed to determine which guarantees they provide, expressed in the vocabulary of weak consistency.

The second stage (WP2) will develop more efficient solutions based on advanced cryptographic primitives
such as secret sharing and/or homomorphic computation.

The final stage (WP3) will produce a proof-of-concept key-value store that uses the algorithms
selected in the preceding stages.

\section*{Methodology and Planning}

A detailed review of distributed computing models, focusing on solutions for causal consistency,
will establish the theoretical and practical assumptions that underlie these solutions.
Concurrently, a list of attacks on weakly consistent peer-to-peer architectures
will be compiled in collaboration with Parsec. The emphasis will be on producing new knowledge:
solutions that improve on the current state of the art and new attack vectors.

The algorithms will first be formally validated; a proof of concept will then be developed.

WP1 will take place in 2024, WP2 in 2025, and WP3 in 2026.

\section*{Monitoring and Exchange Terms}

The PhD student will participate in Parsec's weekly progress meetings. The partners will meet
every three months to review the progress of the work.

The student will also attend the company's in-person meetings every six months.

\section*{Material resources}

\section*{Expected Benefits}

For the LIS laboratory, the expected outcomes are the following scientific publications:

\begin{compactitem}
\item State-of-the-art review and synthesis concerning Byzantine fault tolerance in weak consistencies.
\item Proposals and proofs of new algorithms within the zero-trust context.
\end{compactitem}

For Parsec, the expected deliverables are a small-scale model of cloud synchronization and collaboration,
a proof of concept of the algorithms above, and scientific advice and expertise on the
development of Parsec's products.

\section*{Team}

\subsection*{Distributed Algorithmics Team (DALGO)}

The Distributed Algorithmics team, led by Arnaud Labourel, is part of the Laboratory of Computer
Science and Systems (LIS, CNRS UMR 7020). The team is internationally recognized at the
highest level and comprises 8 permanent members, whose interests range from reliable distributed
algorithms and confidentiality in distributed systems to communication networks, graph algorithms,
mobile agents, and the Internet of Things (IoT).

\subsection*{Supervisors}

\textbf{Emmanuel Godard} is a professor at Aix-Marseille University. His research
focuses on understanding and maximizing decentralization, in a broad sense, in
distributed systems. He is an expert in distributed algorithms and computability.

\textbf{Corentin Travers} is an associate professor at Aix-Marseille University. His research
focuses on robust and efficient distributed algorithms for shared-memory systems and
distributed networks. He is an expert in distributed algorithms and complexity.

\textbf{Marcos Medrano} is an R\&D engineer at Parsec. He holds a research master's degree
in computer science and applied mathematics. He is responsible for the development
strategy of the Parsec product and facilitates collaboration between engineers and academic stakeholders.

\subsection*{Candidate Choice}

The DALGO team is involved in the ``Reliability and Computer Security'' master's program at Aix-Marseille
University. This master's track is certified \textit{SecNumEdu} by ANSSI
(the National Cybersecurity Agency of France). In autumn 2022, a project in collaboration with
Parsec was presented to all master's students. Following this call for applications, Mr. Amaury Joly
was selected for a preliminary 6-month research internship on weak consistency at the
LIS laboratory.

Mr. Amaury Joly has achieved excellent academic results and obtained his master's degree
with honors (\textit{mention bien}). He also has a strong profile that combines theory and technical skills,
and is highly motivated by research on cloud security. He is the ideal candidate for this
research topic.

{\footnotesize
\nocite{*}

\bibliography{sujet-cifre.bib}
\bibliographystyle{alpha}
}

% LaTeX2e code generated by txt2tags 3.4 (http://txt2tags.org)
% cmdline: txt2tags -t tex sujet-cifre.t2t
\end{document}