\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{paralist} %% needed for compact lists
\usepackage[normalem]{ulem} %% needed by strike
\usepackage[urlcolor=blue,colorlinks=true,breaklinks]{hyperref}
\usepackage[utf8x]{inputenc} %% char encoding
\usepackage{framed} %% frame multipages
\usepackage{fullpage}
\usepackage{a4wide}
\usepackage{mathpazo} %% math & rm
\linespread{1.05} %% Palatino needs more leading (space between lines)
\usepackage[scaled]{helvet} %% ss
\usepackage{courier} %% tt
\normalfont
\usepackage[T1]{fontenc}
\usepackage[english]{babel} %% document in English
\usepackage{xspace} %% handles spacing after a macro
\usepackage{listings}
\lstset{breaklines}
\lstset{language=java}
\lstset{escapechar=§}
\usepackage{xcolor}


\usepackage{comment} %%%% comment env

%%%%%%%%%%%%%%
%% fancy headers and draft
%% Date at the top of the page
%% To be commented out for the final version
\usepackage[margin=2.5cm]{geometry}
\usepackage{fancyhdr}
%% Header and footer
\fancyhf{} %%clear head and footer
\fancyhead[C]{\thepage} %%draft
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancypagestyle{premiere}{%% first page
\fancyhf{} %%clear head and footer
\fancyfoot[L]{\textbf{LIF}}
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancyhead[C]{}%%\includegraphics[scale=0.25]{logo-lif.png}} %%UFR
}
\fancypagestyle{notete}{%% page without header
\fancyhf{} %%clear head and footer
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{Subject}}
}

\newcommand{\myversion}{\textit{version of \today{}}}

\pagestyle{plain}

\title{Weak Consistency for the Zero-Trust Cloud}
\author{Research Subject}
\begin{document}

\date{Emmanuel Godard (LIS) -- Corentin Travers (LIS)\\emmanuel.godard@lis-lab.fr and corentin.travers@lis-lab.fr}
\maketitle

\textbf{Keywords:} Cloud, Security by Design, Distributed Structures and Algorithms, Weak Consistency, Byzantine Systems

\section*{Summary}

Real-time collaborative applications are increasingly used for remote work.
These applications often rely on centralized client-server architectures, which
raise security and privacy concerns: data is stored on a central server, so users
must trust a third party to manage it. Moreover, these architectures are often
vulnerable to denial-of-service attacks and do not guarantee data confidentiality.

To address these issues, we propose to explore information exchange solutions based
on zero-trust and/or peer-to-peer architectures that remove the need for trusted
third parties. Such solutions should offer a high level of security while keeping
the system resilient. To preserve good performance, especially when high availability
is required, such systems frequently rely on weak consistency models.

In this context, we propose to study weak consistency properties applied to
cloud-related challenges. First, we will review the state of the art of
Byzantine fault-tolerant solutions without cryptographic primitives, as well as
existing implementations (WP1). Second, we will propose more efficient
solutions based on cryptographic primitives (WP2). Finally, we will develop a
proof of concept of a key-value store based on the algorithms selected in the
previous stages (WP3).

\pagebreak

\section*{Problem Statement}

Since the pioneering work of Lamport \cite{LamportInterprocess1986}
and Misra \cite{MisraAxioms1986} in the 1980s, replication management has been central to
achieving high availability in computing systems. One of the fundamental challenges is
to provide application developers with an abstraction of replicated memory that is
both easy to use and allows distributed resources to be used flexibly and in a fault-tolerant way.

This line of research has led to the concept of \textit{data consistency}, which comes in
various forms, each offering the trade-off best suited to the usage patterns and specific requirements of an application.

The current trend towards cloud-based deployment of software applications entails significant
changes in usage patterns and in the way new applications are developed. With the advent
of user-friendly cloud services, where infrastructure maintenance is outsourced to a provider,
resources are becoming noticeably centralized. This reintroduces classic security issues,
such as the need for trust and sovereignty, or the risk of a \textit{single point of failure} (SPOF).

In response, so-called \textit{zero-trust} approaches have been proposed to keep using
cloud resources without depending on any specific provider. These approaches require both
multi-provider architectures and advanced cryptographic techniques.

\medskip

From a programmer's perspective, it is often convenient to view a cloud-based application
as a single centralized system. This requires the underlying data structures to satisfy a
property known as \textit{strong consistency}.

In practice, servers may have to operate under very adverse conditions.
Both theorists and practitioners know from the CAP theorem
(Consistency, Availability, Partition tolerance) that compromises are often
unavoidable. Specifically, if strong consistency is required, the time needed to complete an operation is proportional
to the latency of \textbf{the entire} network, since replicas must coordinate before answering, which in practice reduces availability.

According to the CAP theorem, a system that enforces strong consistency cannot be both
resilient to network partitions and highly available. Yet both of
these properties can be essential for a collaborative application.

The peer-to-peer approach, in particular, requires a high degree of resilience to failures.
Replicas may become disconnected from one another and experience large and uneven latencies.
The lack of control over clients' systems and execution environments forces
us to design systems that can withstand worst-case scenarios.

In real-time collaborative applications, the need for high availability is
closely tied to the requirement that different replicas can access the same
shared data and work on it in real time. Introducing significant
latency between two modifications would therefore be unacceptable.

Since strong consistency and high availability cannot both be fully satisfied,
we turn to the study of weak consistency criteria, focusing specifically on convergence. We say
that a system is convergent if it satisfies the following property:

\begin{quote}
If replicas stop issuing modifications, then all replicas must eventually
reach the same state.
\end{quote}
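
One way to state this property more formally is sketched below; the notation is ours and purely illustrative: $\Pi$ denotes the set of replicas and $\sigma_i(t)$ the local state of replica $i$ at time $t$.
\[
\bigl(\exists t_0 \mbox{ such that no modification is issued after } t_0\bigr)
\;\Rightarrow\;
\exists t_1 \geq t_0,\ \forall t \geq t_1,\ \forall i, j \in \Pi:\ \sigma_i(t) = \sigma_j(t).
\]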

Convergence (also called eventual consistency) has been extensively studied, leading to
various distributed data structures designed to guarantee it. However, convergence
alone does not solve our problem: this property gives no guarantee about the behavior of the system during execution,
where replicas are allowed to be temporarily inconsistent. Eventual consistency of a shared document is thus not
sufficient to obtain a satisfactory collaborative editing application. We also need conflict resolution
mechanisms, since conflicts are unavoidable in collaborative settings. Conflict resolution should
preserve, as far as possible, the meaning intended by each replica that issued a modification.

These issues have also been extensively studied, and among the proposed solutions, \textit{Replicated Data Types} (RDTs)
are particularly suitable in our context. There are two classes of RDTs:

Commutative Replicated Data Types (CmRDTs): replicas propagate operations rather than states, and concurrent
operations commute, so that replicas applying the same operations, in any order compatible with causality, reach the same state.
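
As an illustration of the commutativity requirement, the Java sketch below (a hypothetical \texttt{OpCounter} class, not taken from any existing library) implements an operation-based counter. Each replica broadcasts its \texttt{add} operations; since additions commute, replicas that apply the same operations in different orders reach the same value. The sketch assumes a delivery layer that delivers every operation exactly once to every replica, which is the usual assumption for CmRDTs.

\begin{lstlisting}
import java.util.List;

/** Hypothetical operation-based (CmRDT) counter whose only operation is add(delta). */
class OpCounter {
    private long value = 0;

    /** Local update: applies the operation locally and returns it for broadcasting. */
    long prepareAdd(long delta) {
        apply(delta);
        return delta;
    }

    /** Applies an operation from any replica; assumed to be delivered exactly once. */
    void apply(long delta) {
        value += delta; // additions commute, so the delivery order does not matter
    }

    /** Applies a sequence of operations in the given order. */
    void applyAll(List<Long> ops) {
        for (long op : ops) {
            apply(op);
        }
    }

    long value() {
        return value;
    }
}
\end{lstlisting}

For instance, a replica applying $+1, +2, -4$ and another applying $-4, +1, +2$ both end with the value $-1$. Operations that do not commute, such as assigning a value, require additional metadata.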
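
Convergent Replicated Data Types (CvRDTs): replicas propagate their whole state, and two states are merged
by computing their least upper bound in a join-semilattice. Since updates only make the state grow in this order
(as in a set to which elements can only be added), all replicas converge towards the same maximal state.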
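
A classical example of a CvRDT is the grow-only counter. In the hypothetical Java sketch below, the state maps each replica identifier to the number of increments performed by that replica; states are ordered entry-wise, and \texttt{merge} takes the entry-wise maximum, which is their least upper bound.

\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical state-based (CvRDT) grow-only counter.
 * State: for each replica id, the number of increments made by that replica.
 */
class GCounter {
    private final String replicaId;
    private final Map<String, Long> counts = new HashMap<>();

    GCounter(String replicaId) {
        this.replicaId = replicaId;
    }

    /** Local update: only ever makes the state grow. */
    void increment() {
        counts.merge(replicaId, 1L, Long::sum);
    }

    /** Entry-wise maximum: commutative, associative and idempotent. */
    void merge(GCounter other) {
        for (Map.Entry<String, Long> e : other.counts.entrySet()) {
            counts.merge(e.getKey(), e.getValue(), Long::max);
        }
    }

    /** The counter value is the sum of all per-replica entries. */
    long value() {
        long sum = 0;
        for (long c : counts.values()) {
            sum += c;
        }
        return sum;
    }
}
\end{lstlisting}

Because \texttt{merge} is commutative, associative and idempotent, replicas can exchange states in any order, and even several times, without affecting the result: two replicas that have incremented 2 and 1 times respectively both read 3 after exchanging their states.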

Both classes fall under the umbrella term of Conflict-free Replicated Data Types (CRDTs) and are
in fact equivalent to each other \cite{ShapiroConflictFree2011}.
CRDTs provide a powerful framework for building distributed applications that require high availability
and eventual consistency. Because operations commute and states can be merged across
replicas without conflicts, CRDTs enable efficient conflict resolution and convergence of data
across distributed systems.
The study of CRDTs has significantly advanced our ability to design collaborative and resilient
distributed applications, offering a practical way to deal with the challenges posed by real-time
collaboration over unreliable and latency-prone networks.
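
To make conflict-free merging concrete, the following Java sketch implements an observed-remove set (OR-Set), a classical state-based CRDT in which a removal only cancels the additions it has observed, so that an addition concurrent with a removal of the same element wins. Class and method names are hypothetical, and tagging each addition with a random UUID is an assumption made for simplicity.

\begin{lstlisting}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical state-based observed-remove set (add-wins semantics). */
class ORSet<E> {
    // For each element, the unique tags of the additions seen so far.
    private final Map<E, Set<String>> addTags = new HashMap<>();
    // Tags of additions that have been removed (tombstones).
    private final Set<String> removedTags = new HashSet<>();

    void add(E element) {
        addTags.computeIfAbsent(element, k -> new HashSet<>())
               .add(UUID.randomUUID().toString());
    }

    /** Removes only the additions observed locally; concurrent ones survive. */
    void remove(E element) {
        Set<String> tags = addTags.get(element);
        if (tags != null) {
            removedTags.addAll(tags);
        }
    }

    /** An element is present if at least one of its additions is not removed. */
    boolean contains(E element) {
        Set<String> tags = addTags.get(element);
        if (tags == null) {
            return false;
        }
        for (String t : tags) {
            if (!removedTags.contains(t)) {
                return true;
            }
        }
        return false;
    }

    /** Union of both components: commutative, associative and idempotent. */
    void merge(ORSet<E> other) {
        for (Map.Entry<E, Set<String>> e : other.addTags.entrySet()) {
            addTags.computeIfAbsent(e.getKey(), k -> new HashSet<>()).addAll(e.getValue());
        }
        removedTags.addAll(other.removedTags);
    }
}
\end{lstlisting}

In this sketch, tags of removed additions (tombstones) are kept forever, so the state grows without bound; practical implementations use more compact metadata.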

\medskip

Furthermore, to provide truly secure solutions in a zero-trust context, the most challenging operational
conditions to consider are those in which servers or participating clients have been compromised and may
deviate arbitrarily from the protocol. In the literature, this is referred to as Byzantine behavior.
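
To give an idea of the cost of tolerating such behavior, recall the standard quorum arithmetic of Byzantine fault-tolerant replication protocols such as PBFT, given here only as an illustration: with $n = 3f+1$ replicas, at most $f$ of which are Byzantine, an operation waits for a quorum of $2f+1$ replicas, and any two quorums $Q_1$ and $Q_2$ satisfy
\[
|Q_1 \cap Q_2| \;\geq\; |Q_1| + |Q_2| - n \;=\; 2(2f+1) - (3f+1) \;=\; f+1,
\]
so they share at least one correct replica. Tolerating a single Byzantine replica ($f = 1$) thus already requires $n = 4$ replicas and quorums of size 3.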

Given these demanding availability and security constraints, ensuring strong consistency can be
very costly in computation and time, and application requirements are sometimes incompatible with
such costs. It then becomes necessary to consider data structures that satisfy
so-called \textit{weak consistency} properties.

Weak consistency models, such as the eventual consistency offered by CRDTs, are valuable in such scenarios.
These models favor availability and partition tolerance while allowing some degree of
inconsistency that is resolved over time. They are designed to cope with distributed
systems operating under non-ideal conditions, which, in our setting, include Byzantine faults.

In zero-trust environments, where malicious behavior is a constant threat, adopting weak consistency models
can strike a balance between functionality, security, and operational feasibility. Such models provide pragmatic
foundations for building resilient and secure distributed applications that can withstand
compromised nodes and unreliable network conditions.

\section*{State of the Art}

The landscape of weak consistency properties is fairly complex; three major families of weak
consistency criteria are identified in \cite{Raynal18} and \cite{MPBook}:

\begin{itemize}
\item Serializability
\item Causal Consistency
\item Strong Eventual Consistency
\end{itemize}

Although strong eventual consistency is typically what collaborative applications require, it is particularly
costly to achieve. Serializability, on the other hand, is simpler to implement but may cause some transactions
to abort, which the application must then handle.
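
The abort behavior mentioned above can be illustrated with optimistic concurrency control, one common (but not the only) way of enforcing serializability: a transaction records the version of each key it reads, and at commit time it aborts if any of those versions has changed. The \texttt{VersionedStore} and \texttt{Txn} classes below are a hypothetical, single-process sketch.

\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical single-process key-value store with per-key versions. */
class VersionedStore {
    final Map<String, Integer> values = new HashMap<>();
    final Map<String, Long> versions = new HashMap<>();

    Txn begin() {
        return new Txn(this);
    }

    /** Validates and applies a transaction atomically; returns false on abort. */
    synchronized boolean commit(Txn t) {
        for (Map.Entry<String, Long> read : t.readVersions.entrySet()) {
            long current = versions.getOrDefault(read.getKey(), 0L);
            if (current != read.getValue()) {
                return false; // a key read by t has been overwritten since: abort
            }
        }
        for (Map.Entry<String, Integer> w : t.writes.entrySet()) {
            values.put(w.getKey(), w.getValue());
            versions.merge(w.getKey(), 1L, Long::sum);
        }
        return true;
    }
}

/** A transaction buffers its writes and remembers the versions it has read. */
class Txn {
    private final VersionedStore store;
    final Map<String, Long> readVersions = new HashMap<>();
    final Map<String, Integer> writes = new HashMap<>();

    Txn(VersionedStore store) {
        this.store = store;
    }

    int read(String key) {
        if (writes.containsKey(key)) {
            return writes.get(key); // read-your-own-writes
        }
        readVersions.putIfAbsent(key, store.versions.getOrDefault(key, 0L));
        return store.values.getOrDefault(key, 0);
    }

    void write(String key, int value) {
        writes.put(key, value);
    }
}
\end{lstlisting}

If two transactions read the same key and both try to update it, only the first one to commit succeeds; the application must retry the other one, which is the error handling burden mentioned above.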

Causal consistency guarantees that every process observes operations in an order that respects causality:
if an operation may have influenced another one, all processes observe them in that order. It generally allows
higher-level data structures to be implemented efficiently.
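
A classical way to implement causal ordering of broadcast messages when no process is Byzantine is to stamp each message with a vector clock and to buffer it until all the messages it causally depends on have been delivered. The sketch below assumes $n$ processes with identifiers $0, \dots, n-1$ and reliable channels; class and method names are hypothetical.

\begin{lstlisting}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** A broadcast message stamped with the sender's vector clock. */
class Msg {
    final int sender;
    final int[] clock;
    final String payload;

    Msg(int sender, int[] clock, String payload) {
        this.sender = sender;
        this.clock = clock;
        this.payload = payload;
    }
}

/** Hypothetical causal-broadcast endpoint for one process among n. */
class CausalProcess {
    private final int id;
    private final int[] delivered;              // delivered[k]: messages delivered from process k
    private final List<Msg> pending = new ArrayList<>();
    final List<String> log = new ArrayList<>(); // payloads in delivery order

    CausalProcess(int id, int n) {
        this.id = id;
        this.delivered = new int[n];
    }

    /** Own messages are delivered immediately and depend on everything delivered so far. */
    Msg broadcast(String payload) {
        delivered[id]++;
        log.add(payload);
        return new Msg(id, delivered.clone(), payload);
    }

    /** Buffers m, then delivers every pending message whose dependencies are satisfied. */
    void receive(Msg m) {
        pending.add(m);
        boolean progress = true;
        while (progress) {
            progress = false;
            Iterator<Msg> it = pending.iterator();
            while (it.hasNext()) {
                Msg p = it.next();
                if (canDeliver(p)) {
                    it.remove();
                    delivered[p.sender]++;
                    log.add(p.payload);
                    progress = true;
                }
            }
        }
    }

    /** Next message from its sender, and no undelivered dependency on other processes. */
    private boolean canDeliver(Msg m) {
        for (int k = 0; k < delivered.length; k++) {
            if (k == m.sender) {
                if (m.clock[k] != delivered[k] + 1) {
                    return false;
                }
            } else if (m.clock[k] > delivered[k]) {
                return false;
            }
        }
        return true;
    }
}
\end{lstlisting}

This mechanism assumes that all processes follow the protocol: a Byzantine sender could, for instance, announce clock entries for messages that were never sent, so that correct processes buffer its message forever.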

For a comprehensive overview of these weak consistency models, the reader may refer to M. Perrin's detailed
mapping \cite{MPBook}. Each of these models offers a different trade-off between consistency guarantees,
implementation complexity, and operational efficiency, which makes them suitable for different use cases and
application requirements. Understanding these models and selecting the appropriate one is crucial for
designing effective and robust distributed systems, especially for collaborative applications
operating in dynamic and unreliable environments.

\subsection*{Algorithmic Results}

The earliest work on secure collaborative tools in a high-availability context dates back to 2009, but
more systematic research on the security of weak consistency is quite recent. In 2009, Singh et al. introduced the
Zeno system, the first to propose a Byzantine algorithm that favors availability over strong consistency.
It provides Byzantine fault tolerance while still making strong consistency available when needed \cite{SinghZeno2009}.
Experiments showed that it achieves better availability than classical Byzantine algorithms.

Currently, there are mainly partial studies and solutions for causal consistency \cite{TsengDistributed2019}
and \cite{VanDerLindePractical2020}. On the one hand, Tseng et al. establish exact computability bounds in a Byzantine
framework and provide an algorithm whose performance is evaluated on the Google Compute
platform. On the other hand, Van Der Linde et al. introduce a peer-to-peer system that is resilient to Byzantine attacks and offers causal
consistency guarantees. Their evaluation suggests that, despite the peer-to-peer architecture, performance, especially
in terms of latency, is very good compared to a traditional client-server architecture.

In addition to these algorithms, Misra and Kshemkalyani showed in \cite{MisraByzantine2021} that in an
asynchronous system, causal consistency cannot be achieved even with a single Byzantine participant.

A notable feature of \cite{VanDerLindePractical2020} is that it explores Byzantine failures in
the context of weak consistency. A peer-to-peer system like that in \cite{MisraByzantine2021} raises new
concerns: a participant may exploit information from the lower replication layers to mount attacks at
the application level.

Applying weak consistency criteria alone does not address all of our concerns. The cloud context
raises significant questions about data centralization and governance: the market is dominated by a few
major players to whom users must blindly entrust their data, which poses substantial challenges to privacy and data
sovereignty.

In this context, integrating the notion of a zero-trust cloud is essential, as it anchors our work in an
approach that is relevant from both industrial and regulatory perspectives. Zero trust, as defined by NIST in SP 800-207
\cite{RoseZero2020}, is a security model that grants no implicit trust to any entity and makes no assumptions about network security. It
helps guard against malicious behavior by intermediaries, reducing the attack surface and confining Byzantine
behavior to the clients that have access to the data.

Beyond communication security, data-centric security is also crucial. Adopting
``data-centric'' approaches means treating data itself as a dynamic entity within the system, to which
access control and monitoring processes are attached \cite{BayukDatacentric2009}. These issues are a growing concern
and are addressed by state and inter-state actors, as illustrated by NATO's STANAG 4774 and 4778
standards. These topics have been extensively studied since the mid-2000s, with works such as
\cite{GoyalAttributebased2006, MullerDistributed2009} defining attribute-based encryption schemes,
in which decryption keys are issued according to users' rights in order to enforce security policies. Other works, such as \cite{YanFlexible2017},
propose cloud-oriented solutions based on more flexible architectures with a finer granularity in the definition of rights.

However, there is currently no academic consensus on how to formalize the notions of zero trust
and data-centric security. These terms are subject to various interpretations, and a
formal specification is needed to understand which properties must be satisfied to achieve weak consistency in
a zero-trust context.

\subsection*{Existing Implementations}

Several ongoing projects implement weak consistency protocols for real-time collaborative
applications. A notable one is Yjs \cite{Yjs2023}, which implements the YATA protocol \cite{NicolaescuRealTime2016}.
This protocol ensures strong convergence (strong eventual consistency, or SEC, in the terminology of \cite{MPBook})
by means of a CRDT.
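
The YATA algorithm itself is not reproduced here. As a simpler illustration of how a sequence CRDT can order concurrent insertions deterministically, the sketch below follows the idea of RGA (Replicated Growable Array), restricted to insertions: each character receives a unique identifier made of a Lamport counter and a replica id, is inserted right after a reference character, and concurrent insertions after the same character are ordered by decreasing identifier. Names are hypothetical, deletions are omitted, and remote operations are assumed to be applied exactly once and in causal order.

\begin{lstlisting}
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical element identifier (Lamport counter, replica id), totally ordered. */
final class Id implements Comparable<Id> {
    final long counter;
    final String replica;

    Id(long counter, String replica) {
        this.counter = counter;
        this.replica = replica;
    }

    @Override
    public int compareTo(Id o) {
        int c = Long.compare(counter, o.counter);
        return c != 0 ? c : replica.compareTo(o.replica);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Id && compareTo((Id) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(counter, replica);
    }
}

/** Insertion of one character after the element 'after' (null = document start). */
final class InsertOp {
    final Id id;
    final Id after;
    final char value;

    InsertOp(Id id, Id after, char value) {
        this.id = id;
        this.after = after;
        this.value = value;
    }
}

/** Hypothetical insert-only sequence CRDT in the style of RGA. */
class RgaText {
    private final String replica;
    private long clock = 0;
    private final List<Id> ids = new ArrayList<>();
    private final List<Character> chars = new ArrayList<>();

    RgaText(String replica) {
        this.replica = replica;
    }

    /** Local insertion of c at index pos; returns the operation to broadcast. */
    InsertOp insert(int pos, char c) {
        Id after = pos == 0 ? null : ids.get(pos - 1);
        InsertOp op = new InsertOp(new Id(clock + 1, replica), after, c);
        apply(op);
        return op;
    }

    /** Applies a local or remote insertion (remote ones in causal order). */
    void apply(InsertOp op) {
        clock = Math.max(clock, op.id.counter); // Lamport clock update
        int i = op.after == null ? 0 : ids.indexOf(op.after) + 1;
        // Skip concurrent insertions at the same place that have a larger id,
        // together with the elements inserted after them (which have larger ids too).
        while (i < ids.size() && ids.get(i).compareTo(op.id) > 0) {
            i++;
        }
        ids.add(i, op.id);
        chars.add(i, op.value);
    }

    String text() {
        StringBuilder sb = new StringBuilder();
        for (char ch : chars) {
            sb.append(ch);
        }
        return sb.toString();
    }
}
\end{lstlisting}

For example, if replicas A and B both start from \texttt{H} and concurrently insert \texttt{x} and \texttt{y} after it, both obtain \texttt{Hyx}, because the identifier of \texttt{y} (counter 2, replica B) is larger than that of \texttt{x} (counter 2, replica A).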

Older projects such as Etherpad, on the other hand, use simpler conflict resolution mechanisms that also ensure strong
convergence, but whose operations are more expensive in memory and computation time than those of
CRDTs \cite{AppJetEtherpad2011}.

\section*{Goals}

The objectives of this thesis are to study the three families of weak consistency in a Byzantine setting and
to design efficient Byzantine-tolerant algorithms implementing them. Since causal consistency has already been well studied,
the thesis will mainly focus on the other two families.

The first stage (WP1) will study Byzantine solutions that use no cryptographic primitives, or only reasonably
inexpensive ones, which excludes in particular homomorphic computation. Existing implementations will also
be analyzed to determine, in the vocabulary of weak consistency, which guarantees they actually provide.

The second stage (WP2) will focus on developing more efficient solutions based on cryptographic primitives that rely on
advanced secret sharing and/or homomorphic computation.
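
To give a flavor of the secret-sharing primitives mentioned for WP2, the sketch below implements additive secret sharing modulo a prime, a deliberately simple scheme that is not necessarily the one the thesis will use: a value is split into $n$ random shares that sum to it, so that any $n-1$ shares reveal nothing about it, and shares can be added locally, which yields shares of the sum. Class and method names are hypothetical, and the modulus $2^{61}-1$ is an arbitrary choice for the example.

\begin{lstlisting}
import java.math.BigInteger;
import java.security.SecureRandom;

/** Hypothetical n-out-of-n additive secret sharing over the integers modulo P. */
class AdditiveSharing {
    // 2^61 - 1, a Mersenne prime; secrets are assumed to lie in [0, P).
    static final BigInteger P = BigInteger.ONE.shiftLeft(61).subtract(BigInteger.ONE);
    private static final SecureRandom RNG = new SecureRandom();

    /** Uniform value in [0, P), by rejection sampling. */
    private static BigInteger randomBelowP() {
        BigInteger r;
        do {
            r = new BigInteger(P.bitLength(), RNG);
        } while (r.compareTo(P) >= 0);
        return r;
    }

    /** Splits secret into n shares: n - 1 uniform values plus a correcting share. */
    static BigInteger[] share(BigInteger secret, int n) {
        BigInteger[] shares = new BigInteger[n];
        BigInteger sum = BigInteger.ZERO;
        for (int i = 0; i < n - 1; i++) {
            shares[i] = randomBelowP();
            sum = sum.add(shares[i]);
        }
        shares[n - 1] = secret.subtract(sum).mod(P);
        return shares;
    }

    /** Reconstruction requires all n shares. */
    static BigInteger reconstruct(BigInteger[] shares) {
        BigInteger sum = BigInteger.ZERO;
        for (BigInteger s : shares) {
            sum = sum.add(s);
        }
        return sum.mod(P);
    }

    /** Share-wise addition: each share holder adds its two shares locally. */
    static BigInteger[] addShares(BigInteger[] a, BigInteger[] b) {
        BigInteger[] c = new BigInteger[a.length];
        for (int i = 0; i < a.length; i++) {
            c[i] = a[i].add(b[i]).mod(P);
        }
        return c;
    }
}
\end{lstlisting}

Each share could, for instance, be stored at a different cloud provider. Unlike threshold schemes such as Shamir's, this scheme needs all shares for reconstruction, and it only supports additions, not arbitrary homomorphic computation.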

The final stage (WP3) will produce a proof-of-concept key/value store based on the algorithms
selected in the preceding stages.
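
As a very first approximation of such a key/value proof of concept, and only as an illustrative assumption, the sketch below composes the ideas above into a state-based map of last-writer-wins (LWW) registers: each write is tagged with a (logical timestamp, replica id) pair, and merging keeps, for each key, the write with the largest tag. Names are hypothetical, and the sketch does not tolerate Byzantine replicas, which could forge tags.

\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

/** Hypothetical state-based key/value store built from last-writer-wins registers. */
class LwwMap {
    /** A stored value with its write tag (logical timestamp, replica id). */
    static final class Tagged {
        final String value;
        final long timestamp;
        final String replica;

        Tagged(String value, long timestamp, String replica) {
            this.value = value;
            this.timestamp = timestamp;
            this.replica = replica;
        }

        /** Total order on tags: timestamp first, replica id as a tie-breaker. */
        boolean newerThan(Tagged o) {
            if (timestamp != o.timestamp) {
                return timestamp > o.timestamp;
            }
            return replica.compareTo(o.replica) > 0;
        }
    }

    private final String replicaId;
    private long clock = 0;
    private final Map<String, Tagged> entries = new HashMap<>();

    LwwMap(String replicaId) {
        this.replicaId = replicaId;
    }

    /** Local write, tagged with a timestamp larger than any seen so far. */
    void put(String key, String value) {
        clock++;
        entries.put(key, new Tagged(value, clock, replicaId));
    }

    String get(String key) {
        Tagged t = entries.get(key);
        return t == null ? null : t.value;
    }

    /** Per-key merge keeps the newest write: commutative, associative, idempotent. */
    void merge(LwwMap other) {
        for (Map.Entry<String, Tagged> e : other.entries.entrySet()) {
            Tagged mine = entries.get(e.getKey());
            if (mine == null || e.getValue().newerThan(mine)) {
                entries.put(e.getKey(), e.getValue());
            }
            clock = Math.max(clock, e.getValue().timestamp);
        }
    }
}
\end{lstlisting}

When two replicas write the same key concurrently, one of the two values is silently discarded, which illustrates why LWW alone does not preserve the intent of every replica.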

\section*{Methodology and Planning}

A detailed review of distributed computing models, focusing in particular on solutions for causal consistency,
will be conducted to establish the theoretical and practical assumptions underlying these solutions.
In parallel, a list of attacks on weakly consistent peer-to-peer architectures will be compiled in
collaboration with Parsec. The emphasis will be on producing new knowledge, namely solutions that improve on the
current state of the art, as well as new attack vectors.

The algorithms will first be formally validated, and then implemented in a proof of concept.

WP1 will take place in 2024, WP2 in 2025, and WP3 in 2026.

\section*{Monitoring and Exchange Terms}

The PhD student will take part in the weekly follow-up meetings of
the company Parsec. The partners will meet every three
months to review the progress of the work.

The student will also attend the company's in-person meetings
every 6 months.

\section*{Material Resources}

The PhD student will participate in Parsec's weekly progress meetings. Additionally, the partners will meet
every three months for project status updates.

Furthermore, the student will attend in-person meetings at the company every six months.

\section*{Expected Benefits}

On the LIS laboratory side, the expected outcomes include the following scientific publications:

\begin{compactitem}
\item A state-of-the-art review and synthesis on Byzantine fault tolerance for weak consistency criteria.
\item Proposals and proofs of new algorithms in the zero-trust context.
\end{compactitem}

For Parsec, the expected deliverables comprise a small-scale model of cloud synchronization and collaboration,
a proof of concept of the aforementioned algorithms, and consulting and expertise for the scientific
development of Parsec's products.

\section*{Team}

\subsection*{Distributed Algorithmics Team (DALGO)}

The Distributed Algorithms team, led by Arnaud Labourel, is part of the Laboratory of Computer
Science and Systems (LIS CNRS UMR 7020). This research team is internationally recognized at the
highest level, comprising 8 permanent members whose interests span from reliable distributed
algorithms and confidentiality in distributed systems to communication networks, graph algorithms,
mobile agents, and IoT (Internet of Things).

\subsection*{Supervisors}

\textbf{Emmanuel Godard} is a Professor at Aix-Marseille University. His research interests
primarily focus on understanding and maximizing decentralization (in a broad sense) in
distributed systems. He is an expert in distributed algorithms and computability.

\textbf{Corentin Travers} is an Associate Professor at Aix-Marseille University. His research
interests focus on robust and efficient distributed algorithms for shared-memory systems or
distributed networks. He is an expert in distributed algorithms and complexity.

\textbf{Marcos Medrano} is an R\&D engineer at Parsec. He holds a research master's degree
in computer science and applied mathematics. Marcos is responsible for the development
strategy of the Parsec product and facilitates collaboration between engineers and academic stakeholders.

\subsection*{Candidate Choice}

The DALGO team is involved in the ``Reliability and Computer Security'' Master's program at Aix-Marseille
University. This master's track is certified \textit{SecNumEdu} by ANSSI
(National Cybersecurity Agency of France). In autumn 2022, a project in collaboration with the company
Parsec was presented to all master's students. Following this call for applications, Mr. Amaury Joly
was selected for a preliminary 6-month research internship on weak consistency at the
LIS laboratory.

Mr. Amaury Joly has achieved excellent academic results, graduating from the master's
program with honors. He also has a strong profile combining theoretical and technical skills, and a keen
motivation for research related to cloud security. He is the ideal candidate for this
research topic.

{\footnotesize
\nocite{*}

\bibliography{sujet-cifre.bib}
\bibliographystyle{alpha}
}

% LaTeX2e code generated by txt2tags 3.4 (http://txt2tags.org)
% cmdline: txt2tags -t tex sujet-cifre.t2t
\end{document}