\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{paralist} %% needed for compact lists
\usepackage[normalem]{ulem} %% needed by strike
\usepackage[urlcolor=blue,colorlinks=true,breaklinks]{hyperref}
\usepackage[utf8x]{inputenc} %% char encoding
\usepackage{framed} %% frame multipages
\usepackage{fullpage}
\usepackage{a4wide}
\usepackage{mathpazo} %% math & rm
\linespread{1.05} %% Palatino needs more leading (space between lines)
\usepackage[scaled]{helvet} %% ss
\usepackage{courier} %% tt
\normalfont
\usepackage[T1]{fontenc}
\usepackage[english]{babel} %% in English
\usepackage{xspace} %% handling of spaces after a macro
\usepackage{listings}
\lstset{breaklines}
\lstset{language=java}
\lstset{escapechar=§}
\usepackage{xcolor}
\usepackage{comment} %%%% comment env
%%%%%%%%%%%%%%
%% fancy and draft settings
%% Date at the top of the page
%% Comment out for the final version
\usepackage[margin=2.5cm]{geometry}
\usepackage{fancyhdr}
%% Header and footer
\fancyhf{} %%clear head and footer
\fancyhead[C]{\thepage} %%draft
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancypagestyle{premiere}{%% first page
\fancyhf{} %%clear head and footer
\fancyfoot[L]{\textbf{LIF}}
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancyhead[C]{}%%\includegraphics[scale=0.25]{logo-lif.png}} %%UFR
}
\fancypagestyle{notete}{%% pages without header
\fancyhf{} %%clear head and footer
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{Sujet}}
}
\newcommand{\myversion}{\textit{version of \today{}}}
\pagestyle{plain}
\title{Weak Consistency for Zero-Trust Cloud}
\author{Research Subject}
\begin{document}
\date{Emmanuel Godard (LIS) -- Corentin Travers (LIS)\\emmanuel.godard@lis-lab.fr and corentin.travers@lis-lab.fr}
\maketitle
\textbf{Keywords:} Cloud, Security by design, Distributed Structures and Algorithms, Weak Consistencies, Byzantine systems
\section*{Summary}
Real-time collaborative applications are increasingly utilized in the context
of remote work systems. These applications often rely on centralized client-server
architectures, which pose security and privacy challenges. Data is stored on a
centralized server, requiring users to trust a third party with their data management.
Additionally, these architectures are often vulnerable to denial-of-service attacks
and do not ensure data confidentiality.

To address these issues, we propose exploring information exchange solutions based
on zero-trust and/or peer-to-peer architectures that eliminate the need for trusted
third parties. These solutions would offer high-level security while ensuring system
resilience. To maintain strong performance, especially in high availability scenarios,
weak consistency models are frequently employed.

In this context, we propose studying weak consistency properties applied to
cloud-related challenges. Initially, we will conduct a state-of-the-art review of
Byzantine fault-tolerant solutions without cryptographic primitives, along with
existing implementations (WP1). A second step will involve proposing more efficient
solutions using cryptographic primitives (WP2). Finally, a proof-of-concept will be
developed for a key-value storage solution using the algorithms selected in the
previous stages (WP3).
\pagebreak
\section*{Problem Statement}
Since the pioneering work in the 1980s by Lamport \cite{LamportInterprocess1986}
and Misra \cite{MisraAxioms1986}, replication management has been central to digital
developments in terms of high availability. One of the fundamental challenges is
to provide application developers with an abstraction of replicated memory that is
both easy to use and enables flexible and fault-tolerant utilization of distributed resources.
This line of research has led to the concept of \textit{data consistency}, whose
various forms are tailored to the best compromise between usage patterns and the specific requirements of each application.

The current trend towards cloud-based deployment of software applications entails significant
changes in usage patterns and development approaches for new applications. With the advent
of user-friendly cloud services where infrastructure maintenance is outsourced to a provider,
there is a noticeable centralization of resources. This reintroduces classic security issues,
such as the need for trust/sovereignty or the risk of a \textit{single point of failure} (SPOF).
In response, new approaches termed \textit{zero-trust} have been proposed to continue using
cloud resources without depending on any specific provider. These approaches require both
multi-provider architectures and advanced cryptographic techniques.

\medskip

From a programmer's perspective, it is often advantageous to consider cloud-based applications
as a single centralized system. This requires that the data structures used exhibit a
property known as \textit{strong consistency}.

In the real world, servers may have to endure very challenging operating conditions.
It is well-known to both theorists and practitioners, through the CAP theorem
(Consistency, Availability, Partition tolerance), that operational compromises are often
necessary. Specifically, if strong consistency is desired, the computation time is proportional
to the latency of \textbf{the entire} network, which in practice reduces availability.
Referring to the CAP theorem, applying strong consistency makes it impossible to implement
a highly resilient system while providing a highly available application. Yet, both of
these aspects can be essential in building a collaborative application.

The peer-to-peer approach indeed implies significant system resilience against failures.
Replicas may become disconnected from one another and experience significant and uneven latency
differences. The lack of control over the client's system and execution environment compels
us to envision systems capable of withstanding the worst possible scenarios.
In the context of real-time collaboration applications, the need for high availability is
intimately tied to the requirement of enabling different replicas to access the same
shared data for real-time work. It would therefore be unacceptable to introduce significant
latencies between two modifications.

Given the impossibility of fully satisfying both strong consistency and high availability,
we turn to the study of weak consistencies, specifically focusing on convergence. We define
a system as convergent if it adheres to the following property:
\begin{quote}
If replicas cease to propose modifications, then these same replicas must eventually
reach a consistent state.
\end{quote}
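One possible informal formalization of this property, in a notation of our own choosing (it is not taken
verbatim from the cited references), where $\mathit{Correct}$ denotes the set of correct replicas and
$\mathit{state}_i(t)$ the state of replica $i$ at time $t$, is:
\[
\mbox{only finitely many updates are issued} \;\Longrightarrow\;
\exists t,\ \forall t' \ge t,\ \forall i,j \in \mathit{Correct}:\ \mathit{state}_i(t') = \mathit{state}_j(t').
\]
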
Convergence (or Eventual Consistency) has been extensively studied, leading to the development
of various distributed data structures that aim to uphold convergence. However, convergence
alone does not resolve our problem. This property gives no guarantee about the behavior observed
during an execution, in which the system may remain inconsistent for arbitrarily long periods. Simply achieving
eventual consistency in a document does not suffice to make it a satisfactory collaborative
editing application. We also need mechanisms to resolve conflicts, which are inevitable in
collaborative approaches. This conflict resolution must be carried out optimally to maximize
the preservation of the meaning intended by each modifying replica.

These issues have indeed been extensively studied, and the proposed solutions that are particularly
suitable in our context are the \textit{Replicated Data Types} (RDTs). There are two classes of RDTs:
\begin{compactitem}
\item Commutative Replicated Data Types (CmRDTs): operation-based types whose concurrent operations
commute, so that they yield the same result regardless of the order of their local executions.
\item Convergent Replicated Data Types (CvRDTs): state-based types whose states only grow with respect
to a partial order; replicas converge by merging their states towards a common upper bound.
\end{compactitem}
Both classes fall under the umbrella term of Conflict-free Replicated Data Types (CRDTs) and are
in fact equivalent to each other \cite{ShapiroConflictFree2011}.
CRDTs provide a powerful framework for building distributed applications that require high availability
and eventual consistency. By ensuring that operations are commutative and can be merged across
replicas without conflicts, CRDTs enable efficient conflict resolution and convergence of data
across distributed systems.
The study of CRDTs has significantly advanced our ability to design collaborative and resilient
distributed applications, offering a practical approach to dealing with the challenges posed by real-time
collaboration over unreliable and latency-prone networks.
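
As a concrete illustration, the following is a minimal sketch of a state-based CRDT, a grow-only counter,
with class and method names of our own choosing; it is not taken from any of the cited systems. Each replica
increments only its own entry, and merging two states takes the entry-wise maximum, so merges commute and all
replicas converge to the same value.
\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

// Minimal state-based grow-only counter (G-Counter).
// Each replica increments only its own entry; merge takes the
// entry-wise maximum, so concurrent merges commute and converge.
public class GCounter {
    private final String replicaId;
    private final Map<String, Long> entries = new HashMap<>();

    public GCounter(String replicaId) {
        this.replicaId = replicaId;
    }

    // Local update: only the owning replica's entry grows.
    public void increment() {
        entries.merge(replicaId, 1L, Long::sum);
    }

    // Join of two states: least upper bound in the lattice of maps.
    public void merge(GCounter other) {
        other.entries.forEach((id, v) -> entries.merge(id, v, Long::max));
    }

    // The counter value is the sum over all replicas' entries.
    public long value() {
        return entries.values().stream().mapToLong(Long::longValue).sum();
    }
}
\end{lstlisting}
The key design point is that the merge function is commutative, associative and idempotent, so replicas can
exchange and merge their states in any order, and as often as they wish, without coordination.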

\medskip

Furthermore, to provide truly secure solutions in a zero-trust context, the most challenging operational
conditions to consider are when servers or participating clients have been compromised and do not
strictly adhere to the protocol. In the literature, this is referred to as Byzantine behavior.
Given these difficult constraints of availability and security, ensuring strong consistency can be
very computationally and time-intensive. Application requirements are sometimes not compatible with
such operational conditions. Therefore, it becomes necessary to consider data with properties of
so-called \textit{weak consistency}.
Weak consistency models, such as eventual consistency offered by CRDTs, become valuable in such scenarios.
These models prioritize availability and partition tolerance while allowing for some degree of
inconsistency that can be resolved over time. They are designed to cope with the challenges of distributed
systems operating under non-ideal conditions, including the presence of Byzantine faults.

In zero-trust environments where malicious behaviors are a constant threat, adopting weak consistency models
can strike a balance between functionality, security, and operational feasibility. They provide pragmatic
solutions for building resilient and secure distributed applications that can withstand the challenges posed
by compromised nodes and unreliable network conditions.
\section*{State of the art}
The landscape of weak consistency properties is relatively complex, with three major families of weak
consistencies identified \cite{Raynal18}, \cite{MPBook}:
\begin{itemize}
\item Serializability
\item Causal Consistency
\item Eventual Strong Consistency
\end{itemize}

While eventual strong consistency is typically desired for collaborative applications, it is particularly
costly to achieve. Serializability, on the other hand, is simpler to implement but may result in transactions
that do not complete, requiring application-level error handling.
Causal consistency maintains the causal order perceived by each process and generally allows for the efficient
implementation of higher-level data structures.
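
As an illustration of the kind of mechanism involved, the following is a minimal sketch, with names of our own
choosing, of the classical vector-clock delivery test used by causal broadcast: an update from replica $j$ is
deliverable when it is the next update expected from $j$ and its timestamp does not reveal any update the
receiver has not yet applied. This sketch assumes that every replica follows the protocol; a Byzantine replica
could forge its vector clock, which is one of the difficulties addressed in the works discussed below.
\begin{lstlisting}
import java.util.Arrays;

// Classical causal-delivery test based on vector clocks.
// local[i] counts the updates from replica i already applied locally;
// stamp is the vector clock attached to an incoming update.
public final class CausalDelivery {

    public static boolean canDeliver(int[] local, int[] stamp, int sender) {
        // It must be the next update expected from the sender...
        if (stamp[sender] != local[sender] + 1) {
            return false;
        }
        // ...and it must not depend on updates we have not yet applied.
        for (int k = 0; k < local.length; k++) {
            if (k != sender && stamp[k] > local[k]) {
                return false;
            }
        }
        return true;
    }

    // After delivery, the local clock records one more update from the sender.
    public static int[] afterDelivery(int[] local, int sender) {
        int[] updated = Arrays.copyOf(local, local.length);
        updated[sender] += 1;
        return updated;
    }
}
\end{lstlisting}
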
For a comprehensive overview of these weak consistency models, readers can refer to M. Perrin's detailed
mapping \cite{MPBook}. Each of these models offers a different trade-off between consistency guarantees,
implementation complexity, and operational efficiency, making them suitable for different use cases and
application requirements. Understanding and selecting the appropriate weak consistency model is crucial for
designing effective and robust distributed systems, especially in the context of collaborative applications
operating in dynamic and unreliable environments.
\subsection*{Algorithmic Results}
The earliest work on secure collaborative tools in a high availability context dates back to 2009; however,
more systematic research on the security of weak consistency models is quite recent. In 2009, Singh et al. introduced the
Zeno system, which was the first to propose a Byzantine algorithm favoring availability over strong consistency:
it provides Byzantine fault tolerance with only eventual consistency \cite{SinghZeno2009}. The algorithm
experimentally demonstrated better availability performance compared to classical Byzantine algorithms.

Currently, there are primarily partial studies and solutions for causal consistency \cite{TsengDistributed2019}
and \cite{VanDerLindePractical2020}. Tseng et al. present exact computability bounds within a Byzantine
framework and provide an algorithm whose performance is compared with that of the Google Compute
platform. Van Der Linde et al. introduce a peer-to-peer system resilient to Byzantine attacks that offers causal
consistency guarantees. Their evaluation suggests that despite a peer-to-peer architecture, performance, especially
in terms of latency, is very good compared to a traditional client-server architecture.

In addition to these algorithms, Misra and Kshemkalyani demonstrated in \cite{MisraByzantine2021} that in an
asynchronous context, it is not possible to achieve causal consistency even with a single Byzantine participant.
One of the notable features of \cite{VanDerLindePractical2020} is its exploration of Byzantine failures within
the context of weak consistencies. Such a peer-to-peer system prompts new considerations, in which a participant
leverages information from the lower layers of the replication stack to mount attacks at the application level.

Applying weak consistency criteria alone does not fully address the scope of our concerns. The cloud context
raises significant questions regarding data centralization and governance, with a market dominated by a few
major players to whom users must blindly entrust their data, posing substantial challenges to privacy and data
sovereignty.

In this context, integrating the notion of a zero-trust cloud is essential, anchoring our discussions in a
relevant approach from both industrial and regulatory perspectives. Zero-trust, as defined by NIST in SP 800-207
\cite{RoseZero2020}, is a security model that trusts no one and makes no assumptions about network security. It
helps guard against malicious behaviors by intermediaries, reducing the attack surface and confining Byzantine
behaviors solely to clients who have access to the data.

Indeed, considering data-centric security alongside communication security is crucial. Adopting
``data-centric'' approaches involves treating data itself as a dynamic entity within the system, assigning it
processes for access control and monitoring \cite{BayukDatacentric2009}. These issues represent growing concerns
and are addressed by state and inter-state actors, exemplified by NATO's stance on these matters through
STANAG 4774 and 4778. These topics have been extensively studied since the 2010s with works such as
\cite{GoyalAttributebased2006, MullerDistributed2009} defining solutions for attribute-based encryption,
in which decryption keys are issued according to user attributes in order to enforce security policies. Other works like \cite{YanFlexible2017}
propose cloud-adapted solutions based on more flexible architectures with finer granularity in defining rights.
However, concerning zero-trust and data-centric security aspects, there is currently no academic consensus
on the formalization of these notions. These terms are subject to various interpretations, necessitating a
formal specification to understand which properties need to be satisfied to achieve weak consistency within
a zero-trust context.
\subsection*{Existing Implementations}
Currently, there are ongoing projects aimed at implementing weak consistency protocols for real-time collaborative
applications. One notable project is yjs \cite{Yjs2023}, which implements the YATA protocol \cite{NicolaescuRealTime2016}.
This protocol ensures strong convergence (or SEC, in the terminology of \cite{MPBook}) through a CRDT
(Conflict-free Replicated Data Type).

On the other hand, older projects like Etherpad use simpler conflict-resolution schemes that also ensure strong
convergence, but at a higher cost in memory and computation time than CRDTs \cite{AppJetEtherpad2011}.
\section*{Goals}
The objectives of this thesis encompass studying the three types of weak consistency in a Byzantine setting and
defining efficient Byzantine algorithms for their implementation. Given that causal consistency is already well-studied,
the main focus of this thesis will be on the other two types of weak consistency.

The first stage (WP1) will involve studying Byzantine solutions without cryptographic primitives or with reasonably
cost-effective primitives, specifically excluding homomorphic computation. An analysis of existing implementations will
be conducted to determine the guarantees provided by these solutions within the vocabulary of weak consistencies.

The second stage (WP2) will focus on developing more efficient solutions using cryptographic primitives that require
advanced secret-sharing and/or homomorphic computation.

A final stage (WP3) will involve producing a proof-of-concept key/value storage solution using the algorithms
selected in the preceding stages.
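
To fix ideas only, the following is a minimal sketch of what the core of such a key/value proof of concept could
look like if one chose last-writer-wins register semantics; the class and method names are of our own choosing,
and this is not the algorithm to be selected in WP1 and WP2, which will in particular have to account for
Byzantine behaviors.
\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

// Sketch of a replicated key/value store built from last-writer-wins registers.
// Each entry carries a (timestamp, replicaId) tag; the entry with the larger
// tag wins, with the replica identifier as a deterministic tie-breaker.
public class LwwKeyValueStore {

    static final class Tagged {
        final String value;
        final long timestamp;
        final String replicaId;

        Tagged(String value, long timestamp, String replicaId) {
            this.value = value;
            this.timestamp = timestamp;
            this.replicaId = replicaId;
        }

        boolean dominates(Tagged other) {
            if (timestamp != other.timestamp) return timestamp > other.timestamp;
            return replicaId.compareTo(other.replicaId) > 0;
        }
    }

    private final Map<String, Tagged> entries = new HashMap<>();

    // Local or remote write: keep the dominating tag for the key.
    public void put(String key, Tagged tagged) {
        entries.merge(key, tagged, (current, incoming) ->
                incoming.dominates(current) ? incoming : current);
    }

    public String get(String key) {
        Tagged t = entries.get(key);
        return t == null ? null : t.value;
    }

    // State-based merge with another replica: keep the dominating tag per key.
    public void merge(LwwKeyValueStore other) {
        other.entries.forEach(this::put);
    }
}
\end{lstlisting}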
\section*{Methodology and Planning}
A detailed review of distributed computing models, particularly focusing on solutions for causal consistency,
will be conducted to establish the set of theoretical and practical assumptions underlying these solutions.
Concurrently, in collaboration with Parsec, a list of attacks on weakly consistent peer-to-peer architectures
will be compiled. The emphasis will be on generating new knowledge, including novel solutions compared to the
current state of the art, as well as identifying new attack vectors.
The algorithms will undergo formal validation initially, followed by the development of a proof of concept.
WP1 will take place in 2024, WP2 in 2025, and WP3 in 2026.
\section*{Monitoring and Exchange Terms}
The PhD student will participate in Parsec's weekly progress meetings. The partners will meet
every three months for a status update on the work.
The student will also attend in-person meetings at the company every six months.
\section*{Expected Benefits}
On the LIS laboratory side, the expected outcomes include the following scientific publications:
\begin{compactitem}
\item State-of-the-art review and synthesis concerning Byzantine fault tolerance in weak consistencies.
\item Proposals and proofs of new algorithms within the zero-trust context.
\end{compactitem}
For Parsec, the expected deliverables comprise a mini-model of cloud synchronization and collaboration,
a proof of concept for the aforementioned algorithms, and consultancy and expertise in the scientific
development of products created by Parsec.
\section*{Team}
\subsection*{Distributed Algorithmics Team (DALGO)}
The Distributed Algorithms team, led by Arnaud Labourel, is part of the Laboratory of Computer
Science and Systems (LIS CNRS UMR 7020). This research team is internationally recognized at the
highest level, comprising 8 permanent members whose interests span from reliable distributed
algorithms and confidentiality in distributed systems to communication networks, graph algorithms,
mobile agents, and IoT (Internet of Things).
\subsection*{Supervisors}
\textbf{Emmanuel Godard} is a professor at Aix-Marseille University. His research interests
primarily focus on understanding and maximizing decentralization (in a broad sense) in
distributed systems. He is an expert in distributed algorithms and computability.
\textbf{Corentin Travers} is an Associate Professor at Aix-Marseille University. His research
interests focus on robust and efficient distributed algorithms for shared-memory systems or
distributed networks. He is an expert in distributed algorithmics and complexity.
\textbf{Marcos Medrano} is an R\&D engineer at Parsec. He holds a master's degree in research
in computer science and applied mathematics. Marcos is responsible for the development
strategy of the Parsec product and facilitates collaboration between engineers and academic stakeholders.
\subsection*{Candidate Choice}
The DALGO team is involved in the "Reliability and Computer Security" Master's program at Aix-Marseille
University. This master's track is certified as \textit{SecNumEdu} by ANSSI
(National Cybersecurity Agency of France). In autumn 2022, a project in collaboration with the company
Parsec was presented to all master's students. Following this call for applications, Mr. Amaury Joly
was selected for a preliminary 6-month research internship on the topic of weak consistency at the
LIS laboratory.
Mr. Amaury Joly has achieved excellent academic results, graduating from the master's
program with honors. Additionally, he possesses a strong dual theoretical and technical profile, with a keen
motivation for research activities related to cloud security. He is the ideal candidate for such
a research topic.
{\footnotesize
\nocite{*}
\bibliography{sujet-cifre.bib}
\bibliographystyle{alpha}
}
% LaTeX2e code generated by txt2tags 3.4 (http://txt2tags.org)
% cmdline: txt2tags -t tex sujet-cifre.t2t
\end{document}