\documentclass[11pt]{article}
\usepackage{graphicx}
\usepackage{paralist} %% needed for compact lists
\usepackage[normalem]{ulem} %% needed by strike
\usepackage[urlcolor=blue,colorlinks=true,breaklinks]{hyperref}
\usepackage[utf8x]{inputenc}  %% char encoding
\usepackage{framed} %% frame multipages
\usepackage{fullpage}
\usepackage{a4wide}
\usepackage{mathpazo} %% math & rm
\linespread{1.05}        %% Palatino needs more leading (space between lines)
\usepackage[scaled]{helvet} %% ss
\usepackage{courier} %% tt
\normalfont
\usepackage[T1]{fontenc}
\usepackage[english]{babel} %% in English
\usepackage{xspace} %% handles spacing after macros
\usepackage{listings}
\lstset{breaklines}
\lstset{language=java}
\lstset{escapechar=§}
\usepackage{xcolor}


\usepackage{comment} %%%% comment env

%%%%%%%%%%%%%%
%% fancy and draft
%% Date at the top of the page
%% To be commented out for the final version
\usepackage[margin=2.5cm]{geometry}
\usepackage{fancyhdr}
%% Header and footer
\fancyhf{} %%clear head and footer
\fancyhead[C]{\thepage} %%draft
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancypagestyle{premiere}{%% first page
\fancyhf{} %%clear head and footer
\fancyfoot[L]{\textbf{LIF}}
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{SUJETCOURT}}
\fancyhead[C]{}%%\includegraphics[scale=0.25]{logo-lif.png}} %%UFR
}
\fancypagestyle{notete}{%% page style without header
\fancyhf{} %%clear head and footer
\renewcommand{\headrulewidth}{0pt} \renewcommand{\footrulewidth}{2pt}
\fancyfoot[C]{\textsc{Sujet}}
}

\newcommand{\myversion}{\textit{version of \today{}}}

\pagestyle{plain}

\title{Weak Consistency for a Zero-Trust Cloud}
\author{Research Subject}
\begin{document}

\date{Emmanuel Godard (LIS) -- Corentin Travers (LIS)\\emmanuel.godard@lis-lab.fr and corentin.travers@lis-lab.fr}
\maketitle

\textbf{Keywords:} Cloud, Security by design, Distributed Structures and Algorithms, Weak Consistencies, Byzantine systems

\section*{Summary}

Real-time collaborative applications are increasingly used for remote work. 
These applications often rely on centralized client-server architectures, which raise 
security and privacy challenges: data is stored on a central server, so users must trust 
a third party to manage it. Additionally, these architectures are often vulnerable to 
denial-of-service attacks and do not guarantee data confidentiality.

To address these issues, we propose to explore information-exchange solutions based 
on zero-trust and/or peer-to-peer architectures that eliminate the need for trusted 
third parties. Such solutions would offer a high level of security while ensuring system 
resilience. To maintain good performance, especially when high availability is required, 
weak consistency models are frequently employed.

In this context, we propose studying weak consistency properties applied to 
cloud-related challenges. Initially, we will conduct a state-of-the-art review of 
Byzantine fault-tolerant solutions without cryptographic primitives, along with 
existing implementations (WP1). A second step will involve proposing more efficient 
solutions using cryptographic primitives (WP2). Finally, a proof-of-concept will be 
developed for a key-value storage solution using the algorithms selected in the 
previous stages (WP3).

\pagebreak

\section*{Problem Statement}

Since the pioneering work of Lamport \cite{LamportInterprocess1986} and Misra \cite{MisraAxioms1986} 
in the 1980s, replication management has been central to achieving high availability. One of the 
fundamental challenges is to provide application developers with an abstraction of replicated memory 
that is both easy to use and allows flexible, fault-tolerant use of distributed resources.

This line of research has led to the concept of \textit{data consistency}, in various forms, each 
tailored to the best trade-off between the usage and the specific needs of a given application.

The current trend towards cloud-based deployment of software applications entails significant 
changes in usage patterns and development approaches for new applications. With the advent 
of user-friendly cloud services where infrastructure maintenance is outsourced to a provider, 
there is a noticeable centralization of resources. This reintroduces classic security issues, 
such as the need for trust and sovereignty, or the risk of a \textit{single point of failure} (SPOF).

In response, new approaches termed \textit{zero-trust} have been proposed to continue using 
cloud resources without depending on any specific provider. These approaches require both 
multi-provider architectures and advanced cryptographic techniques. 

\medskip

From a programmer's perspective, it is often convenient to view a cloud-based application 
as a single centralized system. This requires the data structures it uses to satisfy a 
property known as \textit{strong consistency}.

In practice, servers may have to operate under very challenging conditions. 
It is well known to both theorists and practitioners, through the CAP theorem 
(Consistency, Availability, Partition tolerance), that operational compromises are often 
necessary. Specifically, if strong consistency is desired, the response time of each operation 
depends on the latency of \textbf{the entire} network, which in practice reduces availability.

By the CAP theorem, enforcing strong consistency makes it impossible to build a system that is 
highly resilient to network partitions while remaining highly available. Yet both of these 
aspects can be essential when building a collaborative application.

The peer-to-peer approach indeed requires the system to be highly resilient to failures. 
Replicas may become disconnected from one another and experience high and uneven latencies. 
The lack of control over clients' systems and execution environments compels us to design 
systems able to withstand the worst possible scenarios. 

In the context of real-time collaboration applications, the need for high availability is 
intimately tied to the requirement of enabling different replicas to access the same 
shared data for real-time work. It would therefore be unacceptable to introduce significant 
latencies between two modifications.

Given the impossibility of fully satisfying both strong consistency and high availability, 
we turn to the study of weak consistencies, specifically focusing on convergence. We define 
a system as convergent if it adheres to the following property:

\begin{quote}
\textit{If replicas cease to propose modifications, then these same replicas must eventually 
reach the same state.}
\end{quote}
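The convergence property can be written more formally. The following is only a sketch, with notation introduced here rather than taken from the works cited: $R$ denotes the set of replicas (in a Byzantine setting, the correct replicas only) and $\sigma_i(t)$ the local state of replica $i$ at time $t$.

\[
\begin{array}{l}
\exists t_0,\ \mbox{no modification is proposed after } t_0 \\
\quad \Rightarrow\ \exists t_1 \geq t_0,\ \forall t \geq t_1,\ \forall i, j \in R,\ \sigma_i(t) = \sigma_j(t).
\end{array}
\]

The definition constrains nothing before $t_1$: while modifications are still being proposed, replicas may disagree arbitrarily.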

Convergence (or eventual consistency) has been extensively studied, leading to the development 
of various distributed data structures designed to guarantee it. However, convergence alone does 
not solve our problem. It says nothing about behavior during the execution: as long as modifications 
keep occurring, replicas may remain inconsistent. Achieving eventual consistency on a document is 
therefore not enough to obtain a satisfactory collaborative editing application. We also need 
conflict-resolution mechanisms, since conflicts are inevitable in collaborative work, and this 
resolution should preserve as much as possible the meaning intended by each modifying replica.
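To illustrate why convergence alone is not enough, consider a last-writer-wins (LWW) register, one of the simplest convergent designs. The Java sketch below is hypothetical (class and method names are ours, not taken from any cited system): each write is tagged with a logical timestamp and the writer's identifier, and a merge keeps the write with the larger tag. Two concurrent edits converge, but one of them is silently discarded, which is exactly the loss of intent that conflict resolution should avoid.

\begin{lstlisting}
// Hypothetical last-writer-wins (LWW) register, for illustration only.
// Each write is tagged with (logical timestamp, writer id); merging two
// states keeps the write with the larger tag.
final class LwwRegister {
    private final int replicaId;
    private String value = null;
    private long timestamp = 0;  // logical timestamp of the current value
    private int writer = -1;     // id of the replica that wrote it

    LwwRegister(int replicaId) {
        this.replicaId = replicaId;
    }

    // Local write: its timestamp exceeds every timestamp seen so far.
    void write(String v) {
        timestamp = timestamp + 1;
        value = v;
        writer = replicaId;
    }

    // Merge another replica's state; ties on timestamps are broken by
    // writer id so that every replica picks the same winner.
    void merge(LwwRegister other) {
        boolean otherWins = other.timestamp > timestamp
                || (other.timestamp == timestamp && other.writer > writer);
        if (otherWins) {
            timestamp = other.timestamp;
            value = other.value;
            writer = other.writer;
        }
    }

    String read() {
        return value;
    }
}

class LwwDemo {
    public static void main(String[] args) {
        LwwRegister alice = new LwwRegister(1);
        LwwRegister bob = new LwwRegister(2);
        // Concurrent edits: neither replica has seen the other's write.
        alice.write("title by Alice");
        bob.write("title by Bob");
        // Exchange states in both directions.
        alice.merge(bob);
        bob.merge(alice);
        // Both replicas converge, but Alice's edit is silently discarded.
        System.out.println(alice.read() + " / " + bob.read());
    }
}
\end{lstlisting}

Running \texttt{LwwDemo} should print \texttt{title by Bob / title by Bob}: the replicas agree, yet Alice's modification has disappeared without any notification.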

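These issues have been extensively studied, and the solutions proposed that are particularly 
suitable in our context are \textit{Replicated Data Types} (RDTs). There are two classes of RDTs:

\begin{itemize}
  \item \textit{Commutative Replicated Data Types} (CmRDTs), or operation-based RDTs: replicas 
  broadcast their update operations, and concurrent operations commute, so replicas that have 
  applied the same set of operations reach the same state regardless of the order in which they 
  executed them locally (provided operations are delivered reliably and, in general, in causal order).
  \item \textit{Convergent Replicated Data Types} (CvRDTs), or state-based RDTs: replicas exchange 
  their states, which only grow over time (as in a set to which elements are only ever added) and 
  are merged by computing their least upper bound, so that all replicas converge towards the same 
  maximal state.
\end{itemize}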

Both classes fall under the umbrella term of Conflict-free Replicated Data Types (CRDTs) and are 
equivalent, in the sense that each can emulate the other \cite{ShapiroConflictFree2011}.
CRDTs provide a powerful framework for building distributed applications that require high availability 
and eventual consistency. By ensuring that operations commute, or that states can be merged across 
replicas without conflicts, CRDTs enable efficient conflict resolution and convergence of data 
across distributed systems.
The study of CRDTs has significantly advanced our ability to design collaborative and resilient 
distributed applications, offering a practical approach to dealing with the challenges posed by real-time 
collaboration over unreliable and latency-prone networks.
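As an illustration of the state-based approach, here is a minimal grow-only counter (G-Counter), a textbook CvRDT example. The sketch is ours and deliberately simplified (fixed number of replicas, no networking): each replica increments only its own entry of a vector, and \texttt{merge} takes the pointwise maximum. Because this merge is commutative, associative and idempotent, replicas end up in the same state whatever the order in which states are exchanged, and even if some exchanges are repeated.

\begin{lstlisting}
import java.util.Arrays;

// Minimal state-based (CvRDT) grow-only counter, for illustration only.
// The state is a vector with one entry per replica; states are ordered
// pointwise and merge computes their least upper bound (pointwise max).
final class GCounter {
    private final int id;         // index of the local replica
    private final long[] counts;  // counts[j] = increments performed at replica j

    GCounter(int id, int replicas) {
        this.id = id;
        this.counts = new long[replicas];
    }

    // Update: a replica only increments its own entry, so states only grow.
    void increment() {
        counts[id]++;
    }

    // Query: the counter value is the sum of all entries.
    long value() {
        return Arrays.stream(counts).sum();
    }

    // Merge: pointwise max, which is commutative, associative and idempotent,
    // so replicas converge whatever the order or repetition of exchanges.
    void merge(GCounter other) {
        for (int j = 0; j < counts.length; j++) {
            counts[j] = Math.max(counts[j], other.counts[j]);
        }
    }

    long[] state() {
        return counts.clone();
    }
}

class GCounterDemo {
    public static void main(String[] args) {
        GCounter r0 = new GCounter(0, 3);
        GCounter r1 = new GCounter(1, 3);
        GCounter r2 = new GCounter(2, 3);
        r0.increment();
        r0.increment();
        r1.increment();
        r2.increment();
        r2.increment();
        r2.increment();
        // Gossip in an arbitrary order, including a redundant exchange.
        r1.merge(r2);
        r0.merge(r1);
        r0.merge(r1);
        r2.merge(r0);
        r1.merge(r0);
        System.out.println(r0.value() + " " + r1.value() + " " + r2.value());
        System.out.println(Arrays.equals(r0.state(), r2.state()));
    }
}
\end{lstlisting}

Here \texttt{GCounterDemo} is expected to print \texttt{6 6 6} followed by \texttt{true}. An operation-based (CmRDT) counter would instead broadcast each increment; since additions commute, replicas converge once all increments have been delivered.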

\medskip

Furthermore, to provide truly secure solutions in a zero-trust context, the most challenging operational
conditions to consider are when servers or participating clients have been compromised and do not 
strictly adhere to the protocol. In the literature, this is referred to as Byzantine behavior.
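The following hypothetical Java sketch shows why standard CRDTs do not survive Byzantine participants as-is. A naive state-based grow-only counter merges received vectors by pointwise maximum without checking where each entry comes from, so a single compromised replica can inflate any other replica's count. The hardened variant assumes authenticated point-to-point channels (the receiver knows the sender's identity) and direct exchanges between all replicas, and only accepts the sender's own entry. These assumptions, and the class itself, are ours for illustration.

\begin{lstlisting}
import java.util.Arrays;

// Hypothetical grow-only counter contrasting a naive merge with a hardened
// one. Assumptions (ours): a fixed set of replicas, authenticated
// point-to-point channels, and direct state exchanges between all replicas.
final class AuthGCounter {
    private final int id;         // index of the local replica
    private final long[] counts;  // counts[j] = increments attributed to replica j

    AuthGCounter(int id, int replicas) {
        this.id = id;
        this.counts = new long[replicas];
    }

    void increment() {
        counts[id]++;
    }

    long value() {
        return Arrays.stream(counts).sum();
    }

    long[] state() {
        return counts.clone();
    }

    // Naive CvRDT merge: pointwise max over the whole received vector, so a
    // Byzantine sender can raise any entry, including other replicas' ones.
    void naiveMerge(long[] received) {
        for (int j = 0; j < counts.length; j++) {
            counts[j] = Math.max(counts[j], received[j]);
        }
    }

    // Hardened merge: only the authenticated sender's own entry is taken into
    // account, so a Byzantine replica can inflate its own count but no other.
    void authMerge(int senderId, long[] received) {
        counts[senderId] = Math.max(counts[senderId], received[senderId]);
    }
}

class AuthGCounterDemo {
    public static void main(String[] args) {
        AuthGCounter naive = new AuthGCounter(0, 3);
        AuthGCounter hardened = new AuthGCounter(0, 3);
        // Byzantine replica 2 claims that replica 1 incremented 1000 times.
        long[] forged = {0, 1000, 0};
        naive.naiveMerge(forged);
        hardened.authMerge(2, forged);
        System.out.println(naive.value() + " vs " + hardened.value());
    }
}
\end{lstlisting}

Under these assumptions the forged update is rejected (the demo should print \texttt{1000 vs 0}), but the countermeasure is far from complete: a Byzantine replica can still report different counts for itself to different replicas (equivocation), and since relayed entries are ignored, correct replicas may then never converge. Tolerating relaying and equivocation typically calls for cryptographic primitives such as signatures, in line with WP2.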

Given these demanding availability and security constraints, ensuring strong consistency can be 
very costly in computation and time, and application requirements are sometimes incompatible with 
such operational conditions. It therefore becomes necessary to consider data with so-called 
\textit{weak consistency} properties.

Weak consistency models, such as the eventual consistency offered by CRDTs, become valuable in such scenarios. 
These models prioritize availability and partition tolerance while tolerating some degree of 
inconsistency that is resolved over time. They are designed to cope with distributed systems operating 
under non-ideal conditions; whether, and at what cost, they can also withstand Byzantine faults is one of 
the questions addressed by this project.

In zero-trust environments where malicious behaviors are a constant threat, adopting weak consistency models 
can strike a balance between functionality, security, and operational feasibility. They provide pragmatic 
solutions for building resilient and secure distributed applications that can withstand the challenges posed 
by compromised nodes and unreliable network conditions.

\section*{State of the art}

The landscape of weak consistency properties is relatively complex, with three major families of weak 
consistency identified \cite{Raynal18, MPBook}:

\begin{itemize}
  \item Serializability
  \item Causal Consistency
  \item Strong Eventual Consistency
\end{itemize}

While strong eventual consistency is typically desired for collaborative applications, it is particularly 
costly to achieve. Serializability, on the other hand, is simpler to implement but may result in transactions 
that do not complete, requiring application-level error handling.
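Optimistic concurrency control is one common way, among several, to implement serializable transactions, and it makes the cost mentioned above concrete. In the hypothetical Java sketch below (a single-process store with names of our choosing, not a distributed protocol), a transaction records the version of every key it reads and buffers its writes; at commit time it is validated, and aborted if any key it read has been modified since. The application must then handle the abort, here by retrying in \texttt{transfer}.

\begin{lstlisting}
import java.util.HashMap;
import java.util.Map;

// A transaction buffers its writes and remembers the version of each key it read.
final class Transaction {
    final Map<String, Long> readVersions = new HashMap<>();
    final Map<String, Integer> writes = new HashMap<>();
}

// Hypothetical single-process key-value store with optimistic concurrency control.
final class OccStore {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    // Read a key (missing keys read as 0) and record the version observed.
    synchronized int read(Transaction tx, String key) {
        if (tx.writes.containsKey(key)) {
            return tx.writes.get(key);  // read-your-own-writes
        }
        tx.readVersions.putIfAbsent(key, versions.getOrDefault(key, 0L));
        return values.getOrDefault(key, 0);
    }

    void write(Transaction tx, String key, int value) {
        tx.writes.put(key, value);
    }

    // Validate then apply, atomically: commit only if no key read by tx has
    // been overwritten since; otherwise abort by returning false.
    synchronized boolean commit(Transaction tx) {
        for (Map.Entry<String, Long> r : tx.readVersions.entrySet()) {
            if (!versions.getOrDefault(r.getKey(), 0L).equals(r.getValue())) {
                return false;
            }
        }
        for (Map.Entry<String, Integer> w : tx.writes.entrySet()) {
            values.put(w.getKey(), w.getValue());
            versions.merge(w.getKey(), 1L, Long::sum);
        }
        return true;
    }
}

class OccDemo {
    // Application-level error handling: retry until the transaction commits.
    static void transfer(OccStore s, String from, String to, int amount) {
        while (true) {
            Transaction tx = new Transaction();
            s.write(tx, from, s.read(tx, from) - amount);
            s.write(tx, to, s.read(tx, to) + amount);
            if (s.commit(tx)) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        OccStore s = new OccStore();
        Transaction t1 = new Transaction();
        Transaction t2 = new Transaction();
        s.write(t1, "x", s.read(t1, "x") + 1);   // t1 and t2 both read x = 0
        s.write(t2, "x", s.read(t2, "x") + 10);
        System.out.println(s.commit(t1));        // t1 commits
        System.out.println(s.commit(t2));        // t2 aborts: x changed since its read
        transfer(s, "x", "y", 1);                // uncontended, commits at once
    }
}
\end{lstlisting}

In \texttt{OccDemo}, both transactions read \texttt{x} at version 0; the first commits and the second is aborted, so the demo should print \texttt{true} then \texttt{false}. Only the retry loop in application code turns such aborts into completed operations.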

Causal consistency maintains the causal order perceived by each process and generally allows for the efficient 
implementation of higher-level data structures.
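Causal consistency is commonly built on top of causal broadcast, in which each message carries a vector clock. The Java sketch below is a simplified, non-Byzantine illustration under our own assumptions (a fixed set of $n$ processes, reliable channels, no process receiving its own broadcasts): a message from process $j$ is delivered only once it is the next message from $j$ and all messages it depends on have been delivered; otherwise it is buffered.

\begin{lstlisting}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// A broadcast message tagged with the sender's vector clock.
final class CausalMessage {
    final int sender;
    final int[] clock;    // sender's vector clock when the message was sent
    final String payload;

    CausalMessage(int sender, int[] clock, String payload) {
        this.sender = sender;
        this.clock = clock;
        this.payload = payload;
    }
}

// Process delivering messages in causal order (assumes it never receives
// its own broadcasts and that channels are reliable).
final class CausalProcess {
    private final int id;
    private final int[] delivered;  // delivered[j] = messages from j delivered so far
    private final List<CausalMessage> pending = new ArrayList<>();
    final List<String> log = new ArrayList<>();  // local delivery order

    CausalProcess(int id, int n) {
        this.id = id;
        this.delivered = new int[n];
    }

    // Broadcast: a process delivers its own message at once, then tags it
    // with its current vector clock.
    CausalMessage send(String payload) {
        delivered[id]++;
        log.add(payload);
        return new CausalMessage(id, delivered.clone(), payload);
    }

    // m is deliverable when it is the next message from its sender and every
    // message it causally depends on has already been delivered here.
    private boolean deliverable(CausalMessage m) {
        for (int j = 0; j < delivered.length; j++) {
            if (j == m.sender) {
                if (m.clock[j] != delivered[j] + 1) {
                    return false;
                }
            } else if (m.clock[j] > delivered[j]) {
                return false;
            }
        }
        return true;
    }

    // Buffer m, then deliver buffered messages until none is deliverable.
    void receive(CausalMessage m) {
        pending.add(m);
        boolean progress = true;
        while (progress) {
            progress = false;
            Iterator<CausalMessage> it = pending.iterator();
            while (it.hasNext()) {
                CausalMessage p = it.next();
                if (deliverable(p)) {
                    it.remove();
                    delivered[p.sender]++;
                    log.add(p.payload);
                    progress = true;
                }
            }
        }
    }
}

class CausalDemo {
    public static void main(String[] args) {
        CausalProcess p0 = new CausalProcess(0, 3);
        CausalProcess p1 = new CausalProcess(1, 3);
        CausalProcess p2 = new CausalProcess(2, 3);
        CausalMessage question = p0.send("question");
        p1.receive(question);
        CausalMessage answer = p1.send("answer");  // causally after "question"
        // The network delivers the answer to p2 before the question.
        p2.receive(answer);
        p2.receive(question);
        System.out.println(p2.log);
    }
}
\end{lstlisting}

The demo should print \texttt{[question, answer]} even though the answer reaches \texttt{p2} first. A Byzantine sender, however, could simply lie in its vector clock, which hints at why causal consistency is hard to obtain in a Byzantine setting.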

For a comprehensive overview of these weak consistency models, readers can refer to M. Perrin's detailed 
mapping \cite{MPBook}. Each of these models offers a different trade-off between consistency guarantees, 
implementation complexity, and operational efficiency, making them suitable for different use cases and 
application requirements. Understanding and selecting the appropriate weak consistency model is crucial for 
designing effective and robust distributed systems, especially in the context of collaborative applications 
operating in dynamic and unreliable environments.

\subsection*{Algorithmic Results}

The earliest work on secure collaborative tools in a high-availability context dates back to 2009, 
although more systematic research on the security of weak consistency is quite recent. That year, 
Singh et al. introduced the Zeno system, the first to propose a Byzantine algorithm favoring availability 
over strong consistency: it provides Byzantine fault tolerance with eventual consistency, and strong 
consistency only when conditions allow \cite{SinghZeno2009}. Experiments showed better availability 
than with classical Byzantine algorithms.
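The availability cost of classical Byzantine algorithms can be made concrete with the standard quorum arithmetic used by protocols such as PBFT (general background, not a description of Zeno). With $n = 3f+1$ replicas, at most $f$ of them Byzantine, each operation waits for a quorum of $2f+1$ replies, so any two quorums $Q_1$ and $Q_2$ share at least one correct replica:

\[
|Q_1 \cap Q_2| \;\geq\; |Q_1| + |Q_2| - n \;=\; 2(2f+1) - (3f+1) \;=\; f+1 \;>\; f.
\]

Conversely, as soon as $f+1$ replicas are unreachable, for instance during a network partition, at most $2f$ replies can be gathered, no quorum can be formed, and operations block. Relaxing this requirement to remain available is the kind of trade-off explored by availability-oriented designs such as Zeno.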

Currently, there are mainly partial studies and solutions for causal consistency 
\cite{TsengDistributed2019, VanDerLindePractical2020}. Tseng et al. establish exact computability bounds 
in a Byzantine setting and provide an algorithm whose performance is compared with that of the Google Compute 
platform. Van Der Linde et al. introduce a peer-to-peer system resilient to Byzantine attacks that offers causal 
consistency guarantees. Their evaluation suggests that, despite the peer-to-peer architecture, performance, 
especially latency, is very good compared to a traditional client-server architecture.

In addition to these algorithms, Misra and Kshemkalyani demonstrated in \cite{MisraByzantine2021} that in an 
asynchronous context, it is not possible to achieve causal consistency even with a single Byzantine participant.

One of the notable features of \cite{VanDerLindePractical2020} is its exploration of Byzantine failures within 
the context of weak consistencies. A peer-to-peer system such as this one raises new concerns, where a participant 
leverages information from the lower replication layers to mount attacks at the application level.

Applying weak consistency criteria alone does not fully address our concerns. The cloud context 
raises significant questions regarding data centralization and governance: the market is dominated by a few 
major players to whom users must blindly entrust their data, which poses substantial challenges to privacy 
and data sovereignty.

In this context, integrating the notion of a zero-trust cloud is essential, as it anchors our work in an 
approach that is relevant from both industrial and regulatory perspectives. Zero trust, as defined by NIST in 
SP 800-207 \cite{RoseZero2020}, is a security model that grants no implicit trust and makes no assumptions about 
network security. It helps guard against malicious behavior by intermediaries, reducing the attack surface and 
confining Byzantine behavior to the clients that have access to the data.

Considering data-centric security alongside communication security is also crucial. Adopting 
``data-centric'' approaches means treating data itself as a dynamic entity within the system, to which 
access-control and monitoring processes are attached \cite{BayukDatacentric2009}. These issues are growing 
concerns and are addressed by state and inter-state actors, as exemplified by NATO's standards STANAG 4774 
and 4778. These topics have been studied extensively since the mid-2000s, with works such as 
\cite{GoyalAttributebased2006, MullerDistributed2009} defining attribute-based encryption schemes, in which 
keys are issued according to users' rights in order to enforce security policies. Other works, such as 
\cite{YanFlexible2017}, propose cloud-adapted solutions based on more flexible architectures with 
finer-grained definitions of rights.

However, concerning zero-trust and data-centric security aspects, there is currently no academic consensus 
on the formalization of these notions. These terms are subject to various interpretations, necessitating a 
formal specification to understand which properties need to be satisfied to achieve weak consistency within 
a zero-trust context.

\subsection*{Existing Implementations}

Currently, there are ongoing projects aimed at implementing weak consistency protocols for real-time collaborative 
applications. One notable project is yjs \cite{Yjs2023}, which implements the YATA protocol \cite{NicolaescuRealTime2016}. 
This protocol ensures strong convergence (strong eventual consistency, or SEC, in Perrin's terminology \cite{MPBook}) 
through a CRDT.

On the other hand, older projects such as Etherpad rely on conflict-resolution schemes that are conceptually 
simpler and also ensure strong convergence, but whose operations are more expensive in memory and computation 
time than those of CRDTs \cite{AppJetEtherpad2011}.

\section*{Goals}

The objectives of this thesis encompass studying the three types of weak consistency in a Byzantine setting and 
defining efficient Byzantine algorithms for their implementation. Given that causal consistency is already well-studied, 
the main focus of this thesis will be on the other two types of weak consistency.

The first stage (WP1) will involve studying Byzantine solutions without cryptographic primitives or with reasonably
cost-effective primitives, specifically excluding homomorphic computation. An analysis of existing implementations will
be conducted to determine the guarantees provided by these solutions within the vocabulary of weak consistencies.

The second stage (WP2) will focus on developing more efficient solutions using cryptographic primitives that require 
advanced secret-sharing and/or homomorphic computation.

A final stage (WP3) will involve producing a proof-of-concept key/value storage solution using the algorithms 
selected in the preceding stages.

\section*{Methodology and Planning}

A detailed review of distributed computing models, particularly focusing on solutions for causal consistency, 
will be conducted to establish the set of theoretical and practical assumptions underlying these solutions. 
Concurrently, in collaboration with Parsec, a list of attacks on weakly consistent peer-to-peer architectures 
will be compiled. The emphasis will be on generating new knowledge, including novel solutions compared to the 
current state of the art, as well as identifying new attack vectors.

The algorithms will undergo formal validation initially, followed by the development of a proof of concept.

WP1 will take place in 2024, WP2 in 2025, and WP3 in 2026.

\section*{Monitoring and Exchange Terms}

The PhD student will take part in Parsec's weekly progress meetings. The partners
will meet every three months to review the progress of the work.

The student will also attend the company's in-person meetings every six months.

\section*{Material resources}


\section*{Expected Benefits}

On the LIS laboratory side, the expected outcomes include the following scientific publications:

\begin{compactitem}
\item State-of-the-art review and synthesis concerning Byzantine fault tolerance in weak consistencies.
\item Proposals and proofs of new algorithms within the zero-trust context.
\end{compactitem}

For Parsec, the expected deliverables comprise a mini-model of cloud synchronization and collaboration, 
a proof of concept for the aforementioned algorithms, and consultancy and expertise in the scientific 
development of products created by Parsec.

\section*{Team}

\subsection*{Distributed Algorithmics Team (DALGO)}

The Distributed Algorithms team, led by Arnaud Labourel, is part of the Laboratory of Computer 
Science and Systems (LIS CNRS UMR 7020). This research team is internationally recognized at the 
highest level, comprising 8 permanent members whose interests span from reliable distributed 
algorithms and confidentiality in distributed systems to communication networks, graph algorithms, 
mobile agents, and IoT (Internet of Things).

\subsection*{Supervisors}

\textbf{Emmanuel Godard}  is a professor at Aix-Marseille University. His research interests 
  primarily focus on understanding and maximizing decentralization (in a broad sense) in 
  distributed systems. He is an expert in distributed algorithms and computability.

\textbf{Corentin Travers}  is an Associate Professor at Aix-Marseille University. His research 
  interests focus on robust and efficient distributed algorithms for shared-memory systems or 
  distributed networks. He is an expert in distributed algorithmics and complexity.

\textbf{Marcos Medrano} is an R\&D engineer at Parsec. He holds a research master's degree 
  in computer science and applied mathematics. Marcos is responsible for the development 
  strategy of the Parsec product and facilitates collaboration between engineers and academic stakeholders.

\subsection*{Candidate Choice}

The DALGO team is involved in the ``Reliability and Computer Security'' Master's program at Aix-Marseille 
University. This master's track is certified as \textit{SecNumEdu} by ANSSI 
(National Cybersecurity Agency of France). In autumn 2022, a project in collaboration with the company 
Parsec was presented to all master's students. Following this call for applications, Mr. Amaury Joly 
was selected for a preliminary 6-month research internship on the topic of weak consistency at the 
LIS laboratory.

Mr. Amaury Joly has achieved excellent academic results, graduating from the master's program with 
honors (\textit{mention bien}). He also has a strong dual theoretical and technical profile and a keen 
motivation for research on cloud security. He is the ideal candidate for this research topic.

{\footnotesize
  \nocite{*}

  \bibliography{sujet-cifre.bib}
  \bibliographystyle{alpha}
}

% LaTeX2e code generated by txt2tags 3.4 (http://txt2tags.org)
% cmdline: txt2tags -t tex sujet-cifre.t2t
\end{document}