SGX-PySpark: Secure Distributed Data Analytics

Do Le Quoc (TU Dresden, Scontain UG)
Franz Gregor (TU Dresden, Scontain UG)
Jatinder Singh (University of Cambridge)
Christof Fetzer (TU Dresden, Scontain UG)

ABSTRACT
Data analytics is central to modern online services, particularly data-driven ones. Often this entails the processing of large-scale datasets which may contain private, personal and sensitive information relating to individuals and organisations. Particular challenges arise where the cloud is used to store and process the sensitive data. In such settings, security and privacy concerns become paramount, as the cloud provider is trusted to guarantee the security of the services they offer, including data confidentiality. Therefore, the issue this work tackles is “How to securely perform data analytics in a public cloud?”

To address this question, we design and implement SGX-PySpark, a secure distributed data analytics system which relies on a trusted execution environment (TEE) such as Intel SGX to provide strong security guarantees. To build SGX-PySpark, we integrate PySpark, a framework widely used in industry for data analytics that supports a wide range of queries, with SCONE, a shielded execution framework using Intel SGX.

CCS CONCEPTS
• Information systems → Data analytics; • Security and privacy → Distributed systems security.

KEYWORDS
Confidential computing; data analytics; security; distributed system

ACM Reference Format:
Do Le Quoc, Franz Gregor, Jatinder Singh, and Christof Fetzer. 2019. SGX-PySpark: Secure Distributed Data Analytics. In Proceedings of the 2019 World Wide Web Conference (WWW ’19), May 13–17, 2019, San Francisco, CA, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3308558.3314129

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’19, May 13–17, 2019, San Francisco, CA, USA
© 2019 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-6674-8/19/05.
https://doi.org/10.1145/3308558.3314129

1 INTRODUCTION
Cloud-based services are used to collect, process and analyze large amounts of users’ personal data. Some of it is highly sensitive, such as data relating to personal finances, political views, health, and so forth. Indeed, regulators are paying increasing attention to the way in which personal data is handled and processed, the EU’s General Data Protection Regulation being a case in point. Thus, the confidentiality and integrity of data processing in clouds are becoming more important. This is not least because of increased demands for accountability regarding service providers, and the potentially serious legal consequences (and fines) for data mishandling, mismanagement and leakage, and more generally, for failing to implement appropriate security measures [9]. Service providers must ensure that data is always protected, i.e., at rest, during transmission, and during computation. Many organisations use public cloud services for the processing to reduce time and computation cost. This setting is vulnerable to many security threats, e.g., data breaches [10]. Concerns are compounded when we consider attacks from inside the cloud provider, where attackers might have root privileges and/or physical access to the machines deployed at the service providers’ premises. Therefore, to protect the sensitive data and the analytics computation over the data, service providers cannot rely solely on operating system access control or on their security policy-based mechanisms.

A promising approach to helping resolve these security challenges is to make use of Trusted Execution Environments (TEEs), such as Intel Software Guard Extensions (SGX). Intel SGX protects the confidentiality and integrity of application code and data even against privileged attackers with root access and physical access. In general, Intel SGX provides isolated secure memory areas, called enclaves, in which code can be executed and data processed safely. These security guarantees are provided solely by the CPU. Thus, even if the system software is compromised, the attacker cannot access the enclave’s content. This approach supports data analytics at processor speeds while ensuring security guarantees for both the computation and the sensitive data.

While promising at first glance, building a practical secure data analytics system using TEEs such as Intel SGX requires dealing with several challenges. (A) In the current version, Intel SGX supports only a limited memory space (∼94MB) for applications running inside enclaves. Meanwhile, most big data analytics systems (e.g., Hadoop and Apache Spark [1]) are extremely memory-intensive, since these systems are almost always based on the Java Virtual Machine (JVM). (B) Intel SGX still suffers from side-channel attacks [12]. These side-channel attacks happen both at the memory level [12, 16] and at the network level [16]. (C) Deploying and bootstrapping a data analytics framework to run inside enclaves is not trivial; in fact, it is challenging. Securely transferring configuration secrets such as certificates, encryption keys and passwords to start the framework inside enclaves is complicated. These secrets need to be protected on the network as well as securely moved into the enclaves. (D) Typically, Intel SGX requires users to heavily modify the source code of their applications to run them inside enclaves. Thus, transparently supporting an unmodified distributed data analytics framework inside enclaves is not a trivial task.

In the context of building secure data analytics systems using Intel SGX, VC3 [13] is one of the first works that applied SGX technology to the Hadoop MapReduce framework. VC3 handles challenges
(A) and (D) by using C/C++ to implement the framework and support unmodified Hadoop. However, challenge (B) is outside the scope of that work. Recently, Opaque [16] addressed this issue by introducing an oblivious mechanism to hide access patterns at the network level. Opaque provides secure data analytics on the Apache Spark framework using Intel SGX. It deals with (A) by reimplementing SQL operators for the Spark SQL Catalyst engine [6] in C++. These operators run inside enclaves and communicate with the Scala code of Spark through a JNI interface. Opaque supports common operators including map, reduce, filter, sort, aggregation, and join, but not all operators of Apache Spark. This means that it does not completely handle challenge (D). In addition, it does not support remote attestation to verify the integrity of the code running inside SGX enclaves. It also does not handle challenge (C), since it does not provide a mechanism for transferring secrets to execution inside enclaves. Finally, Opaque requires the Spark master/driver to run on the client side or in a trusted domain, which might significantly affect the performance of the system.

In this work, we overcome these limitations by building a secure data analytics system called SGX-PySpark. We handle challenge (A) by using PySpark [3], a system built on top of Apache Spark to support data analytics using Python processes. We do not run a whole JVM inside an enclave to secure Apache Spark, nor do we reimplement operators in C/C++ as Opaque does. Instead, we run only the Python processes inside enclaves, since these processes perform the analytics over the encrypted data. Thus, our system supports the out-of-the-box operators of PySpark (challenge (D)), i.e., users do not need to modify their source code. To run Python processes inside Intel SGX enclaves, our system makes use of SCONE [4, 7], a shielded execution framework which enables unmodified applications to run inside Intel SGX enclaves.

In addition, SGX-PySpark, with the help of SCONE, supports a remote attestation mechanism to ensure that the code and data running inside enclaves are correct and have not been modified by an attacker. SGX-PySpark also copes with challenge (C) by providing a mechanism to securely transfer secrets (keys and certificates) to Python processes running inside enclaves. To handle challenge (B), SGX-PySpark protects its execution against side-channel attacks at the memory level using a mechanism integrated with SCONE, called Varys [12]. Finally, the design of SGX-PySpark allows users to run the Spark driver/master in the same infrastructure as the workers.

2 BACKGROUND

2.1 Intel SGX
Intel SGX is an ISA extension, i.e., a set of special CPU instructions for Trusted Execution Environments (TEEs). These instructions enable applications to create enclaves. Enclaves are protected areas in the application’s address space that provide strong confidentiality and integrity guarantees against adversaries with privileged root access. Intel SGX enables trusted computing by isolating the environment of each enclave from untrusted applications outside the enclave. In addition, by offering a remote attestation mechanism, Intel SGX allows a remote party to attest the application executing inside an enclave [8].

The enclave memory is acquired from the Enclave Page Cache (EPC), a dedicated memory region protected by an on-chip Memory Encryption Engine (MEE). The MEE transparently encrypts cache lines on cache-line evictions, and decrypts and verifies cache lines on cache-line loads. The EPC cannot be directly accessed by non-enclave applications, including the operating system. To support multiple enclaves on a system, the EPC is partitioned into 4KB pages which can be assigned to various enclaves. Currently, the size of the EPC is limited to 128MB. Of this, only ∼94MB can be used for user applications; the rest is used to store SGX metadata. Fortunately, SGX supports a secure paging mechanism to an unprotected memory region, although paging may introduce significant overheads.

The operating system (or a hypervisor in virtualized environments) manages the EPC like the rest of the physical memory. The operating system uses SGX instructions to allocate and free EPC pages for enclaves. In addition, the operating system is supposed to expose the enclave services (creation and management) to applications. Since the operating system cannot be trusted, the SGX hardware verifies the correctness of EPC page allocations and denies any operation that would violate the security guarantees. For example, the SGX hardware will not allow the operating system to allocate the same EPC page to different enclaves.

2.2 SCONE
Our system builds on SCONE [7], a shielded execution framework that enables unmodified applications to run inside SGX enclaves. In the SCONE platform, the source code of an application is recompiled against a modified standard C library (SCONE libc) to facilitate the execution of system calls. The address space of the application stays within an enclave, and the application can only access untrusted memory via the system call interface.

SCONE uses a compiler-based approach to prepare and build native applications for execution inside SGX enclaves. SCONE integrates its mechanism into the GNU Compiler Collection (GCC) tool-chain. This changes the compilation process so that it builds position-independent, statically linked code that is eventually linked with the starter program. Therefore, SCONE natively supports C/C++ applications. For Python applications, e.g., PySpark executors, we need to compile the CPython/PyPy interpreter with SCONE to run these Python processes inside SGX enclaves. Similarly, to run a Java application inside an enclave, we compile the JVM with SCONE.

2.3 PySpark
PySpark is built on top of Apache Spark [1] to provide a Python API for users. Thus, before explaining PySpark, it is useful to understand what Apache Spark is. Apache Spark [1] is an open-source large-scale data analytics framework. Today, it has become the most popular and widely used big data framework in both academia and industry. Compared to earlier frameworks such as Hadoop MapReduce, Spark is much faster at processing large-scale datasets. This is because it enables in-memory computing, where intermediate data is cached in memory to reduce latency [11].

For in-memory computation, Spark introduces its core abstraction, Resilient Distributed Datasets (RDDs) [15], for distributed data-parallel computing. An RDD is an immutable and fault-tolerant
collection of elements (objects) that is distributed or partitioned across a set of nodes in a cluster [1].

RDDs support two types of operations: transformations and actions. Transformations, such as map() and filter(), return a new RDD. Actions, such as reduce(), count(), collect(), and saveAsTextFile(), return their output to the driver program or store it in a persistent storage system. Transformations are performed lazily, i.e., they are computed only when an action is invoked [11].

In a deployment, Spark contains a main program, called the driver, which coordinates the processes (tasks) executed on other nodes (i.e., workers) in a cluster. When a Spark application is submitted to the driver, the driver splits the job into tasks and schedules them on the workers. The workers then spawn processes (called executors) to handle the received tasks.

Python offers powerful libraries such as scipy, numpy, scikit-learn, and pandas with a simple and concise syntax, so it is a favorite language of data scientists. For that reason, PySpark has been proposed as a Spark programming model for Python. Figure 1 shows the high-level architecture of PySpark.

PySpark extends the Spark runtime to enable executing Python programs on top of Spark. Typically, a PySpark job (a Python process) is submitted, and a JVM is started to communicate with the Python process using Py4J [2]. The Python process creates a SparkContext object in the JVM. The SparkContext then orchestrates the computation as in the regular Spark framework, so that almost all of the Spark infrastructure is reused. The difference, however, is that in PySpark the executors are Python processes. Each Python process (one per CPU core) takes care of executing its assigned tasks. Each worker (a JVM-based program) submits received tasks to the pool of Python processes and communicates with them using pipes. The Python processes perform the computation and store the resulting data back to an RDD (as pickled objects) in the JVM.

Figure 1: PySpark’s architecture. (Diagram: Driver with Py4J; Workers exchanging data with Python processes via pipes; Distributed Data Store; legend: Java, Python.)

3 SGX-PYSPARK
The idea behind SGX-PySpark, a secure distributed data analytics system using Intel SGX, is simple. We execute only the sensitive parts, i.e., the computation parts that process the sensitive input data, inside enclaves. The computation parts outside of enclaves can only access encrypted data. In general, we first encrypt the input data and upload the encrypted data into a distributed storage in an untrusted infrastructure (e.g., a public cloud). Thereafter, SGX-PySpark decrypts and processes the encrypted data inside enclaves in a distributed manner.

Figure 2 illustrates SGX-PySpark’s architecture. SGX-PySpark consists of two main components: (i) the Configuration and Attestation Service (CAS) and (ii) PySpark integrated with the SCONE library to run inside enclaves. SGX-PySpark maintains the native layout of PySpark (see §2.3).

Figure 2: SGX-PySpark’s architecture. (Diagram: User jobs (encrypted Python code); Configuration & Attestation Service (CAS); Driver and Workers running with the SCONE Lib, connected via Py4J and pipes; Distributed Data Store; TLS connections between components; legend: Enclave, Java, Python.)

To secure data analytics computations on top of PySpark, SGX-PySpark runs the driver and the Python processes inside Intel SGX enclaves using the SCONE library. We need to ensure the confidentiality and integrity of the driver, since it is responsible for splitting and scheduling tasks in the system. We protect the Python processes because they are the sensitive computation parts: they directly decrypt and process the input data. Note that in SGX-PySpark, we encrypt not only the sensitive input data but also the computation over it (the Python code of analytics jobs).

3.1 Trusted Enclave-based Driver
In PySpark, when we submit a job, the Python driver program uses Py4J to start a JVM (the Spark driver) and create a JavaSparkContext. This JavaSparkContext orchestrates the job as in the regular Spark framework. The JVM-based process is responsible for converting a submitted job into tasks. In detail, it first converts the logical DAG of operations in the submitted job into a physical execution plan, i.e., it divides the DAG into a number of stages. Thereafter, it divides these stages into smaller tasks. Next, the Spark scheduler distributes these tasks to executors (Python processes) deployed on worker nodes for execution (see §2.3).

To provide secure data analytics, we need to protect the driver, as it is the command center of the whole system. To achieve this, in SGX-PySpark we recompile the JVM running on the driver node with the SCONE library to execute the whole driver inside an SGX enclave.

3.2 Trusted Enclave-based Executors
To execute the analytics job (Python code), PySpark launches Python processes (worker processes) and communicates with them using pipes to transfer the user-specified Python code and the processed data. Since these Python processes interact directly with the input
data, we protect them against malicious activities. That is, we ensure their integrity and confidentiality by running them inside enclaves with the help of the SCONE platform. Note that in SGX-PySpark both the input data and the computation (Python code) are encrypted before being uploaded to the system. They are decrypted inside enclaves using keys transparently obtained from CAS (see §3.3).

3.3 Configuration and Remote Attestation Service
In SGX-PySpark, we need to make sure that shared secrets are never revealed to untrusted components. These include certificates, passwords to start PySpark, and keys for encrypting/decrypting the input data and computations. Furthermore, we need to securely transfer these secrets to the driver and executors (Python processes) running inside enclaves (see challenge (C) in §1). To achieve these goals, we extend SCONE with a configuration and attestation service (CAS). CAS transfers secrets only to the components that have successfully authenticated themselves against it. CAS enhances the Intel attestation service [8] to bootstrap and establish trust across the machines running SGX-PySpark and to maintain a secure configuration for the system. In detail, CAS remotely attests the driver and worker processes of PySpark running inside enclaves before providing encryption/decryption keys and other configuration parameters. CAS itself runs inside an SGX enclave. A user of SGX-PySpark first encrypts the input data and the data analytics job, and then uploads the secrets (cryptographic keys) and system configurations to CAS. A basic implementation of CAS is presented in [14]. In addition, to guarantee the correctness of computation in SGX-PySpark, we design and implement an auditing service in CAS that keeps result logs during runtime. This auditing service protects our data analytics system against rollback attacks.

3.4 Network and File System Protection
File system shield. To protect the integrity and confidentiality of the Python code and the sensitive input data stored on disk, we design a file system shield for SGX-PySpark using the SCONE library. When SGX-PySpark writes computation results to a file, the shield encrypts the contents before writing. The shield ensures the integrity of these files by keeping their metadata inside CAS (see §3.3). The keys used to encrypt the computation content are different from the secrets used by the SGX implementation. Instead, they are part of the configuration that users upload to CAS at the startup time of the data analytics system.

Network shield. SGX-PySpark transfers secrets from CAS to computation components running inside enclaves. To protect the confidentiality of these secrets, we need to protect the network communication between CAS and these components, so that an attacker cannot observe the network traffic to steal the secrets. To meet this security requirement, we enhance SGX-PySpark with a network shield. The shield wraps the communication between our system components in TLS connections and ensures that all data passed over these connections is TLS-encrypted. The certificates for the TLS connections are saved in a configuration file protected by our file system shield.

4 DEMONSTRATIONS
In this section, we demonstrate how a user can securely perform data analytics using SGX-PySpark¹. For demonstration purposes, we consider the simple and classical “wordcount” workload. Assume that the user wants to perform data analytics (e.g., wordcount) over sensitive input data.

¹ The demo repository is available here: https://github.com/doflink/sgx-pyspark-demo

4.1 Protecting Data
A straightforward way to securely process the input data is for the user to encrypt the data before uploading it to an untrusted domain such as a public cloud. In this demo, we show that this mechanism is not enough to protect the secrets inside the input data. The encrypted data still has to be decrypted and then processed in memory, and an attacker, especially one with privileges, can simply dump the memory content to steal the secrets. In SGX-PySpark, the input data is encrypted using the file system shield, and then decrypted and processed inside SGX enclaves, which cannot be accessed even by strong attackers with root access.

4.2 Protecting Computation
When the user submits a job (Python code) to the system, an attacker might learn what kind of computation the user wants to perform. We show in this demo that, by using SGX-PySpark, the user can also encrypt the computation using the same file system shield before uploading it to the untrusted domain (e.g., a cloud) for execution. The secret key for decryption is transferred into executors running inside SGX enclaves using the same mechanism as in §4.1.

4.3 Benchmarks
Besides the wordcount workload, we also use a standard data analytics benchmark (TPC-H [5]) to demonstrate that SGX-PySpark supports as wide a range of queries as native PySpark. Figure 3 presents the latency comparison between SGX-PySpark and native PySpark when processing TPC-H queries. The performance overhead incurred by running Python processes inside enclaves is not significant compared to native PySpark. The reason is that the main overhead of SGX-PySpark is introduced by the communication between Python processes and JVMs in the workers.

Figure 3: TPC-H benchmark. (Plot: latency [seconds], 0 to 100, of SGX-PySpark and native PySpark for TPC-H queries Q1, Q3, Q4, Q5, Q6, Q7, Q10, Q12, Q13, Q14, Q16, Q18, Q19.)

Acknowledgments. This work is supported by the European Union’s Horizon 2020 research and innovation programme under
grant agreements No. 777154 (ATOMSPHERE) and No. 780681 (LEGaTO).

REFERENCES
[1] Apache Spark. https://spark.apache.org. Accessed: Jan, 2019.
[2] Py4J. http://py4j.sourceforge.net. Accessed: Jan, 2019.
[3] PySpark. http://spark.apache.org/docs/2.2.0/api/python/pyspark.html. Accessed: Jan, 2019.
[4] Scontain Technology. https://sconedocs.github.io/. Accessed: Jan, 2019.
[5] TPC-H Benchmark. http://www.tpc.org/tpch/. Accessed: Jan, 2019.
[6] Michael Armbrust, Reynold S. Xin, Cheng Lian, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael J. Franklin, Ali Ghodsi, and Matei Zaharia. 2015. Spark SQL: Relational Data Processing in Spark. In Proceedings of the International Conference on Management of Data (SIGMOD).
[7] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O’Keeffe, Mark L. Stillwell, David Goltzsche, Dave Eyers, Rüdiger Kapitza, Peter Pietzuch, and Christof Fetzer. 2016. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI).
[8] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. Cryptology ePrint Archive, Report 2016/086.
[9] General Data Protection Regulation. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46. Official Journal of the European Union (OJ) (2016).
[10] Tim Greene. Biggest data breaches of 2015. https://www.networkworld.com/article/3011103/security/biggest-data-breaches-of-2015.html. Accessed: Jan, 2019.
[11] Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. 2015. Learning Spark: Lightning-Fast Big Data Analysis. O’Reilly Media, Inc.
[12] Oleksii Oleksenko, Bohdan Trach, Robert Krahn, Mark Silberstein, and Christof Fetzer. 2018. Varys: Protecting SGX Enclaves from Practical Side-Channel Attacks. In 2018 USENIX Annual Technical Conference (USENIX ATC).
[13] Felix Schuster, Manuel Costa, Cédric Fournet, Christos Gkantsidis, Marcus Peinado, Gloria Mainar-Ruiz, and Mark Russinovich. 2015. VC3: Trustworthy Data Analytics in the Cloud Using SGX. In Proceedings of the Symposium on Security and Privacy (SP).
[14] Bohdan Trach, Alfred Krohmer, Franz Gregor, Sergei Arnautov, Pramod Bhatotia, and Christof Fetzer. 2018. ShieldBox: Secure Middleboxes using Shielded Execution. In Proceedings of the Symposium on SDN Research (SOSR).
[15] Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault Tolerant Abstraction for In-Memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI).
[16] Wenting Zheng, Ankur Dave, Jethro G. Beekman, Raluca Ada Popa, Joseph E. Gonzalez, and Ion Stoica. 2017. Opaque: An Oblivious and Encrypted Distributed Analytics Platform. In Proceedings of the 14th USENIX Conference on Networked Systems Design and Implementation (NSDI).
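The EPC figures quoted in §2.1 can be turned into page budgets with a few lines of arithmetic: a 128MB EPC divided into 4KB pages, of which only ∼94MB is usable by applications. The following Python sketch is illustrative only. The helper names `epc_pages` and `needs_paging` are hypothetical, and the sketch assumes 94MB is exact, whereas the paper gives it as an approximation.

```python
# Illustrative EPC page arithmetic (sizes from Section 2.1; 94MB assumed exact).
KIB = 1024
MIB = 1024 * KIB

EPC_SIZE = 128 * MIB      # total Enclave Page Cache size
USABLE_EPC = 94 * MIB     # approximate part usable by applications
PAGE_SIZE = 4 * KIB       # EPC page granularity


def epc_pages(num_bytes: int) -> int:
    """Number of 4KB EPC pages needed to hold num_bytes, rounded up."""
    return -(-num_bytes // PAGE_SIZE)


def needs_paging(working_set_bytes: int) -> bool:
    """True if a working set exceeds the usable EPC, so SGX paging would kick in."""
    return epc_pages(working_set_bytes) > epc_pages(USABLE_EPC)


if __name__ == "__main__":
    print(epc_pages(EPC_SIZE))       # 32768
    print(epc_pages(USABLE_EPC))     # 24064
    print(needs_paging(512 * MIB))   # True (hypothetical 512MB heap)
```

Under these assumptions the EPC holds 32768 pages, of which 24064 are available to applications. A larger working set, such as the hypothetical 512MB heap in the example, relies on SGX paging and its overheads. This illustrates why challenge (A) in §1 matters for memory-intensive JVM-based systems.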

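The toy below illustrates two ideas without a cluster: the transformation/action distinction and lazy evaluation described in §2.3, and the wordcount workload from §4. It is a hypothetical single-process sketch; `LazyDataset` is not part of PySpark. Transformations only record a step, and the pipeline runs when an action is called. In PySpark itself, the corresponding RDD calls would be `flatMap`, `map`, `reduceByKey`, and `collect`.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Hypothetical single-process toy mimicking RDD laziness (not the PySpark API).

    Transformations return a new LazyDataset that merely records a step;
    actions (collect, count) run all recorded steps over the source data.
    """

    def __init__(self, source: Iterable,
                 steps: Optional[List[Callable[[Iterator], Iterator]]] = None):
        self._source = source
        self._steps = steps or []

    def _then(self, step: Callable[[Iterator], Iterator]) -> "LazyDataset":
        # Transformations never touch the data; they extend the recorded pipeline.
        return LazyDataset(self._source, self._steps + [step])

    # Transformations (PySpark analogues: map, flatMap, filter, reduceByKey)
    def map(self, f):
        return self._then(lambda it: (f(x) for x in it))

    def flat_map(self, f):
        return self._then(lambda it: (y for x in it for y in f(x)))

    def filter(self, pred):
        return self._then(lambda it: (x for x in it if pred(x)))

    def reduce_by_key(self, f):
        def step(it):
            acc = {}
            for k, v in it:
                acc[k] = f(acc[k], v) if k in acc else v
            return iter(acc.items())
        return self._then(step)

    # Actions: only here is the recorded pipeline actually executed
    def _run(self) -> Iterator:
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return it

    def collect(self) -> list:
        return list(self._run())

    def count(self) -> int:
        return sum(1 for _ in self._run())


if __name__ == "__main__":
    seen = []

    def tokenize(line):
        seen.append(line)  # records when the tokenizer really runs
        return line.split()

    counts = (LazyDataset(["secure data analytics", "secure enclaves"])
              .flat_map(tokenize)
              .map(lambda w: (w, 1))
              .reduce_by_key(lambda a, b: a + b))
    print(seen)                      # [] since nothing has been computed yet
    print(sorted(counts.collect()))  # [('analytics', 1), ('data', 1), ('enclaves', 1), ('secure', 2)]
```

The recorded steps are generator-based, so the data flows through the whole pipeline in a single pass when `collect()` runs. Spark additionally partitions the data and shuffles it between stages for `reduceByKey`, which this toy omits.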

No ratings yet
Simulation Tabletop Exercise For Incident Response
32 pages
Setup Guide-Cisco SPA122
No ratings yet
Setup Guide-Cisco SPA122
9 pages
Safety Switches
100% (1)
Safety Switches
62 pages
Rakesh Yadav Class Notes Math in Hindi PDF Free Download PDF
71% (7)
Rakesh Yadav Class Notes Math in Hindi PDF Free Download PDF
423 pages
Hacking The Android APK Training
No ratings yet
Hacking The Android APK Training
2 pages
Key To Unix - Notes
No ratings yet
Key To Unix - Notes
66 pages
Capstone Docu With Border Original Copy 1 ST Defence
No ratings yet
Capstone Docu With Border Original Copy 1 ST Defence
97 pages
Honeypot PDF
100% (1)
Honeypot PDF
9 pages
Card Authentication
No ratings yet
Card Authentication
1 page
MikroTik RouterOS Networking Quiz
100% (1)
MikroTik RouterOS Networking Quiz
43 pages
Smera Sarnath Sonkar
No ratings yet
Smera Sarnath Sonkar
2 pages
Information Assurance and Compliance
No ratings yet
Information Assurance and Compliance
11 pages
ECN Consent Form E02
No ratings yet
ECN Consent Form E02
2 pages
Ideatab A1000l-F Ug en v1.0 20130829 PDF
No ratings yet
Ideatab A1000l-F Ug en v1.0 20130829 PDF
25 pages
Employers Application For Registration-Nssf
No ratings yet
Employers Application For Registration-Nssf
2 pages
Chapter - 8 - Crypto
No ratings yet
Chapter - 8 - Crypto
163 pages
Wallet Collection 2022 2023 - Print
No ratings yet
Wallet Collection 2022 2023 - Print
20 pages
Lessons Learned
No ratings yet
Lessons Learned
16 pages
SQL Injection Guide for Hackers
No ratings yet
SQL Injection Guide for Hackers
7 pages
CV Semcheddine Ikram
No ratings yet
CV Semcheddine Ikram
2 pages
Software Security Notes
No ratings yet
Software Security Notes
4 pages
Read
No ratings yet
Read
3 pages
VPN Security: IPSec and GRE Tunnels
No ratings yet
VPN Security: IPSec and GRE Tunnels
4 pages
Encryption Lab Questions CRJ 362
No ratings yet
Encryption Lab Questions CRJ 362
2 pages
2023AIP Summary Form
No ratings yet
2023AIP Summary Form
5 pages
Birth/Death Certificate Request Form
No ratings yet
Birth/Death Certificate Request Form
1 page
Group 10 Conveyancing Assignment Final Document.
No ratings yet
Group 10 Conveyancing Assignment Final Document.
18 pages
Lab2 IAP301
No ratings yet
Lab2 IAP301
11 pages
Smart Home Solutions for Builders
No ratings yet
Smart Home Solutions for Builders
16 pages