417531: Distributed Computing: Final Year of AI & DS Engineering (2020 Course)

Final Year of AI & DS Engineering (2020 Course)

417531: Distributed Computing


Teaching/Examination Scheme
Unit 5: Contents
Big data processing frameworks in distributed computing

• Hadoop
• Apache Spark

• Apache Storm

• Samza

• Flink
Parallel and distributed data processing techniques

● Single Instruction Single Data (SISD)

● Single Instruction Multiple Data (SIMD)

● Multiple Instruction Single Data (MISD)

● Multiple Instruction Multiple Data (MIMD)

● Single program multiple data (SPMD)

● Massively parallel processing (MPP)


Scalable data ingestion
• Types of data ingestion

1) Batch Ingestion: Mechanism: Data is collected and processed in predefined batches.
Advantages: Simple to use; handles large amounts of data at a time.
Difficulties: High latency; not suited to real-time analytics.
Tools to be used: Apache Hadoop, Apache Spark

2) Real-time/Stream Ingestion: Mechanism: Data is ingested and processed as it arrives, in real time.
Advantages: Low latency; suited to real-time analytics and monitoring.
Difficulties: Complex to operate; data can be lost if failures are not handled.
Tools to be used: Apache Kafka, Apache Flink, Apache Storm

3) Change Data Capture (CDC): Mechanism: Captures and tracks changes made to data in source systems.
Advantages: Only modified data is processed, which reduces processing time.
Difficulties: Complex to set up; source and target can become inconsistent if changes are missed.
Tools to be used: Debezium, Apache NiFi
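
The three ingestion types above can be contrasted in a minimal Python sketch. The function names and the record layout are hypothetical and chosen only for illustration. The CDC function compares two snapshots, which is a simplification: log-based tools such as Debezium read the database's change log instead.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

Record = Dict[str, int]
Change = Tuple[str, str, Optional[int]]


def batch_ingest(records: List[Record], batch_size: int) -> Iterator[List[Record]]:
    """Batch ingestion: hand records downstream in fixed-size groups."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def stream_ingest(source: Iterable[Record]) -> Iterator[Record]:
    """Stream ingestion: hand each record downstream as soon as it arrives."""
    for record in source:
        yield record


def capture_changes(old: Dict[str, int], new: Dict[str, int]) -> List[Change]:
    """CDC (snapshot-comparison variant): emit only inserts, updates and deletes."""
    changes: List[Change] = []
    for key, value in new.items():
        if key not in old:
            changes.append(("insert", key, value))
        elif old[key] != value:
            changes.append(("update", key, value))
    for key in old:
        if key not in new:
            changes.append(("delete", key, None))
    return changes
```

Batch ingestion trades latency for simplicity because nothing moves until a batch is full, while the stream version forwards every record immediately. The CDC version forwards only the rows that differ.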
Real-time analytics and Streaming analytics

Real-Time Analytics refers to the utilization of data processing and analysis tools to derive
insights immediately after data creation or collection.
The primary objective is to furnish timely information for decision-making without notable delays.
Streaming analytics is defined as the ongoing processing of real-time data streams, where insights
are extracted and decisions are made based on the information flowing through the system.
Typically, streaming analytics deals with high-velocity data sources.
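
As a rough illustration of streaming analytics, the sketch below keeps a running average over a sliding time window and emits an updated value for every incoming event. The function name, the `(timestamp, value)` event shape and the window length are assumptions made for this example; engines such as Flink or Storm provide their own windowing operators.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Event = Tuple[float, float]  # (timestamp in seconds, measured value)


def windowed_average(events: Iterable[Event], window_seconds: float) -> Iterator[Event]:
    """Yield (timestamp, average of values seen in the last window_seconds)."""
    window: Deque[Event] = deque()
    total = 0.0
    for timestamp, value in events:
        window.append((timestamp, value))
        total += value
        # Evict events that have fallen out of the time window.
        while window and window[0][0] <= timestamp - window_seconds:
            _, old_value = window.popleft()
            total -= old_value
        yield timestamp, total / len(window)
```

Each result is available as soon as its event arrives, which is the property that separates this style of processing from a batch job over stored data.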
Unit 6: Contents
Security Challenges in Distributed Systems:
1)Network Security
2)Data Integrity
3)Authentication and Authorization
4)Consensus and Fault Tolerance
5)Distributed Denial of Service (DDoS)
6)Secure Communication
7)Dynamic and Heterogeneous Environment
8)Auditability and Forensics
9)Data Privacy
10)Software and Patch Management
11)Secure Code Execution
12)Trust Management
Insider Threats:

An insider attack is an attack launched by someone with inside information about, or access to, the organization.


These people could be former or present employees, contractors, business associates, or
security administrators who handled sensitive information in the past.

Insider Types:
Malicious Insider:
Careless Insider:
Mole:
Encryption and Secure Communication
• TLS/SSL: SSL stands for Secure Sockets Layer, and TLS for Transport Layer Security. Both are protocols that secure
communication between web servers and web browsers; TLS is the successor to SSL.

• PKI: A framework known as public key infrastructure (PKI) makes it possible to use digital signatures and secure
communication over unreliable networks like the internet. PKI is based on asymmetric cryptography, in which each user has
two cryptographic keys: a private key that is kept secret and a public key that is shared widely.

• VPN: A VPN provides a secure, encrypted connection between two sites. Before the VPN connection is established, the two
ends of the connection agree on a shared encryption key. This can be accomplished by giving the user a password or by using
a key sharing method.

• AMQP: Message queueing is an essential building block for distributed systems. The Advanced Message Queuing Protocol
(AMQP) is an open standard for it, and its more recent versions add capabilities that greatly improve its usability in a business setting.
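
The private/public key split behind PKI can be illustrated with a toy digital signature. This is a textbook RSA sketch with deliberately tiny primes and no padding, so it is not secure. The constants and function names are assumptions for illustration only; real systems use a vetted cryptography library and certificates issued by a certificate authority.

```python
import hashlib

# Toy key material: tiny primes, for illustration only (NOT secure).
P, Q = 61, 53
N = P * Q                      # public modulus, shared widely
PHI = (P - 1) * (Q - 1)
PUBLIC_EXPONENT = 17           # public key part, shared widely
PRIVATE_EXPONENT = pow(PUBLIC_EXPONENT, -1, PHI)  # private key, kept secret


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Only the holder of the private exponent can produce this value."""
    return pow(digest(message), PRIVATE_EXPONENT, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone with the public exponent and modulus can check a signature."""
    return pow(signature, PUBLIC_EXPONENT, N) == digest(message)
```

A receiver calls `verify(message, signature)` with only the public values, so the signer never has to reveal the private key.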
Privacy Preservation Techniques:
• Differential Privacy
• Homomorphic Encryption
• Secure Multi-Party Computation (SMPC)
• Federated Learning
• Anonymization and Pseudonymization
• Access Control and Data Minimization
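
As one concrete example from the list above, differential privacy is commonly achieved by adding calibrated random noise to query results. The sketch below applies the Laplace mechanism to a counting query, whose sensitivity is 1. The function names and the `epsilon` parameter name are assumptions for illustration, not a reference implementation.

```python
import random
from typing import Callable, Iterable


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values: Iterable[int], predicate: Callable[[int], bool],
                  epsilon: float, rng: random.Random) -> float:
    """Return a noisy count; a smaller epsilon means more noise and more privacy."""
    true_count = sum(1 for value in values if predicate(value))
    # A counting query changes by at most 1 when one record changes,
    # so the noise scale is sensitivity / epsilon = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, `private_count(ages, lambda a: a >= 18, 0.5, random.Random())` returns the number of adults plus noise, so the presence or absence of any single person has only a bounded effect on the output.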
AI-based Intrusion Detection and Threat Mitigation Techniques

• Anomaly Detection

• Behavior-based Detection

• Threat Intelligence and Analysis

• Real-time Response and Mitigation

• Adaptive Security

• User and Entity Behavior Analytics (UEBA)

• Threat Hunting and Visualization.
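
Anomaly detection, the first item above, can be sketched with a simple statistical baseline. The example flags observations that lie far from the mean of known-normal traffic. It is a plain z-score rule standing in for the learned models an AI-based detector would use, and the function name, threshold and sample numbers are assumptions for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence


def find_anomalies(baseline: Sequence[float], observations: Sequence[float],
                   threshold: float = 3.0) -> List[float]:
    """Return observations more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A constant baseline gives no scale to measure deviation against.
        return []
    return [x for x in observations if abs(x - mu) / sigma > threshold]
```

With a baseline of requests per minute such as `[100, 102, 98, 101, 99, 100, 103, 97]`, a sudden reading of 250 would be flagged while 101 would not.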

