Big Data Processing Techniques Explained

Uploaded by Shams AlHadi
Big Data Processing Platforms

Chapter 6
Contents

• Parallel Data Processing

• Distributed Data Processing

• Speed Consistency Volume (SCV)
Parallel Data Processing

• Parallel data processing involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task.

• The goal is to reduce execution time by dividing a single large task into multiple smaller tasks that run concurrently.

• Although parallel data processing can be achieved through multiple networked machines, it is more typically achieved within the confines of a single machine with multiple processors or cores.

• Parallel processing is mostly based on a shared-everything architecture, in which all processors share the same memory and storage.
Parallel Data Processing cont.

(Figure: a larger task divided into sub-tasks that execute concurrently.)
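The idea above can be sketched with Python's standard `multiprocessing` module: one large task (summing a big list) is split into smaller sub-tasks that run concurrently on the cores of a single machine. The chunk count and data are illustrative assumptions, not part of the slides.

```python
# Parallel data processing sketch: divide one larger task into smaller
# sub-tasks that run concurrently on multiple cores of a single machine.
from multiprocessing import Pool

def subtask(chunk):
    # Each worker processes its own smaller piece of the larger task.
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial_sums = pool.map(subtask, chunks)  # sub-tasks execute concurrently
    total = sum(partial_sums)  # combine partial results into the final answer
    print(total)
```

Note that the workers here share the machine's memory and storage, matching the shared-everything architecture described above.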
Distributed Data Processing

• Distributed data processing is closely related to parallel data processing in that the same "divide-and-conquer" principle is applied.

• However, distributed data processing is always achieved through physically separate machines that are networked together as a cluster.

• Distributed systems are based on a shared-nothing architecture: the machines share nothing except the network switch that connects them.
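A minimal sketch of the shared-nothing, divide-and-conquer idea: a word count where each "node" sees only its own partition and returns a partial result, which is then merged. Here ordinary functions stand in for cluster nodes; in a real cluster each partition would be processed on a separate machine and the partial results would travel over the network.

```python
# Distributed divide-and-conquer sketch: shared-nothing word count.
from collections import Counter

def node_count(partition):
    # Runs independently on one node; no memory is shared with other nodes.
    return Counter(partition)

def merge(results):
    # Partial results are combined at a coordinating node.
    total = Counter()
    for r in results:
        total.update(r)
    return total

words = ["big", "data", "big", "cluster", "data", "big"]
partitions = [words[0::2], words[1::2]]        # split the work across 2 nodes
partials = [node_count(p) for p in partitions]
print(merge(partials))  # Counter({'big': 3, 'data': 2, 'cluster': 1})
```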
Processing Workloads

• A processing workload in Big Data is defined as the amount and nature of data that is processed within a certain amount of time. Workloads are usually divided into two types:

• Batch

• Transactional
Batch

• Batch processing, also known as offline processing, involves processing data in batches and usually imposes delays, which in turn results in high-latency responses.

• Batch workloads typically involve large quantities of data with sequential reads/writes and comprise groups of read or write queries.

• Queries can be complex and involve multiple joins. Strategic BI and analytics are batch-oriented, as they are highly read-intensive.
Batch cont.

(Figure: a batch workload processing a large dataset offline.)
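A batch workload can be sketched as a job that reads a large dataset sequentially in fixed-size batches and aggregates it; the result is only available once the whole run completes, hence the high latency. The inline CSV and batch size are illustrative stand-ins for a large file.

```python
# Batch (offline) processing sketch: sequential reads, processed in
# fixed-size batches, with the aggregate available only at the end.
import csv
import io

raw = "region,amount\neast,10\nwest,5\neast,7\nwest,3\n"  # stand-in for a large file

totals = {}
batch, BATCH_SIZE = [], 2

def flush(batch, totals):
    # Apply one whole batch of rows to the running aggregate.
    for r in batch:
        totals[r["region"]] = totals.get(r["region"], 0) + int(r["amount"])
    batch.clear()

for row in csv.DictReader(io.StringIO(raw)):  # sequential read
    batch.append(row)
    if len(batch) == BATCH_SIZE:
        flush(batch, totals)
flush(batch, totals)                           # flush the final partial batch

print(totals)  # {'east': 17, 'west': 8}
```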
Transactional

• Transactional processing is also known as online processing.

• Transactional workload processing follows an approach whereby data is processed interactively without delay, resulting in low-latency responses.

• Transactional workloads involve small amounts of data with reads and writes.
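In contrast to the batch sketch, a transactional workload answers each request immediately, touching only a small amount of data per operation. Below, an in-memory dictionary stands in for the database; the account keys and balances are illustrative assumptions.

```python
# Transactional (online) processing sketch: each operation reads or
# writes a small amount of data and returns immediately (low latency).

store = {"acct:1": 500, "acct:2": 120}  # illustrative account balances

def read(key):
    return store[key]                    # small read, answered at once

def write(key, delta):
    store[key] += delta                  # small write, applied at once
    return store[key]

print(read("acct:1"))        # 500
print(write("acct:1", -50))  # 450
```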
Processing in Realtime Mode

• Realtime mode addresses the velocity characteristic of Big Data datasets.

• Within Big Data processing, realtime processing is also called event or stream processing, as the data arrives either continuously (stream) or at intervals (event).

• Another related term, interactive mode, falls within the category of realtime. Interactive mode generally refers to query processing in realtime. Operational BI/analytics are generally realtime-oriented.
Speed Consistency Volume (SCV)

• Speed – Speed refers to how quickly data can be processed once it is generated. In the case of realtime analytics, data is processed comparatively faster than with batch analytics. This generally excludes the time taken to capture data and focuses only on the actual data processing, such as generating statistics or executing an algorithm.

• Consistency – Consistency refers to the accuracy and precision of the results. Results are deemed accurate if they are close to the correct value and precise if they are close to each other. A more consistent system makes use of all available data, resulting in high accuracy and precision, as compared to a less consistent system that makes use of sampling techniques, which can result in lower accuracy and precision.
Speed Consistency Volume (SCV) cont.

• Volume – Volume refers to the amount of data that can be processed. Big Data's velocity characteristic results in fast-growing datasets, leading to huge volumes of data that need to be processed in a distributed manner. Processing such voluminous data in its entirety, while ensuring both speed and consistency, is not possible.
Speed Consistency Volume (SCV) cont.

• If speed (S) and consistency (C) are required, it is not possible to process high volumes of data (V), because large amounts of data slow down data processing.

• If consistency (C) and processing of high volumes of data (V) are required, it is not possible to process the data at high speed (S), as achieving high-speed data processing requires smaller data volumes.

• If high-volume (V) data processing coupled with high-speed (S) data processing is required, the processed results will not be consistent (C), since high-speed processing of large amounts of data involves sampling the data, which may reduce consistency.
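The speed-versus-consistency trade-off can be made concrete with a small sketch: computing a statistic over the full dataset uses all available data (consistent, slower), while sampling processes far less data (faster, but less accurate). The dataset, seed, and 1% sample rate are illustrative assumptions.

```python
# SCV trade-off sketch: sampling trades consistency for speed.
import random

random.seed(42)
data = [random.gauss(100, 15) for _ in range(100_000)]  # full volume

exact_mean = sum(data) / len(data)       # uses all data: consistent, slower
sample = random.sample(data, 1_000)      # 1% sample: faster, approximate
sample_mean = sum(sample) / len(sample)  # close to, but not equal to, exact

print(round(exact_mean, 2), round(sample_mean, 2))
```

The sample mean lands near the exact mean but deviates from it, which is precisely the loss of consistency the third bullet describes.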
Realtime Big Data Processing: Event Stream Processing

• During ESP, an incoming stream of events, generally from a single source and ordered by time, is continuously analyzed.

• Other (memory-resident) data sources can also be incorporated into the analysis to perform richer analytics.

• The processing results can be fed to a dashboard or can act as a trigger for another application to perform a preconfigured action or further analysis.

• ESP focuses more on speed than on complexity.
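The ESP pattern above can be sketched as a loop over a time-ordered stream from a single source, updating a simple analytic on every event and firing a preconfigured trigger when a condition is met. The readings and the threshold are illustrative assumptions.

```python
# Event stream processing sketch: analyze a time-ordered stream from a
# single source continuously; a running average is the analytic and a
# threshold crossing is the preconfigured trigger.

def process_stream(events, threshold=25.0):
    alerts, total = [], 0.0
    for i, reading in enumerate(events, start=1):
        total += reading
        running_avg = total / i          # updated on every incoming event
        if running_avg > threshold:      # trigger condition for another app
            alerts.append((i, round(running_avg, 1)))
    return alerts

stream = iter([10, 20, 40, 50, 30])      # stands in for a continuous feed
print(process_stream(stream))            # [(4, 30.0), (5, 30.0)]
```

Keeping the analytic this simple is deliberate: ESP favors per-event speed over analytical complexity.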
Complex Event Processing

• During CEP, a number of realtime events, often coming from disparate sources and arriving at different time intervals, are analyzed simultaneously for the detection of patterns and the initiation of action.

• CEP focuses more on complexity, providing rich analytics. However, as a result, speed of execution may be adversely affected.

• In general, CEP is a superset of ESP, and often the output of ESP results in the generation of synthetic events that can be fed into CEP.