Big Data Analytics Practical Guide

The document is a practical file for a Big Data Analytics course, detailing various aspects of Big Data, Hadoop architecture, and related tools. It covers concepts such as the characteristics of Big Data, types of data, and the implementation of MapReduce, alongside practical exercises on Hadoop ecosystem tools and NoSQL databases. Additionally, it discusses document similarity measures and nearest neighbor search algorithms.
Swarrnim Startup & Innovation University

Swarrnim School of Computing & IT

Practical File

Course Name: MCA

Subject Code: 16110301

Student Name: Brahmbhatt Dhruv Y.

Enrollment No: 2414607025

Subject Name: Big Data Analytics

Academic Year: 2025-2026

Faculty Sign:
1) Practical 1:- Study of Big Data Concepts :

Introduction

Big Data refers to extremely large and complex datasets that cannot be
processed efficiently using traditional data processing techniques. With the
growth of social media, IoT, mobile devices, and digital services, massive
amounts of data are generated every second. Big Data technologies help in
storing, processing, and analyzing this data to extract valuable insights.

1. Characteristics of Big Data (5 V’s)

1. Volume

● Refers to the huge amount of data generated.


● Data size ranges from terabytes to petabytes.
● Example: Facebook generates terabytes of data daily.

2. Velocity

● Speed at which data is generated, processed, and analyzed.


● Real-time or near real-time processing is required.
● Example: Stock market data, live streaming data.

3. Variety

● Different types and formats of data.


● Includes:
o Structured (tables, databases)
o Semi-structured (XML, JSON)
o Unstructured (images, videos, text)
● Example: Emails, tweets, images, sensor data.

4. Veracity

● Refers to the quality, accuracy, and reliability of data.


● Data may be noisy, incomplete, or inconsistent.
● Example: Fake reviews or incorrect sensor data.

5. Value
● Ability to extract meaningful insights from data.
● Data is useful only if it provides business or analytical value.
● Example: Customer behavior analysis for marketing.

2. Types of Big Data

1. Structured Data

● Data organized in rows and columns.


● Stored in relational databases.
● Easy to search and analyze.
● Example: Student records, bank transactions.

2. Semi-Structured Data

● Data does not follow a strict table structure.


● Uses tags or markers.
● Example: XML files, JSON data, emails.

3. Unstructured Data

● No predefined format.
● Difficult to process using traditional tools.
● Example: Images, videos, audio files, social media posts.

3. Traditional Data vs Big Data

Feature     | Traditional Data | Big Data
Data Size   | Small to medium  | Very large
Data Type   | Structured       | Structured, Semi-structured & Unstructured
Storage     | RDBMS            | HDFS, NoSQL
Processing  | Batch processing | Real-time & Batch
Scalability | Limited          | Highly scalable
Tools       | SQL, Excel       | Hadoop, Spark
Cost        | High             | Cost-effective (commodity hardware)
2) Practical 2 :- Study of Hadoop Architecture :
Introduction

Apache Hadoop is an open-source framework used for storing and processing
Big Data in a distributed computing environment. It is designed to run on
commodity hardware and provides high scalability, fault tolerance, and
reliability. Hadoop is widely used for processing large volumes of
structured and unstructured data.

1. Hadoop Distributed File System (HDFS)

● Storage layer of Hadoop.


● Stores large files by dividing them into blocks.
● Provides fault tolerance through data replication.

2. YARN (Yet Another Resource Negotiator)

● Resource management layer.


● Manages cluster resources like CPU and memory.
● Schedules and executes applications.

3. MapReduce

● Data processing layer.


● Uses Map and Reduce functions to process large datasets in parallel.
● Suitable for batch processing.

4. Hadoop Common

● Contains shared libraries and utilities.


● Supports all Hadoop modules.

2. HDFS Architecture

1. NameNode (Master Node)

● Manages metadata (file name, size, block location).


● Does not store actual data.
● Maintains file system namespace.
2. DataNode (Slave Node)

● Stores actual data blocks.


● Performs read/write operations.
● Sends heartbeat signals to NameNode.

3. Secondary NameNode

● Performs checkpointing.
● Helps in recovery by merging metadata.
● Not a backup of NameNode.

3. Advantages of Hadoop

1. Scalable – Can add nodes easily.


2. Cost-effective – Uses commodity hardware.
3. Fault tolerant – Data replication ensures reliability.
4. High throughput – Suitable for batch processing.
5. Handles all data types – Structured and unstructured data.
6. Open source – No licensing cost.

4. Limitations of Hadoop

1. Not suitable for real-time processing.


2. Complex setup and maintenance.
3. High latency due to batch processing.
4. NameNode is a single point of failure (in older versions).
5. Requires skilled professionals.
6. Inefficient for small datasets.

3) Practical 3 :- Study of Hadoop Ecosystem Tools :

Introduction

The Hadoop ecosystem consists of various tools that work together to provide
storage, processing, analysis, coordination, and management of Big Data. These
tools enhance Hadoop’s capability by supporting data querying, machine
learning, workflow scheduling, and data transfer.
1. Apache Hive

Description

● Data warehouse tool built on Hadoop.


● Provides SQL-like query language (HiveQL).
● Converts queries into MapReduce jobs.

Use Cases

● Data analysis and reporting.


● Querying large datasets stored in HDFS.
● Used by analysts familiar with SQL.
● Example: Sales data analysis.

2. Apache Pig

Description

● High-level scripting platform.


● Uses Pig Latin language.
● Simplifies complex data processing tasks.

Use Cases

● ETL (Extract, Transform, Load) operations.


● Data cleansing and transformation.
● Rapid prototyping of data pipelines.
● Example: Log file processing.

3. Apache HBase

Description

● Distributed, column-oriented NoSQL database.


● Built on top of HDFS.
● Supports real-time read/write access.
Use Cases

● Storing large sparse datasets.


● Real-time applications like messaging systems.
● Time-series data storage.
● Example: Facebook message storage.

4. Apache Sqoop

Description

● Tool for transferring data between Hadoop and RDBMS.


● Supports import and export operations.

Use Cases

● Importing data from MySQL/Oracle to HDFS.


● Exporting processed data back to databases.
● Data migration and backup.
● Example: Import customer data into Hadoop.

5. Apache Oozie

Description

● Workflow scheduling system for Hadoop jobs.


● Manages and coordinates Hadoop tasks.

Use Cases

● Automating Hadoop workflows.


● Scheduling MapReduce, Hive, Pig jobs.
● Managing complex data pipelines.
● Example: Nightly batch processing.
6. Apache Mahout

Description

● Machine learning library for Hadoop.


● Provides scalable algorithms.

Use Cases

● Recommendation systems.
● Clustering and classification.
● Collaborative filtering.
● Example: Product recommendation engine.

7. Apache ZooKeeper

Description

● Distributed coordination service.


● Manages configuration and synchronization.

Use Cases

● Maintaining configuration information.


● Leader election in distributed systems.
● Ensuring high availability.
● Example: Coordination of HBase services.

4) Practical 4 :- Implementation of MapReduce – Word Count Program :

Introduction

MapReduce is a programming model used in Hadoop for processing large


datasets in a
distributed and parallel manner. It divides the task into two main phases:

● Map Phase – Processes input data and generates key-value pairs.
● Reduce Phase – Aggregates and summarizes the output from the
mapper.
The Word Count program is the most basic and commonly used example to
understand MapReduce.
Algorithm (Word Count)
1. Read input text file from HDFS.
2. Mapper reads each line and splits it into words.
3. Mapper emits (word, 1) for each word.
4. Reducer receives (word, [ 1,1,1… ] ).
5. Reducer sums values and outputs (word, total count).

Flow Diagram: Word Count using MapReduce

Input Text File
       |
    Mapper
  (word, 1)
       |
Shuffle & Sort
       |
   Reducer
(word, count)
       |
  Output File
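The algorithm and flow above can be simulated with a short Python sketch (illustrative only; on a real cluster the mapper and reducer run as separate Hadoop tasks, for example via Hadoop Streaming):

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle & sort phase: group all emitted values by key.
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped

def reducer(word, counts):
    # Reduce phase: sum the list of 1s for each word.
    return (word, sum(counts))

lines = ["big data needs big tools", "hadoop processes big data"]
pairs = [pair for line in lines for pair in mapper(line)]
result = dict(reducer(w, c) for w, c in shuffle(pairs).items())
print(result)
# {'big': 3, 'data': 2, 'needs': 1, 'tools': 1, 'hadoop': 1, 'processes': 1}
```

Each phase here mirrors one stage of the flow diagram: the mapper emits (word, 1) pairs, the shuffle groups them by word, and the reducer sums each group.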

5) Practical 5 :- Study of NoSQL Databases:

Introduction

NoSQL (Not Only SQL) databases are designed to handle large-scale,
unstructured, and semi-structured data. Unlike traditional relational databases,
NoSQL provides high scalability, flexibility, and performance for Big Data
applications.

1. Types of NoSQL Databases

1. Key-Value Store

● Stores data as key-value pairs.


● Extremely fast for simple lookups.
● Examples: Redis, Riak, DynamoDB.
● Use Case: Caching, session management.
Example:
Key: user123
Value: {"name":"John", "age":25}

2. Document Store

● Stores data in documents (JSON, XML, BSON).


● Each document is self-describing and flexible.
● Examples: MongoDB, CouchDB.
● Use Case: Content management, user profiles, blogging platforms.

Example:

{
"id": "101",
"name": "Alice",
"skills": [ "Python", "Hadoop"]
}

3. Column-Family Store

● Stores data in columns instead of rows.


● Optimized for analytical queries and big data operations.
● Examples: Apache HBase, Cassandra.
● Use Case: Time-series data, event logging.

Example:

Row Key: user1


Columns: name=John, age=30, city=Delhi

4. Graph Database

● Represents data as nodes (entities) and edges (relationships).


● Efficient for relationship-heavy data.
● Examples: Neo4j, ArangoDB.
● Use Case: Social networks, recommendation systems, fraud detection.

Example:

Node: Alice
Node: Bob
Edge: Alice -> Friend -> Bob
2. Comparison of SQL vs NoSQL

Feature        | SQL (Relational DB)       | NoSQL (Non-relational DB)
Data Model     | Tables (Rows & Columns)   | Key-Value, Document, Column, Graph
Schema         | Fixed schema              | Dynamic / Flexible schema
Query Language | SQL                       | Proprietary / API-based
Scalability    | Vertical (scale-up)       | Horizontal (scale-out)
Transactions   | ACID compliant            | BASE compliant (eventual consistency)
Best For       | Structured data           | Unstructured / Semi-structured data
Examples       | MySQL, Oracle, PostgreSQL | MongoDB, Cassandra, Redis, Neo4j

6) Practical 6 :- Document Similarity using Distance Measures:

Introduction

Document similarity measures are used in text mining and information retrieval
to quantify how similar two documents are.

● Jaccard Distance compares the overlap of words between two


documents.
● Cosine Similarity measures the angle between document vectors in a
multi-dimensional space.

1. Jaccard Distance

Definition

Jaccard similarity coefficient measures similarity between two sets:

J(A, B) = |A ∩ B| / |A ∪ B|

Jaccard Distance:

D_J(A, B) = 1 − J(A, B)


Algorithm

1. Tokenize both documents into sets of words.


2. Find intersection and union of the two sets.
3. Calculate Jaccard similarity and distance.

Example

● Doc1 = {Big, Data, Hadoop, Spark}


● Doc2 = {Big, Data, NoSQL, Hive}

|A ∩ B| = 2, |A ∪ B| = 6

Jaccard Similarity = 2 / 6 = 0.33
Jaccard Distance = 1 − 0.33 = 0.67
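A short Python check of this calculation (the function and variable names are illustrative):

```python
def jaccard(doc1, doc2):
    # Jaccard similarity = |A ∩ B| / |A ∪ B|; distance = 1 − similarity.
    a, b = set(doc1), set(doc2)
    similarity = len(a & b) / len(a | b)
    return similarity, 1 - similarity

doc1 = {"Big", "Data", "Hadoop", "Spark"}
doc2 = {"Big", "Data", "NoSQL", "Hive"}
sim, dist = jaccard(doc1, doc2)
print(round(sim, 2), round(dist, 2))  # 0.33 0.67
```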

2. Cosine Similarity

Definition

Cosine similarity measures the cosine of the angle between two document
vectors:

Cosine Similarity = cos(θ) = (A · B) / (||A|| × ||B||)

● Values range from 0 (no similarity) to 1 (identical).

Algorithm

1. Represent each document as a vector of term frequencies.


2. Compute dot product of the vectors.
3. Divide by the product of their magnitudes.

Example

● Vocabulary: (Big, Data, Hadoop, Spark, NoSQL, Hive)
● Doc1 vector = [1, 1, 1, 1, 0, 0]
● Doc2 vector = [1, 1, 0, 0, 1, 1]

Dot Product = 1 + 1 + 0 + 0 + 0 + 0 = 2
||A|| = √(1 + 1 + 1 + 1 + 0 + 0) = 2
||B|| = √(1 + 1 + 0 + 0 + 1 + 1) = 2
Cosine Similarity = 2 / (2 × 2) = 0.5
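As a cross-check, cosine similarity can be computed in Python over the full six-word vocabulary of the two documents (Big, Data, Hadoop, Spark, NoSQL, Hive):

```python
import math

def cosine_similarity(a, b):
    # cos(θ) = (A · B) / (||A|| × ||B||)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Term-frequency vectors over (Big, Data, Hadoop, Spark, NoSQL, Hive)
doc1 = [1, 1, 1, 1, 0, 0]
doc2 = [1, 1, 0, 0, 1, 1]
print(cosine_similarity(doc1, doc2))  # 0.5
```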

3. Application in Document Comparison

● Plagiarism Detection: Identify copied or similar text.


● Search Engines: Retrieve documents most similar to a query.
● Recommender Systems: Suggest articles or papers based on similarity.
● Text Clustering: Group similar documents together.

7) Practical 7 :- Nearest Neighbor Search :

Introduction

The Nearest Neighbor (NN) algorithm is a distance-based method used to find
the closest data point(s) to a given query point.
It is widely used in pattern recognition, recommendation systems, and
classification tasks.

1. Algorithm: Basic Nearest Neighbor

Steps

1. Prepare a dataset of points (users, items, or feature vectors).


2. Choose a query point for which you want to find the nearest neighbor(s).
3. Compute the distance between the query point and all points in the
dataset.
4. Select the point(s) with the smallest distance.

Distance Formula (Euclidean Distance)

For two points P = (p1, p2, ..., pn) and Q = (q1, q2, ..., qn):

d(P, Q) = √( (p1 − q1)² + (p2 − q2)² + ... + (pn − qn)² )
2. Example

Dataset (2D points)

Point | x | y
A     | 2 | 3
B     | 5 | 4
C     | 1 | 2

Query Point

● Q = (3, 3)

Distances

● d(Q, A) = √((3−2)² + (3−3)²) = √1 = 1
● d(Q, B) = √((3−5)² + (3−4)²) = √5 ≈ 2.236
● d(Q, C) = √((3−1)² + (3−2)²) = √5 ≈ 2.236

Nearest Neighbor: A (distance = 1)
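The same search can be sketched in Python (a brute-force scan, matching the basic algorithm above; the point labels are taken from the example dataset):

```python
import math

def nearest_neighbor(query, points):
    # Return the label of the point with the smallest Euclidean
    # distance to the query point (brute-force scan of all points).
    def dist(p):
        return math.sqrt(sum((qi - pi) ** 2 for qi, pi in zip(query, p)))
    return min(points, key=lambda label: dist(points[label]))

points = {"A": (2, 3), "B": (5, 4), "C": (1, 2)}
print(nearest_neighbor((3, 3), points))  # A
```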

3. Applications in Recommendation Systems

1. User-based Collaborative Filtering
a. Recommends items based on similar users.
b. Find nearest neighbors (users) with similar preferences.
2. Item-based Collaborative Filtering
a. Recommends items similar to the item a user likes.
b. Uses nearest neighbor search on item feature vectors.
3. Content Recommendation
a. Suggest movies, products, or articles based on similarity with previous items.
4. Real-time Applications
a. Music recommendations (Spotify), product suggestions (Amazon), social media friend suggestions.

4. Advantages
● Simple and intuitive.
● Works well with small datasets.
● Can be used for both classification and recommendation.

5. Limitations

● Inefficient for very large datasets (requires computing distance to all


points).
● Sensitive to irrelevant features (feature scaling needed).
● Performance decreases in high-dimensional spaces (curse of
dimensionality).

8) Practical 8 : - Stream Data Analysis :

Introduction

A data stream is a continuous flow of data generated over time, often too
large to store entirely in memory.
Stream data analysis is used in real-time applications like network monitoring,
sensor data processing, financial transactions, and social media analytics.

1. Data Stream Model

Key Concepts

1. Stream Elements: Individual data points in the stream (e.g., sensor


readings, tweets).
2. Stream Processing: Analyze elements as they arrive.
3. Sliding Window: Focus on the most recent subset of the stream.
4. Single-pass algorithms: Cannot store the entire stream, so
approximate methods are used.

2. Counting Distinct Elements

Problem

Given a data stream, count the number of distinct elements efficiently without
storing all elements.

Example Stream: [A, B, A, C, B, D]
Distinct Elements: A, B, C, D → Count = 4
Naive Approach

● Store all elements in a set and count its size.


● Works for small streams, but memory intensive for large streams.

Optimized Approach

● Use Hashing / Probabilistic Algorithms like HyperLogLog or Flajolet-Martin
for large streams.
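Both approaches can be sketched in Python. The single-hash Flajolet-Martin sketch below is illustrative only: one hash function gives a very rough estimate, and production systems such as HyperLogLog average many hashes to reduce the error.

```python
import hashlib

def trailing_zeros(n):
    # Count trailing zero bits in n (convention: 0 maps to 0 here).
    if n == 0:
        return 0
    count = 0
    while n & 1 == 0:
        n >>= 1
        count += 1
    return count

def fm_estimate(stream):
    # Flajolet-Martin sketch: track the maximum number of trailing
    # zero bits R over all element hashes; estimate distinct count as 2^R.
    max_r = 0
    for item in stream:
        h = int(hashlib.md5(item.encode()).hexdigest(), 16)
        max_r = max(max_r, trailing_zeros(h))
    return 2 ** max_r

stream = ["A", "B", "A", "C", "B", "D"]
print(len(set(stream)))     # naive exact answer: 4
print(fm_estimate(stream))  # rough probabilistic estimate (a power of 2)
```

The naive set-based count is exact but stores every distinct element; the sketch uses constant memory regardless of stream length.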

3. Applications of Counting Distinct Elements in Data Streams

1. Network Traffic Monitoring: Count unique IP addresses.
2. Social Media Analytics: Count unique hashtags or users in real-time.
3. Recommendation Systems: Count unique items clicked by users.
4. Fraud Detection: Detect unusual activity by counting distinct transactions.

4. Advantages of Stream Data Analysis

● Enables real- time analytics.


● Works with large-scale, continuous data.
● Reduces memory usage by approximate algorithms.

5. Limitations

● Cannot store all historical data.


● Accuracy depends on approximation algorithms.
● Processing high-velocity streams can be challenging.

9) Practical 9 :- PageRank Algorithm :

Introduction

PageRank is an algorithm used by Google Search to rank web pages based on


their importance.
It measures the influence of a web page based on the number and quality
of links pointing to it.
Key Idea:

● A page is important if many important pages link to it.


● Each page distributes its rank evenly among its outgoing links.

PageRank is widely used in:

● Search engine ranking


● Social network analysis
● Citation analysis

1. PageRank Formula

The PageRank of page P is calculated as:

PR(P) = (1 − d)/N + d × Σ [ PR(i) / L(i) ],  summed over all pages i in M(P)

Where:

● PR(P) = PageRank of page P
● d = damping factor (usually 0.85)
● N = total number of pages
● M(P) = set of pages linking to P
● L(i) = number of outbound links from page i

2. Example: Simple Web Graph

Graph Structure

● Pages: A, B, C, D
● Links:
o A → B, C
o B → C
o C → A
o D → C

Step 1: Initialize PageRank

● Total pages = 4
● Initial PR for each page = 1 / 4 = 0.25
Step 2: Iterative Calculation

Using damping factor d = 0.85:

Iteration 1:

PR(A) = (1 − 0.85)/4 + 0.85 × (PR(C)/1) = 0.0375 + 0.85 × 0.25 = 0.25
PR(B) = 0.0375 + 0.85 × (PR(A)/2) = 0.0375 + 0.85 × 0.125 = 0.14375
PR(C) = 0.0375 + 0.85 × (PR(A)/2 + PR(B)/1 + PR(D)/1)
      = 0.0375 + 0.85 × (0.125 + 0.25 + 0.25) = 0.0375 + 0.85 × 0.625 = 0.56875
PR(D) = 0.0375 + 0.85 × 0 = 0.0375

Iteration 2:

● Repeat until PageRank values converge.
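The full iteration can be sketched in Python for the four-page graph (using the (1 − d)/N form of the update, as in the hand calculation; 50 iterations is more than enough for this small graph to converge):

```python
def pagerank(links, d=0.85, iterations=50):
    # links maps each page to the list of pages it links to.
    pages = list(links)
    n = len(pages)
    pr = {p: 1 / n for p in pages}  # initialize every page to 1/N
    for _ in range(iterations):
        new = {}
        for p in pages:
            # Sum the rank passed to p by every page i that links to it,
            # each contributing PR(i) divided by its outbound link count.
            incoming = sum(pr[i] / len(links[i])
                           for i in pages if p in links[i])
            new[p] = (1 - d) / n + d * incoming
        pr = new
    return pr

links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

C ends up with the highest rank, as expected: it receives links from A, B, and D, while D receives none and keeps only the (1 − d)/N baseline.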

3. Applications of PageRank

1. Search Engines: Rank web pages by importance.


2. Social Networks: Identify influential users.
3. Recommender Systems: Suggest items or content based on link analysis.
4. Scientific Citations: Rank research papers by citation importance.

10) Practical 10 :- Frequent Itemset Mining :

Introduction

Frequent Itemset Mining is a technique in data mining to find commonly


occurring sets of items in transaction databases.
Market Basket Analysis helps retailers understand customer buying patterns.

Applications:

● Product recommendations
● Store layout optimization
● Cross-selling strategies

1. Simple Dataset

Transaction ID | Items Purchased
1              | Bread, Milk
2              | Bread, Diaper, Beer, Eggs
3              | Milk, Diaper, Beer, Cola
4              | Bread, Milk, Diaper, Beer
5              | Bread, Milk, Diaper, Cola

2. Algorithm: Apriori (Simplified)

1. Identify all items in transactions.


2. Generate candidate itemsets (1-item, 2-item, 3-item...).
3. Count support (frequency) of each itemset.
4. Keep only itemsets whose support ≥ minimum support threshold.
5. Repeat for larger itemsets until no frequent itemsets are found.

3. Manual Calculation (Example)

Step 1: Count 1-itemsets (support ≥ 3)

Item   | Count
Bread  | 4
Milk   | 4
Diaper | 4
Beer   | 3
Cola   | 2
Eggs   | 1

Frequent 1-itemsets: Bread, Milk, Diaper, Beer

Step 2: Count 2-itemsets

Itemset       | Count
Bread, Milk   | 3
Bread, Diaper | 3
Bread, Beer   | 2
Milk, Diaper  | 3
Milk, Beer    | 2
Diaper, Beer  | 3

Frequent 2-itemsets: Bread+Milk, Bread+Diaper, Milk+Diaper, Diaper+Beer


Step 3: Count 3-itemsets

Itemset             | Count
Bread, Milk, Diaper | 2
Milk, Diaper, Beer  | 2
Bread, Diaper, Beer | 2

Frequent 3-itemsets: None (threshold = 3)
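The manual counting above can be reproduced with a short brute-force miner (a sketch: it counts every candidate k-itemset directly, whereas real Apriori additionally prunes candidates using the frequent (k−1)-itemsets):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    # Count the support of every candidate k-itemset and keep those
    # meeting the minimum support; stop when a level yields nothing
    # (frequent itemsets are downward closed, so this is safe).
    items = sorted({i for t in transactions for i in t})
    result = {}
    k = 1
    while True:
        frequent = {}
        for cand in combinations(items, k):
            support = sum(1 for t in transactions if set(cand) <= t)
            if support >= min_support:
                frequent[cand] = support
        if not frequent:
            break
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Cola"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Cola"},
]
for itemset, support in sorted(frequent_itemsets(transactions, 3).items()):
    print(itemset, support)
```

With a support threshold of 3, the output contains the four frequent 1-itemsets and four frequent 2-itemsets, and no 3-itemsets, matching the hand calculation.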

4. Applications of Frequent Itemset Mining

1. Product Recommendations: Suggest products frequently bought together.
2. Store Layout Planning: Place items together to increase sales.
3. Promotions: Bundle frequently purchased items.
4. Inventory Management: Predict high-demand item combinations.
