UNIT-2
Data science
- It extracts knowledge and insights from structured, semi-structured, and unstructured data.
- Data science is much more than simply analyzing data.
- It offers a range of roles and requires a range of skills.
Data
- Representation of facts or concepts in a form suitable for communication,
interpretation, or processing by humans or electronic machines.
- unprocessed facts and figures.
- Represented with the help of characters such as alphabets, digits, or special characters.
Information
- Processed data on which decisions and actions are based.
- Data that has been processed into a form that is meaningful to the recipient
- It is Interpreted data; created from organized, structured, and processed data
Data processing cycle
- It is the re-structuring or re-ordering of data by people or machines to increase its
usefulness.
- It consists of the following basic steps.
Input: The input data is recorded on a hard disk, CD, flash disk, and so on.
Processing: The input data is changed into a more useful form.
Output: The result of the processing step is collected.
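The three steps above can be sketched in a few lines of Python; the scores and variable names are made-up illustrations, not part of the course material:

```python
# A minimal sketch of the data processing cycle: input -> processing -> output.

raw_scores = ["72", "85", "90"]          # Input: unprocessed facts and figures

numbers = [int(s) for s in raw_scores]   # Processing: convert to a more useful form
average = sum(numbers) / len(numbers)    # Processing: derive information from data

print(f"Average score: {average:.2f}")   # Output: result of processing is collected
```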
Data types and their representation
1. Data types from a computer programming perspective
Common data types include:
• Integers (int) - store whole numbers
• Booleans (bool) - store true or false
• Characters (char) - store a single character
• Floating-point numbers (float) - store real numbers
• Alphanumeric strings - store a combination of characters and numbers
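These data types can be illustrated with simple Python values (the variable names and values are made up for illustration; note that Python has no separate char type, so a one-character string stands in for it):

```python
# Illustrative values for the common programming data types listed above.
count = 42                 # integer (int): whole number
is_valid = True            # boolean (bool): true or false
grade = "A"                # character: a single-character string in Python
price = 19.99              # floating-point number (float): real number
user_id = "user123"        # alphanumeric string: characters and numbers

for value in (count, is_valid, grade, price, user_id):
    print(type(value).__name__, value)
```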
2. Data types from a data analytics perspective:
- There are three common data types or structures:
I. Structured data: conforms to a tabular format with relationships between the different rows
and columns.
Eg: Excel files or SQL tables.
II. Semi-structured data: does not conform to the formal structure of data models,
- but nonetheless contains tags or other markers to separate semantic elements.
Eg: JSON and XML
III. Unstructured data
- Either does not have a predefined data model or is not organized in a pre-defined manner.
- Typically text-heavy, but may contain data such as dates, numbers, and facts, which results in
irregularities and ambiguities.
Eg: audio files, video files, or NoSQL databases
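A short sketch of handling semi-structured data: JSON has no fixed table schema, but its keys act as the tags that separate semantic elements. The record and field names below are invented for illustration:

```python
import json

# A semi-structured record: no fixed schema, but keys mark the semantic parts.
record = '{"name": "Abebe", "age": 30, "skills": ["Python", "SQL"]}'

data = json.loads(record)      # parse the JSON text into a Python dict

print(data["name"])            # access elements by their semantic tag
print(len(data["skills"]))     # nested structure is allowed, unlike a flat table
```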
Metadata - data about data
- It provides additional information about a specific set of data.
- For example, it provides fields for dates and locations which, by themselves, can be considered
structured data.
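A small sketch of the idea: for a photo, the image bytes are the data, while the descriptive fields (dates, locations, size) are metadata, and those fields are themselves structured. All values below are hypothetical:

```python
# Metadata sketch: fields describing a (hypothetical) photo file.
photo_metadata = {
    "filename": "vacation.jpg",
    "date_taken": "2023-06-15",   # a date field: structured on its own
    "location": "Addis Ababa",    # a location field: structured on its own
    "size_bytes": 2_048_576,
}

for field, value in photo_metadata.items():
    print(f"{field}: {value}")
```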
Data value chain
- Describes the information flow within a big data system as a series of steps needed to
generate value and useful insights from data.
1. Data Acquisition
- The process of gathering, filtering, and cleaning data before it is put in a data warehouse.
- Challenges include the infrastructure requirements for high transaction volumes.
2. Data Analysis
- making the raw data useful for decision making.
- Involves exploring, transforming, and modeling data with the goal of highlighting relevant data.
3. Data Curation
- managing data throughout its lifecycle to ensure quality and usability.
- Activities include content creation, selection, classification, validation.
4. Data Storage
- Managing data in a scalable way.
- RDBMSs may not handle big data efficiently: their ACID (Atomicity, Consistency, Isolation, and
Durability) properties, which guarantee database transactions, lack flexibility with regard to
schema changes.
- NoSQL technologies have been designed with scalability as a goal.
5. Data Usage: applying data analysis to business activities to improve performance, reduce costs,
and enhance value.
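The acquisition step of the value chain (gather, filter, clean) can be sketched as follows; the sensor records and field names are made up for illustration:

```python
# Sketch of data acquisition: filter and clean raw records before storage.
raw_records = [
    {"id": 1, "temp": "21.5"},
    {"id": 2, "temp": ""},        # missing reading: filtered out below
    {"id": 3, "temp": "19.0"},
]

cleaned = [
    {"id": r["id"], "temp": float(r["temp"])}   # clean: cast text to a number
    for r in raw_records
    if r["temp"]                                # filter: drop empty readings
]

print(cleaned)
```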
Basic concepts of big data
- Big data is characterized by the 3Vs (and more):
• Volume: large amounts of data (massive datasets, measured in zettabytes)
• Velocity: Data is live streaming or in motion
• Variety: data comes in many different forms from diverse sources
• Veracity: can we trust the data? How accurate is it?
Clustered computing
- Combines multiple computers to handle large data volumes and computational tasks.
Benefits:
- Resource Pooling: combines available storage space, CPU, etc.
- High Availability: fault tolerance and availability
- Easy Scalability: easy expansion as resource requirements grow
- A good example of clustering software is Hadoop's YARN.
Hadoop and its ecosystem
- four core components: data management, access, processing, and storage
Big Data Life Cycle with Hadoop
1. Ingesting data into the system
- Sqoop transfers data from RDBMS to HDFS, whereas Flume transfers event data.
2. Processing the data in storage
- In this stage, the data is stored and processed.
- Data is stored in HDFS and in the NoSQL distributed database, HBase.
- Spark and MapReduce perform data processing.
3. Computing and analyzing data
- the data is analyzed by processing frameworks such as Pig, Hive, and Impala.
- Pig converts the data using map and reduce operations and then analyzes it.
4. Visualizing the results
- Performed by tools such as Hue and Cloudera Search.
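The map-and-reduce pattern that MapReduce, Pig, and Hive build on can be sketched in plain Python as a word count; the input lines are invented for illustration, and a real cluster would run the map and reduce phases in parallel across many machines:

```python
from collections import defaultdict

lines = ["big data big insights", "data value chain"]

# Map: emit a (word, 1) pair for every word in every input line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle: group the pairs by key (the word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: sum the counts for each word.
word_counts = {word: sum(counts) for word, counts in groups.items()}

print(word_counts)
```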