Unit: I
Introduction of Analytics
Content
◆ Data and Data Science
◆ Data analytics and data analysis
◆ Classification of Analytics
◆ Application of analytics in business
◆ Types of data
● Nominal
● Ordinal
● Scale
◆ Big Data and Its Characteristics
◆ Applications of Big data
◆ Challenges in data analysis
CWB: Class With Brother
Data and Data Science
❖Data
➢ Data refers to a collection of discrete, objective facts, figures, and
statistics that are recorded, stored, and processed to represent
information. Data can be qualitative or quantitative, structured or
unstructured, and can be collected from various sources, including
sensors, surveys, transactions, and observations.
➢Characteristics of Data
■ Facts and Figures: Data consists of facts and figures that
are objective and verifiable.
■ Recorded and Stored: Data is recorded and stored in a
digital or physical format.
■ Processed and Analyzed: Data is processed and analyzed to
extract insights and meaning.
■ Qualitative and Quantitative: Data can be qualitative (text,
images, audio) or quantitative (numbers, amounts).
■ Structured and Unstructured: Data can be structured
(organized, formatted) or unstructured (unorganized,
unformatted).
➢Importance of Data
■ Decision-Making: Data informs decision-making in various
fields, including business, healthcare, education, and
government.
■ Insights and Patterns: Data analysis reveals insights and
patterns that can improve processes, products, and services.
■ Innovation: Data drives innovation, enabling the development
of new products, services, and technologies.
■ Competitive Advantage: Data provides a competitive
advantage, enabling organizations to outperform their peers.
➢Limitations of Data
■ Accuracy: Data may be inaccurate or incomplete.
■ Bias: Data can be biased due to sampling errors or cultural
influences.
■ Quality: Poor data quality can lead to incorrect insights and
decisions.
■ Context: Data on its own often lacks context, making it difficult
to understand the underlying circumstances.
■ Interpretation: Data requires interpretation, which can be
subjective and influenced by personal opinions.
❖Data Science
➢ Data Science is a multidisciplinary field that extracts insights
and knowledge from structured and unstructured data using various
techniques, such as machine learning, statistics, and data
visualization.
➢Characteristics of Data Science
■ Interdisciplinary: Combines statistics, computer science, and
domain-specific knowledge.
■ Data-driven: Extracts insights from data to inform decisions.
■ Analytical: Uses statistical and machine learning techniques
to analyze data.
■ Computational: Relies on computational power and
algorithms to process data.
■ Iterative: Involves iterative cycles of data collection, analysis,
and interpretation.
➢Importance of Data Science
■ Informed Decision-Making: Enables data-driven
decision-making.
■ Innovation: Drives innovation in various industries.
■ Competitive Advantage: Provides a competitive edge in
business.
■ Social Impact: Can address social and environmental
challenges.
■ Economic Growth: Contributes to economic growth and
development.
➢Limitations of Data Science
■ Data Quality: Poor data quality can lead to inaccurate
insights.
■ Bias: Models can perpetuate existing biases if not addressed.
■ Interpretability: Complex models can be difficult to interpret.
■ Overfitting: Models can become overly complex and fail to
generalize.
■ Ethics: Raises ethical concerns, such as privacy and fairness.
Data Analysis and Analytics
★Data Analysis
○ Data analysis is the process of extracting insights and meaningful
information from data, typically using statistical and analytical
techniques. It involves examining, transforming, and modeling data to
uncover patterns, trends, and correlations.
○ Descriptive Statistics: Data analysis uses descriptive statistics to
summarize and describe data, such as means, medians, and
standard deviations.
○ Data Visualization: Data analysis often employs data visualization
techniques to communicate insights and findings, such as charts,
graphs, and tables.
○ Hypothesis Testing: Data analysis involves hypothesis testing to
validate assumptions and conclusions, using statistical methods to
determine significance and confidence intervals.
○ Insight Generation: Data analysis aims to generate actionable
insights that inform business decisions, optimize processes, and
solve problems.
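○ Illustration: The descriptive statistics mentioned above can be sketched in a few lines of Python using the standard `statistics` module. The sales figures and variable names are hypothetical and serve only to show the calculation.

```python
import statistics

# Hypothetical daily sales figures (illustrative data only).
daily_sales = [120, 135, 150, 110, 160, 145, 130]

mean_sales = statistics.mean(daily_sales)      # arithmetic average
median_sales = statistics.median(daily_sales)  # middle value once sorted
stdev_sales = statistics.stdev(daily_sales)    # sample standard deviation

print(f"mean={mean_sales:.2f}, median={median_sales}, stdev={stdev_sales:.2f}")
```

The mean and median summarize the typical value, while the standard deviation describes how widely the values spread around the mean.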
★Data Analytics
○ Data analytics is the process of examining data sets to draw
conclusions about the information they contain, using a combination
of statistical analysis, machine learning, and data visualization
techniques. It involves using data to drive business decisions,
optimize processes, and predict future outcomes.
○ Predictive Modeling: Data analytics uses predictive modeling
techniques, such as regression and decision trees, to forecast future
outcomes and identify potential risks.
○ Machine Learning: Data analytics employs machine learning
algorithms to identify patterns and relationships in data, and to make
predictions or recommendations.
○ Data Mining: Data analytics involves data mining techniques to
discover hidden patterns and relationships in large datasets.
○ Business Decision-Making: Data analytics provides insights and
recommendations to inform business decisions, drive strategy, and
optimize operations.
Classification of Analytics
❖ Analytics can be classified into four main categories based on their
functionality and purpose. This classification helps organizations understand
the different types of analytics and how they can be applied to drive
business decisions.
❖Descriptive Analytics (What Happened)
➢ Meaning:- Descriptive analytics focuses on analyzing historical data
to understand what happened. It provides insights into past
performance, trends, and patterns.
➢ Example:
■ Analyzing customer purchase data to identify top-selling
products.
➢Characteristics of Descriptive Analytics
■ Data Aggregation: Descriptive analytics involves aggregating
data from various sources to provide a comprehensive view.
■ Data Visualization: It uses data visualization techniques to
present complex data in a clear and concise manner.
■ Key Performance Indicators (KPIs): Descriptive analytics
helps track KPIs to measure performance and identify areas
for improvement.
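➢ Illustration: A minimal sketch of the top-selling-products example, assuming purchase records are available as simple tuples. The records and field layout are hypothetical; real data would typically come from a database or file.

```python
from collections import Counter

# Hypothetical purchase records: (customer_id, product, quantity).
purchases = [
    ("c1", "laptop", 1),
    ("c2", "phone", 2),
    ("c3", "phone", 1),
    ("c1", "headphones", 3),
    ("c4", "laptop", 1),
    ("c5", "phone", 1),
]

# Aggregate units sold per product (data aggregation).
units_sold = Counter()
for _customer, product, quantity in purchases:
    units_sold[product] += quantity

# The two best-selling products by units sold.
top_products = units_sold.most_common(2)
print(top_products)
```

Aggregating first and ranking second mirrors how descriptive analytics turns raw transactions into a summary of what happened.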
❖Diagnostic Analytics (Why did it Happen)
➢ Meaning: Diagnostic analytics examines data to determine why
something happened. It uses techniques such as data mining,
statistical analysis, and data visualization to identify causes and
relationships.
➢ Example:
■ Investigating why sales of a particular product decreased by
20% in the last quarter.
➢Characteristics of Diagnostic Analytics
■ Root Cause Analysis: Diagnostic analytics helps identify the
root cause of problems or issues.
■ Correlation Analysis: It examines relationships between
variables to understand how they impact each other.
■ Drill-Down Analysis: Diagnostic analytics involves drilling
down into data to examine specific details and patterns.
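➢ Illustration: Correlation analysis can be sketched with the Pearson correlation coefficient. The function name `pearson_correlation` and the price and sales figures below are hypothetical, chosen only to show how a relationship between two variables might be quantified.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: product price and units sold over five periods.
prices = [10, 12, 14, 16, 18]
units = [200, 180, 150, 130, 100]

r = pearson_correlation(prices, units)
print(f"correlation between price and units sold: {r:.3f}")
```

A value close to -1 suggests that sales fell as price rose. Correlation alone does not prove that the price change caused the drop, so it is a starting point for root cause analysis rather than its conclusion.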
❖Predictive Analytics (What Is Likely to Happen)
➢ Meaning: Predictive analytics uses statistical models, machine
learning algorithms, and data mining techniques to forecast what may
happen in the future. It helps organizations anticipate and prepare for
potential outcomes.
➢ Example:
■ Forecasting future sales of a new product based on historical
data, market trends, and customer behavior.
➢Characteristics of Predictive Analytics
■ Statistical Modeling: Predictive analytics uses statistical
models to identify patterns and trends in data.
■ Machine Learning Algorithms: It employs machine learning
algorithms to analyze data and make predictions.
■ Forecasting: Predictive analytics helps forecast future events
or outcomes based on historical data and trends.
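➢ Illustration: One simple form of statistical forecasting is a straight-line trend fitted by ordinary least squares. The sketch below assumes hypothetical monthly sales and a helper named `fit_line`; real forecasts usually involve more data and more careful model checking.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical historical data: month number and units sold.
months = [1, 2, 3, 4, 5]
sales = [100, 110, 120, 130, 140]

slope, intercept = fit_line(months, sales)

# Extrapolate the fitted trend one month ahead.
forecast_month_6 = slope * 6 + intercept
print(f"forecast for month 6: {forecast_month_6:.1f}")
```

The fitted line captures the historical trend, and the forecast simply extends it, which is only reasonable if the trend is expected to continue.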
❖Prescriptive Analytics (What Should Be Done)
➢ Meaning: Prescriptive analytics provides recommendations on what
actions to take to achieve a desired outcome. It uses optimization
techniques, simulation modeling, and decision analysis to identify the
best course of action.
➢ Example:
■ Recommending the optimal price for a new product to
maximize profits based on predicted demand, production
costs, and market conditions.
➢ Characteristics of Prescriptive Analytics
■ Optimization Techniques: Prescriptive analytics uses
optimization techniques to identify the most effective solutions.
■ Simulation Modeling: It employs simulation modeling to test
different scenarios and predict outcomes.
■ Decision Analysis: Prescriptive analytics helps analyze
different options and choose the best course of action.
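➢ Illustration: The pricing example can be sketched as a small optimization over candidate prices. The linear demand curve, its parameters, and the function names are assumptions made for illustration, not a recommended pricing model.

```python
def expected_profit(price, unit_cost, base_demand, demand_slope):
    """Profit under an assumed linear demand curve: demand = base - slope * price."""
    demand = max(0.0, base_demand - demand_slope * price)
    return (price - unit_cost) * demand

def best_price(candidates, unit_cost, base_demand, demand_slope):
    """Return the candidate price with the highest expected profit."""
    return max(
        candidates,
        key=lambda p: expected_profit(p, unit_cost, base_demand, demand_slope),
    )

# Hypothetical inputs: candidate prices, production cost, and demand parameters.
candidate_prices = [10, 15, 20, 25, 30, 35, 40]
optimal = best_price(candidate_prices, unit_cost=10, base_demand=1000, demand_slope=20)

print(f"recommended price: {optimal}")
```

Here the predicted demand comes from an assumed formula; in practice it would come from a predictive model, and the prescriptive step would choose the action that maximizes the objective.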
Application of Analytics in Business
★1. Customer Analytics
○ Definition: Analyzing customer data to understand behavior,
preferences, and needs.
○ Example: A retail company uses customer analytics to identify
high-value customers and offer personalized promotions.
★2. Financial Analytics
○ Definition: Analyzing financial data to inform investment, risk, and
financial planning decisions.
○ Example: A bank uses financial analytics to predict credit risk and
optimize loan portfolios.
★3. Business Intelligence
○ Definition: Analyzing business data to inform strategic decisions and
drive business growth.
○ Example: A company uses business intelligence to track key
performance indicators (KPIs) and adjust business strategies.
★4. Operational Analytics
○ Definition: Analyzing operational data to optimize processes and
improve efficiency.
○ Example: A manufacturing company uses operational analytics to
predict equipment failures and schedule maintenance.
★5. Risk Analytics
○ Definition: Analyzing data to identify, assess, and mitigate business
risks.
○ Example: A financial institution uses risk analytics to detect and
prevent fraudulent transactions.
★6. Marketing Analytics
○ Definition: Analyzing marketing data to measure campaign
effectiveness and optimize marketing strategies.
○ Example: An e-commerce company uses marketing analytics to
track website traffic and optimize ad spend.
Types of Data
❖ Data is classified into three categories based on its level of
measurement and structure.
❖Nominal Data (Categorical, No Order)
➢ Nominal data is used for labeling or categorizing variables without
implying any sort of order. This type of data is often used for
classification purposes.
➢ Example:-
■ Classifying customers as "New" or "Existing", or categorizing
products as "Electronics", "Clothing", or "Home Goods".
➢Characteristics
■ Categorical: Data is grouped into categories.
■ No order: Categories have no order or ranking.
■ Labels: Values are names or labels.
➢Limitations
■ Non-mathematical: Can't do mathematical operations like
addition or subtraction.
■ Limited analysis: Only shows frequency and percentage.
■ No comparison: Can't compare or rank categories.
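➢ Illustration: Because nominal values are only labels, the usual summaries are counts, percentages, and the most frequent category. A minimal sketch with hypothetical product categories:

```python
from collections import Counter

# Hypothetical product categories for six orders (nominal data).
categories = [
    "Electronics", "Clothing", "Electronics",
    "Home Goods", "Electronics", "Clothing",
]

counts = Counter(categories)
total = len(categories)

# Share of each category as a percentage of all orders.
percentages = {name: 100 * n / total for name, n in counts.items()}

# The most frequent category (the mode).
mode_category = counts.most_common(1)[0][0]

for name, n in counts.items():
    print(f"{name}: {n} orders ({percentages[name]:.1f}%)")
```

Averaging these labels would be meaningless, which is why the analysis stays at the level of frequencies.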
❖Ordinal Data (Ordered Categories)
➢ Ordinal data represents a ranked series of values where the order is
significant, but the intervals between the ranks may not be equal.
This type of data is often used for rating or ranking purposes.
➢ Example:-
■ Rating a restaurant as "Good", "Better", or "Best", or ranking
employees as "Junior", "Senior", or "Executive".
➢Characteristics
■ Categorical: Data is grouped into categories.
■ Order exists: Categories have a natural order or ranking.
■ Ranking possible: Can rank or compare categories.
➢Limitations
■ Intervals not equal: Differences between categories are not
equal.
■ Limited mathematics: Can't do advanced mathematical
operations.
■ Risk of assuming consistent differences: Treating ranks as
numbers wrongly assumes that differences between categories are consistent.
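➢ Illustration: Ordinal data supports sorting and finding a middle (median) category once the order is stated explicitly. The mapping `RATING_ORDER` and the ratings below are hypothetical.

```python
# Explicit rank order for the ordinal scale (the numbers encode order only).
RATING_ORDER = {"Good": 1, "Better": 2, "Best": 3}

# Hypothetical restaurant ratings.
ratings = ["Better", "Good", "Best", "Good", "Better", "Better", "Best"]

# Sorting by rank is meaningful for ordinal data.
sorted_ratings = sorted(ratings, key=lambda rating: RATING_ORDER[rating])

# With an odd number of ratings, the median is the middle element.
median_rating = sorted_ratings[len(sorted_ratings) // 2]
print(f"median rating: {median_rating}")
```

The numbers 1, 2, and 3 are used only for ordering; computing an average of them would treat the gaps between ratings as equal, which ordinal data does not guarantee.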
❖ Scale Data (Interval and Ratio Data)
➢ Scale data represents measurable quantities with equal intervals
between consecutive values. Interval data has no true zero point,
whereas ratio data does. This type of data is used for numerical
measurements.
➢ Examples:-
■ Measuring the temperature in Celsius or Fahrenheit (interval
data), or counting the number of customers in a store (ratio data).
➢Characteristics
■ Quantitative: Data is measured in numbers.
■ Equal intervals: Differences between values are equal.
■ True zero point: Ratio data has a true zero point, which makes
ratio comparisons (such as "twice as many") meaningful; interval data does not.
➢Limitations
■ Sensitive to outliers: Extreme values affect analysis.
■ Often assumes normal distribution: Many common statistical
methods for scale data assume the data follows a normal curve.
■ Requires precise measurement: Needs accurate and
precise data collection.
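➢ Illustration: The sensitivity to outliers noted above can be seen by comparing the mean and the median before and after one extreme value is added. The customer counts are hypothetical.

```python
import statistics

# Hypothetical daily customer counts (ratio data).
customers_per_day = [42, 45, 47, 44, 46]

# The same data with one extreme value added.
with_outlier = customers_per_day + [400]

mean_before = statistics.mean(customers_per_day)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(customers_per_day)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

A single extreme day more than doubles the mean while the median barely moves, which is why the median is often reported alongside the mean.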
Big Data & Its Characteristics
★Big Data
○ Big Data refers to very large amounts of structured,
semi-structured, and unstructured data that organizations collect and
analyze to gain insights and make informed decisions.
★Sources of Big Data
○ Big Data comes from various sources, including:
○ Financial transactions: Banking transactions, stock market
transactions, and online payments.
○ Social Media Posts: Facebook, Twitter, Instagram, and other social
media platforms.
○ E-Commerce transactions: Online shopping transactions, customer
reviews, and ratings.
○ Sensor Data: IoT devices, GPS data, and sensor data from industrial
equipment.
○ Machine Data: Machine logs, error messages, and performance
metrics.
★Impact of Big Data on Businesses
○ Make better decisions: By analyzing insights from data, businesses can
make informed decisions.
○ Create customer-centric products: Big Data helps businesses understand
customer behavior and preferences.
○ Improve processes: Big Data analysis helps businesses optimize
processes and policies.
○ Solve business problems: Big Data can help businesses identify and
solve complex problems.
★Analysis of Big Data
○ Big Data analysis is the process of examining large, complex
datasets to uncover hidden patterns, correlations, and insights. Big
Data analysis involves various tools and techniques, including:
○ Machine Learning: Algorithms that enable machines to learn from
data.
○ Predictive Modeling: Statistical models that predict future
outcomes.
○ Mathematical Analysis: Statistical and mathematical techniques to
analyze data.
○ Data Mining: Discovering patterns and relationships in large
datasets.
★Technologies for Big Data
○ Big Data technologies are specialized tools and frameworks
designed to handle the unique challenges of large, complex datasets,
enabling organizations to store, process, analyze, and visualize data
insights.
○ Python Programming Language: A popular language for data
analysis and machine learning.
○ Hadoop: A distributed computing framework for processing large
datasets.
○ Data Visualization: Creating graphical representations of data to
understand and communicate insights and to identify patterns,
trends, and anomalies.
○ Data Warehousing: Designing, building, and maintaining a
centralized repository for storing and analyzing large datasets,
integrating data from multiple sources.
○ Cloud Computing: Cloud Computing provides scalable, on-demand
computing services over the internet, offering cost savings, flexibility,
and enhanced collaboration.
○ Spark: An in-memory computing framework for fast data processing.
○ NoSQL Databases: Databases designed for handling large amounts
of unstructured data.
★Characteristics of Big Data (5Vs)
○ Volume [Amount of Data]
○ Velocity [Speed of Data Processing]
○ Variety [Different Data Types]
○ Veracity [Data Accuracy and Reliability]
○ Value [Business Impact of Data]
○ Volume (Amount of Data)
■ Definition:- The massive scale of data generated every
second, characterized by large amounts of data from various
sources, requiring specialized tools and techniques.
■ Example:-
● Walmart generates 2.5 petabytes of data every hour.
○ Velocity (Speed of Data Processing)
■ Definition:- The speed at which data is generated and
processed, requiring real-time processing and analysis to keep
up with the high speed of data generation.
■ Example:-
● Twitter processes 500 million tweets daily in real-time.
○ Variety (Different Types of Data)
■ Definition:- Data comes in structured, semi-structured, and
unstructured formats, requiring specialized tools and
techniques to integrate and analyze the diverse data types.
■ Example:-
● Hospital patient records include structured,
semi-structured, and unstructured data.
○ Veracity (Data Accuracy and Reliability)
■ Definition:- The trustworthiness of data, that is, its integrity and
accuracy, which requires data validation, cleaning, and normalization
to prevent errors, biases, and inaccuracies.
■ Example:-
● Bank transaction data requires accuracy and reliability.
○ Value (Business Impact of Data)
■ Definition:- Extracting useful insights for decision-making,
requiring data analysis, visualization, and expertise in
data-driven decision-making to uncover hidden patterns and
trends.
■ Example:-
● E-commerce company analyzes customer data to
inform marketing strategies.
★Big Data's Effect on Businesses
○ Improved decision-making: Big Data provides insights that inform
business decisions.
○ Increased efficiency: Big Data helps optimize business processes
and operations.
○ Enhanced customer experience: Big Data helps personalize
customer interactions and improve customer satisfaction.
○ Competitive advantage: Big Data provides a competitive edge by
enabling businesses to innovate and stay ahead of the competition.
★Limitations of Big Data
○ Data Quality Issues: Inaccurate, incomplete, or inconsistent data
can lead to flawed insights and poor decision-making.
○ Data Security Concerns: Big Data's sheer volume and variety
increase the risk of data breaches, cyber attacks, and unauthorized
access.
○ High Storage and Processing Costs: Managing and analyzing
massive datasets can be expensive, requiring significant investments
in infrastructure and technology.
○ Complexity in Data Analysis: Big Data's complexity can overwhelm
traditional analytics tools, requiring specialized skills and expertise to
extract insights.
○ Potential for Bias and Errors: Biased algorithms, flawed models, or
incorrect assumptions can lead to inaccurate predictions and poor
decision-making.
Applications of Big Data
❖ Big Data applications transform industries, enhancing decision-making,
efficiency, and innovation in healthcare, finance, retail, marketing,
transportation, and more, driving business growth and improvement.
❖Healthcare
➢ Big Data in healthcare involves analyzing large amounts of medical
data to improve patient outcomes, reduce costs, and enhance the
overall quality of care.
➢ Some examples include:
■ Personalized medicine: Analyzing genetic data to tailor
treatments to individual patients.
■ Disease diagnosis and prediction: Using machine learning
algorithms to identify high-risk patients and predict disease
progression.
■ Patient outcome analysis: Analyzing large datasets to
identify best practices and improve patient outcomes.
❖Finance
➢ Big Data in finance involves analyzing large amounts of financial data
to identify trends, manage risk, and optimize investments.
➢ Some examples include:
■ Risk management: Analyzing market data and transactional
data to identify potential risks and opportunities.
■ Fraud detection: Using machine learning algorithms to
identify suspicious transactions and prevent financial losses.
■ Credit scoring: Analyzing credit history and other data to
assign credit scores and determine loan eligibility.
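➢ Illustration: Production fraud detection typically relies on trained machine learning models. As a much simpler statistical stand-in, the sketch below flags transaction amounts that sit far from the median, using the median absolute deviation (MAD). The function name `flag_suspicious`, the threshold of 3.5, and the amounts are assumptions for illustration.

```python
import statistics

def flag_suspicious(amounts, threshold=3.5):
    """Flag amounts whose MAD-based modified z-score exceeds the threshold."""
    med = statistics.median(amounts)
    mad = statistics.median([abs(a - med) for a in amounts])
    if mad == 0:
        # All typical values are identical, so this simple rule cannot score them.
        return []
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]

# Hypothetical card transaction amounts for one customer.
amounts = [25, 30, 28, 32, 27, 31, 29, 950]

suspicious = flag_suspicious(amounts)
print(f"flagged for review: {suspicious}")
```

The median-based score is used here because a single very large transaction would inflate the mean and standard deviation and could hide itself.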
❖Retail
➢ Big Data in retail involves analyzing large amounts of customer data
to personalize marketing, optimize operations, and improve customer
satisfaction.
➢ Some examples include:
■ Customer segmentation: Analyzing customer data to identify
segments and tailor marketing campaigns.
■ Recommendation systems: Using machine learning
algorithms to recommend products based on customer
behavior and preferences.
■ Supply chain optimization: Analyzing data on inventory
levels, shipping, and logistics to optimize supply chain
operations.
❖Marketing
➢ Big Data in marketing involves analyzing large amounts of customer
data to personalize marketing, measure campaign effectiveness, and
optimize marketing strategies.
➢ Some examples include:
■ Customer behavior analysis: Analyzing data on customer
behavior, such as website interactions and social media
activity.
■ Sentiment analysis: Using natural language processing to
analyze customer sentiment and opinions.
■ Targeted advertising: Using data on customer behavior and
preferences to deliver targeted ads.
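➢ Illustration: Real sentiment analysis uses natural language processing models. A toy, lexicon-based sketch conveys the basic idea of scoring text by counting positive and negative words. The word lists, function name, and review texts are hypothetical.

```python
# Tiny hypothetical sentiment lexicons (real systems use far larger resources).
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "slow", "broken"}

def sentiment_score(text):
    """Return (# positive words) - (# negative words) for a piece of text."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)

reviews = [
    "Great phone, love the fast delivery!",
    "Poor packaging and slow support.",
]

for review in reviews:
    print(sentiment_score(review), review)
```

A positive score suggests favorable sentiment and a negative score unfavorable sentiment; this simple count ignores negation and sarcasm, which is where trained models help.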
❖Transportation
➢ Big Data in transportation involves analyzing large amounts of data
on traffic patterns, vehicle performance, and logistics to optimize
routes, reduce congestion, and improve safety.
➢ Some examples include:
■ Traffic management: Analyzing data on traffic patterns to
optimize traffic signal timing and reduce congestion.
■ Route optimization: Using data on traffic patterns and road
conditions to optimize routes for delivery trucks and other
vehicles.
■ Autonomous vehicles: Analyzing data from sensors and
cameras to enable self-driving cars to navigate roads safely.
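➢ Illustration: Route optimization is often framed as a shortest-path problem. The sketch below applies Dijkstra's algorithm to a small, hypothetical road network where edge weights are travel times in minutes; the network and function name are assumptions for illustration.

```python
import heapq

def shortest_travel_time(graph, start, goal):
    """Dijkstra's algorithm; graph maps node -> {neighbor: minutes}."""
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbor, minutes in graph.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return None  # goal is not reachable from start

# Hypothetical road network with travel times in minutes.
roads = {
    "depot": {"A": 7, "B": 3},
    "A": {"customer": 4},
    "B": {"A": 2, "customer": 10},
}

print(shortest_travel_time(roads, "depot", "customer"))
```

With live traffic data, the edge weights would be updated continuously and the route recomputed as conditions change.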
Challenges in Data Analysis
★ Challenges in data analysis include data quality, volume, security,
complexity, and interpretation issues, requiring specialized skills, tools, and
techniques to overcome.
★Data Quality Issues
○ Data quality issues refer to problems with the accuracy,
completeness, and consistency of data.
○ This can include:
■ Inaccurate or incomplete data: Data may be incorrect,
missing, or duplicated, which can lead to incorrect insights and
decisions.
■ Noisy or inconsistent data: Data may contain errors,
outliers, or inconsistencies that can affect analysis results.
■ Missing values or outliers: Data may contain missing values
or outliers that can impact analysis results and require
additional processing.
○ Data quality issues can be addressed through data cleaning, data
validation, and data normalization.
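○ Illustration: A minimal sketch of cleaning and normalization, assuming records are (name, amount) pairs and that "normalization" means min-max scaling to the range 0 to 1. The function names and records are hypothetical.

```python
def clean_records(records):
    """Drop rows with missing values and exact duplicates, preserving order."""
    seen = set()
    cleaned = []
    for name, amount in records:
        if name is None or amount is None:
            continue  # validation: skip incomplete rows
        key = (name, amount)
        if key in seen:
            continue  # cleaning: skip exact duplicates
        seen.add(key)
        cleaned.append(key)
    return cleaned

def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical raw data with one missing value and one duplicate row.
raw = [("alice", 50), ("bob", None), ("alice", 50), ("carol", 100), ("dave", 75)]

cleaned = clean_records(raw)
scaled = min_max_normalize([amount for _, amount in cleaned])
print(cleaned)
print(scaled)
```

Dropping incomplete rows is only one option; depending on the analysis, missing values may instead be filled in (imputed) so that less data is lost.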
★Data Volume and Complexity
○ Data volume and complexity refer to the challenges of handling large
and complex datasets.
○ This can include:
■ Large datasets: Datasets may be too large to process using
traditional analysis tools, requiring specialized big data tools
and techniques.
■ Complex data structures: Data may have complex
structures, such as graphs or networks, that require
specialized analysis techniques.
■ High-dimensional data: Data may have many variables,
making it difficult to analyze and visualize.
★Data Security and Privacy
○ Data security and privacy refer to the challenges of protecting
sensitive data from unauthorized access and ensuring compliance
with regulations.
○ This can include:
■ Protecting sensitive data: Data may contain sensitive
information, such as personally identifiable information (PII), that
requires special protection.
■ Ensuring compliance with regulations: Organizations must comply
with regulations, such as GDPR and HIPAA, that govern data
protection and privacy.
■ Preventing data breaches: Organizations must take steps to
prevent data breaches and cyber attacks that can compromise
sensitive data.
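○ Illustration: One common protective step is to pseudonymize PII fields before data reaches analysts. The sketch below replaces selected fields with a salted SHA-256 digest from Python's `hashlib`. The field names, salt value, and function names are hypothetical, and this step alone does not establish compliance with any regulation.

```python
import hashlib

def pseudonymize(value, salt):
    """Return a shortened salted SHA-256 digest in place of the original value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:16]

def mask_record(record, pii_fields, salt):
    """Copy a record, replacing the listed PII fields with pseudonyms."""
    return {
        key: pseudonymize(str(value), salt) if key in pii_fields else value
        for key, value in record.items()
    }

# Hypothetical customer record and secret salt (keep real salts out of source code).
record = {"email": "jane@example.com", "phone": "555-0100", "amount": 120}
masked = mask_record(record, pii_fields={"email", "phone"}, salt="demo-salt")
print(masked)
```

Because the same input always yields the same pseudonym, records can still be joined and counted per customer without exposing the underlying identifiers.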
★Limited Resources
○ Limited resources refer to the challenges of conducting data analysis
with limited computing power, storage, budget, or personnel.
○ This can include:
■ Insufficient computing power: Analysis may require
specialized computing resources, such as high-performance
computing or cloud computing.
■ Limited budget: Organizations may have limited budget for
data analysis tools, training, and personnel.
■ Difficulty scaling analysis: Analysis may need to be scaled
to handle large datasets or complex analysis tasks.
★Interpretation and Communication
○ Interpretation and communication refer to the challenges of
interpreting complex analysis results and communicating insights
effectively to stakeholders.
○ This can include:
■ Interpreting complex results: Analysis results may be complex and
require specialized expertise to interpret.
■ Communicating insights effectively: Insights must be
communicated effectively to stakeholders, including non-technical
stakeholders.
■ Avoiding biases: Analysts must avoid biases in interpretation and
presentation of results.
★Keeping Up with Technology
○ Keeping up with technology refers to the challenges of staying
current with new data analysis tools, techniques, and methodologies.
○ This can include:
■ Staying current with new tools: New data analysis tools and
technologies are emerging rapidly, requiring analysts to stay
current.
■ Integrating new technologies: New technologies must be
integrated into existing workflows and systems.
■ Ensuring compatibility: New technologies must be
compatible with existing systems and infrastructure.
★Data Integration and Governance
○ Data integration and governance refer to the challenges of integrating
data from multiple sources and systems, and ensuring data
consistency and standardization.
○ This can include:
■ Integrating data from multiple sources: Data may come
from multiple sources, requiring integration and
standardization.
■ Ensuring data consistency: Data must be consistent across
different systems and sources.
■ Establishing data governance policies: Organizations must
establish policies and procedures for data governance,
including data quality, security, and privacy.
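○ Illustration: Integrating two sources usually starts with standardizing the shared key and then merging on it. The sketch below assumes two hypothetical systems (a CRM and a billing system) that format customer IDs differently; all names and rows are illustrative.

```python
def standardize_id(raw_id):
    """Normalize customer IDs so that ' c001 ' and 'C001' refer to the same key."""
    return str(raw_id).strip().upper()

def integrate(crm_rows, billing_rows):
    """Merge rows from two sources into one record per standardized customer ID."""
    merged = {}
    for row in crm_rows:
        cid = standardize_id(row["customer_id"])
        merged.setdefault(cid, {"customer_id": cid}).update(name=row["name"])
    for row in billing_rows:
        cid = standardize_id(row["customer_id"])
        merged.setdefault(cid, {"customer_id": cid}).update(balance=row["balance"])
    return merged

# Hypothetical rows from two systems with inconsistent ID formatting.
crm = [{"customer_id": " c001 ", "name": "Asha"}, {"customer_id": "C002", "name": "Ravi"}]
billing = [{"customer_id": "c001", "balance": 250.0}, {"customer_id": "C003", "balance": 80.0}]

customers = integrate(crm, billing)
for customer in customers.values():
    print(customer)
```

Customers found in only one source keep partial records, which makes gaps visible; a governance policy would decide how such gaps are reported and resolved.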