Jamalpur Science & Technology University, Jamalpur
Assignment on
Big data uses in Industries, Hash table, Machine Learning, Bias & Fairness
Course Title: Business Analytics
Course Code: MGT-5205
Submitted By:
Md. Mehedi Hasan
ID: 19113103
Session: 2022-2023
MBA, Department of Management
Jamalpur Science & Technology University, Jamalpur

Submitted To:
Sujit Roy
Assistant Professor
Department of CSE
Jamalpur Science & Technology University, Jamalpur
Date of Submission: 06-05-2025
Big Data use cases in different industries
Definition of Big Data:
Big data is a combination of structured, semi-structured and unstructured data that organizations
collect, analyze and mine for information and insights. It's used in machine learning projects,
predictive modeling and other advanced analytics applications.
Uses of Big Data in different industries:
Big data is used across industries to drive insights, improve decision-making, and enhance
performance. Here are some key use cases across various sectors:
(Figure: Big Data use cases in different industries)
1. Healthcare
Use Cases:
Predictive Analytics & Disease Prevention: By analysing electronic health records
(EHRs), social determinants of health, and wearable device data, hospitals can predict
patient deterioration and prevent readmissions.
Personalised Treatment: Genomic data and patient history are used to customise cancer
therapies and other treatments (e.g., IBM Watson Health).
Operational Efficiency: Real-time data helps optimize bed occupancy, staff allocation,
and emergency department performance.
Technologies:
Hadoop, Spark, Natural Language Processing (NLP), Machine Learning, IoT (wearables)
2. Retail & E-commerce
Use Cases:
Customer Behavior & Sentiment Analysis: Big data is used to track clicks, time spent
on pages, and social media sentiment for targeted advertising.
Dynamic Pricing & Recommendation Engines: Real-time pricing adjustments (like
Amazon) and tailored suggestions based on browsing and purchase history.
Inventory & Supply Chain Management: Demand forecasting and real-time tracking
of products.
Technologies:
Predictive analytics, real-time processing (Kafka), AI recommendation engines
3. Financial Services
Use Cases:
Fraud Detection & Risk Analysis: Analysing transaction patterns, geolocation, and
device fingerprints in real time to detect fraud (e.g., credit card fraud detection).
Customer Segmentation & Personalisation: Financial institutions use big data for
credit scoring and personalising loan/insurance offers.
Algorithmic Trading: Real-time data streams are used for high-frequency trading.
Technologies:
Real-time analytics (Apache Storm, Flink), AI/ML, blockchain (for transaction
verification)
4. Manufacturing
Use Cases:
Predictive Maintenance: Using sensor data from machinery to detect early signs of wear
and schedule maintenance.
Production Optimisation: Analysing production line data to identify bottlenecks and
reduce waste.
Supply Chain Visibility: End-to-end tracking of materials and components in real time.
Technologies:
Industrial IoT, Digital Twins, Edge Computing, Machine Learning
5. Agriculture
Use Cases:
Precision Agriculture: Drones and sensors collect data on soil moisture, crop health, and
fertilizer needs.
Weather-Based Forecasting: Big data helps model crop yield based on seasonal and
environmental factors.
Livestock Monitoring: RFID and sensor data help track animal health and optimize
feeding.
Technologies:
GIS, satellite imaging, remote sensing, AI, cloud-based data platforms
6. Transportation & Logistics
Use Cases:
Fleet Tracking & Route Optimisation: GPS and traffic data are used for real-time route
planning and fuel efficiency (e.g., UPS, FedEx).
Predictive Maintenance of Vehicles: Telemetry data is used to prevent breakdowns.
Supply Chain Forecasting: Forecast demand to manage shipping schedules and
inventory.
Technologies:
Telematics, IoT, AI routing engines, cloud logistics platforms
7. Energy & Utilities
Use Cases:
Smart Metering: Analyse consumer usage patterns to optimise energy distribution.
Grid Monitoring & Load Forecasting: Detect faults and balance supply and demand
more efficiently.
Renewable Energy Integration: Manage variable sources like solar/wind with demand
predictions.
Technologies:
Smart grids, IoT, cloud analytics, ML-based forecasting tools
8. Education
Use Cases:
Learning Analytics: Track student engagement and performance to identify at-risk
students.
Curriculum Personalisation: Adaptive learning systems adjust course content based on
performance data.
Resource Planning: Analyse enrolment trends for staffing and facility management.
Technologies:
LMS data analytics, AI tutors, NLP (for essay grading)
So, as technology advances, big data will continue to play a critical role in innovation, helping
organisations stay competitive, responsive, and prepared for the challenges and opportunities of
the digital age.
Hash table
A hash table (also known as a hash map) is a data structure that stores key-value pairs, where
each key is unique, and each key is associated with a value. The primary purpose of a hash table
is to allow for efficient retrieval, insertion, and deletion.
Features of Hash table
Hash Function: A hash table uses a hash function to convert the key into an index (a "hash
value") of an array where the corresponding value is stored. The hash function computes a hash
value by processing the key (often through mathematical operations) and outputs an index in the
array that helps locate the value quickly.
Array: The hash table internally uses an array to store values. The size of the array is typically
fixed or dynamically resized as needed.
Collision Handling: Collision occurs when two different keys hash to the same index in the
array. There are various methods to handle collisions:
Chaining: Store multiple key-value pairs in a list (or another data structure) at the same index.
Open Addressing: Find another open slot in the array for the new key-value pair.
How It Works:
1. A key is passed to a hash function.
2. The function returns an index in an array.
3. The value is stored at that index.
For example:
hash("apple") = 3   # index in array
table[3] = "A fruit"
Collision in Hash Tables
A collision occurs when two different keys produce the same hash index.
Example:
hash("cat") = 5
hash("dog") = 5   # collision at index 5
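This can be made concrete with a toy hash function (an illustrative sketch; Python's built-in `hash` behaves differently): summing character codes guarantees that any two anagrams collide.

```python
def toy_hash(key, size=10):
    # Sum the character codes of the key, reduced modulo the table size
    return sum(ord(ch) for ch in key) % size

# Anagrams have identical character sums, so they always collide
print(toy_hash("listen"))  # 5
print(toy_hash("silent"))  # 5 -- collision!
```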
Collision Resolution Techniques
There are two main ways to handle collisions:
1. Chaining (Separate Chaining)
Each array index stores a linked list of entries.
Multiple elements at the same index are stored in the list.
Diagram:
Index | Elements
---------------------
5     | [("cat", x) → ("dog", y)]
2. Open Addressing
If a collision occurs, the algorithm searches for the next available slot.
Types of open addressing:
o Linear probing: Check next slot: index + 1, index + 2, ...
o Quadratic probing: Use a quadratic function to find slots.
o Double hashing: Use a second hash function to decide step size.
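The linear-probing variant above can be sketched in Python as follows (the names `probe_insert` and `probe_lookup` and the table size of 10 are illustrative assumptions):

```python
# Open addressing with linear probing: on a collision, scan forward
# (wrapping around) until an empty slot is found.
SIZE = 10
slots = [None] * SIZE  # each slot holds a (key, value) pair or None

def probe_insert(key, value):
    index = hash(key) % SIZE
    for step in range(SIZE):            # at most SIZE probes
        i = (index + step) % SIZE       # linear probe: index, index+1, ...
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, value)
            return i
    raise RuntimeError("hash table is full")

def probe_lookup(key):
    index = hash(key) % SIZE
    for step in range(SIZE):
        i = (index + step) % SIZE
        if slots[i] is None:
            return None                 # empty slot reached: key absent
        if slots[i][0] == key:
            return slots[i][1]
    return None

probe_insert("cat", "meow")
probe_insert("dog", "woof")
print(probe_lookup("cat"))  # meow
```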
Example in Python

# Chaining: each bucket is a list of (key, value) pairs
table = [[] for _ in range(10)]

def insert(key, value):
    index = hash(key) % 10
    table[index].append((key, value))
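To make the chaining example usable, a matching lookup walks the chain at the computed bucket (the table and insert are restated here so the snippet is self-contained):

```python
# Chaining: each bucket is a list of (key, value) pairs
table = [[] for _ in range(10)]

def insert(key, value):
    index = hash(key) % 10
    table[index].append((key, value))

def lookup(key):
    index = hash(key) % 10
    for k, v in table[index]:   # walk the chain at this bucket
        if k == key:
            return v
    return None                 # key not present

insert("cat", "animal")
print(lookup("cat"))  # animal
```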
Summary
Hash Table: Stores data with key-value pairs using a hash function.
Collision: Happens when keys map to the same index.
Solutions: Use chaining or open addressing.
Solving technique of hash table:
The technique behind hash tables is called hashing. Hashing maps data (keys) to a
fixed-size value (called a hash code or hash index), which determines where the data should be
stored in the hash table. A hash function calculates an index into an array of buckets, from
which the desired value can be found.
Hash Functions:
Division method:
A hash function must guarantee that the number it returns is a valid index to one of the table
cells. A simple way is to use h(k) = k modulo table size. The division (modulo) method is the
simplest method of hashing: divide the element by the size of the hash table and use the
remainder as the index of the element in the hash table.
Example 1: Suppose the table is to store strings. A very simple hash function would be to add up
the ASCII values of all the characters in the string and take the sum modulo the table size, say 97.
Thus, "cobb" would be stored at location (64+3 + 64+15 + 64+2 + 64+2) % 97 = 278 % 97 = 84,
and "hike" would be stored at location (64+8 + 64+9 + 64+11 + 64+5) % 97 = 289 % 97 = 95.
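A minimal sketch of the division method, using actual ASCII character codes rather than the letter-position codes of the worked example (so the resulting indices differ from those above):

```python
TABLE_SIZE = 97

def division_hash(key):
    # Sum the ASCII codes of the string's characters, then take the
    # remainder modulo the table size (the division method).
    return sum(ord(ch) for ch in key) % TABLE_SIZE

print(division_hash("cobb"))  # 18
print(division_hash("hike"))  # 29
```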
Folding Method:
In this method the key is divided into several parts, and the parts are combined, or "folded",
together in a certain manner. For example, a social security number 123-45-6789 can be broken
into three parts 123, 456, and 789, and these parts can be added to yield the position
1368 % table size. The folding can be done in a number of ways: another possibility is to divide
the number into four parts 12, 34, 56, 789 and add them together.
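The folding method can be sketched as follows (the part length of three digits and the table size of 1000 are illustrative assumptions):

```python
def folding_hash(key, part_len=3, table_size=1000):
    # Break the key's digit string into fixed-size parts, sum the parts,
    # and reduce modulo the table size (the folding method).
    digits = str(key)
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % table_size

# 123-45-6789 -> parts 123, 456, 789 -> 123 + 456 + 789 = 1368
print(folding_hash(123456789))  # 1368 % 1000 = 368
```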
Mid-Square function:
The key is squared, and the middle part of the result is used as address for the hash table. If the
key is a string, it is converted to a number. Here the entire key participates in generating the
address so that there is a better chance that different addresses are generated even for keys close
to each other. For example,
Key Squared value Middle part
3121 9740641 406
3122 9746884 468
3123 9753129 531
In practice it is more efficient to choose a power of 2 for the size of the table and extract the
middle part of the bit representation of the square of a key. If the table size in this example is
chosen as 1024, the binary representation of the square of 3121 is 1001010-0101000010-
1100001, and the middle 10 bits (0101000010) can be easily extracted using a mask and a
shift operation.
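The mask-and-shift extraction can be sketched as follows (the 24-bit total width matches the 3121 example; the function and parameter names are illustrative):

```python
def mid_square_hash(key, table_bits=10, total_bits=24):
    # Square the key, then extract the middle `table_bits` bits of the
    # square's `total_bits`-bit representation with a shift and a mask.
    square = key * key
    shift = (total_bits - table_bits) // 2      # drop the low-order bits
    return (square >> shift) & ((1 << table_bits) - 1)

# 3121**2 = 9740641 = 0b1001010_0101000010_1100001
# middle 10 bits = 0b0101000010 = 322
print(mid_square_hash(3121))  # 322
```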
Machine Learning
Machine Learning (ML) is a subfield of artificial intelligence (AI) that enables computers to
learn from data and improve their performance over time without being explicitly programmed.
Instead of relying on hard-coded rules, ML algorithms identify patterns, make predictions, and
adapt to new information based on the data they process.
Machine learning (ML) can be broadly classified into several types based on how the model
learns and the kind of data used. The main categories of machine learning are:
Supervised Learning: Supervised learning is a type of machine learning where the
model is trained using labeled data. In this approach, the model learns from the input-
output pairs in the training dataset, where the correct output (label) is known for each
input.
Unsupervised Learning: Unsupervised learning involves training a model using data
that has no labeled outputs. The model tries to identify patterns, structures, or
relationships in the data without predefined labels.
Reinforcement Learning: Reinforcement learning (RL) is a type of machine learning
where an agent learns to make decisions by interacting with an environment. The agent
takes actions and receives feedback in the form of rewards or penalties.
Semi-supervised Learning: Semi-supervised learning is a hybrid approach where the
model is trained using a small amount of labelled data and a large amount of unlabelled
data. This approach is useful when labelling data is expensive or time-consuming, but
there is an abundance of unlabelled data available.
Self-supervised Learning: Self-supervised learning is a type of unsupervised learning
where the model generates labels from the input data itself. The model is typically trained
on tasks like predicting part of the data.
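As a minimal illustration of supervised learning, a one-nearest-neighbour classifier "learns" from labelled input-output pairs (a toy sketch on made-up one-dimensional data, not a production model):

```python
# Toy supervised learning: 1-nearest-neighbour on labelled 1-D points.
# Each training pair is (input, label); the labels are the "supervision".
training_data = [(1.0, "small"), (2.0, "small"), (8.0, "large"), (9.0, "large")]

def predict(x):
    # Return the label of the closest training example
    nearest = min(training_data, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

print(predict(1.5))  # small
print(predict(8.5))  # large
```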
What is Fairness and Bias in Machine Learning?
In machine learning, fairness refers to the principle that models should make decisions without
favoring or discriminating against individuals based on sensitive attributes like race, gender, or
age. Bias occurs when a model produces unfair outcomes due to skewed data, flawed
assumptions, or historical inequalities present in the training dataset.
For example, if a job screening algorithm is trained on historical data where men were more
frequently hired than women, it may learn to favour male applicants even when female
candidates are equally qualified. This results in biased decisions and violates fairness, as the
model unintentionally replicates past discrimination.
(Picture: Biases and Fairness in Machine Learning)
Biases and Fairness in Machine Learning
Machine learning systems can exhibit or amplify bias, often unintentionally, resulting in unfair
or discriminatory outcomes. These concerns apply across both supervised and unsupervised
learning methods.
1. Supervised Learning
Overview:
Supervised learning uses labelled data to train models for tasks like classification and
regression.
Bias in Supervised Learning:
Label Bias: Labels can be biased if they reflect historical prejudice (e.g., loan approvals
favouring one group).
Data Imbalance: Underrepresented groups in training data lead to poor performance for
those groups.
Overfitting on Majority: Models may generalize poorly for minority populations.
Fairness Issues:
Biased models may make unfair predictions, such as lower job recommendations for
women or racial minorities.
Fairness metrics like equal opportunity and demographic parity are typically used to
evaluate fairness in supervised settings.
Example:
An algorithm trained on past hiring data may learn to prefer male candidates if historical hiring
was gender-biased.
2. Unsupervised Learning
Overview:
Unsupervised learning identifies patterns and structures in unlabelled data (e.g., clustering,
anomaly detection).
Bias in Unsupervised Learning:
Cluster Bias: Groups formed may align with sensitive features like race or gender
unintentionally.
Representation Bias: Biased feature selection or embeddings can skew groupings.
Interpretability Issues: It’s often difficult to detect and correct bias because there are no
labels.
Fairness Challenges:
Harder to define and measure fairness without labelled outcomes.
Risk of creating biased groupings or segmentation (e.g., in marketing or policing).
Example:
A clustering algorithm used to segment customers might separate them based on income in a
way that reflects socio-economic bias, affecting access to services or offers.
Common Techniques to Address Bias
Stage           | Techniques
----------------|----------------------------------------------------
Pre-processing  | Balance datasets; remove bias in features or labels
In-processing   | Use fairness-aware models
Post-processing | Calibrate outputs to achieve fairness
Fairness Metrics (Mainly in Supervised ML):
Demographic Parity: Outcome should be independent of sensitive attributes.
Equal Opportunity: Equal true positive rates across groups.
Equalized Odds: Equal true and false positive rates.
Individual Fairness: Similar individuals should be treated similarly.
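Two of these metrics can be computed directly from predictions grouped by a sensitive attribute; the records below are invented purely for illustration:

```python
# Each record: (group, true_label, predicted_label)
records = [
    ("A", 1, 1), ("A", 1, 0), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 1), ("B", 0, 0), ("B", 0, 0),
]

def selection_rate(group):
    # Demographic parity compares P(prediction = 1) across groups
    preds = [p for g, _, p in records if g == group]
    return sum(preds) / len(preds)

def true_positive_rate(group):
    # Equal opportunity compares TPR = P(prediction = 1 | label = 1)
    positives = [(y, p) for g, y, p in records if g == group and y == 1]
    return sum(p for _, p in positives) / len(positives)

print(selection_rate("A"), selection_rate("B"))          # 0.5 0.5
print(true_positive_rate("A"), true_positive_rate("B"))  # 0.5 1.0
```

In this made-up data, demographic parity holds (equal selection rates) while equal opportunity is violated (unequal true positive rates), showing that different fairness metrics can disagree on the same model.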
Bias can emerge in both supervised and unsupervised learning through unrepresentative data or
model design. Ensuring fairness requires thoughtful metrics, preprocessing, and algorithmic
choices to build equitable and responsible AI systems.