
Uber Case-Study

We have all heard of Uber.

But does anyone know how much data Uber has to handle?
What made Uber jump from a traditional database to Hadoop?

[Diagram: what Uber's data covers]
• End-user details (drivers, customers, employees, etc.)
• 100 petabytes of analytical data
• Forecasting, traffic jams
Uber Using a Traditional Database (Before 2014)
To leverage the data, their engineers had to access each database or table
individually. At that time, they didn't have global access or a global view of all
their stored data. In fact, their data was scattered across different OLTP
databases.
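As a rough illustration (not Uber's actual code; the connection strings and table names below are made up), here is what "no global view" looks like in practice: answering even one question means connecting to each OLTP database separately and stitching the results together in application code.

# Hypothetical example: two separate OLTP databases, no shared warehouse.
import sqlalchemy as sa

trips_db = sa.create_engine("postgresql://analytics@trips-db:5432/trips")           # assumed
drivers_db = sa.create_engine("mysql+pymysql://analytics@drivers-db:3306/drivers")  # assumed

with trips_db.connect() as conn:
    trips = conn.execute(sa.text("SELECT driver_id, fare FROM completed_trips")).fetchall()

with drivers_db.connect() as conn:
    drivers = conn.execute(sa.text("SELECT driver_id, city FROM driver_profiles")).fetchall()

# The "join" happens by hand, because no single system holds both tables.
city_by_driver = {driver_id: city for driver_id, city in drivers}
fares_with_city = [(fare, city_by_driver.get(driver_id)) for driver_id, fare in trips]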

You might ask: why, then, did they want to switch to a data warehouse?
[Diagram labels: Elastic MapReduce (EMR); Extract, Transform and Load (ETL)]

Limitations:
• Since data was ingested through ad hoc ETL jobs and there was no formal schema communication
mechanism, data reliability became a concern. Most of the source data was in JSON format, and
ingestion jobs were not resilient to changes in the producer code (see the sketch after this list).
• As the company was still growing, scaling the data warehouse became increasingly expensive.
To cut down on costs, they started deleting older, obsolete data to free up space for new data.
• The same data could be ingested multiple times if different users performed different transformations
during ingestion. This resulted in multiple copies of almost the same data being stored in the
warehouse, further increasing storage costs.
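To make the first limitation concrete, here is a minimal PySpark sketch (assumed tooling and paths, not Uber's actual pipeline) of why schema-less JSON ingestion is fragile: relying on schema inference means any change in the producer's JSON silently changes the ingested table, while a pinned, communicated schema makes such changes fail loudly.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("adhoc-json-ingest").getOrCreate()

# Ad hoc approach: infer the schema from whatever JSON the producer emitted today.
trips_inferred = spark.read.json("s3://example-bucket/raw/trips/2014-01-01/")  # hypothetical path

# With a formal schema communication mechanism, the schema is pinned explicitly,
# so a producer-side change surfaces as an ingestion error instead of silently
# corrupting downstream tables.
trip_schema = StructType([
    StructField("trip_id", StringType(), nullable=False),
    StructField("driver_id", StringType(), nullable=True),
    StructField("fare", DoubleType(), nullable=True),
    StructField("completed_at", TimestampType(), nullable=True),
])
trips_typed = spark.read.schema(trip_schema).json("s3://example-bucket/raw/trips/2014-01-01/")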
Introducing Hadoop

Limitations:
❑ As the company continued scaling, with tens of petabytes of data stored in the ecosystem, they
faced a new set of challenges.
❑ Data latency was still far from what the business needed. New data was only accessible to users
once every 24 hours, which was too slow for real-time decisions.
❑ Since HDFS and Parquet do not support data updates, all ingestion jobs needed to create new
snapshots from the updated source data, ingest the new snapshot into Hadoop, convert it into
Parquet format, and then swap the output tables to expose the new data.
A big part of each job involved converting both historical and new data from the latest snapshot.
While only a little over 100 gigabytes of new data was added every day for each table, each run of the
ingestion job had to convert the entire, over 100 terabyte dataset for that specific table. The same was
true for ETL and modeling jobs that recreated derived tables on every run. These jobs had
to rely on snapshot-based ingestion of the source data because of the high ratio of updates on
historical data. By nature, the data contains a lot of update operations (e.g., rider and driver ratings or
support fare adjustments a few hours or even days after a completed trip).
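To make the snapshot-based pattern concrete, here is a hedged PySpark sketch (the paths and table name are assumptions, not Uber's actual jobs): every run reads the full latest snapshot, rewrites the whole thing as Parquet, and swaps the table location, even when only a small fraction of the rows actually changed.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("snapshot-ingest").getOrCreate()

# 1. Read the entire latest snapshot of the source table (100+ TB in the case above),
#    even though only ~100 GB of it is new or changed.
snapshot = spark.read.json("hdfs:///ingest/source_snapshots/trips/latest/")

# 2. Convert the whole snapshot to Parquet under a new, versioned location.
staging_path = "hdfs:///warehouse/trips_parquet/_staging_v2"
snapshot.write.mode("overwrite").parquet(staging_path)

# 3. "Swap" the output table so readers see the new data, e.g. by repointing an
#    external table at the staging path (one possible way to do the swap).
spark.sql("ALTER TABLE trips SET LOCATION 'hdfs:///warehouse/trips_parquet/_staging_v2'")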
What’s
next ?
Generation 4

HDFS scalability limitation, faster data in Hadoop, support for updates and deletes in Hadoop
and Parquet, faster ETL and modeling.

They built Hadoop Upserts anD Incremental (Hudi), an open source Spark library that provides
an abstraction layer on top of HDFS and Parquet to support the required update and delete operations.

Using the Hudi library, they were able to move away from the snapshot-based ingestion of raw data
to an incremental ingestion model that enabled them to reduce data latency from 24 hours to less
than one hour.
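For comparison with the snapshot sketch above, here is a minimal sketch of an incremental upsert using Hudi's Spark datasource (the paths, table name, and key fields are assumptions for illustration, not Uber's production configuration):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hudi-incremental-ingest")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")  # recommended for Hudi
    .getOrCreate()
)

# Only the incremental batch of changed rows (e.g. updated ratings, fare adjustments).
changes = spark.read.json("hdfs:///ingest/trips_changelog/latest/")

hudi_options = {
    "hoodie.table.name": "trips",
    "hoodie.datasource.write.recordkey.field": "trip_id",      # identifies each row
    "hoodie.datasource.write.precombine.field": "updated_at",  # keeps the latest version of a row
    "hoodie.datasource.write.operation": "upsert",             # update-or-insert instead of full rewrite
}

# Hudi merges these records into the existing Parquet files on HDFS, so downstream
# readers see updates without rewriting the entire table.
changes.write.format("hudi").options(**hudi_options).mode("append").save("hdfs:///warehouse/trips_hudi/")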

They also formalized the hand-over of upstream data store changes between the storage and big data
teams through Apache Kafka for generic data ingestion.
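As an illustration of that hand-over (assuming a kafka-python consumer and a hypothetical changelog topic name), the big data side can consume upstream change events generically:

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "datastore.trips.changelog",          # hypothetical topic published by the storage team
    bootstrap_servers=["kafka:9092"],
    group_id="bigdata-generic-ingestion",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    change = message.value
    # Each message describes one upstream change (insert/update/delete); a real
    # ingestion job would batch these and hand them to an incremental (e.g. Hudi) upsert.
    print(change.get("op"), change.get("table"), change.get("key"))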
