1. Modernizing Data Lakes and Data Warehouses with Google Cloud
1.1. Introduction to Data Engineering
1.1.1. The role of a data engineer
Example of a data lake = Cloud Storage bucket
Example of a data warehouse = BigQuery
Batch data
Dataproc manages Hadoop and Spark services.
Hadoop = processing is distributed across several servers instead of
a single machine.
Streaming data
1.1.2. Data Engineering Challenges
1.1.3. Introduction to BigQuery
1.1.4. Data Lakes and Data Warehouses
1.1.5. Transactional Databases vs Data warehouses
A database stores the current data required to power an
application. A data warehouse stores current and
historical data from one or more systems in a predefined
and fixed schema, which allows business analysts and
data scientists to easily analyze the data.
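
The distinction above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a stand-in; the table names (accounts, balance_history) and the sample values are hypothetical and not from the course. The transactional table keeps only the current state, while the warehouse-style table keeps an append-only history in a fixed schema that analysts can aggregate.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database with one OLTP-style and one warehouse-style table."""
    conn = sqlite3.connect(":memory:")
    # Transactional table: one row per account, overwritten in place.
    conn.execute(
        "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL)"
    )
    # Warehouse-style table: predefined schema, append-only history.
    conn.execute(
        "CREATE TABLE balance_history ("
        "account_id INTEGER NOT NULL, balance REAL NOT NULL, snapshot_date TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO accounts VALUES (1, 0.0)")
    # Hypothetical daily balances for a single account.
    for day, balance in [("2024-01-01", 100.0), ("2024-01-02", 80.0), ("2024-01-03", 120.0)]:
        # The application only ever sees the latest value.
        conn.execute("UPDATE accounts SET balance = ? WHERE account_id = 1", (balance,))
        # The warehouse keeps every snapshot.
        conn.execute("INSERT INTO balance_history VALUES (1, ?, ?)", (balance, day))
    conn.commit()
    return conn


def current_balance(conn: sqlite3.Connection, account_id: int) -> float:
    """Typical transactional lookup: the current state of one record."""
    row = conn.execute(
        "SELECT balance FROM accounts WHERE account_id = ?", (account_id,)
    ).fetchone()
    return row[0]


def average_balance(conn: sqlite3.Connection, account_id: int) -> float:
    """Typical analytical query: an aggregate over historical data."""
    row = conn.execute(
        "SELECT AVG(balance) FROM balance_history WHERE account_id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

The first query answers "what is the balance right now?", which is what an application needs. The second answers "how did the balance behave over time?", which only the historical table can support.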
1.1.6. Partner effectively with other data teams
1.1.7. Manage data access and governance
1.1.8. Build production-ready pipelines
1.1.9. Customer case study
1.1.10. Recap
1.2. Building Data lakes
1.2.1. Introduction to Data lakes
Data lakes -> data pipelines -> data warehouses
Orchestration workflows = kick off a data pipeline when
new raw data is available.
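
As a rough illustration of the orchestration idea, the sketch below starts a pipeline run for every raw object that has not been processed yet. It is plain Python with no cloud dependency; the function names (find_new_objects, orchestrate), the object names, and the run_pipeline callback are all hypothetical. A real orchestrator would typically react to storage events or a schedule rather than receive a hand-built list.

```python
from typing import Callable, Iterable, List, Set


def find_new_objects(listed: Iterable[str], already_processed: Set[str]) -> List[str]:
    """Return the object names that have not been processed yet, in sorted order."""
    return sorted(name for name in listed if name not in already_processed)


def orchestrate(
    listed: Iterable[str],
    already_processed: Set[str],
    run_pipeline: Callable[[str], None],
) -> List[str]:
    """Kick off the pipeline once for each newly arrived raw object."""
    started = []
    for name in find_new_objects(listed, already_processed):
        run_pipeline(name)           # hypothetical pipeline entry point
        already_processed.add(name)  # remember it so it is not re-run
        started.append(name)
    return started
```

Calling orchestrate a second time with the same listing starts nothing, because every object is already recorded as processed.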
1.2.2. Data Storage and ETL Options on Google Cloud
Federated queries let you send a query statement from BigQuery to
AlloyDB, Spanner, or Cloud SQL databases and get the
result back as a temporary table.
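
A minimal sketch of what a federated query can look like, assuming a BigQuery connection to the external database already exists. BigQuery exposes this through the EXTERNAL_QUERY function; the connection ID, the orders table, and the helper names below are hypothetical. The second helper additionally assumes the google-cloud-bigquery package is installed and credentials are configured.

```python
def build_federated_query(connection_id: str, external_sql: str) -> str:
    """Wrap a statement for the external database in BigQuery's EXTERNAL_QUERY."""
    if "'''" in external_sql:
        # Keep the sketch simple: no escaping of triple quotes.
        raise ValueError("external_sql must not contain triple single quotes")
    return f"SELECT * FROM EXTERNAL_QUERY('{connection_id}', '''{external_sql}''')"


def run_federated_query(connection_id: str, external_sql: str) -> list:
    """Run the federated query and return the result rows.

    Assumption: google-cloud-bigquery is installed and the environment
    has credentials and a default project configured.
    """
    from google.cloud import bigquery  # imported lazily so the builder works without it

    client = bigquery.Client()
    job = client.query(build_federated_query(connection_id, external_sql))
    return list(job.result())
```

For example, build_federated_query("my-project.us.my-connection", "SELECT id FROM orders") produces a statement that BigQuery would forward to the external database behind that connection.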
1.2.3. Build a data lake using Cloud Storage
1.2.4. Secure Cloud Storage
1.2.5. Store all sorts of data types
1.2.6. Cloud SQL as your OLTP system
1.3. Building Data Warehouse