CCD chapter 3

• cloud data governance


− Cloud data governance refers to the set of policies, procedures, and
technologies used to manage, protect, and ensure the quality, security, and
compliance of data stored in cloud environments.
− Data governance is a principled approach to managing data throughout its
life cycle.
− Data governance is everything you do to ensure data is secure, private,
accurate, available, and usable.
− It helps organizations maintain control over their data.
− It identifies and classifies data based on its sensitivity, importance, and
requirements. Examples: public, confidential.
− It defines access control, that is, who can access the data.
− It includes applying encryption, firewalls, and other security measures to
protect data from breaches or unauthorized access.
− It implements procedures to maintain data accuracy and consistency, and
monitors for duplicate and incomplete data. (A small classification and
access-control sketch follows this list.)
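
To make the classification and access-control ideas concrete, here is a
minimal Python sketch; the sensitivity labels, roles, and clearance policy
are hypothetical examples, not part of any particular cloud service:

    # Hypothetical sensitivity labels, ranked from least to most restricted.
    SENSITIVITY = {"public": 0, "internal": 1, "confidential": 2}

    # Assumed policy: the highest label each role is cleared to read.
    ROLE_CLEARANCE = {"guest": 0, "analyst": 1, "admin": 2}

    def can_access(role: str, label: str) -> bool:
        """Return True if the role's clearance covers the data's label."""
        return ROLE_CLEARANCE.get(role, -1) >= SENSITIVITY[label]

    record = {"name": "salary_report.csv", "label": "confidential"}
    print(can_access("analyst", record["label"]))  # False: not cleared
    print(can_access("admin", record["label"]))    # True: cleared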

advantages :
1) Better Data Security: Ensures sensitive data is protected from breaches
and unauthorized access.
2) Improved Data Quality: Keeps data accurate and reliable for better
decision-making.
3) Access Control: Ensures only authorized users can access specific data.
4) Disaster Recovery: Provides backups and recovery plans to keep data
safe during failures.
5) Cost Management: Optimizes storage and processing costs by removing
redundant data.
6) Continuous Monitoring: Tracks who accessed or modified data, making it
easy to audit.
disadvantages :
1) Hard to Set Up: It takes time and effort to create and manage rules for
data in the cloud.
2) Dependency on the Cloud Provider: Organizations must trust cloud
providers to maintain security.
3) Training Needs: Employees must learn how to handle data properly,
which takes time and resources.
4) Needs Constant Monitoring: You have to keep an eye on the data all the
time, which can be challenging.

• key-value databases


− A key-value database is a type of NoSQL database that stores data as a
collection of key-value pairs.
− Each key is a unique identifier, and the associated value can be any type of
data, such as a string, JSON object, or binary file.
− Key: A unique identifier for the data (e.g., userID123).
Value: The actual data associated with the key (e.g., { "name": "Ishwari",
"age": 20 }).
− In this database, values are retrieved or manipulated using their keys, as
shown in the sketch after the examples below.

advantages :
1) Scalability: Easily expands to handle increasing amounts of data.
2) Easy to Use: The simple key-value structure makes it easy to work with.
3) Speed: Extremely fast for simple queries.
4) Flexible: Can store different types of data.
disadvantages :
1) Limited Searching: You can only search for data using keys, not the
values.
2) No Data Relationships: Cannot link data the way relational databases do.
3) Not Ideal for Large Values: Performance can slow down if the stored
values are too large.
4) Harder to Manage Complex Data: Managing and retrieving complex data
becomes challenging.
use cases :
1) E-commerce Applications: Manages shopping carts and user
preferences.
2) Real-Time Analytics: Tracks events and metrics for monitoring
dashboards.
Examples of Key-Value Databases in Cloud Computing
1) Amazon DynamoDB
2) Azure Table Storage
3) Google Cloud Datastore
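
As a minimal sketch of key-value access, the snippet below uses Amazon
DynamoDB through the boto3 library. It assumes AWS credentials are
configured and that a table named Users with partition key userID already
exists; the table and attribute names are hypothetical.

    import boto3

    # Hypothetical table "Users" with partition key "userID" (must exist).
    table = boto3.resource("dynamodb").Table("Users")

    # Write: the key uniquely identifies the value (an arbitrary document).
    table.put_item(Item={"userID": "userID123", "name": "Ishwari", "age": 20})

    # Read: values are fetched by key; you cannot search inside the values.
    response = table.get_item(Key={"userID": "userID123"})
    print(response.get("Item"))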
• batch and streaming data in machine learning
Batch data :
− Batch data refers to a collection of data that is processed in groups or
batches at specific intervals.
− It involves analyzing large datasets that have already been collected and
stored.
− Typically has higher latency since data is not processed immediately.
− This approach is widely used in machine learning (ML) for tasks that do not
require real-time processing and focus on analyzing historical data to train
predictive models.
workflow :
1) Data Collection: Collect data from various sources over a fixed period.
2) Storage: Store the collected data in a database, data warehouse, or file
system (e.g., Amazon S3).
3) Preprocessing: Clean, filter, and prepare the data for analysis. Common
preprocessing tasks include handling missing values, removing duplicates,
and normalizing data.
4) Batch Processing: Analyze or process the data in bulk using tools like
Hadoop or Python libraries.
5) Model Training: Use the processed data to train machine learning models.
6) Evaluation and Deployment: Evaluate the trained model's performance
using test datasets, then deploy the model to make predictions on new data,
as sketched below.
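
A minimal sketch of this batch workflow in Python, using pandas for
preprocessing and scikit-learn for training; the file name, columns, and
label are hypothetical stand-ins for a real historical dataset:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Steps 1-3: load a stored batch and preprocess it (hypothetical file).
    df = pd.read_csv("sales_history.csv")
    df = df.drop_duplicates().dropna()  # remove duplicates, handle missing values

    # Steps 4-5: train a model on the full historical batch.
    X, y = df[["price", "quantity"]], df["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
    model = LogisticRegression().fit(X_train, y_train)

    # Step 6: evaluate on held-out data before deployment.
    print("test accuracy:", model.score(X_test, y_test))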
Advantages :
1) Efficient for Large Data Volumes: Handles large datasets in a single run,
making it suitable for big data applications.
2) Cost-Effective: It optimizes resource usage and costs.
3) Supports Complex Workflows: Handles complex transformations over large
datasets.
4) Data Consistency: Processes entire datasets at once, ensuring uniformity
and reducing inconsistencies.
5) Suitable for Model Training: Allows machine learning models to be trained
on large, historical datasets.
Disadvantages :
1) You have to wait until all the data is collected and processed, so it’s not
good for instant results.
2) Processing large amounts of data at once requires a lot of computing
resources.
3) It needs sufficient space to store the data before it’s processed.
4) If there’s an error, you might need to start the process all over again,
which wastes time.
5) It is not suitable for real-time data.
Examples :
1) Weather Data
2) Customer Feedback Analysis
3) Image or Video Processing
4) Generating monthly employee salaries
5) Generating academic records at the end of a semester

Streaming data :
− Streaming data refers to the continuous flow of data generated and
processed in real time.
− It involves analysing data immediately as it arrives.
− Offers lower latency because data is processed immediately as it flows
into the system.
− It is used in scenarios where data arrives in small increments over time.
Workflow :
1) Data Generation: Data is generated from sources like sensors and user
clicks.
2) Ingestion: Data is ingested using tools like Google Pub/Sub.
3) Preprocessing: Data is cleaned and transformed in real time to prepare
it for analysis.
4) Model Application: Pretrained models are used to make predictions.
5) Output: Results are used for real-time decision-making, as in the
sketch below.
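
A minimal Python sketch of this loop: events are scored one at a time as
they arrive instead of in a stored batch. The event generator and the
threshold rule are simulated stand-ins for a real ingestion service (e.g.,
a Pub/Sub subscriber) and a real pretrained model:

    import random
    import time
    from itertools import islice

    def event_stream():
        """Simulated source; in production, a Pub/Sub or Kafka consumer."""
        while True:
            yield {"user": "u1", "amount": random.uniform(1, 500)}
            time.sleep(0.1)  # events trickle in over time

    def looks_fraudulent(event):
        # Stand-in for a pretrained model: flag unusually large amounts.
        return event["amount"] > 400

    # Process each event the moment it arrives (first 20 events for the demo).
    for event in islice(event_stream(), 20):
        if looks_fraudulent(event):
            print("ALERT:", event)  # real-time decision on arrival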
advantages :
1) Real-Time Insights: You can get results and take action immediately as
data arrives.
2) Always Updated: Works with the latest data, ensuring decisions are
based on current information.
3) Ideal for Quick Decisions: Used when decisions must be made
immediately, on the spot.
disadvantages :
1) Needs Complex Systems: Requires complex tools, setup, and expertise
for real-time processing.
2) Expensive: Requires more computing power, which can increase costs.
3) Hard to Ensure Data Quality: It’s challenging to clean or verify the data
quickly because processing happens immediately.
4) Limited for Historical Analysis: Because it focuses on real-time data, it
is difficult to process historical data.
examples
1) Social Media Monitoring
2) Fraud Detection
3) Stock Market Analysis
4) Traffic Updates

• cloud data warehouse – AWS Redshift


− A cloud data warehouse is a database that stores, processes, and integrates
data in a public cloud environment, designed for analyzing and querying
large datasets to support business intelligence and decision-making.
− It provides a scalable, cost-effective, and flexible alternative to traditional
on-premises data warehouses.
− A cloud data warehouse is hosted on the cloud and managed by a service
provider, eliminating the need for physical hardware.
− The system can handle increasing data volumes and workloads by scaling
resources up or down based on demand.
− Cloud data warehouses easily integrate with various data sources, business
intelligence tools, and third-party applications.
− Providers implement robust security measures such as encryption, access
control, and compliance certifications to safeguard data.
how a cloud data warehouse functions :
− Data Ingestion: Data from multiple sources (e.g., databases, applications,
IoT devices) is collected and ingested into the cloud data warehouse using
pipelines or ETL (Extract, Transform, Load) processes.
− Data Storage: The data is stored in an optimized format for analytical
queries, often leveraging columnar storage for faster read operations.
− Data Processing: Data is processed and transformed to make it suitable for
analysis. Cloud warehouses use distributed processing to handle large
datasets efficiently.
− Query Execution: Users can run SQL or similar queries on the stored data.
The system leverages parallel processing to provide quick results.
− Analytics and Reporting: The queried data is visualized using business
intelligence tools or dashboards to derive actionable insights.
advantages :
− Reduced Setup Time: Organizations can quickly deploy a cloud data
warehouse without the need for extensive hardware or software
installations.
− Performance Optimization: Cloud data warehouses use advanced
technologies like in-memory computing and massively parallel processing
(MPP) for high-speed analytics.
− Global Accessibility: Data can be accessed from anywhere with an
internet connection.
− Support for Big Data: Cloud data warehouses are built to handle
massive datasets and complex analytics workloads.
challenges:
− Moving data from on-premises systems to the cloud can be time-
consuming.
− Mismanagement of resources can lead to higher costs.
− Ensuring compliance with data regulations can be complex.
− It needs a high-bandwidth network.
− Although providers implement robust security measures, storing
sensitive data in the cloud can expose it to potential breaches.
− Vendor Lock-In: Switching providers or transitioning back to an on-
premises system can be difficult due to proprietary technologies and data
formats.

AWS Redshift:
− Amazon Redshift is a fully managed, cloud-based data warehousing
service provided by AWS.
− It is designed for storing and analyzing large volumes of structured and
semi-structured data using SQL-based tools and business
intelligence applications.
− Redshift is optimized for high-performance analytics, supporting
complex queries on large datasets with scalability, speed, and cost-
effectiveness.
workflow :

1. Data Loading:
Data is uploaded to Redshift from sources like files, databases, or other
cloud services (e.g., Amazon S3).
2. Data Storage:
Data is organized in a columnar format, compressed, and stored in the
cloud for easy access.
3. Querying Data:
You can run SQL queries to analyze data, generate reports, or find
insights, as in the sketch below.
4. Processing Queries:
Redshift splits tasks across multiple servers to process large queries
faster.
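
Because Redshift speaks the PostgreSQL wire protocol, this workflow can
be sketched with a standard driver such as psycopg2. The cluster endpoint,
credentials, table, S3 path, and IAM role below are all hypothetical
placeholders:

    import psycopg2

    # Hypothetical connection details; Redshift listens on port 5439 by default.
    conn = psycopg2.connect(
        host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="analytics", user="admin", password="...",
    )
    cur = conn.cursor()

    # 1) Data loading: pull a batch from Amazon S3 with Redshift's COPY command.
    cur.execute("""
        COPY sales FROM 's3://my-bucket/sales/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        FORMAT AS CSV;
    """)
    conn.commit()

    # 3) Querying data: ordinary SQL; Redshift parallelizes the work internally.
    cur.execute("SELECT region, SUM(amount) FROM sales GROUP BY region;")
    for row in cur.fetchall():
        print(row)

    cur.close()
    conn.close()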

Architecture :
1. Leader Node: The leader node acts like a manager or controller. It:
• Receives queries from users or applications.
• Creates a plan to execute those queries.
• Coordinates the work with compute nodes.
• Returns the final results to the user.
2. Compute Nodes : Compute nodes are like the workers. They:
• Store the actual data in the form of slices (chunks of data).
• Perform the heavy lifting by processing queries sent by the leader
node.
• Work in parallel to speed up tasks.
3. Storage Layer: Redshift stores data in a columnar format
(organized by columns, not rows).
This format is faster for analytical queries because only the relevant
columns are read.
4. Network and Integration: Redshift is connected to AWS services
and external tools to ingest and export data easily. Data can be loaded
from Amazon S3, databases, or streaming services.
features :

1. Fast Data Processing:
Redshift divides work among multiple servers (computers) to
process data quickly.
2. Column-Based Storage:
Instead of storing data row by row, it stores data in columns. This
makes it faster to find and analyze data.
3. Scalable:
You can start small and add more storage or processing power as
your data grows.
4. SQL Support:
You can use SQL (a standard language for databases) to query and
analyze your data.
5. Integration with AWS Tools:
Works well with other AWS services like S3 (cloud storage) and
Glue (data preparation).
6. Cost-Effective:
You only pay for what you use, and it offers ways to save money by
optimizing storage and computing.
7. Data Security:
Your data is encrypted (protected) when it's being stored and
transferred.

Advantages :

− Easy to Use: You don’t need to manage servers; AWS handles that
for you.
− Faster Analytics: Optimized for analyzing big datasets quickly.
− Flexible Scaling: Adjust resources (storage and compute) as needed.
− Affordable: Pay only for what you use, and store large amounts of
data cost-effectively.

Disadvantages :

− Complex Queries Can Slow Down: Very complex queries or too
many users may affect performance.
− Costs Can Add Up: Without proper monitoring, costs can increase
for large-scale usage.
− No Real-Time Processing: It’s designed for analyzing data in bulk,
not for real-time applications.
• various cloud-based tools used for data science in ML – GCP BigQuery
various cloud-based tools for data science are :
1) Amazon SageMaker: A fully managed service to build, train, and deploy
machine learning models.
2) AWS Lambda: Runs ML inference tasks triggered by events without
managing servers (serverless computing).
3) Amazon Redshift: A fast data storage and analytics tool for large datasets.
4) Azure Machine Learning: A platform for training and deploying ML
models with simple tools.
5) BigQuery: A tool for fast data analytics, with built-in ML capabilities.

GCP BigQuery :
− BigQuery is a fully-managed, serverless, and highly scalable data
warehouse built on Google Cloud Platform (GCP) for performing fast
and SQL-based queries on massive datasets.
− It's widely used in data science and machine learning (ML) for storing,
analyzing, and managing large datasets quickly and efficiently.

key features :

1. Serverless and Scalable:
o Serverless means you don't need to manage infrastructure (no
servers to configure or scale).
o BigQuery automatically scales resources based on the size of your
data and the complexity of your queries.
2. Fast Querying:
o BigQuery is optimized for fast querying over large datasets, thanks to
its columnar storage format and distributed architecture.
o It supports SQL queries, which makes it accessible for data scientists
familiar with SQL.
3. Fully Managed:
o No need to worry about data backups, updates, or hardware failures.
Google handles everything, making it easy to focus on analytics.
4. Integrated with Google Cloud Tools:
o BigQuery seamlessly integrates with other Google Cloud services
like Google Cloud AI, Google Cloud Storage, and Google Cloud
Dataproc for advanced analytics and ML.

workflow :
1. Data Storage
o Data is stored in datasets, which are collections of tables.
o Tables contain data organized in columns and rows (like a typical
relational database).
o BigQuery stores data in the columnar format, making it fast for
analytical queries.
2. Loading Data
o You can load data into BigQuery from multiple sources like
Google Cloud Storage (GCS), Google Sheets, or external
databases.
o It supports both batch loading (uploading large amounts of data
at once) and streaming (real-time data insertion).
3. Running Queries
o Use SQL queries to interact with data stored in BigQuery, as in
the sketch after this list.
o It uses distributed computing to run queries in parallel,
speeding up the processing time for large datasets.
4. Query Optimization
o BigQuery automatically optimizes queries by determining the
most efficient way to execute them, which minimizes the need
for manual tuning.
5. Storage and Pricing
o BigQuery has two main pricing components:
1. Storage: Charges based on the amount of data stored.
2. Queries: Charges based on the amount of data processed
when executing queries.
o The cost of querying is calculated by the amount of data read
during query execution, not the number of queries.
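
A minimal sketch of running a query through the official
google-cloud-bigquery Python client; it assumes GCP credentials are
configured, and the project, dataset, and table names are hypothetical.
Note that the bytes scanned by the query, not the number of queries,
drive the cost:

    from google.cloud import bigquery

    # Hypothetical project; credentials come from the environment.
    client = bigquery.Client(project="my-project")

    query = """
        SELECT country, COUNT(*) AS users
        FROM `my-project.analytics.customers`
        GROUP BY country
        ORDER BY users DESC
        LIMIT 10
    """
    # client.query() starts a job; .result() waits for it and returns rows.
    for row in client.query(query).result():
        print(row["country"], row["users"])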

BigQuery for Data Science and ML

1. Data Storage:
BigQuery stores huge amounts of data (like millions of customer
records) in an organized way (tables and columns).
2. Data Querying:
You can query that data using SQL, so if you want to see which
customers are most likely to leave, you can write a simple SQL query
to filter and analyze the data.
3. Machine Learning Models:
Instead of exporting the data to a separate tool, you can build
machine learning models directly in BigQuery using BigQuery ML.
For example, you can predict customer behavior based on past data
using a simple SQL query, as in the sketch after this list.
4. Integration with Other Tools:
BigQuery can send and receive data from other Google Cloud tools
like Google Cloud Storage (for storing data), TensorFlow (for deep
learning), and Google AI (for additional machine learning models),
making it a central hub for your data science and ML needs.
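
Since BigQuery ML is driven entirely by SQL, the churn example above can
be sketched as two statements submitted through the same Python client;
every dataset, table, and column name here is a hypothetical stand-in:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Train a logistic-regression model directly where the data lives.
    client.query("""
        CREATE OR REPLACE MODEL `analytics.churn_model`
        OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
        SELECT tenure_months, monthly_spend, churned
        FROM `analytics.customers`
    """).result()

    # Score new customers with ML.PREDICT, still in plain SQL; the output
    # includes a predicted_<label> column, here predicted_churned.
    rows = client.query("""
        SELECT customer_id, predicted_churned
        FROM ML.PREDICT(MODEL `analytics.churn_model`,
                        (SELECT customer_id, tenure_months, monthly_spend
                         FROM `analytics.new_customers`))
    """).result()
    for row in rows:
        print(row["customer_id"], row["predicted_churned"])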

Advantages :

1. Speed:
BigQuery’s architecture is designed to quickly analyze and process
large datasets, saving time during data exploration, model training,
and evaluation.
2. Cost-Effective:
BigQuery uses a pay-as-you-go pricing model, meaning you only
pay for the data you query. This is useful for data scientists as they
can run analysis and ML tasks without paying upfront for storage or
processing.
3. Seamless Integration:
BigQuery works well with other cloud-based tools and machine
learning frameworks, providing a seamless workflow for data science
projects.
4. Ease of Use:
BigQuery uses standard SQL, which is a familiar language for data
analysts and data scientists. The built-in BigQuery ML feature
makes machine learning accessible without requiring extensive
coding.
5. Scalable:
As your data grows, BigQuery automatically scales to handle larger
datasets, making it ideal for enterprises or applications with massive
data storage and processing needs.

Use cases : customer segmentation, real-time analytics, large-scale data
processing, and predictive analytics.
