Exam Topics - PDE - Questions-7w1dhd9jefy8p8w9ucpjurqidy

The document outlines various Google Cloud services and their corresponding use cases based on specific keywords. It includes services like Cloud Bigtable for IoT data, BigQuery for analytics, and Cloud Spanner for relational databases with ACID guarantees. Additionally, it discusses data processing tools like Dataproc and Dataflow, as well as storage options such as multi-regional and nearline storage.

Uploaded by

Kasani Ravikumar

Important Keywords and Answers:

If a question contains these keywords, the listed service is usually the answer.

keywords --- answer

Google Stackdriver Monitoring -- performance, NOT missing data

IoT, sensors, clicks, low latency, HBase API, ingesting large volumes -- Cloud Bigtable

warehouse, analytics, SQL -- BigQuery

SQL, relational database, multiple regions, transactions, ACID (atomicity, consistency, isolation, durability) guarantees, global (all regions listed), transaction tables that scale horizontally -- Cloud Spanner

SQL, relational database, transactions, PostgreSQL, a single or specific region, up to 30 TB of storage -- Cloud SQL

Spark, Hadoop -- Cloud Dataproc

messaging service, queuing service, subscription -- Cloud Pub/Sub

container images -- Cloud Build

data transformation pipeline -- Cloud Data Fusion

quick checks, within a minute or a second -- Cloud Functions

Ruby, minimal code, serverless -- App Engine

mobile app backend -- Firestore

preparing data for analytics and ML -- Cloud Dataprep

SDK, batch and stream processing -- Cloud Dataflow

highly available data access across multiple regions or globally -- multi-regional storage

data access in a single region -- regional storage

data accessed once per month / once per 30 days -- Nearline storage

data accessed once per quarter / 90 days -- Coldline storage

data that must be kept permanently but is rarely accessed -- Archive storage
AutoML -> custom models trained for a specific use case.

Cloud Vision API -> pre-trained models to detect labels, faces, words.

Stackdriver is used to track access logs for BigQuery.

Two key points: Persist data beyond the life of the cluster → Cloud Storage (GCS)
Managed Hadoop cluster → Dataproc
Persistent storage: GCS (Dataproc uses the Cloud Storage connector to connect to GCS)

Description: Dataproc is used to migrate Hadoop and Spark jobs to GCP. Connecting Dataproc to GCS through the
Cloud Storage connector keeps the data available after the life of the cluster. When a job is highly I/O intensive,
we also need to create a small persistent disk.

Ans: B,C,D

B - Not labelled as Fraud or not. So Unsupervised.

C - Clustering can be done based on location, amount etc.
D - Location is already given. So labelled. Hence supervised.

First rule of dataproc is to keep data in GCS.

Dataproc storage: the cost-effective choice is Cloud Storage.

The custom endpoint is not acknowledging the message, which is why Pub/Sub sends the message again and again.
When you do not acknowledge a message before its acknowledgement deadline has expired, Pub/Sub
resends the message. As a result, Pub/Sub can send duplicate messages.
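
Because an unacknowledged message is redelivered, subscriber code is usually written to tolerate duplicates. The following is a minimal sketch in plain Python, not the Pub/Sub client library: `make_idempotent_handler` and the in-memory `seen_ids` set are hypothetical names used only for illustration, and a real subscriber would need a durable store for the IDs it has already processed.

```python
def make_idempotent_handler(process):
    """Wrap `process` so that a redelivered message ID is processed only once."""
    seen_ids = set()  # assumption: stands in for a durable store of processed IDs

    def handle(message_id, payload):
        if message_id in seen_ids:
            return False  # duplicate delivery: skip the work, still safe to ack
        process(payload)
        seen_ids.add(message_id)
        return True

    return handle


processed = []
handle = make_idempotent_handler(processed.append)

# Simulated deliveries: message "m1" arrives twice because its ack was late.
deliveries = [("m1", "order-created"), ("m2", "order-paid"), ("m1", "order-created")]
results = [handle(mid, body) for mid, body in deliveries]
```

With this wrapper, the second delivery of `m1` is recognised and skipped, so the side effect runs once per message ID even though delivery happened twice.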

Datalab (before it was deprecated); now Vertex AI

Keywords : API , project sink

Read only the relevant columns.

perform analytics → BigQuery

Stackdriver can tell us about performance, but it does not log missing data.

Apache Spark is faster than Hadoop/Pig/MapReduce

SPARK > hadoop, pig, hive

ML →BQ

Keywords → gsutil , TAR

Hadoop/Spark jobs run on Dataproc, and preemptible machines cost up to 80% less.

A and B require building your own model, so they are discarded as options.
C or D can do the job here using the Cloud Video Intelligence API. Bigtable is the better option, so C is correct.
IoT -- Bigtable

Answer: C - the best fit for the purpose, with autoscaling, and the Google-recommended transform engine between
Pub/Sub and BigQuery

Entity analysis -> Identify entities within documents (receipts, invoices, and contracts) and label them by type, such
as date, person, contact information, organization, location, events, products, and media.

Sentiment analysis -> Understand the overall opinion, feeling, or attitude expressed in a block of text. --
Avoid custom models

Spanner allows transaction tables to scale horizontally and supports secondary indexes for range queries.

Bigtable can take in data from Dataproc, Spark, and Hadoop.

HBase API, ingesting large volumes -- Cloud Bigtable. Apache/Hadoop → Bigtable

The documentation on authorized views (https://cloud.google.com/bigquery/docs/share-access-views) explicitly states:

"Authorized views should be created in a different dataset from the source data. That way, data owners can give
users access to the authorized view without simultaneously granting access to the underlying data." Therefore B is
the correct answer, because we are to create a new dataset and a view within that dataset.

Subsampling the training data reduces the training time. As for D, increasing the
number of layers may improve the model's accuracy, but it will not reduce the time required to
train the model.

Speed of data transfer depends on Bandwidth

Cloud SQL: a cheap relational database.

Cloud Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and
unstructured data for analysis, reporting, and machine learning.

? A- 56% and C – 44%

highly available = multi-regional
recovery strategy of this data that minimizes cost = point-in-time snapshot

Since we do not know when the load job will finish, we cannot use a fixed scheduler or cron job. With Cloud Composer
we can define logic and dependencies that first check whether the load job has finished and then run the Dataprep job.
Dataprep can run on Dataflow using a template, and Cloud Composer creates the dependency on the previous job. Of the
options given, the only valid one for creating dependencies is Cloud Composer (Apache Airflow). When the Cloud Dataprep
job executes, it creates a Dataflow template that is stored in GCS. The template can be exported from there
and used to build the workflow in Cloud Composer.

Managed Service - Cloud Composer

Cloud Composer is a fully managed workflow orchestration service, enabling you to create, schedule, monitor, and
manage workflows that span across clouds and on-premises data centers.

Add a ParDo transform in Cloud Dataflow to discard corrupt elements
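
A ParDo applies a function to each element and may emit zero, one, or many outputs, which is what makes it suitable for dropping bad records. The sketch below imitates that behaviour in plain Python rather than using the Apache Beam SDK: `par_do` and `parse_or_discard` are hypothetical helpers, and treating a record without an `id` field as corrupt is an assumption made for the example.

```python
import json


def parse_or_discard(raw):
    """DoFn-style function: yields the parsed record, or nothing if it is corrupt."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return  # corrupt element: emit nothing
    # Assumption for this example: a valid record is an object with an "id" field.
    if isinstance(record, dict) and "id" in record:
        yield record


def par_do(fn, elements):
    """Apply `fn` to every element and flatten whatever it yields."""
    for element in elements:
        yield from fn(element)


raw_lines = ['{"id": 1, "v": 10}', "not-json", '{"v": 3}', '{"id": 2, "v": 7}']
clean = list(par_do(parse_or_discard, raw_lines))
```

In an actual Dataflow pipeline the same per-element logic would live in a `DoFn` passed to `beam.ParDo`; the point here is only that emitting nothing is how an element gets discarded.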


By creating an authorized view one assures that the data is current and avoids taking more storage space (and
cost) in order to share a dataset. B and D are not cost optimal and C does not guarantee that the data is kept
updated.

The table is already partitioned by ingestion date, so cluster it on the package-tracking ID.

Dataflow → streaming and batch . Dataproc →Hadoop

A good row key is an ID followed by a timestamp. The stock symbol works as the ID in this case.
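
As an illustration of the ID-then-timestamp pattern, the sketch below builds row keys from a stock symbol and a millisecond timestamp. The `#` separator, the zero padding to 13 digits, and the function name `make_row_key` are assumptions for the example, not a required Bigtable format.

```python
def make_row_key(symbol, epoch_millis):
    """Build a row key of the form '<ID>#<timestamp>'."""
    # Zero-pad so that lexicographic order matches chronological order.
    return f"{symbol}#{epoch_millis:013d}"


keys = sorted([
    make_row_key("GOOG", 1700000060000),
    make_row_key("AAPL", 1700000000000),
    make_row_key("GOOG", 1700000000000),
])
```

Sorting the keys groups all rows for one symbol together and orders them by time within the symbol, so a scan over the prefix `GOOG#` reads that symbol's history as one contiguous range.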

subscription, increase – decrease

The Google Cloud native alternative to Kafka is Pub/Sub, and Dataflow paired with Pub/Sub is the
Google-recommended option.

Denormalization helps performance by reducing query time. Appends perform better than updates.

Multi-region increases availability, and PDFs can be stored in GCS.

This is a case of underfitting, not overfitting (an overfit model would have extremely low training error but
high test error), so we need to make the model more complex.

Bigtable provides the lowest latency, which meets the requirement to serve predictions within 100 ms.

The n1-standard-1 instance is a low-end configuration, so a larger machine type is needed; B should definitely be one of the options.
Increasing max workers increases parallelism, so the pipeline can process faster, provided a machine type with more CPU and
multiple cores is chosen. Option A can be a better step.

"The maximum number of Compute Engine instances to be made available to your pipeline during execution. Note that this can
be higher than the initial number of workers (specified by num_workers) to allow your job to scale up, automatically or
otherwise."

"Adding nodes to the original cluster: You can add 3 nodes to the cluster, for a total of 6 nodes. The write
throughput for the instance doubles, but the instance's data is available in only one zone:"

An aggregated log sink creates a single sink for all projects; the destination can be a Cloud Storage bucket, a Pub/Sub
topic, a BigQuery dataset, or a Cloud Logging bucket. Without an aggregated sink, a sink would have to be created for each
project individually, which is cumbersome.

Transfer Appliance for moving offline data, large data sets, or data from a source with limited bandwidth. Transfer
Appliance is a high-capacity storage device that enables you to transfer and securely ship your data to a Google
upload facility, where we upload your data to Cloud Storage.

Vote for 'A', because of requirement - Enabling non-developer analysts to modify transformations.
Dataprep by Trifacta is an intelligent data service for visually exploring, cleaning, and preparing structured and
unstructured data for analysis, reporting, and machine learning. Because Dataprep is serverless and works at any
scale, there is no infrastructure to deploy or manage. Your next ideal data transformation is suggested and
predicted with each UI input, so you don’t have to write code.

It is now feasible to provide table level access to user by allowing user to query single table and no other table will
be visible to user in same dataset.

For I/O intensive jobs, increasing the disk size resolves the issue.

Geospatial and ML functionality is with bigquery.

Reasons:
a) KafkaIO with Dataflow is a valid option for the connection, regardless of where Kafka is located (on-premises,
Google Cloud, or another cloud).
b) A sliding window helps calculate the average.
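
To make point b) concrete, here is a minimal sketch of a moving average over a time-based sliding window, in plain Python rather than the Dataflow SDK. It recomputes the average on every event, which is a simplification: a streaming engine would instead emit one result per window period. The function name `sliding_average` and the sample readings are made up for the example.

```python
from collections import deque


def sliding_average(events, window_seconds):
    """Yield (timestamp, average of the values seen in the last `window_seconds`)."""
    window = deque()  # (timestamp, value) pairs currently inside the window
    total = 0.0
    for ts, value in events:  # events must be ordered by timestamp
        window.append((ts, value))
        total += value
        # Evict readings that have fallen out of the window.
        while window[0][0] <= ts - window_seconds:
            _, old = window.popleft()
            total -= old
        yield ts, total / len(window)


readings = [(0, 10.0), (30, 20.0), (60, 30.0), (90, 40.0)]
averages = list(sliding_average(readings, window_seconds=60))
```

Each reading contributes to every average computed within 60 seconds after it, which is the overlap that distinguishes a sliding window from a fixed one.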

These are the features that Pub/Sub currently lacks. Pub/Sub can retain messages
for a maximum of 31 days.

The question asks for a cost-effective option, so use HDD persistent disks, which are cheaper than SSD.

If you create a Dataproc cluster with internal IP addresses only, attempts to access the Internet in an initialization
action will fail unless you have configured routes to direct the traffic through a NAT or a VPN gateway. Without
access to the Internet, you can enable Private Google Access, and place job dependencies in Cloud Storage; cluster
nodes can download the dependencies from Cloud Storage from internal IPs.

It specifically asks for scaling up which can be done in Cloud SQL and can be queried using SQL.
Cloud SQL continues to add storage until it reaches the maximum of 30 TB.

Cloud SQL (30TB)

A tall and narrow table has a small number of events per row, which could be just one event, whereas a short and
wide table has a large number of events per row. For time series, you should generally use tall and narrow tables,
for two reasons: storing one event per row makes it easier to run queries against your data, and storing many
events per row makes it more likely that the total row size will exceed the recommended maximum.

AAD (additional authenticated data) is needed to decrypt the data, so it is better to keep it outside GCP for safety.

Monitoring not only gives you access to Dataflow-related metrics but also lets you create alerting
policies and dashboards, so you can chart time series of metrics and choose to be notified when these metrics
reach specified values.

ACID compliance for Spanner.

Spanner supports read-write transactions for use cases such as handling bank transactions.

Clustered tables in BigQuery are tables that have a user-defined column sort order using clustered columns.
Clustered tables can improve query performance and reduce query costs. The table has already been created with
ingestion-date partitioning.

When you want to move your Apache Spark workloads from an on-premises environment to Google Cloud, we
recommend using Dataproc to run Apache Spark/Apache Hadoop clusters. Dataproc is a fully managed, fully
supported service offered by Google Cloud. It allows you to separate storage and compute, which helps you to
manage your costs and be more flexible in scaling your workloads.
Migrating Hive data from your on-premises or other cloud-based source cluster to BigQuery has two steps: 1.
Copying data from a source cluster to Cloud Storage 2. Loading data from Cloud Storage into BigQuery.

The Seek feature extends subscriber functionality by allowing you to alter the acknowledgement state of
messages in bulk. For example, you can replay previously acknowledged messages or purge messages in
bulk. In addition, you can copy the state of one subscription to another by using seek in combination
with a Snapshot.

Using the TRANSFORM clause, you can specify all preprocessing during model creation. The preprocessing is
automatically applied during the prediction and evaluation phases of machine learning.

moving average → sliding window

TUMBLE=> fixed windows. A tumbling window represents a consistent, disjoint time interval in the data stream.
HOP=> sliding windows.
SESSION=> session windows.
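
The three window types differ in how a timestamp is assigned to windows. The sketch below shows that assignment in plain Python under simple assumptions: integer timestamps, half-open `[start, end)` windows aligned to zero, and hypothetical function names that merely mirror the TUMBLE, HOP, and SESSION keywords above.

```python
def tumble(ts, size):
    """Fixed window: each timestamp belongs to exactly one [start, start + size)."""
    start = ts - ts % size
    return [(start, start + size)]


def hop(ts, size, period):
    """Sliding window: a timestamp belongs to every window that covers it."""
    last_start = ts - ts % period
    # Walk backwards over window starts while the window still covers ts.
    starts = range(last_start, ts - size, -period)
    return sorted((s, s + size) for s in starts)


def sessions(timestamps, gap):
    """Session windows: group timestamps separated by less than `gap`."""
    groups = []
    for ts in sorted(timestamps):
        if groups and ts - groups[-1][-1] < gap:
            groups[-1].append(ts)
        else:
            groups.append([ts])
    return groups


fixed = tumble(65, size=60)
sliding = hop(65, size=60, period=30)
grouped = sessions([1, 2, 3, 20, 21, 50], gap=10)
```

A timestamp lands in one tumbling window, in `size / period` overlapping hopping windows, and in a session whose boundaries depend entirely on the gaps in the data.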

A partitioned table is a special table that is divided into segments, called partitions, that make it easier to manage
and query your data. By dividing a large table into smaller partitions, you can improve query performance, and you
can control costs by reducing the number of bytes read by a query.

Keywords: You want to ensure that the sensitive data is masked but still maintains referential integrity.
Part 1 - data is masked: create a pseudonym by replacing PII data with a cryptographic token.
Part 2 - still maintains referential integrity: use a cryptographic format-preserving token.
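
The referential-integrity part depends on the token being deterministic: the same input must always produce the same token so that joins still work after masking. The sketch below demonstrates only that property, using a keyed HMAC from the Python standard library. It is not format-preserving encryption (the token does not keep the length or character set of the input), and the key, field names, and sample values are invented for illustration.

```python
import hashlib
import hmac


def pseudonymize(value, secret_key):
    """Return a deterministic token: the same input always maps to the same token."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


key = b"demo-key-not-for-production"  # hypothetical key for illustration only
customers = [{"ssn": "123-45-6789", "name": "A"}]
orders = [{"ssn": "123-45-6789", "total": 40}, {"ssn": "987-65-4321", "total": 15}]

# Mask the same column in both tables with the same key.
masked_customers = [{**c, "ssn": pseudonymize(c["ssn"], key)} for c in customers]
masked_orders = [{**o, "ssn": pseudonymize(o["ssn"], key)} for o in orders]
```

After masking, the customer row and its order still share a join key, while the original value no longer appears in either table. A format-preserving token would add the guarantee that the masked value still looks like the original format.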

Denormalization is a common strategy for increasing read performance for relational datasets that were previously
normalized. The recommended way to denormalize data in BigQuery is to use nested and repeated fields. It's best
to use this strategy when the relationships are hierarchical and frequently queried together, such as in parent-child
relationships.

if the call takes on average 1 sec, that would cause massive backpressure on the pipeline. In these circumstances
you should consider batching these requests, instead.
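
Batching replaces one slow call per element with one call per group of elements. The following plain-Python sketch shows the idea; `batched` and `call_service` are hypothetical helpers, and `call_service` merely stands in for a remote API that accepts many elements at once.

```python
def batched(items, batch_size):
    """Group items into lists of at most `batch_size` elements."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


def call_service(batch):
    """Hypothetical stand-in for one remote call that accepts many elements."""
    return [x * 2 for x in batch]


calls = 0
results = []
for batch in batched(range(10), batch_size=4):
    calls += 1
    results.extend(call_service(batch))
```

Ten elements are handled in three calls instead of ten, so if each call costs about a second regardless of batch size, the time spent waiting on the service drops in the same proportion.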

The gsutil tool is the standard tool for small- to medium-sized transfers (less than 1 TB) over a typical enterprise-
scale network, from a private data center to Google Cloud.

While your job is running, you might encounter errors or exceptions in your worker code. These errors generally
mean that the DoFns in your pipeline code have generated unhandled exceptions, which result in failed tasks in
your Dataflow job. Exceptions in user code (for example, your DoFn instances) are reported in the Dataflow
monitoring interface.

Cloud Storage as restricted API

When you create a Cloud Spanner instance, you must configure it as either regional (that is, all the resources are
contained within a single Google Cloud region) or multi-region (that is, the resources span more than one region).
You can change the instance configuration to multi-regional (or global) at any time.

Like gsutil, Storage Transfer Service for on-premises data enables transfers from network file system (NFS) storage
to Cloud Storage. Although gsutil can support small transfer sizes (up to 1 TB), Storage Transfer Service for on-
premises data is designed for large-scale transfers (up to petabytes of data, billions of files).

There are two parts, and they are related: 1. Overfitting is fixed by decreasing the number of input features
(select only the essential features). 2. Accuracy is improved by increasing the number of training examples.

Dialogflow is a natural language understanding platform that makes it easy to design and integrate a
conversational user interface into your mobile app, web application, device, bot, interactive voice response
system, and so on. Using Dialogflow, you can provide new and engaging ways for users to interact with your
product. Dialogflow can analyze multiple types of input from your customers, including text or audio inputs (like
from a phone or voice recording). It can also respond to your customers in a couple of ways, either through text or
with synthetic speech.

The source is a proprietary format. Dataflow wouldn't have a built-in template to read the file. You will have to
create something custom.

We exclude [C] as non-ACID and [D] as invalid (location is configured at the dataset level, not the table level). Then
we focus on the "minimal human intervention in case of a failure" requirement to eliminate one answer among
[A] and [B]. Basically, we have to compare point-in-time recovery with high availability. It doesn't matter whether
it's MySQL or PostgreSQL, since both databases support those features.
- Point-in-time recovery: logs are created automatically, but restoring an instance after a failure requires manual steps.
- High availability: a failure requires no human intervention: "If an HA-configured instance becomes unresponsive,
Cloud SQL automatically switches to serving data from the standby instance."

Shared VPC enables organizations to establish budgeting and access control boundaries at the project level while
allowing for secure and efficient communication using private IPs across those boundaries. In the Shared VPC
configuration, Cloud Composer can invoke services hosted in other Google Cloud projects in the same organization
without exposing services to the public internet. Shared VPC requires that you designate a host project to which
networks and subnetworks belong and a service project, which is attached to the host project.

In BigQuery, materialized views are precomputed views that periodically cache the results of a query for increased
performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever
possible reads only delta changes from the base tables to compute up-to-date results. Materialized views can be
queried directly or can be used by the BigQuery optimizer to process queries to the base tables. Queries that use
materialized views are generally faster and consume fewer resources than queries that retrieve the same data only
from the base tables. Materialized views can significantly improve the performance of workloads that have the
characteristic of common and repeated queries.
