Google Professional Cloud Data Engineer Practice Exam PDF

The document is a practice exam for the Google Professional Cloud Data Engineer certification, providing sample questions and answers to help candidates prepare. It emphasizes the importance of understanding the types of questions that may appear on the actual exam and suggests using Google Cloud services effectively. Successful completion of the practice exam does not guarantee passing the certification exam, as the actual exam covers a broader range of topics.

Uploaded by university.gamma

10/26/2020 Google Professional Cloud Data Engineer Practice Exam

Google Professional Cloud Data Engineer Practice Exam

Apr 05 | By gcp-examquestions (https://gcp-examquestions.com/author/gcp-examquestions/)

Notes: Hi all, the Google Professional Cloud Data Engineer practice exam will familiarize you with the types of questions you may encounter on the certification exam and help you determine your readiness, or whether you need more preparation and/or experience. Successful completion of the practice exam does not guarantee you will pass the certification exam, as the actual exam is longer and covers a wider range of topics.
We highly recommend you also take the Google Professional Cloud Data Engineer Guarantee Part (https://gcp-examquestions.com/gcp-pro-cloud-data-engineer-exam/), because it includes real questions with highlighted answers collected from our exam. It will help you pass the exam more easily.

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/

1. You are building storage for files for a data pipeline on Google Cloud. You want to support JSON files. The schema of these files will occasionally change. Your analyst teams will run aggregate ANSI SQL queries on this data. What should you do?

A. Use BigQuery for storage. Provide format files for data load. Update the format files as needed.
B. Use BigQuery for storage. Select “Automatically detect” in the Schema section.
C. Use Cloud Storage for storage. Link data as temporary tables in BigQuery and turn on the “Automatically detect” option in the Schema section of BigQuery.
D. Use Cloud Storage for storage. Link data as permanent tables in BigQuery and turn on the “Automatically detect” option in the Schema section of BigQuery.

Answers: B is correct because of the requirement to support occasionally (schema-)changing JSON files and aggregate ANSI SQL queries: you need to use BigQuery, and it is quickest to use “Automatically detect” for schema changes.
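As a sketch of what option B looks like in practice, here is a minimal load configuration in Python. The field names mirror the `google-cloud-bigquery` `LoadJobConfig` options (`source_format`, `autodetect`, `write_disposition`); the dict stand-in and the `describe` helper are ours, not part of the question.

```python
# A dict standing in for google-cloud-bigquery's LoadJobConfig; the real
# client exposes these as attributes with the same names.
load_config = {
    "source_format": "NEWLINE_DELIMITED_JSON",
    "autodetect": True,  # the "Automatically detect" schema option
    "write_disposition": "WRITE_APPEND",
}

def describe(config):
    """One-line summary of what this load job would do (helper is ours)."""
    mode = "auto-detected" if config["autodetect"] else "explicit"
    return f"{config['source_format']} load with {mode} schema"
```

With `autodetect` on, each load infers the current schema from the JSON, which is what absorbs the occasional schema changes.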

2. You use a Hadoop cluster both for serving analytics and for processing and transforming data. The data is currently stored on HDFS in Parquet format. The data processing jobs run for 6 hours each night. Analytics users can access the system 24 hours a day. Phase 1 is to quickly migrate the entire Hadoop environment without a major re-architecture. Phase 2 will include migrating to BigQuery for analytics and to Cloud Dataflow for data processing. You want to make the future migration to BigQuery and Cloud Dataflow easier by following Google-recommended practices and using managed services. What should you do?

A. Lift and shift Hadoop/HDFS to Cloud Dataproc.


B. Lift and shift Hadoop/HDFS to Compute Engine.
C. Create a single Cloud Dataproc cluster to support both analytics and data processing, and point it at a Cloud Storage bucket that contains the Parquet files that were previously stored on HDFS.
D. Create separate Cloud Dataproc clusters to support analytics and data processing, and point both at the same Cloud Storage bucket that contains the Parquet files that were previously stored on HDFS.

Answers: D is correct because it leverages a managed service (Cloud Dataproc), the data is stored on GCS in Parquet format, which can easily be loaded into BigQuery in the future, and the Cloud Dataproc clusters are job-specific.
https://cloud.google.com/solutions/migration/hadoop/hadoop-gcp-migration-jobs

3. You are building a new real-time data warehouse for your company and will use Google BigQuery
streaming inserts. There is no guarantee that data will only be sent in once but you do have a unique ID
for each row of data and an event timestamp. You want to ensure that duplicates are not included while
interactively querying data. Which query type should you use?


A. Include ORDER BY DESC on timestamp column and LIMIT to 1.


B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C. Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.
D. Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

Answers: D is correct because it will just pick out a single row for each set of duplicates.
https://cloud.google.com/storage/
https://cloud.google.com/storage/transfer/
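The `ROW_NUMBER()` pattern in option D can be mimicked in plain Python to see why it removes duplicates: partition by the unique ID, order by timestamp descending, and keep only the first row of each partition. The field names (`id`, `ts`, `value`) are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

def dedupe_latest(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY id ORDER BY ts DESC) = 1:
    keep only the newest row for each unique ID."""
    ordered = sorted(rows, key=lambda r: (r["id"], -r["ts"]))
    return [next(group) for _, group in groupby(ordered, key=itemgetter("id"))]

events = [
    {"id": "a", "ts": 1, "value": 10},
    {"id": "a", "ts": 2, "value": 11},  # the same row sent twice; newer wins
    {"id": "b", "ts": 1, "value": 20},
]
latest = dedupe_latest(events)
```

In BigQuery the same idea is a window function plus `WHERE row_num = 1`, so duplicates never reach the interactive query results.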

4. You are designing a streaming pipeline for ingesting player interaction data for a mobile game. You
want the pipeline to handle out-of-order data delayed up to 15 minutes on a per-player basis and
exponential growth in global users. What should you do?

A. Design a Cloud Dataflow streaming pipeline with session windowing and a minimum gap duration of 15 minutes. Use “individual player” as the key. Use Cloud Pub/Sub as a message bus for ingestion.
B. Design a Cloud Dataflow streaming pipeline with session windowing and a minimum gap duration of 15 minutes. Use “individual player” as the key. Use Apache Kafka as a message bus for ingestion.
C. Design a Cloud Dataflow streaming pipeline with a single global window of 15 minutes. Use Cloud Pub/Sub as a message bus for ingestion.
D. Design a Cloud Dataflow streaming pipeline with a single global window of 15 minutes. Use Apache Kafka as a message bus for ingestion.

Answers: A is correct because the question requires delay to be handled on a per-player basis, and session windowing will do that. Pub/Sub handles the need to scale exponentially with traffic coming from around the globe.
https://beam.apache.org/documentation/programming-guide/#windowing
https://cloud.google.com/pubsub/architecture
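To illustrate why session windowing keyed by player handles per-player lateness, here is a plain-Python simulation of the windowing rule, assuming events arrive as hypothetical `(player_id, epoch_seconds)` pairs; a real pipeline would express this with Beam's `Sessions` window instead.

```python
from collections import defaultdict

GAP_SECONDS = 15 * 60  # the question's minimum gap duration of 15 minutes

def sessionize(events):
    """Group (player_id, epoch_seconds) events into per-player sessions,
    starting a new session whenever the gap since the previous event for
    that player exceeds GAP_SECONDS."""
    per_player = defaultdict(list)
    for player, ts in sorted(events):
        sessions = per_player[player]
        if sessions and ts - sessions[-1][-1] <= GAP_SECONDS:
            sessions[-1].append(ts)   # continue the current session
        else:
            sessions.append([ts])     # gap too large: open a new session
    return dict(per_player)

events = [("p1", 0), ("p1", 600), ("p1", 2000), ("p2", 100)]
sessions = sessionize(events)
```

Because the gap is evaluated per key, one player's quiet period never closes another player's window.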

5. Your company is loading comma-separated values (CSV) files into Google BigQuery. The data is fully imported successfully; however, the imported data is not matching byte-to-byte to the source file. What is the most likely cause of this problem?

A. The CSV data loaded in BigQuery is not flagged as CSV.


B. The CSV data had invalid rows that were skipped on import.
C. The CSV data loaded in BigQuery is not using BigQuery’s default encoding.
D. The CSV data has not gone through an ETL phase before loading into BigQuery.

Answers: C is correct because a non-default encoding is the only cause listed that lets the import succeed while producing data that differs byte-to-byte from the source file.
https://cloud.google.com/bigquery/docs/loading-data#loading_encoded_data
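A quick way to see the mechanism behind answer C: if the source file was written in Latin-1 but the reader assumes UTF-8 (BigQuery's CSV default), the imported text can no longer round-trip to the original bytes. This is a standalone Python demonstration, not BigQuery code.

```python
# Demonstration of why a non-default encoding breaks byte-for-byte parity:
# Latin-1 source bytes get reinterpreted (or replaced) by a UTF-8 reader
# rather than preserved.
source = "café,1.99"                     # original CSV row
latin1_bytes = source.encode("latin-1")  # bytes as written by the source system

# What a UTF-8 reader makes of those bytes:
imported = latin1_bytes.decode("utf-8", errors="replace")
round_tripped = imported.encode("utf-8")
```

The import "succeeds" (every row loads), yet `round_tripped` no longer equals `latin1_bytes`, which is exactly the symptom in the question.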


6. Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use
Hadoop jobs they have already created and minimize the management of the cluster as much as possible.
They also want to be able to persist data beyond the life of the cluster. What should you do?

A. Create a Google Cloud Dataflow job to process the data.


B. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.
C. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.
D. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.
E. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.

Answers: D is correct because it uses managed services, and also allows for the data to persist on GCS
beyond the life of the cluster.

7. You work for an economic consulting firm that helps companies identify economic trends as they
happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average
prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices
of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you
can combine it with other data in BigQuery as cheaply as possible. What should you do?
A. Load the data every 30 minutes into a new partitioned table in BigQuery.
B. Store and update the data in a regional Google Cloud Storage bucket and create a federated data
source in BigQuery.
C. Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore.
D. Store the data in a file in a regional Google Cloud Storage bucket. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.

Answers: B is correct because regional Cloud Storage is cheaper than BigQuery storage, and a federated data source lets BigQuery always query the latest version of the file without running a load job every 30 minutes.

8. You have 250,000 devices which produce a JSON device status event every 10 seconds. You want to
capture this event data for outlier time series analysis. What should you do?

A. Ship the data into BigQuery. Develop a custom application that uses the BigQuery API to query the
dataset and displays device outlier data based on your business requirements.
B. Ship the data into BigQuery. Use the BigQuery console to query the dataset and display device outlier
data based on your business requirements.
C. Ship the data into Cloud Bigtable. Use the Cloud Bigtable cbt tool to display device outlier data based on
your business requirements.
D. Ship the data into Cloud Bigtable. Install and use the HBase shell for Cloud Bigtable to query the table
for device outlier data based on your business requirements.


Answers: C is correct because the data type, volume, and query pattern best fit Bigtable's capabilities, and also Google best practices, as linked below.
https://cloud.google.com/bigtable/docs/go/cbt-overview
https://cloud.google.com/bigtable/docs/installing-hbase-shell

9. You are selecting a messaging service for log messages that must include final result message ordering
as part of building a data pipeline on Google Cloud. You want to stream input for 5 days and be able to
query the current status. You will be storing the data in a searchable repository. How should you set up
the input messages?

A. Use Cloud Pub/Sub for input. Attach a timestamp to every message in the publisher.
B. Use Cloud Pub/Sub for input. Attach a unique identifier to every message in the publisher.
C. Use Apache Kafka on Compute Engine for input. Attach a timestamp to every message in the publisher.
D. Use Apache Kafka on Compute Engine for input. Attach a unique identifier to every message in the publisher.

Answers: A is correct because of recommended Google practices; see the links below.
https://cloud.google.com/pubsub/docs/ordering
http://www.jesse-anderson.com/2016/07/apache-kafka-and-google-cloud-pubsub/

10. You want to publish system metrics to Google Cloud from a large number of on-prem hypervisors and
VMs for analysis and creation of dashboards. You have an existing custom monitoring agent deployed to
all the hypervisors and your on-prem metrics system is unable to handle the load. You want to design a
system that can collect and store metrics at scale. You don’t want to manage your own time series
database. Metrics from all agents should be written to the same table, but agents must not have permission to modify or read data written by other agents. What should you do?

A. Modify the monitoring agent to publish protobuf messages to Cloud Pub/Sub. Use a Dataproc cluster or Dataflow job to consume messages from Pub/Sub and write to Bigtable.
B. Modify the monitoring agent to write protobuf messages directly to Bigtable.
C. Modify the monitoring agent to write protobuf messages to HBase deployed on GCE VM instances.
D. Modify the monitoring agent to write protobuf messages to Cloud Pub/Sub. Use a Dataproc cluster or Dataflow job to consume messages from Pub/Sub and write to Cassandra deployed on GCE VM instances.

Answers: A is correct because Bigtable can store and analyze time-series data, and the solution uses managed services, which is what the requirements call for.


11. You are designing storage for CSV files and using an I/O-intensive custom Apache Spark transform as part of deploying a data pipeline on Google Cloud. You intend to use ANSI SQL to run queries for your analysts. How should you transform the input data?

A. Use BigQuery for storage. Use Cloud Dataflow to run the transformations.
B. Use BigQuery for storage. Use Cloud Dataproc to run the transformations.
C. Use Cloud Storage for storage. Use Cloud Dataflow to run the transformations.
D. Use Cloud Storage for storage. Use Cloud Dataproc to run the transformations.

Answers: B is correct because of the requirement to use custom Spark transforms: use Cloud Dataproc. ANSI SQL queries require the use of BigQuery.
https://stackoverflow.com/questions/46436794/what-is-the-difference-between-google-cloud-dataflow-and-google-cloud-dataproc

12. You are designing a relational data repository on Google Cloud to grow as needed. The data will be
transactionally consistent and added from any location in the world. You want to monitor and adjust node
count for input traffic, which can spike unpredictably. What should you do?

A. Use Cloud Spanner for storage. Monitor storage usage and increase node count if more than 70%
utilized.
B. Use Cloud Spanner for storage. Monitor CPU utilization and increase node count if more than 70%
utilized for your time span.
C. Use Cloud Bigtable for storage. Monitor data stored and increase node count if more than 70% utilized.
D. Use Cloud Bigtable for storage. Monitor CPU utilization and increase node count if more than 70%
utilized for your time span.

Answers: B is correct because of the requirement for globally scalable transactions: use Cloud Spanner. CPU utilization is the recommended metric for scaling, per Google best practices, linked below.
https://cloud.google.com/spanner/docs/monitoring
https://cloud.google.com/bigtable/docs/monitoring-instance
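A sketch of the scaling rule behind answer B, assuming you poll the Spanner CPU-utilization metric yourself. The function name, and the 65% post-scale target, are our assumptions, not a documented API; only the 70% threshold comes from the answer above.

```python
import math

def recommend_node_count(current_nodes, cpu_utilization,
                         threshold=0.70, target=0.65):
    """If mean CPU utilization exceeds the 70% threshold, grow the node
    count so utilization would fall back to the target level; otherwise
    leave the instance alone."""
    if cpu_utilization <= threshold:
        return current_nodes
    return math.ceil(current_nodes * cpu_utilization / target)
```

For example, a 3-node instance at 90% CPU would be grown to 5 nodes, bringing projected utilization back under the target.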

13. You have a Spark application that writes data to Cloud Storage in Parquet format. You scheduled the application to run daily using DataProcSparkOperator and an Apache Airflow DAG in Cloud Composer. You want to add tasks to the DAG to make the data available to BigQuery users. You want to maximize query speed and configure partitioning and clustering on the table. What should you do?

A. Use “BashOperator” to call “bq insert”.


B. Use “BashOperator” to call “bq cp” with the “--append” flag.
C. Use “GoogleCloudStorageToBigQueryOperator” with “schema_object” pointing to a schema JSON in Cloud Storage and “source_format” set to “PARQUET”.


D. Use “BigQueryCreateExternalTableOperator” with “schema_object” pointing to a schema JSON in Cloud
Storage and “source_format” set to “PARQUET”.

Answers: C is correct because it loads the data and sets partitioning and clustering.
https://cloud.google.com/bigquery/docs/loading-data
https://airflow.incubator.apache.org/integration.html#bigquerycreateemptytableoperator
https://cloud.google.com/bigquery/docs/reference/bq-cli-reference
https://cloud.google.com/bigquery/docs/bq-command-line-tool
https://airflow.incubator.apache.org/integration.html#googlecloudstoragetobigqueryoperator
https://cloud.google.com/bigquery/external-data-sources

14. You have a website that tracks page visits for each user and then creates a Cloud Pub/Sub message with the session ID and URL of the page. You want to create a Cloud Dataflow pipeline that sums the total number of pages visited by each user and writes the result to BigQuery. User sessions time out after 30 minutes. Which type of Cloud Dataflow window should you choose?

A. A single global window


B. Fixed-time windows with a duration of 30 minutes
C. Session-based windows with a gap duration of 30 minutes
D. Sliding-time windows with a duration of 30 minutes and a new window every 5 minutes

Answers: C is correct because it continues to sum user page visits during their browsing session and
completes at the same time as the session timeout.
https://cloud.google.com/dataflow/docs/resources

15. You are designing a basket abandonment system for an ecommerce company. The system will send a
message to a user based on these rules: a). No interaction by the user on the site for 1 hour b). Has added
more than $30 worth of products to the basket c). Has not completed a transaction. You use Google Cloud
Dataflow to process the data and decide if a message should be sent. How should you design the
pipeline?

A. Use a fixed-time window with a duration of 60 minutes.


B. Use a sliding time window with a duration of 60 minutes.
C. Use a session window with a gap time duration of 60 minutes.


D. Use a global window with a time-based trigger with a delay of 60 minutes.

Answers: C is correct because it will send a message per user after that user is inactive for 60 minutes.
https://beam.apache.org/documentation/programming-guide/#windowing

16. You need to stream time-series data in Avro format, and then write this to both BigQuery and Cloud
Bigtable simultaneously using Cloud Dataflow. You want to achieve minimal end-to-end latency. Your
business requirements state this needs to be completed as quickly as possible. What should you do?

A. Create a pipeline and use ParDo transform.


B. Create a pipeline that groups the data into a PCollection and uses the Combine transform.
C. Create a pipeline that groups data using a PCollection and then uses Cloud Bigtable and BigQueryIO
transforms.
D. Create a pipeline that groups data using a PCollection, and then use Avro I/O transform to write to
Cloud Storage. After the data is written, load the data from Cloud Storage into BigQuery and Cloud
Bigtable.

Answers: C is correct because this is the right set of transformations that accepts and writes to the
required data stores.
https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1

17. Your company’s on-premises Apache Hadoop servers are approaching end-of-life, and IT has decided
to migrate the cluster to Google Cloud Dataproc. A like-for-like migration of the cluster would require 50
TB of Google Persistent Disk per node. The CIO is concerned about the cost of using that much block
storage. You want to minimize the storage cost of the migration. What should you do?

A. Put the data into Google Cloud Storage.


B. Use preemptible virtual machines (VMs) for the Cloud Dataproc cluster.
C. Tune the Cloud Dataproc cluster so that there is just enough disk for all data.
D. Migrate some of the cold data into Google Cloud Storage, and keep only the hot data in Persistent Disk.

Answers: A is correct because Google recommends using Google Cloud Storage instead of HDFS, as it is much more cost-effective, especially when jobs aren't running.
https://cloud.google.com/dataproc/docs/concepts/connectors/cloud-storage

18. You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud.
You want to support transactions that scale horizontally. You also want to optimize data for range queries
on non-key columns. What should you do?


A. Use Cloud SQL for storage. Add secondary indexes to support query patterns.
B. Use Cloud SQL for storage. Use Cloud Dataflow to transform data to support query patterns.
C. Use Cloud Spanner for storage. Add secondary indexes to support query patterns.
D. Use Cloud Spanner for storage. Use Cloud Dataflow to transform data to support query patterns.

Answers: C is correct because Cloud Spanner scales horizontally, and you can create secondary indexes
for the range queries that are required.

19. Your company is streaming real-time sensor data from their factory floor into Bigtable and they have
noticed extremely poor performance. How should the row key be redesigned to improve Bigtable
performance on queries that populate real-time dashboards?

A. Use a row key of the form <timestamp>.
B. Use a row key of the form <sensorid>.
C. Use a row key of the form <timestamp>#<sensorid>.
D. Use a row key of the form <sensorid>#<timestamp>.

Answers: D is correct because it will allow for retrieval of data based on both sensor id and timestamp but
without causing hotspotting.
https://cloud.google.com/bigtable/docs/schema-design
https://cloud.google.com/bigtable/docs/schema-design-time-series#ensure_that_your_row_key_avoids_hotspotting
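A minimal sketch of the winning key layout, with hypothetical names: the sensor ID leads so writes spread across tablets instead of hotspotting on the current time, and a fixed-width timestamp keeps each sensor's rows in time order for range scans.

```python
def make_row_key(sensor_id, ts_millis):
    """Build a Bigtable row key of the form <sensorid>#<timestamp>.
    Zero-padding the timestamp makes lexicographic order match
    chronological order within a sensor."""
    return f"{sensor_id}#{ts_millis:013d}"
```

A dashboard query for one sensor then becomes a simple prefix scan over `sensor_id + "#"`.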

20. You are developing an application on Google Cloud that will automatically generate subject labels for
users’ blog posts. You are under competitive pressure to add this feature quickly, and you have no
additional developer resources. No one on your team has experience with machine learning. What should
you do?

A. Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as
labels.
B. Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis
as labels.
C. Build and train a text classification model using TensorFlow. Deploy the model using Cloud Machine Learning Engine. Call the model from your application and process the results as labels.
D. Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.

Answers: A is correct because it provides a managed service and a fully trained model, and the user is
pulling the entities, which is the right label.


21. Your company is using WILDCARD tables to query data across multiple tables with similar names. The
SQL statement is currently failing with the error shown below. Which table name will make the SQL
statement work correctly?

[Image: the failing SQL statement and its error message] (https://media.gcp-examquestions.com/2020/04/05163352/GOOGLE-CLOUD-DATA-ENGINEER-1.jpg)

A. `bigquery-public-data.noaa_gsod.gsod`
B. bigquery-public-data.noaa_gsod.gsod*
C. ‘bigquery-public-data.noaa_gsod.gsod*’
D. `bigquery-public-data.noaa_gsod.gsod*`

Answers: D is correct because it follows the correct wildcard syntax of enclosing the table name in
backticks and including the * wildcard character.
https://cloud.google.com/bigquery/docs/reference/standard-sql/wildcard-table-reference
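Option D in context, as a query string: the whole table path is wrapped in backticks (not single quotes) and ends with the `*` wildcard. The `_TABLE_SUFFIX` filter is a common companion to wildcard tables; the year range here is just an illustrative value.

```python
# Wildcard table reference in BigQuery standard SQL.
query = """
SELECT *
FROM `bigquery-public-data.noaa_gsod.gsod*`
WHERE _TABLE_SUFFIX BETWEEN '1940' AND '1944'
"""
```

Options A (no wildcard) and B/C (missing or wrong quoting) each fail one of those two requirements.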

22. You are working on an ML-based application that will transcribe conversations between manufacturing
workers. These conversations are in English and between 30-40 sec long. Conversation recordings come
from old enterprise radio sets that have a low sampling rate of 8000 Hz, but you have a large dataset of
these recorded conversations with their transcriptions. You want to follow Google-recommended
practices. How should you proceed with building your application?

A. Use Cloud Speech-to-Text API, and send requests in a synchronous mode.


B. Use Cloud Speech-to-Text API, and send requests in an asynchronous mode.
C. Use Cloud Speech-to-Text API, but resample your captured recordings to a rate of 16000 Hz.
D. Train your own speech recognition model because you have an uncommon use case and you have a
labeled dataset.

Answers: A is correct because synchronous mode is recommended for short audio files.
https://cloud.google.com/speech-to-text/docs/reference/rest/v1beta1/RecognitionConfig
https://cloud.google.com/speech-to-text/docs/sync-recognize
https://cloud.google.com/speech-to-text/docs/best-practices

23. You are developing an application on Google Cloud that will label famous landmarks in users’ photos.
You are under competitive pressure to develop a predictive model quickly. You need to keep service costs
low. What should you do?

A. Build an application that calls the Cloud Vision API. Inspect the generated MID values to supply the
image labels.
B. Build an application that calls the Cloud Vision API. Pass client image locations as base64-encoded
strings.
C. Build and train a classification model with TensorFlow. Deploy the model using Cloud Machine Learning Engine. Pass client image locations as base64-encoded strings.
D. Build and train a classification model with TensorFlow. Deploy the model using Cloud Machine Learning Engine. Inspect the generated MID values to supply the image labels.

Answers: B is correct because of the requirement to quickly develop a model that generates landmark
labels from photos. This is supported in Cloud Vision API; see the link below.
https://cloud.google.com/vision/docs/labels

24. You are building a data pipeline on Google Cloud. You need to select services that will host a deep
neural network machine-learning model also hosted on Google Cloud. You also need to monitor and run
jobs that could occasionally fail. What should you do?

A. Use Cloud Machine Learning to host your model. Monitor the status of the Operation object for ‘error’
results.
B. Use Cloud Machine Learning to host your model. Monitor the status of the Jobs object for ‘failed’ job
states.
C. Use a Kubernetes Engine cluster to host your model. Monitor the status of the Jobs object for ‘failed’ job
states.
D. Use a Kubernetes Engine cluster to host your model. Monitor the status of Operation object for ‘error’
results.

Answers: B is correct because of the requirement to host an ML DNN and Google-recommended
monitoring object (Jobs); see the links below.
https://cloud.google.com/ml-engine/docs/managing-models-jobs (https://cloud.google.com/ml-
engine/docs/managing-models-jobs)
https://cloud.google.com/kubernetes-engine/docs/how-to/monitoring
(https://cloud.google.com/kubernetes-engine/docs/how-to/monitoring)

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 11/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

25. You work on a regression problem in a natural language processing domain, and you have 100M
labeled examples in your dataset. You have randomly shuffled your data and split your dataset into
training and test samples (in a 90/10 ratio). After you have trained the neural network and evaluated your
model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high
on the train set as on the test set. How should you improve the performance of your model?

A. Increase the share of the test sample in the train-test split.


B. Try to collect more data and increase the size of your dataset.
C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.
D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of
vocabularies or n-grams used to avoid underfitting.

Answers: D is correct since increasing model complexity generally helps when you have an underfitting
problem.
https://developers.google.com/machine-learning/crash-course/generalization/peril-of-over tting
(https://developers.google.com/machine-learning/crash-course/generalization/peril-of-over tting)
https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-over tting-2bd5d99abe5d
(https://towardsdatascience.com/deep-learning-3-more-on-cnns-handling-over tting-2bd5d99abe5d)
https://developers.google.com/machine-learning/crash-course/regularization-for-simplicity/l2-
regularization (https://developers.google.com/machine-learning/crash-course/regularization-for-
simplicity/l2-regularization)
https://developers.google.com/machine-learning/crash-course/training-neural-networks/best-practices
(https://developers.google.com/machine-learning/crash-course/training-neural-networks/best-practices)

26. You are using Cloud Pub/Sub to stream inventory updates from many point-of-sale (POS) terminals
into BigQuery. Each update event has the following information: product identifier “prodSku”, change
increment “quantityDelta”, POS identification “termId”, and “messageId” which is created for each push
attempt from the terminal. During a network outage, you discovered that duplicated messages were sent,
causing the inventory system to over-count the changes. You determine that the terminal application has
design problems and may send the same event more than once during push retries. You want to ensure
that the inventory update is accurate. What should you do?

A. Inspect the “publishTime” of each message. Make sure that messages whose “publishTime” values
match rows in the BigQuery table are discarded.
B. Inspect the “messageId” of each message. Make sure that any messages whose “messageId” values
match corresponding rows in the BigQuery table are discarded.
C. Instead of specifying a change increment for “quantityDelta”, always use the derived inventory value
after the increment has been applied. Name the new attribute “adjustedQuantity”.
D. Add another attribute orderId to the message payload to mark the unique check-out order across all
terminals. Make sure that messages whose “orderId” and “prodSku” values match corresponding rows in
the BigQuery table are discarded.

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 12/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

Answers: D is correct because the client application must include a unique identifier to disambiguate
possible duplicates due to push retries.
https://cloud.google.com/pubsub/docs/faq#duplicates
(https://cloud.google.com/pubsub/docs/faq#duplicates)
https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage
(https://cloud.google.com/pubsub/docs/reference/rest/v1/PubsubMessage)

27. You designed a database for patient records as a pilot project to cover a few hundred patients in three
clinics. Your design used a single database table to represent all patients and their visits, and you used
self-joins to generate reports. The server resource utilization was at 50%. Since then, the scope of the
project has expanded. The database table must now store 100 times more patient records. You can no
longer run the reports, because they either take too long or they encounter errors with insufficient
compute resources. How should you adjust the database design?

A. Add capacity (memory and disk space) to the database server by the order of 200.
B. Shard the tables into smaller ones based on date ranges, and only generate reports with pre-specified
date ranges.
C. Normalize the master patient-record table into the patients table and the visits table, and create other
necessary tables to avoid self-join.
D. Partition the table into smaller tables, with one for each clinic. Run queries against the smaller table
pairs, and use unions for consolidated reports.

Answers: C is correct because this option provides the least amount of inconvenience over using
pre-specified date ranges or one table per clinic while also increasing performance due to avoiding self-joins.
https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
(https://cloud.google.com/bigquery/docs/best-practices-performance-patterns)
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#explicit-alias-visibility
(https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#explicit-alias-visibility)

28. Your startup has never implemented a formal security policy. Currently, everyone in the company has
access to the datasets stored in Google BigQuery. Teams have the freedom to use the service as they see
fit, and they have not documented their use cases. You have been asked to secure the data warehouse.
You need to discover what everyone is doing. What should you do first?

A. Use Google Stackdriver Audit Logs to review data access.


B. Get the identity and access management (IAM) policy of each table.
C. Use Stackdriver Monitoring to see the usage of BigQuery query slots.
D. Use the Google Cloud Billing API to see what account the warehouse is being billed to.

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 13/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

Answers: A is correct because this is the best way to get granular access to data showing which users are
accessing which data.
https://cloud.google.com/bigquery/docs/reference/auditlogs/#mapping-audit-entries-to-log-streams
(https://cloud.google.com/bigquery/docs/reference/auditlogs/#mapping-audit-entries-to-log-streams)
https://cloud.google.com/bigquery/docs/monitoring#slots-available
(https://cloud.google.com/bigquery/docs/monitoring#slots-available)

29. You created a job which runs daily to import highly sensitive data from an on-premises location to
Cloud Storage. You also set up a streaming data insert into Cloud Storage via a Kafka node that is running
on a Compute Engine instance. You need to encrypt the data at rest and supply your own encryption key.
Your key should not be stored in the Google Cloud. What should you do?

A. Create a dedicated service account, and use encryption at rest to reference your data stored in Cloud
Storage and Compute Engine data as part of your API service calls.
B. Upload your own encryption key to Cloud Key Management Service, and use it to encrypt your data in
Cloud Storage. Use your uploaded encryption key and reference it as part of your API service calls to
encrypt your data in the Kafka node hosted on Compute Engine.
C. Upload your own encryption key to Cloud Key Management Service, and use it to encrypt your data in
your Kafka node hosted on Compute Engine.
D. Supply your own encryption key, and reference it as part of your API service calls to encrypt your data
in Cloud Storage and your Kafka node hosted on Compute Engine.

Answers: D is correct because the scenario requires you to supply your own key and to keep it out of
Google Cloud; this is also a Google-recommended practice.

30. You are working on a project with two compliance requirements. The first requirement states that
your developers should be able to see the Google Cloud Platform billing charges for only their own
projects. The second requirement states that your finance team members can set budgets and view the
current charges for all projects in the organization. The finance team should not be able to view the
project contents. You want to set permissions. What should you do?

A. Add the finance team members to the default IAM Owner role. Add the developers to a custom role
that allows them to see their own spend only.
B. Add the finance team members to the Billing Administrator role for each of the billing accounts that
they need to manage. Add the developers to the Viewer role for the Project.
C. Add the developers and finance managers to the Viewer role for the Project.
D. Add the finance team to the Viewer role for the Project. Add the developers to the Security Reviewer
role for each of the billing accounts.

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 14/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

Answers: B is correct because it uses the principle of least privilege for IAM roles; use the Billing
Administrator IAM role for that job function.
https://cloud.google.com/iam/docs/job-functions/billing (https://cloud.google.com/iam/docs/job-
functions/billing)

32. Suppose you have a table that includes a nested column called “city” inside a column called “person”,
but when you try to submit the following query in BigQuery, it gives you an error.
SELECT person FROM `project1.example.table1` WHERE city = “London”
How would you fix the error?

A. Add “, UNNEST(person)” before the WHERE clause.


B. Change “person” to “person.city”.
C. Change “person” to “city.person”.
D. Add “, UNNEST(city)” before the WHERE clause.
Answer: A
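
The corrected query adds ", UNNEST(person)" before the WHERE clause: SELECT person FROM `project1.example.table1`, UNNEST(person) WHERE city = “London”. UNNEST expands a repeated field into one element per row so its subfields can be filtered. A rough Python analogue of that expansion (the table contents and helper name are invented for illustration):

```python
def unnest_filter(rows, repeated_field, predicate):
    """Rough analogue of BigQuery's UNNEST: expand a repeated RECORD field
    into individual elements, keeping the containing row when an element
    satisfies the predicate (the WHERE clause)."""
    for row in rows:
        for element in row[repeated_field]:
            if predicate(element):
                yield row

# Invented sample data shaped like the question's nested schema.
table = [
    {"id": 1, "person": [{"name": "Ann", "city": "London"}]},
    {"id": 2, "person": [{"name": "Bob", "city": "Paris"}]},
]

matches = list(unnest_filter(table, "person", lambda p: p["city"] == "London"))
```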

33. What are two of the benefits of using denormalized data structures in BigQuery?

A. Reduces the amount of data processed, reduces the amount of storage required
B. Increases query speed, makes queries simpler
C. Reduces the amount of storage required, increases query speed
D. Reduces the amount of data processed, increases query speed
Answer: B

34. Which of these statements about exporting data from BigQuery is false?

A. To export more than 1 GB of data, you need to put a wildcard in the destination filename.
B. The only supported export destination is Google Cloud Storage.
C. Data can only be exported in JSON or Avro format.
D. The only compression option available is GZIP.
Answer: C

35. What are all of the BigQuery operations that Google charges for?

A. Storage, queries, and streaming inserts


B. Storage, queries, and loading data from a le
C. Storage, queries, and exporting data
D. Queries and streaming inserts
Answer: A

36. Which of the following is not possible using primitive roles?

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 15/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

A. Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.
B. Give UserA owner access and UserB editor access for all datasets in a project.
C. Give a user access to view all datasets in a project, but not run queries on them.
D. Give GroupA owner access and GroupB editor access for all datasets in a project.
Answer: C

37. Which of these statements about BigQuery caching is true?

A. By default, a query’s results are not cached.


B. BigQuery caches query results for 48 hours.
C. Query results are cached even if you specify a destination table.
D. There is no charge for a query that retrieves its results from cache.
Answer: D

38. Which of these sources can you not load data into BigQuery from?

A. File upload
B. Google Drive
C. Google Cloud Storage
D. Google Cloud SQL
Answer: D

39. Which of the following statements about Legacy SQL and Standard SQL is not true?

A. Standard SQL is the preferred query language for BigQuery.


B. If you write a query in Legacy SQL, it might generate an error if you try to run it with Standard SQL.
C. One difference between the two query languages is how you specify fully-qualified table names (i.e.
table names that include their associated project name).
D. You need to set a query language for each dataset and the default is Standard SQL.
Answer: D

40. How would you query specific partitions in a BigQuery table?

A. Use the DAY column in the WHERE clause


B. Use the EXTRACT(DAY) clause
C. Use the _PARTITIONTIME pseudo-column in the WHERE clause
D. Use DATE BETWEEN in the WHERE clause
Answer: C

41. Which SQL keyword can be used to reduce the number of columns processed by BigQuery?

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 16/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

A. BETWEEN
B. WHERE
C. SELECT
D. LIMIT
Answer: C

42. To give a user read permission for only the first three columns of a table, which access control method
would you use?

A. Primitive role
B. Predefined role
C. Authorized view
D. It’s not possible to give access to only the first three columns of a table.
Answer: C

43. What are two methods that can be used to denormalize tables in BigQuery?

A. 1) Split table into multiple tables;


2) Use a partitioned table
B. 1) Join tables into one table;
2) Use nested repeated fields
C. 1) Use a partitioned table;
2) Join tables into one table
D. 1) Use nested repeated fields;
2) Use a partitioned table
Answer: B

44. Which of these is not a supported method of putting data into a partitioned table?

A. If you have existing data in a separate file for each day, then create a partitioned table and upload each
file into the appropriate partition.
B. Run a query to get the records for a specific day from an existing table and for the destination table,
specify a partitioned table ending with the day in the format “$YYYYMMDD”.
C. Create a partitioned table and stream new records to it every day.
D. Use ORDER BY to put a table’s rows into chronological order and then change the table’s type to
“Partitioned”.
Answer: D

45. Which of these operations can you perform from the BigQuery Web UI?

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 17/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

A. Upload a file in SQL format.


B. Load data with nested and repeated fields.
C. Upload a 20 MB file.
D. Upload multiple files using a wildcard.
Answer: BD

46. Which methods can be used to reduce the number of rows processed by BigQuery?

A. Splitting tables into multiple tables; putting data in partitions


B. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
C. Putting data in partitions; using the LIMIT clause
D. Splitting tables into multiple tables; using the LIMIT clause
Answer: A

47. Why do you need to split a machine learning dataset into training data and test data?

A. So you can try two different sets of features


B. To make sure your model is generalized for more than just the training data
C. To allow you to create unit tests in your code
D. So you can use one dataset for a wide model and one for a deep model
Answer: B
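
The shuffle-and-split procedure from question 25 (random shuffle, then a 90/10 train/test split) can be sketched with the standard library alone; the seed and ratio below are illustrative defaults, not prescribed values.

```python
import random

def train_test_split(examples, test_fraction=0.1, seed=42):
    """Shuffle a copy of the data, then hold out the last fraction as the
    test set, so the model is evaluated on examples it never trained on."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = train_test_split(range(100))
```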

48. Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2
answers)?

A. Weights
B. Biases
C. Continuous features
D. Input values
Answer: AB

49. The CUSTOM tier for Cloud Machine Learning Engine allows you to specify the number of which types
of cluster nodes?

A. Workers
B. Masters, workers, and parameter servers
C. Workers and parameter servers
D. Parameter servers
Answer: C

50. Which software libraries are supported by Cloud Machine Learning Engine?

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 18/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

A. Theano and TensorFlow


B. Theano and Torch
C. TensorFlow
D. TensorFlow and Torch
Answer: C

51. Which TensorFlow function can you use to configure a categorical column if you don’t know all of the
possible values for that column?

A. categorical_column_with_vocabulary_list
B. categorical_column_with_hash_bucket
C. categorical_column_with_unknown_values
D. sparse_column_with_keys
Answer: B

52. Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)

A. The wide model is used for memorization, while the deep model is used for generalization.
B. A good use for the wide and deep model is a recommender system.
C. The wide model is used for generalization, while the deep model is used for memorization.
D. A good use for the wide and deep model is a small-scale linear regression problem.
Answer: AB

53. To run a TensorFlow training job on your own computer using Cloud Machine Learning Engine, what
would your command start with?

A. gcloud ml-engine local train


B. gcloud ml-engine jobs submit training
C. gcloud ml-engine jobs submit training local
D. You can’t run a TensorFlow program on your own computer using Cloud ML Engine.
Answer: A

54. If you want to create a machine learning model that predicts the price of a particular stock based on its
recent price history, what type of estimator should you use?
A. Unsupervised learning
B. Regressor
C. Classi er
D. Clustering estimator
Answer: B

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 19/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

55. Suppose you have a dataset of images that are each labeled as to whether or not they contain a
human face. To create a neural network that recognizes human faces in images using this labeled dataset,
what approach would likely be the most effective?
A. Use K-means Clustering to detect faces in the pixels.
B. Use feature engineering to add features for eyes, noses, and mouths to the input data.
C. Use deep learning by creating a neural network with multiple hidden layers to automatically detect
features of faces.
D. Build a neural network with an input layer of pixels, a hidden layer, and an output layer with two
categories.
Answer: C

56. What are two of the characteristics of using online prediction rather than batch prediction?
A. It is optimized to handle a high volume of data instances in a job and to run more complex models.
B. Predictions are returned in the response message.
C. Predictions are written to output files in a Cloud Storage location that you specify.
D. It is optimized to minimize the latency of serving predictions.
Answer: BD

57. Which of these are examples of a value in a sparse vector? (Select 2 answers.)
A. [0, 5, 0, 0, 0, 0]
B. [0, 0, 0, 1, 0, 0, 1]
C. [0, 1]
D. [1, 0, 0, 0, 0, 0, 0]
Answer: CD
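
Options C and D are one-hot vectors: exactly one element is 1 and the rest are 0, which is the usual way a sparse categorical value is represented. A minimal sketch (the vocabularies are invented):

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as a one-hot vector over the vocabulary:
    1 at the value's position, 0 everywhere else."""
    return [1 if v == value else 0 for v in vocabulary]
```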

58. How can you get a neural network to learn about relationships between categories in a categorical
feature?

A. Create a multi-hot column


B. Create a one-hot column
C. Create a hash bucket
D. Create an embedding column
Answer: D
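
An embedding column maps each category to a short dense vector whose values are learned during training, so related categories can end up close together in that vector space. A toy lookup, with hand-picked (not learned) vectors:

```python
# Toy embedding table: category -> dense vector. The values here are
# invented for illustration; a trained model would learn them, and could
# place related categories (e.g. "red" and "crimson") near each other.
embeddings = {
    "red":     [0.9, 0.1],
    "crimson": [0.85, 0.15],
    "blue":    [0.1, 0.9],
}

def embed(category, table, dim=2):
    """Look up a category's dense vector; unknown categories map to zeros."""
    return table.get(category, [0.0] * dim)
```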

59. If a dataset contains rows with individual people and columns for year of birth, country, and income,
how many of the columns are continuous and how many are categorical?
A. 1 continuous and 2 categorical
B. 3 categorical
C. 3 continuous
D. 2 continuous and 1 categorical
Answer: D

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 20/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

60. Which of the following are examples of hyperparameters? (Select 2 answers.)

A. Number of hidden layers


B. Number of nodes in each hidden layer
C. Biases
D. Weights
Answer: AB

61. Which of the following are feature engineering techniques? (Select 2 answers)

A. Hidden feature layers


B. Feature prioritization
C. Crossed feature columns
D. Bucketization of a continuous feature
Answer: CD

62. You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a
sink?

A. Both batch and streaming


B. BigQuery cannot be used as a sink
C. Only batch
D. Only streaming
Answer: A

63. You have a job that you want to cancel. It is a streaming pipeline, and you want to ensure that any data
that is in-flight is processed and written to the output. Which of the following commands can you use on
the Dataflow monitoring console to stop the pipeline job?

A. Cancel
B. Drain
C. Stop
D. Finish
Answer: B

64. When running a pipeline that has a BigQuery source on your local machine, you continue to get
permission denied errors. What could be the reason for that?

A. Your gcloud does not have access to the BigQuery resources


B. BigQuery cannot be accessed from local machines
C. You are missing gcloud on your machine

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 21/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

D. Pipelines cannot be run locally


Answer: A

65. What Dataflow concept determines when a Window’s contents should be output based on certain
criteria being met?

A. Sessions
B. OutputCriteria
C. Windows
D. Triggers
Answer: D

66. Which of the following is NOT one of the three main types of triggers that Dataflow supports?

A. Trigger based on element size in bytes


B. Trigger that is a combination of other triggers
C. Trigger based on element count
D. Trigger based on time
Answer: A

67. Which Java SDK class can you use to run your Dataflow programs locally?

A. LocalRunner
B. DirectPipelineRunner
C. MachineRunner
D. LocalPipelineRunner
Answer: B

68. The Dataflow SDKs have been recently transitioned into which Apache service?

A. Apache Spark
B. Apache Hadoop
C. Apache Kafka
D. Apache Beam
Answer: D

69. The _________ for Cloud Bigtable makes it possible to use Cloud Bigtable in a Cloud Dataflow pipeline.

A. Cloud Dataflow connector


B. Dataflow SDK
C. BigQuery API

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 22/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

D. BigQuery Data Transfer Service


Answer: A

70. Does Dataflow process batch data pipelines or streaming data pipelines?

A. Only Batch Data Pipelines


B. Both Batch and Streaming Data Pipelines
C. Only Streaming Data Pipelines
D. None of the above
Answer: B

71. You are planning to use Google’s Dataflow SDK to analyze customer data such as displayed below.
Your project requirement is to extract only the customer name from the data source and then write to an
output PCollection.
Tom,555 X street –
Tim,553 Y street –
Sam, 111 Z street –
Which operation is best suited for the above data processing requirement?

A. ParDo
B. Sink API
C. Source API
D. Data extraction
Answer: A
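
A ParDo applies a user-written function to each element of a PCollection. Leaving the Beam SDK aside, the per-element logic for pulling the customer name out of records like "Tom,555 X street" can be sketched in plain Python (the helper name is invented):

```python
def extract_name(record):
    """Per-element transform: keep only the customer name from a
    'Name,Address' record, tolerating stray whitespace around the name."""
    return record.split(",", 1)[0].strip()

records = ["Tom,555 X street", "Tim,553 Y street", "Sam, 111 Z street"]
names = [extract_name(r) for r in records]  # the output PCollection, roughly
```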

72. Which Cloud Dataflow / Beam feature should you use to aggregate data in an unbounded data source
every hour based on the time when the data entered the pipeline?

A. An hourly watermark
B. An event time trigger
C. The withAllowedLateness method
D. A processing time trigger
Answer: D

73. Which of the following is NOT true about Dataflow pipelines?

A. Dataflow pipelines are tied to Dataflow, and cannot be run on any other runner
B. Dataflow pipelines can consume data from other Google Cloud services
C. Dataflow pipelines can be programmed in Java
D. Dataflow pipelines use a unified programming model, so can work both with streaming and batch data
sources
Answer: A
https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 23/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

74. You are developing a software application using Google’s Dataflow SDK, and want to use conditionals,
for loops, and other complex programming structures to create a branching pipeline. Which component
will be used for the data processing operation?

A. PCollection
B. Transform
C. Pipeline
D. Sink API
Answer: B

75. Which of the following IAM roles does your Compute Engine account require to be able to run pipeline
jobs?

A. dataflow.worker
B. dataflow.compute
C. dataflow.developer
D. dataflow.viewer
Answer: A

76. Which of the following is not true about Dataflow pipelines?

A. Pipelines are a set of operations


B. Pipelines represent a data processing job
C. Pipelines represent a directed graph of steps
D. Pipelines can share data between instances
Answer: D

77. By default, which of the following windowing behaviors does Dataflow apply to unbounded data sets?

A. Windows at every 100 MB of data


B. Single, Global Window
C. Windows at every 1 minute
D. Windows at every 10 minutes
Answer: B

78. Which of the following job types are supported by Cloud Dataproc (select 3 answers)?

A. Hive
B. Pig
C. YARN
D. Spark
Answer: ABD

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 24/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

79. What are the minimum permissions needed for a service account used with Google Dataproc?

A. Execute to Google Cloud Storage; write to Google Cloud Logging


B. Write to Google Cloud Storage; read to Google Cloud Logging
C. Execute to Google Cloud Storage; execute to Google Cloud Logging
D. Read and write to Google Cloud Storage; write to Google Cloud Logging
Answer: D

80. Which role must be assigned to a service account used by the virtual machines in a Dataproc cluster so
they can execute jobs?

A. Dataproc Worker
B. Dataproc Viewer
C. Dataproc Runner
D. Dataproc Editor
Answer: A

81. When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these
four values are required: project, region, name, and ____.

A. zone
B. node
C. label
D. type
Answer: A

82. Which Google Cloud Platform service is an alternative to Hadoop with Hive?
A. Cloud Data ow
B. Cloud Bigtable
C. BigQuery
D. Cloud Datastore
Answer: C

83. Which of these rules apply when you add preemptible workers to a Dataproc cluster (select 2
answers)?

A. Preemptible workers cannot use persistent disk.


B. Preemptible workers cannot store data.
C. If a preemptible worker is reclaimed, then a replacement worker must be added manually.
D. A Dataproc cluster cannot have only preemptible workers.
Answer: BD

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 25/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

84. When using Cloud Dataproc clusters, you can access the YARN web interface by configuring a browser
to connect through a ____ proxy.

A. HTTPS
B. VPN
C. SOCKS
D. HTTP
Answer: C

85. Cloud Dataproc is a managed Apache Hadoop and Apache _____ service.

A. Blaze
B. Spark
C. Fire
D. Ignite
Answer: B

86. Which action can a Cloud Dataproc Viewer perform?

A. Submit a job.
B. Create a cluster.
C. Delete a cluster.
D. List the jobs.
Answer: D

87. Cloud Dataproc charges you only for what you really use with _____ billing.

A. month-by-month
B. minute-by-minute
C. week-by-week
D. hour-by-hour
Answer: B

88. The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc
cluster ____.

A. application node
B. conditional node
C. master node
D. worker node
Answer: C

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 26/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

89. Which of these is NOT a way to customize the software on Dataproc cluster instances?

A. Set initialization actions


B. Modify configuration files using cluster properties
C. Configure the cluster using Cloud Deployment Manager
D. Log into the master node and make changes from there
Answer: C

90. In order to securely transfer web traffic data from your computer’s web browser to the Cloud Dataproc
cluster you should use a(n) _____.

A. VPN connection
B. Special browser
C. SSH tunnel
D. FTP connection
Answer: C

91. All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud
Bigtable node.

A. before
B. after
C. only if
D. once
Answer: A

92. What is the general recommendation when designing your row keys for a Cloud Bigtable schema?

A. Include multiple time series values within the row key


B. Keep the row key as an 8-bit integer
C. Keep your row key reasonably short
D. Keep your row key as long as the field permits
Answer: C

93. Which of the following statements is NOT true regarding Bigtable access roles?

A. Using IAM roles, you cannot give a user access to only one table in a project, rather than all tables in a
project.
B. To give a user access to only one table in a project, grant the user the Bigtable Editor role for that table.
C. You can configure access control only at the project level.
D. To give a user access to only one table in a project, you must configure access through your application.
Answer: B

https://gcp-examquestions.com/gcp-pro-data-engineer-practice/ 27/34
10/26/2020 Google Professional Cloud Data Engineer Practice Exam

94. For the best possible performance, what is the recommended zone for your Compute Engine instance
and Cloud Bigtable instance?

A. Have the Compute Engine instance in the furthest zone from the Cloud Bigtable instance.
B. Have both the Compute Engine instance and the Cloud Bigtable instance in different zones.
C. Have both the Compute Engine instance and the Cloud Bigtable instance in the same zone.
D. Have the Cloud Bigtable instance in the same zone as all of the consumers of your data.
Answer: C

95. Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular
node in a Bigtable cluster (select 2 answers)?

A. A sequential numeric ID
B. A timestamp followed by a stock symbol
C. A non-sequential numeric ID
D. A stock symbol followed by a timestamp
Answer: AB

96. When a Cloud Bigtable node fails, ____ is lost.

A. all data
B. no data
C. the last transaction
D. the time dimension
Answer: B

97. Which is not a valid reason for poor Cloud Bigtable performance?

A. The workload isn’t appropriate for Cloud Bigtable.


B. The table’s schema is not designed correctly.
C. The Cloud Bigtable cluster has too many nodes.
D. There are issues with the network connection.
Answer: C

98. Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?

A. Field promotion
B. Randomization
C. Salting
D. Hashing
Answer: A
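
Field promotion means moving an identifying field (here a stock symbol, as in question 95) in front of the timestamp in the row key, so new writes are spread across as many key ranges as there are symbols instead of all arriving at the "latest timestamp" end of the table. A minimal sketch, with made-up symbols and timestamps:

```python
# Sample (symbol, unix-timestamp) ticks, invented for illustration.
ticks = [
    ("GOOG", 1617200000), ("AMZN", 1617200001),
    ("GOOG", 1617200002), ("AMZN", 1617200003),
]

# Anti-pattern: timestamp first -- keys arrive in strictly increasing
# lexicographic order, so every write hits the same tablet.
timestamp_first = [f"{ts}#{sym}" for sym, ts in ticks]

# Field promotion: symbol first -- each symbol's rows stay contiguous
# (cheap per-symbol scans) and writes fan out across the keyspace.
promoted = [f"{sym}#{ts}" for sym, ts in ticks]

print(sorted(promoted))
```

Unlike randomization, salting, or hashing, field promotion keeps the key meaningful and range-scannable, which is why it is the preferred technique.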

99. When you design a Google Cloud Bigtable schema it is recommended that you _________.

A. Avoid schema designs that are based on NoSQL concepts
B. Create schema designs that are based on a relational database design
C. Avoid schema designs that require atomicity across rows
D. Create schema designs that require atomicity across rows
Answer: C

100. Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for
Google Cloud Bigtable?

A. You expect to store at least 10 TB of data.
B. You will mostly run batch workloads with scans and writes, rather than frequently executing random
reads of a small number of rows.
C. You need to integrate with Google BigQuery.
D. You will not use the data to back a user-facing or latency-sensitive application.
Answer: C

101. Cloud Bigtable is Google’s ______ Big Data database service.

A. Relational
B. mySQL
C. NoSQL
D. SQL Server
Answer: C

102. When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?

A. 500 TB
B. 1 GB
C. 1 TB
D. 500 GB
Answer: C

103. If you’re running a performance test that depends upon Cloud Bigtable, all the choices except one
below are recommended steps. Which is NOT a recommended step to follow?

A. Do not use a production instance.
B. Run your test for at least 10 minutes.
C. Before you test, run a heavy pre-test for several minutes.
D. Use at least 300 GB of data.
Answer: A

104. Cloud Bigtable is a recommended option for storing very large amounts of
____________________________?

A. multi-keyed data with very high latency
B. multi-keyed data with very low latency
C. single-keyed data with very low latency
D. single-keyed data with very high latency
Answer: C

105. Google Cloud Bigtable indexes a single value in each row. This value is called the _______.

A. primary key
B. unique key
C. row key
D. master key
Answer: C
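
Because the row key is the only indexed value, lookups and scans by row-key prefix are cheap, while any query on other fields requires a full table scan. The in-memory sketch below models this with a sorted key list and binary search; the keys and values are invented for illustration:

```python
import bisect

# Toy model: rows kept sorted by row key, as Bigtable stores them.
rows = sorted([
    ("AMZN#1617200000", {"price": 3094.08}),
    ("GOOG#1617200000", {"price": 2062.52}),
    ("GOOG#1617200060", {"price": 2063.10}),
])
keys = [k for k, _ in rows]

def prefix_scan(prefix: str):
    """Return rows whose key starts with `prefix`, via binary search."""
    start = bisect.bisect_left(keys, prefix)  # jump straight to the range
    out = []
    for k, v in rows[start:]:
        if not k.startswith(prefix):
            break  # sorted order: once past the prefix, we're done
        out.append((k, v))
    return out

print([k for k, _ in prefix_scan("GOOG#")])
```

This is why row-key design (questions 95 and 98) matters so much: the row key is effectively the table's one and only index.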

106. What is the HBase Shell for Cloud Bigtable?

A. The HBase shell is a GUI based interface that performs administrative tasks, such as creating and
deleting tables.
B. The HBase shell is a command-line tool that performs administrative tasks, such as creating and
deleting tables.
C. The HBase shell is a hypervisor based shell that performs administrative tasks, such as creating and
deleting new virtualized instances.
D. The HBase shell is a command-line tool that performs only user account management functions to
grant access to Cloud Bigtable instances.
Answer: B

107. What is the recommended action to switch between SSD and HDD storage for your Google Cloud Bigtable instance?

A. create a third instance and sync the data from the two storage types via batch jobs
B. export the data from the existing instance and import the data into a new instance
C. run parallel instances where one is HDD and the other is SDD
D. the selection is final and you must resume using the same storage type.
Answer: B

Notes: Hi all, this Google Professional Cloud Data Engineer practice exam will familiarize you with the
types of questions you may encounter on the certification exam and help you determine your
readiness, or whether you need more preparation and/or experience. Successful completion of the practice
exam does not guarantee you will pass the certification exam, as the actual exam is longer and covers
a wider range of topics.

We highly recommend the Google Professional Cloud Data Engineer Guarantee part, because it includes
real questions with highlighted answers collected from our exam. It will help you pass the exam more
easily.