ExamTopics PDE Questions Continued

Flowlogistic is a logistics provider facing challenges with outdated infrastructure and the need for real-time shipment tracking and analytics. They aim to migrate to the cloud to enhance their operations and use their proprietary technology for better resource deployment and predictive analytics. MJTelco, a telecom startup, seeks to build a scalable data infrastructure for real-time analysis and machine learning, emphasizing security and efficient data transport as they expand their network capabilities.


#34.

Flowlogistic Case Study -

Company Overview -
Flowlogistic is a leading logistics and supply chain provider. They help businesses throughout the world
manage their resources and transport them to their final destination. The company has grown rapidly,
expanding their offerings to include rail, truck, aircraft, and oceanic shipping.

Company Background -
The company started as a regional trucking company, and then expanded into other logistics markets. Because
they have not updated their infrastructure, managing and tracking orders and shipments has become a
bottleneck. To improve operations, Flowlogistic developed proprietary technology for tracking shipments in real
time at the parcel level. However, they are unable to deploy it because their technology stack, based on Apache
Kafka, cannot support the processing volume. In addition, Flowlogistic wants to further analyze their orders and
shipments to determine how best to deploy their resources.

Solution Concept -
Flowlogistic wants to implement two concepts using the cloud:
✑ Use their proprietary technology in a real-time inventory-tracking system that indicates the location of their
loads
✑ Perform analytics on all their orders and shipment logs, which contain both structured and unstructured
data, to determine how best to deploy resources, and which markets to expand into. They also want to use
predictive analytics to learn earlier when a shipment will be delayed.

Existing Technical Environment -


Flowlogistic architecture resides in a single data center:
✑ Databases
8 physical servers in 2 clusters
- SQL Server - user data, inventory, static data
3 physical servers
- Cassandra - metadata, tracking messages
10 Kafka servers - tracking message aggregation and batch insert
✑ Application servers - customer front end, middleware for order/customs
60 virtual machines across 20 physical servers
- Tomcat - Java services
- Nginx - static content
- Batch servers
✑ Storage appliances
- iSCSI for virtual machine (VM) hosts
- Fibre Channel storage area network (FC SAN) - SQL Server storage
- Network-attached storage (NAS) - image storage, logs, backups
✑ 10 Apache Hadoop/Spark servers
- Core Data Lake
- Data analysis workloads
✑ 20 miscellaneous servers
- Jenkins, monitoring, bastion hosts

Business Requirements -
✑ Build a reliable and reproducible environment with scaled parity of production

✑ Aggregate data in a centralized Data Lake for analysis

✑ Use historical data to perform predictive analytics on future shipments
✑ Accurately track every shipment worldwide using proprietary technology
✑ Improve business agility and speed of innovation through rapid provisioning of new resources
✑ Analyze and optimize architecture for performance in the cloud
✑ Migrate fully to the cloud if all other requirements are met

Technical Requirements -
✑ Handle both streaming and batch data
✑ Migrate existing Hadoop workloads
✑ Ensure architecture is scalable and elastic to meet the changing demands of the company.
✑ Use managed services whenever possible
✑ Encrypt data in flight and at rest
✑ Connect a VPN between the production data center and cloud environment

CEO Statement -
We have grown so quickly that our inability to upgrade our infrastructure is really hampering further growth and
efficiency. We are efficient at moving shipments around the world, but we are inefficient at moving data around.
We need to organize our information so we can more easily understand where our customers are and what they
are shipping.

CTO Statement -
IT has never been a priority for us, so as our data has grown, we have not invested enough in our technology. I
have a good staff to manage IT, but they are so busy managing our infrastructure that I cannot get them to do
the things that really matter, such as organizing our data, building the analytics, and figuring out how to
implement the CFO's tracking technology.

CFO Statement -
Part of our competitive advantage is that we penalize ourselves for late shipments and deliveries. Knowing
where our shipments are at all times has a direct correlation to our bottom line and profitability. Additionally, I
don't want to commit capital to building out a server environment.

Flowlogistic wants to use Google BigQuery as their primary analysis system, but they still have Apache Hadoop
and Spark workloads that they cannot move to BigQuery. Flowlogistic does not know how to store the data that
is common to both workloads. What should they do?

• A. Store the common data in BigQuery as partitioned tables.

• B. Store the common data in BigQuery and expose authorized views.

• C. Store the common data encoded as Avro in Google Cloud Storage.

• D. Store the common data in the HDFS storage for a Google Cloud Dataproc cluster.

#35.
Continuation of question 34.
You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking
software. The system must be able to ingest data from a variety of global sources, process and query in real-
time, and store the data reliably. Which combination of GCP products should you choose?

• A. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

• B. Cloud Pub/Sub, Cloud Dataflow, and Local SSD

• C. Cloud Pub/Sub, Cloud SQL, and Cloud Storage

• D. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

#36.
Flowlogistic's CEO wants to gain rapid insight into their customer base so his sales team can be better
informed in the field. This team is not very technical, so they've purchased a visualization tool to simplify the
creation of BigQuery reports. However, they've been overwhelmed by all the data in the table, and are spending
a lot of money on queries trying to find the data they need. You want to solve their problem in the most cost-
effective way. What should you do?

• A. Export the data into a Google Sheet for visualization.

• B. Create an additional table with only the necessary columns.

• C. Create a view on the table to present to the visualization tool.

• D. Create identity and access management (IAM) roles on the appropriate columns, so only they appear
in a query.
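
If the route taken is a view over the table (option C), the sketch below shows roughly what that could look like with the BigQuery Python client. Project, dataset, table, and column names are hypothetical; the point is that the view exposes only the columns the sales team needs, without copying data.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical project/dataset/table names for illustration.
view = bigquery.Table("my-project.sales_reports.customer_summary_view")
view.view_query = """
    SELECT customer_id, region, total_orders, last_order_date
    FROM `my-project.sales.customer_master`
"""
view = client.create_table(view)  # creates a logical view, no data is copied
print(f"Created view {view.full_table_id}")
```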

#37.
Flowlogistic is rolling out their real-time inventory tracking system. The tracking devices will all send package-
tracking messages, which will now go to a single
Google Cloud Pub/Sub topic instead of the Apache Kafka cluster. A subscriber application will then process the
messages for real-time reporting and store them in
Google BigQuery for historical analysis. You want to ensure the package data can be analyzed over time.
Which approach should you take?

• A. Attach the timestamp on each message in the Cloud Pub/Sub subscriber application as they are
received.

• B. Attach the timestamp and Package ID on the outbound message from each publisher device as they
are sent to Cloud Pub/Sub.

• C. Use the NOW() function in BigQuery to record the event's time.

• D. Use the automatically generated timestamp from Cloud Pub/Sub to order the data.
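
If the approach chosen is to attach the event timestamp and package ID at the publisher (option B), a minimal sketch with the Pub/Sub Python client follows; the topic name, attribute names, and payload are hypothetical.

```python
import json
import time
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("my-project", "package-tracking")

payload = json.dumps({"lat": 52.52, "lng": 13.40}).encode("utf-8")

# Attach the event timestamp and package ID as message attributes at the
# publisher, so downstream consumers and BigQuery keep the true event time
# regardless of when Pub/Sub receives or delivers the message.
future = publisher.publish(
    topic_path,
    payload,
    event_timestamp=str(int(time.time() * 1000)),
    package_id="PKG-000123",
)
print(future.result())  # message ID once Pub/Sub has accepted the message
```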

#38.
MJTelco Case Study -

Company Overview -
MJTelco is a startup that plans to build networks in rapidly growing, underserved markets around the world. The
company has patents for innovative optical communications hardware. Based on these patents, they can
create many reliable, high-speed backbone links with inexpensive hardware.

Company Background -
Founded by experienced telecom executives, MJTelco uses technologies originally developed to overcome
communications challenges in space. Fundamental to their operation, they need to create a distributed data
infrastructure that drives real-time analysis and incorporates machine learning to continuously optimize their
topologies. Because their hardware is inexpensive, they plan to overdeploy the network allowing them to
account for the impact of dynamic regional politics on location availability and cost.
Their management and operations teams are situated all around the globe, creating many-to-many relationships
between data consumers and providers in their system. After careful consideration, they decided public cloud is
the perfect environment to support their needs.

Solution Concept -
MJTelco is running a successful proof-of-concept (PoC) project in its labs. They have two primary needs:
✑ Scale and harden their PoC to support significantly more data flows generated when they ramp to more than
50,000 installations.
✑ Refine their machine-learning cycles to verify and improve the dynamic models they use to control topology
definition.
MJTelco will also use three separate operating environments - development/test, staging, and production - to
meet the needs of running experiments, deploying new features, and serving production customers.

Business Requirements -
✑ Scale up their production environment with minimal cost, instantiating resources when and where needed in an unpredictable, distributed telecom user community.
✑ Ensure security of their proprietary data to protect their leading-edge machine learning and analysis.
✑ Provide reliable and timely access to data for analysis from distributed research workers
✑ Maintain isolated environments that support rapid iteration of their machine-learning models without
affecting their customers.

Technical Requirements -
✑ Ensure secure and efficient transport and storage of telemetry data
✑ Rapidly scale instances to support between 10,000 and 100,000 data providers with multiple flows each.
✑ Allow analysis and presentation against data tables tracking up to 2 years of data, storing approximately 100M records/day
✑ Support rapid iteration of monitoring infrastructure focused on awareness of data pipeline problems both in
telemetry flows and in production learning cycles.

CEO Statement -
Our business model relies on our patents, analytics and dynamic machine learning. Our inexpensive hardware
is organized to be highly reliable, which gives us cost advantages. We need to quickly stabilize our large
distributed data pipelines to meet our reliability and capacity commitments.

CTO Statement -
Our public cloud services must operate as advertised. We need resources that scale and keep our data secure.
We also need environments in which our data scientists can carefully study and quickly adapt our models.
Because we rely on automation to process our data, we also need our development and test environments to
work as we iterate.

CFO Statement -
The project is too large for us to maintain the hardware and software required for the data and analysis. Also,
we cannot afford to staff an operations team to monitor so many data feeds, so we will rely on automation and
infrastructure. Google Cloud's machine learning will allow our quantitative researchers to work on our high-
value problems instead of problems with our data pipelines.

MJTelco's Google Cloud Dataflow pipeline is now ready to start receiving data from the 50,000 installations. You
want to allow Cloud Dataflow to scale its compute power up as required. Which Cloud Dataflow pipeline
configuration setting should you update?

• A. The zone

• B. The number of workers

• C. The disk size per worker

• D. The maximum number of workers
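
The setting that raises the autoscaling ceiling is the maximum number of workers (option D). A minimal sketch, assuming the Apache Beam Python SDK and hypothetical project/region/bucket values, of passing that option when launching the pipeline:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical project/region/bucket values for illustration; submitting
# this to Dataflow would also require valid credentials.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    autoscaling_algorithm="THROUGHPUT_BASED",
    max_num_workers=100,  # raises the ceiling Dataflow can scale up to
)

with beam.Pipeline(options=options) as p:
    (p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2))
```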

#39.
You need to compose visualizations for operations teams with the following requirements:
✑ The report must include telemetry data from all 50,000 installations for the most recent 6 weeks (sampling
once every minute).
✑ The report must not be more than 3 hours delayed from live data.
✑ The actionable report should only show suboptimal links.
✑ Most suboptimal links should be sorted to the top.
✑ Suboptimal links can be grouped and filtered by regional geography.
✑ User response time to load the report must be <5 seconds.
Which approach meets the requirements?

• A. Load the data into Google Sheets, use formulas to calculate a metric, and use filters/sorting to show
only suboptimal links in a table.

• B. Load the data into Google BigQuery tables, write Google Apps Script that queries the data, calculates
the metric, and shows only suboptimal rows in a table in Google Sheets.

• C. Load the data into Google Cloud Datastore tables, write a Google App Engine Application that
queries all rows, applies a function to derive the metric, and then renders results in a table using the
Google charts and visualization API.

• D. Load the data into Google BigQuery tables, write a Google Data Studio 360 report that connects to
your data, calculates a metric, and then uses a filter expression to show only suboptimal rows in a
table.

#40.
You create a new report for your large team in Google Data Studio 360. The report uses Google BigQuery as its
data source. It is company policy to ensure employees can view only the data associated with their region, so
you create and populate a table for each region. You need to enforce the regional access policy to the data.
Which two actions should you take? (Choose two.)

• A. Ensure all the tables are included in global dataset.

• B. Ensure each table is included in a dataset for a region.

• C. Adjust the settings for each table to allow a related region-based security group view access.

• D. Adjust the settings for each view to allow a related region-based security group view access.

• E. Adjust the settings for each dataset to allow a related region-based security group view access.

#41.
MJTelco needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2
years of records. Each record that comes in is sent every 15 minutes, and contains a unique identifier of the
device and a data record. The most common query is for all the data for a given device for a given day.
Which schema should you use?

• A. Rowkey: date#device_id Column data: data_point

• B. Rowkey: date Column data: device_id, data_point

• C. Rowkey: device_id Column data: date, data_point

• D. Rowkey: data_point Column data: device_id, date

• E. Rowkey: date#data_point Column data: device_id

Ans: None of the options as written; the row key should be device_id#date, with column data: data_point.
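
A minimal sketch of what a device_id#date row key looks like in practice, using the Python Bigtable client; instance, table, column family, and key values are hypothetical, and the "data" column family is assumed to exist.

```python
from google.cloud import bigtable

# Hypothetical instance/table names for illustration.
client = bigtable.Client(project="my-project", admin=False)
table = client.instance("telemetry-instance").table("device_records")

# Row key: device_id#date, so all of a device's records for one day are
# contiguous and the common single-day query becomes a cheap prefix scan.
row_key = b"device-4711#20240115"
row = table.direct_row(row_key)
row.set_cell("data", "data_point", b'{"signal_db": -42.5}')
row.commit()

# Most common query: all data for a given device on a given day.
rows = table.read_rows(
    start_key=b"device-4711#20240115",
    end_key=b"device-4711#20240115\xff",
)
for r in rows:
    print(r.row_key)
```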

#80.

MJTelco Case Study - (same case study as in #38 above)

MJTelco is building a custom interface to share data. They have these requirements:
1. They need to do aggregations over their petabyte-scale datasets.
2. They need to scan specific time range rows with a very fast response time (milliseconds).
Which combination of Google Cloud Platform products should you recommend?

• A. Cloud Datastore and Cloud Bigtable

• B. Cloud Bigtable and Cloud SQL

• C. BigQuery and Cloud Bigtable

• D. BigQuery and Cloud Storage

#81.
You need to compose visualizations for operations teams with the following requirements:
✑ Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every
minute)
✑ The report must not be more than 3 hours delayed from live data.
✑ The actionable report should only show suboptimal links.
✑ Most suboptimal links should be sorted to the top.
✑ Suboptimal links can be grouped and filtered by regional geography.
✑ User response time to load the report must be <5 seconds.


You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see
multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest
data without any changes to your visualizations. You want to avoid creating and updating new visualizations
each month. What should you do?

• A. Look through the current data and compose a series of charts and tables, one for each possible combination of criteria.

• B. Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.

• C. Export the data to a spreadsheet, compose a series of charts and tables, one for each possible combination of criteria, and spread them across multiple tabs.

• D. Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criteria, and then renders results using the Google Charts and visualization API.

#82.
Given the record streams MJTelco is interested in ingesting per day, they are concerned about the cost of
Google BigQuery increasing. MJTelco asks you to provide a design solution. They require a single large data
table called tracking_table. Additionally, they want to minimize the cost of daily queries while performing fine-
grained analysis of each day's events. They also want to use streaming ingestion. What should you do?

• A. Create a table called tracking_table and include a DATE column.

• B. Create a partitioned table called tracking_table and include a TIMESTAMP column.

• C. Create sharded tables for each day following the pattern tracking_table_YYYYMMDD.

• D. Create a table called tracking_table with a TIMESTAMP column to represent the day.

#83.
Flowlogistic's management has determined that the current Apache Kafka servers cannot handle the data
volume for their real-time inventory tracking system.
You need to build a new system on Google Cloud Platform (GCP) that will feed the proprietary tracking
software. The system must be able to ingest data from a variety of global sources, process and query in real-
time, and store the data reliably. Which combination of GCP products should you choose?

• A. Cloud Pub/Sub, Cloud Dataflow, and Cloud Storage

• B. Cloud Pub/Sub, Cloud Dataflow, and Local SSD

• C. Cloud Pub/Sub, Cloud SQL, and Cloud Storage

• D. Cloud Load Balancing, Cloud Dataflow, and Cloud Storage

• E. Cloud Dataflow, Cloud SQL, and Cloud Storage

Stackdriver Monitoring = Cloud Monitoring

#214. Not sure if B is correct?

#220. NAT gateway is not secure.

#221. BigQuery Omni allows you to query data in Azure and AWS object stores directly without physically moving it to BigQuery, reducing data transfer costs and delays. BigLake tables provide a unified view of both BigQuery tables and external object storage files, enabling seamless querying across multi-cloud data.

#222.
- Dataprep is a serverless, no-code data preparation tool that allows users to visually explore, cleanse, and
prepare data for analysis.
- It's designed for business analysts, data scientists, and others who want to work with data without writing
code.
- Dataprep can directly access and transform data in Cloud Storage, making it a suitable choice for a team that
prefers a low-code, user-friendly solution.

#223.
- Dataform provides a feature called "assertions," which are essentially SQL-based tests that you can define to
verify the quality of your data.
- Assertions in Dataform are a built-in way to perform data quality checks, including checking for uniqueness
and null values in your tables.

#225. Answer C is correct.

#226.
Setting up a perimeter around project A is future-proof; the question asks to "ensure that project B and any future project cannot access data in the project A topic", and IAM is not future-proof. Reference:
https://cloud.google.com/vpc-service-controls/docs/overview#isolate
P.S.: VPC Service Controls is not the same thing as a VPC; it is a security layer on top of a VPC and should be used together with IAM, not one or the other (https://cloud.google.com/vpc-service-controls/docs/overview#how-vpc-service-controls-works).

#228.
Another way to replay messages that have been acknowledged is to seek to a timestamp. To seek to a
timestamp, you must first configure the subscription to retain acknowledged messages using retain-acked-
messages. If retain-acked-messages is set, Pub/Sub retains acknowledged messages for 7 days. You only need
to do this step if you intend to seek to a timestamp, not to a snapshot.
https://cloud.google.com/pubsub/docs/replay-message
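
A rough sketch of that replay flow with the Pub/Sub Python client (project and subscription names are hypothetical): first enable retain-acked-messages on the subscription, then seek it back to an earlier timestamp.

```python
from datetime import datetime, timedelta, timezone

from google.cloud import pubsub_v1
from google.protobuf import field_mask_pb2, timestamp_pb2

subscriber = pubsub_v1.SubscriberClient()
# Hypothetical project/subscription names.
sub_path = subscriber.subscription_path("my-project", "tracking-sub")

# 1. The subscription must retain acknowledged messages before seek-to-time
#    can replay them; Pub/Sub then keeps acked messages for up to 7 days.
subscription = pubsub_v1.types.Subscription(
    name=sub_path, retain_acked_messages=True
)
update_mask = field_mask_pb2.FieldMask(paths=["retain_acked_messages"])
subscriber.update_subscription(
    request={"subscription": subscription, "update_mask": update_mask}
)

# 2. Seek back 60 minutes; already-acknowledged messages newer than that
#    point in time are redelivered to the subscription.
target = datetime.now(timezone.utc) - timedelta(minutes=60)
ts = timestamp_pb2.Timestamp()
ts.FromDatetime(target)
subscriber.seek(request={"subscription": sub_path, "time": ts})
```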

#229.

#234.
D.
1. Store the historical data in BigQuery for analytics.
2. In a Cloud SQL table, store the last state of the product after every product change.
3. Serve the last state data directly from Cloud SQL to the API.
This approach leverages BigQuery's scalability and efficiency for handling large datasets for analytics. BigQuery
is well-suited for managing the 10 PB of historical product data. Meanwhile, Cloud SQL provides the necessary
performance to handle the API queries with the required low latency. By storing the latest state of each product
in Cloud SQL, you can efficiently handle the high QPS with sub-second latency, which is crucial for the API's
performance. This combination of BigQuery and Cloud SQL offers a balanced solution for both the large-scale
analytics and the high-performance API needs.

#235.
Option C is correct.
This also explains why it's not D: the requirement is a maintainable workflow that processes hundreds of tables and provides the freshest data to your end users. How is creating a DAG for each of the hundreds of tables maintainable?

- AEAD cryptographic functions in BigQuery allow for encryption and decryption of data at the column level.
- You can encrypt specific data fields using a unique key per user and manage these keys outside of BigQuery
(for example, in your application or using a key management system).
- By "deleting" or revoking access to the key for a specific user, you effectively make their data unreadable,
achieving crypto-deletion.
- This method provides fine-grained encryption control but requires careful key management and integration
with your applications.
https://cloud.google.com/bigquery/docs/aead-encryption-concepts
https://cloud.google.com/bigquery/docs/reference/standard-sql/aead_encryption_functions
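
A hedged sketch of that column-level pattern, run through the BigQuery Python client; the table, column, and keyset handling below are hypothetical, and in practice each user's keyset would live in your own application or key management system, passed in only at query time.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Encrypt a column with a per-user keyset held outside BigQuery and passed
# in as a query parameter. Destroying that keyset later crypto-deletes the
# user's data, since the ciphertext can no longer be decrypted.
sql = """
SELECT
  user_id,
  AEAD.ENCRYPT(FROM_BASE64(@keyset_b64), email, user_id) AS email_enc
FROM `my-project.crm.users`   -- hypothetical table
"""
job = client.query(
    sql,
    job_config=bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter(
                "keyset_b64", "STRING", "<base64-encoded keyset>"
            )
        ]
    ),
)
for row in job.result():
    print(row.user_id)

# Decryption follows the same shape with
# AEAD.DECRYPT_STRING(FROM_BASE64(@keyset_b64), email_enc, user_id).
```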

1. Use a dual-region Cloud Storage bucket with turbo replication enabled.
2. Monitor Dataflow metrics with Cloud Monitoring to determine when an outage occurs.
3. Seek the subscription back in time by 60 minutes to recover the acknowledged messages.
4. Start the Dataflow job in a secondary region.
An RPO of 15 minutes is guaranteed when turbo replication is used:
https://cloud.google.com/storage/docs/availability-durability

Gemini suggested C. Here's why it's the best of the limited choices:
- Calculates price_per_sqft: it includes the calculation for the target variable your model needs.
- Handles nulls: it uses IFNULL(feature1, 0) to replace nulls in feature1 with 0, similar to COALESCE.
- Most comprehensive: while it excludes the original price, square_feet, and feature1 columns, it still retains any other columns that might be present in the training_data table.
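
The actual answer query is not reproduced in this dump, so the following is only a reconstruction of the pattern the commentary describes (derive the label, null-fill feature1, drop the raw columns, keep everything else); the project and dataset names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical reconstruction of the described query shape.
sql = """
SELECT
  * EXCEPT (price, square_feet, feature1),
  IFNULL(feature1, 0) AS feature1,        -- null-fill the sparse feature
  price / square_feet AS price_per_sqft   -- derived training label
FROM `my-project.housing.training_data`
"""
training_rows = client.query(sql).result()
```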

Option C:
You've just concluded processing the data, ending up with clean and prepared data for the model. Now you need to decide how to split the data for testing and training. Only afterwards can you train the model, evaluate it, fine-tune it and, eventually, predict with it.

Cloud Data Loss Prevention (DLP) is the best tool for identifying sensitive information such as street
addresses in BigQuery.

A deep inspection job scans all tables in the dataset to detect occurrences of street addresses, using
predefined infoTypes like STREET_ADDRESS.

This method is scalable and works even if the address format varies, unlike simple regex-based queries.
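
As a simplified illustration of the STREET_ADDRESS infoType with the DLP Python client, the sketch below inspects a single text value; a real dataset-wide scan would instead create a DLP inspection job pointed at the BigQuery tables. The project ID and sample text are hypothetical.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # hypothetical project

# Inspect one value for street addresses; a full scan would use a DLP
# inspection job with a BigQuery storage config instead of inspect_content.
response = dlp.inspect_content(
    request={
        "parent": parent,
        "inspect_config": {
            "info_types": [{"name": "STREET_ADDRESS"}],
            "min_likelihood": dlp_v2.Likelihood.POSSIBLE,
        },
        "item": {"value": "Ship to 1600 Amphitheatre Parkway, Mountain View, CA"},
    }
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood)
```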

A data mesh is designed to decentralize data ownership, allowing domain teams to manage their own data,
rather than relying on a central data platform team (which was a bottleneck).
✔ Creating one lake per domain (airlines, hotels, ride-hailing) allows domain-specific governance and
autonomy.
✔ Creating separate zones for analytics and data science teams ensures logical data separation while
keeping collaboration within the same domain.
✔ Delegating management of data assets to each domain eliminates the bottleneck, ensuring faster
insights and pipeline updates.

- The table structure shows that the vCPU data is stored in a nested field within the components column.
- Use the UNNEST operator to flatten the nested field and apply the filter.
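
The machine-inventory table itself is not shown in this dump, so the query below is only a sketch of the UNNEST pattern with hypothetical column names (components, name, vcpus).

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and columns; UNNEST flattens the repeated/nested
# components field so the WHERE clause can filter on its subfields.
sql = """
SELECT
  m.machine_id,
  c.vcpus
FROM `my-project.inventory.machines` AS m,
     UNNEST(m.components) AS c
WHERE c.name = 'vCPU'
  AND c.vcpus >= 16
"""
for row in client.query(sql).result():
    print(row.machine_id, row.vcpus)
```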

- Autoclass automatically moves objects between storage classes without impacting performance or
availability, nor incurring retrieval costs.
- It continuously optimizes storage costs based on access patterns without the need to set specific lifecycle
management policies.
https://cloud.google.com/storage/docs/autoclass
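
A minimal sketch of enabling Autoclass at bucket creation with the Cloud Storage Python client. The bucket name is hypothetical, and the autoclass_enabled property name is an assumption based on recent google-cloud-storage releases, so verify it against the library version in use.

```python
from google.cloud import storage

client = storage.Client()

# Hypothetical bucket name; autoclass_enabled is assumed to be the client
# property for the Autoclass setting (check your library version).
bucket = client.bucket("analytics-raw-landing")
bucket.autoclass_enabled = True  # let GCS move objects between classes
new_bucket = client.create_bucket(bucket, location="US")
print(new_bucket.autoclass_enabled)
```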

Format-preserving encryption (FPE) with FFX in Cloud DLP is a strong choice for de-identifying PII like email
addresses. FPE maintains the format of the data and ensures that the same input results in the same encrypted
output consistently. This means the email fields in both datasets can be encrypted to the same value, allowing
for accurate joins in BigQuery while keeping the actual email addresses hidden.

- Setting a retention policy on a Cloud Storage bucket prevents objects from being deleted for the duration of
the retention period.
- Locking the policy makes it immutable, meaning that the retention period cannot be reduced or removed,
thus ensuring that the documents cannot be deleted or overwritten until the retention period expires.
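
A short sketch of that two-step flow (set the retention period, then lock it) with the Cloud Storage Python client; the bucket name and period are hypothetical, and locking is irreversible.

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("compliance-documents")  # hypothetical bucket

# Step 1: set a retention period (here, 3 years). Objects cannot be deleted
# or overwritten until they are older than this.
bucket.retention_period = 3 * 365 * 24 * 60 * 60
bucket.patch()

# Step 2: lock the policy. After this the period can no longer be reduced
# or removed; this is permanent, so it is a deliberate final step.
bucket.reload()  # refresh metageneration before locking
bucket.lock_retention_policy()
```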

- Private Google Access for services allows VM instances with only internal IP addresses in a VPC network or
on-premises networks (via Cloud VPN or Cloud Interconnect) to reach Google APIs and services.
- When you launch a Dataflow job, you can specify that it should use worker instances without external IP
addresses if Private Google Access is enabled on the subnetwork where these instances are launched.
- This way, your Dataflow workers will be able to access Cloud Storage and BigQuery without violating the
organizational constraint of no external IPs.
ref - https://cloud.google.com/dataflow/docs/guides/routes-firewall
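
A hedged sketch of launching such a pipeline with the Beam Python SDK: public IPs are disabled and a subnetwork with Private Google Access is specified. Project, region, subnetwork, and bucket values are hypothetical.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical project/network values; the key settings are disabling
# public IPs and pointing workers at a subnetwork that has Private
# Google Access enabled, so they can still reach GCS and BigQuery.
options = PipelineOptions(
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/temp",
    subnetwork=(
        "https://www.googleapis.com/compute/v1/projects/my-project/"
        "regions/us-central1/subnetworks/private-subnet"
    ),
    use_public_ips=False,  # workers get internal IP addresses only
)

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromText("gs://my-bucket/input/*.csv")
     | beam.io.WriteToText("gs://my-bucket/output/out"))
```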

- Datastream is a serverless and easy-to-use change data capture (CDC) and replication service.
- You would create a Datastream service that sources from your Oracle database and targets BigQuery, with
private connectivity configuration to the same VPC.
- This option is designed to minimize the need to manage infrastructure and is a fully managed service.

----------------------------------
#256.
Correct option is B
✔ Option B is the best choice because it:

• Uses Cloud Storage notifications to reactively trigger the DAG.

• Uses a Cloud Function to invoke the Cloud Composer API securely.

• Uses Serverless VPC Access to reach Cloud Composer in a private network.

- Cloud Data Fusion is a fully managed, code-free, GUI-based data integration service that allows you to
visually connect, transform, and move data between various sources and sinks.
- It supports various file formats and can write to Cloud Storage.
- You can configure it to use Customer-Managed Encryption Keys (CMEK) for the buckets where it writes data.

#261.

✔ TPT (Teradata Parallel Transporter) is built for high-performance data transfers.
✔ DTS handles the migration efficiently without requiring extra storage.
✔ Google Cloud specifically recommends using TPT with DTS for migrating Teradata to BigQuery.
✔ It requires the least programming and minimizes infrastructure management.
Even though Option A (JDBC with FastExport) may have received more votes online, it is not the best choice for this specific scenario.
✔ Option C (BigQuery DTS with TPT) is the best choice for large-scale, efficient Teradata migrations.

- Cloud EKM allows you to use encryption keys managed in external key management systems, including on-
premises HSMs, while using Google Cloud services.
- This means that the key material remains in your control and environment, and Google Cloud services use it
via the Cloud EKM integration.
- This approach aligns with the need to generate and store encryption material only on your on-premises HSM
and is the correct way to integrate such keys with BigQuery.
======
Why not Option C - Cloud HSM is a fully managed service by Google Cloud that provides HSMs for your
cryptographic needs. However, it's a cloud-based solution, and the keys generated or managed in Cloud HSM
are not stored on-premises. This option doesn't align with the requirement to use only on-premises HSM for key
storage.

✔ Option A is correct because using Reshuffle forces Dataflow to separate execution steps, helping
isolate bottlenecks and improve performance.
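
A minimal sketch of the technique: beam.Reshuffle() placed between a fan-out and an expensive step breaks fusion so the two stages can be scaled and profiled separately. The transforms around it are placeholders.

```python
import apache_beam as beam

def expensive_transform(record):
    # Placeholder for the slow step whose cost you want to isolate.
    return record

with beam.Pipeline() as p:
    (p
     | "Read" >> beam.Create(range(1000))
     | "Fanout" >> beam.FlatMap(lambda x: [x] * 100)
     # Reshuffle redistributes elements across workers and prevents the
     # runner from fusing the fan-out with the expensive step, so the
     # expensive step shows up as its own parallelizable stage.
     | "BreakFusion" >> beam.Reshuffle()
     | "Process" >> beam.Map(expensive_transform))
```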

- Lowest RPO: Time travel offers point-in-time recovery for the past seven days by default, providing the
shortest possible recovery point objective (RPO) among the given options. You can recover data to any state
within that window.
- No Additional Costs: Time travel is a built-in feature of BigQuery, incurring no extra storage or operational
costs.
- Managed Service: BigQuery handles time travel automatically, eliminating manual backup and restore
processes.
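
A small sketch of recovering data via time travel with the BigQuery Python client; the table names and the two-hour offset are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Restore-by-query using time travel: read the table as it was 2 hours ago
# and write that snapshot to a recovery table (names are hypothetical).
sql = """
CREATE OR REPLACE TABLE `my-project.sales.orders_recovered` AS
SELECT *
FROM `my-project.sales.orders`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
"""
client.query(sql).result()
```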

- It utilizes Data Catalog's native support for both BigQuery datasets and Pub/Sub topics.
- For PostgreSQL tables running on a Compute Engine instance, you'd use Data Catalog APIs to create custom
entries, as Data Catalog does not automatically discover external databases like PostgreSQL.
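
A hedged sketch of creating such a custom entry with the Data Catalog Python client; the project, location, entry group, and table names are hypothetical, and the exact field names should be checked against the google-cloud-datacatalog version in use.

```python
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()
# Hypothetical project/location values.
parent = "projects/my-project/locations/us-central1"

# Custom entries live in an entry group you create yourself.
entry_group = client.create_entry_group(
    parent=parent,
    entry_group_id="onprem_postgresql",
    entry_group=datacatalog_v1.EntryGroup(display_name="PostgreSQL on GCE"),
)

# user_specified_system/type mark this as a custom (non-Google-managed) entry.
entry = datacatalog_v1.Entry(
    display_name="orders",
    user_specified_system="postgresql",
    user_specified_type="table",
    description="Orders table on the Compute Engine PostgreSQL instance",
)
created = client.create_entry(
    parent=entry_group.name, entry_id="orders", entry=entry
)
print(created.name)
```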

- Detailed investigation of logs and jobs: checking for duplicate rows targets the potential immediate cause of the issue.
- Checking the BigQuery audit logs helps identify which jobs might be contributing to the increased data volume.
- Using Cloud Monitoring to correlate job starts with pipeline versions helps identify if a specific version of the pipeline is responsible.
- Managing multiple versions of pipelines ensures that only the intended version is active, addressing any versioning errors that might have occurred during deployment.

#272.
Tug of war between C & D.

https://cloud.google.com/dataflow/docs/concepts/streaming-pipelines#watermarks
A watermark is a threshold that indicates when Dataflow expects all of the data in a window to have arrived. If
the watermark has progressed past the end of the window and new data arrives with a timestamp within the
window, the data is considered late data. For more information, see Watermarks and late data in the Apache
Beam documentation.
Dataflow tracks watermarks because of the following reasons:
Data is not guaranteed to arrive in time order or at predictable intervals. Data events are not guaranteed to
appear in pipelines in the same order that they were generated.
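
A short Beam Python sketch of how a pipeline declares what happens to late data relative to the watermark: the window fires when the watermark passes its end, late elements are still accepted for a bounded allowed_lateness, and results accumulate across firings. The source and values are placeholders.

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows
from apache_beam.transforms.trigger import (
    AccumulationMode,
    AfterCount,
    AfterWatermark,
)
from apache_beam.utils.timestamp import Duration

with beam.Pipeline() as p:
    (p
     | beam.Create([("pkg-1", 1)])  # stand-in for a streaming source
     | beam.WindowInto(
         FixedWindows(60),
         # Fire when the watermark passes the end of the window, then once
         # more per late element that arrives within allowed_lateness.
         trigger=AfterWatermark(late=AfterCount(1)),
         allowed_lateness=Duration(seconds=300),
         accumulation_mode=AccumulationMode.ACCUMULATING)
     | beam.CombinePerKey(sum))
```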

Minimize cost. https://cloud.google.com/alloydb?hl=en
AlloyDB offers superior performance, 4x faster than standard PostgreSQL for transactional workloads. That
does not come without cost.
