Manage data assets in a lake

This page explains how to add, upgrade, and remove Cloud Storage buckets and BigQuery datasets as assets in existing Dataplex Universal Catalog zones.

Overview

An asset maps to data stored in either Cloud Storage or BigQuery. You can map data stored in separate Google Cloud projects as assets into a single zone within a lake. You can attach existing Cloud Storage buckets or BigQuery datasets to be managed from within the lake.

Before you begin

  • If you haven't already, create a lake and a zone in that lake.

  • Most gcloud dataplex lakes commands require a location. You can specify the location by using the --location flag.
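As a small illustration of the --location flag, the following Python sketch assembles (but doesn't run) a gcloud command line. The helper name and the sample lake ID and location are hypothetical placeholders; the sketch assumes the gcloud dataplex lakes describe command.

```python
import shlex


def lake_describe_command(lake_id: str, location: str) -> list:
    """Build the argument list for describing a lake in a given location."""
    return [
        "gcloud", "dataplex", "lakes", "describe", lake_id,
        f"--location={location}",  # the location flag mentioned above
    ]


# Hypothetical lake ID and location, used only for illustration.
command = lake_describe_command("my-lake", "us-central1")
print(shlex.join(command))
```

To run the command, you could pass the list to subprocess.run on a machine where the gcloud CLI is installed and authenticated.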

Required roles and permissions

Managing assets in Dataplex Universal Catalog requires two sets of permissions: users need permissions to perform management actions, and the Dataplex Universal Catalog service account needs permissions to access the underlying resources.

To add or remove assets, users must be granted IAM roles that contain the necessary permissions (such as dataplex.assets.create and dataplex.assets.delete). The predefined roles Dataplex Admin (roles/dataplex.admin) and Dataplex Editor (roles/dataplex.editor), or the legacy roles Owner (roles/owner) and Editor (roles/editor), include these permissions.

When you attach a resource (Cloud Storage bucket or BigQuery dataset) to a lake, Dataplex Universal Catalog uses its service account to interact with that resource.

  • If the resource is in the same project as the lake, permissions are granted implicitly to the service account.
  • If the resource is in a different project than the lake, you must explicitly grant the service account permissions to access that resource, as described in the following sections.

For more information, see Dataplex Universal Catalog IAM and access control.

Grant roles for Cloud Storage buckets

To attach a Cloud Storage bucket from another project, grant the Dataplex Universal Catalog service account (service-PROJECT_NUMBER@gcp-sa-dataplex.iam.gserviceaccount.com) permissions in one of the following ways:

  • Project-level permissions: grant the roles/dataplex.serviceAgent role to the service account on the project that contains the bucket. This provides Dataplex Universal Catalog with administrator permissions on all buckets in the project, which allows it to manage permissions on attached buckets.

  • Bucket-level permissions: for more granular control, use the gcloud dataplex lakes authorize command to grant the service account the necessary permissions on only a specific bucket.
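The following Python sketch shows how the service account address is derived and when an explicit grant is needed. It assumes that PROJECT_NUMBER is the number of the project that contains the lake. The function names and sample values are hypothetical; only the address format comes from this page.

```python
def dataplex_service_account(project_number: int) -> str:
    """Return the service account address in the format shown on this page."""
    return f"service-{project_number}@gcp-sa-dataplex.iam.gserviceaccount.com"


def iam_member(project_number: int) -> str:
    """Format the service account as an IAM policy member string."""
    return f"serviceAccount:{dataplex_service_account(project_number)}"


def needs_explicit_grant(lake_project: str, resource_project: str) -> bool:
    """An explicit grant is needed only when the resource is outside the lake's project."""
    return lake_project != resource_project


# Hypothetical project number and project IDs, used only for illustration.
print(iam_member(123456789012))
print(needs_explicit_grant("lake-project", "data-project"))
```

The member string is the value that you would supply when adding an IAM binding for the role on the project or bucket.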

Grant roles for BigQuery datasets

To attach a BigQuery dataset from another project, grant the Dataplex Universal Catalog service account the BigQuery Administrator role (roles/bigquery.admin) on the dataset.

VPC Service Controls considerations

Dataplex Universal Catalog doesn't violate VPC Service Controls perimeters. Before adding an asset to the lake, make sure that the underlying bucket or dataset is in the same VPC Service Controls perimeter as the lake.

For more information, see VPC Service Controls with Dataplex Universal Catalog.

Add an asset

If the region of the Dataplex Universal Catalog lake doesn't overlap with the region of a Cloud Storage bucket, you can't add that bucket to a zone in your lake.

To learn more about the region location of a Cloud Storage asset and how Dataplex Universal Catalog handles the location of a bucket when creating the publishing dataset, see Regional resources.

To add an asset, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.

    Go to Lakes

  2. Click the lake to which you want to add a Cloud Storage bucket or BigQuery dataset. The lake page opens.

  3. On the Zones tab, click the name of the data zone to which you want to add the asset. The Data zone page for that data zone opens.

  4. On the Assets tab, click + Add Assets. The Add assets page opens.

  5. Click Add an Asset.

  6. In the Type field, select either BigQuery dataset or Cloud Storage bucket.

  7. In the Display name field, enter a name for the new asset.

  8. In the ID field, enter a unique ID for the asset.

  9. Optional: Enter a Description.

  10. In the Dataset or Bucket field (based on the type of your asset), click Browse to find and select your Cloud Storage bucket or BigQuery dataset.

  11. Optional: If your asset type is Cloud Storage bucket and if you want Dataplex Universal Catalog to manage the asset, then select the Upgrade to Managed checkbox. If you choose this option, you don't have to upgrade the asset separately. This option isn't available for BigQuery datasets.

  12. Click Continue.

  13. Choose the rest of the parameter values. For more information about security settings, see Lake security.

  14. Click Submit.

  15. Verify that you have returned to the data zone page, and that your new asset appears in the assets list.

REST

To add an asset, use the lakes.zones.assets.create method.

When the addition succeeds, the data zone automatically enters the active state. If the addition fails, the data zone is rolled back to its previous healthy state.
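As a rough sketch of what a lakes.zones.assets.create call might look like, the following Python code builds the HTTP method, URL, and JSON body without sending anything. The endpoint layout, the assetId query parameter, and the resourceSpec fields reflect my understanding of the Dataplex REST API and should be checked against the API reference. The function name and all sample IDs are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://dataplex.googleapis.com/v1"
RESOURCE_TYPES = ("STORAGE_BUCKET", "BIGQUERY_DATASET")


def build_create_asset_request(project, location, lake, zone, asset_id,
                               resource_type, resource_name, display_name=""):
    """Return (method, url, body) for creating an asset in a zone."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unsupported resource type: {resource_type}")
    parent = f"projects/{project}/locations/{location}/lakes/{lake}/zones/{zone}"
    url = f"{API_ROOT}/{parent}/assets?{urlencode({'assetId': asset_id})}"
    body = {
        "displayName": display_name,
        # Assumed name format: projects/PROJECT_NUMBER/buckets/BUCKET_ID
        # or projects/PROJECT_NUMBER/datasets/DATASET_ID.
        "resourceSpec": {"type": resource_type, "name": resource_name},
    }
    return "POST", url, json.dumps(body)


# Hypothetical identifiers, used only for illustration.
method, url, body = build_create_asset_request(
    "my-project", "us-central1", "my-lake", "raw-zone", "sales-bucket",
    "STORAGE_BUCKET", "projects/123456789012/buckets/sales-data",
    display_name="Sales bucket",
)
print(method, url)
print(body)
```

Sending the request also requires an OAuth 2.0 access token in the Authorization header, which this sketch leaves out.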

Upgrade a Cloud Storage bucket asset

When you add an asset of type Cloud Storage bucket, Dataplex Universal Catalog automatically publishes BigQuery external tables for the tables hosted in the asset.

When you upgrade a Cloud Storage bucket asset, Dataplex Universal Catalog removes the attached external tables and creates BigLake tables. BigLake tables support finer-grained security, including row-level access control, column-level access control, and dynamic data masking.

To upgrade a Cloud Storage bucket asset, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.

    Go to Lakes

  2. Click the name of the lake. The lake page opens.

  3. On the Zones tab, click the name of the data zone. The data zone page opens.

  4. On the Assets tab, click the name of the asset that you want to upgrade.

  5. Click Upgrade to Managed.

REST

To upgrade a bucket asset, use the lakes.zones.assets.patch method. Make sure that you set the readAccessMode field to MANAGED in ResourceSpec.

Downgrade a Cloud Storage bucket asset

When you downgrade a Cloud Storage bucket asset, Dataplex Universal Catalog removes the attached BigLake tables and creates external tables.

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.

    Go to Lakes

  2. Click the name of the lake. The lake page opens.

  3. On the Zones tab, click the name of the data zone. The data zone page opens.

  4. On the Assets tab, click the name of the asset that you want to downgrade.

  5. Click Downgrade from Managed.

REST

To downgrade a bucket asset, use the lakes.zones.assets.patch method. Make sure that you set the readAccessMode field to DIRECT in ResourceSpec.
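The patch call for changing the read access mode could be sketched in Python as follows. The code only builds the request and doesn't send it. The updateMask value, the MANAGED enum value used for upgrades, and the endpoint layout are assumptions based on my reading of the Dataplex REST API. The function name and the sample asset name are hypothetical.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://dataplex.googleapis.com/v1"
READ_ACCESS_MODES = ("DIRECT", "MANAGED")


def build_read_access_mode_patch(asset_name, mode="DIRECT"):
    """Return (method, url, body) for changing an asset's readAccessMode.

    DIRECT corresponds to a downgrade; MANAGED is assumed to correspond
    to an upgrade.
    """
    if mode not in READ_ACCESS_MODES:
        raise ValueError(f"unknown read access mode: {mode}")
    # Limit the update to the single field being changed (assumed mask path).
    query = urlencode({"updateMask": "resourceSpec.readAccessMode"})
    url = f"{API_ROOT}/{asset_name}?{query}"
    body = {"resourceSpec": {"readAccessMode": mode}}
    return "PATCH", url, json.dumps(body)


# Hypothetical asset name, used only for illustration.
asset = ("projects/my-project/locations/us-central1/lakes/my-lake"
         "/zones/raw-zone/assets/sales-bucket")
method, url, body = build_read_access_mode_patch(asset, "DIRECT")
print(method, url)
print(body)
```

Restricting the update mask to one field keeps the patch from touching other asset settings.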

Remove an asset

Remove the asset from the data zone or lake before attaching it to a different one.

To remove an asset, follow these steps:

Console

  1. In the Google Cloud console, go to the Dataplex Universal Catalog Lakes page.

    Go to Lakes

  2. Click the lake from which you want to remove a Cloud Storage bucket or BigQuery dataset. The lake page for that lake opens.

  3. On the Zones tab, click the name of the data zone you want to remove the Cloud Storage bucket or BigQuery dataset from. The Data zone page for that data zone opens.

  4. On the Assets tab, select the asset by checking the box to the left of the asset name.

  5. Click Delete Asset.

  6. On the confirmation dialog, click Delete.

REST

To remove an asset, use the lakes.zones.assets.delete method.
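A delete call might look like the following Python sketch, which prepares a urllib request object but doesn't send it. The endpoint layout is an assumption based on the Dataplex REST API resource naming. The function name, the sample IDs, and the placeholder token are hypothetical.

```python
from urllib.request import Request

API_ROOT = "https://dataplex.googleapis.com/v1"


def build_delete_asset_request(project, location, lake, zone, asset_id,
                               access_token):
    """Prepare (but don't send) a DELETE request for a single asset."""
    name = (f"projects/{project}/locations/{location}/lakes/{lake}"
            f"/zones/{zone}/assets/{asset_id}")
    return Request(
        f"{API_ROOT}/{name}",
        method="DELETE",
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Hypothetical identifiers and a placeholder token, used only for illustration.
request = build_delete_asset_request(
    "my-project", "us-central1", "my-lake", "raw-zone", "sales-bucket",
    access_token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

To send the request, you could pass the object to urllib.request.urlopen after replacing the placeholder with a real OAuth 2.0 access token.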

What's next