Lakehouse for Apache Iceberg manages metadata through the Lakehouse runtime catalog. The system uses the Apache Iceberg REST catalog endpoint to organize data into a strict resource hierarchy. The catalog configuration determines the supported storage types, regional routing behaviors, and query federation options.
Resource hierarchy
The Apache Iceberg REST catalog endpoint uses a hierarchy of resources to organize your data. The following table provides a high-level look at these resources:
| Resource | Description |
|---|---|
| Catalog | The top-level container. Catalogs let you organize namespaces and tables into separate logical groups. |
| Namespace | A logical grouping of tables within a catalog. Namespaces function like databases, schemas, or directories. |
| Table | Tables contain definitions of rows and columns that can be queried. |
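To make the hierarchy concrete, the following Python sketch maps the three resource levels onto resource paths from the open Apache Iceberg REST catalog specification (/v1/{prefix}/namespaces/{namespace}/tables/{table}). It assumes that a catalog corresponds to the REST path prefix that the server returns from its config endpoint. The prefix the Lakehouse endpoint actually uses may differ, and iceberg-bucket is only a placeholder.

```python
from urllib.parse import quote

# Per the Iceberg REST spec, the levels of a multi-level namespace are joined
# with the 0x1F (unit separator) character before URL encoding.
NAMESPACE_SEPARATOR = "\x1f"


def namespace_path(prefix: str, namespace: tuple[str, ...]) -> str:
    """Return the REST path for a namespace inside the catalog identified by prefix."""
    encoded = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return f"/v1/{prefix}/namespaces/{encoded}"


def table_path(prefix: str, namespace: tuple[str, ...], table: str) -> str:
    """Return the REST path for a table: catalog (prefix) > namespace > table."""
    return f"{namespace_path(prefix, namespace)}/tables/{quote(table, safe='')}"


# Hypothetical prefix: assumes the server's config response maps the catalog to this value.
print(table_path("iceberg-bucket", ("quickstart_namespace",), "quickstart_table"))
```

A multi-level namespace such as ("a", "b") is encoded as a%1Fb in the path, which is why the sketch joins the levels before quoting them.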
Supported catalog types
When you configure your client, you specify a warehouse location. This choice determines how your catalog operates and integrates with other Google Cloud services. The following table describes supported catalog types:
| Catalog Type | Description |
|---|---|
| Cloud Storage bucket | All data in a catalog is stored in a single Cloud Storage bucket; for data shared across multiple buckets, multiple catalogs are required. |
Warehouse details
- Recommended: Cloud Storage bucket warehouse (gs://). This is the standard approach: the catalog directly manages Apache Iceberg metadata and data files in a Cloud Storage bucket that you specify. This option gives you direct control over your data layout, supports credential vending for fine-grained access control, and lets you create and manage Lakehouse Iceberg REST catalog tables. The catalog takes its name from the bucket. For example, if you create a bucket named iceberg-bucket to store your catalog, both the catalog name and the bucket name are iceberg-bucket. You use this catalog name later when you query the catalog in BigQuery with the P.C.N.T syntax, for example my-project.iceberg-bucket.quickstart_namespace.quickstart_table.
- Alternative: BigQuery catalog federation (bq://). This approach lets you use the Apache Iceberg REST catalog endpoint to manage and query tables that are visible to BigQuery, without creating a catalog resource. For more information, see Catalog federation with BigQuery.
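Because the warehouse location decides which of these two modes a client uses, it can help to see the decision spelled out. The following Python sketch is a hypothetical helper (classify_warehouse and WarehouseChoice are illustrative names, not part of any Google Cloud library) that interprets a warehouse string by its scheme. It deliberately does not parse anything after bq://, because the format of that part isn't described here.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class WarehouseChoice:
    kind: str                    # "cloud_storage" or "bigquery_federation"
    catalog_name: Optional[str]  # None when no catalog resource is created


def classify_warehouse(location: str) -> WarehouseChoice:
    """Hypothetical helper: interpret a client warehouse location by its scheme."""
    parsed = urlparse(location)
    if parsed.scheme == "gs":
        if not parsed.netloc:
            raise ValueError("a gs:// warehouse must name a bucket")
        # The catalog shares the bucket's name: gs://iceberg-bucket -> iceberg-bucket.
        return WarehouseChoice("cloud_storage", parsed.netloc)
    if parsed.scheme == "bq":
        # Catalog federation: BigQuery acts as the gateway; no catalog resource exists.
        return WarehouseChoice("bigquery_federation", None)
    raise ValueError(f"unsupported warehouse scheme: {parsed.scheme!r}")


print(classify_warehouse("gs://iceberg-bucket"))
```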
Bucket and catalog regions
For Cloud Storage bucket warehouses in Lakehouse runtime catalog, the system selects the catalog region to match the underlying bucket's region:
- Single-region buckets: The catalog region matches the bucket region exactly.
- Dual-region buckets: This includes predefined and user-defined dual regions, such as ASIA1 and NAM4. The catalog region matches the dual region.
- Multi-region buckets: The system selects regional locations for the catalog within the multi-region's geographic domain. By default, these locations might not match common BigQuery locations like US and EU. Instead, they are regional locations within the geographic domain (for example, us-central1 and us-east4 for a US multi-region bucket).
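The region rules above can be summarized in a short Python sketch. It is illustrative only: the service performs the real selection, and the multi-region table contains just the single US example given above (us-central1 and us-east4), so any other multi-region deliberately raises an error instead of guessing.

```python
from enum import Enum


class BucketLocationType(Enum):
    REGION = "region"
    DUAL_REGION = "dual-region"
    MULTI_REGION = "multi-region"


# Illustrative only: the one multi-region example given for US buckets.
# The actual regional locations are chosen by the service and may differ.
MULTI_REGION_EXAMPLES = {"US": ("us-central1", "us-east4")}


def catalog_regions(location_type: BucketLocationType, bucket_location: str) -> tuple[str, ...]:
    """Sketch of how a catalog's region follows its bucket's location."""
    if location_type in (BucketLocationType.REGION, BucketLocationType.DUAL_REGION):
        # Single- and dual-region buckets: the catalog region matches the bucket location.
        return (bucket_location,)
    # Multi-region buckets: regional locations inside the geographic domain,
    # not the multi-region name itself (for example, not "US").
    if bucket_location not in MULTI_REGION_EXAMPLES:
        raise LookupError(f"no illustrative mapping for multi-region {bucket_location!r}")
    return MULTI_REGION_EXAMPLES[bucket_location]
```

The key point the sketch encodes is the last branch: a US multi-region bucket does not yield a catalog located in "US", which is why queries run in the US virtual region can miss the catalog metadata.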
When BigQuery runs a query over tables in these catalogs,
it routes the query to the catalog's primary region.
If you run a query in a specific virtual region (like US or EU) and
the catalog metadata isn't present in that location, the query might fail.
Specify primary regions for US and EU multi-regions
For catalogs that use a US or EU multi-region bucket, you can specify the
primary region when you create the catalog to ensure that
BigQuery can access it from the corresponding regions.
- Cloud Storage EU multi-region: Specify EU or europe-west4.
- Cloud Storage US multi-region: Specify US or us-central1.
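Before creating a catalog on a US or EU multi-region bucket, you could validate the intended primary region against the values listed above. The following Python sketch is a hypothetical pre-flight check (check_primary_region is an illustrative name); it is not the Create catalog API and does not show how that API accepts the value.

```python
# Primary-region values that keep a US or EU multi-region catalog reachable
# from the matching BigQuery location.
PRIMARY_REGION_OPTIONS = {
    "EU": frozenset({"EU", "europe-west4"}),
    "US": frozenset({"US", "us-central1"}),
}


def check_primary_region(bucket_multi_region: str, primary_region: str) -> str:
    """Hypothetical pre-flight check before creating a catalog; not a Google Cloud API."""
    options = PRIMARY_REGION_OPTIONS.get(bucket_multi_region)
    if options is None:
        raise ValueError(f"guidance covers only US and EU multi-regions, not {bucket_multi_region!r}")
    if primary_region not in options:
        raise ValueError(
            f"for a {bucket_multi_region} bucket, specify one of {sorted(options)}, not {primary_region!r}"
        )
    return primary_region
```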
The system selects a catalog's primary replica when you create it, but you can
dynamically update it by calling FailoverCatalog. For more information about
defining primary locations, see Create a catalog.
Querying catalogs
When querying Lakehouse runtime catalog tables from BigQuery, you use a four-part naming structure, often referred to as P.C.N.T:
- Project: The Google Cloud project ID that owns the catalog.
- Catalog: The name of the Lakehouse runtime catalog.
- Namespace: The Apache Iceberg namespace (equivalent to a BigQuery dataset).
- Table: The name of the table.
For example, my-project.lakehouse-catalog-id.my-namespace.my-table.
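The following Python sketch splits a P.C.N.T name into its four parts and builds a reference for a BigQuery query. It assumes that no part contains a dot (so it won't handle domain-scoped project IDs), and it assumes that wrapping the whole four-part path in one pair of backticks works as it does for project.dataset.table paths. PCNT and parse_pcnt are illustrative names.

```python
from typing import NamedTuple


class PCNT(NamedTuple):
    project: str
    catalog: str
    namespace: str
    table: str

    def sql_reference(self) -> str:
        # One pair of backticks around the full path keeps hyphenated parts
        # such as "my-project" and "my-namespace" unambiguous in SQL.
        return f"`{'.'.join(self)}`"


def parse_pcnt(name: str) -> PCNT:
    """Split a P.C.N.T name; assumes no part contains a dot."""
    parts = name.split(".")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected project.catalog.namespace.table, got {name!r}")
    return PCNT(*parts)


ref = parse_pcnt("my-project.lakehouse-catalog-id.my-namespace.my-table")
print(f"SELECT * FROM {ref.sql_reference()} LIMIT 10")
```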
Catalog federation with BigQuery
You can use the Apache Iceberg REST catalog endpoint to manage and query external tables through BigQuery catalog federation. Instead of requiring a dedicated catalog resource, BigQuery acts as a federation gateway: requests route through BigQuery, so you can interact with your external catalogs in any project where the BigQuery API is enabled. Catalog federation lets you:
- Create and manage external Apache Iceberg tables through BigQuery.
- Query Lakehouse Iceberg REST catalog tables using the BigQuery Apache Iceberg REST catalog endpoint.
Because BigQuery federates access to these external resources, you must have the required BigQuery permissions. Credential vending is not supported for federated catalogs.
To enable federation, see Use catalog federation with BigQuery.
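One practical consequence of the credential vending difference is which access-delegation header a client sends when loading a table. The Apache Iceberg REST specification defines the X-Iceberg-Access-Delegation header with the value vended-credentials for this purpose. The following Python sketch is a hypothetical helper (load_table_headers is an illustrative name) that requests vended credentials only for gs:// warehouses. Authentication headers are omitted, and it is an assumption that the Lakehouse endpoint honors this header exactly as the open specification describes.

```python
# Iceberg REST spec header a client sends to request access delegation.
ACCESS_DELEGATION_HEADER = "X-Iceberg-Access-Delegation"


def load_table_headers(warehouse: str) -> dict[str, str]:
    """Hypothetical helper: pick loadTable request headers for a warehouse location."""
    if warehouse.startswith("gs://"):
        # Cloud Storage bucket warehouses support credential vending.
        return {ACCESS_DELEGATION_HEADER: "vended-credentials"}
    if warehouse.startswith("bq://"):
        # Federated catalogs don't support credential vending, so don't request it;
        # you need the required BigQuery permissions instead.
        return {}
    raise ValueError(f"unsupported warehouse location: {warehouse!r}")
```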