Get hands-on experience with 20+ free Google Cloud products and $300 in free credit for new customers.

How to extract data for bigqueries

Hi

I'm using BigQuery to automatically import data from an GWS. The external system doesn't allow me to configure the data import fields, so I'm receiving a lot of unnecessary columns in BigQuery.

I'm looking for a way to exclude specific columns from the data imported into BigQuery. Is there a way to do this using BigQuery's built-in features? Or would I need to use a third-party tool?

Any advice would be greatly appreciated.

Thanks,

0 3 303
3 REPLIES 3

There is no built-in feature in BigQuery to exclude specific columns during data import. However, there are a few workarounds that you can use:

1. Use a regular table

You can load the data into a regular table first, and then select the required columns from that table and populate your final table. This can be done using the following steps:

  1. Create a regular table to store the imported data:
CREATE TABLE my_temp_table
AS
SELECT *
FROM `[my-project-id].my-dataset.my-external-table`;
  1. Select the required columns from the temporary table and populate your final table:
CREATE OR REPLACE TABLE my_final_table
AS
SELECT column_1, column_2, ...
FROM my_temp_table;

2. Use a data transformation pipeline

You can also use a data transformation pipeline to exclude specific columns from the imported data. This can be done using a variety of tools, such as Cloud Data Fusion, Cloud Dataproc, or Apache Spark.

For example, you could use Cloud Data Fusion to create a pipeline that performs the following steps:

  1. Reads the data from the external system.
  2. Excludes the unwanted columns.
  3. Writes the data to BigQuery.

3. Use a third-party tool

There are also a number of third-party tools that can be used to exclude specific columns from BigQuery data imports. Some examples include Talend, Fivetran, Stitch, and Alooma (now part of Google Cloud).

Which workaround is best for you will depend on your specific needs and requirements. If you only need to exclude a small number of columns, then using a regular table may be the simplest solution. However, if you need to exclude a large number of columns, or if you need to perform other data transformations, then using a data transformation pipeline or a third-party tool may be a better option.

Additional notes:

  • BigQuery automatically creates temporary tables when you run a query, so you don’t necessarily have to create one manually.
  • When using a third-party tool, be sure to check the latest documentation for the most up-to-date information on how to exclude specific columns from BigQuery data imports.

Thanks

I am still working on the issue, but I am hopeful that your suggestions will help me resolve it. I will let you know how things go.

I appreciate that you took the time to explain the advantages and disadvantages of each suggestion.

Creating a new table will increase the cost, so I would like to take care of that and consider how to deal with it.

Thank you again for your advice.

 

Hi @nawakashii unfortunately, when using BigQuery external tables, there’s no built-in way to exclude specific columns during the import — it always brings in the full schema from your source (in this case, GWS data). That said, there are a few workarounds you can try:

Here Are Some Options:

Create a View in BigQuery

Once your data is imported, you can create a view that only includes the columns you actually need. This helps simplify your dataset without duplicating the data.

CREATE OR REPLACE VIEW dataset.clean_data AS
SELECT col1, col2, col3
FROM dataset.raw_import;

Use a Scheduled Query to Write to a Clean Table

If you work with the data frequently, you might want to set up a scheduled query that writes just the relevant columns into a new table. It’s efficient and keeps things tidy.

Consider Using an External Tool

If your data sources are varied or tend to change often, using a tool like Windsor.ai might help. It can automate the process of connecting, filtering, and loading data into BigQuery , giving you more flexibility over what fields to include

Hope this helps!