Hi
I'm using BigQuery to automatically import data from an GWS. The external system doesn't allow me to configure the data import fields, so I'm receiving a lot of unnecessary columns in BigQuery.
I'm looking for a way to exclude specific columns from the data imported into BigQuery. Is there a way to do this using BigQuery's built-in features? Or would I need to use a third-party tool?
Any advice would be greatly appreciated.
Thanks,
There is no built-in feature in BigQuery to exclude specific columns during data import. However, there are a few workarounds that you can use:
1. Use a regular table
You can load the data into a regular table first, and then select the required columns from that table and populate your final table. This can be done using the following steps:
CREATE TABLE my_temp_table
AS
SELECT *
FROM `[my-project-id].my-dataset.my-external-table`;
CREATE OR REPLACE TABLE my_final_table
AS
SELECT column_1, column_2, ...
FROM my_temp_table;
2. Use a data transformation pipeline
You can also use a data transformation pipeline to exclude specific columns from the imported data. This can be done using a variety of tools, such as Cloud Data Fusion, Cloud Dataproc, or Apache Spark.
For example, you could use Cloud Data Fusion to create a pipeline that performs the following steps:
3. Use a third-party tool
There are also a number of third-party tools that can be used to exclude specific columns from BigQuery data imports. Some examples include Talend, Fivetran, Stitch, and Alooma (now part of Google Cloud).
Which workaround is best for you will depend on your specific needs and requirements. If you only need to exclude a small number of columns, then using a regular table may be the simplest solution. However, if you need to exclude a large number of columns, or if you need to perform other data transformations, then using a data transformation pipeline or a third-party tool may be a better option.
Additional notes:
Thanks
I am still working on the issue, but I am hopeful that your suggestions will help me resolve it. I will let you know how things go.
I appreciate that you took the time to explain the advantages and disadvantages of each suggestion.
Creating a new table will increase the cost, so I would like to take care of that and consider how to deal with it.
Thank you again for your advice.
Hi @nawakashii unfortunately, when using BigQuery external tables, there’s no built-in way to exclude specific columns during the import — it always brings in the full schema from your source (in this case, GWS data). That said, there are a few workarounds you can try:
Here Are Some Options:
Create a View in BigQuery
Once your data is imported, you can create a view that only includes the columns you actually need. This helps simplify your dataset without duplicating the data.
Use a Scheduled Query to Write to a Clean Table
If you work with the data frequently, you might want to set up a scheduled query that writes just the relevant columns into a new table. It’s efficient and keeps things tidy.
Consider Using an External Tool
If your data sources are varied or tend to change often, using a tool like Windsor.ai might help. It can automate the process of connecting, filtering, and loading data into BigQuery , giving you more flexibility over what fields to include
Hope this helps!