fix: read_gbq_table respects primary keys even when filters are set #689

Merged
tswast merged 9 commits into main from b338039517-read_gbq_table-primary_keys
May 16, 2024

Conversation

tswast
Collaborator

@tswast tswast commented May 14, 2024

Closes internal issues 338039517 (primary key inconsistency), 338037499 (LIMIT for max_results), 340540991 (avoid running query immediately if time travel is supported), 337925142 (push down column filters to when we create the time travel subquery).

feat: `read_gbq_query` supports `filters`
perf: use a `LIMIT` clause when `max_results` is set
perf: don't run query immediately from `read_gbq_table` if `filters` is set
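
For readers unfamiliar with the idea, here is a minimal sketch of how column filters and a row limit can be pushed down into the SQL text instead of being applied after a full query runs. It is not the code from this PR: `build_query`, `_literal`, and the `(column, operator, value)` filter shape are hypothetical names and assumptions chosen for illustration only.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical filter shape, assumed for this sketch: (column, operator, value).
Filter = Tuple[str, str, object]

# Map Python-style comparison operators to their SQL spelling.
_OPERATORS = {"==": "=", "!=": "!=", ">": ">", ">=": ">=", "<": "<", "<=": "<="}


def _literal(value: object) -> str:
    """Render a Python value as a SQL literal (bool, number, or string only)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    raise TypeError(f"Unsupported filter value: {value!r}")


def build_query(
    table_id: str,
    columns: Sequence[str] = (),
    filters: Sequence[Filter] = (),
    max_results: Optional[int] = None,
) -> str:
    """Build a SELECT statement with WHERE and LIMIT pushed into the SQL."""
    select_list = ", ".join(f"`{c}`" for c in columns) or "*"
    sql = f"SELECT {select_list} FROM `{table_id}`"

    conditions = []
    for column, op, value in filters:
        if op not in _OPERATORS:
            raise ValueError(f"Unsupported operator: {op!r}")
        conditions.append(f"`{column}` {_OPERATORS[op]} {_literal(value)}")
    if conditions:
        # All filters are combined with AND in this simplified sketch.
        sql += " WHERE " + " AND ".join(conditions)

    if max_results is not None:
        # A LIMIT clause lets the engine stop early instead of
        # materializing every row and truncating on the client.
        sql += f" LIMIT {int(max_results)}"
    return sql
```

Because the function only produces a string, nothing is executed until the caller chooses to run it, which is the same general motivation as deferring the query in `read_gbq_table`.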

🦕

@tswast tswast requested review from a team as code owners May 14, 2024 19:17
@tswast tswast requested a review from TrevorBergeron May 14, 2024 19:17
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels May 14, 2024
Contributor

@TrevorBergeron TrevorBergeron left a comment

Mostly looks good, but it seems we've lost some validation on unknown column names?

index_cols=index_cols,
columns=columns,
filters=filters,
time_travel_timestamp=time_travel_timestamp,
)

for key in columns:
Collaborator Author

@TrevorBergeron Which "validation on unknown column names" are we missing? I still see it here.

Collaborator Author

Would you like me to move it to before we create the ibis expression? It's possible ibis will fail with a dry run error when we try to access the columns in it now.

Collaborator Author

I now see a tests/system/small/test_session.py::test_read_gbq_w_columns[unknown_col] failure. This is likely the issue. I'll move this validation to after we fetch the table metadata.
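
As a rough illustration of the check being discussed, the sketch below validates requested column names against the names found in already-fetched table metadata, before any expression is built from them. `validate_columns` and its arguments are hypothetical and are not taken from the PR; the actual fix is in the commit referenced in the next comment.

```python
from typing import Iterable, Sequence


def validate_columns(requested: Sequence[str], available: Iterable[str]) -> None:
    """Raise ValueError if any requested column is absent from the table schema.

    `available` is assumed to be the column names read from table metadata.
    """
    available_set = set(available)
    unknown = [name for name in requested if name not in available_set]
    if unknown:
        raise ValueError(
            f"Column(s) {unknown} not found in table. "
            f"Available columns: {sorted(available_set)}"
        )
```

Running the check right after the metadata lookup means an unknown name produces a clear `ValueError` rather than a less direct error from a later step.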

Collaborator Author

Fixed in 87365db

@tswast tswast requested a review from TrevorBergeron May 16, 2024 16:39
@tswast tswast enabled auto-merge (squash) May 16, 2024 17:29
@tswast tswast merged commit 9386373 into main May 16, 2024
20 of 21 checks passed
@tswast tswast deleted the b338039517-read_gbq_table-primary_keys branch May 16, 2024 21:07
3 participants