Skip to content

fix: use random table when loading data for read_csv, read_json, read_parquet #175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 9, 2023

Conversation

ashleyxuu
Copy link
Contributor

@ashleyxuu ashleyxuu commented Nov 6, 2023

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@ashleyxuu ashleyxuu requested review from a team as code owners November 6, 2023 19:38
@ashleyxuu ashleyxuu requested review from shobsi and tswast November 6, 2023 19:38
@product-auto-label product-auto-label bot added size: xs Pull request size is extra small. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Nov 6, 2023
@ashleyxuu ashleyxuu changed the title fix: use random table when loading data for read_csv, read_json, read… fix: use random table when loading data for read_csv, read_json, read_parquet Nov 6, 2023
Copy link
Contributor

@shobsi shobsi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's take Tim's LGTM too before submitting

Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's wait until after Trevor's PR so that we can make sure to cluster by the correct columns.

@ashleyxuu ashleyxuu requested a review from tswast November 8, 2023 18:54
Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure the tables we create are clustered by the native ordering (in cases where we preserve it) and/or the index.

Copy link
Collaborator

@tswast tswast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed in chat, we'll update this to do clustering in a future change.

@tswast tswast added the automerge Merge the pull request once unit tests and other checks pass. label Nov 9, 2023
Copy link

Merge-on-green attempted to merge your PR for 6 hours, but it was not mergeable because either one of your required status checks failed, one of your required reviews was not approved, or there is a do not merge label. Learn more about your required status checks here: https://help.github.com/en/github/administering-a-repository/enabling-required-status-checks. You can remove and reapply the label to re-run the bot.

@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Nov 9, 2023
@ashleyxuu ashleyxuu added kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Nov 9, 2023
@bigframes-bot bigframes-bot removed kokoro:run Add this label to force Kokoro to re-run the tests. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Nov 9, 2023
@tswast tswast added the automerge Merge the pull request once unit tests and other checks pass. label Nov 9, 2023
@gcf-merge-on-green gcf-merge-on-green bot merged commit 9d2e6dc into main Nov 9, 2023
@gcf-merge-on-green gcf-merge-on-green bot deleted the b309108907-update-csv-parquet-json-table branch November 9, 2023 23:00
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Nov 9, 2023
@@ -1156,7 +1156,7 @@ def read_parquet(
# Note: "engine" is omitted because it is redundant. Loading a table
# from a pandas DataFrame will just create another parquet file + load
# job anyway.
table = bigquery.Table(self._create_session_table())
table = bigframes_io.random_table(self._anonymous_dataset)

job_config = bigquery.LoadJobConfig()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We forgot to set the table expiration. I can follow-up with a PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcf-merge-on-green bot pushed a commit that referenced this pull request Nov 15, 2023
🤖 I have created a release *beep* *boop*
---


## [0.14.0](https://togithub.com/googleapis/python-bigquery-dataframes/compare/v0.13.0...v0.14.0) (2023-11-14)


### Features

* Add 'cross' join support ([#176](https://togithub.com/googleapis/python-bigquery-dataframes/issues/176)) ([765446a](https://togithub.com/googleapis/python-bigquery-dataframes/commit/765446a929abe1ac076c3037afa7892f64105356))
* Add 'index', 'pad', 'nearest' interpolate methods ([#162](https://togithub.com/googleapis/python-bigquery-dataframes/issues/162)) ([6a28403](https://togithub.com/googleapis/python-bigquery-dataframes/commit/6a2840349a23035bdfdabacd1e231b41bbb5ed7a))
* Add series.sample (identical to existing dataframe.sample) ([#187](https://togithub.com/googleapis/python-bigquery-dataframes/issues/187)) ([37914a4](https://togithub.com/googleapis/python-bigquery-dataframes/commit/37914a4077c681881491f5c36d1a9c9f4255e18f))
* Add unordered sql compilation ([#156](https://togithub.com/googleapis/python-bigquery-dataframes/issues/156)) ([58f420c](https://togithub.com/googleapis/python-bigquery-dataframes/commit/58f420c91d94ca085e9810f36513ffe772bfddcf))
* Log most recent API calls as `recent-bigframes-api-xx` labels on BigQuery jobs ([#145](https://togithub.com/googleapis/python-bigquery-dataframes/issues/145)) ([4ea33b7](https://togithub.com/googleapis/python-bigquery-dataframes/commit/4ea33b7433532ae3a386a6ffa9eb57360ea39526))
* Read_gbq creates order deterministically without table copy ([#191](https://togithub.com/googleapis/python-bigquery-dataframes/issues/191)) ([8ab81de](https://togithub.com/googleapis/python-bigquery-dataframes/commit/8ab81dee4d0eee499094f2dd576550f0c59d7551))
* Support `date_series.astype("string[pyarrow]")` to cast DATE to STRING ([#186](https://togithub.com/googleapis/python-bigquery-dataframes/issues/186)) ([aee0e8e](https://togithub.com/googleapis/python-bigquery-dataframes/commit/aee0e8e2518c59bd1e0b07940c3309871fde8899))
* Support `series.at[row_label] = scalar` ([#173](https://togithub.com/googleapis/python-bigquery-dataframes/issues/173)) ([0c8bd33](https://togithub.com/googleapis/python-bigquery-dataframes/commit/0c8bd33806bb99206b8b12dbdf7d7485c6ffb759))
* Temporary resources no longer use BigQuery Sessions ([#194](https://togithub.com/googleapis/python-bigquery-dataframes/issues/194)) ([4a02cac](https://togithub.com/googleapis/python-bigquery-dataframes/commit/4a02cac88c7d7b46bed1fa813a862fc2ef9ef084))


### Bug Fixes

* All sort operation are now stable ([#195](https://togithub.com/googleapis/python-bigquery-dataframes/issues/195)) ([3a2761f](https://togithub.com/googleapis/python-bigquery-dataframes/commit/3a2761f3c38d0de8b8eda47fffa15b8412aa84b0))
* Default to 7 days expiration for `read_csv`, `read_json`, `read_parquet` ([#193](https://togithub.com/googleapis/python-bigquery-dataframes/issues/193)) ([03606cd](https://togithub.com/googleapis/python-bigquery-dataframes/commit/03606cda30eb7645bfd4534460112dcca56b0ab0))
* Deprecate the `remote_service_type` in llm model ([#180](https://togithub.com/googleapis/python-bigquery-dataframes/issues/180)) ([a8a409a](https://togithub.com/googleapis/python-bigquery-dataframes/commit/a8a409ab0bd1f99dfb442df0703bf8786e0fe58e))
* For reset_index on unnamed multiindex, always use level_[n] label ([#182](https://togithub.com/googleapis/python-bigquery-dataframes/issues/182)) ([f95000d](https://togithub.com/googleapis/python-bigquery-dataframes/commit/f95000d3f88662be4d88c8b0152f1b838e99ec55))
* Match pandas behavior when assigning listlike to empty dfs ([#172](https://togithub.com/googleapis/python-bigquery-dataframes/issues/172)) ([c1d1f42](https://togithub.com/googleapis/python-bigquery-dataframes/commit/c1d1f42a21cc089877f79ebb46a39ddef6958e04))
* Use anonymous dataset instead of session dataset for temp tables ([#181](https://togithub.com/googleapis/python-bigquery-dataframes/issues/181)) ([800d44e](https://togithub.com/googleapis/python-bigquery-dataframes/commit/800d44eb5eb77da5d87b2e005f5a2ed53842e7b5))
* Use random table for `read_pandas` ([#192](https://togithub.com/googleapis/python-bigquery-dataframes/issues/192)) ([741c75e](https://togithub.com/googleapis/python-bigquery-dataframes/commit/741c75e5797e26a1487ff3da76a07953d9537f3f))
* Use random table when loading data for `read_csv`, `read_json`, `read_parquet` ([#175](https://togithub.com/googleapis/python-bigquery-dataframes/issues/175)) ([9d2e6dc](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9d2e6dc1ae4e11e80da4aabe0daa3a6044137cc6))


### Documentation

* Add code samples for `read_gbq_function` using community UDFs ([#188](https://togithub.com/googleapis/python-bigquery-dataframes/issues/188)) ([7506eab](https://togithub.com/googleapis/python-bigquery-dataframes/commit/7506eabf2e58159507809e36abfe90c417dfe92f))
* Add docstring code samples for `Series.apply` and `DataFrame.map` ([#185](https://togithub.com/googleapis/python-bigquery-dataframes/issues/185)) ([c816d84](https://togithub.com/googleapis/python-bigquery-dataframes/commit/c816d843e6f3c5a944cd4395ed0e1e91cec49812))
* Add llm kmeans notebook as an included example ([#177](https://togithub.com/googleapis/python-bigquery-dataframes/issues/177)) ([d49ae42](https://togithub.com/googleapis/python-bigquery-dataframes/commit/d49ae42a379fafd601cc94227e7f8f14b3d5f8c3))
* Use `head()` to get top `n` results, not to preview results ([#190](https://togithub.com/googleapis/python-bigquery-dataframes/issues/190)) ([87f84c9](https://togithub.com/googleapis/python-bigquery-dataframes/commit/87f84c9e58e7d0ea521ac386c9f02791cdddd19f))

---
This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: xs Pull request size is extra small.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants