feat: temporary resources no longer use BigQuery Sessions #194

Merged
tswast merged 8 commits into main from b303953103-remove-session-from-session
Nov 14, 2023

Conversation

tswast
Collaborator

@tswast tswast commented Nov 10, 2023

This allows multiple queries to run in parallel.

Draft because this is blocked by:

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes internal issue 303953103
🦕

@product-auto-label product-auto-label bot added the size: m and api: bigquery labels Nov 10, 2023
@tswast tswast marked this pull request as ready for review November 13, 2023 20:42
@tswast tswast requested review from a team as code owners November 13, 2023 20:42
@tswast tswast requested a review from Genesis929 November 13, 2023 20:42
Collaborator Author

tswast commented Nov 14, 2023

The following tests failed:

FAILED tests/system/large/ml/test_ensemble.py::test_randomforestclassifier_default_params
FAILED tests/system/large/ml/test_ensemble.py::test_randomforestclassifier_multiple_params
FAILED tests/system/large/ml/test_ensemble.py::test_xgbregressor_dart_booster_multiple_params

I'll investigate this morning.

@tswast tswast added the kokoro:force-run label Nov 14, 2023
@tswast tswast requested review from GarrettWu and removed request for Genesis929 November 14, 2023 15:49
@bigframes-bot bigframes-bot removed the kokoro:force-run label Nov 14, 2023
Collaborator Author

tswast commented Nov 14, 2023

@GarrettWu Please take a look, as this modifies the bigframes.ml module to create models in the anonymous dataset instead of as session (TEMP) models.
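
To illustrate the change described in the comment above, here is a minimal sketch of generating a `CREATE MODEL` statement with a fully qualified name rather than a session-scoped `CREATE TEMP MODEL`. The function `create_model_sql` and its parameters are hypothetical and written only for this example; this is not the actual bigframes.ml SQL generator.

```python
import json
from typing import Mapping, Union

OptionValue = Union[str, int, float]


def create_model_sql(
    project: str,
    dataset: str,
    model_id: str,
    options: Mapping[str, OptionValue],
) -> str:
    """Hypothetical helper: build CREATE MODEL with a fully qualified name.

    No TEMP keyword is emitted, so the statement does not depend on a
    BigQuery session; the destination is spelled out in the statement.
    """
    # json.dumps double-quotes strings and leaves numbers bare.
    rendered_options = ",\n  ".join(
        f"{key}={json.dumps(value)}" for key, value in options.items()
    )
    return (
        f"CREATE MODEL `{project}`.`{dataset}`.`{model_id}`\n"
        f"OPTIONS(\n  {rendered_options})"
    )


sql = create_model_sql(
    "a-project",
    "a-dataset",
    "my_model_id",
    {"option_key1": "option_value1", "option_key2": 2},
)
print(sql)
```

Because the destination is part of the statement itself, separate statements of this form do not need to share a session, which fits the PR's stated goal of letting queries run in parallel.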

      options={"option_key1": "option_value1", "option_key2": 2},
  )
  assert (
      sql
-     == """CREATE TEMP MODEL `my_model_id`
+     == """CREATE MODEL `a-project`.`a-dataset`.`my_model_id`
Contributor

Can we be more consistent? I'm seeing test-prj, testprj, a-project, etc. If we really want distinct names, we could use more meaningful ones, like test-project-imported.

For dataset names too.

Collaborator Author

I'm intentionally choosing different projects here so that we know we're creating the model in the dataset specified, not in something hardcoded (not that hardcoding would be likely, but still...)
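
The reasoning in the reply above can be sketched as a small check that uses deliberately different project and dataset names per case, so a hardcoded destination would be caught. `create_model_statement` is a hypothetical stand-in for the generator under test, and the second case's names are made up for this example.

```python
def create_model_statement(project: str, dataset: str, model_id: str) -> str:
    """Hypothetical generator under test; not the bigframes implementation."""
    return f"CREATE MODEL `{project}`.`{dataset}`.`{model_id}`"


# Deliberately distinct names per case, echoing the review discussion.
CASES = [
    ("a-project", "a-dataset", "my_model_id"),
    ("test-prj", "other_dataset", "another_model"),
]


def check_destination_is_respected() -> None:
    for project, dataset, model_id in CASES:
        sql = create_model_statement(project, dataset, model_id)
        # The requested destination must appear in the statement.
        assert f"`{project}`.`{dataset}`.`{model_id}`" in sql
        # Names from the other cases must not leak in, which is what a
        # hardcoded project or dataset would cause.
        for other_project, other_dataset, _ in CASES:
            if (other_project, other_dataset) != (project, dataset):
                assert other_project not in sql
                assert other_dataset not in sql


check_destination_is_respected()
```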

Collaborator Author

Done

# Model create and alter
def create_model(
    self,
    source_df: bpd.DataFrame,
    dataset: google.cloud.bigquery.DatasetReference,
Contributor

Since the ModelCreationSqlGenerator is specific to one model entity, the dataset should be handled the same way as the model_id: as a private member of the generator. That makes it easier to inject and test.

Collaborator Author

I'm not sure what you mean?

Honestly, I found the global state of model_id very concerning.

Collaborator Author

Updated to take a ModelReference instead, avoiding thread safety issues.
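
A rough sketch of the idea in the comment above: when the destination is passed with each call instead of being stored as state on the generator, concurrent callers cannot overwrite each other's model ID. `ModelRef` below is a hypothetical stand-in that mirrors the `project`, `dataset_id`, and `model_id` attributes of `google.cloud.bigquery.ModelReference`, and `SqlGenerator` and `random_model_ref` are illustrative names rather than the classes in this PR.

```python
import dataclasses
import uuid
from concurrent.futures import ThreadPoolExecutor


@dataclasses.dataclass(frozen=True)
class ModelRef:
    """Hypothetical stand-in for google.cloud.bigquery.ModelReference."""

    project: str
    dataset_id: str
    model_id: str


class SqlGenerator:
    """Stateless generator: the destination arrives with each call."""

    def create_model(self, source_sql: str, model_ref: ModelRef) -> str:
        return (
            f"CREATE MODEL `{model_ref.project}`."
            f"`{model_ref.dataset_id}`.`{model_ref.model_id}`\n"
            f"AS {source_sql}"
        )


def random_model_ref(project: str, dataset_id: str) -> ModelRef:
    # A random ID keeps concurrent callers from colliding in a shared dataset.
    return ModelRef(project, dataset_id, uuid.uuid4().hex)


generator = SqlGenerator()
refs = [random_model_ref("a-project", "a-dataset") for _ in range(8)]

# One generator instance is shared across threads; no call mutates it.
with ThreadPoolExecutor(max_workers=4) as pool:
    statements = list(
        pool.map(lambda ref: generator.create_model("SELECT 1 AS x", ref), refs)
    )
```

With a generator that instead stored the model ID as an attribute, two threads sharing one instance could interleave a write and a read of that attribute and emit SQL for the wrong model.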

@product-auto-label product-auto-label bot added the size: l label and removed the size: m label Nov 14, 2023
@tswast tswast requested a review from GarrettWu November 14, 2023 17:01
@tswast tswast merged commit 4a02cac into main Nov 14, 2023
@tswast tswast deleted the b303953103-remove-session-from-session branch November 14, 2023 17:47
gcf-merge-on-green bot pushed a commit that referenced this pull request Nov 15, 2023
🤖 I have created a release *beep* *boop*
---


## [0.14.0](https://togithub.com/googleapis/python-bigquery-dataframes/compare/v0.13.0...v0.14.0) (2023-11-14)


### Features

* Add 'cross' join support ([#176](https://togithub.com/googleapis/python-bigquery-dataframes/issues/176)) ([765446a](https://togithub.com/googleapis/python-bigquery-dataframes/commit/765446a929abe1ac076c3037afa7892f64105356))
* Add 'index', 'pad', 'nearest' interpolate methods ([#162](https://togithub.com/googleapis/python-bigquery-dataframes/issues/162)) ([6a28403](https://togithub.com/googleapis/python-bigquery-dataframes/commit/6a2840349a23035bdfdabacd1e231b41bbb5ed7a))
* Add series.sample (identical to existing dataframe.sample) ([#187](https://togithub.com/googleapis/python-bigquery-dataframes/issues/187)) ([37914a4](https://togithub.com/googleapis/python-bigquery-dataframes/commit/37914a4077c681881491f5c36d1a9c9f4255e18f))
* Add unordered sql compilation ([#156](https://togithub.com/googleapis/python-bigquery-dataframes/issues/156)) ([58f420c](https://togithub.com/googleapis/python-bigquery-dataframes/commit/58f420c91d94ca085e9810f36513ffe772bfddcf))
* Log most recent API calls as `recent-bigframes-api-xx` labels on BigQuery jobs ([#145](https://togithub.com/googleapis/python-bigquery-dataframes/issues/145)) ([4ea33b7](https://togithub.com/googleapis/python-bigquery-dataframes/commit/4ea33b7433532ae3a386a6ffa9eb57360ea39526))
* Read_gbq creates order deterministically without table copy ([#191](https://togithub.com/googleapis/python-bigquery-dataframes/issues/191)) ([8ab81de](https://togithub.com/googleapis/python-bigquery-dataframes/commit/8ab81dee4d0eee499094f2dd576550f0c59d7551))
* Support `date_series.astype("string[pyarrow]")` to cast DATE to STRING ([#186](https://togithub.com/googleapis/python-bigquery-dataframes/issues/186)) ([aee0e8e](https://togithub.com/googleapis/python-bigquery-dataframes/commit/aee0e8e2518c59bd1e0b07940c3309871fde8899))
* Support `series.at[row_label] = scalar` ([#173](https://togithub.com/googleapis/python-bigquery-dataframes/issues/173)) ([0c8bd33](https://togithub.com/googleapis/python-bigquery-dataframes/commit/0c8bd33806bb99206b8b12dbdf7d7485c6ffb759))
* Temporary resources no longer use BigQuery Sessions ([#194](https://togithub.com/googleapis/python-bigquery-dataframes/issues/194)) ([4a02cac](https://togithub.com/googleapis/python-bigquery-dataframes/commit/4a02cac88c7d7b46bed1fa813a862fc2ef9ef084))


### Bug Fixes

* All sort operation are now stable ([#195](https://togithub.com/googleapis/python-bigquery-dataframes/issues/195)) ([3a2761f](https://togithub.com/googleapis/python-bigquery-dataframes/commit/3a2761f3c38d0de8b8eda47fffa15b8412aa84b0))
* Default to 7 days expiration for `read_csv`, `read_json`, `read_parquet` ([#193](https://togithub.com/googleapis/python-bigquery-dataframes/issues/193)) ([03606cd](https://togithub.com/googleapis/python-bigquery-dataframes/commit/03606cda30eb7645bfd4534460112dcca56b0ab0))
* Deprecate the `remote_service_type` in llm model ([#180](https://togithub.com/googleapis/python-bigquery-dataframes/issues/180)) ([a8a409a](https://togithub.com/googleapis/python-bigquery-dataframes/commit/a8a409ab0bd1f99dfb442df0703bf8786e0fe58e))
* For reset_index on unnamed multiindex, always use level_[n] label ([#182](https://togithub.com/googleapis/python-bigquery-dataframes/issues/182)) ([f95000d](https://togithub.com/googleapis/python-bigquery-dataframes/commit/f95000d3f88662be4d88c8b0152f1b838e99ec55))
* Match pandas behavior when assigning listlike to empty dfs ([#172](https://togithub.com/googleapis/python-bigquery-dataframes/issues/172)) ([c1d1f42](https://togithub.com/googleapis/python-bigquery-dataframes/commit/c1d1f42a21cc089877f79ebb46a39ddef6958e04))
* Use anonymous dataset instead of session dataset for temp tables ([#181](https://togithub.com/googleapis/python-bigquery-dataframes/issues/181)) ([800d44e](https://togithub.com/googleapis/python-bigquery-dataframes/commit/800d44eb5eb77da5d87b2e005f5a2ed53842e7b5))
* Use random table for `read_pandas` ([#192](https://togithub.com/googleapis/python-bigquery-dataframes/issues/192)) ([741c75e](https://togithub.com/googleapis/python-bigquery-dataframes/commit/741c75e5797e26a1487ff3da76a07953d9537f3f))
* Use random table when loading data for `read_csv`, `read_json`, `read_parquet` ([#175](https://togithub.com/googleapis/python-bigquery-dataframes/issues/175)) ([9d2e6dc](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9d2e6dc1ae4e11e80da4aabe0daa3a6044137cc6))


### Documentation

* Add code samples for `read_gbq_function` using community UDFs ([#188](https://togithub.com/googleapis/python-bigquery-dataframes/issues/188)) ([7506eab](https://togithub.com/googleapis/python-bigquery-dataframes/commit/7506eabf2e58159507809e36abfe90c417dfe92f))
* Add docstring code samples for `Series.apply` and `DataFrame.map` ([#185](https://togithub.com/googleapis/python-bigquery-dataframes/issues/185)) ([c816d84](https://togithub.com/googleapis/python-bigquery-dataframes/commit/c816d843e6f3c5a944cd4395ed0e1e91cec49812))
* Add llm kmeans notebook as an included example ([#177](https://togithub.com/googleapis/python-bigquery-dataframes/issues/177)) ([d49ae42](https://togithub.com/googleapis/python-bigquery-dataframes/commit/d49ae42a379fafd601cc94227e7f8f14b3d5f8c3))
* Use `head()` to get top `n` results, not to preview results ([#190](https://togithub.com/googleapis/python-bigquery-dataframes/issues/190)) ([87f84c9](https://togithub.com/googleapis/python-bigquery-dataframes/commit/87f84c9e58e7d0ea521ac386c9f02791cdddd19f))

---
This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).