feat: temporary resources no longer use BigQuery Sessions #194

tswast · 2023-11-10T18:49:59Z

This allows multiple queries to run in parallel.

Draft because this is blocked by:

Updating models to no longer use session, issue 308813632.
Updating read_pandas to no longer use session, issue 309108590. (fix: use random table for read_pandas #192)
Updating read_gbq_table to no longer use session, issue 309109254. (feat: read_gbq creates order deterministically without table copy #191)
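As a rough illustration of the idea behind these blockers: rather than letting a BigQuery session scope temporary tables, each temporary resource can get a unique name inside a shared dataset, so no state is shared between queries. This is only a sketch under assumptions; `DatasetRef`, `random_table_id`, and the `bqdf` prefix are hypothetical names and are not taken from this PR.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRef:
    """Hypothetical stand-in for a dataset reference (project + dataset)."""

    project: str
    dataset_id: str


def random_table_id(dataset: DatasetRef, prefix: str = "bqdf") -> str:
    """Return a fully qualified table ID with a random table name."""
    # uuid4().hex is 32 hex characters, so each call yields a distinct
    # name without consulting any shared (session-like) state.
    return f"{dataset.project}.{dataset.dataset_id}.{prefix}{uuid.uuid4().hex}"


if __name__ == "__main__":
    dataset = DatasetRef("a-project", "a-dataset")
    # Two calls give two different table IDs in the same dataset.
    print(random_table_id(dataset))
    print(random_table_id(dataset))
```

Because every name is generated independently, two callers can create their tables at the same time without coordinating.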

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
Ensure the tests and linter pass
Code coverage does not decrease (if any source code was changed)
Appropriate docs were updated (if necessary)

Fixes internal issue 303953103
🦕


tswast · 2023-11-14T15:15:08Z

The following tests failed:

FAILED tests/system/large/ml/test_ensemble.py::test_randomforestclassifier_default_params
FAILED tests/system/large/ml/test_ensemble.py::test_randomforestclassifier_multiple_params
FAILED tests/system/large/ml/test_ensemble.py::test_xgbregressor_dart_booster_multiple_params

I'll investigate this morning.

tswast · 2023-11-14T15:50:24Z

@GarrettWu Please take a look, as this modifies the bigframes.ml module to use the anonymous dataset instead of session models.

GarrettWu · 2023-11-13T22:59:59Z

tests/unit/ml/test_sql.py

        options={"option_key1": "option_value1", "option_key2": 2},
    )
    assert (
        sql
-        == """CREATE TEMP MODEL `my_model_id`
+        == """CREATE MODEL `a-project`.`a-dataset`.`my_model_id`


Can we be more consistent? I'm seeing test-prj, testprj, a-project, etc. If we really want to give distinct names, we can use more meaningful names like test-project-imported.

For dataset names too.

I'm intentionally choosing different projects here so that we know that we're creating in the dataset specified, not something hardcoded (not that that would be likely, but still...)
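The reasoning in this reply can be sketched as a small test: if every case used the same project and dataset, a hardcoded value would still pass, whereas distinct names catch it. The helper `create_model_header` and the dataset name `test-dataset-imported` are hypothetical and exist only for this illustration; they are not the code under review.

```python
def create_model_header(project: str, dataset: str, model_id: str) -> str:
    """Hypothetical helper: render the first line of a CREATE MODEL statement."""
    return f"CREATE MODEL `{project}`.`{dataset}`.`{model_id}`"


def test_header_uses_given_dataset() -> None:
    # Distinct project/dataset pairs: a hardcoded value could match at most one.
    cases = [
        ("a-project", "a-dataset"),
        ("test-project-imported", "test-dataset-imported"),
    ]
    for project, dataset in cases:
        sql = create_model_header(project, dataset, "my_model_id")
        assert sql == f"CREATE MODEL `{project}`.`{dataset}`.`my_model_id`"


if __name__ == "__main__":
    test_header_uses_given_dataset()
```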

GarrettWu · 2023-11-14T15:56:25Z

bigframes/ml/sql.py

    # Model create and alter
    def create_model(
        self,
        source_df: bpd.DataFrame,
+        dataset: google.cloud.bigquery.DatasetReference,


Since the ModelCreationSqlGenerator is specific to one model entity, the dataset should be handled the same way as the model_id, as a private member of the generator, which makes it easier to inject and test.

I'm not sure what you mean?

Honestly, I found the global state of model_id very concerning.

Updated to take a ModelReference instead, avoiding thread safety issues.
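A minimal sketch of the design this thread settles on, assuming a generator that keeps no per-model state and receives the model reference as an argument. `ModelRef` and `SqlGenerator` are hypothetical stand-ins, the source is passed as a SQL string rather than a DataFrame, and the OPTIONS clause is omitted for brevity, so this is not the actual bigframes implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical stand-in for a model reference (project, dataset, model)."""

    project: str
    dataset_id: str
    model_id: str


class SqlGenerator:
    """Hypothetical generator with no per-model state on the instance."""

    def create_model(self, source_sql: str, model_ref: ModelRef) -> str:
        # Everything the statement needs arrives as an argument, so
        # concurrent calls cannot overwrite each other's model ID.
        name = f"`{model_ref.project}`.`{model_ref.dataset_id}`.`{model_ref.model_id}`"
        return f"CREATE MODEL {name}\nAS {source_sql}"


if __name__ == "__main__":
    generator = SqlGenerator()
    refs = [ModelRef("a-project", "a-dataset", f"model_{i}") for i in range(4)]
    # One shared generator is used from several threads at once.
    with ThreadPoolExecutor(max_workers=4) as pool:
        statements = list(
            pool.map(lambda ref: generator.create_model("SELECT 1", ref), refs)
        )
    for statement in statements:
        print(statement)
```

With a shared mutable model ID on the generator, two threads could interleave a write and a read and emit SQL for the wrong model; passing the reference per call removes that possibility.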

feat: temporary resources no longer use BigQuery Sessions

1db08f8

This allows multiple queries to run in parallel.

product-auto-label bot added the size: m and api: bigquery labels Nov 10, 2023

tswast added 4 commits November 10, 2023 14:22

Merge branch 'main' into b303953103-remove-session-from-session

e96d2a3

Merge branch 'main' into b303953103-remove-session-from-session

ecaebc6

use anonymous dataset for models

0b0e912

Merge remote-tracking branch 'origin/main' into b303953103-remove-ses…

543ee7d

…sion-from-session

tswast marked this pull request as ready for review November 13, 2023 20:42

tswast requested review from a team as code owners November 13, 2023 20:42

tswast requested a review from Genesis929 November 13, 2023 20:42

remove reference to bq session

6cc6692

avoid 'model already exists' error

9ac00a2

tswast added the kokoro:force-run label Nov 14, 2023

tswast requested review from GarrettWu and removed request for Genesis929 November 14, 2023 15:49

bigframes-bot removed the kokoro:force-run label Nov 14, 2023

GarrettWu reviewed Nov 14, 2023


This was referenced Nov 14, 2023

feat(bigquery): add support for temporary tables ibis-project/ibis#7527

Merged

feat(bigquery): support reading parquet, json and csv files ibis-project/ibis#7546

Merged

remove global _model_id

83d444d

product-auto-label bot added the size: l label and removed the size: m label Nov 14, 2023

tswast requested a review from GarrettWu November 14, 2023 17:01

GarrettWu approved these changes Nov 14, 2023


tswast merged commit 4a02cac into main Nov 14, 2023

tswast deleted the b303953103-remove-session-from-session branch November 14, 2023 17:47

release-please bot mentioned this pull request Nov 13, 2023

chore(main): release 0.14.0 #183

Merged
