docs: add sample for getting started with BQML #141

Merged
merged 35 commits into from
Dec 12, 2023

Conversation

DevStephanie
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@DevStephanie DevStephanie requested review from a team as code owners October 25, 2023 15:31
@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. labels Oct 25, 2023
@tswast tswast mentioned this pull request Oct 25, 2023
4 tasks
@snippet-bot
snippet-bot bot commented Oct 25, 2023

Here is the summary of changes.

You are about to add 1 region tag.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

Comment on lines 70 to 71
# When writing a DataFrame to a BigQuery table, include destination table
# and parameters, index defaults to "True".
Collaborator

This comment has nothing to do with BigQuery ML models. Please fix.

Note: The important thing here is that we're taking our trained model and writing it to a permanent location.

Contributor Author

Agreed, corrected.

Collaborator

FYI: The comment is still talking about tables not models. I'll make a comment with a suggested edit.

@DevStephanie DevStephanie requested a review from tswast November 3, 2023 17:26
@@ -0,0 +1,13 @@
# Copyright 2023 Google LLC
Collaborator

We can remove this file for now. Let's do a separate PR for the K-Means tutorials.

Contributor Author

Corrected.

Collaborator

This file still needs to be deleted.

@DevStephanie DevStephanie requested a review from tswast November 6, 2023 18:13
@tswast tswast added automerge Merge the pull request once unit tests and other checks pass. and removed automerge Merge the pull request once unit tests and other checks pass. labels Nov 16, 2023

# The model.fit() call above created a temporary model.
# Use the to_gbq() method to write to a permanent location.
model.to_gbq("bqml_tutorial.sample_model", replace=True)
Collaborator

FYI: We're getting

E           google.api_core.exceptions.BadRequest: 400 Concurrent update on same model: bigframes-dev:bqml_tutorial.sample_model is not supported. Share your usecase with the BigQuery DataFrames team at the https://bit.ly/bigframes-feedback survey.

failure in our test suite: https://fusion2.corp.google.com/invocations/8a7513c8-e7c9-4b5b-82fe-9a83c176fbc1/targets/bigframes%2Fpresubmit%2Fe2e/log

I think we'll need a test fixture for this to create a temporary place for the model and clean it up when the test finishes.

  1. Create a file called samples/snippets/conftest.py.

  2. In the conftest.py file you create, add a fixture called random_model_id, similar to this one: https://github.com/googleapis/python-bigquery/blob/f804d639fe95bef5d083afe1246d756321128b05/samples/snippets/conftest.py#L101-L111, except that it'll call delete_model(...) instead of delete_table(...).

    You'll also need to add the "prefixer" helper (https://github.com/googleapis/python-bigquery/blob/f804d639fe95bef5d083afe1246d756321128b05/samples/snippets/conftest.py#L21) and the bigquery_client fixture (https://github.com/googleapis/python-bigquery/blob/f804d639fe95bef5d083afe1246d756321128b05/samples/snippets/conftest.py#L33-L36).

  3. Update your code sample to use the new random_model_id fixture.

    Look how we do it in the remote functions test:

    your_gcp_project_id = project_id
    # [START bigquery_dataframes_remote_function]
    import bigframes.pandas as bpd
    # Set BigQuery DataFrames options
    bpd.options.bigquery.project = your_gcp_project_id

    In your sample, you'll instead set your_model_id = random_model_id and call:

    model.to_gbq(
        your_model_id,  # "project.dataset.model_id" or "dataset.model_id"
        replace=True,
    )
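
The fixture described in steps 1 and 2 could look roughly like the sketch below. It is only an illustration under a few assumptions: temporary_model_id is a hypothetical helper name, uuid stands in for the "prefixer" helper from the linked conftest.py, and client is anything exposing delete_model(model_id, not_found_ok=True), as google.cloud.bigquery.Client does. The core is kept free of pytest and Google Cloud imports so the cleanup logic can be exercised on its own.

```python
import uuid
from typing import Iterator


def temporary_model_id(client, project_id: str, dataset_id: str) -> Iterator[str]:
    """Yield a unique model ID, then delete that model when the caller is done.

    Assumption: `client` exposes delete_model(model_id, not_found_ok=True),
    as google.cloud.bigquery.Client does.
    """
    # Stand-in for the "prefixer" helper: a random suffix keeps concurrent
    # test runs from updating the same model.
    model_name = f"sample_model_{uuid.uuid4().hex}"
    full_model_id = f"{project_id}.{dataset_id}.{model_name}"
    try:
        yield full_model_id
    finally:
        # Runs on teardown, even if the test body failed.
        client.delete_model(full_model_id, not_found_ok=True)


# In samples/snippets/conftest.py this might be wired up as (hypothetical):
#
#   @pytest.fixture
#   def random_model_id(bigquery_client, project_id, dataset_id):
#       yield from temporary_model_id(bigquery_client, project_id, dataset_id)
```

Because each test gets its own model ID, parallel runs no longer write to bqml_tutorial.sample_model at the same time, which is what triggered the "Concurrent update on same model" error.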
    



def test_bqml_getting_started():
    # [START bigquery_getting_started_bqml_tutorial]
Collaborator

Let's use bigquery_dataframes_bqml_getting_started for our region tags.

Sounds great, will make edits now.

@DevStephanie DevStephanie requested a review from a team as a code owner December 12, 2023 20:34
@product-auto-label product-auto-label bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Dec 12, 2023
@@ -0,0 +1,25 @@
Copyright (c) 2013-2022, GeoPandas developers.
Collaborator

Revert this

@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: l Pull request size is large. labels Dec 12, 2023
@tswast tswast added the automerge Merge the pull request once unit tests and other checks pass. label Dec 12, 2023
@tswast tswast merged commit fb14f54 into main Dec 12, 2023
@tswast tswast deleted the bqml_tutorial branch December 12, 2023 22:25
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Dec 12, 2023
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. size: m Pull request size is medium.
2 participants