docs: add python code sample to multiple timeseries forecasting #531

Merged
merged 8 commits into from
May 6, 2024

DevStephanie
Contributor

@DevStephanie DevStephanie commented Mar 27, 2024

BEGIN_COMMIT_OVERRIDE
docs: Add python code sample for multiple forecasting time series (#531)
END_COMMIT_OVERRIDE

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@DevStephanie DevStephanie requested review from a team as code owners March 27, 2024 18:13

snippet-bot bot commented Mar 27, 2024

Here is the summary of changes.

You are about to add 2 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@product-auto-label product-auto-label bot added the size: m Pull request size is medium. label Mar 27, 2024
@product-auto-label product-auto-label bot added api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. samples Issues that are directly related to samples. labels Mar 27, 2024
@DevStephanie
Contributor Author

Corrected region tags, will be updated on next PR!

# Start by selecting the data you'll use for training. `read_gbq_table` accepts
# either a SQL query or a table ID. Since this example selects from multiple
# tables via a wildcard, use SQL to define this data. Watch issue
# https://github.com/googleapis/python-bigquery-dataframes/issues/169
Contributor

Already fixed

# https://github.com/googleapis/python-bigquery-dataframes/issues/169
# for updates to `read_gbq_table` to support wildcard tables.

df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])
Contributor

If you don't need filters, there's no need to pass them in.

Contributor Author

ok, will correct.

# https://github.com/googleapis/python-bigquery-dataframes/issues/169
# for updates to `read_gbq_table` to support wildcard tables.

df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])
Contributor

Why use read_gbq_table instead of the more general read_gbq API?

num_trips = features.groupby(["date"], as_index=False).count()
model = forecasting.ARIMAPlus()

X = num_trips["date"].to_frame()
Contributor

to_frame() not needed.

Contributor Author

ok, will update that.

Contributor Author

Looks like we might need .to_frame(), because without it I see:

AttributeError                            Traceback (most recent call last)
Cell In[12], line 18
     15 X = num_trips["date"]
     16 y = num_trips["num_trips"]
---> 18 model.fit(X, y)
     19 # The model.fit() call above created a temporary model.
     20 # Use the to_gbq() method to write to a permanent location.
     22 model.to_gbq(
     23     your_model_id,  # For example: "bqml_tutorial.sample_model",
     24     replace=True,
     25 )

File ~/python-bigquery-dataframes/bigframes/ml/base.py:163, in SupervisedTrainablePredictor.fit(self, X, y)
    158 def fit(
    159     self: _T,
    160     X: Union[bpd.DataFrame, bpd.Series],
    161     y: Union[bpd.DataFrame, bpd.Series],
    162 ) -> _T:
--> 163     return self._fit(X, y)

File ~/python-bigquery-dataframes/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
     42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
     43     add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)

File ~/python-bigquery-dataframes/bigframes/ml/forecasting.py:218, in ARIMAPlus._fit(self, X, y, transforms)
    197 def _fit(
    198     self,
    199     X: Union[bpd.DataFrame, bpd.Series],
    200     y: Union[bpd.DataFrame, bpd.Series],
    201     transforms: Optional[List[str]] = None,
    202 ):
    203     """Fit the model to training data.
    204 
    205     Args:
   (...)
    216         ARIMAPlus: Fitted estimator.
    217     """
--> 218     if X.columns.size != 1:
    219         raise ValueError(
    220             "Time series timestamp input X must only contain 1 column."
    221         )
    222     if y.columns.size != 1:

File ~/python-bigquery-dataframes/bigframes/series.py:1062, in Series.__getattr__(self, key)
   1053     raise AttributeError(
   1054         textwrap.dedent(
   1055             f"""
   (...)
   1059         )
   1060     )
   1061 else:
-> 1062     raise AttributeError(key)

AttributeError: columns
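The traceback above comes from the shape check at line 218 of `forecasting.py` (`X.columns.size != 1`): a Series has no `columns` attribute, while `.to_frame()` returns a one-column DataFrame. The sketch below reproduces that behavior locally, using pandas as a stand-in for `bigframes.pandas` (an assumption: it needs pandas installed, and the tiny `trips` table is hypothetical rather than `citibike_trips`). `check_single_column` is a hypothetical helper that mirrors the check; since line 222 applies the same check to `y`, `y` likely needs `.to_frame()` too.

```python
import pandas as pd

# Hypothetical stand-in for citibike_trips: one row per trip.
trips = pd.DataFrame(
    {
        "starttime": pd.to_datetime(
            ["2016-06-01 08:00", "2016-06-01 09:30", "2016-06-02 07:15"]
        )
    }
)

# Copy starttime into "num_trips" so that count() yields trips per date.
features = pd.DataFrame(
    {"date": trips["starttime"].dt.date, "num_trips": trips["starttime"]}
)
num_trips = features.groupby(["date"], as_index=False).count()


def check_single_column(frame, name):
    """Hypothetical helper mirroring the check at forecasting.py:218."""
    if frame.columns.size != 1:  # a Series raises AttributeError here
        raise ValueError(f"Time series input {name} must only contain 1 column.")


results = {}
for label, candidate in (
    ("series", num_trips["date"]),
    ("frame", num_trips["date"].to_frame()),
):
    try:
        check_single_column(candidate, "X")
        results[label] = "ok"
    except AttributeError:
        results[label] = "AttributeError"

y = num_trips["num_trips"].to_frame()  # the same check applies to y
print(results)  # {'series': 'AttributeError', 'frame': 'ok'}
```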

your_model_id, # For example: "bqml_tutorial.sample_model",
replace=True,
)
# ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY',
Contributor

Why is all the code below commented out?

Comment on lines 9 to 14
# either a SQL query or a table ID. Since this example selects from multiple
# tables via a wildcard, use SQL to define this data. Watch issue
# https://github.com/googleapis/python-bigquery-dataframes/issues/169
# for updates to `read_gbq_table` to support wildcard tables.

df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])
Collaborator

This is just reading a regular table. We don't need to explain wildcard tables.

Also, the default filter is no filters, so we don't need to pass that in.

Suggested change
# either a SQL query or a table ID. Since this example selects from multiple
# tables via a wildcard, use SQL to define this data. Watch issue
# https://github.com/googleapis/python-bigquery-dataframes/issues/169
# for updates to `read_gbq_table` to support wildcard tables.
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[])
# either a SQL query or a table ID.
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips")

Contributor Author

OK, will edit the filter: I'll remove it from the section that doesn't need one, and remove the read_gbq_table explanation.

Contributor Author

Resolved

@@ -0,0 +1,106 @@
def test_multiple_timeseries_forecasting_model(random_model_id):
# [START bigquery_dataframes_bqml_create_data__set]
Collaborator

As discussed in our 1:1, these region tags need to be globally unique, so let's borrow from the URL of the tutorial (https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial) to construct these.

For example:

Suggested change
# [START bigquery_dataframes_bqml_create_data__set]
# [START bigquery_dataframes_bqml_arima_multiple_step_2_visualize]

Contributor Author

Yes, region tags will follow the URL.

Contributor Author

Corrected
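Because region tags must be globally unique, a quick local check can catch collisions and unbalanced tags before they reach the docs. The sketch below is a hypothetical helper (`find_region_tag_problems` is not part of snippet-bot or any Google tooling); it assumes tags use the `# [START tag]` / `# [END tag]` comment form seen in this PR and scans a mapping of filename to source text. The sample input is assembled from tag names that appear in this review.

```python
import re

# Matches region markers such as "# [START some_tag]" or "# [END some_tag]".
REGION_RE = re.compile(r"#\s*\[(START|END)\s+([^\]]+)\]")


def find_region_tag_problems(sources):
    """Report duplicate, unmatched, and unclosed region tags across files.

    `sources` maps a filename to that file's source text.
    """
    problems = []
    first_start = {}  # tag -> "file:line" of its first START
    for filename, text in sources.items():
        open_tags = []
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = REGION_RE.search(line)
            if match is None:
                continue
            kind, tag = match.group(1), match.group(2).strip()
            where = f"{filename}:{lineno}"
            if kind == "START":
                if tag in first_start:
                    problems.append(
                        f"{where}: duplicate START {tag} (first at {first_start[tag]})"
                    )
                else:
                    first_start[tag] = where
                open_tags.append(tag)
            elif tag in open_tags:
                open_tags.remove(tag)
            else:
                problems.append(f"{where}: END {tag} has no matching START")
        problems.extend(f"{filename}: START {tag} is never closed" for tag in open_tags)
    return problems


sample = """\
# [START bigquery_dataframes_bqml_create_data__set]
df = None
# [END bigquery_dataframes_bqml_create_data__set(1)]
# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast]
# [END bigquery_dataframes_bqml_visualize_time_series_to_forecast]
# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast]
# [END bigquery_dataframes_bqml_visualize_time_series_to_forecast]
"""
for problem in find_region_tag_problems({"sample.py": sample}):
    print(problem)
```

Passing several files at once lets the duplicate check span the whole set of samples rather than a single file.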

Comment on lines 16 to 18
# [END bigquery_dataframes_bqml_create_data__set(1)]

# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast]
Collaborator

These two region tags can be combined since they are both for step 2.

Contributor Author

ok, makes sense! Is "/" or "&" the preference?
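Combining the two step-2 regions means a docs page can pull the whole step through a single tag. As a rough illustration of how one tag maps to the code a page displays, the hypothetical `extract_region` below (not the actual docs or snippet-bot tooling) returns the dedented lines between one START/END pair. The tag name is the one suggested earlier in this review; the sample body is a placeholder, not the merged sample.

```python
import re
import textwrap


def extract_region(source, tag):
    """Return the dedented lines between '# [START tag]' and '# [END tag]'."""
    start = re.compile(rf"#\s*\[START\s+{re.escape(tag)}\]")
    end = re.compile(rf"#\s*\[END\s+{re.escape(tag)}\]")
    lines = source.splitlines()
    # First START marker for the tag, then the first END marker after it.
    begin = next(i for i, line in enumerate(lines) if start.search(line))
    finish = next(i for i in range(begin + 1, len(lines)) if end.search(lines[i]))
    return textwrap.dedent("\n".join(lines[begin + 1 : finish]))


sample = '''\
def test_multiple_timeseries_forecasting_model(random_model_id):
    # [START bigquery_dataframes_bqml_arima_multiple_step_2_visualize]
    import bigframes.pandas as bpd

    df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips")
    # (remaining step 2 code goes here)
    # [END bigquery_dataframes_bqml_arima_multiple_step_2_visualize]
'''

print(extract_region(sample, "bigquery_dataframes_bqml_arima_multiple_step_2_visualize"))
```

The `\]` at the end of each pattern keeps a tag from matching a longer tag that merely shares its prefix.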

# Use the to_gbq() method to write to a permanent location.

model.to_gbq(
your_model_id, # For example: "bqml_tutorial.sample_model",
Collaborator

Let's stay consistent with the SQL with regards to suggested model ID. See step 3: https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial#arima-single-model

Suggested change
your_model_id, # For example: "bqml_tutorial.sample_model",
your_model_id, # For example: "bqml_tutorial.nyc_citibike_arima_model",

Contributor Author

OOH, good catch! Yes, will correct this.

Comment on lines 92 to 94
# ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY',
# max_time_series_length=3, min_time_series_length=20,
# time_series_length_fraction=1.0, trend_smoothing_window_size=-1)
Collaborator

Not sure what this comment is from. Let's remove it.

Suggested change
# ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY',
# max_time_series_length=3, min_time_series_length=20,
# time_series_length_fraction=1.0, trend_smoothing_window_size=-1)


# [END bigquery_dataframes_bqml_visualize_time_series_to_forecast]

# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast]
Collaborator

As discussed in our 1:1, EXPLAIN_FORECAST isn't implemented yet in bigframes. Please file an issue if you haven't already and remove this sample for now.

Contributor Author

Yes, will remove and file request.

@DevStephanie DevStephanie requested a review from a team as a code owner May 6, 2024 20:49
Collaborator

@tswast tswast left a comment

Thanks! Will wait for e2e tests with code samples to pass before merging.

@tswast tswast enabled auto-merge (squash) May 6, 2024 21:20
@tswast tswast merged commit 16866d2 into main May 6, 2024
15 of 16 checks passed
@tswast tswast deleted the Stephanie446 branch May 6, 2024 22:41
@tswast
Collaborator

tswast commented May 7, 2024

e2e tests failed with

FAILED tests/system/large/ml/test_core.py::test_bqml_e2e - AssertionError: Da...
FAILED tests/system/large/ml/test_ensemble.py::test_xgbregressor_default_params
FAILED tests/system/large/ml/test_pipeline.py::test_pipeline_random_forest_classifier_fit_score_predict
FAILED tests/system/large/ml/test_pipeline.py::test_pipeline_xgbregressor_fit_score_predict

which appear to be unrelated.

4 participants