-
Notifications
You must be signed in to change notification settings - Fork 48
docs: add python code sample to multiple timeseries forecasting #531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Here is the summary of changes. You are about to add 2 region tags.
This comment is generated by snippet-bot.
|
Corrected region tags, will be updated on next PR! |
# Start by selecting the data you'll use for training. `read_gbq_table` accepts | ||
# either a SQL query or a table ID. Since this example selects from multiple | ||
# tables via a wildcard, use SQL to define this data. Watch issue | ||
# https://github.com/googleapis/python-bigquery-dataframes/issues/169 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already fixed
# https://github.com/googleapis/python-bigquery-dataframes/issues/169 | ||
# for updates to `read_gbq_table` to support wildcard tables. | ||
|
||
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't need filters, no need to pass in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will correct.
# https://github.com/googleapis/python-bigquery-dataframes/issues/169 | ||
# for updates to `read_gbq_table` to support wildcard tables. | ||
|
||
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use read_gbq_table instead of more general API read_gbq?
num_trips = features.groupby(["date"], as_index=False).count() | ||
model = forecasting.ARIMAPlus() | ||
|
||
X = num_trips["date"].to_frame() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to_frame() not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, will update that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we might need .to_frame() because without it, I see
AttributeError Traceback (most recent call last)
Cell In[12], line 18
15 X = num_trips["date"]
16 y = num_trips["num_trips"]
---> 18 model.fit(X, y)
19 # The model.fit() call above created a temporary model.
20 # Use the to_gbq() method to write to a permanent location.
22 model.to_gbq(
23 your_model_id, # For example: "bqml_tutorial.sample_model",
24 replace=True,
25 )
File ~/python-bigquery-dataframes/bigframes/ml/base.py:163, in SupervisedTrainablePredictor.fit(self, X, y)
158 def fit(
159 self: _T,
160 X: Union[bpd.DataFrame, bpd.Series],
161 y: Union[bpd.DataFrame, bpd.Series],
162 ) -> _T:
--> 163 return self._fit(X, y)
File ~/python-bigquery-dataframes/bigframes/core/log_adapter.py:44, in method_logger.<locals>.wrapper(*args, **kwargs)
42 if api_method_name.startswith("__") or not api_method_name.startswith("_"):
43 add_api_method(full_method_name)
---> 44 return method(*args, **kwargs)
File ~/python-bigquery-dataframes/bigframes/ml/forecasting.py:218, in ARIMAPlus._fit(self, X, y, transforms)
197 def _fit(
198 self,
199 X: Union[bpd.DataFrame, bpd.Series],
200 y: Union[bpd.DataFrame, bpd.Series],
201 transforms: Optional[List[str]] = None,
202 ):
203 """Fit the model to training data.
204
205 Args:
(...)
216 ARIMAPlus: Fitted estimator.
217 """
--> 218 if X.columns.size != 1:
219 raise ValueError(
220 "Time series timestamp input X must only contain 1 column."
221 )
222 if y.columns.size != 1:
File ~/python-bigquery-dataframes/bigframes/series.py:1062, in Series.__getattr__(self, key)
1053 raise AttributeError(
1054 textwrap.dedent(
1055 f"""
(...)
1059 )
1060 )
1061 else:
-> 1062 raise AttributeError(key)
AttributeError: columns
your_model_id, # For example: "bqml_tutorial.sample_model", | ||
replace=True, | ||
) | ||
# ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why code below are all commented out?
# either a SQL query or a table ID. Since this example selects from multiple | ||
# tables via a wildcard, use SQL to define this data. Watch issue | ||
# https://github.com/googleapis/python-bigquery-dataframes/issues/169 | ||
# for updates to `read_gbq_table` to support wildcard tables. | ||
|
||
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just reading a regular table. We don't need to explain wildcard tables.
Also, the default filter is no filters, so we don't need to pass that in.
# either a SQL query or a table ID. Since this example selects from multiple | |
# tables via a wildcard, use SQL to define this data. Watch issue | |
# https://github.com/googleapis/python-bigquery-dataframes/issues/169 | |
# for updates to `read_gbq_table` to support wildcard tables. | |
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips", filters=[]) | |
# either a SQL query or a table ID. | |
df = bpd.read_gbq_table("bigquery-public-data.new_york.citibike_trips") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will edit the filter but removing from this section that doesn't have one and removing the read gbp table explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resolved
@@ -0,0 +1,106 @@ | |||
def test_multiple_timeseries_forecasting_model(random_model_id): | |||
# [START bigquery_dataframes_bqml_create_data__set] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in our 1:1, these region tags need to be globally unique, so let's borrow from the URL of the tutorial (https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial) to construct these.
For example:
# [START bigquery_dataframes_bqml_create_data__set] | |
# [START bigquery_dataframes_bqml_arima_multiple_step_2_visualize] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, region tags will follow URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected
# [END bigquery_dataframes_bqml_create_data__set(1)] | ||
|
||
# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two region tags can be combined since they are both for step 2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, makes sense! is "/ " or "&" the preference?
# Use the to_gbq() method to write to a permanent location. | ||
|
||
model.to_gbq( | ||
your_model_id, # For example: "bqml_tutorial.sample_model", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stay consistent with the SQL with regards to suggested model ID. See step 3: https://cloud.google.com/bigquery/docs/arima-multiple-time-series-forecasting-tutorial#arima-single-model
your_model_id, # For example: "bqml_tutorial.sample_model", | |
your_model_id, # For example: "bqml_tutorial.nyc_citibike_arima_model", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOH, good catch! Yes, will correct this.
# ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY', | ||
# max_time_series_length=3, min_time_series_length=20, | ||
# time_series_length_fraction=1.0, trend_smoothing_window_size=-1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what this comment is from. Let's remove it.
# ARIMAPlus(auto_arima_max_order=5, data_frequency='AUTO_FREQUENCY', | |
# max_time_series_length=3, min_time_series_length=20, | |
# time_series_length_fraction=1.0, trend_smoothing_window_size=-1) |
|
||
# [END bigquery_dataframes_bqml_visualize_time_series_to_forecast] | ||
|
||
# [START bigquery_dataframes_bqml_visualize_time_series_to_forecast] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in our 1:1, EXPLAIN_FORECAST isn't implemented yet in bigframes. Please file an issue if you haven't already and remove this sample for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will remove and file request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Will wait for e2e tests with code samples to pass before merging.
e2e tests failed with
which appear to be unrelated. |
BEGIN_COMMIT_OVERRIDE
docs: Add python code sample for multiple forecasting time series (#531)
END_COMMIT_OVERRIDE
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕