-
Notifications
You must be signed in to change notification settings - Fork 48
fix: model.fit metric not collected issue. #1085
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bigframes/ml/core.py
Outdated
@@ -273,6 +273,8 @@ def _create_model_ref( | |||
def _create_model_with_sql(self, session: bigframes.Session, sql: str) -> BqmlModel: | |||
# fit the model, synchronously | |||
_, job = session._start_query_ml_ddl(sql) | |||
if session._metrics is not None: | |||
session._metrics.count_job_stats(job) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not do this inside _start_query_ml_ddl? That way all ML DDLs will be accounted in the session metrics
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated, and added an assertion in test for register.
@@ -31,8 +31,14 @@ def test_linear_regression_configure_fit_score(penguins_df_default_index, datase | |||
] | |||
] | |||
y_train = df[["body_mass_g"]] | |||
|
|||
start_execution_count = df._block._expr.session._metrics.execution_count | |||
|
|||
model.fit(X_train, y_train) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have the opportunity to make this work work all ML APIs - fit, score, predict, transform, .... We should write such tests for all APIs by writing the metrics collection logic in more central place in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
score, predict works, seems internally is calling other functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know, thanks for checking! We could just pick one or two of the existing tests that does fit, score, predict, register and transform and assert everywhere that we are counting stats for those operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test Added.
bigframes/session/__init__.py
Outdated
@@ -1373,7 +1373,14 @@ def _start_query_ml_ddl( | |||
# https://cloud.google.com/bigquery/docs/customer-managed-encryption#encrypt-model | |||
job_config.destination_encryption_configuration = None | |||
|
|||
return bf_io_bigquery.start_query_with_client(self.bqclient, sql, job_config) | |||
results_iterator, query_job = bf_io_bigquery.start_query_with_client( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see start_query_with_client
already has the stats update
python-bigquery-dataframes/bigframes/session/_io/bigquery/__init__.py
Lines 245 to 246 in fd06d31
if metrics is not None: | |
metrics.count_job_stats(query_job) |
so feels like we could be adding double counting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not double counting, but yes here we can send the metric into the start_query_with_client, thanks for point that out.
model.fit(X_train, y_train) | ||
|
||
end_execution_count = df._block._expr.session._metrics.execution_count | ||
assert end_execution_count - start_execution_count == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a comment why fit does 2 queries
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕