-
Notifications
You must be signed in to change notification settings - Fork 48
feat: Add quantile statistic #613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
def test_dataframe_aggregates_quantile_multi(scalars_df_index, scalars_pandas_df_index): | ||
q = [0, 0.33, 0.67, 1.0] | ||
col_names = ["int64_too", "int64_col", "float64_col"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe support the numeric_only
parameter also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
bigframes/dataframe.py
Outdated
@@ -2012,6 +2012,23 @@ def median( | |||
block = frame._block.aggregate_all_and_stack(agg_ops.median_op) | |||
return bigframes.series.Series(block.select_column("values")) | |||
|
|||
def quantile(self, q): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we try to define the parameter type here? q: float | Sequence[float]=
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the doc also need to be updated.
bigframes/series.py
Outdated
else: | ||
return typing.cast(float, self._apply_aggregation(agg_ops.median_op)) | ||
|
||
def quantile(self, q: float) -> Union[Series, float]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float
or Sequence[float]
? The tests show it works for Sequence[float]
as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
) | ||
quantile_cols = [] | ||
labels = [] | ||
for col in columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Throws an exception for dataframe if column_name is larger than 30?
if len(self.value_columns) > 30: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added limit
🤖 I have created a release *beep* *boop* --- ## [1.3.0](https://togithub.com/googleapis/python-bigquery-dataframes/compare/v1.2.0...v1.3.0) (2024-04-22) ### Features * Add `Series.struct.dtypes` property ([#599](https://togithub.com/googleapis/python-bigquery-dataframes/issues/599)) ([d924ec2](https://togithub.com/googleapis/python-bigquery-dataframes/commit/d924ec2937c158644b5d1bbae4f82476de2c1655)) * Add fine tuning `fit()` for Palm2TextGenerator ([#616](https://togithub.com/googleapis/python-bigquery-dataframes/issues/616)) ([9c106bd](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9c106bd24482620ef5ff3c85f94be9da76c49716)) * Add quantile statistic ([#613](https://togithub.com/googleapis/python-bigquery-dataframes/issues/613)) ([bc82804](https://togithub.com/googleapis/python-bigquery-dataframes/commit/bc82804da43c03c2311cd56f47a2316d3aae93d2)) * Expose `max_batching_rows` in `remote_function` ([#622](https://togithub.com/googleapis/python-bigquery-dataframes/issues/622)) ([240a1ac](https://togithub.com/googleapis/python-bigquery-dataframes/commit/240a1ac6fa914550bb6216cd5d179a36009f2657)) * Support primary key(s) in `read_gbq` by using as the `index_col` by default ([#625](https://togithub.com/googleapis/python-bigquery-dataframes/issues/625)) ([75bb240](https://togithub.com/googleapis/python-bigquery-dataframes/commit/75bb2409532e80de742030d05ffcbacacf5ffba2)) * Warn if location is set to unknown location ([#609](https://togithub.com/googleapis/python-bigquery-dataframes/issues/609)) ([3706b4f](https://togithub.com/googleapis/python-bigquery-dataframes/commit/3706b4f9dde65788b5e6343a6428fb1866499461)) ### Bug Fixes * Address technical writers fb ([#611](https://togithub.com/googleapis/python-bigquery-dataframes/issues/611)) ([9f8f181](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9f8f181279133abdb7da3aa045df6fa278587013)) * Infer narrowest numeric type when combining numeric columns ([#602](https://togithub.com/googleapis/python-bigquery-dataframes/issues/602)) ([8f9ece6](https://togithub.com/googleapis/python-bigquery-dataframes/commit/8f9ece6d13f57f02d677bf0e3fea97dea94ae240)) * Use exact median implementation by default ([#619](https://togithub.com/googleapis/python-bigquery-dataframes/issues/619)) ([9d205ae](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9d205aecb77f35baeec82a8f6e1b72c2d852ca46)) ### Documentation * Fix rendering of examples for multiple apis ([#620](https://togithub.com/googleapis/python-bigquery-dataframes/issues/620)) ([9665e39](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9665e39ef288841f03a9d823bd2210ef58394ad3)) * Set `index_cols` in `read_gbq` as a best practice ([#624](https://togithub.com/googleapis/python-bigquery-dataframes/issues/624)) ([70015b7](https://togithub.com/googleapis/python-bigquery-dataframes/commit/70015b79e8cff16ff1b36c5e3f019fe099750a9d)) --- This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕