
feat: Add quantile statistic #613


Merged
merged 11 commits into from
Apr 16, 2024


TrevorBergeron
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@TrevorBergeron TrevorBergeron requested review from a team as code owners April 15, 2024 18:30
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Apr 15, 2024

def test_dataframe_aggregates_quantile_multi(scalars_df_index, scalars_pandas_df_index):
    q = [0, 0.33, 0.67, 1.0]
    col_names = ["int64_too", "int64_col", "float64_col"]
Contributor

Maybe support the numeric_only parameter as well?

Contributor Author

added
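For reference, the test compares bigframes output against pandas. A minimal pandas-only sketch of the behavior being matched (the frame and its values are made up for illustration): passing a list `q` returns a DataFrame with one row per requested quantile, and `numeric_only=True` drops non-numeric columns such as `string_col`.

```python
import pandas as pd

# Illustrative data only: two numeric columns plus one string column.
df = pd.DataFrame(
    {
        "int64_col": [1, 2, 3, 4],
        "float64_col": [0.5, 1.5, 2.5, 3.5],
        "string_col": ["a", "b", "c", "d"],
    }
)

q = [0, 0.33, 0.67, 1.0]  # same quantiles as the test above

# A list q yields one row per quantile; numeric_only=True skips string_col.
result = df.quantile(q, numeric_only=True)
print(result)
```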

@@ -2012,6 +2012,23 @@ def median(
        block = frame._block.aggregate_all_and_stack(agg_ops.median_op)
        return bigframes.series.Series(block.select_column("values"))

    def quantile(self, q):
Contributor

Should we declare the parameter type here (q: float | Sequence[float]), and perhaps a default value?

Contributor Author

added
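One way to act on the annotation suggestion, sketched with a hypothetical helper `normalize_q` (not part of bigframes): accept either a scalar or a sequence and validate the range up front, mirroring pandas, which also requires 0 <= q <= 1. `Union[...]` is used instead of the `float | Sequence[float]` spelling because the PEP 604 form only evaluates at runtime on Python 3.10+.

```python
from typing import List, Sequence, Union


def normalize_q(q: Union[float, Sequence[float]]) -> List[float]:
    """Hypothetical helper: turn a scalar or sequence q into a validated list."""
    if isinstance(q, (int, float)):
        qs = [float(q)]
    else:
        qs = [float(x) for x in q]
    for value in qs:
        # Quantiles outside [0, 1] are meaningless; reject them early.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"quantile must be in [0, 1], got {value}")
    return qs
```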

Contributor

The docs also need to be updated.
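If the docs describe how `q` is interpreted, it may help to state the interpolation rule. As background only: pandas defaults to `interpolation="linear"`, and BigQuery's `PERCENTILE_CONT` is documented as using linear interpolation as well; this thread does not show which rule the PR uses. A plain-Python sketch of linear interpolation between the two nearest ranks:

```python
import math
from typing import Sequence


def linear_quantile(values: Sequence[float], q: float) -> float:
    """Quantile by linear interpolation between the two nearest ranks."""
    if not values:
        raise ValueError("values must be non-empty")
    data = sorted(values)
    pos = q * (len(data) - 1)  # fractional 0-based rank
    lower = math.floor(pos)
    upper = math.ceil(pos)
    # Weight the upper neighbor by the fractional part of the rank.
    return data[lower] + (data[upper] - data[lower]) * (pos - lower)
```

For `[1, 2, 3, 4]` and `q=0.5` the fractional rank is 1.5, so the result lies halfway between 2 and 3, i.e. 2.5.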

        else:
            return typing.cast(float, self._apply_aggregation(agg_ops.median_op))

    def quantile(self, q: float) -> Union[Series, float]:
Contributor

Should this be float or Sequence[float]? The tests suggest it also works for Sequence[float].

Contributor Author

added
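The return annotation `Union[Series, float]` reflects that the result type depends on `q`. pandas behaves the same way, as a short pandas-only example shows (values are illustrative):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Scalar q: a single float-like value.
scalar = s.quantile(0.5)

# Sequence q: a Series whose index holds the requested quantiles.
multi = s.quantile([0.25, 0.75])

print(scalar)
print(multi)
```

Under that reading, the annotation on `q` would presumably widen to `Union[float, Sequence[float]]` while the return type stays `Union[Series, float]`.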

)
quantile_cols = []
labels = []
for col in columns:
Contributor

Should this throw an exception for a DataFrame with more than 30 columns, like the existing check does?

if len(self.value_columns) > 30:

Contributor Author

added limit
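The loop above appears to build one aggregation per column, and with a list `q` plausibly one per (column, quantile) pair, which is why a column cap matters. A hypothetical sketch of such a guard plus label generation; `MAX_COLUMNS`, `check_column_limit`, `quantile_labels`, and the choice of `NotImplementedError` are illustrative assumptions, and only the threshold of 30 comes from the review:

```python
from typing import List, Sequence, Tuple

MAX_COLUMNS = 30  # threshold from the review comment; the name is hypothetical


def check_column_limit(columns: Sequence[str], limit: int = MAX_COLUMNS) -> List[str]:
    """Hypothetical guard: fail fast before building per-column aggregations."""
    cols = list(columns)
    if len(cols) > limit:
        raise NotImplementedError(
            f"quantile supports at most {limit} columns, got {len(cols)}"
        )
    return cols


def quantile_labels(columns: Sequence[str], qs: Sequence[float]) -> List[Tuple[str, float]]:
    """Hypothetical: one (column, q) label per aggregation to compute."""
    labels = []
    for col in check_column_limit(columns):
        for q in qs:
            labels.append((col, q))
    return labels
```

The number of labels grows as columns × quantiles, so capping the column count bounds the size of the generated query.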

@TrevorBergeron TrevorBergeron added the automerge Merge the pull request once unit tests and other checks pass. label Apr 16, 2024
@gcf-merge-on-green gcf-merge-on-green bot merged commit bc82804 into main Apr 16, 2024
15 checks passed
@gcf-merge-on-green gcf-merge-on-green bot deleted the quantiles branch April 16, 2024 23:32
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Apr 16, 2024
gcf-merge-on-green bot pushed a commit that referenced this pull request Apr 22, 2024
🤖 I have created a release *beep* *boop*
---


## [1.3.0](https://togithub.com/googleapis/python-bigquery-dataframes/compare/v1.2.0...v1.3.0) (2024-04-22)


### Features

* Add `Series.struct.dtypes` property ([#599](https://togithub.com/googleapis/python-bigquery-dataframes/issues/599)) ([d924ec2](https://togithub.com/googleapis/python-bigquery-dataframes/commit/d924ec2937c158644b5d1bbae4f82476de2c1655))
* Add fine tuning `fit()` for Palm2TextGenerator ([#616](https://togithub.com/googleapis/python-bigquery-dataframes/issues/616)) ([9c106bd](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9c106bd24482620ef5ff3c85f94be9da76c49716))
* Add quantile statistic ([#613](https://togithub.com/googleapis/python-bigquery-dataframes/issues/613)) ([bc82804](https://togithub.com/googleapis/python-bigquery-dataframes/commit/bc82804da43c03c2311cd56f47a2316d3aae93d2))
* Expose `max_batching_rows` in `remote_function` ([#622](https://togithub.com/googleapis/python-bigquery-dataframes/issues/622)) ([240a1ac](https://togithub.com/googleapis/python-bigquery-dataframes/commit/240a1ac6fa914550bb6216cd5d179a36009f2657))
* Support primary key(s) in `read_gbq` by using as the `index_col` by default ([#625](https://togithub.com/googleapis/python-bigquery-dataframes/issues/625)) ([75bb240](https://togithub.com/googleapis/python-bigquery-dataframes/commit/75bb2409532e80de742030d05ffcbacacf5ffba2))
* Warn if location is set to unknown location ([#609](https://togithub.com/googleapis/python-bigquery-dataframes/issues/609)) ([3706b4f](https://togithub.com/googleapis/python-bigquery-dataframes/commit/3706b4f9dde65788b5e6343a6428fb1866499461))


### Bug Fixes

* Address technical writers fb ([#611](https://togithub.com/googleapis/python-bigquery-dataframes/issues/611)) ([9f8f181](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9f8f181279133abdb7da3aa045df6fa278587013))
* Infer narrowest numeric type when combining numeric columns ([#602](https://togithub.com/googleapis/python-bigquery-dataframes/issues/602)) ([8f9ece6](https://togithub.com/googleapis/python-bigquery-dataframes/commit/8f9ece6d13f57f02d677bf0e3fea97dea94ae240))
* Use exact median implementation by default ([#619](https://togithub.com/googleapis/python-bigquery-dataframes/issues/619)) ([9d205ae](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9d205aecb77f35baeec82a8f6e1b72c2d852ca46))


### Documentation

* Fix rendering of examples for multiple apis ([#620](https://togithub.com/googleapis/python-bigquery-dataframes/issues/620)) ([9665e39](https://togithub.com/googleapis/python-bigquery-dataframes/commit/9665e39ef288841f03a9d823bd2210ef58394ad3))
* Set `index_cols` in `read_gbq` as a best practice ([#624](https://togithub.com/googleapis/python-bigquery-dataframes/issues/624)) ([70015b7](https://togithub.com/googleapis/python-bigquery-dataframes/commit/70015b79e8cff16ff1b36c5e3f019fe099750a9d))

---
This PR was generated with [Release Please](https://togithub.com/googleapis/release-please). See [documentation](https://togithub.com/googleapis/release-please#release-please).
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.

2 participants