x/exp/stats: new package with Mean, Median, more #69264

hemanth0525 opened this issue Sep 4, 2024 · 141 comments

Comments

@hemanth0525

hemanth0525 commented Sep 4, 2024

Description:

This proposal aims to enhance the Go standard library’s math package (in a new file, math/stats.go) by introducing several essential statistical functions. The proposed functions are:

  • Mean: Calculates the average value of a data set.
  • Median: Determines the middle value when the data set is sorted.
  • Mode: Identifies the most frequently occurring value in a data set.
  • Variance: Measures the spread of the data set from the mean.
  • StdDev: Computes the standard deviation, providing a measure of data dispersion.
    and many more....

Motivation:

The inclusion of these statistical functions directly in the math package will offer Go developers robust tools for data analysis and statistical computation, enhancing the language's utility in scientific and financial applications. Currently, developers often rely on external libraries for these calculations, which adds dependencies and potential inconsistencies. Integrating these functions into the standard library will:

  • Provide Comprehensive Statistical Analysis: These functions will facilitate fundamental statistical measures, aiding in more thorough data analysis and better understanding of data distributions.
  • Ensure Reliable Behavior: Functions are designed to handle edge cases, such as empty slices, to maintain predictable and accurate results.
  • Optimize Performance and Accuracy: Implemented with efficient algorithms to balance performance with calculation accuracy.
  • Increase Utility: Reduces the need for third-party libraries, making statistical computation more accessible and consistent within the Go ecosystem.

Design:

The functions will be added to the existing math package, ensuring they are easy to use and integrate seamlessly with other mathematical operations. Detailed documentation and examples will be provided to illustrate their usage and edge case handling.

Examples:

  • Mean:
    mean := math.Mean([]float64{1, 2, 3, 4, 5})
  • Median:
    median := math.Median([]float64{1, 3, 3, 6, 7, 8, 9})
  • Mode:
    mode := math.Mode([]float64{1, 2, 2, 3, 4})
  • Variance:
    variance := math.Variance([]float64{1, 2, 3, 4, 5})
  • StdDev:
    stddev := math.StdDev([]float64{1, 2, 3, 4, 5})


@seankhliao seankhliao changed the title math: Implement Statistical Functions for Mean, Median, Mode, Variance, and StdDev proposal: math: Implement Statistical Functions for Mean, Median, Mode, Variance, and StdDev Sep 4, 2024
@gopherbot gopherbot added this to the Proposal milestone Sep 4, 2024
@seankhliao seankhliao changed the title proposal: math: Implement Statistical Functions for Mean, Median, Mode, Variance, and StdDev proposal: math: add Mean, Median, Mode, Variance, and StdDev Sep 4, 2024
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Sep 4, 2024
@ianlancetaylor
Member

ianlancetaylor commented Sep 4, 2024

In general the math package aims to provide the functions that are in the C++ standard library <cmath>.

@hemanth0525
Author

Thanks for the feedback! I get that the math package is meant to mirror the functions in C++'s <cmath>, but I think adding some built-in stats functions could be a nice improvement. A lot of developers deal with stats regularly, so having these in the standard library could make things easier without stepping too far from the package’s core purpose. Happy to chat more about it if needed!

@earthboundkid
Contributor

Can you do some detective work to see how people are dealing with this in open source Go now? Is there some go-stats package that has a million stars on Github? Are there ten libraries that are each imported five hundred times? Seeing that something has big demand already is important for bringing something that could be in a third party library into the standard library. Otherwise this will just get closed with "write a third party library." Which has certainly happened to me more than once!

@hemanth0525
Author

I’ve done some digging into how statistical functions are currently being handled in the Go community. While libraries like Gonum and others provide statistical methods, there's no single source of truth or dominant package in this space, and many are designed for more complex or specialized tasks. However, the basic statistical functions we're proposing—like Mean, Median, Mode, Variance, and StdDev—are foundational for a wide range of applications, from simple data analysis to more advanced scientific and financial computations.

By integrating these into the standard library, we'd eliminate the need for external dependencies for basic tasks, which is in line with Go's philosophy of having a strong standard library for common use cases. While third-party packages are an option, including these functions in the math package would make Go more self-sufficient for everyday statistical needs, benefiting developers who want a simple, reliable way to compute these without resorting to third-party solutions.

@seankhliao
Member

for common use cases

This is the part where we need to see evidence. Especially considering the existence of libraries like gonum, how often does the need arise for functions like those proposed, where you wouldn't need the extra functionality that other libraries provide?

@jimmyfrasche
Member

For what it's worth, Python has a statistics package in its standard library: https://docs.python.org/3/library/statistics.html

It would be nice to have a simple package everyone agrees on for common use cases, but that doesn't necessarily need to be in std.

@randall77
Contributor

These functions sound pretty simple, but I think there's actually a lot of subtlety here. For instance, what does Mean do for rounding? Do we need to use Kahan's algorithm? What if the sum at some point rounds up to +Inf?
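To make the rounding question concrete, here is a minimal sketch comparing a plain left-to-right mean with one built on Kahan (compensated) summation. The names naiveMean and kahanMean are hypothetical and this is not taken from any proposed implementation; it only illustrates the trade-off being asked about.

```go
package main

// naiveMean adds the values left to right. Each addition can lose low-order
// bits, so the rounding error may grow with len(x).
func naiveMean(x []float64) float64 {
	var sum float64
	for _, v := range x {
		sum += v
	}
	return sum / float64(len(x))
}

// kahanMean uses Kahan summation: c tracks the low-order bits that were
// lost when the previous term was added, and feeds them back into the next
// addition.
func kahanMean(x []float64) float64 {
	var sum, c float64
	for _, v := range x {
		y := v - c        // apply the correction carried from the last step
		t := sum + y      // low-order bits of y may be lost here
		c = (t - sum) - y // recover what was lost
		sum = t
	}
	return sum / float64(len(x))
}
```

Neither version addresses the overflow question: for two values near the largest finite float64, the running sum reaches +Inf in both, even though the true mean is finite.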

@doggedOwl

Can you do some detective work to see how people are dealing with this in open source Go now? Is there some go-stats package that has a million stars on Github? Are there ten libraries that are each imported five hundred times? Seeing that something has big demand already is important for bringing something that could be in a third party library into the standard library. Otherwise this will just get closed with "write a third party library." Which has certainly happened to me more than once

In my experience, every time a numeric problem comes up, the gonum library is suggested. They have a stat package: https://pkg.go.dev/gonum.org/v1/gonum/stat

@hemanth0525
Author

Can you do some detective work to see how people are dealing with this in open source Go now? Is there some go-stats package that has a million stars on Github? Are there ten libraries that are each imported five hundred times? Seeing that something has big demand already is important for bringing something that could be in a third party library into the standard library. Otherwise this will just get closed with "write a third party library." Which has certainly happened to me more than once

In my experience, every time a numeric problem comes up, the gonum library is suggested. They have a stat package: https://pkg.go.dev/gonum.org/v1/gonum/stat

Yeah, so think about having its functionality in the Go standard library straight away!

@hemanth0525
Author

Gonum library is indeed often suggested for statistical and numerical work in Go, and it has a dedicated stat package. It’s a robust library that covers a wide range of statistical functions, and for more complex needs, it's definitely a go-to solution.

However, my proposal is focused on adding foundational statistical functions such as Mean, Median, Mode, Variance, and StdDev directly into the standard library. These are basic but essential tools that many developers need in day-to-day tasks, and having them in the standard library could save developers from importing an entire external library like Gonum for simple calculations. I believe integrating these functions would make Go more self-sufficient, particularly for developers who need straightforward statistical calculations without additional dependencies.

@adonovan
Member

adonovan commented Sep 6, 2024

IMHO these functions would be very useful in the standard library, even if (or indeed, because) the implementation requires some care. There are many "quick" uses of these basic stats operations in testing, benchmarking, and writing CL descriptions that shouldn't require a heavyweight dependency on a fully-featured third-party stats library. (I often end up moving data out of my Go program to the shell and running the github.com/nferraz/st command.)

Another function I would like is Percentile(n, series), which reports the nth percentile value of a given series.

@jimmyfrasche
Member

If it belongs in std, it should probably be in a "math/stats" or "math/statistics" instead of directly in "math".

@meling

meling commented Sep 10, 2024

Here is a small experience report with existing stats packages: In some code I was using gonum’s stats package, and a collaborator started using github.com/montanaflynn/stats as well, whose API returns an error (which I felt was annoying.) Luckily, I caught the unnecessary dependency in code review.

These are the types of things that can easily cause unnecessary dependencies to get added in projects. Hence, I think adding common statistics functions would be a great addition to the std.

@hemanth0525
Author

It seems like a lot of developers would benefit from this!

@hemanth0525
Author

hemanth0525 commented Sep 22, 2024

Is there any update on this proposal?

@adonovan
Member

The proposal review committee will likely look at it this week. It usually takes a few rounds to reach a final decision.

@hemanth0525
Author

The proposal review committee will likely look at it this week. It usually takes a few rounds to reach a final decision.

OK, cool!

@hemanth0525
Author

Is there any update on this proposal, please?

@adonovan
Member

Sorry, we didn't get to it last week, but perhaps will this week.

@hemanth0525
Author

Yes, please.

@adonovan
Member

adonovan commented Oct 2, 2024

Some of the questions raised in the meeting were:

  • Which package should this live in? The scope of the math package aligns with the C++ math package, so it does not seem the appropriate home. Perhaps math/stats? But this might create a temptation to add a lot more statistical functions. Which leads to:
  • If we create a new package, what should be its scope? The proposed set of functions (including Percentile) is roughly the set of statistical functions that every high-school student knows, and perhaps that's the appropriate scope.
  • Should the functions be generic? Should we support the median of an integer series, say? Personally I'm not convinced it's necessary; users can convert integers to floats as needed. This package should make common problems (such as arise during testing and benchmarking) convenient, not aim for maximum generality or efficiency.
  • Is a single result sufficient for the Mode function? What is the mode of [1, 2]?

@hemanth0525
Author

hemanth0525 commented Oct 2, 2024

Thanks for the feedback! I totally get the concerns and here’s my take:

  1. Package Location: I agree that a new math/stats package makes sense. It keeps things organized and prevents the core math package from becoming too broad. We can start with the basics—mean, median, mode, variance, etc.—covering foundational stats functions that are universally useful.

  2. Scope: Let’s keep it simple for now. The goal should be to provide common, practical functions that people need for everyday testing, benchmarking, and basic analytics. We don’t need to cover advanced statistical methods yet, just the essentials. And yeah, potential add-ons would be: Percentile, Quartiles, Geometric Mean, Harmonic Mean, Mean Absolute Deviation (MAD), Coefficient of Variation (CV), Cumulative Sum (Cumsum), Root Mean Square (RMS), Skewness, Kurtosis, Covariance, Correlation Coefficient, Z-Score, and so on.

  3. Generics: I don’t think we need generics here. Users can convert integers to floats if needed, and keeping it focused on simplicity will make the package more accessible.

  4. Mode Function: For cases like [1, 2], we can return nil or an empty slice [] if no mode exists, or return all modes in a slice when there’s more than one. That way, it’s clear and flexible.

Overall, I think this keeps the package lightweight, practical, and easy to use, which should be the priority. Looking forward to hearing your thoughts!

@adonovan
Member

adonovan commented Oct 2, 2024

And yeah potential addons would be Percentile, ...[long list]...

I think the goal of limiting the scope would be to ensure that these (other than Percentile) are not potential additions. ;-)

I agree that a slice result for Mode seems appropriate. Perhaps it should be called Modes.
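As a rough illustration of the slice-returning shape discussed here, the following sketch returns every value that ties for the highest count. The name modes and its exact semantics (nil for empty input, all values when everything ties, results sorted ascending) are assumptions made for this example, not an agreed API.

```go
package main

import "sort"

// modes returns all values that occur most often in x, in ascending order.
// Assumed semantics for this sketch: an empty input yields nil, and an input
// where every value is unique (such as [1, 2]) yields every value.
// NaN handling is not addressed: NaN never equals itself, so each NaN would
// be counted as a distinct value.
func modes(x []float64) []float64 {
	counts := make(map[float64]int, len(x))
	best := 0
	for _, v := range x {
		counts[v]++
		if counts[v] > best {
			best = counts[v]
		}
	}
	var out []float64
	for v, n := range counts {
		if n == best {
			out = append(out, v)
		}
	}
	sort.Float64s(out) // map iteration order is random, so sort for stable output
	return out
}
```

Under these assumptions the earlier example [1, 2, 2, 3, 4] gives a one-element result, while [1, 2] gives both values, which is one way to answer the "what is the mode of [1, 2]?" question without a special "no mode" signal.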

@ldemailly

Calculating stats one at a time (or two at a time) seems a bit short-sighted. I always need min, max, avg, and stddev together personally (and often p50, p75, p99, … too) and wouldn’t go over a slice of data N times to do so, even for a small slice.

Here is what I use and wrote ages ago:
https://github.com/fortio/fortio/blob/master/stats/stats.go#L46
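For readers who have not seen the single-pass style, here is a small sketch that gathers count, min, max, mean, and sample standard deviation in one loop. It uses Welford's online update, which is a choice made for this example and not necessarily what the linked fortio code does; the summary type and summarize function are hypothetical names.

```go
package main

import "math"

// summary holds the statistics gathered in one pass (hypothetical type).
type summary struct {
	N                      int
	Min, Max, Mean, StdDev float64
}

// summarize walks x once. The mean and the sum of squared deviations (m2)
// are updated incrementally using Welford's method.
func summarize(x []float64) summary {
	if len(x) == 0 {
		return summary{}
	}
	s := summary{Min: x[0], Max: x[0]}
	var m2 float64
	for _, v := range x {
		s.N++
		s.Min = math.Min(s.Min, v)
		s.Max = math.Max(s.Max, v)
		d := v - s.Mean
		s.Mean += d / float64(s.N)
		m2 += d * (v - s.Mean)
	}
	if s.N > 1 {
		// Sample standard deviation: divide by N-1.
		s.StdDev = math.Sqrt(m2 / float64(s.N-1))
	}
	return s
}
```

Percentiles such as p50 or p99 do not fit this pattern directly, because exact order statistics need the sorted data or a sketching structure.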

@ncruces
Contributor

ncruces commented Mar 3, 2025

I know that this is the final comment period, and I'm now late to the party. Sorry.

But it has occurred to me that, given we're specifying that the APIs cannot modify the slices, couldn't/shouldn't these take an iter.Seq[E] instead of an []E?

I mean, now that we have a standard iteration API, does it make sense to continue to pile up APIs that work exclusively on slices? Also, I'm putting this out as a general principle, even if you decide this API should really have slices: always at least consider whether an immutable slice-based API can use iter.Seq[E] instead.

@earthboundkid
Contributor

Using iter.Seq[E] also has the advantage of not needing to specify in the docs that it will/won't clone/modify the slice, since it obviously will clone it.

@Merovius
Contributor

Merovius commented Mar 3, 2025

I'll note that I suggested iter.Seq above and it's been already rejected.

@josharian
Contributor

I've been watching and mulling over this conversation. Though I was initially 👍🏽, it now appears to me:

  • There isn't a clean composable API for the simple stuff (min, max, mean, variance, stddev). Both the correct shape of the inputs and the correct shape of the outputs vary meaningfully based on your precise needs.
  • There isn't an appetite for the complicated stuff (t-digest, etc.), which might warrant the cost of some abstraction difficulties.
  • Any modern LLM can more or less perfectly write the simple functions the first time, in whatever form is most useful to you, in situ.

Given that, I have a hard time convincing myself that this really pulls its weight in the standard library.

@aclements
Member

But it has occurred to me that, given we're specifying that the APIs cannot modify the slices, couldn't/shouldn't these take an iter.Seq[E] instead of an []E?

Another potential downside of an iterator is that it cuts off certain implementation strategies. For mean and standard deviation, it makes it very difficult to do pairwise summation for high accuracy, though maybe that's okay. For median and quantiles, it basically forces the implementation to copy large parts of the sequence. (Though, a streaming API using sketching could deal with this better, at the cost of accuracy.)
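For context, pairwise summation needs random access to split the input in half recursively, which is why an iterator gets in the way. A minimal sketch follows; pairwiseSum and pairwiseMean are hypothetical names and the base-case size of 8 is an arbitrary choice for illustration.

```go
package main

// pairwiseSum adds the two halves of x recursively. Rounding error grows
// roughly with log(len(x)) instead of len(x), at the cost of needing the
// whole slice up front.
func pairwiseSum(x []float64) float64 {
	const base = 8 // below this size a plain loop is used (arbitrary cutoff)
	if len(x) <= base {
		var s float64
		for _, v := range x {
			s += v
		}
		return s
	}
	mid := len(x) / 2
	return pairwiseSum(x[:mid]) + pairwiseSum(x[mid:])
}

// pairwiseMean divides the pairwise sum by the element count.
// Like the proposed Mean, it is not meaningful for an empty slice.
func pairwiseMean(x []float64) float64 {
	return pairwiseSum(x) / float64(len(x))
}
```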

There isn't a clean composable API for the simple stuff (min, max, mean, variance, stddev). Both the correct shape of the inputs and the correct shape of the outputs vary meaningfully based on your precise needs.

@josharian , could you expand on this? I see three "shapes" of input: a slice, an iterator, or a stream. Is that what you mean? By the shape of the output, do you mean which subset of stats the caller wants?

I really think that making the API convoluted just to avoid a couple passes over the slice is the wrong engineering trade-off. Most deep stats packages don't even do this, and it's fine.

There isn't an appetite for the complicated stuff (t-digest, etc.), which might warrant the cost of some abstraction difficulties.

I'm certainly not opposed to this, I just think we explored it enough here to conclude that it's going to need its own proposed API and probably an example implementation.

@josharian
Contributor

@josharian, could you expand on this?

For input, there's slice vs iterator. Orthogonally, there's float64 vs float32 vs integer types. (Folks here dismissed float32 offhand, but when working with ML, there's lots of float32 flying around. And I definitely use stats with integer inputs.) Orthogonally, there's float64 vs ~float64 and []float64 vs ~[]float64.
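The type-shape distinctions can be seen in a short sketch. Seconds, Latencies, meanGeneric, and toFloat64s are hypothetical names introduced only for illustration. As far as I understand Go's type inference, a named slice type such as Latencies can be passed to a parameter of type []F directly; a ~[]F constraint mainly matters when a function must return the caller's own slice type.

```go
package main

// Seconds and Latencies are hypothetical named types.
type Seconds float64
type Latencies []Seconds

// meanGeneric accepts any slice whose element type has underlying type
// float64, and returns a value of that same element type.
func meanGeneric[F ~float64](x []F) F {
	var sum F
	for _, v := range x {
		sum += v
	}
	return sum / F(len(x))
}

// toFloat64s is the conversion step callers with integer or float32 data
// would need under a float64-only API. It allocates a new slice.
func toFloat64s[T ~int | ~int32 | ~int64 | ~float32](x []T) []float64 {
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = float64(v)
	}
	return out
}
```

With these definitions, meanGeneric(Latencies{1, 2, 3}) yields a Seconds value, while integer data has to go through something like toFloat64s first, which is the extra copy and friction being described.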

(For slice vs iterator in particular, this is reminiscent of the xiter API challenges #61898 (comment). Feels like there's a deeper language problem lurking here.)

For output, yes, which subset of stats. You risk repeating work, doing unnecessary work, or having an awful "I want these fields populated" API.

Again, all of these are surmountable, if the value provided by the package is high: the code is subtle or error-prone or intricate or non-obvious or large or extremely common. On the flip side, the streamlined API might end up being restrictive enough that people don't/can't end up using it frequently enough to warrant it being in the standard library.

That's the argument, anyway. But I'm not going to die on this hill. :)

@ncruces
Contributor

ncruces commented Mar 6, 2025

For median and quantiles, it basically forces the implementation to copy large parts of the sequence.

If you want a linear time algorithm, you have to copy anyway (because you can't modify the slice).
Otherwise the best you can do is O(n log n) average (and O(n²) worst case?)

Point taken, though.
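The copy-then-select approach can be sketched as follows. This is an illustrative quickselect with expected linear time, not the algorithm any implementation has committed to; medianCopy and selectKth are hypothetical names, and the sketch assumes the input contains no NaN.

```go
package main

import "slices"

// medianCopy clones x so the caller's slice is left untouched, then uses
// quickselect on the copy. Like the proposed Median, it panics on an empty
// slice (here via an out-of-range index).
func medianCopy(x []float64) float64 {
	buf := slices.Clone(x)
	n := len(buf)
	hi := selectKth(buf, n/2)
	if n%2 == 1 {
		return hi
	}
	// After selection every element of buf[:n/2] is <= hi, so the lower
	// middle value is the maximum of that prefix.
	lo := slices.Max(buf[:n/2])
	return lo + (hi-lo)/2
}

// selectKth reorders buf in place so that buf[k] holds the k-th smallest
// value (0-based), using Hoare-style partitioning around a middle pivot.
func selectKth(buf []float64, k int) float64 {
	lo, hi := 0, len(buf)-1
	for lo < hi {
		pivot := buf[lo+(hi-lo)/2]
		i, j := lo, hi
		for i <= j {
			for buf[i] < pivot {
				i++
			}
			for buf[j] > pivot {
				j--
			}
			if i <= j {
				buf[i], buf[j] = buf[j], buf[i]
				i++
				j--
			}
		}
		switch {
		case k <= j:
			hi = j
		case k >= i:
			lo = i
		default:
			return buf[k] // k lies between j and i, so buf[k] equals the pivot
		}
	}
	return buf[k]
}
```

Without the clone, the only non-mutating options are repeated scans of the original slice, which is where the worse bounds mentioned above come from.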

@aclements
Member

There's enough opposition and divergence in opinion that we're going to take this out of "likely accept."

Let's try putting the slices API I proposed above into x/exp/stats. That way we can get some actual experience with implementation and use and see if it's valuable to the ecosystem.

@aclements
Member

(And I want to acknowledge that @Merovius suggested exactly this approach of trying it out in x/exp.)

@aclements aclements changed the title proposal: math/stats: new package with Mean, Median, more proposal: x/exp/stats: new package with Mean, Median, more Mar 6, 2025
@aclements
Member

Have all remaining concerns about this proposal been addressed?

The proposal is to add a golang.org/x/exp/stats package with the following API:

// Package stats provides basic descriptive statistics.
//
// This is not intended as a comprehensive statistics package, but is
// intended to provide common, everyday statistical functions.
//
// These functions aim to balance performance and accuracy, but some
// amount of error is inevitable in floating-point computations.
// The underlying implementations may change, resulting in small
// changes in their results from version to version. If the caller
// needs particular guarantees on accuracy and overflow behavior or
// version stability, they should use a more specialized
// implementation.
package stats

// Mean returns the arithmetic mean of the values in x.
//
// Mean does not modify the slice.
//
// If x is an empty slice, it panics.
// If x contains NaN or both Inf and -Inf, it returns NaN.
// If x contains Inf, it returns Inf. If x contains -Inf, it returns -Inf.
func Mean[F ~float64](x []F) F

// MeanAndStdDev returns the arithmetic mean and
// sample standard deviation of x.
//
// MeanAndStdDev does not modify the slice.
//
// If x is an empty slice, it panics.
// If x contains NaN, it returns NaN, NaN.
// If x contains both Inf and -Inf, it returns NaN, Inf.
// If x contains Inf, it returns Inf, Inf. If x contains -Inf, it returns -Inf, Inf.
func MeanAndStdDev[F ~float64](x []F) (mean, stddev F)

// Median returns the median of the values in x.
// If len(x) is even, it returns the mean of the two central values.
//
// Median does not modify the slice.
//
// Median may perform asymptotically faster and allocate
// asymptotically less if the slice is already sorted.
//
// If x is an empty slice, it panics.
// If x contains NaN, it returns NaN.
// -Inf is treated as smaller than all other values, 
// Inf is treated as larger than all other values, and
// -0.0 is treated as smaller than 0.0.
func Median[F ~float64](x []F) F

// Quantiles returns a sequence of quantiles of x.
//
// The returned slice has the same length as the quantiles slice,
// and the elements correspond one-to-one.
// A quantile of 0 corresponds to the minimum value in x and
// a quantile of 1 corresponds to the maximum value in x.
// A quantile of 0.5 is the same as the value returned by [Median].
//
// Quantiles does not modify the slice.
//
// Quantiles may perform asymptotically faster and allocate
// asymptotically less if the slice is already sorted.
//
// There are many methods for computing quantiles. Quantiles uses the
// "inclusive" method, also known as Q7 in Hyndman and Fan, or the
// "linear" or "R-7" method. This assumes that the data is either a
// population or a sample that includes the most extreme values of the
// underlying population.
//
// If x is an empty slice, it panics.
// If x contains NaN, it returns NaN.
// -Inf is treated as smaller than all other values, 
// Inf is treated as larger than all other values, and
// -0.0 is treated as smaller than 0.0.
//
// If any quantile value is < 0 or > 1, Quantiles panics.
func Quantiles[F ~float64](x []F, quantiles ...float64) []F

From here we can get experience and see how widely used this package is, and later make a decision on whether this should graduate to a new standard library package math/stats.
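To illustrate the "inclusive" (R-7) rule named in the Quantiles documentation, here is a small stand-alone sketch. It is not the x/exp/stats implementation: quantilesR7 is a hypothetical name, it simply sorts a copy, and it does not reproduce the documented NaN and signed-zero behavior for the data.

```go
package main

import (
	"math"
	"slices"
)

// quantilesR7 computes quantiles with the R-7 rule: for n sorted values and
// a quantile q, take the position h = q*(n-1) and interpolate linearly
// between the two neighboring values.
func quantilesR7(x []float64, qs ...float64) []float64 {
	if len(x) == 0 {
		panic("quantilesR7: empty slice")
	}
	sorted := slices.Clone(x) // leave the caller's slice unmodified
	slices.Sort(sorted)
	out := make([]float64, len(qs))
	for i, q := range qs {
		if !(q >= 0 && q <= 1) { // written this way so a NaN q is rejected too
			panic("quantilesR7: quantile out of range [0, 1]")
		}
		h := q * float64(len(sorted)-1)
		lo := int(math.Floor(h))
		hi := min(lo+1, len(sorted)-1)
		frac := h - float64(lo)
		if frac == 0 {
			out[i] = sorted[lo] // exact position, no interpolation needed
			continue
		}
		out[i] = sorted[lo] + frac*(sorted[hi]-sorted[lo])
	}
	return out
}
```

With this rule a quantile of 0 is the minimum, 1 is the maximum, and 0.5 matches the usual median, including the average of the two central values when the length is even.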

@aclements aclements moved this from Likely Accept to Active in Proposals Mar 6, 2025
@aclements
Member

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— aclements for the proposal review group

@meling

meling commented Mar 6, 2025

What will be the criteria for inclusion in std lib? How will people discover it? Will it be included in some release notes (as an experiment)? Would be nice to have answers to such questions.

@ianlancetaylor
Member

I don't think we have any specific criteria for inclusion in the standard library. It will depend on whether people find it useful, and whether the API works well in practice.

Mentioning it in the Go release notes sounds reasonable. That's what we did for the x/exp/slices and maps packages (https://go.dev/doc/go1.18#generics).

@glycerine

glycerine commented Mar 11, 2025

Generalizing Ian's suggestion, wouldn't it be more general to take ...F arguments, allowing for compact one-liners like

   mean := stats.Mean(1, 2, 3, 4, 5, 6)

so:

func Mean[F ~float64](x ...F) F
func MeanAndStdDev[F ~float64](x ...F) (mean, stddev F)
func Median[F ~float64](x ...F) F

Edit: I guess a small drawback of that would be you would have to add ... to slices

    a := []float64{4, 5, 6}
    m := stats.Mean(a...)

@aclements
Member

Generalizing Ian's suggestion, wouldn't it be more general to take ...F arguments

It seems far more likely that a call site will have a slice of data than a statically-known, fixed length sequence. Given that either can be transformed into the other, I'd lean toward slices.
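The "either can be transformed into the other" point can be shown in a few lines. Both function names below are hypothetical stand-ins, not the proposed API.

```go
package main

// meanSlice is a stand-in for a slice-based Mean.
func meanSlice(x []float64) float64 {
	var sum float64
	for _, v := range x {
		sum += v
	}
	return sum / float64(len(x))
}

// meanVariadic shows that a variadic form is a one-line wrapper, because
// inside the function x is already a []float64.
func meanVariadic(x ...float64) float64 {
	return meanSlice(x)
}
```

A caller with literal values can write meanSlice([]float64{1, 2, 3}), and a caller with a slice a can write meanVariadic(a...), so the choice mostly comes down to which call site is more common.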

@aclements
Member

Based on the discussion above, this proposal seems like a likely accept.
— aclements for the proposal review group


@aclements aclements moved this from Active to Likely Accept in Proposals Mar 12, 2025
@aclements
Member

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— aclements for the proposal review group


@aclements aclements moved this from Likely Accept to Accepted in Proposals Mar 19, 2025
@aclements aclements changed the title proposal: x/exp/stats: new package with Mean, Median, more x/exp/stats: new package with Mean, Median, more Mar 19, 2025
@aclements aclements modified the milestones: Proposal, Backlog Mar 19, 2025
Projects
Status: Accepted