x/exp/stats: new package with Mean, Median, more #69264
Comments
In general the math package aims to provide the functions that are in the C++ standard library |
Thanks for the feedback! I get that the math package is meant to mirror the functions in C++'s |
Can you do some detective work to see how people are dealing with this in open source Go now? Is there some go-stats package that has a million stars on GitHub? Are there ten libraries that are each imported five hundred times? Seeing that something has big demand already is important for bringing something that could be in a third-party library into the standard library. Otherwise this will just get closed with "write a third-party library." Which has certainly happened to me more than once! |
I’ve done some digging into how statistical functions are currently being handled in the Go community. While libraries like Gonum and others provide statistical methods, there's no single source of truth or dominant package in this space, and many are designed for more complex or specialized tasks. However, the basic statistical functions we're proposing—like By integrating these into the standard library, we'd eliminate the need for external dependencies for basic tasks, which is in line with Go's philosophy of having a strong standard library for common use cases. While third-party packages are an option, including these functions in the |
This is the part where we need to see evidence, especially considering the existence of libraries like gonum: how often does the need arise for functions like those proposed, where you wouldn't need the extra functionality that other libraries provide? |
For what it's worth, python has a statistics package in its standard library: https://docs.python.org/3/library/statistics.html It would be nice to have a simple package everyone agrees on for common use cases, but that doesn't necessarily need to be in std. |
These functions sound pretty simple, but I think there's actually a lot of subtlety here. For instance, what does |
In my experience, every time a numeric problem comes up, the gonum library is suggested. It has a stats package: https://pkg.go.dev/gonum.org/v1/gonum/stat |
Yeah, so consider having its functionality in the Go standard library right away! |
The Gonum library is indeed often suggested for statistical and numerical work in Go, and it has a dedicated However, my proposal is focused on adding foundational statistical functions like |
IMHO these functions would be very useful in the standard library, even if (or indeed, because) the implementation requires some care. There are many "quick" uses of these basic stats operations in testing, benchmarking, and writing CL descriptions that shouldn't require a heavyweight dependency on a fully-featured third-party stats library. (I often end up moving data out of my Go program to the shell and running the github.com/nferraz/st command.) Another function I would like is Percentile(n, series), which reports the nth percentile value of a given series. |
If it belongs in |
Here is a small experience report with existing stats packages: in some code I was using gonum's stats package, and a collaborator started using github.com/montanaflynn/stats as well, whose API returns an error (which I found annoying). Luckily, I caught the unnecessary dependency in code review. These are the kinds of things that can easily cause unnecessary dependencies to be added to projects. Hence, I think common statistics functions would be a great addition to the standard library. |
It seems like a lot of developers will benefit from this! |
Can I get an update on this proposal? |
The proposal review committee will likely look at it this week. It usually takes a few rounds to reach a final decision. |
OK, cool! |
Can I get an update on this proposal, please? |
Sorry, we didn't get to it last week, but perhaps will this week. |
Yes, please. |
Some of the questions raised in the meeting were:
|
Thanks for the feedback! I totally get the concerns and here’s my take:
Overall, I think this keeps the package lightweight, practical, and easy to use, which should be the priority. Looking forward to hearing your thoughts! |
I think the goal of limiting the scope would be to ensure that these (other than Percentile) are not potential additions. ;-) I agree that a slice result for Mode seems appropriate. Perhaps it should be called Modes. |
Calculating stats one at a time (or two at a time) seems a bit shortsighted. Personally, I always need min, max, avg, and stddev together (and often p50, p75, p99, …). Here is what I use and wrote ages ago: |
I know that this is the final comment period, and I'm now late to the party. Sorry. But it has occurred to me that, given we're specifying that the APIs cannot modify the slices, couldn't/shouldn't these take an I mean, now that we have a standard iteration API, does it make sense to continue to pile up APIs that work exclusively on slices? Also, I'm putting this out as a general principle, even if you decide this API should really have slices: always at least consider whether an immutable slice-based API can use |
Using |
I'll note that I suggested iter.Seq above and it has already been rejected. |
I've been watching and mulling over this conversation. Though I was initially 👍🏽, it now appears to me:
Given that, I have a hard time convincing myself that this really pulls its weight in the standard library. |
Another potential downside of an iterator is that it cuts off certain implementation strategies. For mean and standard deviation, it makes it very difficult to do pairwise summation for high accuracy, though maybe that's okay. For median and quantiles, it basically forces the implementation to copy large parts of the sequence. (Though, a streaming API using sketching could deal with this better, at the cost of accuracy.)
@josharian, could you expand on this? I see three "shapes" of input: a slice, an iterator, or a stream. Is that what you mean? By the shape of the output, do you mean which subset of stats the caller wants? I really think that making the API convoluted just to avoid a couple of passes over the slice is the wrong engineering trade-off. Most deep stats packages don't even do this, and it's fine.
I'm certainly not opposed to this, I just think we explored it enough here to conclude that it's going to need its own proposed API and probably an example implementation. |
For input, there's slice vs iterator. Orthogonally, there's (For slice vs iterator in particular, this is reminiscent of the xiter API challenges #61898 (comment). Feels like there's a deeper language problem lurking here.) For output, yes, which subset of stats. You risk repeating work, doing unnecessary work, or having an awful "I want these fields populated" API. Again, all of these are surmountable, if the value provided by the package is high: the code is subtle or error-prone or intricate or non-obvious or large or extremely common. On the flip side, the streamlined API might end up being restrictive enough that people don't/can't end up using it frequently enough to warrant it being in the standard library. That's the argument, anyway. But I'm not going to die on this hill. :) |
If you want a linear time algorithm, you have to copy anyway (because you can't modify the slice). Point taken, though. |
There's enough opposition and divergence in opinion that we're going to take this out of "likely accept." Let's try putting the slices API I proposed above into x/exp/stats. That way we can get some actual experience with implementation and use and see if it's valuable to the ecosystem. |
Have all remaining concerns about this proposal been addressed? The proposal is to add a
From here we can get experience and see how widely used this package is and later make a decision on whether this should graduate to a new standard library package |
This proposal has been added to the active column of the proposals project |
What will be the criteria for inclusion in std lib? How will people discover it? Will it be included in some release notes (as an experiment)? Would be nice to have answers to such questions. |
I don't think we have any specific criteria for inclusion in the standard library. It will depend on whether people find it useful, and whether the API works well in practice. Mentioning it in the Go release notes sounds reasonable. That's what we did for the x/exp/slices and maps packages (https://go.dev/doc/go1.18#generics). |
Generalizing Ian's suggestion, wouldn't it be more general to take ...F arguments, allowing for compact one-liners like
so:
Edit: I guess a small drawback of that would be you would have to add ... to slices
|
It seems far more likely that a call site will have a slice of data than a statically-known, fixed length sequence. Given that either can be transformed into the other, I'd lean toward slices. |
Based on the discussion above, this proposal seems like a likely accept. The proposal is to add a
From here we can get experience and see how widely used this package is and later make a decision on whether this should graduate to a new standard library package |
No change in consensus, so accepted. 🎉 The proposal is to add a
From here we can get experience and see how widely used this package is and later make a decision on whether this should graduate to a new standard library package |
Description:
This proposal aims to enhance the Go standard library's math package (math/stats.go) by introducing several essential statistical functions. The proposed functions are: … and many more.
Motivation:
The inclusion of these statistical functions directly in the math package will offer Go developers robust tools for data analysis and statistical computation, enhancing the language's utility in scientific and financial applications. Currently, developers often rely on external libraries for these calculations, which adds dependencies and potential inconsistencies. Integrating these functions into the standard library will: …
Design:
The functions will be added to the existing math package, ensuring they are easy to use and integrate seamlessly with other mathematical operations. Detailed documentation and examples will be provided to illustrate their usage and edge-case handling.
Examples: