Fix two bugs in `cbits` relating to distribution values #35

thoughtpolice · 2019-12-02T19:30:25Z

The commit messages contain extensive details. The two commits are independent of each other, but I decided to submit them as a batch.

The internal `combine` function, which is used to read the value of a `Distribution`, handles the `min` and `max` fields and, as the name implies, combines them in the Semigroup sense: to combine two samples `a` and `b`, for `max` you simply calculate `max(a->max, b->max)`, and likewise for `min`.

While this is fine in theory, in practice it is flawed for this case: to read a metric's current values, you must combine it with an empty sample, where `min = max = 0`. This logic is incorrect, because any previously seen `min` value greater than zero is overwritten by zero every time you read the sample. This is a flat-out bug. (Likewise for `max` and values less than zero.)

In more abstract terms, the identity members of the `min/max` monoids are *not* zero, but `maxBound/minBound`!

However, there's an easier way forward here: `combine` is *only* ever used with a zero structure on the right-hand side, and it is completely internal. Therefore the logic is trivial: just copy the `max/min` values from the old structure into the new structure, since there's no point in doing the comparison at all.

Signed-off-by: Austin Seipp <[email protected]>
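As a rough illustration of the behaviour described above, here is a minimal sketch in C. The struct layout and the function names `combine_semigroup` and `combine_copy` are hypothetical stand-ins chosen for this example, not the actual definitions in `cbits`; the sketch only assumes that the reader side starts from an all-zero structure, as the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the distribution struct.
   Field names are assumptions, not the real ekg-core definitions. */
struct sample {
    int64_t count;
    double min;
    double max;
};

/* Semigroup-style combine of `a` into `b`. This is only correct if an
   "empty" `b` holds the identities of min and max, which an all-zero
   struct does not. */
void combine_semigroup(const struct sample *a, struct sample *b) {
    b->count += a->count;
    b->min = a->min < b->min ? a->min : b->min;
    b->max = a->max > b->max ? a->max : b->max;
}

/* Copying variant: valid only because `b` is assumed to be a fresh,
   zeroed struct, so there is nothing meaningful to compare against. */
void combine_copy(const struct sample *a, struct sample *b) {
    assert(b->count == 0); /* precondition assumed by this sketch */
    b->count += a->count;
    b->min = a->min;
    b->max = a->max;
}
```

With a recorded sample whose `min` is 2.0 and `max` is 9.0, `combine_semigroup` into a zeroed struct reports a `min` of 0.0, while `combine_copy` preserves 2.0.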

A newly created `Distribution` has a flaw: its `mean` value is `NaN` upon the first reading, provided no samples have been added yet.

Why? Because a newly created `Distribution` named `d` has a field `d->count = 0`, which is used as the divisor for a float value, and the numerator ends up being zero as well. IEEE-754 defines the value of 0.0/0.0 as NaN, and so on the first reading of a `Distribution` with no samples, `NaN` is returned for the `mean`.

This is problematic for a use case of mine: I want to use `ekg-statsd` to export `Distribution` values to a metric logging system. However, without this patch, the `mean` value is reported as `NaN` (thanks to its `Show` instance), which causes the logging system to reject the metric because it strictly expects floating-point values. I wouldn't be surprised if other systems rejected `NaN` in such a case when they try to scrape metrics. And it's not something clients of `ekg-core` should really have to check for.

In this case, the fix is simple: we check for `count == 0` and return a mean of `0.0` if that's the case.

Signed-off-by: Austin Seipp <[email protected]>
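The guard described above can be sketched as follows. This is an illustrative example only: the helper names `mean_unguarded` and `mean_guarded`, and the idea of passing a running `sum` as the numerator, are assumptions made for the sketch and do not reflect how `cbits` actually computes the mean.

```c
#include <assert.h>
#include <stdint.h>

/* Unguarded mean: with sum == 0.0 and count == 0 this evaluates
   0.0 / 0.0, which IEEE-754 defines as NaN. */
double mean_unguarded(double sum, int64_t count) {
    return sum / (double)count;
}

/* Guarded mean: report 0.0 for an empty distribution instead of NaN. */
double mean_guarded(double sum, int64_t count) {
    assert(count >= 0); /* a negative count is treated as a caller error */
    return count == 0 ? 0.0 : sum / (double)count;
}
```

A consumer that rejects `NaN` would then see `0.0` for a distribution with no samples, at the cost of not being able to tell "no samples" apart from "mean is zero" without also looking at `count`.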

23Skidoo · 2020-03-15T19:26:54Z

> In more abstract terms, the identity members of the min/max monoids are not zero, but maxBound/minBound!

Can we fix this by setting `cMin` to `maxBound` and `cMax` to `minBound` in `newCDistrib` instead?

23Skidoo · 2020-03-15T20:44:26Z

Answering myself: see this comment by @thoughtpolice.

23Skidoo · 2020-03-15T20:52:55Z

Merged and released on Hackage.

thoughtpolice added 2 commits December 2, 2019 13:10

thoughtpolice mentioned this pull request Dec 2, 2019

Stats min is always 0 #31

Open

23Skidoo merged commit 125e4fe into haskell-github-trust:master Mar 15, 2020

thoughtpolice deleted the aseipp/fix-minmax-combine branch February 14, 2021 03:20
