From: plasticchicken@...
Date: 2014-11-28T08:38:56+00:00
Subject: [ruby-core:66548] [ruby-trunk - Feature #10552] [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies

Issue #10552 has been updated by Brian Hempel.

Yes, I would rather see `Hash#map_values` in Ruby before `Enumerable#frequencies`.

However, if both `map_values` and `frequencies` were added, then we might not need `relative_frequencies`, since calculating it becomes cleaner:

~~~ruby
array = %w[cat bird bird horse]
array.frequencies.map_values { |n| n.to_f / array.size }
~~~

I think that counting everything up is more like pre-statistics. I want to count things more often than I want to take a mean or a standard deviation. Also, most statistical measures operate only on collections of numbers. In contrast, counting frequencies works on collections of anything, not just numbers.

We could call the method `counts` instead of `frequencies` to make it sound less like statistics and more like counting.

To revise your example in favor of this patch: if you want the frequencies sorted, the "manual way" becomes longer:

~~~ruby
%w[cat bird bird horse].group_by(&:itself).map_values(&:count).sort_by(&:last).reverse.to_h
~~~

----------------------------------------
Feature #10552: [PATCH] Add Enumerable#frequencies and Enumerable#relative_frequencies
https://bugs.ruby-lang.org/issues/10552#change-50164

* Author: Brian Hempel
* Status: Open
* Priority: Normal
* Assignee:
* Category: core
* Target version:
----------------------------------------
Counting how many times a value appears in some collection has always been a bit clumsy in Ruby. While Ruby has enough constructs to do it in one line, it still requires knowing the folklore of the optimum solution as well as some acrobatic typing:

~~~ruby
%w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }
# => {"cat" => 1, "bird" => 2, "horse" => 1}
~~~

What if Ruby could count for us?
This patch adds two methods to enumerables:

~~~ruby
%w[cat bird bird horse].frequencies
# => {"bird" => 2, "horse" => 1, "cat" => 1}

%w[cat bird bird horse].relative_frequencies
# => {"bird" => 0.5, "horse" => 0.25, "cat" => 0.25}
~~~

To make programmers happier, the returned hash has the most common values first. This is nice because, for example, finding the most common element of a collection becomes trivial:

~~~ruby
most_common, count = %w[cat bird bird horse].frequencies.first
~~~

Whereas the best you can do with vanilla Ruby is:

~~~ruby
most_common, count = %w[cat bird bird horse].each_with_object(Hash.new(0)) { |word, hash| hash[word] += 1 }.max_by(&:last)

# or...

most_common, count = %w[cat bird bird horse].group_by(&:to_s).map { |word, arr| [word, arr.size] }.max_by(&:last)
~~~

While I don't like the long method names, "frequencies" and "relative frequencies" are the terms used in basic statistics.
http://en.wikipedia.org/wiki/Frequency_%28statistics%29

---Files--------------------------------
add_enum_frequencies.patch (5.81 KB)

--
https://bugs.ruby-lang.org/
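
For anyone who wants to try the proposed behavior without applying the patch, here is a rough plain-Ruby sketch of the idea. It is not the code in add_enum_frequencies.patch: the helper names `count_frequencies` and `relative_frequencies_of` are hypothetical, and the tie-breaking rule (first-seen order among equal counts) is an assumption of this sketch, so ties may come out in a different order than the patch produces.

```ruby
# Hypothetical stand-ins for the proposed Enumerable#frequencies and
# Enumerable#relative_frequencies; written as plain methods so nothing
# in core is monkey-patched.

# Count how often each element occurs, most common first.
def count_frequencies(enum)
  counts = enum.each_with_object(Hash.new(0)) { |item, hash| hash[item] += 1 }
  # Sort by descending count. The index keeps first-seen order among ties,
  # which is an assumption of this sketch, not necessarily what the patch does.
  counts.each_with_index
        .sort_by { |(_, n), i| [-n, i] }
        .map(&:first)
        .to_h
end

# Same ordering, but each count is divided by the total number of elements.
def relative_frequencies_of(enum)
  counts = count_frequencies(enum)
  total = counts.values.sum.to_f
  counts.map { |item, n| [item, n / total] }.to_h
end

count_frequencies(%w[cat bird bird horse])
# => {"bird"=>2, "cat"=>1, "horse"=>1}

most_common, count = count_frequencies(%w[cat bird bird horse]).first
# most_common is "bird", count is 2
```

The total is taken from the counted values rather than from `enum.size`, so the sketch also works for enumerables that do not respond to `size`.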