Chap 16 Stream 68 144
// Querying an Optional:
// System.out.println(recipe1.getCalories()
// .getAsInt()); // NoSuchElementException
System.out.println((recipe1.getCalories().isPresent()
? recipe1.getCalories().getAsInt()
: "Unknown calories.")); // Unknown calories.
• Searching operations
These operations perform a search operation to determine a match or find an
element as explained below.
All search operations are short-circuit operations; that is, the operation can terminate
once the result is determined, whether or not all elements in the stream
have been considered.
Search operations can be further classified into two subgroups:
❍ Matching operations
The three terminal operations anyMatch(), allMatch(), and noneMatch() determine
whether stream elements match a given Predicate specified as an argument
to the method (p. 949). As expected, these operations return a boolean
value to indicate whether the match was successful or not.
❍ Finding operations
The two terminal operations findAny() and findFirst() find any element and
the first element in a stream, respectively, if such an element is available (p.
952). As the stream might be empty and such an element might not exist, these
operations return an Optional.
• Reduction operations
A reduction operation computes a result from combining the stream elements
by successively applying a combining function; that is, the stream elements are
reduced to a result value. Examples of reductions are computing the sum or
average of numeric values in a numeric stream, and accumulating stream ele-
ments into a collection.
We distinguish between two kinds of reductions:
❍ Functional reduction
A terminal operation is a functional reduction on the elements of a stream if it
reduces the elements to a single immutable value which is then returned by the
operation.
The overloaded reduce() method provided by the Stream API can be used to
implement customized functional reductions (p. 955), whereas the terminal
operations count(), min(), and max() implement specialized functional reduc-
tions (p. 953).
Functional reductions on numeric streams are discussed later in this section (p.
972).
❍ Mutable reduction
A terminal operation performs a mutable reduction on the elements of a stream if
it uses a mutable container—for example, a list, a set, or a map—to accumulate
values as it processes the stream elements. The operation returns the mutable
container as the result of the operation.
The Stream API provides two overloaded collect() methods that perform
mutable reduction (p. 964). One overloaded collect() method can be used to
948 CHAPTER 16: STREAMS
CD.cdList.stream().map(CD::title).forEach(printStr); // (1a)
//Java Jive|Java Jam|Lambda Dancing|Keep on Erasing|Hot Generics|
16.7: TERMINAL STREAM OPERATIONS 949
CD.cdList.stream().parallel().map(CD::title).forEach(printStr); // (1b)
//Lambda Dancing|Hot Generics|Keep on Erasing|Java Jam|Java Jive|
CD.cdList.stream().parallel().map(CD::title).forEachOrdered(printStr); // (2b)
//Java Jive|Java Jam|Lambda Dancing|Keep on Erasing|Hot Generics|
The discussion above also applies when the forEach() and forEachOrdered() termi-
nal operations are invoked on numeric streams. The nondeterministic behavior of
the forEach() terminal operation for int streams is illustrated below. The terminal
operation on the sequential int stream at (3a) seems to respect the encounter order,
but should not be relied upon. The terminal operation on the parallel int stream at
(3b) can give different results for different runs.
IntConsumer printInt = n -> out.print(n + "|");
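The statements labeled (3a) and (3b) are not reproduced in this extract. The following is a minimal sketch of the behavior described, not the book's code: the class name IntForEachDemo and its helper methods are assumptions made for illustration. A list stands in for printing so that the order in which the action is applied can be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.stream.IntStream;

class IntForEachDemo {
  // forEach() on a parallel int stream: the order of the actions is unspecified.
  static List<Integer> unordered() {
    List<Integer> seen = new CopyOnWriteArrayList<>();  // Thread-safe container.
    IntStream.rangeClosed(1, 5).parallel().forEach(seen::add);
    return seen;
  }

  // forEachOrdered() respects the encounter order, even on a parallel stream.
  static List<Integer> ordered() {
    List<Integer> seen = new ArrayList<>();
    IntStream.rangeClosed(1, 5).parallel().forEachOrdered(seen::add);
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(unordered());  // Some permutation of 1 to 5.
    System.out.println(ordered());    // [1, 2, 3, 4, 5]
  }
}
```

The first method can return a different permutation on each run, whereas the second always returns the values in encounter order.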
Matching Elements
The match operations determine whether any, all, or none of the stream elements
satisfy a given Predicate. These operations are not reductions, as they do not
always consider all elements in the stream in order to return a result.
Analogous match operations are also provided by the numeric stream interfaces.
boolean anyMatch(Predicate<? super T> predicate)
boolean allMatch(Predicate<? super T> predicate)
boolean noneMatch(Predicate<? super T> predicate)
These three terminal operations determine whether any, all, or no elements of
this stream match the specified predicate, respectively.
The methods may not evaluate the predicate on all elements if it is not necessary
for determining the result; that is, they are short-circuit operations.
If the stream is empty, the predicate is not evaluated.
The anyMatch() method returns false if the stream is empty.
The allMatch() and noneMatch() methods return true if the stream is empty.
There is no guarantee that these operations will terminate if applied to an infi-
nite stream.
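The rules for empty and infinite streams can be checked with a small sketch. The class name MatchEdgeCases and its methods are hypothetical; only standard Stream and IntStream calls are used.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

class MatchEdgeCases {
  // On an empty stream the predicate is never evaluated.
  static boolean anyOnEmpty()  { return Stream.<String>empty().anyMatch(s -> true); }
  static boolean allOnEmpty()  { return Stream.<String>empty().allMatch(s -> false); }
  static boolean noneOnEmpty() { return Stream.<String>empty().noneMatch(s -> true); }

  // Terminates on an infinite stream: short-circuits at the first value > 10.
  static boolean anyOnInfinite() {
    return IntStream.iterate(1, i -> i + 1).anyMatch(i -> i > 10);
  }

  public static void main(String[] args) {
    System.out.println(anyOnEmpty());     // false
    System.out.println(allOnEmpty());     // true
    System.out.println(noneOnEmpty());    // true
    System.out.println(anyOnInfinite());  // true
  }
}
```

Had the predicate in anyOnInfinite() been i -> i < 0, the pipeline would keep consuming elements without ever determining a result.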
The queries at (1), (2), and (3) below determine whether any, all, or no CDs are jazz
music CDs, respectively. At (1), the execution of the pipeline terminates as soon as
any jazz music CD is found—the value true is returned. At (2), the execution of the
pipeline terminates as soon as a non-jazz music CD is found—the value false is
returned. At (3), the execution of the pipeline terminates as soon as a jazz music
CD is found—the value false is returned.
boolean anyJazzCD = CD.cdList.stream().anyMatch(CD::isJazz); // (1) true
boolean allJazzCds = CD.cdList.stream().allMatch(CD::isJazz); // (2) false
boolean noJazzCds = CD.cdList.stream().noneMatch(CD::isJazz); // (3) false
The query at (4) determines that no CDs were released in 2015. The queries at (5)
and (6) are equivalent. If all CDs were released after 2015, then none were released
in or before 2015 (negation of the predicate gt2015).
boolean noneEQ2015 = CD.cdList.stream().noneMatch(eq2015); // (4) true
boolean allGT2015 = CD.cdList.stream().allMatch(gt2015); // (5) true
boolean noneNotGT2015 = CD.cdList.stream().noneMatch(gt2015.negate()); // (6) true
The code below uses the anyMatch() method on an int stream to determine whether
any year is a leap year.
IntStream yrStream = IntStream.of(2018, 2019, 2020);
IntPredicate isLeapYear = yr -> Year.of(yr).isLeap();
boolean anyLeapYear = yrStream.anyMatch(isLeapYear);
out.println("Any leap year: " + anyLeapYear); // true
import java.util.Arrays;
import java.util.stream.IntStream;
In the code below, the encounter order of the stream is the positional order of the
elements in the list. The first element returned by the findFirst() method at (1) is
the first element in the CD list.
Optional<CD> firstCD1 = CD.cdList.stream().findFirst(); // (1)
out.println(firstCD1.map(CD::title).orElse("No first CD.")); // (2) Java Jive
Since such an element might not exist—for example, the stream might be empty—
the method returns an Optional<T> object. At (2), the Optional<CD> object returned
by the findFirst() method is mapped to an Optional<String> object that encapsu-
lates the title of the CD. The orElse() method on this Optional<String> object
returns the CD title or the argument string if there is no such CD.
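The same idiom can be tried without the CD class. The sketch below is an illustration under assumptions: the class FindFirstDemo and the method firstOrDefault() are hypothetical, and plain strings stand in for CDs.

```java
import java.util.List;
import java.util.Optional;

class FindFirstDemo {
  // Returns the first title in upper case, or a default text for an empty list.
  static String firstOrDefault(List<String> titles) {
    Optional<String> first = titles.stream().findFirst();
    return first.map(String::toUpperCase).orElse("No first CD.");
  }

  public static void main(String[] args) {
    System.out.println(firstOrDefault(List.of("Java Jive", "Java Jam")));  // JAVA JIVE
    System.out.println(firstOrDefault(List.of()));                         // No first CD.
  }
}
```

Mapping an empty Optional yields an empty Optional, so the orElse() call supplies the default without any explicit check for presence.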
If the encounter order is not of consequence, the findAny() method can be used, as
it is nondeterministic—that is, it does not guarantee the same result on the same
The match methods only determine whether any elements satisfy a Predicate, as
seen at (5) below. Typically, a find terminal operation is used to find the first ele-
ment made available to the terminal operation after processing by the intermediate
operations in the stream pipeline. At (6), the filter() operation will filter the jazz
music CDs from the stream. However, the findAny() operation will return the first
jazz music CD that is filtered and then short-circuit the execution.
boolean anyJazzCD = CD.cdList.stream().anyMatch(CD::isJazz); // (5)
out.println("Any Jazz CD: " + anyJazzCD); // Any Jazz CD: true
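The pipeline for (6) is not reproduced in this extract. A minimal sketch of a filter followed by findAny() is given below. The record CD is a hypothetical stand-in for the book's CD class, and the jazz flags assigned to the titles are invented for the illustration.

```java
import java.util.List;
import java.util.Optional;

class FindAnyJazzDemo {
  // Hypothetical stand-in for the CD class; only two properties are modeled.
  record CD(String title, boolean isJazz) {}

  // The jazz flags below are made up for this sketch.
  static final List<CD> CDS = List.of(
      new CD("Java Jive", false),
      new CD("Java Jam", true),
      new CD("Lambda Dancing", false),
      new CD("Keep on Erasing", true));

  // Filters the jazz CDs and short-circuits as soon as one is available.
  static Optional<CD> anyJazz() {
    return CDS.stream().filter(CD::isJazz).findAny();
  }

  public static void main(String[] args) {
    anyJazz().map(CD::title).ifPresent(System.out::println);  // Title of some jazz CD.
  }
}
```

Whereas anyMatch() only answers whether a jazz CD exists, the find operation hands back the element itself, wrapped in an Optional.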
The code below uses the findAny() method on an IntStream to find whether any
number is divisible by 7.
IntStream numStream = IntStream.of(50, 55, 65, 70, 75, 77);
OptionalInt intOpt = numStream.filter(n -> n % 7 == 0).findAny();
intOpt.ifPresent(System.out::println); // 70
The find operations are guaranteed to terminate when applied to a finite stream,
even an empty one. However, for an infinite stream in a pipeline, at least one element
must be made available to the find operation in order for the operation to terminate.
If the elements of an initial infinite stream are all discarded by the intermediate
operations, the find operation will not terminate, as in the following pipeline:
Stream.generate(() -> 1).filter(n -> n == 0).findAny(); // Never terminates.
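For contrast, a find operation on an infinite stream does terminate when the filter lets at least one element through. The sketch below uses a hypothetical class InfiniteFindDemo.

```java
import java.util.Optional;
import java.util.stream.Stream;

class InfiniteFindDemo {
  // The infinite stream 1, 2, 3, ... contains multiples of 7, so the
  // find operation short-circuits at the first one.
  static Optional<Integer> firstMultipleOf7() {
    return Stream.iterate(1, n -> n + 1)
                 .filter(n -> n % 7 == 0)
                 .findFirst();
  }

  public static void main(String[] args) {
    System.out.println(firstMultipleOf7());  // Optional[7]
  }
}
```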
Counting Elements
The count() operation performs a functional reduction on the elements of a stream,
as each element contributes to the count which is the single immutable value
returned by the operation. The count() operation reports the number of elements
that are made available to it, which is not necessarily the same as the number of
elements in the initial stream, as elements might be discarded by the intermediate
operations.
The code below finds the total number of CDs in the stream, and how many of
these CDs are jazz music CDs.
long numOfCDS = CD.cdList.stream().count(); // 5
long numOfJazzCDs = CD.cdList.stream().filter(CD::isJazz).count(); // 3
The count() method is also defined for the numeric streams. Below it is used on an
IntStream to find how many numbers between 1 and 100 are divisible by 7.
IntStream numStream = IntStream.rangeClosed(1, 100);
long divBy7 = numStream.filter(n -> n % 7 == 0).count(); // 14
long count()
This terminal operation returns the count of elements in this stream—that is,
the length of this stream.
This operation is a special case of a functional reduction.
The operation does not terminate when applied to an infinite stream.
Both methods return an Optional, as the minimum and maximum elements might
not exist—for example, if the stream is empty. The code below finds the minimum
and maximum elements in a stream of CDs, according to their natural order. The
artist name is the most significant field according to the natural order defined for
CDs (p. 883).
Optional<CD> minCD = CD.cdList.stream().min(Comparator.naturalOrder());
minCD.ifPresent(out::println); // <Funkies, "Lambda Dancing", 10, 2018, POP>
out.println(minCD.map(CD::artist).orElse("No min CD.")); // Funkies
In the code below, the max() method is applied to an IntStream to find the largest
number between 1 and 100 that is divisible by 7.
IntStream iStream = IntStream.rangeClosed(1, 100);
OptionalInt maxNum = iStream.filter(n -> n % 7 == 0).max(); // 98
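Both outcomes of a min or max operation, a value or an empty Optional, can be seen in the following sketch. The class MinMaxDemo and its methods are hypothetical, and strings stand in for CDs.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.OptionalInt;
import java.util.stream.IntStream;

class MinMaxDemo {
  // Largest number between 1 and 100 that is divisible by 7.
  static OptionalInt largestMultipleOf7() {
    return IntStream.rangeClosed(1, 100).filter(n -> n % 7 == 0).max();
  }

  // Minimum title according to the natural order for strings.
  static Optional<String> minTitle(List<String> titles) {
    return titles.stream().min(Comparator.naturalOrder());
  }

  public static void main(String[] args) {
    System.out.println(largestMultipleOf7());                          // OptionalInt[98]
    System.out.println(minTitle(List.of("Java Jive", "Hot Generics"))); // Optional[Hot Generics]
    System.out.println(minTitle(List.of()));                           // Optional.empty
  }
}
```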
The idiom of using a loop for calculating the sum of a finite number of values is
something that is ingrained into all aspiring programmers. A loop-based solution
to calculate the total number of tracks on CDs in a list is shown below, where the
variable sum will hold the result after the execution of the for(:) loop:
int sum = 0; // (1) Initialize the partial result.
for (CD cd : CD.cdList) { // (2) Iterate over the list.
int numOfTracks = cd.noOfTracks(); // (3) Get the current value.
sum = sum + numOfTracks; // (4) Calculate new partial result.
}
Apart from the for(:) loop at (2) to iterate over all elements of the list and read the
number of tracks in each CD at (3), the two necessary steps are:
• Initialization of the variable sum at (1)
• The accumulative operation at (4) that is applied repeatedly to compute a new
partial result in the variable sum, based on its previous value and the number of
tracks in the current CD
The loop-based solution above can be translated to a stream-based solution, as
shown in Figure 16.11. All the code snippets can be found in Example 16.11.
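[Figure 16.11 (b) Stream pipeline: the accumulator (+) adds the number of tracks
of each CD to the partial sum. In the last two steps shown, cd3 (8 tracks) gives
the partial sum 32 and cd4 (10 tracks) gives the final result 42.]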
In Figure 16.11, the stream created at (6) internalizes the iteration over the ele-
ments. The mapToInt() intermediate operation maps each CD to its number of
tracks at (7)—the Stream<CD> is mapped to an IntStream. The reduce() terminal oper-
ation with two arguments computes and returns the total number of tracks:
• Its first argument at (8) is the identity element that provides the initial value for
the operation and is also the default value to return if the stream is empty. In
this case, this value is 0.
• Its second argument at (9) is the accumulator that is implemented as a lambda
expression. It repeatedly computes a new partial sum based on the previous
partial sum and the number of tracks in the current CD, as evident from
In Example 16.11, the stream pipeline at (10) prints the actions taken by the accu-
mulator which is now augmented with print statements. The output at (3) shows
that the accumulator actions correspond to those in Figure 16.11.
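Example 16.11 is not reproduced in this extract. The sketch below shows the general idea of a two-argument reduce() whose accumulator records each step. The class ReduceTraceDemo, the method totalTracks(), and the log format are assumptions; the track counts 8, 6, 10, 8, and 10 are those of the five CDs in the chapter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

class ReduceTraceDemo {
  // Sums the values with reduce(identity, accumulator), logging every step.
  static int totalTracks(List<String> log, int... tracks) {
    return IntStream.of(tracks)
        .reduce(0,                                   // Identity element.
                (sum, n) -> {                        // Accumulator.
                  log.add(sum + " + " + n + " = " + (sum + n));
                  return sum + n;
                });
  }

  public static void main(String[] args) {
    List<String> log = new ArrayList<>();
    int total = totalTracks(log, 8, 6, 10, 8, 10);
    log.forEach(System.out::println);  // 0 + 8 = 8 ... 32 + 10 = 42
    System.out.println("Total: " + total);  // Total: 42
  }
}
```

For a sequential stream the first step combines the identity element 0 with the first value, mirroring the initialization and accumulation steps of the loop-based solution.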
The single-argument reduce() method only accepts an accumulator. As no explicit
default or initial value can be specified, this method returns an Optional. If the
stream is not empty, it uses the first element as the initial value; otherwise, it
returns an empty Optional. In Example 16.11, the stream pipeline at (13) uses the
single-argument reduce() method to compute the total number of tracks on CDs.
The return value is an OptionalInt that can be queried to extract the encapsulated
int value.
OptionalInt optSumTracks0 = CD.cdList // (13)
.stream()
.mapToInt(CD::noOfTracks)
.reduce(Integer::sum); // (14)
out.println("Total number of tracks: " + optSumTracks0.orElse(0)); // 42
We can again augment the accumulator with print statements as shown at (16) in
Example 16.11. The output at (5) shows that the number of tracks from the first CD
was used as the initial value before the accumulator is applied repeatedly to the
rest of the values.
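The difference between an empty and a non-empty stream for the single-argument reduce() method can be sketched as follows. The class OptionalReduceDemo and the method sumOf() are hypothetical.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

class OptionalReduceDemo {
  // No identity element is given, so the result is wrapped in an OptionalInt.
  static OptionalInt sumOf(int... values) {
    return IntStream.of(values).reduce(Integer::sum);
  }

  public static void main(String[] args) {
    System.out.println(sumOf(8, 6, 10, 8, 10));  // OptionalInt[42]
    System.out.println(sumOf());                 // OptionalInt.empty
    System.out.println(sumOf().orElse(0));       // 0
  }
}
```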
import java.util.Comparator;
import java.util.Optional;
import java.util.OptionalInt;
import java.util.function.BinaryOperator;
// Compare by CD title.
Comparator<CD> cmpByTitle = Comparator.comparing(CD::title); // (26)
BinaryOperator<CD> maxByTitle =
(cd1, cd2) -> cmpByTitle.compare(cd1, cd2) > 0 ? cd1 : cd2; // (27)
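The pipeline that applies the binary operator defined at (27) is not shown in this extract. A comparable reduction can be sketched on plain strings, under the assumption that titles are compared by their natural order; the class MaxByReduceDemo and the method maxTitle() are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.BinaryOperator;

class MaxByReduceDemo {
  // Reduces the titles to the greatest one, keeping the larger of each pair.
  static Optional<String> maxTitle(List<String> titles) {
    Comparator<String> cmp = Comparator.naturalOrder();
    BinaryOperator<String> maxBy =
        (t1, t2) -> cmp.compare(t1, t2) > 0 ? t1 : t2;
    return titles.stream().reduce(maxBy);
  }

  public static void main(String[] args) {
    List<String> titles = List.of("Java Jive", "Java Jam", "Lambda Dancing",
                                  "Keep on Erasing", "Hot Generics");
    System.out.println(maxTitle(titles).orElse("No titles."));  // Lambda Dancing
  }
}
```

The static method BinaryOperator.maxBy(Comparator) in the java.util.function package creates an equivalent operator without the explicit lambda expression.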
Keep on Erasing
[Figure 16.12, parallel functional reduction: each element of the parallel
Stream<CD> (cd0 to cd4, with 8, 6, 10, 8, and 10 tracks) is added to the identity
value 0 by the accumulator (+), and the combiner (+) merges the partial sums
(14, 18, 28) into the final result 42.]
[Figure 16.14, sequential mutable reduction: stream(), map(), and collect() process
the contents of CD.cdList; the supplier (S) creates an empty list [] to which the
accumulator (A) adds each value.]
[Figure 16.14b, parallel mutable reduction: for each element of the parallel stream,
the supplier (S) creates an empty list [], the accumulator (A) adds the number of
tracks to it, and the combiner (C) merges the partial lists ([8, 6], [10, 8],
[10, 8, 10]) into the final list [8, 6, 10, 8, 10].]
as partial result containers by the accumulator, and are later merged by the com-
biner to a final result container. The containers created by the supplier are mutated
by the accumulator and the combiner to perform mutable reduction. The partial
result containers are also merged in parallel by the combiner. It is instructive to
contrast this combiner with the combiner for parallel functional reduction that is
illustrated in Figure 16.12, p. 963.
In Example 16.12, the stream pipeline at (7) also creates a list containing the num-
ber of tracks on each CD, where the stream is parallel, and the lambda expressions
implementing the argument functions of the collect() method are augmented
with print statements so that actions of the functions can be logged. The output
from this parallel mutable reduction shows that the combiner is executed multiple
times to merge partial result lists. The actions of the argument functions shown in
the output are the same as those illustrated in Figure 16.14b. Of course, multiple
runs of the pipeline can show different sequences of operations in the output, but
the final result is the same. Also note that the elements retain their relative position
in the partial result lists as these are combined, preserving the encounter order of
the stream.
Although a stream is executed in parallel to perform mutable reduction, the merg-
ing of the partial containers by the combiner can impact performance if this is too
costly. For example, merging mutable maps can be costly compared to merging
mutable lists. This issue is further explored for parallel streams in §16.9, p. 1009.
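That the encounter order survives a parallel mutable reduction can be checked with a short sketch. The class ParallelCollectDemo and the method tracksInParallel() are hypothetical; the three arguments are the supplier, the accumulator, and the combiner, here given as method references.

```java
import java.util.ArrayList;
import java.util.List;

class ParallelCollectDemo {
  // Each thread fills its own ArrayList; the combiner merges the partial
  // lists in an order that preserves the encounter order of the stream.
  static List<Integer> tracksInParallel(List<Integer> tracks) {
    ArrayList<Integer> result = tracks.parallelStream()
        .collect(ArrayList::new,      // Supplier
                 ArrayList::add,      // Accumulator
                 ArrayList::addAll);  // Combiner
    return result;
  }

  public static void main(String[] args) {
    System.out.println(tracksInParallel(List.of(8, 6, 10, 8, 10)));  // [8, 6, 10, 8, 10]
  }
}
```

How many partial lists are created, and in which sequence they are merged, can differ between runs, but the resulting list is the same.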
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;
// .stream() // (8a)
.parallelStream() // (8b)
.map(CD::noOfTracks) // (9)
.collect( // (10)
() -> { // (11) Supplier
System.out.println("Supplier: Creating an ArrayList");
return new ArrayList<>();
},
(cont, noOfTracks) -> { // (12) Accumulator
System.out.printf("Accumulator: cont:%s, noOfTracks:%s",
cont, noOfTracks);
cont.add(noOfTracks);
System.out.printf(", mutCont:%s%n", cont);
},
(cont1, cont2) -> { // (13) Combiner
System.out.printf("Combiner: cont1:%s, cont2:%s", cont1, cont2);
cont1.addAll(cont2);
System.out.printf(", mutCont:%s%n", cont1);
});
System.out.println("Number of tracks on each CD (parallel): " + tracks1);
System.out.println();
// Query: Go bananas.
StringBuilder goneBananas = Stream // (16)
.iterate("ba", b -> b + "na") // (17)
.limit(5)
.peek(System.out::println)
.collect(StringBuilder::new, // (18)
StringBuilder::append,
StringBuilder::append);
System.out.println("Go bananas: " + goneBananas);
}
}
CD titles: [Hot Generics, Java Jam, Java Jive, Keep on Erasing, Lambda Dancing]
ba
bana
banana
bananana
banananana
Go bananas: babanabananabanananabanananana
Example 16.12 also shows how other kinds of containers can be used for mutable
reduction. The stream pipeline at (14) performs mutable reduction to create an
ordered set with CD titles. The supplier is implemented by the constructor refer-
ence TreeSet::new. The constructor will create a container of type TreeSet<String>
that will maintain the CD titles according to the natural order for Strings. The accu-
mulator and the combiner are implemented by the method references TreeSet::add
and TreeSet::addAll, respectively. The accumulator will add a title to a container of
type TreeSet<String> and the combiner will merge the contents of two containers
of type TreeSet<String>.
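The pipeline at (14) is not reproduced in this extract. The sketch below follows the description above, with strings standing in for the CD titles; the class TreeSetCollectDemo and the method sortedTitles() are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

class TreeSetCollectDemo {
  // Mutable reduction into a TreeSet that keeps the titles sorted.
  static TreeSet<String> sortedTitles(List<String> titles) {
    return titles.stream()
        .collect(TreeSet::new,      // Supplier
                 TreeSet::add,      // Accumulator
                 TreeSet::addAll);  // Combiner
  }

  public static void main(String[] args) {
    List<String> titles = List.of("Java Jive", "Java Jam", "Lambda Dancing",
                                  "Keep on Erasing", "Hot Generics");
    System.out.println("CD titles: " + sortedTitles(titles));
    // CD titles: [Hot Generics, Java Jam, Java Jive, Keep on Erasing, Lambda Dancing]
  }
}
```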
In Example 16.12, the mutable reduction performed by the stream pipeline at (16)
uses a mutable container of type StringBuilder. The output from the peek() method
shows that the strings produced by the iterate() method start with the initial
string "ba" and are iteratively concatenated with the postfix "na". The limit() inter-
mediate operation truncates the infinite stream to five elements. The collect()
method appends the strings to a StringBuilder. The supplier creates an empty
StringBuilder. The accumulator and the combiner append a CharSequence to a
StringBuilder. In the case of the accumulator, the CharSequence is a String—that is, a
stream element—in the call to the append() method. But in the case of the combiner,
the CharSequence is a StringBuilder—that is, a partial result container when the
stream is parallel. One might be tempted to use a string instead of a StringBuilder,
but that would not be a good idea as a string is immutable.
Note that the accumulator and combiner of the collect() method do not return a
value. The collect() method does not terminate if applied to an infinite stream, as
the method will never finish processing all the elements in the stream.
Because mutable reduction uses the same mutable result container for accumulat-
ing new results by changing the state of the container, it is more efficient than a
functional reduction where a new partial result always replaces the previous par-
tial result.
Collecting to an Array
The overloaded method toArray() can be used to collect or accumulate into an
array. It is a special case of a mutable reduction, and as the name suggests, the
mutable container is an array. The numeric stream interfaces also provide a coun-
terpart to the toArray() method that returns an array of a numeric type.
Object[] toArray()
This terminal operation returns an array containing the elements of this
stream. Note that the array returned is of type Object[].
<A> A[] toArray(IntFunction<A[]> generator)
This terminal operation returns an array containing the elements of this
stream. The provided generator function is used to allocate the desired array.
The type parameter A is the element type of the array that is returned. The size
of the array (which is equal to the length of the stream) is passed to the gener-
ator function as an argument.
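The two overloads differ only in the type of the array returned, as the following sketch suggests. The class ToArrayDemo and its methods are hypothetical, and strings stand in for stream elements of a reference type.

```java
import java.util.Arrays;
import java.util.List;

class ToArrayDemo {
  // Without a generator, the array is always of type Object[].
  static Object[] asObjects(List<String> titles) {
    return titles.stream().toArray();
  }

  // The array constructor reference String[]::new receives the required length.
  static String[] asStrings(List<String> titles) {
    return titles.stream().toArray(String[]::new);
  }

  public static void main(String[] args) {
    List<String> titles = List.of("Java Jive", "Java Jam");
    System.out.println(Arrays.toString(asObjects(titles)));  // [Java Jive, Java Jam]
    System.out.println(Arrays.toString(asStrings(titles)));  // [Java Jive, Java Jam]
  }
}
```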
Examples of numeric streams whose elements are collected into an array are
shown at (3) and (4). The limit() intermediate operation at (3) converts the infinite
stream into a finite one whose elements are collected into an int array.
int[] intArray1 = IntStream.iterate(1, i -> i + 1).limit(5).toArray();// (3)
// [1, 2, 3, 4, 5]
int[] intArray2 = IntStream.range(-5, 5).toArray(); // (4)
// [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4]
Not surprisingly, when applied to infinite streams the operation results in a fatal
OutOfMemoryError, as the method cannot determine the length of the array and keeps
storing the stream elements, eventually running out of memory.
Like any other mutable reduction operation, the toArray() method does not termi-
nate when applied to an infinite stream, unless it is converted into a finite stream
as at (3) above.
Collecting to a List
The method Stream.toList() implements a terminal operation that can be used to
collect or accumulate the result of processing a stream into a list. Compared to the
toArray() instance method, the toList() method is a default method in the Stream
interface. The default implementation returns an unmodifiable list; that is, elements
cannot be added, removed, or sorted. This unmodifiable list is created from the
array into which the elements are accumulated first.
If the requirement is an unmodifiable list that allows null elements, Stream.toList()
is the clear and concise choice. Many examples of stream pipelines encountered
so far in this chapter use the toList() terminal operation.
List<String> titles = CD.cdList.stream().map(CD::title).toList();
// [Java Jive, Java Jam, Lambda Dancing, Keep on Erasing, Hot Generics]
titles.add("Java Jingles"); // UnsupportedOperationException!
Like any other mutable reduction operation, the toList() method does not termi-
nate when applied to an infinite stream, unless the stream is converted into a finite
stream.
default List<T> toList()
Accumulates the elements of this stream into a List, respecting any encounter
order the stream may have. The returned List is unmodifiable (§12.2, p. 649),
and calls to any mutator method will always result in an UnsupportedOperationException.
The unmodifiable list returned allows null values.
See also the toList() method in the Collectors class (p. 980).
The Collectors.toCollection(Supplier) method is recommended for greater
control.
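The following sketch contrasts the two choices. It assumes Java 16 or later, where Stream.toList() is available; the class ToListDemo and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ToListDemo {
  // Stream.toList(): unmodifiable, but null elements are allowed.
  static List<String> unmodifiable() {
    return Stream.of("Java Jive", null, "Java Jam").toList();
  }

  // Collectors.toCollection(): the caller chooses the list implementation.
  static List<String> modifiable() {
    return Stream.of("Java Jive", "Java Jam")
                 .collect(Collectors.toCollection(ArrayList::new));
  }

  // Reports whether the list refuses to add an element.
  static boolean rejectsAdd(List<String> list) {
    try {
      list.add("Java Jingles");
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(unmodifiable());              // [Java Jive, null, Java Jam]
    System.out.println(rejectsAdd(unmodifiable()));  // true
    System.out.println(rejectsAdd(modifiable()));    // false
  }
}
```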
Summation
The sum() terminal operation is a special case of a functional reduction that calcu-
lates the sum of numeric values in a stream. The stream pipeline below calculates
the total number of tracks on the CDs in a list. Note that the stream of CD is mapped
to an int stream whose elements represent the number of tracks on a CD. The int
values are cumulatively added to compute the total number of tracks.
int totNumOfTracks = CD.cdList
.stream() // Stream<CD>
.mapToInt(CD::noOfTracks) // IntStream
.sum(); // 42
The query below sums all even numbers between 1 and 100.
int sumEven = IntStream
.rangeClosed(1, 100)
.filter(i -> i % 2 == 0)
.sum(); // 2550
The count() operation is equivalent to mapping each stream element to the value 1
and adding the 1s:
int numOfCDs = CD.cdList
.stream()
.mapToInt(cd -> 1) // CD => 1
.sum(); // 5
Averaging
Another common statistic to calculate is the average of values, defined as the
sum of values divided by the number of values. A loop-based solution to calculate
the average would explicitly sum the values, count the number of values, and
do the calculation. In a stream-based solution, the average() terminal operation can
be used to calculate this value. The stream pipeline below computes the average
number of tracks on a CD. The CD stream is mapped to an int stream whose values
are the number of tracks on a CD. The average() terminal operation adds the number
of tracks and counts the values, returning the average as a double value encapsu-
lated in an OptionalDouble.
OptionalDouble optAverage = CD.cdList
.stream()
.mapToInt(CD::noOfTracks)
.average();
System.out.println(optAverage.orElse(0.0)); // 8.4
The reason for using an Optional is that the average is not defined if there are no
values. The absence of a value in the OptionalDouble returned by the method means
that the stream was empty.
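Both cases can be seen in a short sketch that works directly on int values. The class AverageDemo and the method averageTracks() are hypothetical.

```java
import java.util.OptionalDouble;
import java.util.stream.IntStream;

class AverageDemo {
  // Returns an empty OptionalDouble when no values are supplied.
  static OptionalDouble averageTracks(int... tracks) {
    return IntStream.of(tracks).average();
  }

  public static void main(String[] args) {
    System.out.println(averageTracks(8, 6, 10, 8, 10));  // OptionalDouble[8.4]
    System.out.println(averageTracks());                 // OptionalDouble.empty
    System.out.println(averageTracks().orElse(0.0));     // 0.0
  }
}
```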
Summarizing
The result of a functional reduction is a single value. This means that calculating
different results (for example, count, sum, average, min, and max) requires
separate reduction operations on a stream.
The method summaryStatistics() does several common reductions on a stream in a sin-
gle operation and returns the results in an object of type NumTypeSummaryStatistics,
where NumType is Int, Long, or Double. An object of this class encapsulates the count,
sum, average, min, and max values of a stream.
The classes IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics
in the java.util package define the following constructor and methods, where NumType
is Int (but it is Integer when used as a type name), Long, or Double, and the corresponding
numtype is int, long, or double:
NumTypeSummaryStatistics()
Creates an empty instance with zero count, zero sum, a min value as
NumType.MAX_VALUE, a max value as NumType.MIN_VALUE, and an average value of zero.
double getAverage()
Returns the arithmetic mean of values recorded, or zero if no values have been
recorded.
long getCount()
Returns the count of values recorded.
numtype getMax()
Returns the maximum value recorded, or NumType.MIN_VALUE if no values have
been recorded.
numtype getMin()
Returns the minimum value recorded, or NumType.MAX_VALUE if no values have
been recorded.
numtype getSum()
Returns the sum of values recorded, or zero if no values have been recorded.
The method in the IntSummaryStatistics and LongSummaryStatistics classes
returns a long value. The method in the DoubleSummaryStatistics class returns a
double value.
The default format of the statistics printed by the toString() method of the
IntSummaryStatistics class is shown below:
System.out.println(stats1);
//IntSummaryStatistics{count=2, sum=14, min=6, average=7.000000, max=8}
Below, the accept() method records the value 10 (the number of tracks on CD.cd2)
into the summary information referenced by stats1. The resulting statistics show
the new count is 3 (=2 +1), the new sum is 24 (=14+10), and the new average is 8.0
(=24.0/3.0). However, the min value was not affected but the max value has
changed to 10.
stats1.accept(CD.cd2.noOfTracks()); // Add the value 10.
System.out.println(stats1);
//IntSummaryStatistics{count=3, sum=24, min=6, average=8.000000, max=10}
The code below creates another IntSummaryStatistics object that summarizes the
statistics from two other CDs.
IntSummaryStatistics stats2 = List.of(CD.cd3, CD.cd4)
.stream()
.mapToInt(CD::noOfTracks)
.summaryStatistics();
System.out.println(stats2);
//IntSummaryStatistics{count=2, sum=18, min=8, average=9.000000, max=10}
The combine() method incorporates the state of one IntSummaryStatistics object into
another IntSummaryStatistics object. In the code below, the state of the
IntSummaryStatistics object referenced by stats2 is combined with the state of the
IntSummaryStatistics object referenced by stats1. The resulting summary information is
printed, showing that the new count is 5 (=3+2), the new sum is 42 (=24+18), and
the new average is 8.4 (=42.0/5.0). However, the min and max values were not
affected.
stats1.combine(stats2); // Combine stats2 with stats1.
System.out.println(stats1);
//IntSummaryStatistics{count=5, sum=42, min=6, average=8.400000, max=10}
The summary statistics classes are not exclusive for use with streams, as they pro-
vide a constructor and appropriate methods to incorporate numeric values in
order to calculate common statistics, as we have seen here. We will return to calcu-
lating statistics when we discuss built-in collectors (p. 978).
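The sequence of steps above can be replayed in a single sketch that does not depend on the CD class. The class SummaryStatsDemo and the method build() are hypothetical; the values are the track counts used in the chapter.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class SummaryStatsDemo {
  static IntSummaryStatistics build() {
    // Statistics for the first two values: count=2, sum=14, min=6, max=8.
    IntSummaryStatistics stats1 = IntStream.of(8, 6).summaryStatistics();
    stats1.accept(10);  // Record one more value: count=3, sum=24, max=10.

    // Statistics for two further values: count=2, sum=18, min=8, max=10.
    IntSummaryStatistics stats2 = IntStream.of(8, 10).summaryStatistics();
    stats1.combine(stats2);  // Incorporate stats2 into stats1.
    return stats1;
  }

  public static void main(String[] args) {
    System.out.println(build());
    // IntSummaryStatistics{count=5, sum=42, min=6, average=8.400000, max=10}
    System.out.println(new IntSummaryStatistics().getMin() == Integer.MAX_VALUE);  // true
  }
}
```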
Method name (ref.)   Any type parameter   Functional interface           Function type
                     + return type        parameters                     of parameters
forEach (p. 948)     void                 (Consumer<T> action)           T -> void
toArray (p. 971)     <A> A[]              (IntFunction<A[]> generator)   int -> A[]
16.8 Collectors
A collector encapsulates the functions required for performing reduction: the sup-
plier, the accumulator, the combiner, and the finisher. It can provide these func-
tions since it implements the Collector interface (in the java.util.stream package)
that defines the methods to create these functions. It is passed as an argument to
the collect(Collector) method in order to perform a reduction operation. In con-
trast, the collect(Supplier, BiConsumer, BiConsumer) method requires the functions
supplier, accumulator, and combiner, respectively, to be passed as arguments in the
method call.
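The relationship between the two collect() methods can be sketched as follows. The class CollectorVsFunctionsDemo and its methods are hypothetical; both pipelines accumulate the same elements into a list, one with a predefined collector and one with explicitly supplied functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class CollectorVsFunctionsDemo {
  // The collector bundles the functions needed for the reduction.
  static List<String> viaCollector(List<String> titles) {
    return titles.stream().collect(Collectors.toList());
  }

  // The supplier, accumulator, and combiner are passed individually.
  static List<String> viaFunctions(List<String> titles) {
    ArrayList<String> result = titles.stream()
        .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
    return result;
  }

  public static void main(String[] args) {
    List<String> titles = List.of("Java Jive", "Java Jam");
    System.out.println(viaCollector(titles));  // [Java Jive, Java Jam]
    System.out.println(viaFunctions(titles));  // [Java Jive, Java Jam]
  }
}
```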
Details of implementing a collector are not necessary for our purposes, as we will
exclusively use the extensive set of predefined collectors provided by the static fac-
tory methods of the Collectors class in the java.util.stream package (Table 16.7,
p. 1005). In most cases, it should be possible to find a predefined collector for the task
at hand. The collectors use various kinds of containers for performing reduction—
for example, accumulating to a map, or finding the minimum or maximum ele-
ment. For example, the Collectors.toList() factory method creates a collector that
performs mutable reduction using a list as a mutable container. It can be passed to
the collect(Collector) terminal operation of a stream.
It is a common practice to import the static factory methods of the Collectors class
in the code so that the methods can be called by their simple names.
import static java.util.stream.Collectors.*;
However, the practice adopted in this chapter is to import only the Collectors
class, so that the connection between each static method and the class is made
explicit in the code. Of course, static import of factory methods
can be used once familiarity with the collectors is established.
import java.util.stream.Collectors;
Collecting to a Collection
The method toCollection(Supplier) creates a collector that uses a mutable con-
tainer of a specific Collection type to perform mutable reduction. A supplier to cre-
ate the mutable container is specified as an argument to the method.
The following stream pipeline creates an ArrayList<String> instance with the titles
of all CDs in the stream. The constructor reference ArrayList::new returns an empty
ArrayList<String> instance, where the element type String is inferred from the con-
text.
ArrayList<String> cdTitles1 = CD.cdList.stream() // Stream<CD>
.map(CD::title) // Stream<String>
.collect(Collectors.toCollection(ArrayList::new));
//[Java Jive, Java Jam, Lambda Dancing, Keep on Erasing, Hot Generics]
Collecting to a List
The method toList() creates a collector that uses a mutable container of type List
to perform mutable reduction. This collector guarantees to preserve the encounter
order of the input stream, if it has one. For more control over the type of the list,
the toCollection() method can be used. This collector can be used as a downstream
collector.
The following stream pipeline creates a list with the titles of all CDs in the stream
using a collector returned by the Collectors.toList() method. Although the
returned list is modified, this is implementation dependent and should not be
relied upon.
List<String> cdTitles3 = CD.cdList.stream() // Stream<CD>
.map(CD::title) // Stream<String>
.collect(Collectors.toList());
//[Java Jive, Java Jam, Lambda Dancing, Keep on Erasing, Hot Generics]
cdTitles3.add("Java Jingles"); // OK
Collecting to a Set
The method toSet() creates a collector that uses a mutable container of type Set to
perform mutable reduction. The collector does not guarantee to preserve the
encounter order of the input stream. For more control over the type of the set, the
toCollection() method can be used.
The following stream pipeline creates a set with the titles of all CDs in the stream.
Set<String> cdTitles2 = CD.cdList.stream() // Stream<CD>
.map(CD::title) // Stream<String>
.collect(Collectors.toSet());
//[Hot Generics, Java Jive, Lambda Dancing, Keep on Erasing, Java Jam]
16.8: COLLECTORS 981
Collecting to a Map
The method toMap() creates a collector that performs mutable reduction to a muta-
ble container of type Map.
static <T,K,U> Collector<T,?,Map<K,U>> toMap(
Function<? super T,? extends K> keyMapper,
Function<? super T,? extends U> valueMapper)
The collector returned by the method toMap() uses either a default map or one that
is supplied. To be able to create an entry in a Map<K,U> from stream elements of type
T, the collector requires two functions:
• keyMapper: T -> K, which is a Function to extract a key of type K from a stream ele-
ment of type T.
• valueMapper: T -> U, which is a Function to extract a value of type U for a given
key of type K from a stream element of type T.
Additional functions as arguments allow various controls to be exercised on the
map:
• mergeFunction: (U,U) -> U, which is a BinaryOperator to merge two values that are
associated with the same key. The merge function must be specified if collision
of values can occur during the mutable reduction; otherwise, an IllegalStateException
will be thrown.
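The two-argument toMap() method can be sketched as follows. The record CD here is a pared-down, hypothetical stand-in for the chapter's CD class (assuming Java 16 or later for records); only the accessor names title() and year() are taken from the text. The class name ToMapSketch and the method titleToYear() are likewise assumptions made for illustration.

```java
import java.time.Year;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ToMapSketch {
  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title, Year year) {}

  // Maps each CD title (key) to its release year (value).
  // No merge function is given, so all titles must be unique.
  static Map<String, Year> titleToYear(List<CD> cds) {
    return cds.stream()
        .collect(Collectors.toMap(CD::title, CD::year)); // keyMapper, valueMapper
  }

  public static void main(String[] args) {
    List<CD> cds = List.of(new CD("Java Jive", Year.of(2017)),
                           new CD("Lambda Dancing", Year.of(2018)));
    Map<String, Year> map = titleToYear(cds);
    System.out.println(map.get("Java Jive"));      // 2017
    System.out.println(map.get("Lambda Dancing")); // 2018
  }
}
```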
982 CHAPTER 16: STREAMS
[Figure: collecting to a map. The collect() operation accumulates the Stream<CD> with the contents of the cdList into a Map<String,Year> of title and year, with entries such as <"Java Jive", 2017>, <"Java Jam", 2017>, <"Lambda Dancing", 2018>, and <"Keep on Erasing", 2018>.]
As there were no duplicates of the key in the previous two examples, there was no
collision of values in the map. In the list dupList below, there are duplicates of CDs
(CD.cd0, CD.cd1). Executing the pipeline results in a runtime exception at (1).
List<CD> dupList = List.of(CD.cd0, CD.cd1, CD.cd2, CD.cd0, CD.cd1);
Map<String, Year> mapTitleToYear1 = dupList.stream()
.collect(Collectors.toMap(CD::title, CD::year)); // (1)
// IllegalStateException: Duplicate key Java Jive (attempted merging values 2017 and 2017)
The collision values can be resolved by specifying a merge function. In the pipeline
below, the arguments of the merge function (y1, y2) -> y1 at (1) have the same value
for the year if we assume that a CD can only be released once. Note that y1 and y2
denote the existing value in the map and the value to merge, respectively. The
merge function can return any one of the values to resolve the collision.
Map<String, Year> mapTitleToYear2 = dupList.stream()
.collect(Collectors.toMap(CD::title, CD::year, (y1, y2) -> y1)); // (1)
The stream pipeline below creates a map of CD titles released each year. As more
than one CD can be released in a year, collision of titles can occur for a year. The
merge function (tt, t) -> tt + ":" + t concatenates the titles in each year separated
by a colon, if necessary. Note that tt and t denote the existing value in the map and
the value to merge, respectively.
Map<Year, String> mapTitleToYear3 = CD.cdList.stream()
.collect(Collectors.toMap(CD::year, CD::title,
(tt, t) -> tt + ":" + t));
//{2017=Java Jive:Java Jam, 2018=Lambda Dancing:Keep on Erasing:Hot Generics}
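The stream pipeline below creates a map with the title released each year that is
greatest in the natural (lexicographical) order of the titles, as selected by the merge
function BinaryOperator.maxBy(Comparator.naturalOrder()).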
For greater control over the type of the map in which to accumulate the entries, a
supplier is specified. The supplier TreeMap::new returns an empty instance of a
TreeMap in which the entries are accumulated. The keys in such a map are sorted in
their natural order—the class java.time.Year implements the Comparable<Year>
interface.
TreeMap<Year, String> mapYearToLongestTitle = CD.cdList.stream()
.collect(Collectors.toMap(CD::year, CD::title,
BinaryOperator.maxBy(Comparator.naturalOrder()),
TreeMap::new));
//{2017=Java Jive, 2018=Lambda Dancing}
Collecting to a ConcurrentMap
If the collector returned by the Collectors.toMap() method is used in a parallel
stream, the multiple partial maps created during parallel execution are merged by
the collector to create the final result map. Merging maps can be expensive if keys
from one map are merged into another. To address the problem, the Collectors
class provides the three overloaded methods toConcurrentMap(), analogous to the
three toMap() methods, that return a concurrent collector—that is, a collector that
uses a single concurrent map to perform the reduction. A concurrent map is thread-
safe and unordered. A concurrent map implements the java.util.concurrent.Concur-
rentMap interface, which is a subinterface of java.util.Map interface (§23.7, p. 1482).
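A minimal sketch of a concurrent collector in a parallel stream is given below. The record CD is a pared-down, hypothetical stand-in for the chapter's CD class (here the year is a plain int), and the names ToConcurrentMapSketch, CDS, and tracksByYear() are assumptions made for illustration. The merge function Integer::sum is chosen because it is associative and commutative, so the result does not depend on the order in which threads add entries to the single concurrent map.

```java
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

class ToConcurrentMapSketch {
  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title, int noOfTracks, int year) {}

  static final List<CD> CDS = List.of(
      new CD("Java Jive", 8, 2017), new CD("Java Jam", 6, 2017),
      new CD("Lambda Dancing", 10, 2018), new CD("Keep on Erasing", 8, 2018),
      new CD("Hot Generics", 10, 2018));

  // Total number of tracks per year, accumulated in one concurrent map.
  static ConcurrentMap<Integer, Integer> tracksByYear(List<CD> cds) {
    return cds.parallelStream()
        .collect(Collectors.toConcurrentMap(CD::year, CD::noOfTracks, Integer::sum));
  }

  public static void main(String[] args) {
    ConcurrentMap<Integer, Integer> map = tracksByYear(CDS);
    System.out.println(map.get(2017)); // 14
    System.out.println(map.get(2018)); // 28
  }
}
```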
Joining
The joining() method creates a collector for concatenating the input elements of
type CharSequence to a single immutable String. However, internally it uses a muta-
ble StringBuilder. Note that the collector returned by the joining() methods per-
forms functional reduction, as its result is a single immutable string.
static Collector<CharSequence,?,String> joining()
static Collector<CharSequence,?,String> joining(CharSequence delimiter)
static Collector<CharSequence,?,String> joining(CharSequence delimiter,
CharSequence prefix,
CharSequence suffix)
Return a Collector that concatenates CharSequence elements into a String. The
first method concatenates in encounter order. So does the second method, but
this method separates the elements by the specified delimiter. The third
method in addition applies the specified prefix and suffix to the result of the
concatenation.
The wildcard ? stands for the type of the mutable container that is used internally
by the collector to accumulate the result.
The methods preserve the encounter order, if the stream has one.
Among the classes that implement the CharSequence interface are the String,
StringBuffer, and StringBuilder classes.
The stream pipelines below concatenate CD titles to illustrate the three overloaded
joining() methods. The CharSequence elements are Strings. The strings are concate-
nated in the stream encounter order, which is the positional order for lists. The
zero-argument joining() method at (1) performs string concatenation of the CD
titles using a StringBuilder internally, and returns the result as a string.
String concatTitles1 = CD.cdList.stream() // Stream<CD>
.map(CD::title) // Stream<String>
.collect(Collectors.joining()); // (1)
//Java JiveJava JamLambda DancingKeep on ErasingHot Generics
The single-argument joining() method at (2) concatenates the titles using the spec-
ified delimiter.
String concatTitles2 = CD.cdList.stream()
.map(CD::title)
.collect(Collectors.joining(", ")); // (2) Delimiter
//Java Jive, Java Jam, Lambda Dancing, Keep on Erasing, Hot Generics
The three-argument joining() method at (3) concatenates the titles using the spec-
ified delimiter, prefix, and suffix.
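A minimal sketch of the three-argument joining() method is shown below. To keep it self-contained, it operates on a list of title strings rather than on the chapter's CD.cdList; the class name JoiningSketch and the method bracketed() are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class JoiningSketch {
  // Joins the titles with ", " as delimiter, "[" as prefix, and "]" as suffix.
  static String bracketed(List<String> titles) {
    return titles.stream()
        .collect(Collectors.joining(", ", "[", "]")); // (3) Delimiter, prefix, suffix
  }

  public static void main(String[] args) {
    List<String> titles = List.of("Java Jive", "Java Jam", "Lambda Dancing",
                                  "Keep on Erasing", "Hot Generics");
    System.out.println(bracketed(titles));
    //[Java Jive, Java Jam, Lambda Dancing, Keep on Erasing, Hot Generics]
  }
}
```

Note that the prefix and the suffix are applied even if the stream is empty, in which case the result is the string "[]".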
Grouping
Classifying elements into groups based on some criteria is a very common opera-
tion. An example is classifying CDs into groups according to the number of tracks
on them (this sounds esoteric, but it will illustrate the point). Such an operation can
be accomplished by the collector returned by the groupingBy() method. The method
is passed a classifier function that is used to classify the elements into different
groups. The result of the operation is a classification map whose entries are the dif-
ferent groups into which the elements have been classified. The key in a map entry
is the result of applying the classifier function on the element. The key is extracted
from the element based on some property of the element—for example, the num-
ber of tracks on the CD. The value associated with a key in a map entry comprises
those elements that belong to the same group. The operation is analogous to the
group-by operation in databases.
There are three versions of the groupingBy() method that provide increasingly more
control over the grouping operation.
static <T,K> Collector<T,?,Map<K,List<T>>> groupingBy(
Function<? super T,? extends K> classifier)
[Figure: grouping. The collect() operation classifies the Stream<CD> with the contents of CD.cdList into a Map<Integer,List<CD>> that maps the number of tracks to a list of CDs, with entries such as <6, [cd1]> and <10, [cd2, cd4]>.]
The three stream pipelines below result in a classification map that is equivalent to
the one in Figure 16.16. The call to the groupingBy() method at (2) specifies the
downstream collector explicitly, and is equivalent to the call in Figure 16.16.
Map<Integer, List<CD>> map22 = CD.cdList.stream()
.collect(Collectors.groupingBy(CD::noOfTracks, Collectors.toList())); // (2)
The call to the groupingBy() method at (3) specifies the supplier TreeMap::new so that
a TreeMap<Integer, List<CD>> is used as the classification map.
Map<Integer, List<CD>> map33 = CD.cdList.stream()
.collect(Collectors.groupingBy(CD::noOfTracks, // (3)
TreeMap::new,
Collectors.toList()));
The call to the groupingBy() method at (4) specifies the downstream collector
Collectors.toSet() that uses a set to accumulate the CDs for a group.
Map<Integer, Set<CD>> map44 = CD.cdList.stream()
.collect(Collectors.groupingBy(CD::noOfTracks, Collectors.toSet())); // (4)
The classification maps created by the pipelines above will contain the three entries
shown below, but only the groupingBy() method call at (3) can guarantee that the
entries will be sorted in a TreeMap<Integer, List<CD>> according to the natural order
for the Integer keys.
{
6=[<Jaav, "Java Jam", 6, 2017, JAZZ>],
8=[<Jaav, "Java Jive", 8, 2017, POP>,
<Genericos, "Keep on Erasing", 8, 2018, JAZZ>],
10=[<Funkies, "Lambda Dancing", 10, 2018, POP>,
<Genericos, "Hot Generics", 10, 2018, JAZZ>]
}
Multilevel Grouping
The downstream collector in a groupingBy() operation can be created by another
groupingBy() operation, resulting in a multilevel grouping operation—also known as
a multilevel classification or cascaded grouping operation. We can extend the multi-
level groupingBy() operation to any number of levels by making the downstream
collector be a groupingBy() operation.
The stream pipeline below creates a classification map in which the CDs are first
grouped by the number of tracks in a CD at (1), and then grouped by the musical
genre of a CD at (2).
Map<Integer, Map<Genre, List<CD>>> twoLevelGrp = CD.cdList.stream()
.collect(Collectors.groupingBy(CD::noOfTracks, // (1)
Collectors.groupingBy(CD::genre))); // (2)
Printing the contents of the resulting classification map would show the following
three entries, not necessarily in this order:
{
6={JAZZ=[<Jaav, "Java Jam", 6, 2017, JAZZ>]},
8={JAZZ=[<Genericos, "Keep on Erasing", 8, 2018, JAZZ>],
POP=[<Jaav, "Java Jive", 8, 2017, POP>]},
10={JAZZ=[<Genericos, "Hot Generics", 10, 2018, JAZZ>],
POP=[<Funkies, "Lambda Dancing", 10, 2018, POP>]}
}
The entries of the resulting classification map can also be illustrated as a two-
dimensional matrix, as shown in Figure 16.17, where the CDs are first grouped into
rows by the number of tracks, and then grouped into columns by the musical
genre. The value of an element in the matrix is a list of CDs which have the same
number of tracks (row) and the same musical genre (column).
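[Figure: the classification map as a two-dimensional matrix, with one row for each number of tracks and one column for each genre (JAZZ, POP); each element of the matrix is a list of CDs.]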
The number of groups in the classification map returned by the above pipeline is
equal to the number of distinct values for the number of tracks, as in the single-
level groupingBy() operation. However, each value associated with a key in the outer
classification map is now an inner classification map that is managed by the second-
level groupingBy() operation. The inner classification map has the type Map<Genre,
List<CD>>; in other words, the key in the inner classification map is the musical
genre of the CD and the value associated with this key is a List of CDs with this
musical genre. It is the second-level groupingBy() operation that is responsible for
grouping each CD in the inner classification map. Since no explicit downstream
collector is specified for the second-level groupingBy() operation, it uses the default
downstream collector Collectors.toList().
We can modify the multilevel groupingBy() operation to count the CDs that have
the same musical genre and the same number of tracks by specifying an explicit
downstream collector for the second-level groupingBy() operation, as shown at (3).
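A multilevel grouping of this kind, with a counting collector at the second level, can be sketched as follows. The record CD and the enum Genre are pared-down, hypothetical stand-ins for the chapter's types, and the names MultilevelCountSketch, CDS, and countByTracksAndGenre() are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MultilevelCountSketch {
  enum Genre { POP, JAZZ }

  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title, int noOfTracks, Genre genre) {}

  static final List<CD> CDS = List.of(
      new CD("Java Jive", 8, Genre.POP), new CD("Java Jam", 6, Genre.JAZZ),
      new CD("Lambda Dancing", 10, Genre.POP), new CD("Keep on Erasing", 8, Genre.JAZZ),
      new CD("Hot Generics", 10, Genre.JAZZ));

  // Groups by number of tracks, then by genre, and counts the CDs in each cell.
  static Map<Integer, Map<Genre, Long>> countByTracksAndGenre(List<CD> cds) {
    return cds.stream()
        .collect(Collectors.groupingBy(CD::noOfTracks,           // (1)
                 Collectors.groupingBy(CD::genre,                 // (2)
                                       Collectors.counting()))); // (3)
  }

  public static void main(String[] args) {
    Map<Integer, Map<Genre, Long>> map = countByTracksAndGenre(CDS);
    System.out.println(map.get(8).get(Genre.POP));   // 1
    System.out.println(map.get(10).get(Genre.JAZZ)); // 1
    System.out.println(map.get(6).containsKey(Genre.POP)); // false
  }
}
```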
Printing the contents of the resulting classification map produced by this multi-
level groupingBy() operation would show the following three entries, again not nec-
essarily in this order:
{6={JAZZ=1}, 8={JAZZ=1, POP=1}, 10={JAZZ=1, POP=1}}
Grouping to a ConcurrentMap
If the collector returned by the Collectors.groupingBy() method is used in a parallel
stream, the partial maps created during execution are merged to create the final
map—as in the case of the Collectors.toMap() method (p. 983). Merging maps can
carry a performance penalty. The Collectors class provides the three groupingBy-
Concurrent() overloaded methods, analogous to the three groupingBy() methods,
that return a concurrent collector—that is, a collector that uses a single concurrent
map to perform the reduction. The entries in such a map are unordered. A concur-
rent map implements the java.util.concurrent.ConcurrentMap interface (§23.7,
p. 1482).
Usage of the groupingByConcurrent() method is illustrated by the following example
of a parallel stream to create a concurrent map of the number of CDs that have the
same number of tracks.
ConcurrentMap<Integer, Long> map66 = CD.cdList
.parallelStream()
.collect(Collectors.groupingByConcurrent(CD::noOfTracks,
Collectors.counting()));
//{6=1, 8=2, 10=2}
Partitioning
Partitioning is a special case of grouping. The classifier function that was used for
grouping is now a partitioning predicate in the partitioningBy() method. The predicate
function returns the boolean value true or false. Just as the keys of the resulting
map are determined by the classifier function in grouping, they are determined by the
partitioning predicate in partitioning. Thus the keys are always of type
Boolean, implying that the classification map can have, at most, two map entries. In
other words, the partitioningBy() method can only create, at most, two partitions
from the input elements. The map value associated with a key in the resulting map
is managed by a downstream collector, as in the case of the groupingBy() method.
There are two versions of the partitioningBy() method:
static <T> Collector<T,?,Map<Boolean,List<T>>> partitioningBy(
Predicate<? super T> predicate)
The stream pipeline at (2) is equivalent to the one in Figure 16.18, where the down-
stream collector is specified explicitly.
Map<Boolean, List<CD>> map2 = CD.cdList.stream()
.collect(Collectors.partitioningBy(CD::isPop, Collectors.toList())); // (2)
We could have composed a stream pipeline to filter the CDs that are pop music
CDs and collected them into a list. We would have to compose a second pipeline
to find the CDs that are not pop music CDs. However, the partitioningBy() method
does both in a single operation.
[Figure: partitioning. The collect() operation partitions the Stream<CD> with the contents of CD.cdList into a Map<Boolean,List<CD>>.]
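Since the map values are managed by a downstream collector, a partition need not be a list. The sketch below counts the CDs in each partition instead of accumulating them. The record CD is a pared-down, hypothetical stand-in for the chapter's CD class (only the accessor name isPop() is taken from the text), and the names PartitioningSketch, CDS, and countPopAndNonPop() are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitioningSketch {
  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title, boolean isPop) {}

  static final List<CD> CDS = List.of(
      new CD("Java Jive", true), new CD("Java Jam", false),
      new CD("Lambda Dancing", true), new CD("Keep on Erasing", false),
      new CD("Hot Generics", false));

  // Partitions the CDs by the predicate and counts the CDs in each partition.
  static Map<Boolean, Long> countPopAndNonPop(List<CD> cds) {
    return cds.stream()
        .collect(Collectors.partitioningBy(CD::isPop, Collectors.counting()));
  }

  public static void main(String[] args) {
    Map<Boolean, Long> counts = countPopAndNonPop(CDS);
    System.out.println(counts.get(true));  // 2
    System.out.println(counts.get(false)); // 3
  }
}
```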
Multilevel Partitioning
Like the groupingBy() method, the partitioningBy() operation can be used in mul-
tilevel classification. The downstream collector in a partitioningBy() operation can
be created by another partitioningBy() operation, resulting in a multilevel partition-
ing operation—also known as a cascaded partitioning operation. The downstream
collector can also be a groupingBy() operation.
In the stream pipeline below, the CDs are partitioned at (1): one partition for CDs
that are pop music CDs, and one for those that are not. The CDs that are associated
with a key are grouped by the year in which they were released. Note that the CDs
that were released in a year are accumulated into a List by the default downstream
collector Collectors.toList() that is employed by the groupingBy() operation at (2).
Map<Boolean, Map<Year, List<CD>>> map1 = CD.cdList.stream()
.collect(Collectors.partitioningBy(CD::isPop, // (1)
Collectors.groupingBy(CD::year))); // (2)
Printing the contents of the resulting map would show the following two entries,
not necessarily in this order.
{false={2017=[<Jaav, "Java Jam", 6, 2017, JAZZ>],
2018=[<Genericos, "Keep on Erasing", 8, 2018, JAZZ>,
<Genericos, "Hot Generics", 10, 2018, JAZZ>]},
true={2017=[<Jaav, "Java Jive", 8, 2017, POP>],
2018=[<Funkies, "Lambda Dancing", 10, 2018, POP>]}}
The following code uses the filtering() operation at (2) to group pop music CDs
according to the number of tracks on them. The groupingBy() operation at (1) cre-
ates the groups based on the number of tracks on the CDs, but the filtering() oper-
ation only allows pop music CDs to pass downstream to be accumulated.
// Filtering downstream from grouping.
Map<Integer, List<CD>> grpByTracksFilterByPopCD = CD.cdList.stream()
.collect(Collectors.groupingBy(CD::noOfTracks, // (1)
Collectors.filtering(CD::isPop, Collectors.toList()))); // (2)
Printing the contents of the resulting map would show the entries below, not
necessarily in this order. Note that the output shows that there were one or more CDs
with six tracks, but none of them were pop music CDs. Hence the list of CDs associated
with key 6 is empty.
{6=[],
8=[<Jaav, "Java Jive", 8, 2017, POP>],
10=[<Funkies, "Lambda Dancing", 10, 2018, POP>]}
However, if we run the same query using the filter() intermediate stream opera-
tion at (1) prior to grouping, the contents of the result map are different, as shown
below.
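Filtering before grouping can be sketched as follows. The record CD is a pared-down, hypothetical stand-in for the chapter's CD class, and the names FilterBeforeGroupingSketch, CDS, and popByTracks() are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class FilterBeforeGroupingSketch {
  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title, int noOfTracks, boolean isPop) {}

  static final List<CD> CDS = List.of(
      new CD("Java Jive", 8, true), new CD("Java Jam", 6, false),
      new CD("Lambda Dancing", 10, true), new CD("Keep on Erasing", 8, false),
      new CD("Hot Generics", 10, false));

  // Non-pop CDs are discarded before the grouping operation sees them,
  // so no group is created for a key that has only non-pop CDs.
  static Map<Integer, List<CD>> popByTracks(List<CD> cds) {
    return cds.stream()
        .filter(CD::isPop)                               // (1)
        .collect(Collectors.groupingBy(CD::noOfTracks));
  }

  public static void main(String[] args) {
    Map<Integer, List<CD>> map = popByTracks(CDS);
    System.out.println(map.containsKey(6)); // false
    System.out.println(map.get(8).size());  // 1
    System.out.println(map.get(10).size()); // 1
  }
}
```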
Contents of the result map show that only entries that have a non-empty list as a
value are contained in the map. This is not surprising, as any non-pop music CD is
discarded before grouping, so only pop music CDs are grouped.
{8=[<Jaav, "Java Jive", 8, 2017, POP>],
10=[<Funkies, "Lambda Dancing", 10, 2018, POP>]}
Both queries at (1) and (2) above will result in the same entries in the result map:
{false=[<Genericos, "Keep on Erasing", 8, 2018, JAZZ>,
<Genericos, "Hot Generics", 10, 2018, JAZZ>],
true=[<Funkies, "Lambda Dancing", 10, 2018, POP>]}
The mapping() method at (1) creates an adapter that accumulates a set of CD titles
in each year for a stream of CDs. The mapper function maps a CD to its title so that
the downstream collector can accumulate the titles in a set.
Map<Year, Set<String>> titlesByYearInSet = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::year,
Collectors.mapping( // (1)
CD::title, // Mapper
Collectors.toSet()))); // Downstream collector
System.out.println(titlesByYearInSet);
// {2017=[Java Jive, Java Jam],
// 2018=[Hot Generics, Lambda Dancing, Keep on Erasing]}
The mapping() method at (2) creates an adapter that joins CD titles in each year for
a stream of CDs. The mapper function maps a CD to its title so that the down-
stream collector can join the titles.
Map<Year, String> joinTitlesByYear = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::year,
Collectors.mapping( // (2)
CD::title,
Collectors.joining(":"))));
System.out.println(joinTitlesByYear);
// {2017=Java Jive:Java Jam,
// 2018=Lambda Dancing:Keep on Erasing:Hot Generics}
The mapping() method at (3) creates an adapter that sums the number of CD tracks
for each year for a stream of CDs. The mapper function maps a CD to its number
of tracks so that the downstream collector can compute the total number of tracks.
Map<Year, Integer> totalNumOfTracksByYear = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::year,
Collectors.mapping( // (3)
CD::noOfTracks,
Collectors.summingInt(Integer::intValue))));
System.out.println(totalNumOfTracksByYear); // {2017=14, 2018=28}
That is, the method adapts a downstream collector accepting elements of type U
to one accepting elements of type T by applying a flat mapping function to each
input element before accumulation, where type parameter A is the intermedi-
ate accumulation type of the downstream collector.
The flat mapping function maps an input element to a mapped stream whose
elements are flattened (p. 924) and passed downstream. Each mapped stream is
closed after its elements have been flattened. An empty stream is substituted
if the mapped stream is null.
Given the lists of CDs below, we wish to find all unique CD titles in the lists. A
stream of CD lists is created at (1). Each CD list is used to create a stream of CDs
whose elements are flattened into the output stream of CDs at (2). Each CD is then
mapped to its title at (3), and unique CD titles are accumulated into a set at (4).
(Compare this example with the one in Figure 16.9, p. 925, using the flatMap()
stream operation.)
// Given lists of CDs:
List<CD> cdListA = List.of(CD.cd0, CD.cd1);
List<CD> cdListB = List.of(CD.cd0, CD.cd1, CD.cd1);
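The steps (1) through (4) described above can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's own listing: the record CD is a pared-down, hypothetical stand-in for the chapter's CD class, and the names FlatMappingSketch, LIST_A, LIST_B, and uniqueTitles() are made up for the sketch.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FlatMappingSketch {
  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title) {}

  static final CD CD0 = new CD("Java Jive");
  static final CD CD1 = new CD("Java Jam");
  static final List<CD> LIST_A = List.of(CD0, CD1);
  static final List<CD> LIST_B = List.of(CD0, CD1, CD1);

  static Set<String> uniqueTitles(List<CD> listA, List<CD> listB) {
    return Stream.of(listA, listB)                          // (1) Stream<List<CD>>
        .collect(Collectors.flatMapping(List::stream,       // (2) Flatten to CDs
                 Collectors.mapping(CD::title,              // (3) CD -> title
                                    Collectors.toSet())));  // (4) Unique titles
  }

  public static void main(String[] args) {
    Set<String> titles = uniqueTitles(LIST_A, LIST_B);
    System.out.println(titles.size());                                 // 2
    System.out.println(titles.equals(Set.of("Java Jive", "Java Jam"))); // true
  }
}
```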
(10) Each unique CD title is accumulated into the result set of each radio station
(Set<String>).
The query at (5) uses four collectors. The result map has the type Map<String,
Set<String>>. The output shows the unique titles of CDs played by each radio
station.
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
// Map of radio station names and set of CD titles they played: (5)
Map<String, Set<String>> map = radioPlaylists.stream() // (6)
.collect(Collectors.groupingBy(RadioPlaylist::getRadioStationName, // (7)
Collectors.flatMapping(rpl -> rpl.getPlaylist().stream(), // (8)
Collectors.mapping(CD::title, // (9)
Collectors.toSet())))); // (10)
System.out.println(map);
}
}
Counting
The collector created by the Collectors.counting() method performs a functional
reduction to count the input elements.
static <T> Collector<T,?,Long> counting()
The collector returned counts the number of input elements of type T. If there
are no elements, the result is Long.valueOf(0L). Note that the result is of type
Long.
The wildcard ? represents any type, and in the method declaration, it is the
type parameter for the mutable type that is accumulated by the reduction
operation.
Finding Min/Max
The collectors created by the Collectors.maxBy() and Collectors.minBy() methods
perform a functional reduction to find the maximum and minimum elements in
the input elements, respectively. As there might not be any input elements, an
Optional<T> is returned as the result.
The natural order comparator for CDs defined at (1) is used in the stream pipelines
below to find the maximum CD. The collector Collectors.maxBy() is used as a
standalone collector at (2), using the natural order comparator to find the maxi-
mum CD. The Optional<CD> result can be queried for the value.
Comparator<CD> natCmp = Comparator.naturalOrder(); // (1)
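The standalone use of Collectors.maxBy() can be sketched as follows. The record CD is a hypothetical stand-in for the chapter's CD class, and its natural order (by artist, then by title) is an assumption made only for this sketch; the names MaxBySketch, CDS, and maxCD() are likewise made up.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxBySketch {
  // Hypothetical stand-in: the natural order is by artist, then by title
  // (an assumption, not necessarily the order used in the chapter).
  record CD(String artist, String title) implements Comparable<CD> {
    private static final Comparator<CD> ORDER =
        Comparator.comparing(CD::artist).thenComparing(CD::title);
    @Override public int compareTo(CD other) { return ORDER.compare(this, other); }
  }

  static final List<CD> CDS = List.of(
      new CD("Jaav", "Java Jive"), new CD("Jaav", "Java Jam"),
      new CD("Funkies", "Lambda Dancing"), new CD("Genericos", "Keep on Erasing"),
      new CD("Genericos", "Hot Generics"));

  static Optional<CD> maxCD(List<CD> cds) {
    Comparator<CD> natCmp = Comparator.naturalOrder();     // (1)
    return cds.stream().collect(Collectors.maxBy(natCmp)); // (2) Standalone collector
  }

  public static void main(String[] args) {
    System.out.println(maxCD(CDS).map(CD::title).orElse("No CD."));       // Java Jive
    System.out.println(maxCD(List.of()).map(CD::title).orElse("No CD.")); // No CD.
  }
}
```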
In the pipeline below, the CDs are grouped by musical genre, and the CDs in
each group are reduced to the maximum CD by the downstream collector
Collectors.maxBy() at (3). Again, the downstream collector uses the natural order
comparator, and the Optional<CD> result in each group can be queried.
// Group CDs by musical genre, and max CD in each group.
Map<Genre, Optional<CD>> grpByGenre = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::genre,
Collectors.maxBy(natCmp))); // (3) Downstream collector
System.out.println(grpByGenre);
//{JAZZ=Optional[<Jaav, "Java Jam", 6, 2017, JAZZ>],
// POP=Optional[<Jaav, "Java Jive", 8, 2017, POP>]}
Summing
The summing collectors perform a functional reduction to produce the sum of the
numeric results from applying a numeric-valued function to the input elements.
static <T> Collector<T,?,NumType> summingNumType(
ToNumTypeFunction<? super T> mapper)
Returns a collector that produces the sum of a numtype-valued function applied
to the input elements. If there are no input elements, the result is zero. The
result is of type NumType.
NumType is Int (but it is Integer when used as a type name), Long, or Double, and
the corresponding numtype is int, long, or double.
In the pipeline below, the CDs are grouped by musical genre, and the number of
tracks on the CDs in each group is summed by the downstream collector returned by
the Collectors.summingInt() method at (2).
Map<Genre, Integer> grpByGenre = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::genre,
Collectors.summingInt(CD::noOfTracks))); // (2) Downstream collector
System.out.println(grpByGenre); // {POP=18, JAZZ=24}
System.out.println(grpByGenre.get(Genre.JAZZ)); // 24
Averaging
The averaging collectors perform a functional reduction to produce the average of
the numeric results from applying a numeric-valued function to the input elements.
static <T> Collector<T,?,Double> averagingNumType(
ToNumTypeFunction<? super T> mapper)
Returns a collector that produces the arithmetic mean of a numtype-valued func-
tion applied to the input elements. If there are no input elements, the result is
zero. The result is of type Double.
NumType is Int, Long, or Double, and the corresponding numtype is int, long, or double.
In the pipeline below, the CDs are grouped by musical genre, and the downstream
collector Collectors.averagingInt() at (2) calculates the average number of tracks
on the CDs in each group.
Map<Genre, Double> grpByGenre = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::genre,
Collectors.averagingInt(CD::noOfTracks) // (2) Downstream collector
));
System.out.println(grpByGenre); // {POP=9.0, JAZZ=8.0}
System.out.println(grpByGenre.get(Genre.JAZZ)); // 8.0
Summarizing
The summarizing collector performs a functional reduction to produce summary
statistics (count, sum, min, max, average) on the numeric results of applying a
numeric-valued function to the input elements.
static <T> Collector<T,?,NumTypeSummaryStatistics> summarizingNumType(
ToNumTypeFunction<? super T> mapper)
Returns a collector that applies a numtype-valued mapper function to the input
elements, and returns the summary statistics for the resulting values.
NumType is Int (but it is Integer when used as a type name), Long, or Double, and
the corresponding numtype is int, long, or double.
System.out.println(stats1);
// IntSummaryStatistics{count=5, sum=42, min=6, average=8.400000, max=10}
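A minimal sketch of how such summary statistics can be computed and queried is given below. The record CD is a pared-down, hypothetical stand-in for the chapter's CD class, and the names SummarizingSketch, CDS, and trackStats() are assumptions made for illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SummarizingSketch {
  // Hypothetical, pared-down stand-in for the chapter's CD class.
  record CD(String title, int noOfTracks) {}

  static final List<CD> CDS = List.of(
      new CD("Java Jive", 8), new CD("Java Jam", 6), new CD("Lambda Dancing", 10),
      new CD("Keep on Erasing", 8), new CD("Hot Generics", 10));

  // Summary statistics on the number of tracks of the CDs.
  static IntSummaryStatistics trackStats(List<CD> cds) {
    return cds.stream().collect(Collectors.summarizingInt(CD::noOfTracks));
  }

  public static void main(String[] args) {
    IntSummaryStatistics stats = trackStats(CDS);
    System.out.println(stats.getCount());   // 5
    System.out.println(stats.getSum());     // 42
    System.out.println(stats.getMin());     // 6
    System.out.println(stats.getMax());     // 10
    System.out.println(stats.getAverage()); // 8.4
  }
}
```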
Reducing
Collectors that perform common statistical operations, such as counting, averag-
ing, and so on, are special cases of functional reduction that can be implemented
using the Collectors.reducing() method.
static <T> Collector<T,?,Optional<T>> reducing(BinaryOperator<T> bop)
Returns a collector that performs functional reduction, producing an Optional
with the cumulative result of applying the binary operator bop on the input ele-
ments: e1 bop e2 bop e3 ..., where each ei is an input element. If there are no
input elements, an empty Optional<T> is returned.
Note that the collector reduces input elements of type T to a result that is an
Optional of type T.
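The single-argument reducing() method can be sketched as follows. To keep the sketch self-contained, it reduces a list of title strings to the longest title; the class name ReducingSketch and the method longestTitle() are hypothetical. The binary operator keeps the first of two titles when they are equally long.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class ReducingSketch {
  // Reduces the titles to the longest one; the Optional is empty if there are no titles.
  static Optional<String> longestTitle(List<String> titles) {
    return titles.stream()
        .collect(Collectors.reducing(
            (t1, t2) -> t1.length() >= t2.length() ? t1 : t2));
  }

  public static void main(String[] args) {
    List<String> titles = List.of("Java Jive", "Java Jam", "Lambda Dancing",
                                  "Keep on Erasing", "Hot Generics");
    System.out.println(longestTitle(titles).orElse("No titles."));    // Keep on Erasing
    System.out.println(longestTitle(List.of()).orElse("No titles.")); // No titles.
  }
}
```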
The pipeline below groups CDs according to the year they were released. For each
group, the collector returned by the three-argument Collectors.reducing() method
performs a map-reduce operation at (8) to map each CD to its number of tracks and
accumulate the tracks in each group. This map-reduce operation is equivalent to
the collector returned by the Collectors.summingInt() method at (9).
Map<Year, Integer> noOfTracksByYear = CD.cdList.stream()
.collect(Collectors.groupingBy(
CD::year,
Collectors.reducing( // (8) Downstream collector
0, CD::noOfTracks, Integer::sum)));
System.out.println(noOfTracksByYear); // {2017=14, 2018=28}
System.out.println(noOfTracksByYear.get(Year.of(2018)));// 28
Method name (ref.)    Return type                        Functional interface parameters     Function type of parameters
averagingDouble       Collector<T,?,Double>              (ToDoubleFunction<T> mapper)        T -> double
(p. 1000)
counting              Collector<T,?,Long>                ()
(p. 998)
flatMapping           Collector<T,?,R>                   (Function<T,Stream<U>> mapper,      T -> Stream<U>,
(p. 994)                                                  Collector<U,A,R> downstream)       (U,A) -> R
joining               Collector<CharSequence,?,String>   ()
(p. 984)
reducing              Collector<T,?,Optional<T>>         (BinaryOperator<T> op)              (T,T) -> T
(p. 1002)
toList                Collector<T,?,List<T>>             ()
toUnmodifiableList
(p. 980)
toMap                 Collector<T,?,Map<K,U>>            (Function<T,K> keyMapper,           T -> K,
(p. 981)                                                  Function<T,U> valueMapper,         T -> U,
                                                          BinaryOperator<U> mergeFunction,   (U,U) -> U,
                                                          Supplier<Map<K,U>> mapSupplier)    () -> Map<K,U>
toSet                 Collector<T,?,Set<T>>              ()
toUnmodifiableSet
(p. 980)
Table 16.8 shows a comparison of methods in the stream interfaces that perform
reduction operations and static factory methods in the Collectors class that imple-
ment collectors with equivalent functionality.
Table 16.8 Method Comparison: The Stream Interfaces and the Collectors Class
Figure 16.14, p. 967, illustrates parallel mutable reduction using the three-argument
collect(supplier, accumulator, combiner) terminal operation (p. 966).
Benchmarking
In general, increasing the number of CPU cores and thereby the number of threads
that can execute in parallel only scales performance up to a threshold for a given
size of data, as some threads might become idle if there is no data left for them to
process. The number of CPU cores boosts performance to a certain extent, but it is
not the only factor that should be considered when deciding to execute a stream in
parallel.
Inherent in the total cost of parallel processing is the start-up cost of setting up the
parallel execution. At the onset, if this cost is already comparable to the cost of
sequential execution, not much can be gained by resorting to parallel execution.
A combination of the following three factors can be crucial in deciding whether a
stream should be executed in parallel:
• Sufficiently large data size
The size of the stream must be large enough to warrant parallel processing;
otherwise, sequential processing is preferable. The start-up cost can be
prohibitive for parallel execution if the stream size is too small.
• Computation-intensive stream operations
If the stream operations are small computations, then the stream size should be
proportionately large to warrant parallel execution. If the stream operations
are computation-intensive, the stream size is less significant, and parallel
execution can boost performance.
• Easily splittable stream
If the cost of splitting the stream into substreams is higher than the cost of
processing the substreams, employing parallel execution can be futile. Collections
like ArrayLists, HashMaps, and simple arrays are efficiently splittable, whereas
LinkedLists and IO-based data sources are less efficient in this regard.
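The sketch below contrasts a sequential and a parallel pipeline over the same source. It is an illustration only, not the chapter's benchmark: the class name ParallelSumSketch and the methods seqSum() and parSum() are hypothetical. The source LongStream.rangeClosed() is a range of known size and should therefore be easy to split; which of the two pipelines is faster depends on the factors above and would have to be measured.

```java
import java.util.stream.LongStream;

class ParallelSumSketch {
  // Sums the numbers from 1 to n sequentially.
  static long seqSum(long n) {
    return LongStream.rangeClosed(1L, n).sum();
  }

  // Sums the numbers from 1 to n in parallel. Summation is associative,
  // so the result is the same as for the sequential stream.
  static long parSum(long n) {
    return LongStream.rangeClosed(1L, n).parallel().sum();
  }

  public static void main(String[] args) {
    long n = 1_000_000L;
    System.out.println(seqSum(n)); // 500000500000
    System.out.println(parSum(n)); // 500000500000
  }
}
```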
In Example 16.14, the methods measurePerf() at (6) and xqtFunctions() at (13) create
the benchmarks for functions passed as parameters. In the measurePerf() method,
the system clock is read at (8) and the function parameter func is applied at (9). The
system clock is read again at (10) after the function application at (9) has com-
pleted. The execution time calculated at (10) reflects the time for executing the
function. Applying the function func evaluates the lambda expression or the method
reference implementing the LongFunction interface. In Example 16.14, the function
parameter func is implemented by method references that call methods, at (1) through
(5), in the StreamBenchmarks class whose execution time we want to measure.
public static <R> double measurePerf(LongFunction<R> func, long n) { // (6)
  // ...
  double start = System.nanoTime();                           // (8)
  result = func.apply(n);                                     // (9)
  double duration = (System.nanoTime() - start)/1_000_000;    // (10) ms.
  // ...
}
import java.util.function.LongFunction;
import java.util.stream.LongStream;

/*
 * Benchmark the execution time to sum numbers from 1 to n values
 * using streams.
 */
public final class StreamBenchmarks {

  /*
   * Applies the function parameter func, passing n as parameter.
   * Returns the average time (ms.) to execute the function 100 times.
   */
  public static <R> double measurePerf(LongFunction<R> func, long n) { // (6)
    int numOfExecutions = 100;
    double totTime = 0.0;
    R result = null;
    for (int i = 0; i < numOfExecutions; i++) {                        // (7)
      double start = System.nanoTime();                                // (8)
      result = func.apply(n);                                          // (9)
      double duration = (System.nanoTime() - start)/1_000_000;         // (10)
      totTime += duration;                                             // (11)
    }
    double avgTime = totTime/numOfExecutions;                          // (12)
    return avgTime;
  }

  /*
   * Executes the functions in the varargs parameter funcs
   * for different stream sizes.
   */
  public static <R> void xqtFunctions(LongFunction<R>... funcs) {      // (13)
    long[] sizes = {1_000L, 10_000L, 100_000L, 1_000_000L};            // (14)
Side Effects
Efficient execution of parallel streams that produces the desired results requires the
stream operations (and their behavioral parameters) to avoid certain side effects.
• Non-interfering behaviors
The behavioral parameters of stream operations should be non-interfering
(p. 909)—both for sequential and parallel streams. Unless the stream data
source is concurrent, the stream operations should not modify it during the
execution of the stream. See building streams from collections (p. 897).
• Stateless behaviors
The behavioral parameters of stream operations should be stateless (p. 909)—
both for sequential and parallel streams. A behavioral parameter implemented
as a lambda expression should not depend on any state that might change dur-
ing the execution of the stream pipeline. The results from a stateful behavioral
parameter can be nondeterministic or even incorrect. For a stateless behavioral
parameter, the results are always the same.
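The non-interference requirement can be illustrated with a minimal sketch. The class NonInterferenceDemo and its method interferes() are hypothetical names used only for this illustration. The lambda expression adds an element to the data source while the pipeline is executing. For an ArrayList, OpenJDK implementations detect this on a best-effort basis and throw a ConcurrentModificationException, whereas a concurrent source such as a CopyOnWriteArrayList tolerates the modification, as the stream traverses a snapshot of its elements.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class for illustration only.
final class NonInterferenceDemo {

    // Returns true if modifying the source during stream execution
    // was detected and reported as a ConcurrentModificationException.
    static boolean interferes(List<String> source) {
        try {
            source.stream().forEach(s -> {
                if (s.equals("a")) {
                    source.add("z");        // Interferes with the data source.
                }
            });
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Non-concurrent source: the interference is detected (best effort).
        System.out.println(interferes(new ArrayList<>(List.of("a", "b", "c"))));
        // Concurrent source: the modification is tolerated.
        System.out.println(interferes(new CopyOnWriteArrayList<>(List.of("a", "b", "c"))));
    }
}
```

The detection for non-concurrent collections is not guaranteed by the API, so code should not rely on the exception being thrown.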
The behavioral parameters of stream operations in a pipeline should avoid
accessing shared state. Executing the pipeline in parallel can lead to race
conditions in accessing the shared state, and using synchronization code to
provide thread-safety may defeat the purpose of parallelization. Using the
three-argument reduce() or collect() method can be a better solution to encapsulate
shared state.
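A minimal sketch of this advice follows, using the hypothetical class SharedStateDemo. The method unsafeEvens() shows the pattern to avoid: a lambda expression that mutates a shared, non-thread-safe list from a parallel stream. It is included only for contrast and is deliberately never called. The other two methods use the three-argument collect() and reduce() methods, where each thread works on its own partial result and the partial results are then combined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical demo class for illustration only.
final class SharedStateDemo {

    // AVOID: the lambda mutates shared state from multiple threads.
    // The result can be incomplete, or an exception can be thrown.
    static List<Integer> unsafeEvens(int n) {
        List<Integer> shared = new ArrayList<>();
        IntStream.rangeClosed(1, n).parallel()
                 .filter(i -> i % 2 == 0)
                 .forEach(shared::add);                  // Race condition.
        return shared;
    }

    // Three-argument collect(): supplier, accumulator, combiner.
    static ArrayList<Integer> evens(int n) {
        ArrayList<Integer> result = IntStream.rangeClosed(1, n).parallel()
                 .filter(i -> i % 2 == 0)
                 .boxed()
                 .collect(ArrayList::new, ArrayList::add, ArrayList::addAll);
        return result;
    }

    // Three-argument reduce(): identity, accumulator, combiner.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                    .reduce(0, (sum, w) -> sum + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(evens(10));
        System.out.println(totalLength(List.of("ab", "cde", "f")));
    }
}
```

Since the stream in evens() is ordered, the collected elements appear in encounter order even though the stream is executed in parallel.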
The intermediate operations distinct(), skip(), limit(), and sorted() are state-
ful (p. 915, p. 915, p. 917, p. 929). See also Table 16.3, p. 938. They can carry extra
Ordering
An ordered stream (p. 891) processed by operations that preserve the encounter
order will produce the same results, regardless of whether it is executed sequen-
tially or in parallel. However, repeated execution of an unordered stream—
sequential or parallel—can produce different results.
Preserving the encounter order of elements in an ordered parallel stream can incur
a performance penalty. The performance of an ordered parallel stream can be
improved if the ordering constraint is removed by calling the unordered() interme-
diate operation on the stream (p. 932).
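The following sketch, using the hypothetical class OrderingDemo, illustrates both points under the assumption that the data source is a List, which has an encounter order. The map() and collect() operations preserve the encounter order, so the sequential and the parallel execution produce the same list. Once unordered() is called, only the contents of the result can be relied upon, which is why the second method collects into a set.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class for illustration only.
final class OrderingDemo {

    // Ordered source and order-preserving operations:
    // same result whether executed sequentially or in parallel.
    static List<Integer> squares(List<Integer> source, boolean parallel) {
        Stream<Integer> stream = parallel ? source.parallelStream() : source.stream();
        return stream.map(i -> i * i).collect(Collectors.toList());
    }

    // Ordering constraint removed: only the contents are guaranteed.
    static Set<Integer> squaresUnordered(List<Integer> source) {
        return source.parallelStream()
                     .unordered()
                     .map(i -> i * i)
                     .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(3, 1, 2);
        System.out.println(squares(values, false));
        System.out.println(squares(values, true));
        System.out.println(squaresUnordered(values));
    }
}
```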
The three stateful intermediate operations distinct(), skip(), and limit() can
improve performance in a parallel stream that is unordered, as compared to one
that is ordered (p. 915, p. 915, p. 917). The distinct() operation need only buffer
any occurrence of a duplicate value in the case of an unordered parallel stream,
rather than the first occurrence. The skip() operation can skip any n elements in the
case of an unordered parallel stream, not necessarily the first n elements. The
limit() operation can truncate the stream after any n elements in the case of an
unordered parallel stream, and not necessarily after the first n elements.
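The difference can be sketched as follows with the hypothetical class UnorderedSlices. In the ordered case, limit() and skip() must respect the encounter order, so the result is fully determined. In the unordered case, only the number of elements is determined; which elements are kept or skipped can vary between executions.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical demo class for illustration only.
final class UnorderedSlices {

    // Ordered parallel stream: always the first n elements 0..n-1.
    static List<Integer> firstN(int n) {
        return IntStream.range(0, 1000).parallel()
                        .limit(n)
                        .boxed().collect(Collectors.toList());
    }

    // Unordered parallel stream: any n elements from the stream.
    static List<Integer> anyN(int n) {
        return IntStream.range(0, 1000).parallel()
                        .unordered()
                        .limit(n)
                        .boxed().collect(Collectors.toList());
    }

    // Unordered parallel stream: any n elements are skipped,
    // so only the number of remaining elements is determined.
    static long remainingAfterSkip(int n) {
        return IntStream.range(0, 1000).parallel()
                        .unordered()
                        .skip(n)
                        .count();
    }

    public static void main(String[] args) {
        System.out.println(firstN(5));              // The first five elements.
        System.out.println(anyN(5));                // Some five elements.
        System.out.println(remainingAfterSkip(995));
    }
}
```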
The terminal operation findAny() is intentionally nondeterministic, and can return
any element in the stream (p. 952). It is especially well suited for parallel streams.
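As a minimal sketch, the hypothetical class FindDemo below contrasts findFirst() and findAny() on the same ordered parallel stream. The findFirst() operation must return the first matching element in encounter order, whereas findAny() is free to return whichever matching element a thread finds first.

```java
import java.util.stream.IntStream;

// Hypothetical demo class for illustration only.
final class FindDemo {

    // Always the first multiple of 7 in encounter order.
    static int firstMultipleOf7() {
        return IntStream.rangeClosed(1, 100).parallel()
                        .filter(i -> i % 7 == 0)
                        .findFirst()
                        .orElseThrow();
    }

    // Some multiple of 7; which one can vary between executions.
    static int anyMultipleOf7() {
        return IntStream.rangeClosed(1, 100).parallel()
                        .filter(i -> i % 7 == 0)
                        .findAny()
                        .orElseThrow();
    }

    public static void main(String[] args) {
        System.out.println(firstMultipleOf7());
        System.out.println(anyMultipleOf7());
    }
}
```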
The forEach() terminal operation ignores the encounter order, but the forEachOrdered()
terminal operation preserves the order (p. 948). The sorted() stateful intermediate
operation, on the other hand, enforces a specific encounter order, regardless of
whether it is executed in a parallel pipeline (p. 929).
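These three operations can be compared in a short sketch, using the hypothetical class ForEachOrderDemo. A synchronized list records the order in which the action is performed, since forEach() can invoke the action from several threads in a parallel stream. Recording elements this way is done here only to make the processing order visible; it is not a recommended way to build a result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class for illustration only.
final class ForEachOrderDemo {

    // forEach(): same elements, but the processing order is arbitrary.
    static List<String> viaForEach(List<String> source) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEach(seen::add);
        return seen;
    }

    // forEachOrdered(): elements are processed in encounter order.
    static List<String> viaForEachOrdered(List<String> source) {
        List<String> seen = Collections.synchronizedList(new ArrayList<>());
        source.parallelStream().forEachOrdered(seen::add);
        return seen;
    }

    // sorted(): imposes the natural order as the new encounter order.
    static List<String> viaSorted(List<String> source) {
        return source.parallelStream().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> letters = List.of("c", "a", "b");
        System.out.println(viaForEach(letters));         // Order can vary.
        System.out.println(viaForEachOrdered(letters));
        System.out.println(viaSorted(letters));
    }
}
```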
Review Questions
Which of the following statements when inserted independently at (1) will result
in a compile-time error?
Select the two correct answers.
(a) int sum = values.reduce(0, (x, y) -> x + y);
(b) int sum = values.parallel().reduce(0, (x, y) -> x + y);
Which of the following statements, when inserted independently at (1), will result
in the value 4 being printed?
Select the two correct answers.
(a) int value = values.reduce(0, (x, y) -> x + 1);
(b) int value = values.reduce((x, y) -> x + 1).orElse(0);
(c) int value = values.reduce(0, (x, y) -> y + 1);
(d) int value = values.reduce(0, (x, y) -> y);
(e) int value = values.reduce(1, (x, y) -> y + 1);
(f) long value = values.count();
Which statement when inserted independently at (1) will result in the output
1 [C]?
Select the one correct answer.
(a) map = values.stream()
.collect(Collectors.groupingBy(s -> s.length(),
Collectors.filtering(s -> !s.contains("C"),
Collectors.toList())));
(b) map = values.stream()
.collect(Collectors.groupingBy(s -> s.length(),
Collectors.filtering(s -> s.contains("C"),
Collectors.toList())));
Which of the following statements will produce the same result as the program?
Select the two correct answers.
(a) IntStream.rangeClosed(0, 5)
.filter(i -> i % 2 != 0)
.forEach(i -> System.out.println(i));
(b) IntStream.range(0, 10)
.takeWhile(i -> i < 5)
.filter(i -> i % 2 != 0)
.forEach(i -> System.out.println(i));
(c) IntStream.range(0, 10)
.limit(5)
.filter(i -> i % 2 != 0)
.forEach(i -> System.out.println(i));
(d) IntStream.generate(() -> {int x = 0; return x++;})
.takeWhile(i -> i < 4)
.filter(i -> i % 2 != 0)
.forEach(i -> System.out.println(i));
(e) var x = 0;
IntStream.generate(() -> return x++)
.limit(5)
.filter(i -> i % 2 != 0)
.forEach(i -> System.out.println(i));
(c) ZYXCBA
(d) CBAZYX
16.10 Which statement produces a different result from the other statements?
Select the one correct answer.
(a) Stream.of("A", "B", "C", "D", "E")
.filter(s -> s.compareTo("B") < 0)
.collect(Collectors.groupingBy(s -> "AEIOU".contains(s)))
.forEach((x, y) -> System.out.println(x + " " + y));
(b) Stream.of("A", "B", "C", "D", "E")
.filter(s -> s.compareTo("B") < 0)
.collect(Collectors.partitioningBy(s -> "AEIOU".contains(s)))
.forEach((x, y) -> System.out.println(x + " " + y));
(c) Stream.of("A", "B", "C", "D", "E")
.collect(Collectors.groupingBy(s -> "AEIOU".contains(s),
Collectors.filtering(s -> s.compareTo("B") < 0,
Collectors.toList())))
.forEach((x, y) -> System.out.println(x + " " + y));
(d) Stream.of("A", "B", "C", "D", "E")
.collect(Collectors.partitioningBy(s -> "AEIOU".contains(s),
Collectors.filtering(s -> s.compareTo("B") < 0,
Collectors.toList())))
.forEach((x, y) -> System.out.println(x + " " + y));
16.12 Which of the following statements are true about the Stream methods?
Select the two correct answers.
(a) The filter() method accepts a Function.
(b) The peek() method accepts a Function.
(c) The peek() method accepts a Consumer.
set1.stream()
.mapToInt(v -> v.length())
.sorted()
.forEach(v -> System.out.print(v));
(b) Set<Integer> set2 = Stream.of("XX", "XXXX", "", null, "XX", "X")
.map(v -> (v == null) ? 0 : v.length())
.filter(v -> v != 0)
.collect(Collectors.toSet());
set2.stream()
.sorted()
.forEach(v -> System.out.print(v));
(c) List<Integer> list1 = Stream.of("XX", "XXXX", "", null, "XX", "X")
.map(v -> (v == null) ? 0 : v.length())
.filter(v -> v != 0)
.toList();
list1.stream()
.sorted()
.forEach(v -> System.out.print(v));
(d) List<Integer> list2 = Stream.of("XX", "XXXX", "", null, "XX", "X")
.map(v -> (v == null) ? 0 : v.length())
.filter(v -> v != 0)
.distinct()
.toList();
list2.stream()
.sorted()
.forEach(v -> System.out.print(v));