Sometimes it isn't good enough just to present data. You also have to categorize it.

And when you need to do that with a Java Stream, you'll likely turn to the groupingBy() static method from the Collectors class.

In this guide, I'll show you how to group data into specific sets. 

In fact, I'll pick up where we left off in the guide on how to use map() with Java Streams.

If you read that guide, you might have noticed that I didn't quite "take it home." The requirement said you needed to produce data for a histogram that showed when contacts completed the web form.

You ended up returning a List object that looked like this:

[07:00, 07:00, 02:00]

Meh. It fulfills the requirement.

But wouldn't it be nice to return a Map that associates the number of form completions with each hour?

In other words, you want to end up with something that looks like this:

{02:00=1, 07:00=2}

Yes. Yes, that would be nice.

And that's exactly what you'll do.

The CRM App

Let's say you're building a CRM app. It does the normal stuff that CRM apps do: tracks activities between sales reps and contacts.

Those activities get stored in a MongoDB database as documents. Each document persists info like the title of the activity as well as the type, outcome, location, start time, end time, notes, and the contact involved.

As it stands right now, if you retrieve all the documents from the activities collection, the resulting data set looks like the JSON dump at this link.

As you can see, we're not messing around here. You're going to be working with real-world data.

On the Java side, the Activity class and its related classes mirror the data set that you see above. You can see examples of those classes over on GitHub.

So you can just do a findAll() on that collection above and get a List of Java objects that represent that JSON output. Then, you can use a Java Stream to filter, find, and map as you see fit.

That's what you'll do in this guide.

And, yes, you could do that kind of stuff with MongoDB aggregations. But you're not here to learn about aggregations, are you?

Picking Up Where You Left Off

So here you are again handling a requirement to produce data that can be piped into a histogram.

Fortunately, you don't need to worry about the UI part here.

But this time, instead of returning a List of String objects representing each hour that contacts completed a web form, you'll throw back a Map object. 

Well, you saw it all above so I don't need to explain it again.

To make that happen, here's how you need to update the code:

List<Activity> activities = activityRepo.findAllByOrderByStartDateDesc();

Map<String, Long> times = activities
                        .stream()
                        .filter(activity -> activity.getType() != null && "Web Form Completion".equals(activity.getType().getName()))
                        .map(activity -> activity.getStartDate())
                        .map(dateAsNumber -> DATE_FORMAT.format(new Date(dateAsNumber)))
                        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));


The big change from the previous guide is in that collect() method.

The return type also changed. It went from List<String> to Map<String, Long>.

Instead of Collectors.toList() the code now uses Collectors.groupingBy().

But what's that all about?

Before I can answer that, this would be an excellent time to explain the Collector interface and map/reduce.

What Is Map/Reduce?

The whole concept of map/reduce (or MapReduce) originated as a means to handle Big Data. Even though I'm not using Big Data here, it's still important to understand the concept.

Why? Because the Collector interface uses reduction operations. That's why.

But back to map/reduce. You won't be surprised to learn that map/reduce implementations consist of two parts: mapping and reduction.

Mapping involves filtering, sorting, or transforming a sequence of data elements. You see that in the filter() and map() calls in the code above, before you even get to the collect() method.

A reduction process, on the other hand, takes the data that comes out of the mapping step and performs some type of summary operation on it, like a count or an average.

In this case, the reduction operation will count the number of web form completions during specific hours of the day. So you're going with the "count" option here.
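If the split between the two steps still feels abstract, here's a minimal sketch of it with a plain Java Stream. The class and method names (MapReduceSketch, totalLength(), countMatching()) are made up for illustration and aren't part of the CRM app.

```java
import java.util.Arrays;
import java.util.List;

final class MapReduceSketch {

    // Mapping step: turn each String into its length.
    // Reduction step: fold all of those lengths into one sum.
    static int totalLength(List<String> hours) {
        return hours.stream()
                .map(String::length)       // mapping: String -> Integer
                .reduce(0, Integer::sum);  // reduction: many Integers -> one Integer
    }

    // Same idea, but the reduction is a count instead of a sum.
    static long countMatching(List<String> hours, String target) {
        return hours.stream()
                .filter(target::equals)    // keep only the elements you care about
                .count();                  // reduction: many elements -> one long
    }

    public static void main(String[] args) {
        List<String> hours = Arrays.asList("07:00", "07:00", "02:00");
        System.out.println(totalLength(hours));            // 15
        System.out.println(countMatching(hours, "07:00")); // 2
    }
}
```

Both methods end with an operation that collapses the whole sequence into a single value. That's the "reduce" half.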

Take Up a Collection

The Collector interface handles reduction for a Java Stream. It translates the final state of the sequence into something meaningful.

If you want to implement your own Collector, you can do that. But it's a pain.

Fortunately, the Collectors class (note the "s" on the end there) includes several static methods that you can use to instantiate a Collector and make your life easier. In fact, Collectors.groupingBy() is one of those methods.
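To give you a feel for those static methods, here's a small sketch that runs the same list of hour Strings through three of them: Collectors.toList(), Collectors.counting(), and Collectors.joining(). The CollectorsSketch class and its method names are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CollectorsSketch {

    // Collects the elements into a List (what the previous guide did).
    static List<String> intoList(List<String> hours) {
        return hours.stream().collect(Collectors.toList());
    }

    // Reduces the whole sequence to a single count.
    static long howMany(List<String> hours) {
        return hours.stream().collect(Collectors.counting());
    }

    // Reduces the whole sequence to one comma-separated String.
    static String joined(List<String> hours) {
        return hours.stream().collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<String> hours = Arrays.asList("07:00", "07:00", "02:00");
        System.out.println(intoList(hours)); // [07:00, 07:00, 02:00]
        System.out.println(howMany(hours));  // 3
        System.out.println(joined(hours));   // 07:00, 07:00, 02:00
    }
}
```

Each call hands collect() a ready-made Collector, so you never have to implement the interface yourself.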

So what does groupingBy() do? I'll answer that now.

For starters, groupingBy() is an overloaded method. That means Collectors has more than one static method named groupingBy(). Each one takes different parameters.

Here, I'll cover the method that takes a Function and a Collector as its two parameters.

That method groups the elements in the sequence according to the category specified by the Function. Then, it performs a reduction operation as specified by the Collector.
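Here's a rough sketch of that two-parameter version on its own, away from the CRM data. It groups a few words by their length and counts how many land in each group. The GroupingSketch class and the sample words are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class GroupingSketch {

    // First parameter (the Function): decides which group an element belongs to.
    // Second parameter (the Collector): reduces each group, here to a count.
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(String::length, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Call", "Email", "Demo", "Visit", "Lunch");
        System.out.println(countByLength(words)); // {4=2, 5=3}
    }
}
```

The keys of the resulting Map are whatever the Function returns. The values are whatever the Collector produces for each group.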

Go back to the code above and you'll see that the Function parameter is implemented as Function.identity().

That's a function that always returns its input argument. In this case, the input argument is the time-formatted String object returned from the previous map() operation.
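If you'd like to see that for yourself, this tiny sketch compares Function.identity() with a hand-written lambda that does the same thing. The IdentitySketch class name is invented for the example.

```java
import java.util.function.Function;

final class IdentitySketch {

    // Uses the built-in identity function.
    static String viaIdentity(String hour) {
        Function<String, String> same = Function.identity();
        return same.apply(hour);
    }

    // Uses an equivalent lambda written by hand.
    static String viaLambda(String hour) {
        Function<String, String> same = value -> value;
        return same.apply(hour);
    }

    public static void main(String[] args) {
        System.out.println(viaIdentity("07:00")); // 07:00
        System.out.println(viaLambda("07:00"));   // 07:00
    }
}
```

Because the Strings in the sequence are already the categories you want, the grouping function has nothing left to compute. It just hands each element back as its own key.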

But it's the second parameter that takes it to the next level: that parameter implements Collector with Collectors.counting().

That implementation counts the number of input elements that land in each group. Here, each group is keyed by a String object representing the hour of the day.

So the collect() method here basically says: "Count every instance of a specific hour-of-day String you got back from the previous mapping operation. Then, put that number as a value in a Map using the associated String as its key."

So if there are three instances of "08:00" in the sequence, then the operation would effectively do this:

map.put("08:00", 3L);

And after it's done with all of that, it returns the Map object.

That's why you see Map<String, Long> as the return type instead of List<String> as in the previous guide.
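If you want to try the whole pipeline without a MongoDB instance, here's a self-contained sketch. Plenty of it is assumed: the Activity and ActivityType classes below are stripped-down stand-ins for the real model classes, the "HH:00" pattern is my guess at what DATE_FORMAT looks like, the time zone is pinned to UTC so the result is predictable, and the sample timestamps are made up.

```java
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.TimeZone;
import java.util.function.Function;
import java.util.stream.Collectors;

final class HourHistogramSketch {

    // Hypothetical stand-in for the real activity type class.
    static final class ActivityType {
        private final String name;
        ActivityType(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Hypothetical stand-in for the real Activity class.
    static final class Activity {
        private final ActivityType type;
        private final long startDate; // assumed to be epoch milliseconds

        Activity(String typeName, long startDate) {
            this.type = typeName == null ? null : new ActivityType(typeName);
            this.startDate = startDate;
        }
        ActivityType getType() { return type; }
        long getStartDate() { return startDate; }
    }

    static Map<String, Long> completionsPerHour(List<Activity> activities) {
        // Assumed pattern: hour of day followed by a literal ":00".
        SimpleDateFormat hourFormat = new SimpleDateFormat("HH:00");
        hourFormat.setTimeZone(TimeZone.getTimeZone("UTC"));

        return activities.stream()
                .filter(a -> a.getType() != null
                        && "Web Form Completion".equals(a.getType().getName()))
                .map(Activity::getStartDate)
                .map(millis -> hourFormat.format(new Date(millis)))
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Made-up sample data: two completions in the 07:00 hour, one in the 02:00 hour.
    static List<Activity> sample() {
        return Arrays.asList(
                new Activity("Web Form Completion", 25_200_000L), // 07:00 UTC
                new Activity("Web Form Completion", 27_000_000L), // 07:30 UTC
                new Activity("Web Form Completion", 7_200_000L),  // 02:00 UTC
                new Activity("Call", 28_800_000L),                // filtered out
                new Activity(null, 28_800_000L));                 // filtered out
    }

    public static void main(String[] args) {
        Map<String, Long> times = completionsPerHour(sample());
        System.out.println(times.get("07:00")); // 2
        System.out.println(times.get("02:00")); // 1
    }
}
```

The call and the activity without a type never make it past filter(), so only the three web form completions get counted.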

Testing It Out

I think I've already spoiled the ending here. You know how this will turn out.

But if you run the code above with the dataset I referenced previously, you'll get output like this (the key order isn't guaranteed):

{02:00=1, 07:00=2}

And that's what you'd expect because that's what's in the data.

Now you can return data much more suitable for presenting in a histogram!

Wrapping It Up

There's quite a bit more that you can do with Collectors.groupingBy() than what I've covered here. Take some time to tinker with it and learn about its potential.
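As a starting point for that tinkering, here's a sketch of two other groupingBy() overloads. The one-parameter version collects each group into a List. The three-parameter version lets you pick the Map implementation, and a TreeMap keeps the hours sorted, which could be handy for a histogram. The GroupingOverloadsSketch class and its method names are invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class GroupingOverloadsSketch {

    // One parameter: no downstream Collector, so each key maps to a List of its elements.
    static Map<String, List<String>> groupedIntoLists(List<String> hours) {
        return hours.stream()
                .collect(Collectors.groupingBy(Function.identity()));
    }

    // Three parameters: the middle one supplies the Map to fill.
    // TreeMap::new means the keys come back in sorted order.
    static Map<String, Long> countedInSortedMap(List<String> hours) {
        return hours.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> hours = Arrays.asList("07:00", "07:00", "02:00");
        System.out.println(groupedIntoLists(hours).get("07:00")); // [07:00, 07:00]
        System.out.println(countedInSortedMap(hours));            // {02:00=1, 07:00=2}
    }
}
```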

Also, adapt what you've learned here to suit your own requirements.

Use different objects. Try different reduction strategies. Add more operations.

Just make sure you have fun!

Photo by Buro Millennial from Pexels