Java Streams: Create Pivot Tables That Highlight Important Info

Sometimes, you gotta crunch the numbers yourself and put them in a report that makes everything look pretty. You can do that with pivot tables.

And you can create those pivot tables with a Java Stream object. In this guide, I'll show you how to do it.

Even better: I'll show you how to make that happen with something resembling a real-world business application. No foo::bar stuff around here.

Ready to have some fun? Let's get started.

The CRM App

Let's say you're building a CRM app. It does the normal stuff that CRM apps do: tracks activities between sales reps and contacts.

Those activities get stored in a MongoDB database as documents. Each document persists info like the title of the activity as well as the type, outcome, location, start time, end time, notes, and the contact involved.

As it stands right now, if you retrieve all the documents from the activities collection, the resulting data set looks like the JSON dump at this link.

As you can see, we're not messing around here. You're going to be working with real-world data.

On the Java side, the Activity class with its related classes mimic the data set that you see above. You can see examples of those classes over on GitHub.

So you can just do a findAll() on that collection above and get a List of Java objects that represent that JSON output. Then, you can use a Java Stream to filter, find, and map as you see fit.

That's what you'll do in this guide.

And, yes, you could do that kind of stuff with MongoDB aggregations. But you're not here to learn about aggregations are you?

The Reqs

Management wants you to create a service that delivers activity data in a pivot table. Specifically, the boss-folks want to see the number of activity types by date.

So the table will show all activity types along the y-axis. It will show the dates at the top along the x-axis. And in between it will show the number of activity types for each date.

Good news, though: you won't have to worry about the UI component here. All you need to do is deliver the data.

And you're confident you can do that with the aid of a Java Stream.

The Model

Even though you don't need to do diddly-squat with the UI, what's this "table" going to look like from a data perspective?

It's going to be a Map.

The key in the Map will be the name of the activity type (e.g., "Appointment," "Phone Call," Chat," etc.). The value in the Map will be another Map.

Yes, another Map.

Why? Because for each activity type you're going to associate a series of dates with a value. That value will be the number of activity types that took place on each date.

By the way, the date in that second Map will be represented as a String for readability purposes. The number will be represented as a Long because that's the default with the API you're using.

So the Map will look like this:

Map<String, Map<String, Long>>

But but but... hold on. You don't just want the dates in any order, do you? I hope not.

You need to sort the dates.

So that second Map object will be an instance of TreeMap with the keys sorted in ascending order.

Okay, now that you know what you're doing, it's time to get coding.

Double Grouping

I've already covered how to handle grouping with Java Streams. If you're not familiar with how to do that, you might want to read that article before you go any further. It's kind of a prerequisite here.

For this solution, you'll need to use Collectors.groupingBy() twice. Once for the outer Map and once for the inner Map.

Now that you understand the basics, there's no point in wasting any more time. Here's the code:

List<Activity> activities = activityRepo.findAll();

Map<String, Map<String, Long>> pivot = activities
                                       .stream()
                                       .collect(Collectors.groupingBy(a -> a.getType().getName(), 
                                               Collectors.groupingBy(a -> DATE_FORMAT.format(new Date(a.getStartDate())), TreeMap::new, Collectors.counting())));
           
try {
    ObjectMapper objectMapper = new ObjectMapper();
    System.err.println(objectMapper.writerWithDefaultPrettyPrinter().writeValueAsString(pivot));
} catch (Exception e) {
    e.printStackTrace();
}

The first line goes to the MongoDB collection, fetches all activity documents, and translates them to Activity objects on the Java side.

Then the code creates the pivot table. To do so, it first translates the Java List object into a Stream object. You see that with the stream() method invocation above.

Then it starts in right away with the collection process.

First, it groups the objects by activity type. You see that in this part of the code:

Collectors.groupingBy(a -> a.getType().getName(),

Pay special attention to how the Function is implemented in that first parameter. Lots of times, you'll see developers use a method reference in that spot. Something like: MyObject::myMethod.

That often works just fine. Here, though, the code needs to translate the whole ActivityType object to just a name. So it does that with the lambda expression you see above.

With that part done, the grouping will be by activity type name. A good start.

But that groupingBy() method you see above requires another parameter. It requires a Collector.

Here's what that looks like:

Collectors.groupingBy(a -> DATE_FORMAT.format(new Date(a.getStartDate())), TreeMap::new, Collectors.counting())));

And there's that second grouping I was talking about.

There's quite a bit going on there so pull up a rock while I explain it all.

For starters, note the Function implementation again in the first parameter. That's this thing:

a -> DATE_FORMAT.format(new Date(a.getStartDate()))

That lambda expression takes the activity start date and translates it to something human-readable.

Why is it not already human-readable? Because it's stored as a Long number representing the number of milliseconds since 1970 began.

That number by itself isn't going to do anybody any good so the code translates it to something people can comprehend. It does that with the aid of the DATE_FORMAT constant defined as follows:

private static final DateFormat DATE_FORMAT = new SimpleDateFormat("MM/dd/yyyy");

Simple U.S.-style date representation going on there. Nothing fancy.

So now the second level of categorization (or grouping) is by date. Remember, the first level is by activity type name.

Okay, so now take a look at the next two parameters in this rendition of groupingBy():

TreeMap::new, Collectors.counting()

If you look carefully, you'll see that the code is using this overloaded version of groupingBy(). It uses three parameters unlike the two-parameter option I showed you in the last guide.

That third parameter is a Supplier. And it sits right in the middle between the other two parameters.

A Supplier, in Java, is a functional interface that doesn't accept any arguments but instead produces a value. of a specific type.

Here, the value is a type of TreeMap. That's why you see TreeMap::new above.

Remember: it's important to use TreeMap instead of just plain old Map here because the requirement calls for sorting the dates. And the dates are stored as keys in a Map.

So if you left out that TreeMap::new and instead opted for the old two-parameter method, you'd get decent results with the dates all out of order.

By the way, if you're wondering how the sorting happens in the TreeMap with no sort-like method or Comparator defined above, it's because Java will use the natural sorting order of the keys. That works beautifully for dates stored as String objects in MM/dd/yyyy format.

Finally, that Collectors.counting() bit just tallies up the number of activity types for each date.

Then the whole thing is returned as a Map.

Testing Kit

If you look way back at that large code block above you'll find a try/catch block at the end. Inside there, the code prints out the results from the Stream operations.

Run that code with the dataset I referenced above. You should see output that looks like this:

{
  "Appointment" : {
    "01/13/2021" : 2,
    "01/28/2021" : 2,
    "01/29/2021" : 1,
    "02/02/2021" : 1,
    "02/09/2021" : 1
  },
  "Email" : {
    "01/05/2021" : 1,
    "01/12/2021" : 1,
    "01/13/2021" : 1,
    "01/14/2021" : 1,
    "01/15/2021" : 2,
    "01/18/2021" : 1,
    "02/02/2021" : 1,
    "02/03/2021" : 1
  },
  "Chat" : {
    "01/11/2021" : 1,
    "01/14/2021" : 1,
    "01/15/2021" : 1,
    "01/21/2021" : 1,
    "02/07/2021" : 1
  },
  "Web Page Visitied" : {
    "01/13/2021" : 1
  },
  "Web Form Completion" : {
    "01/10/2021" : 1,
    "01/11/2021" : 1,
    "01/12/2021" : 1
  },
  "Text Message" : {
    "01/15/2021" : 1
  },
  "Phone Call" : {
    "01/12/2021" : 1,
    "01/14/2021" : 1,
    "01/18/2021" : 1,
    "01/19/2021" : 1,
    "02/03/2021" : 1,
    "02/05/2021" : 1
  }
}

And that, my friend, is an accurate pivot table. You can see that for each activity type you've got a variety of dates. For each date, you see the number of times that a sales rep logged that activity type on that date.

It works.

Wrapping It Up

Chances are very good that the solution I covered here won't work for the problem you're trying to solve. However, you can take what you learned and tinker with it a bit to get what you're looking for.

And if you're just experimenting, feel free to continue down that path. Try creating a different pivot table with other data sets.

But whatever you do, just make sure you have fun!

Photo by Engin Akyurt from Pexels