Sometimes, it isn't good enough just to collect data. It's often the case that you need to take that collected data and transform it into something useful.

Fortunately, you can do that with a Java Stream using the Collectors.collectingAndThen() static method.

And I will be happy to show you how to do it.

In fact, I'll even show you how to do it with something resembling real-world data from a CRM app. Because chances are pretty good that you're here to learn how to do something for a business application.

So let's get going.

The CRM App

Let's say you're building a CRM app. It does the normal stuff that CRM apps do: tracks activities between sales reps and contacts.

Those activities get stored in a MongoDB database as documents. Each document persists info like the title of the activity as well as the type, outcome, location, start time, end time, notes, and the contact involved.

As it stands right now, if you retrieve all the documents from the activities collection, the resulting data set looks like the JSON dump at this link.

As you can see, we're not messing around here. You're going to be working with real-world data.

On the Java side, the Activity class with its related classes mimic the data set that you see above. You can see examples of those classes over on GitHub.

So you can just do a findAll() on that collection above and get a List of Java objects that represent that JSON output. Then, you can use a Java Stream to filter, find, and map as you see fit.

That's what you'll do in thisAnd  guide.

And, yes, you could do that kind of stuff with MongoDB aggregations. But you're not here to learn about aggregations are you?

Lies & Statistics

About that CRM app you're working on: management wants stats.

Specifically, they want to know the average number of times each contact engaged in any kind of sales activity. That activity could be anything like a web page visit or lunch appointment with a sales rep.

The folks in the C-Suite want to be sure that potential customers are properly engaged.

Good enough. Now it's up to you to deliver one simple number: an average of all the times each contact has participated in any kind of sales activity.

Fortunately, all sales activities are stored in a MongoDB collection. All you need to do is grab the activities, group them by contact, count the activities per contact, and get the average count per contact.

And you can do it all with a couple of Java Streams and this method called collectingAndThen().

Collecting and What?

As I mentioned in the lead, it's often not good enough just to collect data. Once you've collected it, you still need to transform it in some way.

In this case, collecting the data by counting all activities per contact isn't good enough. You need to do something with that collection.

You need to get an average.

So you'll collect the activities per contact and then you'll do something else.

That's why collectingAndThen() is the perfect candidate for this requirement.

That method, by the way, takes a Collector as its first parameter. I've covered the Collector interface before so I won't rehash it again here. Take a peek at that guide if you're unfamiliar with it.

The method accepts a Function as its second parameter. That's the part that handles the transformation from the collection into the average.

For the purposes of this assignment, I think it's a good idea to start by creating that Function.

Fun With the Function

A Function, as you probably know, accepts an input and returns an output in response. The input and the output don't need to be of the same type.

However, when you specify the Function, you declare the input and output types using standard type-safety notation.

So for the purposes of getting an average from a collection of contact activity counts, the Function implementation will look like this:

Function<Map<String, Long>, Double> avgFunction = activityMap -> activityMap.values().stream().mapToLong(a -> a).average().getAsDouble();

The Function accepts a Map as its input. Then, it uses the data in that Map to return a Double.

That's why you see Map<String, Long> and Double in the type-safety declaration.

The Map stores contact last names as keys. It stores the number of times each contact engaged in any kind of sales activity as a Long.

If you crunch through the dataset I referenced above, you'll find that the Map looks like this:

{
  "Mei" : 4,
  "Rover" : 3,
  "Joy" : 7,
  "Windsor" : 4,
  "Simmz" : 2,
  "Lezilia" : 4,
  "Cheng" : 4,
  "Scene" : 4
}

Now the Function needs to return the average of all those numbers. Do some quick arithmetic in your head and you'll find the average is 4.

The Function calculates that average in its apply() method. That method is implicitly implemented with the lambda expression you see here:

activityMap -> activityMap.values().stream().mapToLong(a -> a).average().getAsDouble();

The Function accepts the Map as its input (that's activityMap on the left side of the arrow) and goes through the values in that Map to calculate the average.

As you can see, it's enlisting the aid of the Stream object to make that happen. But a Stream object here isn't good enough. The code needs a LongStream object.

Why? So it can invoke the subsequent average() method that lives up to its name.

The code can't get an average of a Stream. That's because the elements in the Stream might be something other than numbers.

But a LongStream, by definition, is numbers. That's why mapToLong() exists above.

However, the average() method returns OptionalDouble. So the code still needs to invoke the getAsDouble() method to get the final average.

Why is the average a Double when it's an even 4 in this case? Because it won't always be an even 4.

If you want precision with averages, you need to use Double not Integer.

Collecting and Send

Now it's time to collect contact activity data so you can average it and send it back to the client.

Here's what that code looks like:

Double avg = activities
                .stream()
                .collect(Collectors.collectingAndThen(Collectors.groupingBy(a -> a.getContact().getLastName(), Collectors.counting()), avgFunction));

System.err.println("" + avg);

As you can see, the code above goes right from stream() to collect(). No other intermediate operations are necessary here.

But there's a lot going on in that collect() method. So I guess I'd better break it down.

For starters, note that it's using the Collectors.collectingAndThen() method that might have brought you here from a search engine.

As I noted above, the first parameter in that method accepts a Collector. In this case, it's a Collector that's implemented with the Collectors.groupingBy() static method.

Specifically, that method groups the contact's by last name. That's why you see this lambda expression:

a -> a.getContact().getLastName()

But groupingBy() also accepts... wait for it... another Collector as its second parameter here. The code above uses Collectors.counting() to get a sum of the number of activities per contact.

Now that Collector returns a type of Map<String, Long>. That type should look familiar to you.

It's the same type as the input of the Function I described earlier. That's not a coincidence.

What's also not a coincidence is that same Function, referenced here as avgFunction, is the second parameter in the collectingAndThen() method above.

That's going to apply the Map<String, Long> object to the Function and create a Double. That Double, of course, is the averge number of activities per contact.

A Test If You Please

If you take a look at that first code block in the previous section you'll see that the last line uses System.err to print out the result the Double returned in pretty red numbers.

Now run that code against the dataset I referenced earlier. You should see this output:

4.0

And that's exactly right because there's an average of four activities per contact.

You win.

Wrapping It Up

It's over. You now know how to use collectingAndThen() to massage data according to your requirements.

Why not take some time now to tinker with that method a little more? Use a different data set. Get a summation instead of an average. Calculate a standard deviation.

But above all, have fun!

Photo by Zen Chung from Pexels