Need to trim the fat from your List? Gotta get rid of some unwanted duplicate elements? If so, then have a look at the distinct() method on Stream.

Of course, you could also convert the whole thing to a Set. But, for the sake of this guide, let's assume you need to de-dupe the collection as part of a larger set of Stream operations.

And, in that situation, moving to Set might not be the correct answer.
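To make that trade-off concrete, here's a minimal sketch of both approaches side by side. The class name, method names, and sample values are made up for illustration; they're not part of the CRM app.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class SetVersusDistinct {

    // Option 1: copy everything into a Set. LinkedHashSet keeps insertion order.
    static Set<String> viaSet(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    // Option 2: de-dupe in the middle of a pipeline, then keep chaining operations.
    static List<String> viaDistinct(List<String> values) {
        return values.stream()
                .distinct()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(viaSet(values));      // [a, b, c]
        System.out.println(viaDistinct(values)); // [A, B, C]
    }
}
```

The Set works fine when de-duping is all you need. Once there are more steps before or after it, distinct() lets you stay inside a single Stream.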

Let me show you how it's done with the aid of distinct().

The CRM App

Let's say you're building a CRM app. It does the normal stuff that CRM apps do: tracks activities between sales reps and contacts.

Those activities get stored in a MongoDB database as documents. Each document persists info like the title of the activity as well as the type, outcome, location, start time, end time, notes, and the contact involved.

As it stands right now, if you retrieve all the documents from the activities collection, the resulting data set looks like the JSON dump at this link.

As you can see, we're not messing around here. You're going to be working with real-world data.

On the Java side, the Activity class and its related classes mimic the data set that you see above. You can see examples of those classes over on GitHub.

So you can just do a findAll() on that collection above and get a List of Java objects that represent that JSON output. Then, you can use a Java Stream to filter, find, and map as you see fit.

That's what you'll do in this guide.

And, yes, you could do that kind of stuff with MongoDB aggregations. But you're not here to learn about aggregations, are you?

Location, Location, Location...

You've got a new requirement. Potentially.

Management is noticing that sales reps keep booking the same places for lunch appointments. However, the field to enter the name of the location is currently free-text. They're considering turning it into a dropdown.

All you need to do is get a list of the places where sales reps are wining and dining potential customers. Later on, you might use that list to populate the options in a drop-down.

Fortunately, the activity documents stored in the MongoDB collection include location info. However, not all activities have locations (phone calls, for example). So you'll need to filter out the activities with no location.

Then, just grab a unique set of all the location names.

Dream a Little Stream of Me

You're going to fulfill this requirement with the aid of a Stream. That's because you've got a couple of operations you need to perform on the List and Stream makes it easy to handle those tasks.

So here's what the code looks like:

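// Fetch every activity document from the MongoDB collection
List<Activity> activities = activityRepo.findAll();

String names = activities
               .stream()
               // keep only the activities that have a location
               .filter(activity -> !StringUtils.isBlank(activity.getLocation()))
               // swap each Activity for its location name
               .map(Activity::getLocation)
               // join the names, one per line
               .collect(Collectors.joining("\n"));

System.err.println(names);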

The first line grabs all activity documents from the MongoDB collection. It returns those documents as a List of Activity objects.

Then it converts the List to a Stream object. That's the stream() method you see above.

Next, it filters out the activities that don't have activity locations. You can check out my guide on the filter() method if you'd like to learn more.

The map() method just grabs the activity location. Because the filter() step already dropped every null or blank location, the value is guaranteed to be non-null at this point.

After map(), the Stream no longer carries Activity objects. It carries the location names as String objects.

Finally, the code employs the collect() method to create an uber-String that puts each location on its own line.
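If you'd like to try those steps without a MongoDB instance handy, here's a minimal, self-contained sketch. Everything in it is an assumption for illustration: the nested Activity class is a stripped-down stand-in for the real one, the isBlank() helper is a plain-Java substitute for StringUtils.isBlank(), and the sample activities are invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LocationPipelineSketch {

    // Hypothetical stand-in for the real Activity class: just a title and a location.
    static class Activity {
        private final String title;
        private final String location;

        Activity(String title, String location) {
            this.title = title;
            this.location = location;
        }

        String getTitle() { return title; }
        String getLocation() { return location; }
    }

    // Plain-Java substitute for StringUtils.isBlank(): true for null, empty, or whitespace-only.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static String locationNames(List<Activity> activities) {
        return activities.stream()
                .filter(activity -> !isBlank(activity.getLocation())) // drop activities with no location
                .map(Activity::getLocation)                           // Activity -> String
                .collect(Collectors.joining("\n"));                   // one location per line
    }

    public static void main(String[] args) {
        List<Activity> activities = Arrays.asList(
                new Activity("Lunch", "Bruno's"),
                new Activity("Phone call", null),
                new Activity("Web conference", "   "),
                new Activity("Dinner", "That Italian Joynt"));

        // Prints Bruno's and That Italian Joynt, each on its own line
        System.out.println(locationNames(activities));
    }
}
```

The phone call and the web conference never reach map() because the filter has already removed them.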

Test It!

Okay. Now that you've got the code in place, run it. That System.err line at the end will print out the results.

Given the dataset that I referenced above, you should end up with this output:

Bruno's
That Italian Joynt
Banana Joe's
Izzy's
Izzy's
Loony's
Izzy's

Well it looks good except for... Izzy's is in there three (3) times.

Not good. You don't want Izzy's to show up three (3) times in a drop-down, do you?

Of course not.

Fortunately, it's easy to fix.

Try this:

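// Fetch every activity document from the MongoDB collection
List<Activity> activities = activityRepo.findAll();

String names = activities
               .stream()
               // keep only the activities that have a location
               .filter(activity -> !StringUtils.isBlank(activity.getLocation()))
               // swap each Activity for its location name
               .map(Activity::getLocation)
               // drop repeated location names
               .distinct()
               // join the names, one per line
               .collect(Collectors.joining("\n"));

System.err.println(names);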

The only difference between that code block and the last code block is the distinct() method you may or may not have noticed.

That method lives up to its name. It passes along only the first occurrence of each value produced by the previous map() operation, using equals() to decide whether two values are the same.
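Since the comparison relies on equals(), it's worth remembering that the location field is free-text. Here's a small sketch of what that can mean in practice. The misspelled variants of Izzy's below are invented for illustration and don't appear in the data set, and trimming before distinct() is just one possible cleanup, not something the original code does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctSemantics {

    // distinct() on its own: exact equals() match, so case and stray spaces matter.
    static List<String> rawDistinct(List<String> names) {
        return names.stream()
                .distinct()
                .collect(Collectors.toList());
    }

    // Trim first so entries that differ only by surrounding whitespace collapse.
    static List<String> trimmedDistinct(List<String> names) {
        return names.stream()
                .map(String::trim)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Izzy's", "Loony's", "Izzy's ", "izzy's", "Izzy's");

        System.out.println(rawDistinct(names));     // [Izzy's, Loony's, Izzy's , izzy's]
        System.out.println(trimmedDistinct(names)); // [Izzy's, Loony's, izzy's]
    }
}
```

Notice that the first occurrence wins and the original order is preserved. The lowercase entry still survives trimming because equals() on String is case-sensitive.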

Now, run that code. You should get this:

Bruno's
That Italian Joynt
Banana Joe's
Izzy's
Loony's

Success! No more dupes!
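If the drop-down idea does move forward, you'd probably want a List rather than one big String. Here's a hedged sketch of how that might look. The class and method names are hypothetical, the input is just the location names from the first run above, and sorting the options is my own assumption about what a drop-down would want.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DropDownOptions {

    // De-dupe, sort alphabetically, and collect into a List instead of a String.
    static List<String> options(List<String> locations) {
        return locations.stream()
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList(
                "Bruno's", "That Italian Joynt", "Banana Joe's",
                "Izzy's", "Izzy's", "Loony's", "Izzy's");

        // [Banana Joe's, Bruno's, Izzy's, Loony's, That Italian Joynt]
        System.out.println(options(locations));
    }
}
```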

Wrapping It Up

That was easy, wasn't it? Now you know how to remove duplicates from your sequences in Streams.

Over to you. Take what you've learned here and apply it to your current development efforts. 

In other words: make this method your own.

And, as always, have fun!

Photo by Armin Rimoldi from Pexels