Have you had this problem?

You call an API and get back a string with a lot of leading or trailing whitespace. Then you decide to use the trusty ol' trim() method to get rid of the whitespace.

But it doesn't work.

What's going on?

I'll tell you what's going on. At least for the situation that I faced.

More importantly, though, I'll tell you how to fix it.

It's Not Whitespace

If trim() isn't working (in either Java or TypeScript) then the problem almost certainly isn't with trim(). That thing has proven itself time and time again.

The problem is with your string.

Specifically, you've likely got whitespace that isn't whitespace. At least not according to the trim() methods you're working with.

You see, there's such a thing as a zero-width non-joiner (ZWNJ). That's a character that could end up in your string.

I found several of them in my string when I got back a response from the Gmail API. I thought they were spaces, but my heart broke when I saw that trim() didn't work.
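If you want to see for yourself what's hiding in your string, one option is to print out its code points. Here's a minimal sketch of how that might look in Java. The CodePointDump class and the sample string are made up for illustration; in practice you'd feed it whatever came back from the API.

```java
import java.util.stream.Collectors;

final class CodePointDump {

    // Renders each code point as U+XXXX so invisible characters become visible.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Hypothetical sample: "Hi" followed by spaces interleaved with ZWNJs (U+200C).
        String sample = "Hi \u200C \u200C";

        // Prints: U+0048 U+0069 U+0020 U+200C U+0020 U+200C
        System.out.println(describe(sample));

        // trim() stops at the trailing ZWNJ, so nothing is removed. Prints: true
        System.out.println(sample.trim().equals(sample));
    }
}
```

Anything that shows up as U+200C in the output is a ZWNJ, not a space (U+0020).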

But what are they?

They're invisible formatting characters (code point U+200C) that keep two neighboring characters from being joined into a ligature or a cursive connection.

A ligature, by the way, joins two characters together into a single "character." Like this thing: æ

Apparently, the Gmail API likes to send those ZWNJs along, mixed in with the spaces. There's probably a really good reason for it. But I have no idea what it is.

Anyhoo, that's the problem. Now let me tell you about the solution.

On the Java Side

I think it's best to handle these kinds of string manipulation functions on the back end. So I took care of it on the Java side of the house.

Here's what I did:

// Remove Unicode format characters (category Cf, which includes ZWNJ), then trim.
String snippet = message.getSnippet();
email.setSnippet(snippet.replaceAll("[\\p{Cf}]", "").trim());

In the code above, the snippet String represents a snippet of the body of a Gmail message. It's the "preview" you usually see in your email client.

But that snippet includes a bunch of the ZWNJs that I just described. Especially at the end.

So I got rid of them with that second line of code. 

The replaceAll() method on the String class replaces each substring that matches the regular expression with the replacement String you give it.

In the code above, the regular expression looks like this: [\\p{Cf}]. That's the first parameter of replaceAll().

And the replacement string is just an empty string with nothing in it. That's the second parameter of replaceAll().

Now you probably have some questions about that regular expression so let me break it down.

That's actually a Unicode property escape. And if you look at this page, you'll see that invisible formatting characters belong to the Unicode category Cf (Format), which regex writes as \p{Cf}.

The \p notation in regex, by the way, matches any character that has the Unicode property named inside the braces, such as a general category or a script.
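To get a feel for what \p{Cf} does and doesn't cover, you can ask Java for a character's Unicode category directly. This is just a sketch, and the FormatCategoryCheck class name is made up for illustration. It relies on Character.getType(), which reports the general category of a code point.

```java
final class FormatCategoryCheck {

    // True when the code point's Unicode general category is Cf (Format).
    static boolean isFormat(int codePoint) {
        return Character.getType(codePoint) == Character.FORMAT;
    }

    public static void main(String[] args) {
        System.out.println(isFormat(0x200C)); // zero-width non-joiner: true
        System.out.println(isFormat(0x00AD)); // soft hyphen: true
        System.out.println(isFormat(0x00A0)); // no-break space: false (category Zs)

        // The regex agrees with the category lookup.
        System.out.println("\u200C".matches("\\p{Cf}")); // true
        System.out.println("\u00A0".matches("\\p{Cf}")); // false
    }
}
```

Note that a no-break space is not a format character, so if that's what is lurking in your string, \p{Cf} alone won't remove it.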

So why is there a double backslash in the regex? Because it's inside a Java string literal, where the backslash itself must be escaped.

Use code similar to that in your Java applications and you'll get rid of those pesky ZWNJs.
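If you find yourself doing this in more than one place, you might wrap it in a small helper. Here's one way that could look. The SnippetCleaner class, the clean() method, and the sample text are all hypothetical, and returning an empty string for null is my own assumption about what you'd want.

```java
import java.util.regex.Pattern;

final class SnippetCleaner {

    // \p{Cf} is the Unicode "Format" general category, which includes ZWNJ (U+200C).
    // Compiling the pattern once avoids recompiling it on every call.
    private static final Pattern FORMAT_CHARS = Pattern.compile("\\p{Cf}");

    // Removes format characters, then trims ordinary leading/trailing whitespace.
    static String clean(String raw) {
        if (raw == null) {
            return ""; // assumption: treat a missing snippet as empty
        }
        return FORMAT_CHARS.matcher(raw).replaceAll("").trim();
    }

    public static void main(String[] args) {
        // Hypothetical snippet padded with spaces and ZWNJs.
        String raw = "  Your order has shipped \u200C \u200C \u200C";
        System.out.println("[" + clean(raw) + "]"); // [Your order has shipped]
    }
}
```

Keep in mind that this strips format characters everywhere in the string, not just at the ends. That's fine for a preview snippet, but some scripts (Persian, for example) use ZWNJs in the middle of words on purpose.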

On the TypeScript Side

You can get away with doing almost the same thing in TypeScript:

// Unicode property escapes such as \p{Cf} need the u flag and an ES2018+ runtime.
const result: string = str.replace(/[\p{Cf}]/gu, '');

You already know what's going on in the "guts" of that regex. Let me explain everything else.

First, note that you don't have to escape the backslash here because the regex is a regex literal, not a string. So only one backslash is necessary.

The forward slashes that you see mark the beginning and end of the regex.

That gu at the end means two things. The g means that you're doing a global search, so every match gets replaced instead of just the first one. The u turns on Unicode mode, which is what makes \p{Cf} work in the first place.

And that's it. That will clear out the ZWNJs.

Wrapping It Up

There you have it. Two easy ways to get rid of those pesky ZWNJs from your strings.

So stop pulling your hair out and have fun!
