So you've got an HTML string and you want your Java application to escape it? No problem, I'll show you how to do that here.

In fact, I'll show you how to do that with the aid of a friend from the Apache tribe.

But first, an overview of what the yark it means to escape HTML.

Escape Artist

Let's say you've got the following HTML:

<html lang="en-US">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <link rel="shortcut icon" href="https://careydevelopment.us/img/branding/careydevelopment-favicon.png" type="image/x-icon">
</head>

In fact, I actually do have that HTML because it's part of the beginning of my blog's homepage.

The point here is that those tags you see ("<html>", "<head>", etc.) have meaning to your browser. In fact, they have so much meaning that your browser won't even show those tags on the screen.

Go to my homepage right now and you'll see what I'm talking about. You won't find the code above on the page unless you view source.

But sometimes you want to spit out HTML for an end-user to see. And you want to display it within a browser.

In that situation, you don't want the browser to interpret the HTML but rather just display the raw source.

That's why you escape it. 

But how? 

With the assistance of entity names.

Note that the tags are bounded by less-than and greater-than signs. If you use entity names in place of those raw characters, your browser will just display the text instead of interpreting the tags.

So replace < with the entity name &lt;. And replace > with the entity name &gt;.

Then your browser will display what I've shown you above. 

But note that escaping an HTML string involves more than just replacing less-than and greater-than signs. Other characters get escaped as well.

The double-quote character becomes &quot;, for example.
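To make the replacement idea concrete, here's a hand-rolled sketch in plain Java. It's only an illustration: the class name ManualHtmlEscape and the method escapeBasic() are made up for this example, and it covers just four characters. The ampersand isn't mentioned above, but I'm including it on the assumption that you'd want existing entity-looking text to survive intact.

```java
// Illustrative sketch only; class and method names are hypothetical.
class ManualHtmlEscape {

    // Swaps four markup characters for their entity names.
    static String escapeBasic(String input) {
        StringBuilder sb = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeBasic("<html lang=\"en-US\">"));
        // prints: &lt;html lang=&quot;en-US&quot;&gt;
    }
}
```

That handles a single line of the snippet. A real escaper covers many more characters than these four, which is why leaning on a library is usually the better move.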

Here's what the whole thing looks like if you print out the escaped HTML to your console:

&lt;html lang=&quot;en-US&quot;&gt;
&lt;head&gt;
    &lt;meta http-equiv=&quot;content-type&quot; content=&quot;text/html; charset=utf-8&quot; /&gt;
    &lt;link rel=&quot;shortcut icon&quot; href=&quot;https://careydevelopment.us/img/branding/careydevelopment-favicon.png&quot; type=&quot;image/x-icon&quot;&gt;
&lt;/head&gt;

Now that you know a little more about escaping HTML, let me show you how to do it in Java.

Going Native

Apache is a software foundation named after a Native American tribe from the southwestern part of the United States.

The (community-led) developers who produce software for Apache make some really cool tools that you can use to expedite the development process.

One of those tools, Apache Commons Text, enables you to escape HTML in a string. All you have to do is include the right dependency in your POM file.

Speaking of that dependency, here's what it looks like:

<dependency>
	<groupId>org.apache.commons</groupId>
	<artifactId>commons-text</artifactId>
	<version>1.9</version>
</dependency>

Just plop that in your POM and then proceed as follows.

As Follows

Now write this code:

import org.apache.commons.text.StringEscapeUtils;

// The HTML snippet from the first section, as a Java string
String html = "<html lang=\"en-US\">\r\n"
        + "<head>\r\n"
        + "    <meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\r\n"
        + "    <link rel=\"shortcut icon\" href=\"https://careydevelopment.us/img/branding/careydevelopment-favicon.png\" type=\"image/x-icon\">\r\n"
        + "</head>";

// Replace the markup characters with their HTML 4 entity names
String output = StringEscapeUtils.escapeHtml4(html);

System.out.println(output);

The import at the top pulls in the class that does the work. The String definition that follows is nothing to write home about. It's just the HTML snippet you saw in the first section.

It's the call to escapeHtml4() that does the escaping.

It does that with the aid of the StringEscapeUtils class from Apache Commons Text. That class includes several static methods, one of which is escapeHtml4().

That method does exactly what its name makes you think it does. It escapes the input HTML and produces an escaped String.

But what's that 4 all about?

That means it supports HTML 4.0 entities. But even if you're using HTML5 (and you probably are), it should still do the trick.

Now if you run that code above, you'll get the escaped output you saw in the first section.

Going Back In

You can also unescape the escaped string with the assistance of that same Java class.

Here's how to make that happen:

import org.apache.commons.text.StringEscapeUtils;

String html = "<html lang=\"en-US\">\r\n"
        + "<head>\r\n"
        + "    <meta http-equiv=\"content-type\" content=\"text/html; charset=utf-8\" />\r\n"
        + "    <link rel=\"shortcut icon\" href=\"https://careydevelopment.us/img/branding/careydevelopment-favicon.png\" type=\"image/x-icon\">\r\n"
        + "</head>";

// Escape first...
String output = StringEscapeUtils.escapeHtml4(html);

// ...then turn the entity names back into the raw characters
String original = StringEscapeUtils.unescapeHtml4(output);

System.out.println(original);

Note the use of the unescapeHtml4() method right after the escapeHtml4() method.

Run that code and you'll get this output:

<html lang="en-US">
<head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8" />
    <link rel="shortcut icon" href="https://careydevelopment.us/img/branding/careydevelopment-favicon.png" type="image/x-icon">
</head>

Back to Square One.
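If you're wondering what unescaping boils down to, here's a hand-rolled sketch in plain Java that reverses just four entity names. The class name ManualHtmlUnescape and the method unescapeBasic() are made up for this example, and this isn't a claim about how the Apache class works internally. It's only meant to show the idea, including one detail that's easy to get wrong: the ampersand has to be handled last.

```java
// Illustrative sketch only; class and method names are hypothetical.
class ManualHtmlUnescape {

    // Turns four entity names back into their raw characters.
    static String unescapeBasic(String input) {
        return input
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                // Last on purpose: "&amp;lt;" must become "&lt;", not "<".
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescapeBasic("&lt;head&gt;"));
        // prints: <head>
    }
}
```

If the ampersand replacement ran first, text that was escaped twice would get unescaped twice in a single pass, and you'd end up with something other than what you started with.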

Wrapping It Up

That wasn't so bad, was it?

Now you know how to escape HTML with Java code.

Feel free to take what you've learned here and put it to work in your own applications.

Have fun!

Photo by Andrea Piacquadio from Pexels