What is String interning in Java?

What does interning mean?

In a Java application, you can create a lot of String instances with the same content. Without String interning all of these would occupy separate memory areas as separate objects. If there are lots of them, this could mean a significant amount of memory.

String interning in Java is a way of storing only one copy of each distinct String value. As you will see, in a lot of cases it happens automatically, but in some cases you have to do it manually by calling the String.intern() method.

How does it work?

Because the same String values often occur many times, Java implements a so-called String Pool. The String Pool lives in heap memory (since Java 7; earlier versions kept it in the permanent generation) and stores each of the interned Strings in your application. The String Pool is maintained privately by the String class.

When you create a new String and call intern() on it, Java checks whether a String with the same content is already present in the String Pool. If it is there, a reference to that pooled String object is returned and nothing is added to the pool. If it is not there, the String is added to the String Pool and a reference to it is returned.
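The lookup described above can be observed with a few lines of code. This is only a minimal sketch: the class name InternLookupDemo and the helper buildAtRuntime are made up for illustration, and the helper exists solely to produce Strings at runtime so that the compiler cannot turn them into literals.

```java
final class InternLookupDemo {

    // Hypothetical helper: builds a String at runtime, so the result is a new,
    // non-interned object on every call.
    static String buildAtRuntime(String left, String right) {
        return new StringBuilder(left).append(right).toString();
    }

    public static void main(String[] args) {
        String first = buildAtRuntime("inter", "ning-demo");
        String second = buildAtRuntime("inter", "ning-demo");

        // Two distinct heap objects with equal content.
        System.out.println(first == second);                   // false
        // intern() returns the single pooled reference for that content.
        System.out.println(first.intern() == second.intern()); // true
    }
}
```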

When should I use interning?

Automatic interning

As I mentioned, there are cases when interning happens automatically. Let me quote from the JavaDoc of the intern() method:

“All literal strings and string-valued constant expressions are interned.”

So the following two Strings will be interned automatically (first is a literal, second is a string-valued constant expression):

String a = "John";
String b = "Jane" + "Doe";
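To see that constant expressions really end up as the same pooled object, a small check like the following can help. It is a sketch with made-up names (ConstantExpressionDemo, FIRST_NAME, allShareOneObject); it relies on the Java Language Specification rule that a concatenation of literals and constant variables is itself a constant expression.

```java
final class ConstantExpressionDemo {

    // A static final String initialized with a literal is a compile-time constant.
    static final String FIRST_NAME = "Jane";

    static boolean allShareOneObject() {
        String literal = "JaneDoe";
        String folded = "Jane" + "Doe";          // constant expression of two literals
        String viaConstant = FIRST_NAME + "Doe"; // constant expression using a constant variable
        return literal == folded && literal == viaConstant;
    }

    public static void main(String[] args) {
        System.out.println(allShareOneObject()); // true
    }
}
```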

Manual interning

Other than the above-mentioned cases, interning does not happen automatically. Let’s take an example where you create String variables and concatenate them into new variables:

String a = "John";
String a2 = "John";
String b = "Doe";
String c1 = a + b;
String c2 = a + b;

In this case, the “John” and “Doe” Strings are interned (stored in the String Pool), so only one object containing “John” and one with “Doe” is created.

However, the “JohnDoe” String is not interned and as a result, not stored in the String Pool. Because of this, the above code will create two String objects that contain the “JohnDoe” String.

To use interning manually, you can call the intern method on the resulting String like this:

String c1 = (a + b).intern();
String c2 = (a + b).intern();

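This ensures that c1 and c2 refer to the same String object with the content “JohnDoe”, which is stored in the String Pool. Each concatenation still creates a temporary String, but those temporaries can be garbage collected once nothing refers to them anymore.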

Memory savings by interning

As I described above, interning allows you to reuse the same String objects multiple times. Of course, this results in less memory usage in the Heap space, where objects are stored.

The savings depend on your application. In small apps, you might not notice the difference, but if you have a lot of Strings with the same content, the memory savings can be significant.

Example

I have put together a small test application to demonstrate that:

package com.jtuts;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

    public static void main(String[] args) throws InterruptedException {

        Thread.sleep(10000);
        List<List<String>> listOfLists = new ArrayList<>();

        IntStream.range(0, 10).forEach(i -> {
            try {

                System.out.println("Starting iteration " + i);

                List<String> list = IntStream.range(0, 100000).mapToObj(j -> {
                    String s1 = "ABCABCABCABCABCABCABCABCABCABCABCABCABC";
                    String s2 = "123123123123123123123123123123123123123";
                    return s1 + s2;
                }).collect(Collectors.toList());

                listOfLists.add(list);

                Thread.sleep(5000);
            } catch (InterruptedException e) {}
        });
    }

}

This application will attempt to create 10 x 100 000 quite long String objects. As you can see, I am creating two Strings via literals (these will be interned automatically), but then I concatenate them, creating a brand new String that will not be automatically interned.

I am collecting all these Strings in a List, so the Garbage Collector cannot reclaim any of them until the end of my application. I also added some delays so the memory allocation changes are more visible.

I started the application and also started up JConsole, and this is what I saw:

As you can see, my application is using more than 200 MB of Heap memory. That’s a lot. It is because we have created 1 million pretty big Strings and did not intern them.

Now let’s see what happens if I change the line where I return s1 + s2 to the following:

return (s1 + s2).intern();

If I run the application again, now I get the following chart:

As you can see, the Heap memory usage doesn’t even reach 30 MB, which is way less than the first version.

The takeaway from this example is that if you have a large enough application that is for some reason producing a lot of Strings with the same content, then using interning can save you a lot of memory.
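If you do not want to watch a heap chart, the same effect can be made visible by counting how many distinct String objects survive. The following is a minimal sketch, not the test application above: the class DistinctInstanceCounter and its method are invented for illustration, and it uses an identity-based Set so that Strings are counted by reference rather than by content.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class DistinctInstanceCounter {

    // Counts distinct String objects (by identity) produced by `count` runtime concatenations.
    static int countDistinctObjects(int count, boolean intern) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        String s1 = "ABCABC";
        String s2 = "123123";
        for (int i = 0; i < count; i++) {
            // s1 and s2 are not final, so this is a runtime concatenation
            // that creates a new String on every iteration.
            String combined = s1 + s2;
            identities.add(intern ? combined.intern() : combined);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctObjects(1000, false)); // 1000
        System.out.println(countDistinctObjects(1000, true));  // 1
    }
}
```

Without interning, every retained String is its own object; with interning, all of them collapse into a single pooled instance, which is where the memory savings come from.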

The effect of interning on comparison performance

As you probably know, when you want to compare two Strings by content you have to use the equals() method of the String class. In the general case, this goes through the Strings and compares them character by character. In case of a lot of comparisons, this could take a considerable amount of time.

But wait! There is another way of comparing two objects: the double equals operator (==). The problem is that we cannot use it for String content comparison, because it only compares the references of the objects, which might not be the same even for Strings with the same content.

The following example demonstrates this:

String s1 = "John";
String s2 = "Doe";
String s3 = "JohnDoe";
String s4 = s1 + s2;

System.out.println(s3.equals(s4)); // true
System.out.println(s3 == s4);      // false

But what if we intern the Strings first and then do the double equals comparison? Exactly! That will work, because as I mentioned earlier, by interning you make sure that two Strings with the same content will refer to the same object in the String Pool.

String s1 = "John";
String s2 = "Doe";
String s3 = "JohnDoe";
String s4 = (s1 + s2).intern();

System.out.println(s3.equals(s4)); // true
System.out.println(s3 == s4);      // true

As you can see, this works, and comparing two references is much faster than comparing two Strings character by character.

So, should I use interning all the time?

No, definitely not.

First of all, not all applications would benefit considerably from interning. For a noticeable benefit, the application has to handle a lot of large String objects that have the same content. I think that is rarely the case.

Secondly, using interning means a lot of overhead. You have to add the intern() method call in a lot of places, and if you plan to use the double equals operator on your Strings for performance reasons, you have to be absolutely sure that intern() is used everywhere it is necessary. If you miss even one occurrence, your comparisons could give you wrong results.
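Here is a minimal sketch of how such a missed call goes wrong. The class MissedInternPitfall and the helper sameReference are hypothetical, and the new String(...) call merely stands in for a String that arrives at runtime (from a file, a request, and so on) without being interned.

```java
final class MissedInternPitfall {

    // Hypothetical helper: compares by reference, so it is only correct
    // when BOTH arguments are known to be interned.
    static boolean sameReference(String left, String right) {
        return left == right;
    }

    public static void main(String[] args) {
        String pooled = "JohnDoe";                // literal, interned automatically
        String fromInput = new String("JohnDoe"); // stands in for a String created at runtime

        System.out.println(sameReference(pooled, fromInput));          // false, although the content matches
        System.out.println(sameReference(pooled, fromInput.intern())); // true
        System.out.println(pooled.equals(fromInput));                  // true, regardless of interning
    }
}
```

Note that equals() keeps giving the right answer in every case, which is why it remains the safe default.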

All in all, I think that by default interning should be skipped completely. However, if you suspect that your application could benefit considerably from using it, it is worth checking out the possible benefits.

How to monitor your local Java web application through JMX

Java Management Extensions (JMX) is a Java technology that supplies tools for monitoring applications. Monitoring a locally running Java web application is really straightforward.

You do not need to set up any extra configuration. You only need a tool called JConsole, which is able to see all your locally running JVMs and monitor them.

JConsole can be found in the bin directory inside your JDK’s installation directory. For me it is: C:\Program Files\Java\jdk1.8.0_91\bin 

Start jconsole.exe and you should see your local processes:

In this example screenshot, the first process is a Java web application which I started in a standalone Tomcat. The second one is the process of JConsole itself.

Even if you are running your application using the Maven Tomcat plugin and not in a standalone Tomcat, it’ll still work.

Select one of the processes and double-click on it. You should see the management interface open up with a slew of useful statistics and information.
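The heap figures that a JMX client displays are also available from inside the JVM through the platform MXBeans in java.lang.management. The following is a minimal sketch of reading them in code; the class name HeapUsageProbe and its method are made up for illustration, while ManagementFactory, MemoryMXBean and MemoryUsage are part of the standard library.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

final class HeapUsageProbe {

    // Reads the current heap usage of this JVM from the platform memory MXBean.
    static MemoryUsage currentHeapUsage() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = currentHeapUsage();
        long megabyte = 1024L * 1024L;
        System.out.println("Heap used:      " + heap.getUsed() / megabyte + " MB");
        System.out.println("Heap committed: " + heap.getCommitted() / megabyte + " MB");
    }
}
```

This can be handy for quick logging when attaching JConsole is not convenient.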

Troubleshooting

If you do not see any processes in the “Local Processes” list, please check out this post for a possible solution.

JConsole is not showing any local processes

JConsole is a great tool to monitor Java applications using JMX. One common issue is that when you start JConsole, you don’t see any process in the “Local Processes” list.

Normally, you should at least see the process for JConsole itself:

If you do not see any process in the list, not even JConsole’s, then you probably have a permission issue. Let’s see how that can happen.

When a JVM is running, it generates a log file for each JVM process. These log files contain performance-related information and are by default stored in a folder called hsperfdata_yourusername under your operating system’s temp folder.

JConsole uses these log files to show you the list of local processes. So if the log files do not exist, you will see an empty list under “Local Processes”. The reason why there are no log files for your processes could be that the JVM does not have sufficient permissions to create them.

The easiest solution to this is to just delete this log folder and let the JVM recreate it with the correct permissions.

To find it in its default location on Windows, you can just check the contents of your TMP environment variable:

> echo %TMP%
C:\Users\Username\AppData\Local\Temp

Look for a folder called hsperfdata_yourusername in this temp directory and simply delete it.

Relaunch JConsole and your applications that need monitoring, and you should now see your local processes.
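If you want to check the folder from code before deleting anything, a small diagnostic like the one below can be used. Treat it as a sketch under assumptions: it guesses the location from the java.io.tmpdir and user.name system properties, which matches the Windows layout described above but may differ on other platforms or JVMs. The class name PerfDataDirCheck and its method are invented for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class PerfDataDirCheck {

    // Assumption: the folder sits directly under java.io.tmpdir and is
    // named hsperfdata_<user.name>. The real location may differ.
    static Path expectedPerfDataDir() {
        String tempDir = System.getProperty("java.io.tmpdir");
        String userName = System.getProperty("user.name");
        return Paths.get(tempDir, "hsperfdata_" + userName);
    }

    public static void main(String[] args) {
        Path dir = expectedPerfDataDir();
        System.out.println("Expected folder: " + dir);
        System.out.println("Exists:   " + Files.isDirectory(dir));
        System.out.println("Writable: " + Files.isWritable(dir));
    }
}
```

If the folder exists but is not writable for your user, that is consistent with the permission problem described above.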