What does interning mean?
In a Java application, you can create a lot of String instances with the same content. Without String interning all of these would occupy separate memory areas as separate objects. If there are lots of them, this could mean a significant amount of memory.
String interning in Java is a way of storing only one copy of each distinct String value. As you will see, in a lot of cases it happens automatically, but in some cases you have to do it manually by calling the String.intern() method.
How does it work?
Because the same String values often occur many times, Java implements a so-called String Pool. The String Pool lives in heap memory (since Java 7 on HotSpot; earlier versions kept it in the permanent generation) and it stores each of the interned Strings in your application. According to the JavaDoc, the pool is maintained privately by the String class.
When you call intern() on a String, Java checks whether a String with equal content (as determined by equals()) is already present in the String Pool. If it is there, a reference to the pooled String object is returned and nothing is added to the pool. If it is not there, the String is added to the String Pool and a reference to it is returned.
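The lookup described above can be observed directly by comparing references. The following is only a minimal sketch; the class and method names are made up for illustration.

```java
class InternLookupSketch {

    // Returns true when intern() hands back the pooled instance for equal content.
    static boolean internReturnsPooledInstance() {
        String literal = "John";           // literals are placed in the pool automatically
        String copy = new String("John");  // explicitly created object, distinct from the pooled one

        // The copy is a different object, but interning it yields the pooled literal.
        return copy != literal && copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(internReturnsPooledInstance()); // prints "true"
    }
}
```

Note that intern() does not change the object it is called on; you have to use the returned reference.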
When should I use interning?
Automatic interning
As I mentioned, there are cases when interning happens automatically. Let me quote from the JavaDoc of the intern() method:
“All literal strings and string-valued constant expressions are interned.”
So the following two Strings will be interned automatically (first is a literal, second is a string-valued constant expression):
String a = "John";
String b = "Jane" + "Doe";
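To check the automatic cases yourself, you can compare references with ==. This is a small sketch with hypothetical names; FIRST is an extra example I added to show that a final String initialized with a literal also counts as a constant.

```java
class ConstantExpressionSketch {

    // A final variable initialized with a literal is a compile-time constant.
    static final String FIRST = "Jane";

    // Two identical literals refer to the same pooled object.
    static boolean literalsShareOneObject() {
        String a = "John";
        String b = "John";
        return a == b;
    }

    // Constant expressions are evaluated by the compiler and interned like literals.
    static boolean constantConcatIsInterned() {
        String viaLiterals = "Jane" + "Doe";
        String viaConstant = FIRST + "Doe";
        return viaLiterals == "JaneDoe" && viaConstant == "JaneDoe";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareOneObject());    // prints "true"
        System.out.println(constantConcatIsInterned());  // prints "true"
    }
}
```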
Manual interning
Other than the above-mentioned cases, interning does not happen automatically. Let’s take an example when you create two String variables and concatenate them into new variables:
String a = "John";
String a2 = "John";
String b = "Doe";
String c1 = a + b;
String c2 = a + b;
In this case, the “John” and “Doe” Strings are interned (stored in the String Pool), so only one object containing “John” and one with “Doe” is created.
However, the “JohnDoe” String is not interned and as a result, not stored in the String Pool. Because of this, the above code will create two String objects that contain the “JohnDoe” String.
To use interning manually, you can call the intern method on the resulting String like this:
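String c1 = (a + b).intern();
String c2 = (a + b).intern();
Each concatenation still creates a temporary String at runtime, but intern() makes sure that both c1 and c2 refer to the single “JohnDoe” object stored in the String Pool. The temporary objects are no longer referenced and can be garbage collected.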
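A quick way to verify the difference is to compare the references before and after interning. The sketch below uses made-up names and assumes nothing beyond the standard String API.

```java
class ManualInternSketch {

    // Returns { sameBeforeIntern, sameAfterIntern }.
    static boolean[] compareReferences() {
        String a = "John";
        String b = "Doe";

        // a and b are not final, so each concatenation runs at runtime
        // and produces a new String object.
        String c1 = a + b;
        String c2 = a + b;

        boolean sameBeforeIntern = (c1 == c2);                  // two separate objects
        boolean sameAfterIntern = (c1.intern() == c2.intern()); // both map to the pooled object
        return new boolean[] { sameBeforeIntern, sameAfterIntern };
    }

    public static void main(String[] args) {
        boolean[] result = compareReferences();
        System.out.println(result[0]); // prints "false"
        System.out.println(result[1]); // prints "true"
    }
}
```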
Memory savings by interning
As I described above, interning allows you to reuse the same String objects multiple times. Of course, this results in less memory usage in the Heap space, where objects are stored.
The savings depend on your application. In small apps, you might not notice the difference, but in case of a lot of Strings the memory gain can be significant.
Example
I have put together a small test application to demonstrate that:
package com.jtuts;

import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class Main {

    public static void main(String[] args) throws InterruptedException {
        Thread.sleep(10000);

        List<List<String>> listOfLists = new ArrayList<>();

        IntStream.range(0, 10).forEach(i -> {
            try {
                System.out.println("Starting iteration " + i);

                List<String> list = IntStream.range(0, 100000).mapToObj(j -> {
                    String s1 = "ABCABCABCABCABCABCABCABCABCABCABCABCABC";
                    String s2 = "123123123123123123123123123123123123123";
                    return s1 + s2;
                }).collect(Collectors.toList());

                listOfLists.add(list);

                Thread.sleep(5000);
            } catch (InterruptedException e) {}
        });
    }
}
This application will attempt to create 10 x 100 000 quite long String objects. As you can see, I am creating two Strings via literals (these will be interned automatically), but then I concatenate them, creating a brand new String that will not be automatically interned.
I am collecting all these Strings in a List, so they stay reachable and the Garbage Collector cannot reclaim any of them until the end of my application. I also added some delays so the memory allocation changes are more visible.
I started the application and also started up JConsole, and this is what I saw:
As you can see, my application is using more than 200 MB of Heap memory. That’s a lot. It is because we have created 1 million pretty big Strings and did not intern them.
Now let’s see what happens if I change the line where I return s1 + s2 so that it returns the following instead:
(s1 + s2).intern()
If I run the application again, now I get the following chart:
As you can see, the Heap memory usage doesn’t even reach 30 MB, which is way less than the first version.
The takeaway from this example is that if you have a large enough application that is for some reason producing a lot of Strings with the same content, then using interning can save you a lot of memory.
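Heap charts depend on the JVM and its settings, so a rough but deterministic way to see the same effect is to count how many distinct String objects end up in the list. This is only a sketch; the class name, the method name, and the shortened string values are my own assumptions, not part of the test application above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DistinctObjectCountSketch {

    // Builds `count` equal Strings and returns how many distinct objects back them.
    static int distinctObjects(int count, boolean intern) {
        String s1 = "ABCABC";
        String s2 = "123123";

        List<String> list = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            String joined = s1 + s2; // runtime concatenation: a new object each time
            list.add(intern ? joined.intern() : joined);
        }

        // A set backed by IdentityHashMap compares elements with ==, not equals(),
        // so its size is the number of distinct objects.
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        identities.addAll(list);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctObjects(1000, false)); // prints "1000"
        System.out.println(distinctObjects(1000, true));  // prints "1"
    }
}
```

The object count is not the same as bytes used, but it shows why the interned version needs so much less heap: the list holds many references to one object instead of one object per element.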
The effect of interning on comparison performance
As you probably know, when you want to compare two Strings by content you have to use the equals() method of the String class. Unless the two references are identical or the lengths differ, this goes through the Strings and compares them character by character. In case of a lot of comparisons, this could take a considerable amount of time.
But wait! There is another way of comparing two objects: the double equals operator (==). The problem is that we cannot use it for String content comparison, because it only compares object references, which might differ even for Strings with the same content.
The following example shows this:
String s1 = "John";
String s2 = "Doe";
String s3 = "JohnDoe";
String s4 = s1 + s2;
System.out.println(s3.equals(s4));
System.out.println(s3 == s4);
Output:
true
false
But what if we intern the Strings first and then do the double equals comparison? Exactly! That will work, because as I mentioned earlier, by interning you make sure that two Strings with the same content will refer to the same object in the String Pool.
String s1 = "John";
String s2 = "Doe";
String s3 = "JohnDoe";
String s4 = (s1 + s2).intern();
System.out.println(s3.equals(s4));
System.out.println(s3 == s4);
Output:
true
true
As you can see, this works, and comparing two references is much faster than comparing two Strings character by character.
So, should I use interning all the time?
No, definitely not.
First of all, not all applications would benefit considerably from interning. For a noticeable benefit, the application has to handle a lot of large String objects that have the same content. I think that is rarely the case.
Secondly, using interning means a lot of overhead. You have to add the intern() method call in a lot of places, and if you plan to use the double equals operator on your Strings for performance reasons, you have to be absolutely sure that intern() is used everywhere it is necessary. If you miss even one occurrence, your comparisons could give you wrong results.
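To make the risk concrete, here is a small sketch of what happens when one intern() call is forgotten. The class, method, and variable names are hypothetical.

```java
class MissedInternSketch {

    // Returns { equalsResult, referenceResult, referenceResultAfterFix }.
    static boolean[] demo() {
        String first = "John";
        String last = "Doe";

        String interned = (first + last).intern();
        String forgotten = first + last; // the intern() call was missed here

        return new boolean[] {
            interned.equals(forgotten),     // content comparison is still correct
            interned == forgotten,          // reference comparison silently reports "different"
            interned == forgotten.intern()  // correct again once the missing call is added
        };
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println(result[0]); // prints "true"
        System.out.println(result[1]); // prints "false"
        System.out.println(result[2]); // prints "true"
    }
}
```

The second comparison does not fail loudly; it just returns false, which is what makes this kind of bug hard to spot.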
All in all, I think that by default interning should be skipped completely. However, if you suspect that your application could benefit considerably from using it, it is worth measuring the possible benefits before committing to it.