Firstly, the golden rule of memory optimization is to always start in your own code. I won’t go into any deeper detail on JVM tweaking; there are plenty of articles out there if you google. Suffice it to say that if you find yourself fiddling with the GC strategy, you have dug too deep. The JVM will pick the strategy it deems best for your hardware configuration, and if you feel a need to change that, your app is most likely badly designed in some way. Only once have I encountered a situation where changing GC strategy was the only solution (it involved a huge Lucene index and limited memory), but in general it should never be required.
Some common indications of excessive memory usage include:
- OutOfMemory error “Java heap space”
Simple enough: your application requires more memory than you have allocated.
- OutOfMemory error “GC overhead limit exceeded”
Often comes hand in hand with the “Java heap space” error; it is very similar but subtly different. In short, it means that while the JVM perceives that it may have enough memory in absolute terms, the rate of consumption is so high that the time it spends garbage collecting leaves it with almost no time to actually execute your application. If this is the prevailing error you are getting, it indicates a combination of many and/or very big long-lived objects in memory and a high rate of consumption of short-lived objects. This forces frequent and long-running full GC cycles, since the “old gen” heap will be near full and “young gen” will reach its limit in very short cycles.
- Long periods when the system “freezes” up for full GC cycles
Often tied to a high memory consumption rate (lots and lots of short-lived objects) and not enough CPU cores. The collector generally seems to favour demoting short-lived, de-referenced objects to the “old gen” during the minor cycles rather than throwing them out, letting the full GC cycle take care of them. Adding more memory to the JVM may decrease the frequency of these “freezes”, but unless you back that up with more CPU power, the length of each “freeze” will increase. In my personal opinion, a server-grade machine should always have more than 2 CPU cores, to at least be able to leverage the parallel collector in the JVM.
With this in mind, there are some configurations available in Tomcat that do have their specific usages and I thought I’d share my experiences/thoughts about a few of them. This applies primarily to Tomcat version 6 as that is where I have done the most experimenting and analysis.
This is a Tomcat system property, org.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER (http://tomcat.apache.org/tomcat-6.0-doc/config/systemprops.html). The documentation on it refers to the “tag buffer”, which may or may not mean anything to you. If not, this is what it is, in short:
Each time you use a taglib tag on your JSP page, like a <c:set> JSTL tag or really any tag (except the <jsp:*> tags, as they are not really “tags” as such and are handled differently), a char buffer is set up to receive the body of that tag. The Servlet/JSP specs demand that there be no limit to the size of a tag’s body, so this buffer can grow without bound. Additionally, if you nest a tag inside another tag, an additional buffer is set up for the nested tag, and so on. These buffers are all maintained in a stack in the PageContext, but never actually dereferenced. Because of this, all these character buffers will continue to live and be re-used by later requests.
What LIMIT_BUFFER does is force Tomcat to discard the buffer before each use if it is larger than the default size (512 characters) and allocate a new buffer of the default size (which may of course grow if that is not enough to hold the tag body).
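If you decide to enable it, the property is set like any other JVM system property. A minimal sketch, assuming a standard Tomcat 6 layout where a CATALINA_BASE/bin/setenv.sh file is picked up by catalina.sh:

```shell
# CATALINA_BASE/bin/setenv.sh
# Discard tag buffers that have grown beyond the 512-char default
# after use (trades resident memory for extra allocation/GC work).
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true"
```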
When is this relevant
This is mainly an issue if you have tags with very large bodies. For instance, HTML fragment-caching tags (like oscache) or taglib-based templating frameworks (like tiles). If this is not the case, the sum of these buffers will be negligible. For instance, 400 worker threads with an average tag nesting depth of 3 gives 400*3*512 = 614,400 characters, or a bit over 1 MB of heap (a Java char is two bytes). But say you’re using tiles and you have a page that’s 1 MB large and 5 levels of templates. Then you’re looking at 2 GB worth of page characters indefinitely allocated. And you have to consider the worst case since eventually, every worker thread will have served that one large page at least once and without LIMIT_BUFFER, once those buffers have been sized up, they will never size down.
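To make the arithmetic concrete, here is a back-of-envelope calculation. The thread count, nesting depth, and body sizes are the illustrative figures from above, not measurements; the only hard fact used is that a Java char occupies two bytes on the heap:

```java
public class TagBufferEstimate {
    // Rough heap cost of pooled tag buffers: each worker thread keeps one
    // buffer per nesting level, and a Java char occupies 2 bytes.
    static long estimateBytes(int threads, int nestingDepth, int bufferChars) {
        return (long) threads * nestingDepth * bufferChars * 2L;
    }

    public static void main(String[] args) {
        // Default 512-char buffers, 400 threads, nesting depth 3: modest.
        System.out.println(estimateBytes(400, 3, 512));       // 1228800 bytes, ~1.2 MB
        // Worst case: 1 MB tag bodies, 5 template levels, every thread
        // has served the big page at least once.
        System.out.println(estimateBytes(400, 5, 1_000_000)); // 4000000000 bytes, ~4 GB
    }
}
```

The point of the worst-case line is that the cost is multiplicative in threads, nesting depth, and body size, so one large templated page can dominate the entire heap.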
org.apache.jasper.runtime.JspFactoryImpl.USE_POOL and org.apache.jasper.runtime.JspFactoryImpl.POOL_SIZE are two other system properties with similar implications to LIMIT_BUFFER. Each worker thread (the threads that handle the actual requests in Tomcat) will, by default, pool PageContext objects for use on JSP pages in a thread-local context. As I mentioned in connection to LIMIT_BUFFER, the tag buffers are maintained in a stack in the PageContext. By setting USE_POOL=false, Tomcat will discard a PageContext after the request it served is completed and create a new PageContext for each new request. Since this will, in effect, throw away all the tag buffers as well, the implications are very similar.
Why pool PageContexts on a per-thread basis to begin with, you might ask yourself? Simply because each time you do an include or a forward, a new PageContext is required to handle the included page, even though it’s handled by the same thread as the “top level” request. This is analogous to nesting tags – each nesting requires a new buffer (“context”) in which to live. Because of this, one request may require more than one PageContext, so Tomcat pools them so it doesn’t have to recreate a bunch of new ones all the time.
When is this relevant
Sizing the pool is relevant primarily if your application relies heavily on dynamic includes (<jsp:include> or <c:import>) and you have many levels of “includes within includes”. In combination with large tag bodies, tweaking the pool size (or not pooling at all) can have a very significant impact on memory-usage.
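As a sketch, assuming the same setenv.sh mechanism and the property names from the Tomcat 6 system-properties documentation, disabling or capping the pool would look something like:

```shell
# CATALINA_BASE/bin/setenv.sh
# Either discard each PageContext once its request completes...
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false"
# ...or keep pooling but cap the per-thread pool size instead:
# CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.JspFactoryImpl.POOL_SIZE=4"
```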
There is definitely a trade-off to consider with both these settings. By limiting the pooling and size of the tag buffers, you reduce the amount of “base” memory needed, but at the cost of more garbage collection and re-allocation of objects. My own experiments with a real-life, tiles-based application gave me pretty much exactly the type of behaviour I expected. Limiting buffers would vastly reduce the amount of “base memory” needed (that is, the amount of heap still used after a full GC), but it would increase the CPU load significantly and reduce overall performance. In my case, the average response time nearly doubled. So if you’re going to limit buffer usage, be prepared to back that up with either more CPU cores or more servers in your cluster if you are near your load limit to begin with.
However, “more memory, more CPU” may not always be an option. If you are forced into (or prefer) 32-bit environments, memory will be your bottleneck. Some cloud services (like Amazon AWS) charge significantly more for 64-bit instances, and CPU is cheaper to scale than memory on AWS as well. In other scenarios, you may have an application that already uses a lot of base memory (big caches, or indexes like Lucene), so having Tomcat eat up all your heap is simply not an option.