Understanding Tomcat Executor thread pooling

In the default configuration, Tomcat will always create a bounded worker-thread pool for each Connector (with a maximum size of 200). Mostly, this is not something you’ll need to change (other than perhaps increasing the maximum number of threads to accommodate higher load). However, as I discussed in my previous post, Tomcat has a propensity for caching a lot of scaffolding objects (like PageContext and tag buffers) in a thread-local context in each worker thread. Because of this, there are situations where you might want Tomcat to be able to close threads down to free some memory. Also, having each connector maintain its own pool makes it harder to set a firm upper limit on the load your server will accept. The answer to both problems is to use a shared Executor.

By having all connectors share the same executor, you can configure with more predictability how many simultaneous requests are allowed to run across your entire application. The Executor also brings the ability to have a thread pool that can shrink as well as grow to accommodate load. At least in theory…
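Assuming the stock server.xml layout, the shared pool is declared as an Executor element inside the Service and referenced from each Connector; something along these lines (the names and numbers here are just examples, not recommendations):

<!-- Declared inside <Service>, before the Connectors -->
<Executor name="sharedThreadPool"
          namePrefix="catalina-exec-"
          maxThreads="200"
          minSpareThreads="10"
          maxIdleTime="60000"/>

<!-- Both connectors draw their worker threads from the same pool -->
<Connector port="8080" protocol="HTTP/1.1" executor="sharedThreadPool"/>
<Connector port="8009" protocol="AJP/1.3" executor="sharedThreadPool"/>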

org.apache.catalina.core.StandardThreadExecutor

The standard, built-in executor that Tomcat uses by default is the StandardThreadExecutor. The configuration is documented here: http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html
The configuration options include the somewhat misnamed parameter “maxIdleTime”, and here is what you need to be aware of regarding the standard executor and closing idle threads.

The standard executor internally uses a java.util.concurrent.ThreadPoolExecutor. This works (somewhat simplified) by having a variable-size pool of worker threads that, once they have completed a task, will wait on a blocking queue until a new task is submitted, or until they have waited for a set amount of time, in which case they have “timed out” and the thread will be closed. The crux of this is that since the first thread to complete a task will be first in line to get a new task, the pool behaves in a First-In-First-Out (FIFO) way. This is important to keep in mind when we examine how it affects the Tomcat executor.
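To make the mapping concrete, here is a simplified sketch in plain java.util.concurrent terms of the kind of pool the StandardThreadExecutor builds internally. Treat it as an illustration rather than the actual source; the real implementation uses Tomcat’s own TaskQueue and thread factory.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ExecutorSketch {

   // Simplified view of the pool behind StandardThreadExecutor. The parameter
   // names mirror the Executor configuration attributes; Tomcat's real
   // implementation uses its own TaskQueue so the pool grows before queueing.
   public static ThreadPoolExecutor create(int minSpareThreads, int maxThreads,
         long maxIdleTimeMillis) {
      return new ThreadPoolExecutor(
            minSpareThreads,          // corePoolSize    ~ minSpareThreads
            maxThreads,               // maximumPoolSize ~ maxThreads
            maxIdleTimeMillis,        // keepAliveTime   ~ maxIdleTime: a thread above the
            TimeUnit.MILLISECONDS,    // core size that stays idle this long is closed
            new LinkedBlockingQueue<Runnable>());
   }
}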

maxIdleTime is really minIdleTime

Because of the FIFO behaviour of the Java ThreadPoolExecutor, each thread will at minimum wait for a new task for “maxIdleTime” before being eligible for closure. Moreover, again because of the FIFO behaviour of the thread pool, for a thread to be closed, a period of time at least equal to maxIdleTime must pass without ANY request coming in, since the thread that has been idle the longest will be first in line for a new task. The effect is that the executor will not really be able to size the pool to fit the average load (concurrent requests); it will rather be sized according to the rate at which requests come in. This may sound like a distinction without a difference, but in terms of a web server it’s quite significant.

For example, say 40 requests come in at the same time. The thread pool will be expanded to 40 to accommodate the load. After that, you have a period where only one request comes in at a time. Say each request takes 500 ms to complete; that means it would take 20 seconds to cycle through the entire thread pool (remember, FIFO). Unless you have your maxIdleTime set to less than 20 seconds, the pool will continue to hold 40 threads indefinitely, even though the concurrent load is never more than 1. And you don’t want to set your maxIdleTime too low either – that risks flapping behaviour where threads are killed too soon.

Conclusions

To get a more predictable thread-pooling behaviour that attempts to size to average load rather than to the rate of incoming requests, it would be preferable to have an executor that works on a “Last-In-First-Out” (LIFO) basis. If the pool always assigned incoming tasks to the thread that had been idle for the SHORTEST period of time, the server would be better equipped to close down threads during periods of lower load (and in a more predictable manner). In the very simplistic example above, with an initial load of 40 followed by a period with a load of 1, a LIFO pool would correctly size down to 1 after the maxIdleTime period. Of course, it may not always be required (or desired) to have such an aggressive purging strategy, but if your goal is to minimize the amount of resources reserved by Tomcat, the standard executor might unfortunately not be able to do what you expect it to do for you.

Some notes on tweaking Tomcat memory usage

Firstly, the golden rule of memory optimization should always be to start in your own code. I won’t go into any deeper detail on JVM tweaking; there are plenty of articles out there if you google. Suffice it to say that if you find yourself tweaking the GC strategy, you have probably dug too deep. The JVM will pick the strategy that it deems best for your hardware configuration, and if you feel a need to change that, your app is in some way badly designed. Only once have I encountered a situation where changing GC strategy was the only solution (it involved a huge Lucene index and limited memory), but in general it should never be required.

Some common indications of excessive memory usage include:

  1. OutOfMemory error “Java heap space”
    Simple enough, your application requires more memory than you actually have allocated
  2. OutOfMemory error “GC overhead limit exceeded”
    Often comes hand in hand with the “Java heap space” error; it’s very similar but subtly different. In short, it means that while the JVM perceives that it may have enough memory in absolute terms, the rate of consumption is so high that the time it has to spend garbage collecting leaves it with no time to actually execute your application. If this is the prevailing error you are getting, it’s an indication that you have a combination of many and/or very big long-lived objects in memory and a high rate of consumption of short-lived objects. This will force frequent and long-running full GC cycles, since the “old gen” heap will be near full and “young gen” will reach its limit in very short cycles.
  3. Long periods when the system “freezes” up for full GC cycles
    Often tied to a high memory consumption rate (lots and lots of short-lived objects) and not enough CPU cores. The collector generally seems to favour dropping short-lived, de-referenced objects down to the “old gen” during the minor cycles rather than throwing them out, letting the full GC cycle take care of them. Adding more memory to the JVM may decrease the frequency of these “freezes”, but unless you back that with more CPU power, the length of each “freeze” will increase. In my personal opinion, a server-grade machine should always have more than 2 CPU cores, to at least be able to leverage the parallel GC in the JVM.

With this in mind, there are some configurations available in Tomcat that do have their specific usages and I thought I’d share my experiences/thoughts about a few of them. This applies primarily to Tomcat version 6 as that is where I have done the most experimenting and analysis.

org.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true

This is a Tomcat system property (http://tomcat.apache.org/tomcat-6.0-doc/config/systemprops.html). The documentation on it refers to the “tag buffer”, which may or may not mean anything to you. If not, this is what it is, in short:

Each time you use a taglib tag on your JSP page, like a <c:set> JSTL tag or really any tag (except the <jsp:*> tags, as they are not really “tags” as such and are handled differently), a char buffer will be set up to receive the body of that tag. The Servlet/JSP specs demand that there be no limit to the size of a tag’s body, so this buffer can grow infinitely. Additionally, if you nest a tag inside another tag, an additional buffer is set up for the nested tag, and so on. These buffers are all maintained in a stack in the PageContext, but never actually dereferenced. Because of this, all these character buffers will continue to live and be re-used by later requests.
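As a purely illustrative JSP fragment (the cache:fragment tag below is a made-up stand-in for something like an OSCache tag):

<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<%-- each buffered tag body gets its own char buffer, kept on a stack in the PageContext --%>
<c:set var="box">                     <%-- buffer #1: captures the whole tag body --%>
   <cache:fragment key="products">    <%-- buffer #2: one more buffer for the nested (hypothetical) tag --%>
      <%-- ...a potentially very large HTML fragment; buffer #2 grows to hold all of it --%>
   </cache:fragment>
</c:set>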

What LIMIT_BUFFER does is force Tomcat to discard the buffer before each use if it’s larger than the default size (512 characters) and allocate a new buffer of the default size (which may of course grow again if it’s not enough to handle the tag body).
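Being a plain JVM system property, it can be set wherever you normally pass JVM arguments; for example (a setenv.sh is just one common place to put it):

# added to CATALINA_OPTS, e.g. in $CATALINA_HOME/bin/setenv.sh
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true"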

When is this relevant

This is mainly an issue if you have tags with very large bodies, for instance HTML fragment-caching tags (like OSCache) or taglib-based templating frameworks (like Tiles). If this is not the case, the sum of these buffers will be negligible. For instance, 400 worker threads with an average tag nesting depth of 3 = 400*3*512 =~ 614 KB. But say you’re using Tiles and you have a page that’s 1 MB large and 5 levels of templates. Then you’re looking at 2 GB of memory indefinitely allocated. And you have to consider the worst case since, eventually, every worker thread will have served that one large page at least once, and without LIMIT_BUFFER, once those buffers have been sized up, they will never size down.

org.apache.jasper.runtime.JspFactoryImpl.USE_POOL and POOL_SIZE

These are two other system properties with implications similar to LIMIT_BUFFER. Each worker thread (the threads that handle the actual requests in Tomcat) will, by default, pool PageContext objects for use on JSP pages in a thread-local context. As I mentioned in connection with LIMIT_BUFFER, the tag buffers are maintained in a stack in the PageContext. By setting USE_POOL=false, Tomcat will discard a PageContext once the request it served is completed and create a new PageContext for each new request. Since this will, in effect, throw away all the tag buffers as well, the implications are very similar.

Why pool PageContexts on a per-thread basis to begin with, you might ask yourself? Simply because each time you do an include or a forward, a new PageContext is required to handle the included page, even though it’s handled by the same thread as the “top level” request. This is analogous to nesting tags: each nesting requires a new buffer (“context”) in which to live. Because of this, one request may require more than one PageContext, so Tomcat pools them so it doesn’t have to recreate a bunch of new ones all the time.
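These, too, are plain system properties and are set the same way as LIMIT_BUFFER; for example (the POOL_SIZE value below is just an illustration):

# turn per-thread PageContext pooling off entirely...
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false"
# ...or keep pooling but cap the number of PageContexts each thread holds on to
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.JspFactoryImpl.POOL_SIZE=4"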

When is this relevant

Sizing the pool is relevant primarily if your application relies heavily on dynamic includes (<jsp:include> or <c:import>) and you have many levels of “includes within includes”. In combination with large tag bodies, tweaking the pool size (or not pooling at all) can have a very significant impact on memory-usage.

Conclusions

There is definitely a trade-off to consider with both these settings. By limiting the pooling and size of the tag buffers, you reduce the amount of “base” memory needed, but at the cost of more garbage collection and re-allocation of objects. My own experiments with a real-life, Tiles-based application gave me pretty much exactly the type of behaviour I expected. Limiting buffers vastly reduced the amount of “base memory” needed (that is, the amount of heap still used after a full GC), but it increased the CPU load significantly and reduced overall performance; in my case, the average response time nearly doubled. So if you’re going to limit buffer usage, be prepared to back that up with either more CPU cores or more servers in your cluster if you are near your load limit to begin with.

However, “more memory, more CPU” may not always be an option. If you are forced to use (or you prefer) 32-bit environments, memory will be your bottleneck. Some cloud services (like Amazon AWS) charge significantly more for 64-bit instances, and CPU is cheaper to scale than memory on AWS as well. In other scenarios, you may have an application that already uses a lot of base memory (like big caches or indexes such as Lucene), so having Tomcat eat up all your heap is simply not an option.

Annotation-driven aspects and Spring MVC

The built-in AspectJ support is a very strong selling point for Spring, and I’m a huge fan of aspect-oriented elegance. Generally, though, I tend to approach it with what I consider to be a healthy dose of respect. It’s an immensely powerful tool, to be sure, but it also comes with the risk of making your system very hard to predict and, ultimately, to understand. Many IDEs (like Eclipse) have good tooling available for resolving aspect applications at design time, but it still makes it a lot more difficult for other programmers (and, ultimately, yourself) to reconstruct code paths by just following method calls.

One of my favourite patterns has become the annotation-driven aspect. Spring’s AspectJ support allows pointcut declarations based on custom method or class annotations, and to me, it’s the best of both worlds. Using annotations makes it clear which aspects are applied to which methods or classes, while at the same time giving you the non-intrusiveness of aspect-oriented programming, so you don’t have to rely on the tight coupling of inheritance or boiler-plate template code inside your methods.

Where this pattern really comes to shine for me is in combination with Spring MVC. Even though request handlers in Spring MVC 2.5 are not forced to comply with a given interface, they nevertheless follow a strong convention supported by the framework, i.e. a method returning a String indicating a “redirect:” or a view mapping. Coupled with the DispatcherServlet’s default behaviour of maintaining request attribute collections (and, by extension, the request itself) in a thread-local context, all relevant pieces of information are in place. So the basic pattern becomes:

  • A custom annotation and an aspect with a pointcut applied to all request handler methods carrying that annotation
  • An advice at that pointcut with the ability to access the current request parameters and modify its attributes, through the thread-local context
  • The ability to override which view or redirect to continue to, before or after the controller executes
  • Any other processing you require

The same can be accomplished with interceptors, but due to configuration limitations in Spring MVC, you can’t explicitly specify at the controller level which interceptors should be applied, so I find that annotation-driven aspects are both clearer and much more flexible.

Other frameworks have different ways of enabling similar patterns. Struts 2 lets you specify the interceptor-stack by annotating your actions (in fact, it’s the core pattern of the entire framework). Many component-oriented frameworks (like Tapestry, Wicket, JSF etc) will give you secondary “hooks” into different steps of the component and request lifecycle where you can inject common functionality in a similar manner. With this pattern, you get the same power in Spring MVC.

A very simple example, a Trackable controller:

In this example, a controller wishes to register a request with some internal tracking service based on the existence of a cookie, with the option of including a username, if one exists in the session.

First, the annotation:


import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Trackable {
   String trackAction();
}

Second, annotate your controller:


import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;

@Controller
public class SomeController {

   @RequestMapping("/some/url.do")
   @Trackable(trackAction="someUrlTrackAction")
   public String someUrlHandler() {
      // return the name of the view to render (or a "redirect:..." string)
      return "someView";
   }
}

Lastly, the aspect that binds it all together:

import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletRequest;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.MethodSignature;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;
import org.springframework.web.util.WebUtils;

@Service
@Aspect
public class TrackableAspect {

   @Autowired
   private TrackingService trackingService;

   @Pointcut("@annotation(Trackable) && @annotation(org.springframework.web.bind.annotation.RequestMapping)")
   public void trackablePointcut() {
   }

   @Around("trackablePointcut()")
   public Object trackableAdvice(ProceedingJoinPoint pjp) throws Throwable {
      // RequestContextHolder is managed by the Spring MVC DispatcherServlet
      HttpServletRequest request =
            ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes()).getRequest();
      // Spring WebUtils
      Cookie cookie = WebUtils.getCookie(request, "trackingCookie");
      if (cookie != null) {
         // Get the trackAction specified on the annotation
         String trackAction = ((MethodSignature) pjp.getSignature())
               .getMethod().getAnnotation(Trackable.class).trackAction();
         String trackingCode = cookie.getValue();
         String userName = (String) request.getSession().getAttribute("userName");
         if (userName != null) {
            trackingService.trackLoggedInAction(trackAction, trackingCode, userName);
         } else {
            trackingService.trackAction(trackAction, trackingCode);
         }
      }
      // continue with normal execution
      return pjp.proceed();
   }
}

As you can see, the around-advice in this example will always proceed with normal execution (i.e. the @RequestMapping handler method), but there is nothing preventing you from skipping normal execution in your advice and instead returning some other value (which will then appear to the caller as the value returned from the advised method). As mentioned already, Spring MVC works on a convention-driven pattern, so a handler method returning a String will forward execution to the view matching that string, while a String starting with “redirect:” will cause Spring MVC to send a client-side redirect response back to the browser.
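For example, an advice could short-circuit the handler entirely when the cookie is missing. This is a purely illustrative sketch reusing the same imports as TrackableAspect above; the “redirect:/login” target is made up:

@Around("trackablePointcut()")
public Object requireTrackingCookie(ProceedingJoinPoint pjp) throws Throwable {
   HttpServletRequest request =
         ((ServletRequestAttributes) RequestContextHolder.getRequestAttributes()).getRequest();
   if (WebUtils.getCookie(request, "trackingCookie") == null) {
      // Skip the handler method entirely; the returned String is treated exactly
      // as if the controller itself had returned it, so this triggers a redirect.
      return "redirect:/login";
   }
   return pjp.proceed();
}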