Oracle index and SQL OR clause

So as it turns out, the SQL OR clause is pretty devious when it comes to index usage in Oracle.

Consider a table with two columns, let's call them A and B – both indexed together with a constant literal, since Oracle does not create index entries for rows where every indexed column is null:

CREATE INDEX MY_SCHEMA.IX_1 ON MY_SCHEMA.MY_TABLE (A, 1) TABLESPACE USERS;
CREATE INDEX MY_SCHEMA.IX_2 ON MY_SCHEMA.MY_TABLE (B, 1) TABLESPACE USERS;

Now consider the following queries

select * from MY_TABLE where A='a' or (A is null and B='b')

and

select * from MY_TABLE where A='a'
union
select * from MY_TABLE where A is null and B='b'

While logically identical, intuitively you would think that the first one is better from a performance viewpoint. Only one select, right? As it turns out, the second query is a LOT cheaper. In a similar real-world scenario I recently worked on optimizing, using an OR clause compared to a union resulted in an execution-plan with a calculated cost 10 times higher. Why?

The OR clause doesn’t always utilize indexes the way you would expect. In this case, the OR clause forced a full table-scan while the union could be executed by two index-scans.
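You can verify this yourself by comparing the optimizer's plans. Here is a sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN, with the table name from above; in the scenario described, the first plan typically shows a TABLE ACCESS FULL while the second can be served by two index scans, though what you actually get will depend on your data and statistics:

```sql
EXPLAIN PLAN FOR
SELECT * FROM MY_TABLE WHERE A = 'a' OR (A IS NULL AND B = 'b');
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

EXPLAIN PLAN FOR
SELECT * FROM MY_TABLE WHERE A = 'a'
UNION
SELECT * FROM MY_TABLE WHERE A IS NULL AND B = 'b';
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```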

So the lesson I learned is that when you are deep in the work of optimizing your database operations, the presence of an OR clause should raise a red flag. It’s worth attempting to rewrite the query as a union instead and indexing based on each separate union-query. Chances are, all of a sudden your queries will start hitting your indexes instead of causing full table-scans.


Understanding Tomcat Executor thread pooling

In the default configuration, Tomcat will always create a bounded worker-thread pool for each Connector (with max-size 200). Mostly, this is not something that you’ll need to change (other than maybe increasing the max threads to accommodate higher load). However, as I discussed in my previous post, Tomcat has a propensity for caching a lot of scaffolding objects (like PageContext and tag buffers) in thread-local context in each worker thread. Because of this, there are instances where you might want Tomcat to be able to close threads down to clean out some memory. Also, having each connector maintain its own pool makes it harder to set a firm top-limit on what load your server will accept. The answer to this is to use a shared Executor.

By having all connectors share the same executor, you can configure with more predictability how many simultaneous requests are allowed to run across your entire application. The Executor also brings the ability to have a thread-pool that can shrink as well as grow to accommodate load. At least in theory…
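A minimal server.xml sketch of a shared executor – the attribute values here are illustrative, not recommendations:

```xml
<!-- Declared inside <Service>, before the connectors that reference it -->
<Executor name="sharedPool" namePrefix="catalina-exec-"
          maxThreads="200" minSpareThreads="4" maxIdleTime="60000"/>

<!-- Each connector opts in via the executor attribute -->
<Connector executor="sharedPool" port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"/>
```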

org.apache.catalina.core.StandardThreadExecutor

The standard, built-in executor that Tomcat uses by default is the StandardThreadExecutor. The configuration is documented here: http://tomcat.apache.org/tomcat-6.0-doc/config/executor.html
The configuration options include the somewhat misnamed parameter “maxIdleTime”, and here is what you need to be aware of regarding the standard executor and closing idle threads.

The standard executor internally uses a java.util.concurrent.ThreadPoolExecutor. This works (somewhat simplified) by having a variable-size pool of worker-threads that, once they have completed a task, will wait on a blocking queue until a new task is entered – or until a set amount of time has passed, in which case the thread has “timed out” and will be closed. The crux of this is that since the first thread to complete a task will be first in line to get a new task, the pool will behave in a First-In-First-Out (FIFO) way. This is important to keep in mind when we examine how this will affect the Tomcat executor.
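The waiting-with-timeout part is easy to see in isolation. Here is a minimal sketch against the raw java.util.concurrent.ThreadPoolExecutor – the pool sizes and timings are made up for the demo and are not Tomcat's defaults:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolShrinkDemo {
    public static void main(String[] args) throws Exception {
        // Core 4, max 8, idle threads time out after 500 ms.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 8, 500, TimeUnit.MILLISECONDS, new SynchronousQueue<Runnable>());
        pool.allowCoreThreadTimeOut(true); // let even "core" threads expire when idle

        // A burst of 8 concurrent tasks forces the pool up to its max size.
        CountDownLatch done = new CountDownLatch(8);
        for (int i = 0; i < 8; i++) {
            pool.execute(() -> {
                try { Thread.sleep(100); } catch (InterruptedException ignored) { }
                done.countDown();
            });
        }
        done.await();
        System.out.println("after burst: " + pool.getPoolSize()); // 8

        // With no work at all for longer than the timeout, threads are closed.
        Thread.sleep(1500);
        System.out.println("after idle: " + pool.getPoolSize());  // 0
        pool.shutdown();
    }
}
```

Note that the shrink only happens here because NO task arrives during the timeout window – which is exactly the problem discussed next.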

maxIdleTime is really minIdleTime

Because of the FIFO behaviour of the java ThreadPoolExecutor, each thread will wait at minimum “maxIdleTime” for a new task before being eligible for closure. Moreover, again because of the FIFO behaviour of the thread pool, for a thread to be closed it is required that a period of time at least equal to maxIdleTime passes without ANY request coming in, since the thread that has been idle the longest will be first in line for a new task. The effect is that the executor will not really be able to size the pool to fit the average load (concurrent requests); rather, it will be sized according to the rate at which requests come in. This may sound like a distinction without a difference, but in terms of a web-server it’s quite significant. For example, say 40 requests come in at the same time. The thread-pool will be expanded to 40 to accommodate the load. After that, you have a period where only one request comes in at a time. Say each request takes 500 ms to complete; that means it would take 20 seconds to cycle through the entire thread-pool (remember, FIFO). Unless you have your maxIdleTime set to less than 20 seconds, the pool will continue to hold 40 threads indefinitely, even though the concurrent load is never more than 1. And you don’t want to set your maxIdleTime too low either – that will risk flapping behaviour where threads are killed too soon.

Conclusions

To get a more predictable thread-pooling behaviour that attempts to size to average load rather than to the rate of incoming requests, it would be preferable to have an executor that worked on a “Last-In-First-Out” (LIFO) basis. If the pool always assigned incoming tasks to the thread that had been idle for the SHORTEST period of time, the server would be better equipped to close down threads during periods of lower load (and in a more predictable manner). In the simplistic example above (an initial load of 40 followed by a period with a load of 1), a LIFO pool would correctly size down to 1 after the maxIdleTime period. Of course, it may not always be required (or desired) to have such an aggressive purging strategy, but if your goal is to minimize the amount of resources reserved by Tomcat, the standard executor might unfortunately not be able to do what you expect it to do for you.

Some notes on tweaking Tomcat memory usage

Firstly, the golden rule of memory optimization should always be to start in your own code. I won’t go into any deeper details on JVM tweaking; there are plenty of articles out there if you google. Suffice it to say that if you find yourself tweaking around with GC strategy, you have dug too deep. The JVM will pick the strategy that it deems best for your HW config, and if you feel a need to change it, your app is probably badly designed in some way. Only once have I encountered a situation where changing GC strategy was the only solution (it involved a huge Lucene index and limited memory), but in general it should never be required.

Some common indications of excessive memory usage include:

  1. OutOfMemory error “Java heap space”
    Simple enough, your application requires more memory than you actually have allocated
  2. OutOfMemory error “GC overhead limit exceeded”
    Often comes hand in hand with the “Java heap space” error; it’s very similar but subtly different. In short, it means that while the JVM perceives that it may have enough memory in absolute terms, the rate of consumption is so high that the time it has to spend garbage collecting leaves it with no time to actually execute your application. If this is the prevailing error you are getting, it’s an indication that you have a combination of lots of (and/or very big) long-lived objects in memory and a high rate of consumption of short-lived objects. This will force frequent and long-running full GC cycles, since the “old gen” heap will be near full and “young gen” will reach its limit in very short cycles.
  3. Long periods when the system “freezes” up for full GC cycles
    Often tied to a high memory consumption rate (lots and lots of short-lived objects) and not enough CPU cores. The collector generally seems to favour dropping short-lived, de-referenced objects down to the “old gen” in the small cycles rather than throwing them out, letting the full GC cycle take care of them. Adding more memory to the JVM may decrease the frequency of these “freezes”, but unless you back that with more CPU power, the length of each “freeze” will increase. In my personal opinion, a server-grade machine should always have more than 2 CPU cores, to at least be able to leverage the Parallel GC in the JVM.
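If you suspect the third symptom, GC logging will show you how often and for how long the collector runs. For the HotSpot JVMs of this era (Java 6), the usual flags are:

```
JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
```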

With this in mind, there are some configurations available in Tomcat that do have their specific usages and I thought I’d share my experiences/thoughts about a few of them. This applies primarily to Tomcat version 6 as that is where I have done the most experimenting and analysis.

org.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true

This is a tomcat system property (http://tomcat.apache.org/tomcat-6.0-doc/config/systemprops.html). The documentation on it refers to the “tag buffer”, which may or may not mean anything to you. If not, this is what it is, in short:

Each time you use a taglib tag on your JSP page, like a <c:set> JSTL tag or really any tag (except the <jsp:*> tags, as they are not really “tags” as such and are handled differently), a char buffer will be set up to receive the body of that tag. The Servlet/JSP specs demand that there be no limit to the size of a tag’s body, so this buffer can grow infinitely. Additionally, if you nest a tag inside another tag, an additional buffer is set up for the nested tag, and so on. These buffers are all maintained in a stack in the PageContext, but never actually dereferenced. Because of this, all these character buffers will continue to live and be re-used by later requests.

What LIMIT_BUFFER does is to force Tomcat to discard the buffer before each usage if it’s larger than the default size (512 characters) and allocate a new buffer of the default size (which may of course grow if it’s not enough to handle the tag body).
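Being a plain JVM system property, one way to set it (assuming the standard setenv.sh mechanism in $CATALINA_HOME/bin) is:

```
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.BodyContentImpl.LIMIT_BUFFER=true"
```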

When is this relevant

This is mainly an issue if you have tags with very large bodies. For instance, HTML fragment-caching tags (like oscache) or taglib-based templating frameworks (like tiles). If this is not the case, the sum of these buffers will be negligible. For instance, 400 worker-threads with an average tag nesting depth of 3 = 400*3*512 =~ 614 KB. But say you’re using tiles and you have a page that’s 1 MB large and 5 levels of templates. Then you’re looking at 2 GB of memory indefinitely allocated. And you have to consider the worst case since eventually, every worker-thread will have served that one large page at least once, and without LIMIT_BUFFER, once those buffers have been sized up, they will never size down.
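The back-of-the-envelope numbers above, spelled out. Note that these are character counts; since Java chars are two bytes, the actual byte figures are roughly double:

```java
public class TagBufferMath {
    public static void main(String[] args) {
        // Modest case: 400 worker-threads, average nesting depth 3,
        // default buffer size of 512 chars.
        System.out.println(400L * 3 * 512);       // 614400 (~614 K chars)

        // Worst case: every one of 400 threads has at some point served
        // a 1 MB page through 5 levels of templates.
        System.out.println(400L * 5 * 1_000_000); // 2000000000 (~2 G chars)
    }
}
```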

org.apache.jasper.runtime.JspFactoryImpl.USE_POOL and POOL_SIZE

Two other system properties which have similar implications as LIMIT_BUFFER. Each worker thread (the threads that handle the actual requests in Tomcat) will, by default, pool PageContext objects for use on JSP pages in a thread-local context. As I mentioned in connection to LIMIT_BUFFER, the tag buffers are maintained in a stack in the PageContext. By setting USE_POOL=false, Tomcat will discard a PageContext after the request it served is completed and create a new PageContext for each new request. Since this will, in effect, throw away all the tag buffers as well, the implications are very similar.
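These are likewise set as system properties; a sketch for setenv.sh (POOL_SIZE caps the number of PageContexts pooled per thread):

```
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.JspFactoryImpl.USE_POOL=false"
# or, to keep pooling but cap it per thread:
CATALINA_OPTS="$CATALINA_OPTS -Dorg.apache.jasper.runtime.JspFactoryImpl.POOL_SIZE=4"
```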

Why pool PageContexts on a per-thread basis to begin with, you might ask yourself? Simply because each time you do an include or a forward, a new PageContext is required to handle the included page, even though it’s handled by the same thread as the “top level” request. This is analogous to nesting tags – each nesting requires a new buffer (“context”) in which to live. Because of this, one request may require more than one PageContext, so Tomcat pools them so it doesn’t have to recreate a bunch of new ones all the time.

When is this relevant

Sizing the pool is relevant primarily if your application relies heavily on dynamic includes (<jsp:include> or <c:import>) and you have many levels of “includes within includes”. In combination with large tag bodies, tweaking the pool size (or not pooling at all) can have a very significant impact on memory-usage.

Conclusions

There is definitely a trade-off to consider with both these settings. By limiting the pooling and size of the tag buffers, you reduce the amount of “base” memory needed, but at the cost of more garbage collection and re-allocation of objects. My own experiments with a real-life, tiles-based application gave me pretty much exactly the type of behaviour I expected. Limiting buffers vastly reduced the amount of “base memory” needed (that is, the amount of heap still used after a full GC), but it increased the CPU load significantly and reduced overall performance. In my case, the average response time nearly doubled. So if you’re going to limit buffer usage, be prepared to back that up with either more CPU cores or more servers in your cluster, if you are near your load limit to begin with.

However, “more memory, more CPU” may not always be an option. If you are forced into (or prefer) 32-bit environments, memory will be your bottleneck. Some cloud services (like Amazon AWS) charge significantly more for 64-bit instances, and CPU is cheaper to scale than memory on AWS as well. In other scenarios, you may have an application that already uses a lot of base memory (like big caches, or indexes like Lucene), so having Tomcat eat up all your heap is simply not an option.