Working with Java Web Servers
September 04, 2010
Having worked with a number of different Java web servers, I've run into quite a few 'gotchas' over the years. I'm going to go over some of them here to help myself remember them. I might add to this list in the future, so this page may be a little disjointed, with some 'high-level' items and some pretty 'low-level' ones. However, I'll try to be as clear as I can, and hopefully this will help somebody else out as well.
1. Working with 3rd Parties. If you (or your project manager) have promised the business 99.99% uptime for your website, you must be absolutely sure that any external services (3rd parties) that your web application calls guarantee better than that (99.999% uptime, say). If part of your application is dependent on information from a third party, and they are not consistently responding to requests, you won't be responding to your users either. Have good SLAs (Service Level Agreements) in place with your third-party services that specify procedures to follow in the event of a service interruption, the responsibilities of each business at all times, and even penalties for poor service or lack of proper procedure.
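To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes every request needs the third party to answer and that outages on each side are independent, which is a simplification. The class and method names are made up for illustration.

```java
// Sketch: availability of a request path that hard-depends on external services.
// Assumes independent failures and that every dependency is needed on every request.
class CompositeAvailability {

    /** Multiplies the availabilities (fractions in 0..1) of components called in series. */
    static double inSeries(double... availabilities) {
        double result = 1.0;
        for (double a : availabilities) {
            result *= a;
        }
        return result;
    }

    /** Converts an availability fraction to expected minutes of downtime per 365-day year. */
    static double downtimeMinutesPerYear(double availability) {
        return (1.0 - availability) * 365 * 24 * 60;
    }

    public static void main(String[] args) {
        double ownStack = 0.9999;     // what your own servers achieve (assumed figure)
        double thirdParty = 0.99999;  // what the third party guarantees
        double combined = inSeries(ownStack, thirdParty);
        System.out.printf("combined availability: %.6f%n", combined);
        System.out.printf("expected downtime: %.1f minutes/year%n",
                downtimeMinutesPerYear(combined));
    }
}
```

Under those assumptions the combined figure always comes out lower than either input, so every hard dependency eats into the uptime you promised.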
2. Transaction Management. Sooner or later you will run into some kind of issue related to how a transaction or set of transactions is managed in your code. Keep an eye out for calls to 3rd party services inside a transaction. A slowly responding 3rd party service nested inside a transaction can bring your web application to a grinding halt, because the transaction holds on to its database connection and locks for as long as the call takes, especially if it's part of the critical process flow of the application (like a signup). Try to atomize transactions as much as possible, keeping each one small and short-lived. Sometimes complex business rules make this difficult, so ensure that someone (your project manager?) can talk with the business to work out solutions with them.
One poorly crafted database query nested in a transaction can be just as detrimental to your web app's performance. Make sure you've either done some query profiling or had a DBA review your query to ensure optimal performance.
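One way to keep a slow third party out of your transactions is to make the remote call first, with a time limit, and open the transaction only once you have the answer (or a fallback). This is a rough sketch of that idea, not code from our application. The class name, the fallback values and the commented-out transaction calls are all hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound the wait on a remote call, then run a short transaction afterwards.
class BoundedRemoteCall {

    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "remote-call");
        t.setDaemon(true); // do not keep the JVM alive for abandoned calls
        return t;
    });

    /** Runs the call, returning the fallback if it fails or exceeds the timeout. */
    static <T> T callWithTimeout(Callable<T> call, long timeoutMillis, T fallback) {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker; we stop waiting either way
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the remote call itself threw
        }
    }

    public static void main(String[] args) {
        // Stand-in for a slow third-party lookup (hypothetical).
        String result = callWithTimeout(() -> {
            Thread.sleep(2_000);
            return "APPROVED";
        }, 200, "PENDING");

        // Only now would the transaction begin, using the value already in hand:
        // begin(); insertSignup(user, result); commit();
        System.out.println("third-party result: " + result);
    }
}
```

Whether a fallback value is acceptable is a business decision, which is one more reason to have someone who can talk those rules through with the business.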
3. Database Tuning. As soon as your application has a database of any decent size, you need to ensure you have indexes in the right places. A lack of indexes in a big database can bring your web application to its knees. You can throw all the hardware at it you want, but the slow performance isn't going to go away until you add some meaningful indexes. Also, database auditing tools can have a large negative impact on performance in production, especially when they're targeting key tables. Take this into account whenever you change an audited table. Auditing tools have been known to be somewhat grumpy about having to re-index/cache a new column on the fly.
4. Data Archiving. Somewhat related to the last point, if your application gets to be large enough, you'll have to think about archiving some of your data. In other words, taking some of your production data that isn't accessed very often anymore - data that is 6 months to a year old, for example - and moving it to a database outside your production environment. There it can be used by your business intelligence team for reporting in conjunction with your production data, without hampering performance in production. The business should determine how long data needs to stay in production. Once you communicate the new policy to your end users, they'll be zipping through your web application at light speed, now that they aren't dredging up records from 18 months ago.
5. Memory Management. Java applications have a reputation for being memory hogs. So it's no surprise that many web developers end up learning how to set the min and max heap size for their JBoss or Tomcat servers in the JAVA_OPTS property. If you find yourself running out of memory and you haven't looked at these settings yet, you should consider it. They can be overridden in the setclasspath.sh file in the
*************************UPDATE TO THIS ADVICE ABOVE!!!****************************
Most of these servers ship with preset (default) Xms and Xmx settings (the min and max heap size settings, respectively). It turns out that if you set these too high, you can run into Out of Memory exceptions as well, even if your server seems to have enough memory. The best way we found to solve this problem is to take the min and max heap size settings right out of the config - Don't define any!! This way the JVM chooses its own heap sizes based on the machine it is running on. This has worked very successfully for us.
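If you want to see what heap limits the JVM actually ended up with, with or without explicit settings, a tiny program like this will tell you. It's just a sketch using the standard Runtime class. The class name is made up.

```java
// Sketch: print the heap limits the running JVM is actually using.
class HeapReport {

    static long toMegabytes(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): upper bound the heap may grow to (the max heap setting, or the JVM's own default)
        System.out.println("max heap   (MB): " + toMegabytes(rt.maxMemory()));
        // totalMemory(): heap currently reserved by the JVM
        System.out.println("total heap (MB): " + toMegabytes(rt.totalMemory()));
        // freeMemory(): unused part of the currently reserved heap
        System.out.println("free heap  (MB): " + toMegabytes(rt.freeMemory()));
    }
}
```

Run it with the same java binary and the same options your server uses, otherwise the numbers won't mean much.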
Which JDK you use can make a huge difference as well. Sun's JDK (1.4 through 1.6) has been known to core dump if you're running a significant load and your servers have large app contexts or classpaths. The easiest solution to this is to switch to BEA's JRockit JDK. If you don't want to do that, you'll have to mess around with the -XX:MaxPermSize=256m JVM param to make things happy. While this setting can help, it can be pretty finicky. Increasing the value won't always work - sometimes decreasing it (128m) helps.
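Before fiddling with the permanent generation size, it helps to see how full that area really is. The sketch below lists every memory pool the JVM reports through the standard java.lang.management API. Pool names vary by JVM and garbage collector: on the Sun JDKs discussed here, look for a pool with 'Perm Gen' in its name, while Java 8 and later report 'Metaspace' instead. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Sketch: list the JVM's memory pools with their current usage and limits.
class MemoryPools {

    /** Formats one pool as "name [type] used=... KB, max=...". */
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            return pool.getName() + ": (pool is no longer valid)";
        }
        long max = usage.getMax(); // -1 means the maximum is undefined
        String maxText = max < 0 ? "undefined" : (max / 1024) + " KB";
        return pool.getName() + " [" + pool.getType() + "] used="
                + (usage.getUsed() / 1024) + " KB, max=" + maxText;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(describe(pool));
        }
    }
}
```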
6. IO Management. Depending on how you're managing your sessions and/or clustering, you may find that some of the server's IO settings need to be modified. File handles are a good case in point. I've run into situations before where we've had to increase the allowed number of open file handles (using ulimit -n 4096) in order for our Tomcat servers to run efficiently. Running ulimit -a will show you what your current system settings are.
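You can also ask the JVM itself how close it is to the file handle limit. This sketch relies on com.sun.management.UnixOperatingSystemMXBean, which is specific to Sun/Oracle-style JVMs on Unix-like systems, so treat it as an assumption about your environment. On other platforms the method simply returns null. The class name is made up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch: report open vs. maximum file descriptors for the current JVM process.
class FileHandleCheck {

    /** Returns {open, max} descriptor counts, or null when the JVM does not expose them. */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null; // not a Unix-style JVM, or the vendor bean is unavailable
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("file descriptor counts are not available on this JVM");
        } else {
            System.out.println("open file descriptors: " + counts[0] + " of " + counts[1]);
        }
    }
}
```

The maximum it reports is the limit the process inherited when it started, so a ulimit change only shows up after the server is restarted from a shell with the new limit.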
7. mod_jk settings. When configuring an Apache web server to communicate with a Tomcat server using mod_jk and the AJP connector, be aware of how many connections you're allowing through the Apache server vs. the configuration for the AJP connector in Tomcat's server.xml file. If you do not set the 'maxProcessors' attribute (which is deprecated in 5.5.x) or the 'maxThreads' attribute in the AJP connector element of the server.xml file, it defaults to 200 connections. If your Apache web server is set to 256 threads, it can send more concurrent requests than Tomcat has threads to serve, and there's potential for thread blocking/locking to happen on your Tomcat box. Sure signs of this are log entries in the catalina.out file that say something like 'max threads (200) reached. Increase maxThreads size..', or a netstat -anp | grep 8009 that shows numerous connections in a SYN_SENT or SYN_RECV state.
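A quick sanity check is to read the AJP connector's maxThreads out of server.xml and compare it with the Apache limit. The sketch below does that with the JDK's built-in XML parser. The sample XML is a cut-down stand-in for a real server.xml, the Apache figure is hard-coded from the example above, and the class and method names are made up.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: compare the AJP connector's thread limit with the Apache connection limit.
class AjpThreadCheck {

    static final int DEFAULT_MAX_THREADS = 200; // the default quoted in the text

    /** Returns maxThreads of the first AJP connector, the default if unset, or -1 if none exists. */
    static int ajpMaxThreads(InputStream serverXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(serverXml);
        NodeList connectors = doc.getElementsByTagName("Connector");
        for (int i = 0; i < connectors.getLength(); i++) {
            Element connector = (Element) connectors.item(i);
            // getAttribute returns "" when the attribute is absent
            if (connector.getAttribute("protocol").startsWith("AJP")) {
                String value = connector.getAttribute("maxThreads");
                return value.isEmpty() ? DEFAULT_MAX_THREADS : Integer.parseInt(value);
            }
        }
        return -1;
    }

    /** True when Apache may open more connections than Tomcat has AJP threads. */
    static boolean apacheCanOverrun(int apacheLimit, int tomcatMaxThreads) {
        return apacheLimit > tomcatMaxThreads;
    }

    public static void main(String[] args) throws Exception {
        // Cut-down sample; in practice you would open the real server.xml file.
        String xml = "<Server><Service name=\"Catalina\">"
                + "<Connector port=\"8009\" protocol=\"AJP/1.3\" redirectPort=\"8443\"/>"
                + "</Service></Server>";
        int tomcatThreads = ajpMaxThreads(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        int apacheLimit = 256; // the Apache figure used in the example above
        System.out.println("AJP maxThreads: " + tomcatThreads);
        System.out.println("Apache can overrun Tomcat: "
                + apacheCanOverrun(apacheLimit, tomcatThreads));
    }
}
```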
8. Favicon issue. In case you aren't sure what a favicon is, it's the little picture (icon) that shows up on some sites on the left-hand side of the address bar. It is also displayed in your 'favorites' or your 'bookmarks', depending on the browser you're using. It turns out that favicons are a little buggy in IE, and seem to work better in Firefox. However...
It turns out that the favicon is requested by the browser and can, in some cases, overwrite the cookie for your website with a new cookie that makes the user lose their session. We've only seen this on logins (because after that the favicon cookie is cached and a new one isn't required). It was a pretty big issue for one of our customers, though. I should also mention that we have only seen this behaviour when we're using Apache rewrite rules to map a particular Tomcat webapp context to something other than /ROOT.
We discovered this was the issue by inspecting all the requests from the browser for the login page using a tool called Http Analyzer. Another popular, free network protocol analyzer we could have used is Ethereal (now called Wireshark). By inspecting the cookies we saw that the favicon's cookie was actually different from the cookie for the login page itself. That was also the cookie that was getting to the app server. To get around this, there is an attribute in the AJP connector element (mod_jk) in Tomcat's server.xml file called emptySessionPath which you can set to true. This sets the path of all session cookies to /. However, the Tomcat documentation warns that it can greatly affect performance. We have yet to test this under load. I'll let you know how that goes soon.
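To see why the cookie path matters here, this is a small sketch of how a browser decides whether a cookie goes along with a request. It follows the path-match rule as later written down in RFC 6265. Browsers of the day followed older cookie specs, so take it as an approximation. The /myapp context is a made-up example.

```java
// Sketch: the cookie path-match rule (after RFC 6265, section 5.1.4).
class CookiePathMatch {

    /** True when a cookie scoped to cookiePath would be sent with a request for requestPath. */
    static boolean pathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (!requestPath.startsWith(cookiePath)) {
            return false;
        }
        // Prefix match only counts on a path-segment boundary.
        return cookiePath.endsWith("/")
                || requestPath.charAt(cookiePath.length()) == '/';
    }

    public static void main(String[] args) {
        String[] cookiePaths = { "/myapp", "/" }; // hypothetical context path vs. root
        String[] requests = { "/myapp/login", "/favicon.ico" };
        for (String cookiePath : cookiePaths) {
            for (String request : requests) {
                System.out.println("cookie path " + cookiePath + ", request " + request
                        + " -> sent: " + pathMatches(cookiePath, request));
            }
        }
    }
}
```

A session cookie scoped to the webapp's context path is not sent with a request for /favicon.ico, so the server treats that request as a new visitor. With the cookie path set to /, which is what emptySessionPath does, the same cookie accompanies every request.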
9. The three finger salute. For those of you who don't know, performing the three finger salute (Ctrl-Alt-Delete) on Unix/Linux boxes configured a certain way will reboot the box. I have seen a production server get rebooted by an ISP because this command was not disabled. You need to have root access on the box to be able to perform this command, and you also need to have root access to disable it. To disable it on an RHEL3.0 server, comment out the following line in the /etc/inittab file: 'ca::ctrlaltdel:/sbin/shutdown -t3 -r now'
10. Decompiling Classes or viewing Source Code. Sometimes it's very handy to be able to decompile Java classes that you've received in jars from other companies - not for malicious purposes, but so you can understand how the code works. Jad is a free command line Java decompiler that I've used in the past. With open source projects, it's just a matter of downloading the source code and poking around. I've seen situations with the JBoss server, for example, where the documentation says one thing, support is saying another, and when you poke into the code yourself, you actually get a third (true) picture of what is going on.
11. Shutting down a hurting server. One little trick we've found very helpful when we've got a server that's performing erratically is to run a kill -3 <serverPID> before we shut it down. This sends the JVM a SIGQUIT, which doesn't stop the process; on Tomcat servers, it writes a thread dump to the catalina.out log. While this thread dump is sometimes a lot to digest, it does provide valuable information about what each thread was doing, and where it was stuck, at a lower level. Lately, this trick has been helping us debug class loader locking issues we've been experiencing.
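If you can't get at the process with kill, much the same information is available from inside the JVM through the standard ThreadMXBean (Java 6 and later). This is only a sketch of how one might expose it, say from an admin page. The class name is made up, and ThreadInfo's built-in formatting prints only a limited number of stack frames per thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: build a thread dump from inside the JVM, including any detected deadlocks.
class ThreadDump {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Ask for lock details only where the JVM supports them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            out.append(info); // name, state, held/awaited locks and top stack frames
        }

        long[] deadlocked = synchronizers
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        if (deadlocked != null) { // null means no deadlock was found
            out.append("Deadlocked thread ids:");
            for (long id : deadlocked) {
                out.append(' ').append(id);
            }
            out.append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```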
I hope you enjoyed this little overview. More to come in the future. If you have questions or comments, please send them to me @ perry.mckenzie@netfocusconsulting.com.