Aarrgghh!!

ColdFusion 8 Monitoring Heisenberg Errors

I ran into my first inexplicable crash that I eventually traced back to the ColdFusion Server Monitor. Now first off, this isn't a problem or bug with the Server Monitor. This is to be expected. The Server Monitor adds overhead to requests, and if you have an intense process, it's going to generate a lot of monitoring data. It's possible that you might reach its limit.

I just wanted to let people know what a crash caused by the monitoring service looks like, because it doesn't give you a message that says "You have left the monitoring service on in production!"

I had a long-running, complicated process crashing on my local workstation. It did work on our communal development server, so it wasn't just the process itself. I thought maybe it was that my laptop wasn't a server-class machine. But actually, the virtual machine that we were testing on wasn't tremendously more powerful.

The browser session would error out with a message that said:

500

Java heap space

java.lang.OutOfMemoryError: Java heap space

After digging in the JRun logs for a while, I found this:

javax.servlet.ServletException: ROOT CAUSE:

java.lang.OutOfMemoryError: Java heap space

at coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:70)
at coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
at jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
at jrun.servlet.FilterChain.service(FilterChain.java:101)
at jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
at jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
at jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:284)
at jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
at jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
at jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
at jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
at jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
at jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)

java.lang.OutOfMemoryError: GC overhead limit exceeded

Of course I didn't bother actually reading this error until just now when I copied and pasted it. It clearly indicates that the problem is in the Monitoring Servlet Filter. In any case, after much trial and error, I turned off memory tracking and then turned off profiling. Once I turned off profiling the error went away.
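If you would rather watch the heap than find out by trial and error, the JVM exposes its own usage numbers. The following is a minimal Java sketch, not something from ColdFusion or its Server Monitor: it reads heap usage through the standard java.lang.management API. The class name HeapSnapshot and its helper methods are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSnapshot {
    // Returns current heap usage as reported by the JVM's standard memory MXBean.
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Formats the usage in megabytes (hypothetical helper for this sketch).
    static String describe(MemoryUsage usage) {
        long mb = 1024L * 1024L;
        // getMax() returns -1 when the maximum heap size is undefined.
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / mb) + " MB";
        return "used=" + (usage.getUsed() / mb) + " MB"
                + ", committed=" + (usage.getCommitted() / mb) + " MB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(describe(heap()));
    }
}
```

Printing a snapshot before and after the heavy part of a process gives a rough idea of how close it gets to the limit.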


June 24, 2007 Posted by Terrence Ryan at 6:51 PM

ColdFusion, Web Development



Comments

The fact that the OutOfMemoryError is thrown from the MonitoringServletFilter does not mean that monitoring is the root cause. The MonitoringServletFilter is the "perimeter" of the monitoring system - when exceptions are thrown from within CF, they're caught there, logged by monitoring, and rethrown. I would suggest you trace down the logs some more, and you'll probably find an entry beneath the one for this exception indicating the root cause exception. And do keep in mind that OutOfMemoryErrors occur when, well, the JVM is out of memory - is there any possibility that your application is creating objects, and not throwing them away, eating all the JVM memory? Also, as we've noted before, do not run production systems with Memory Tracking on - that can quickly bring a server to its knees. If neither of these is a potential root cause, do drop me a mail with more details, and we'll look into it ASAP.
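To illustrate the "perimeter" point, here is a minimal Java sketch. It is not ColdFusion's actual MonitoringServletFilter code; the names PerimeterDemo, handleRequest, perimeterFilter, and rootCause are hypothetical. It shows how a wrapper that catches, logs, and rethrows ends up as the top frame of the reported exception even though the failure started deeper down.

```java
import java.util.ArrayList;
import java.util.List;

public class PerimeterDemo {
    // Hypothetical stand-in for application code that fails deep inside a request.
    static void handleRequest() {
        throw new IllegalStateException("failure inside application code");
    }

    // Hypothetical perimeter: catches what escapes, records it, and rethrows it wrapped.
    static void perimeterFilter(List<String> log) throws Exception {
        try {
            handleRequest();
        } catch (RuntimeException e) {
            log.add("monitor saw: " + e.getMessage());
            // The wrapper is created here, so its stack trace starts in this method.
            throw new Exception("ROOT CAUSE: " + e, e);
        }
    }

    // Walks the cause chain down to the innermost exception.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        try {
            perimeterFilter(log);
        } catch (Exception wrapped) {
            System.out.println("outer frame: "
                    + wrapped.getStackTrace()[0].getMethodName());
            System.out.println("root cause frame: "
                    + rootCause(wrapped).getStackTrace()[0].getMethodName());
        }
    }
}
```

The outer exception points at the perimeter method, while walking the cause chain leads back to the code that actually failed, which is why it pays to keep reading past the first entry in the log.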


Posted by: Ashwin at June 25, 2007 1:01 AM

Aarrgghh Guy! Actually, Ashwin, creating many, many objects and holding them over the course of one request was EXACTLY what I was doing. But with profiling and memory monitoring turned off, I was giving myself more rope?

In any case, my goal here wasn't to snipe at CF monitoring. It was to point out what it looks like if you're doing something crazy that pushes monitoring to the point where it breaks.




Posted by: Terrence Ryan at June 25, 2007 2:21 AM

Yep, definitely plenty of rope there! ;) Going by our testing, profiling is safe to use in production, but as I noted, memory tracking could kill a server, especially if it's creating too many objects. Try your test with memory tracking turned off, and let us know what happens. I didn't at all mean to suggest that you were sniping at CF monitoring - just providing the background so you know why the stacktrace for the error looks the way it does.


Posted by: Ashwin at June 25, 2007 2:35 AM

Aarrgghh Guy! I definitely tried it with memory tracking turned off, and profiling turned on. It still crashed.

What can I say? I was doing weird stuff.




Posted by: Terrence Ryan at June 25, 2007 2:44 AM

How exactly do you turn off memory tracking and profiling? I have this exact problem on a clustered pair of 2x Servers with 8GB of RAM each :(


Posted by: Mike Faulkner at January 4, 2008 7:20 PM

Aarrgghh Guy! In the CF administrator, go to Server Monitoring, then Launch Server Monitor. Up at the top there should be 3 options that say Stop Monitoring, Stop Profiling, and Stop Memory Tracking. Turn them off.

However, these are turned off by default, and if you have never turned them on, this error could be caused by something else, per Ashwin's comment earlier in this thread.




Posted by: Terrence Ryan at January 7, 2008 10:18 AM

Thanks for posting this. I have now found warnings to this effect buried in the user documentation, but it seems to me incumbent on Adobe to post this warning in big red letters on the Server Monitor screen so it's clear to everyone that it should not be kept running on a production server. At CFUnited in June, there were a lot of Adobe people generating excitement about the Server Monitor, but no mention of its dangers. Of what use, exactly, is the Server Monitor if it can't run on production? This is a blow to my confidence in Adobe products.


Posted by: Rebecca Younes at January 8, 2008 10:16 AM

Aarrgghh Guy! Well, there are a lot of things you can do in the CF administrator to really screw up the server. None of them get the same treatment. I think that Adobe acts responsibly here in that they don't install CF with the monitoring running.

I do think mention of these dangers should be included in future documentation and in guides to setting up ColdFusion, but in reality the load burden of monitoring only comes up on heavily trafficked sites or, as in the case I discuss above, on very complex sites.




Posted by: Terrence Ryan at January 14, 2008 11:58 AM

We had a similar problem with Fusebox applications on our servers. We finally did several thread dumps and determined that there were locking issues. Turning off all server monitoring functions cleared the problems instantly.


Posted by: Rob Commarota at May 20, 2008 11:29 AM

If you have an object that has a lot of objects created in its variables scope, you may want to try this.

After you are done with that object, clean up so it can be garbage collected, for example:

structDelete( variables, "objOrder" );

This combined with turning off the monitoring as listed above solved our problem.
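For anyone wondering why the delete helps: on the JVM, an object only becomes eligible for garbage collection once nothing reachable still refers to it. Below is a minimal Java sketch of the same idea, with a plain HashMap standing in for the variables scope. The names ScopeCleanupDemo, variables, buildOrder, and release are hypothetical, and this is an analogy, not how ColdFusion implements its scopes.

```java
import java.util.HashMap;
import java.util.Map;

public class ScopeCleanupDemo {
    // Hypothetical stand-in for a variables scope: names mapped to object references.
    static final Map<String, Object> variables = new HashMap<>();

    // Hypothetical stand-in for a large object (1 MB of data).
    static byte[] buildOrder() {
        return new byte[1024 * 1024];
    }

    // Drops the reference held under the given name; returns true if one was held.
    // After this, the object can be collected if nothing else refers to it.
    static boolean release(String name) {
        return variables.remove(name) != null;
    }

    public static void main(String[] args) {
        variables.put("objOrder", buildOrder());
        System.out.println("held before release: " + variables.containsKey("objOrder"));
        release("objOrder");
        System.out.println("held after release: " + variables.containsKey("objOrder"));
    }
}
```

Removing the entry only makes the object eligible for collection. When the memory is actually reclaimed is up to the garbage collector.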


Posted by: Ryan Duckworth at August 28, 2008 6:17 PM

If you're looking for a server monitor that you can run in ColdFusion production environments, you may want to check out FusionReactor. It is compatible with CF 6, 7, and 8.


Posted by: David at February 15, 2009 8:30 AM

FusionReactor is great, but it lacks the memory tracking which is very useful if you have problems with all your memory being used up. At least I thought it was until I discovered that feature actually causes memory problems itself, so now I am back at square one for finding the cause of my problem.


Posted by: Russ at March 9, 2009 2:56 PM

