digerata
02-10-2009, 08:44 AM
We are currently stress testing a simple application that consists of essentially:
INSSL -> WEB5 -> NET
(There is also a monitor and a second INSSL for the monitor)
The WEB5 is the branched WEB5 + extra PHP I spoke of in other posts. I am running 2.4.7 e2677.
We are using LoadRunner from Mercury Interactive to push as many requests to the website on this application as possible to see its breaking point.
With all sorts of Apache tuning and resource tuning, the most active requests I can get the application to handle is 127 (give or take a couple). We are sending 250 virtual users at the application, each making a request at a random interval between 5 and 60 seconds. When the Active Requests number reaches the 120s, the whole thing comes to a screeching halt.
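As a rough sanity check on that load, Little's law (concurrency = request rate x time each request holds a worker) gives a ballpark for how many workers 250 users should keep busy. This is only a sketch: it assumes the wait between requests is uniformly distributed between 5 and 60 seconds, that there is no queueing, and that `hold_seconds` (how long one request ties up a worker, including any keep-alive wait) is a guessed input rather than a measured value.

```python
def expected_busy_workers(users, min_wait, max_wait, hold_seconds):
    """Estimate concurrently busy workers for a closed population of users.

    Each user repeats a cycle of (wait, then hold a worker for hold_seconds),
    so the busy fraction per user is hold / (mean_wait + hold).
    """
    mean_wait = (min_wait + max_wait) / 2.0
    # Requests per second across all users (Little's law: L = rate * hold).
    requests_per_second = users / (mean_wait + hold_seconds)
    return requests_per_second * hold_seconds


if __name__ == "__main__":
    # hold_seconds values are hypothetical; substitute measured numbers.
    for hold in (0.5, 5.0, 30.0):
        busy = expected_busy_workers(250, 5, 60, hold)
        print(f"hold={hold:5.1f}s -> about {busy:.0f} busy workers")
```

If the measured hold time per request is short, an estimate far below the observed 120s would suggest requests are being held open much longer than they should be, which is worth knowing before blaming any single component.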
I'm not asking for help in tuning Apache here. What I'm asking for is help in eliminating the grid and the appliances from the equation.
You see, no matter how much bandwidth I add to INSSL and WEB5, that number above (127) never changes. When we get to that point, the Apache counters in the monitor application stop refreshing (because the server-status GET request never returns). Additionally, the application stops serving all requests. The lack of response is seen from multiple locations that attempt to connect, not just the location of the test driver, so the server definitely stops serving new requests. The CPU of WEB5 is hovering at 1%. The RAM of WEB5 is at around 60% utilization. Disk I/O against the content volume is well within normal usage (far south of 1 Mb per second). The whole website will fit into memory. There is no PHP being used for this test; it is all static HTML files and Flash content. (We are just trying to get a baseline of performance before kicking on the PHP site.)
Here is a screen shot of the monitor when the application stops serving new requests:
http://flowz.com/screens/pz-load-test.png
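To pin down the exact moment the server-status request stops returning, independently of the monitor appliance, something like the following could poll mod_status directly with a timeout. It is a minimal sketch: it assumes mod_status is reachable and serving the machine-readable `?auto` format, and the URL in the usage note is a placeholder, not a real address from this setup.

```python
import urllib.request


def parse_status(text):
    """Parse mod_status '?auto' output into a dict of 'Key: value' pairs."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def poll_status(url, timeout=5.0):
    """Return (busy, idle) worker counts, or None if the request fails or times out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            fields = parse_status(resp.read().decode("ascii", "replace"))
    except OSError:  # URLError and socket timeouts are OSError subclasses
        return None
    # -1 marks a field that was missing from the response.
    return int(fields.get("BusyWorkers", -1)), int(fields.get("IdleWorkers", -1))
```

Calling `poll_status("http://web5.example/server-status?auto")` once a second during the test and logging the last result before it starts returning `None` would show the busy worker count at the stall. Running the same poll against WEB5 directly and through INSSL might also help show which hop stops answering first.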
That sounds like a classic case of MaxClients and ServerLimit misconfiguration, right? Well, the appliance already has them set to 256. Just to be on the safe side I upped that to 512 and then, later, 1024. (I verified in our error log that Apache did not complain about requiring a recompile to take values above 256. But regardless, we are dying well before 256.) This had no effect on further tests; everything died at 127. I've dug into just about every performance setting and there is no change: we always die at 127. Here are the main ones:
(Using Prefork MPM and adjusting that config, not worker MPM config)
StartServers 32
MinSpareServers 20
MaxSpareServers 80
ServerLimit 1024
MaxClients 1024
MaxKeepAliveRequests 500
KeepAlive on
KeepAliveTimeout 5
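For completeness, the directives above can be checked mechanically for the usual prefork inconsistencies. The sketch below is an illustration only: the helper names are made up, the fallback defaults are the documented prefork defaults, and the checks cover just a few well-known rules (MaxClients is capped by ServerLimit, and MinSpareServers should be below MaxSpareServers).

```python
def parse_directives(text):
    """Turn 'Name value' lines into a dict keyed by lowercased directive name."""
    directives = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and not parts[0].startswith("#"):
            directives[parts[0].lower()] = parts[1].strip()
    return directives


def check_prefork(d):
    """Return a list of warnings for common prefork inconsistencies."""
    warnings = []
    max_clients = int(d.get("maxclients", 256))
    server_limit = int(d.get("serverlimit", 256))
    if max_clients > server_limit:
        warnings.append("MaxClients exceeds ServerLimit; Apache will lower it")
    if int(d.get("minspareservers", 5)) >= int(d.get("maxspareservers", 10)):
        warnings.append("MinSpareServers should be below MaxSpareServers")
    if int(d.get("startservers", 5)) > max_clients:
        warnings.append("StartServers exceeds MaxClients")
    return warnings


# The values posted above.
CONFIG = """\
StartServers 32
MinSpareServers 20
MaxSpareServers 80
ServerLimit 1024
MaxClients 1024
MaxKeepAliveRequests 500
KeepAlive on
KeepAliveTimeout 5
"""

if __name__ == "__main__":
    print(check_prefork(parse_directives(CONFIG)) or "no obvious inconsistencies")
```

The posted values pass these checks, which is consistent with the limit not coming from the directives themselves. One thing the sketch cannot verify: Apache ignores ServerLimit changes made during a restart, so a raised value only takes effect after a full stop and start.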
I have tried upping the RAM on WEB5 to 2 GB and the RAM on INSSL to 256 MB, with no change in the results. Additionally, I assigned 0.5 CPU to INSSL and 2 CPU to WEB5, again with no change in results.
I almost forgot to mention, the outbound bandwidth of the INSSL hovers at a constant 200 KB per second from the time we start the test until the end, no matter how many users we throw at it. (Of course, this could be limited from our testing end. We are not testing from the grid and realize the inaccuracy of the test there. However, we aren't looking at page load times.)
We have tried our normal tuning ideas and every suggestion we could find on Google. Nothing works. The really interesting thing is that we keep dying at the same number of Active Requests regardless of our Apache directives.
Is there something to do with the network in our grid that is limiting things? Is there any limitation with the WEB5 appliance that I've overlooked? Does anyone have any ideas? Any input is so greatly appreciated!
-Mike