PeterNic
03-14-2007, 12:45 PM
The appliances included in the 1.2.10/1.2.11 releases of AppLogic may intermittently fail to start (about one out of 10 attempts), causing the application start to fail. The problem appears more often on faster servers. Retrying 'app start' makes the application start successfully.
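Until an affected appliance is fixed, the retry can be scripted. The following is only a minimal sketch, not part of AppLogic: the names retry and make_command_action are hypothetical helpers, and the command to run is left as a placeholder for whatever 'app start' invocation you normally use.

```python
import subprocess
import time


def retry(action, attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Returns the 1-based number of the attempt that succeeded,
    or 0 if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        # Pause between attempts, but not after the last one.
        if attempt < attempts:
            sleep(delay_seconds)
    return 0


def make_command_action(argv):
    """Build an action that runs argv and treats exit status 0 as success.

    argv is a placeholder: pass the argument list of your own start command.
    Treating a non-zero exit status as a failed start is an assumption here.
    """
    def action():
        return subprocess.run(argv).returncode == 0
    return action
```

To use it, pass your own start command as a list of arguments to make_command_action and hand the result to retry. A return value of 0 means the start failed on every attempt, which suggests a cause other than the intermittent crash described here.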
The problem is tracked as SCR 1355: appliances occasionally crash during appliance boot, causing the appliance start to fail. The cause is that the appliances included in those versions of AppLogic resize the kernel's network buffers inside the appliance to allow for higher network throughput. There is a race condition in the kernel code during this operation that can cause the appliance kernel to crash. In AppLogic 1.2.14b, we moved the network buffer configuration from the appliance script to the network startup script (before any traffic is allowed), which avoids the kernel race condition; appliances now always start.
However, any appliances that have been branched off the 1.2.10/1.2.11 catalogs will still have this problem. Watch this thread for a post describing the simple change needed to fix such appliances.
(Note: there may be other reasons an appliance would fail to start; what is described here is a fix for the particular problem of a kernel oops during rx buffer configuration at appliance start. If you see other start failures, please post them in a new thread.)