Appliances fail to start intermittently (kernel oops)


PeterNic
03-14-2007, 12:45 PM
The appliances included in the 1.2.10 and 1.2.11 releases of AppLogic may intermittently fail to start (about one in 10 attempts), which causes the application start to fail. The problem appears more often on faster servers. Retrying 'app start' starts the application successfully.

The problem (SCR 1355: appliances occasionally crash during boot, causing the appliance start to fail) is that the appliances included in those versions of AppLogic resize the kernel's network receive buffers from inside the appliance to allow higher network throughput. A race condition in the kernel code during this operation can crash the appliance kernel. In AppLogic 1.2.14b, we moved the network buffer configuration from the appliance script to the network startup script (before any traffic is allowed), which avoids the kernel race condition; appliances now start every time.

However, any appliances branched from the 1.2.10/1.2.11 catalogs still have this problem. Watch this thread for a post describing the simple change needed to fix such appliances so they no longer fail.

(Note: appliances may fail to start for other reasons; this thread covers only the kernel oops that occurs while configuring rx buffers at appliance start. If you see a different start failure, please post it in a new thread.)

PeterNic
03-14-2007, 04:42 PM
Here is the procedure for updating custom appliances to avoid the kernel driver bug that occasionally crashes an appliance during boot. Apply this change only to appliances branched from the system or proto catalogs of AppLogic 1.2.10 and 1.2.11 (or to any appliance that sets rxbuf_min from inside applogic_appliance).
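To tell whether a given appliance falls into this category, one quick check is to look for rxbuf_min in its startup script. This is a minimal sketch, not an AppLogic tool: the function name rxbuf_check and its ROOT argument (for checking an appliance volume mounted somewhere other than /) are hypothetical, and it only looks at the /etc/init.d/applogic_appliance path named in this post.

```shell
# Hypothetical helper: rxbuf_check [ROOT]
# Looks for rxbuf_min in ROOT/etc/init.d/applogic_appliance.
# ROOT defaults to "" (the running appliance); pass a mount point to
# inspect an appliance volume instead.
# Return codes (grep-style):
#   0 = script sets rxbuf_min, appliance needs the update
#   1 = script does not set rxbuf_min
#   2 = no applogic_appliance script found
rxbuf_check() {
    root=${1:-}
    script="$root/etc/init.d/applogic_appliance"
    if [ ! -f "$script" ]; then
        echo "no applogic_appliance script under '${root:-/}'"
        return 2
    fi
    if grep -q 'rxbuf_min' "$script"; then
        echo "needs update: $script sets rxbuf_min"
        return 0
    fi
    echo "ok: $script does not set rxbuf_min"
    return 1
}
```

Returning 0 for "needs update" mirrors grep's "found" convention, so the check can drive a shell `if` directly.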

Steps:

Remove the following lines from "/etc/init.d/applogic_appliance" (this
applies to any appliance branched from one of the system/proto catalog
appliances):

# set receive buffer size
if [ -d /proc/xen/net ] ; then
for i in /proc/xen/net/eth* ; do echo 256 >$i/rxbuf_min ; done
fi


Add the above lines to the end of the "/etc/sysconfig/network" file.
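The two steps can also be scripted. This is a minimal sketch, not an official AppLogic tool: move_rxbuf_snippet and its ROOT argument are hypothetical names, it assumes GNU sed (for `-i`), and it assumes the snippet in applogic_appliance is laid out exactly as quoted above (comment line first, closing fi last). It appends the snippet to /etc/sysconfig/network only if that file does not already mention rxbuf_min, so running it twice is harmless.

```shell
# Hypothetical helper: move_rxbuf_snippet [ROOT]
# Removes the "set receive buffer size" snippet from applogic_appliance
# and appends it (once) to etc/sysconfig/network. ROOT defaults to ""
# (the running appliance); pass a mount point to edit a volume instead.
move_rxbuf_snippet() {
    root=${1:-}
    init="$root/etc/init.d/applogic_appliance"
    net="$root/etc/sysconfig/network"
    for f in "$init" "$net"; do
        [ -f "$f" ] || { echo "missing $f" >&2; return 1; }
    done

    # Step 1: delete from the comment line through the first "fi" after it.
    # Assumes that "fi" exists; otherwise sed would delete to end of file.
    sed -i '/^[[:space:]]*# set receive buffer size/,/^[[:space:]]*fi[[:space:]]*$/d' "$init"

    # Step 2: append the snippet unless the network file already sets rxbuf_min.
    if ! grep -q 'rxbuf_min' "$net"; then
        # Ensure the existing last line ends with a newline before appending.
        if [ -s "$net" ] && [ -n "$(tail -c 1 "$net")" ]; then
            echo >>"$net"
        fi
        # Quoted EOF: the text is appended verbatim, $i is not expanded here.
        cat >>"$net" <<'EOF'
# set receive buffer size
if [ -d /proc/xen/net ] ; then
for i in /proc/xen/net/eth* ; do echo 256 >$i/rxbuf_min ; done
fi
EOF
    fi
}
```

For example, run `move_rxbuf_snippet` with no argument on the appliance itself, or `move_rxbuf_snippet /mnt/vol` against a volume mounted at the hypothetical path /mnt/vol. Keep a copy of both files before running it.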


Note: In system:HLB, rxbuf is set in "/etc/init.d/pound".

Note: proto:INWEB and proto:PS8 currently do not modify rxbuf; you can add the above snippet at the end of their /etc/sysconfig/network script.
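Once the appliance has been restarted with the new configuration, you can read the values back to confirm the setting took effect. A minimal sketch, assuming the rxbuf_min entries under /proc/xen/net/eth* are readable as well as writable on your appliance kernel (treat this as an assumption); show_rxbuf_min and its directory argument are hypothetical.

```shell
# Hypothetical helper: show_rxbuf_min [PROCDIR]
# Prints "<interface> <rxbuf_min>" for each ethN entry under PROCDIR.
# PROCDIR defaults to /proc/xen/net, the path used in the snippet above.
show_rxbuf_min() {
    dir=${1:-/proc/xen/net}
    if [ ! -d "$dir" ]; then
        echo "$dir not present" >&2
        return 1
    fi
    for f in "$dir"/eth*/rxbuf_min; do
        # Skip the literal pattern when no interface matches.
        [ -r "$f" ] || continue
        iface=$(basename "$(dirname "$f")")
        printf '%s %s\n' "$iface" "$(cat "$f")"
    done
}
```

If the snippet ran at network startup, each interface should report the value it wrote (256), unless the driver adjusted it.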