GSC App fail to start in timeout period


kapow
01-19-2008, 07:29 PM
I provisioned a simple GSC app and that worked fine. I then installed httpd with yum and stopped the app, and now the app won't start. It just sits there updating the percentage, then dies after the 10-minute timeout.

I can ssh to the app just fine after about 20 seconds, so I know it's running. I see nothing strange in /var/log/messages. And, I see nothing in the grid log except for the timeout notice.

Any clues?

Or perhaps I'm taking the wrong approach. I just want a singleton to serve as a small TikiWiki box. The WEB5 appliance has external db connections, but I want it all in one.

PeterNic
01-19-2008, 09:38 PM
To debug a non-starting appliance, see http://doc.3tera.net/AppLogic2/UserTroubleShoot.html#AnchorApplianceStart. The text there is a bit out-of-date (we'll fix that).

The simple approach:
- start the app with the --debug flag on the 'app start' command; this makes AppLogic leave the appliance in the start-failed state after the 10 minute timeout, rather than stopping the appliance while you are troubleshooting.
- if you can ssh into the appliance, then things are good -- get in and trace the start sequence; at some point the vma agent needs to be loaded; at boot-complete, vme should be executed with started_ok (various scripts for this reside in /appliance).
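
A rough sketch of the first check once you are logged in, assuming the agent processes show up under the names vmad and ccad (names taken from this thread; they may differ on your appliance). report_missing is a hypothetical helper written for illustration, not an AppLogic command:

```shell
#!/bin/sh
# report_missing NAME...
# Reads a list of running command names on stdin (one per line) and prints
# a line for every NAME that does not appear in that list.
report_missing() {
    running=$(cat)
    for name in "$@"; do
        # -x: match the whole line, -F: treat the name as a fixed string
        printf '%s\n' "$running" | grep -qxF -- "$name" || echo "not running: $name"
    done
}

# Usage on the appliance (process names are assumptions, adjust as needed):
#   ps -e -o comm= | report_missing vmad ccad
```

If both agents are reported as running, the next place to look is whatever is supposed to call vme with started_ok at the end of boot.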

I suspect something went wrong and vme never gets invoked with started_ok, so AppLogic thinks that your appliance is stuck somewhere during boot.

If you want things all-in-one and you don't plan to back up the whole appliance, instantiate it, or migrate it, then GSC is probably good enough. As an alternative, you can branch the IN appliance and remove its out terminal and the iptables configuration script (eviscerate the NAT configuration and simplify iptables to the firewall settings you need). Then add a r/w content volume for the tikiwiki content: if you plan to update the tikiwiki often, keep its files on the content volume; if not, leave only the tikiwiki user data on the content volume and keep the tikiwiki "code" on the /usr volume; this way you will get a "tikiwiki" catalog class appliance. If you decide to go the non-GSC route, let me know and we can discuss further here.

Let me know if the GSC started OK.

(A caution: don't leave apps to run with "--debug" and keep them this way in production; this will give you heartburn in many cases, including grid restarts -- and it is simply dirty. If you want your GSC to be considered started before all the services start, simply move the vme event started_ok earlier in the init sequence (e.g., immediately after vmad is loaded)).

Regards,
-- Peter

kapow
01-20-2008, 04:09 PM
vma and ccad are running, but it appears the vme command is never executed. I ran vme id=started_ok manually, and now the grid thinks the app started ok.

Looking at /var/log/messages, nothing traumatic is happening and nothing looks hung. The last thing started was VMA, and it reported that it started fine.

kapow
01-20-2008, 04:32 PM
I think I'll just go with the LAMP template. The system doesn't need to be that minimal. Thanks.

PeterNic
01-20-2008, 08:19 PM
You are welcome. If you can use the LAMP template, so much the better.

If you are curious, it might be interesting to see what killed the invocation of vme started_ok when you installed httpd (did it overwrite rc.local? is it keeping the init process from continuing through to order 99? do you get strange errors on shutdown?).
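
A minimal sketch of how one might check the first two questions, assuming a SysV-style init layout (rc directories containing S* start scripts, plus an rc.local). Both helper names are hypothetical, and the paths in the usage comments are assumptions rather than anything confirmed for the GSC image:

```shell
#!/bin/sh
# list_start_order DIR
# Prints the S* entries of an rc directory in glob order, which is the
# order a SysV rc script walks them (S85... before S99...).
list_start_order() {
    for f in "$1"/S*; do
        if [ -e "$f" ]; then
            basename "$f"
        fi
    done
}

# find_started_ok PATH...
# Prints the files under the given paths that still mention started_ok.
find_started_ok() {
    grep -rl -- "started_ok" "$@" 2>/dev/null
}

# Usage on the appliance (paths are assumptions):
#   list_start_order /etc/rc3.d
#   find_started_ok /etc/rc.d/rc.local /appliance
```

If nothing mentions started_ok any more, the call was probably overwritten; if it is still there, the start order shows which scripts run before it and could be blocking.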

-- Peter