Automating Server Maintenance


bkonia
12-05-2008, 03:31 PM
CentOS comes preinstalled with a number of utilities, such as logwatch and smartd, that help simplify and automate server maintenance. I think utilities like these would be very useful for maintaining the servers that make up an AppLogic grid. For example, I would like to receive an email notification in the event of a SMART error. However, while smartd is enabled and running on the server, sendmail is not installed, and logwatch is not installed at all. It seems that the CentOS installed by AppLogic is a bare-minimum installation that lacks many of these utilities.

I realize that I could go ahead and install sendmail, logwatch, etc., but I'm hesitant to do so because of the stern warning issued when you log in to an AppLogic server. It basically says that modifying anything on the server is a violation of the license agreement. I understand and agree with the logic behind this policy, but it's frustrating not to have access to well-established Linux utilities that make server administration easier and more automated.

I also realize that AppLogic has its own email notification system configurable by Aldo, but it appears that the notifications are currently limited to grid-level events and do not include server-level events such as smartd notifications. Therefore, my questions are:

1. Are there any plans to expand the scope of the AppLogic notification system to include server-level events?

2. In the meantime, could we agree upon an approved list of packages that can be installed via Yum, that would not violate the license agreement?

PeterNic
12-05-2008, 10:22 PM
bkonia,

We are absolutely planning to add hardware monitoring tools to the system. Smartd actually doesn't always run (there were some issues in FC3 and CentOS 4; I'm not sure whether they are fixed in CentOS 5); we hope sendmail never will.

Many daemons were removed in order to (a) conserve memory, (b) reduce the arbitrary-time CPU and I/O load (and disk fillers), (c) remove parasitic interactions, and (d) improve security. The server under AppLogic is essentially an embedded system that just happens to run Linux as its embedded OS; it is not a regular Linux server by any means. Because of the heavy I/O load and tight memory, running even seemingly innocent things can cause severe problems, from crashes to serious performance degradation. I won't go into details, but our support team has some scars to show :)

On the question you didn't ask: do we have plans to add hardware monitoring, the answer is absolutely yes. Smartd is the primary one we intend to use (with a grain of salt). Looking for other tools as well.

On your first question: server-level events. Those already exist. A server can report problems back to the controller, and the controller will route these notifications through the normal grid exception channel (dashboard, e-mail, and others that may be added in the future). (Just FYI, appliances can also submit exceptions.)
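The routing Peter describes (a server reports a problem, the controller fans it out to every configured channel) can be sketched as a small publish/fan-out pattern. This is a hypothetical illustration, not AppLogic's actual controller code: the class names `GridException`, `DashboardChannel`, `EmailChannel` and `Controller` are invented here, and the channels only record what they would deliver instead of posting to a real dashboard or sending real mail.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


@dataclass
class GridException:
    """A problem report raised by a server (or an appliance). Hypothetical."""
    source: str    # reporting server, e.g. "srv3"
    severity: str  # e.g. "warning" or "error"
    message: str


class Channel(Protocol):
    """Anything that can deliver an exception somewhere."""
    def deliver(self, event: GridException) -> None: ...


@dataclass
class DashboardChannel:
    """Stands in for the dashboard: keeps a list of posted alerts."""
    alerts: List[str] = field(default_factory=list)

    def deliver(self, event: GridException) -> None:
        self.alerts.append(f"[{event.severity}] {event.source}: {event.message}")


@dataclass
class EmailChannel:
    """Stands in for e-mail: queues one message per recipient, sends nothing."""
    recipients: List[str]
    outbox: List[tuple] = field(default_factory=list)

    def deliver(self, event: GridException) -> None:
        subject = f"Grid alert from {event.source}"
        for rcpt in self.recipients:
            self.outbox.append((rcpt, subject, event.message))


class Controller:
    """Receives reports from servers and fans each one out to every channel."""

    def __init__(self, channels: List[Channel]) -> None:
        self.channels = channels

    def report(self, event: GridException) -> None:
        for channel in self.channels:
            channel.deliver(event)


if __name__ == "__main__":
    dashboard = DashboardChannel()
    email = EmailChannel(recipients=["ops@example.com"])
    controller = Controller([dashboard, email])
    controller.report(GridException("srv3", "warning", "SMART error on /dev/sda"))
    print(dashboard.alerts)
    print(email.outbox)
```

Because every channel implements the same `deliver` method, adding another notification path later means adding one class, without touching the code that raises the exception.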

If you have a good way to notice the smartd problem report (preferably without logwatchd), we will be happy to work with you to prototype the solution and create a supported hotfix. Also, if you have suggestions for other health & sanity monitoring tools, please let me know (here or in PM).
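One way to notice a smartd problem report without sendmail or logwatch is smartd's own `-M exec` directive, which runs a program instead of the mailer and passes details in `SMARTD_*` environment variables (`SMARTD_DEVICE`, `SMARTD_FAILTYPE`, `SMARTD_MESSAGE` and others, per the smartd.conf documentation). The hook below is a minimal sketch under that assumption. The install path in the docstring, the JSON record layout and the severity split are all hypothetical choices, and the script only prints the record, since the real hand-off to the grid's exception channel is not described in this thread.

```python
#!/usr/bin/env python3
"""Hypothetical smartd hook: turn a smartd warning into a structured record.

Meant to be referenced from smartd.conf with a line such as
    DEVICESCAN -a -m <nomailer> -M exec /usr/local/sbin/smartd-hook.py
(the install path is an assumption).
"""
import json
import os
from typing import Mapping

# FAILTYPE values that suggest the drive itself is degrading. Anything else
# (e.g. "EmailTest" or "Usage") is reported at a lower severity. This split
# is an arbitrary illustrative choice, not smartd's.
CRITICAL_FAILTYPES = {
    "Health",
    "FailedHealthCheck",
    "CurrentPendingSector",
    "OfflineUncorrectableSector",
    "ErrorCount",
    "SelfTest",
}


def build_report(env: Mapping[str, str]) -> dict:
    """Collect the variables smartd exports for -M exec into one record."""
    failtype = env.get("SMARTD_FAILTYPE", "Unknown")
    return {
        "source": "smartd",
        "device": env.get("SMARTD_DEVICE", "unknown"),
        "failtype": failtype,
        "severity": "error" if failtype in CRITICAL_FAILTYPES else "warning",
        "message": env.get("SMARTD_MESSAGE", ""),
    }


def main() -> None:
    report = build_report(os.environ)
    # Placeholder: a supported hook would pass this record to the grid's
    # exception channel; here it is only printed as one JSON line.
    print(json.dumps(report))


if __name__ == "__main__":
    main()
```

`-m <nomailer>` is used because smartd expects `-m` alongside `-M exec` even when no e-mail address is needed.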

Best regards,
-- Peter

bkonia
12-06-2008, 01:06 PM
Hi Peter,

Ideally, I would like to receive an email notification when a smartd event is triggered, as well as having the notification appear in the dashboard. I would like this to be configurable so that you can define which events would trigger a notification. Perhaps you could have a section in the GUI where the user could set various parameters that would be passed through to the smartd configuration file, similar to the way you can enter properties for an appliance.
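The idea of GUI properties being passed through to the smartd configuration file could look roughly like the function below, which turns a dictionary of settings into one smartd.conf directive line. The property names (`device`, `monitor_all`, `mail_to`, `mail_exec`, `test_schedule`, `temp_limits`) are hypothetical; the flags they map to (`-a`, `-m`, `-M exec`, `-s`, `-W`) are standard smartd.conf directives.

```python
from typing import Mapping


def render_smartd_directive(props: Mapping[str, object]) -> str:
    """Translate GUI-style properties into one smartd.conf directive line.

    Property names are hypothetical; the emitted flags are smartd.conf ones.
    """
    parts = [str(props.get("device", "DEVICESCAN"))]
    if props.get("monitor_all", True):
        # -a: health status, attributes, error log and self-test log
        parts.append("-a")

    mail_to = props.get("mail_to")
    mail_exec = props.get("mail_exec")
    if mail_to or mail_exec:
        # smartd needs -m with -M exec; <nomailer> means "no address"
        parts += ["-m", str(mail_to) if mail_to else "<nomailer>"]
    if mail_exec:
        # run a script instead of the default mailer
        parts += ["-M", "exec", str(mail_exec)]

    schedule = props.get("test_schedule")
    if schedule:
        # -s takes a regex describing when to run self-tests
        parts += ["-s", str(schedule)]

    limits = props.get("temp_limits")
    if limits:
        # -W DIFF,INFO,CRIT: temperature change and threshold warnings
        diff, info, crit = limits
        parts += ["-W", f"{diff},{info},{crit}"]

    return " ".join(parts)


if __name__ == "__main__":
    print(render_smartd_directive({
        "mail_to": "ops@example.com",
        "test_schedule": "(S/../.././02|L/../../6/03)",
        "temp_limits": (4, 45, 55),
    }))
```

With the example properties this produces `DEVICESCAN -a -m ops@example.com -s (S/../.././02|L/../../6/03) -W 4,45,55`, i.e. monitor every detected disk, mail warnings, run a short self-test daily at 02:00 and a long one on Saturdays at 03:00, and warn on temperature changes of 4 degrees or readings above 45/55 degrees Celsius.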

jonathan
09-10-2009, 01:52 AM
Hi Peter and bkonia,

Did you make any progress on this issue of having smartd events parsed to the dashboard?

We would also like to automate disk monitoring via the grid.

JD

PeterNic
09-10-2009, 11:35 AM
Jonathan, Bkonia --

Good news -- the smartd monitoring is integrated in our early fault detection system in AppLogic 2.7. bkonia, thanks for the great suggestions.

So far, even during the testing of the 2.7 beta release, it correctly caught and predicted a few hard disk failures. We're also testing it with various bad drives we come across, and so far it correctly detects the errors.

The response is to post an alert to the dashboard, which, when configured for e-mail notifications, will also send you an immediate e-mail notification. At this time we don't automatically move mirrors around or shut down the server; we would like to get a bit more experience with it first.

It has also been great for diagnosing problems on customer grids: you just go in and look at the dashboard, and many problems have become much easier to track down.

In addition to the disk fault detection, we have also added about 11 different health checks that detect, and in many cases recover from, some of the more typical problems.

Best regards,
-- Peter