Grid System Administration best practices


jonathan
06-04-2009, 04:38 AM
Hello all,

Does anyone know of a single Grid system administration check-list or best-practices document that I can build on to make a definitive System Administration manual?

I am thinking about a standard operations schedule (daily, weekly, fortnightly, monthly, quarterly, annually) that would:

* List best defaults for installations (e.g. setting up dashboard alerts to email)

* List mandatory sys admin tasks for safe grid operation (e.g. checking for hotfixes, log rotation, disk checks, etc.)

* List recommended optional best practices or activities (e.g. benchmarking, physical inspections, etc.).
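
To make the schedule idea concrete, here is a rough sketch of how such a checklist could be held as data and queried for a given day. The cadence names come from the list above; which task sits under which cadence, the calendar rules (Monday for weekly, first of the month for monthly, and so on) and the names SCHEDULE and tasks_due are all assumptions for illustration, not anything taken from AppLogic.

```python
from datetime import date

# Hypothetical checklist. Cadences are from the post; the assignment of
# tasks to cadences is an assumption made for illustration only.
SCHEDULE = {
    "daily": ["review dashboard alert emails"],
    "weekly": ["check for hotfixes", "confirm log rotation"],
    "fortnightly": ["disk checks"],
    "monthly": ["review capacity"],
    "quarterly": ["benchmarking"],
    "annually": ["physical inspection"],
}


def tasks_due(today):
    """Return the tasks due on `today` under simple, assumed calendar rules."""
    due = list(SCHEDULE["daily"])
    if today.weekday() == 0:  # assumed: weekly tasks run on Mondays
        due += SCHEDULE["weekly"]
        # Approximation: even ISO weeks. Alternation can slip at year end.
        if today.isocalendar()[1] % 2 == 0:
            due += SCHEDULE["fortnightly"]
    if today.day == 1:  # assumed: monthly tasks run on the 1st
        due += SCHEDULE["monthly"]
        if today.month in (1, 4, 7, 10):
            due += SCHEDULE["quarterly"]
        if today.month == 1:
            due += SCHEDULE["annually"]
    return due


if __name__ == "__main__":
    for task in tasks_due(date.today()):
        print(task)
```

Keeping the checklist as plain data means the manual and any reminder script could share one source, but that is only one possible layout.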

I would also like to apply the Pareto principle by including preventative maintenance and checks for the most common or most serious problems typically encountered on grids, and for their causes.
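
As a sketch of what the Pareto step could look like in practice: given a log of incident causes, pick the most frequent causes that together account for roughly 80% of incidents. The function name pareto_causes, the 0.8 threshold and the sample log are all made up for illustration; real input would come from support or incident records.

```python
from collections import Counter


def pareto_causes(incidents, threshold=0.8):
    """Return the most frequent causes that together cover `threshold` of incidents."""
    counts = Counter(incidents)
    total = sum(counts.values())
    selected = []
    covered = 0
    # most_common() yields causes from most to least frequent.
    for cause, n in counts.most_common():
        if covered >= threshold * total:
            break
        selected.append(cause)
        covered += n
    return selected


# Made-up incident log, purely for illustration.
log = ["disk"] * 5 + ["missing hotfix"] * 3 + ["log volume full"] + ["network"]
print(pareto_causes(log))
```

The causes this returns would be the first candidates for preventative checks in the schedule.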

I would also like to add a section on hardware acceptance testing best practices and performance benchmarking.

All pointers, suggested list items and advice gratefully received.

JD

PeterNic
06-06-2009, 01:52 PM
Jonathan,

This is a great idea. I wouldn't mind offering an area where it can coalesce, especially if you wouldn't mind acting as an editor.

We can contribute to some of the topics -- we have various aspects of this (e.g., best defaults for installation), and we can probably identify some of the 80/20 rule problems to watch for based on the most frequent support requests we get. We can also invite most of our service and enterprise AppLogic operators to contribute, so that we can exchange best practices.

The current set of documents that provide aspects of these procedures is located in the wiki under /bin/view/AppLogic24/ServiceProvidersHome (this is accessible only to AppLogic licensees).

(I am also moving this thread to grid maintenance)

Best regards,
-- Peter

jonathan
06-08-2009, 05:54 AM
Hi Peter,

I would be delighted to act as editor. I have access to the resources you linked to (we are, of course, partners) so I will get started there.

Thanks for the feedback.

Kind regards,

Jonathan

andybrucenet
01-23-2012, 04:18 AM
Hi Jonathan and Peter,

This is a great idea, and a best-practices list is something I am looking for myself. Is this list posted anywhere? Or did I miss something obvious in the documentation?

Thanks,

Andy

PeterNic
01-24-2012, 11:56 AM
Hi Andy,

I don't believe we closed the loop on this with Jonathan. At this time, the BFC docs are all there is - a FAQ and similar material. Since the time of the original post, the most important periodic check - volumes needing repair - has been automated (AppLogic will automatically detect degraded volumes and initiate repair).

I would also suggest checking for hotfixes on a weekly basis (and/or subscribing to the Announcements, as well as the Notices and Alerts forums here) - note that BFC can automatically check for hotfixes for AppLogic but you still need to manually check for hotfixes for BFC itself.

Best regards,
- Peter

PeterNic
01-26-2012, 09:35 AM
Here is a document that can get us started:

http://workbench.cloudcommons.com/docman/view.php/535/78/CA+AppLogic+3.0+Grid+Installation+Companion.pdf