cPanel issues


Yazan
12-05-2008, 06:59 AM
We've had a number of issues with a cPanel customer's server and I'm trying to figure out the cause, so I'm hoping somebody will be able to help.

About a month ago, the client wanted their server storage increased, so we performed the resize. After waiting 2.5 hours for it to complete, the server came back up with the same amount of space. Any clue what's going on there? We resorted to adding a second HD to their server, which seemed to do the trick.

Two days ago, the server locked up out of nowhere. I understand that a lock-up occurs when there is a journal error, but why there was a journal error in the first place is still a mystery to me (maybe the same reason the resize didn't take effect). This forced all the WHM files to become read-only. The cPanel techs said this was due to file corruption or hardware failure, and it clearly wasn't hardware failure.
WHM is not loading because the file system is set to "read only". The exact statement was:


[root@www1 ~]# touch /usr/local/cpanel/testfile

touch: cannot touch `/usr/local/cpanel/testfile': Read-only file system


This could likely be caused by drive errors or file system corruption, you
will need to have your Datacenter run a full FSCK in single user mode to
properly test the file system.

Last few messages in the log were:
Dec 4 00:58:32 www1 pure-ftpd: (?@127.0.0.1) [INFO] New connection from 127.0.0.1

Dec 4 00:58:32 www1 pure-ftpd: (?@127.0.0.1) [INFO] Logout.

Dec 4 01:06:58 www1 pure-ftpd: (?@127.0.0.1) [INFO] New connection from 127.0.0.1

Dec 4 01:06:58 www1 pure-ftpd: (?@127.0.0.1) [INFO] Logout.

After a reboot the server came back online, but FTP was still down:
ftpd failed @ Thu Dec 4 10:10:32 2008. A restart was attempted automagically.

Failure Reason: Unable to connect to port 21

cpsrvd failed @ Thu Dec 4 10:10:32 2008. A restart was attempted automagically.


Failure Reason: Unable to connect to port 2086
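
For anyone wanting to reproduce the two checks above by hand, here is a minimal Python sketch that tries a TCP connection to each port. The port numbers (21 for ftpd, 2086 for cpsrvd) come from the failure messages; the function name, the host and the timeout are my own assumptions, and this is not how cPanel's own service monitor is implemented.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the failure messages; the host is an assumption.
    for name, port in (("ftpd", 21), ("cpsrvd", 2086)):
        state = "up" if port_is_open("127.0.0.1", port) else "DOWN"
        print(f"{name} (port {port}): {state}")
```

A port that accepts connections only shows that the daemon is listening; it says nothing about whether the file system underneath is writable.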

This morning it went down again; I believe it went into read-only mode again. It didn't restart automatically or anything, it just sat there. I rebooted it and it's up and running now, but I can't keep going like this; there must be an underlying issue that can be solved.

Has anybody had an issue like this?

Thanks for the help.

PeterNic
12-05-2008, 10:40 AM
Yazan,

It appears that the filesystem on the volume is corrupted. From these messages, it is not possible to find out how badly.

Your first indication of a possible problem was the resize -- if a file system is resized, you should see the new size. If not, please contact our support for assistance.

The failures to start WHM, the FTP server, etc., are all results of the corrupted file system -- if/when this is repaired, they should start OK. In typical installs, Linux cannot boot from a read-only file system (and when Linux detects a corrupted filesystem, it automatically makes it read-only in order to prevent further corruption). Once it gets to this mode, reboots are not going to fix it (and even if they seem to, they can make the corruption worse) -- this needs to be given proper attention right away.
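
To confirm from inside the appliance that a file system has been flipped to read-only, one option is to look at the mount options in /proc/mounts. The Python sketch below does that; the function name is hypothetical, and it only assumes the standard /proc/mounts layout (device, mount point, type, comma-separated options).

```python
import os


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the 'ro' flag."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # Match the exact 'ro' flag, not options such as errors=remount-ro.
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point in read_only_mounts(f.read()):
            print("read-only:", mount_point)
```

Some pseudo file systems are mounted read-only on purpose, so only an entry for a data volume such as / or /usr is a warning sign.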

Several approaches (for all of them, I recommend first making a block-level copy of the volume, so you have a backup):

- if this is a 2.3.9+ grid, you can stop the appliance, and do "vol fscheck" or "vol fsrepair"

- if this is a 2.1.1/2.2.2 grid and you have maintainer access, mount the volume at block level on the controller and perform "fsck /dev/XXX", where XXX is what the volume got mounted as

- create a new application instance from the GSC template, with a volume of at least the same size; assign the same property values as the broken one. Edit the GSC class to add a placeholder volume called "old" (at /dev/hda3, most likely). Then copy the boot volume from the broken application into a volume called "old" in the new application and assign the "old" volume of the app to the instance's placeholder. This way, when the new GSC instance boots up, you will have both the new (clean but empty) volume and the old (with data but corrupted) volume -- you can then mount the old volume read-only and copy files off it to the new volume.
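
As a rough illustration of the block-level copy recommended before any of these approaches, here is a minimal Python sketch that copies a device or image file chunk by chunk. The function name and any paths you pass in are hypothetical, the volume should not be mounted read-write while it is copied, and unlike dedicated rescue tools this sketch simply stops on the first read error.

```python
def copy_block_image(src_path, dst_path, chunk_size=4 * 1024 * 1024):
    """Copy src_path to dst_path in fixed-size chunks; return bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break  # end of device or file
            dst.write(chunk)
            copied += len(chunk)
    return copied
```

Running fsck against the copy first, rather than the original, is one way to see how much damage a repair would do before committing to it.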

These are just a few suggestions I can give you without having looked at the app.

If you don't have a recent backup or if the volume is badly corrupted, you may be down for a while and even lose data. If you would like our help with this, please open a ticket at our helpdesk.


Best regards,
-- Peter

Yazan
12-05-2008, 10:50 AM
Thanks for the response Peter. I'll try what you suggested and see how well that works. If that doesn't work then I think I'll hand it over to you guys.

Regards,

Yazan

Yazan
12-05-2008, 11:00 AM
Could Rsnapshot backups have caused this to happen?

PeterNic
12-05-2008, 10:07 PM
No, I don't think so -- rsnapshot, as far as I know, works above the filesystem level, so it cannot cause corruption of the file system.

My guess is that it got corrupted because the appliance was shut down uncleanly (e.g., if it was very loaded and did not shut down within the shutdown timeout). The filesystem may also sometimes get corrupted if Linux in the appliance runs under heavy I/O load while nearly out of memory.

Regards,
-- Peter