Volume repair command fails


Erika
08-26-2008, 09:45 AM
I tried to repair volumes on an AppLogic grid using

volume repair --all

The command fails with the message "no servers available for the mirrors".

I checked whether the servers were up using the following commands:

server info srv1
server info srv2

Both servers were up. Next, I checked the available disk space in each server:

For srv1:
total 947662.07 MB
reserved 10240 MB
free 848958.11 MB

For srv2:
total 63988 MB
reserved 10240 MB
free 6.42 MB

I double-checked the available disk space on srv2 with df -h and saw that there was at least a gigabyte of free space on each partition. I don't know why the df numbers differ from the figures above:

From srv2:
df -h

Size Used Avail Use% Mounted on
63G 48G 12G 81% /var/applogic

Then I checked the connections between the servers.
From the provisioning server, I could ping srv1 and srv2. I could also ssh into srv1 and srv2.
From the controller, I could also ping srv1 and srv2. However, I could not ssh from the controller into srv2: "Permission denied (publickey)".

I still don't know why the volume repair command is failing, and why the disk sizes don't match. Does anyone have any ideas?

Erika
08-27-2008, 12:07 PM
I found an orphan volume on one of the servers, so I ran the following sequence of commands:

volume clean

followed by

volume repair --all

And now the volumes are being repaired!

Motto: Always clean before repairing :)

PeterNic
08-27-2008, 10:23 PM
Erika,

It appears that there was an orphan volume (i.e., a volume stream that is not owned by any volume or application) filling up the available disk space on srv2. Since yours is only a 2-server grid, there was no room to create the second mirror (as indicated by the error message from 'vol repair').

I will be happy to discuss the apparent discrepancy. Essentially, it comes from the difference between the sparse block allocation technology used by AppLogic internally and the "guaranteed" disk space displayed by the grid CLI. In fact, you should not even have looked directly on the server (normal users can't; grid maintainers shouldn't use their maintainer account for non-aldo work).
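To make the discrepancy concrete, here is a rough Python sketch using the srv2 figures from the first post. It rests on an assumption that is not confirmed anywhere in this thread: that the grid CLI figures relate as total = reserved + committed + free. The function names committed_mb and mirror_fits, and the 1024 MB mirror size, are made up for illustration only.

```python
# Figures for srv2 as reported in the first post (MB, from the grid CLI).
SRV2_TOTAL_MB = 63988.0
SRV2_RESERVED_MB = 10240.0
SRV2_FREE_MB = 6.42


def committed_mb(total_mb: float, reserved_mb: float, free_mb: float) -> float:
    """Space the grid CLI appears to treat as already promised to volumes.

    ASSUMPTION: total = reserved + committed + free. This is a guess at how
    the reported figures relate, not documented AppLogic behaviour.
    """
    return total_mb - reserved_mb - free_mb


def mirror_fits(free_mb: float, volume_mb: float) -> bool:
    """True if a mirror of volume_mb fits in the guaranteed free space."""
    return volume_mb <= free_mb


if __name__ == "__main__":
    committed = committed_mb(SRV2_TOTAL_MB, SRV2_RESERVED_MB, SRV2_FREE_MB)
    print(f"srv2 committed (grid CLI view): {committed:.2f} MB")
    # Hypothetical 1024 MB mirror: it does not fit in 6.42 MB of guaranteed space.
    print("1024 MB mirror fits on srv2:", mirror_fits(SRV2_FREE_MB, 1024.0))
```

Under that assumption, the grid counts roughly 52.5 GB on srv2 as committed, while df reports only 48G actually used. That gap is the kind of difference one would expect when space is guaranteed up front but blocks are written sparsely, and it would explain why a new mirror can be refused even though df still shows free space.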

The practical solution you found works. You could also have seen the orphan volume with "vol check". I am glad you were able to figure it out, and thank you for sharing the solution! (We will add the "volume check/volume clean" suggestion to the documentation.)

-- Peter

JosephD
09-07-2008, 08:08 AM
Erika,

I am posting to find out whether all is well with this process. Did the vol repair complete correctly? Also, do you have any other questions about the repair process or any other volume maintenance?

Joseph
3tera

JosephD
09-07-2008, 08:15 AM
Erika,

If you have had any issues, feel free to update me via a PM here or by email:

Joseph@3tera.com


Joseph
3tera