Building custom Linux distro applications
mwadham
07-03-2007, 02:43 AM
Hi,
I am interested in building a custom Linux distro application within AppLogic. It seems this is possible, but I can't seem to find any documentation on it. On the 3tera site there's a presentation on this, but it only showed on 1st April 07 and isn't available for online viewing.
If anyone could give me some clues as to how to achieve this I'd be very grateful. I would imagine it's just a case of building an image with Ubuntu or Debian or Gentoo or whatever, installing the necessary software so that it can talk to the controller and then importing the image into a generic Linux application within AppLogic.
I just need to know how to start building the image, what format it needs to be in and how to import it into an AppLogic appliance.
Thanks!
Mark
There is some internal documentation, but I'm not sure it's up to date.
So, I'll check with the other guys and get back to you.
Which version of AppLogic are you using?
You can check with 3t> info grid (assuming you have access to the grid controller).
Nobu
mwadham
07-03-2007, 08:41 AM
Hi Nobu,
Thanks for coming back to me. We are running version 1.2.14.
Thanks,
Mark
PeterNic
07-04-2007, 10:33 AM
Mark, we are preparing a step-by-step instruction for doing that; it is not completely done/verified but if you don't mind taking it early, I'll post it here later today.
mwadham
07-04-2007, 10:58 AM
Hi PeterNic,
That would be awesome, thanks very much!
Kind Regards,
Mark
PeterNic
07-05-2007, 12:40 PM
Mark,
Here is a very rough cut. You will need to modify it slightly in the area of network configuration file cleanup depending on the distro (the network devices in Ubuntu, Debian and others are configured differently from Redhat-compatibles). Please post here with any questions; all operations below can and should be executed as a regular AppLogic user (not as a maintainer).
Roughly, there are 5 big steps:
- Step I: getting the appliance-specific files required on all appliances by AppLogic. (We will post a ready-made tar file, but for now you can create it yourself.)
- Step II: creating a volume image file of the new distro; requires the new distro to be installed on a physical server to which you have access
- Step III: importing the volume image file onto a grid, as part of a simple application on which to test the new image
- Step IV: diagnosing/fixing boot-time problems with the new distro
- Step V: clean up and prepare the appliance for catalog
Step I - get appliance-specific files (for AppLogic 2.0.2, there is a ready .tar file, inquire here; don't have it for 1.2 yet)
0. Provision an instance of GSC. Grab the /etc/init.d/applogic_network file from its boot volume.
1. Create a new application (e.g., myfiles), drag a LINUX appliance and a NET gateway. Configure and save the app.
2. Start the app
3. Login to the LINUX appliance
4. Create a new directory /root/temp, make it the current directory
5. Execute the following command (from inside the /root/temp directory):
tar czf appliance-files.tgz /boot/* /lib/modules/* /appliance/* /etc/init.d/applogic*
6. Get that file to your workstation (or the physical server on which you will be doing the new CentOS physical build)
- either scp from /app/myfiles/main.LINUX/root/temp/appliance-files.tgz
- or upload it somewhere you can get it from using the NET gateway of the LINUX appliance
7. Rebuild the tar file, adding the /etc/init.d/applogic_network file to it
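One way to do step 7 without unpacking the archive is to append the script to the uncompressed tar and compress it again. The sketch below assumes GNU tar (for --append, --owner, --group and --mode); the helper name add_to_appliance_archive is illustrative and not part of AppLogic. It stores the script as root-owned with mode 755, because an init script that loses its execute bit will not be started at boot.

```shell
#!/bin/sh
# Sketch: append applogic_network to an existing appliance-files.tgz.
# add_to_appliance_archive is a hypothetical helper, not an AppLogic tool.
# Assumes GNU tar and gzip.

add_to_appliance_archive() {
    archive=$1      # path to appliance-files.tgz
    script=$2       # applogic_network file taken from the GSC boot volume
    stage=$(mktemp -d) || return 1

    # tar stripped the leading '/' when the archive was created, so the
    # script is staged under etc/init.d relative to the archive root.
    mkdir -p "$stage/etc/init.d"
    cp "$script" "$stage/etc/init.d/applogic_network" || return 1

    # Append works only on an uncompressed archive.
    gunzip -c "$archive" > "$stage/files.tar" || return 1
    tar --append --file="$stage/files.tar" \
        --owner=0 --group=0 --mode=755 \
        -C "$stage" etc/init.d/applogic_network || return 1

    # Recompress over the original archive and clean up.
    gzip -c "$stage/files.tar" > "$archive" || return 1
    rm -rf "$stage"
}
```

Usage: add_to_appliance_archive appliance-files.tgz applogic_network, then confirm the result with tar tzvf appliance-files.tgz.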
Step II - create image on a physical server
1. Install the desired Linux distro (e.g., CentOS 4.4 or 5) on a physical server locally; use a single-partition install (boot+all) and a reasonably minimal install. Leave at least 2GB+ free on the partition for the compressed image you will be creating. Make sure the 'iproute2' package is installed (it is installed by default on most modern distros).
2. Login as root
2.1. Make sure SELinux is disabled -- either by installing it disabled or by editing /etc/selinux/config and setting SELINUX=disabled
3. Copy the appliance-files.tgz file into the /root directory
4. Create an image file using the following command (e.g., in the /root directory)
dd if=/dev/zero of=LINUX.boot.img bs=1k seek=2047k count=1k
(check that the resulting file, LINUX.boot.img, is *exactly* 2GB or 2,147,483,648 bytes)
5. Mount the new image file using the following commands
mkdir -p /mnt
mkfs -t ext3 -F LINUX.boot.img
mount -o loop LINUX.boot.img /mnt
6. Copy and adjust the image using the following commands
mkdir /mnt/{proc,sys,home,tmp}
cp -ax /{root,dev,var,etc,usr,bin,sbin,lib} /mnt
rm -f /mnt/etc/sysconfig/network-scripts/ifcfg-eth* # required to allow AppLogic to configure network
Disabling the default network described above is for RedHat-style distros only. Generic rule: disable all network interfaces, except 'lo' (and maybe the ipv4-ipv6 tunnel, if desired).
For appliances that will have an "external" interface:
- leave the external interface setup intact AND make sure that it is named 'eth0'. Do not use a configuration that refers to the interface by MAC address. For RedHat/CentOS: delete the HWADDR= line from the 'eth0' config file.
rm -f /mnt/etc/sysconfig/network-scripts/route-eth*
echo "" > /mnt/etc/resolv.conf
echo "127.0.0.1 localhost.localdomain localhost" > /mnt/etc/hosts
mv /mnt/lib/tls /mnt/lib/tls.disabled # required for Xen (contact support for Xen-friendly glibc if you need NPTL)
pushd /mnt
tar xzf /root/appliance-files.tgz # you may get a warning about removed '/', ignore it
popd
rm -f /mnt/root/appliance-files.tgz # the archive is no longer needed
rm -f /mnt/appliance/passwd.stamp # forces the appliance to generate a unique root password on next boot
chroot /mnt /sbin/chkconfig --add applogic_vma
chroot /mnt /sbin/chkconfig --add applogic_cca # only for AppLogic 2.0, skip for 1.2
chroot /mnt /sbin/chkconfig --add applogic_appliance
chroot /mnt /sbin/chkconfig --add applogic_network
chroot /mnt /sbin/chkconfig --remove kudzu
touch /mnt/etc/applogic_network.conf
Also the above works only on RedHat systems. Other distros use different methods for installing auto-start services. Required ordering of the service start:
applogic_network starts before the other applogic_* services.
applogic_network starts with (or after) the system network service (in RedHat, this is 'network', with start priority of 10). It must also be AFTER the hostname is set - distros may differ in when/how they do this (e.g., RedHat does this much before network start, while Debian does it with network start). If the hostname is set sometime after the network is started, applogic_network startup must be moved to start AFTER that.
7. Check the /mnt/etc/fstab file, make sure it contains only one hard disk entry ("/dev/hda1 <tab> / <tab> ext3 <tab> defaults <tab> 0 0"); the file may contain other necessary file systems, such as /dev/pts, /dev/shm, /proc and /sys -- those are OK.
8. Make sure ARP ignore option is enabled (add the following two lines at the end of /mnt/etc/sysctl.conf if they are missing)
# ignore arp
net.ipv4.conf.all.arp_ignore = 1
9. Delete unnecessary files - /mnt/var/log/messages*, /mnt/var/log/secure*, ..., /mnt/tmp/* (there's other junk that can - and should - be deleted, some of it is distro-dependent. Removing the bulky stuff here will cut down on gzip/download times.)
# zap the empty space on the disk image (gets better compression):
cat </dev/zero >/mnt/tmp/zeros # this will stop with a disk full error
sync
rm /mnt/tmp/zeros
10. Unmount the image and compress it (make sure your current directory is outside /mnt, e.g., in /root)
umount /mnt
gzip LINUX.boot.img # output file should be LINUX.boot.img.gz
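The dd arithmetic in step 4 is easy to get wrong when a different volume size is wanted. With bs=1k, seek=2047k skips 2047 x 1024 blocks and count=1k writes 1024 more, so the file ends at 2048 x 1024 x 1024 = 2,147,483,648 bytes. The sketch below generalizes that to a size given in MiB and prints the resulting size; make_sparse_image and image_size are illustrative names, not AppLogic commands.

```shell
#!/bin/sh
# Sketch: create an image file of exactly SIZE_MIB mebibytes the same way
# as the dd command in step 4 (skip all but the last MiB, then write 1 MiB).

make_sparse_image() {
    image=$1
    size_mib=$2
    # With bs=1k there are 1024 blocks per MiB.
    seek_blocks=$(( (size_mib - 1) * 1024 ))
    dd if=/dev/zero of="$image" bs=1k seek="$seek_blocks" count=1024 2>/dev/null
}

image_size() {
    # Print the apparent size of a file in bytes.
    wc -c < "$1" | tr -d ' '
}

# Small demonstration with a 4 MiB image; the 2 GB volume is size_mib=2048.
demo=$(mktemp)
make_sparse_image "$demo" 4
echo "expected $((4 * 1024 * 1024)) bytes, got $(image_size "$demo")"
rm -f "$demo"
```

For size_mib=2048 the computed seek value is 2047 x 1024 blocks, which is the 2047k used in step 4.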
Step III - get the image onto the grid
1. Create a new application (e.g., mydistro), drag a LINUX appliance and branch it. Save the app
2. Resize the LINUX appliance boot volume to be exactly the same size as the loopback device image file you have (2G)
vol resize mydistro:LINUX.boot size=2G
3. Export the application to 'mydistro'
4. Using sftp to the controller, rename the volume file in the exported application's directory (impex/mydistro/LINUX.boot.img.gz) to some other name in the same directory (e.g., old.img.gz)
5. Using sftp/scp, upload your image file to the same directory (impex/mydistro); make sure your image is named the same way as the old LINUX appliance boot volume was named: LINUX.boot.img.gz
6. Import 'mydistro' into a new application, e.g., mydistro2
7. Open the app in the editor (refresh the browser to update the app list)
8. Edit the LINUX appliance class, General tab. Make sure the Linux kernel and initrd paths are correct (see the files in the /boot directory of the appliance-files.tgz)
9. Save the app and try to start it. Don't forget to start it with the --debug option, as this will allow you to clean up last minute problems (assuming the distro boots generally OK, you will be able to ssh into it even if not everything went well).
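Steps 4 and 5 above can be run from an sftp batch file instead of typing them interactively. The sketch below only generates the batch commands (cd, rename and put are standard sftp commands); the function name make_impex_batch is illustrative, and the login used to reach the controller is left to the reader.

```shell
#!/bin/sh
# Sketch: generate an sftp batch file for steps 4 and 5 of Step III.
# Nothing is contacted here; the function only prints the commands.

make_impex_batch() {
    app=$1          # exported application name, e.g. mydistro
    image=$2        # local path to the new LINUX.boot.img.gz
    cat <<EOF
cd impex/$app
rename LINUX.boot.img.gz old.img.gz
put $image LINUX.boot.img.gz
EOF
}

make_impex_batch mydistro /root/LINUX.boot.img.gz
```

Redirect the output to a file and pass it to sftp with -b, e.g. sftp -b batch.txt user@controller, where user@controller is a placeholder for your own controller login.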
Step IV - inspect/modify the volume in case the app did not start
Option A: stop the app, mount the LINUX boot volume and use an sftp client to check the /var/log/dmesg and messages
Option B: provide access to the volume in a Linux/GSC appliance
1. Provision an instance of GSC into a new app, 'myfiler' (make sure the root password is strong enough)
2. Open the new app in the editor, edit the GSC's class
3. Go to the Volumes tab, add a new placeholder volume, 'data', mounted as /dev/hda2
4. Save the application, close the editor
5. Copy the new Linux distro's boot volume as a data volume of the new filer app
vol copy mydistro2:LINUX.boot myfiler:data
6. Start the myfiler application
7. Login to the GSC as root, mount the data volume
mkdir -p /mnt
mount /dev/hda2 /mnt
8. Inspect and/or modify the volume
9. Stop the myfiler application
10. Copy the volume back to the new distro application (both apps must be stopped)
vol copy myfiler:data mydistro2:LINUX.boot
Option C: contact 3Tera or hosting provider customer support to assist with diagnosing & fixing the problem
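When inspecting the mounted volume (step 8 of Option B), several of the Step II requirements can be checked mechanically. The sketch below is one possible checklist, not an official AppLogic tool; check_volume is a hypothetical name and the list of checks is limited to items stated in this procedure.

```shell
#!/bin/sh
# Sketch: check a mounted appliance volume against the requirements
# listed in Step II. check_volume is a hypothetical helper name.

check_volume() {
    root=$1         # mount point of the volume, e.g. /mnt
    problems=0

    # fstab must contain exactly one hard disk entry.
    disks=$(grep -c '^/dev/hd' "$root/etc/fstab" 2>/dev/null) || disks=0
    if [ "${disks:-0}" != 1 ]; then
        echo "fstab: expected 1 /dev/hd* entry, found ${disks:-0}"
        problems=1
    fi

    # The ARP ignore option must be set in sysctl.conf.
    if ! grep -q '^net\.ipv4\.conf\.all\.arp_ignore *= *1' "$root/etc/sysctl.conf" 2>/dev/null; then
        echo "sysctl.conf: net.ipv4.conf.all.arp_ignore = 1 is missing"
        problems=1
    fi

    # This file must exist, even if it is empty.
    if [ ! -f "$root/etc/applogic_network.conf" ]; then
        echo "etc/applogic_network.conf is missing"
        problems=1
    fi

    # Init scripts must be present and executable.
    for svc in applogic_network applogic_vma applogic_appliance; do
        if [ ! -x "$root/etc/init.d/$svc" ]; then
            echo "etc/init.d/$svc is missing or not executable"
            problems=1
        fi
    done

    # /lib/tls should have been renamed to /lib/tls.disabled.
    if [ -d "$root/lib/tls" ]; then
        echo "lib/tls still exists"
        problems=1
    fi

    return "$problems"
}
```

Usage: check_volume /mnt prints one line per problem found and returns non-zero if there was any.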
Step V - Cleanup and prepare the appliance for catalog
1. Once the appliance with the new distro is booting OK, stop the app and make a backup copy (app copy mydistro2 mydistrobkp)
2. Disable all unnecessary services, including ALL hardware-specific services (such as kudzu). Use "chkconfig --remove" or similar. Backup again.
3. See Appliance Creation Guide in the AppLogic documentation for:
-- typical cleanup (deleting log files, password stamp, etc.)
-- preparation for moving an appliance into a catalog (additional cleanup)
We will be collecting notes on the procedure; once finalized, we will post the final version here, as well as put it in the documentation.
Regards,
-- Peter
mwadham
07-06-2007, 03:08 AM
Peter,
Many thanks for this, we probably won't have time to try this until next week sometime but I will check back and let you know how it goes.
Have a great weekend,
Mark
mwadham
07-06-2007, 08:39 AM
Peter,
A quick question - can you foresee any problems in using VMware to create the initial image of the foreign distribution? I can't see this being a problem as long as the kernel is reasonably generic.
Regards,
Mark
PeterNic
07-06-2007, 01:13 PM
Mark, I don't know of anyone who has done this, but I don't see why it wouldn't work. Let us know whether it did.
--Peter
mwadham
07-07-2007, 05:18 PM
Hi,
I tried this procedure with Gentoo running in vmware, I have got the boot image successfully into AppLogic and it starts up, but for some reason AppLogic isn't able to communicate with it properly. It times out when starting the application and deems it to be in a failed state, yet I can ssh into the application and see that it's up and running, at least to the point it would be if it were still running in vmware.
The initscripts that were copied over from the generic Linux catalog appliance had to be modified for Gentoo, I'll paste the modified files below. I think this must be where something is missing as AppLogic never gets notified that the application has started successfully.
The AppLogic networking stuff seems to work though, as the box gets the correct IP addressing and I can ssh into it from the 3t prompt.
/etc/init.d/applogic_appliance:
#!/sbin/runscript
# Created by Mark Wadham @ Areti 2007
# This is a Gentoo/runscript version of the applogic_appliance initscript
depend() {
    need net applogic_network
}
start() {
    ebegin "Starting ${SVCNAME}"
    cd /appliance
    ./appliance.sh start
    touch /var/lock/subsys/applogic_appliance
    eend $?
}
stop() {
    ebegin "Stopping ${SVCNAME}"
    cd /appliance
    ./appliance.sh stop
    rm -f /var/lock/subsys/applogic_appliance
    eend $?
}
/etc/init.d/applogic_cce:
#!/sbin/runscript
# Created by Mark Wadham @ Areti 2007
# This is a Gentoo/runscript version of the applogic_cce initscript
depend() {
    need net applogic_network
}
# get the CCE parameters from the kernel command line
get_params()
{
    for tmp in $(cat /proc/cmdline);
    do
        token=`echo $tmp | cut -d "=" -f 1`
        if [ "$token" == "cav_ip" ]; then
            cav_ip=`echo $tmp | cut -d "=" -f 2`
        fi
        if [ "$token" == "dflt_ifc" ]; then
            dflt_iface=`echo $tmp | cut -d "=" -f 2`
        fi
    done
    tmp=`ip -o -f inet addr show $dflt_iface`
    tmp=`echo $tmp | cut -d " " -f 4`
    dflt_ip=`echo $tmp | cut -d "/" -f 1`
    # node_id is the default IP address packed into a single integer
    node_id=0
    for tmp in ${dflt_ip//\./ } ; do
        node_id=$(( ($node_id) * 256 + $tmp ))
    done
    # succeed only if an address was found
    [[ "$node_id" != 0 ]] && return 0
    return 1
}
gen_cfg()
{
    echo "node_id = \"$node_id\""
    echo "node_name = \"${__APP_NAME}:${__COMP_NAME}\""
    echo "cav_ip = \"$cav_ip\""
    echo "grb_port = 90"
    echo "ccs_port = 91"
    return 1
}
# --- Main --------------------------------
# exit with no action, if this is not running in an AppLogic environment
# (standalone startup)
[[ -f /etc/applogic.sh ]] || exit 0
# import appliance properties
source /etc/applogic.sh
start() {
    ebegin "Starting ${SVCNAME}"
    get_params || exit 1
    [[ "$cav_ip" == "0.0.0.0" ]] && exit 0
    # TODO: the following to be removed after cav_ip starts having the 10-net address
    route add -host "$cav_ip" "$dflt_iface"
    gen_cfg >/etc/cce.cfg
    cd /usr/local/cce/bin
    ./cce start
    nohup ./ccesyscnt &
    touch /var/lock/subsys/applogic_cce
    eend $?
}
stop() {
    ebegin "Stopping ${SVCNAME}"
    pkill ccesyscnt
    cd /usr/local/cce/bin
    ./cce stop
    rm -f /var/lock/subsys/applogic_cce
    eend $?
}
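The node_id loop in get_params packs the dotted-quad default IP address into one integer: each octet shifts the running value up by a factor of 256. For 10.8.18.1 that is ((10 x 256 + 8) x 256 + 18) x 256 + 1 = 168301057. A standalone sketch of the same arithmetic, in plain sh and with an illustrative function name, is:

```shell
#!/bin/sh
# Sketch: the node_id arithmetic from get_params as a standalone function.
# ip_to_node_id is an illustrative name; it is not part of AppLogic.

ip_to_node_id() {
    old_ifs=$IFS
    IFS=.
    set -- $1           # split the address into its four octets
    IFS=$old_ifs
    [ "$#" -eq 4 ] || return 1
    echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

ip_to_node_id 10.8.18.1
```

A result of 0 can only come from an empty or all-zero address, which is why the script treats node_id 0 as a failure.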
/etc/init.d/applogic_network:
#!/sbin/runscript
# Created by Mark Wadham @ Areti 2007
# This is a Gentoo/runscript version of the applogic_network initscript
depend() {
    need net hostname
}
# set a safe path, don't rely on init to give us one
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
# this must match the file name:
svc=applogic_network
# the 'ip' command is required
T=`type -p ip`
[ -n "$T" ] || exit 1
# functions used by the 'applogic_network.conf' file:
function m2d()
{
    ip -o link list | \
        awk "BEGIN { IGNORECASE=1 } /ether $1/ { print substr(\$2,1,length(\$2)-1) }"
}
function terminal()
{
    DEVICE=`m2d $1`
}
function t_ip()
{
    ip "$@" dev $DEVICE
}
start() {
    ebegin "Starting ${SVCNAME}"
    [ -f /etc/applogic_network.conf ] || { exit 1 ; }
    . /etc/applogic_network.conf
    mkdir -p /var/lock/subsys
    touch /var/lock/subsys/$svc
    eend $?
}
stop() {
    ebegin "Stopping ${SVCNAME}"
    # stop all ethernet interfaces (it is OK if some of them aren't ours,
    # they should be stopped in the same init phase as well)
    ifcs=`ip -o link list | awk -F ": " '/link\/ether/ { print $2 }'`
    for i in $ifcs ; do
        ip link set down dev $i
        ip addr flush dev $i
        # the above commands also remove the routes
    done
    rm -f /var/lock/subsys/$svc
    eend $?
}
status() {
    # beyond the 'lock' file presence, we can't determine the status reliably:
    [ -f /var/lock/subsys/$svc ] || exit 1
    eend $?
}
reload() {
    $0 stop
    $0 start
    eend $?
}
restart() {
    $0 stop
    $0 start
    eend $?
}
install_chkconfig() {
    T=`basename $0`
    cp $0 /etc/init.d
    chmod +x /etc/init.d/$T
    chkconfig --add $T
    touch /etc/applogic_network.conf
    echo "IMPORTANT: the system-provided network configuration for all ethernet devices"
    echo "except 'external' interfaces should be disabled before re-starting."
    echo "E.g., 'rm /etc/sysconfig/network-scripts/ifcfg-eth*', or"
    echo " 'rm /etc/sysconfig/network-scripts/ifcfg-eth[1-9]', to keep eth0 intact"
    eend $?
}
/etc/init.d/applogic_vma:
#!/sbin/runscript
# Created by Mark Wadham @ Areti 2007
# This is a Gentoo/runscript version of the applogic_vma initscript
depend() {
    need net applogic_network
}
# exit with no action, if this is not running in an AppLogic environment
# (standalone startup)
[[ -f /etc/applogic.sh ]] || exit 0
# import appliance properties
source /etc/applogic.sh
start() {
    ebegin "Starting ${SVCNAME}"
    # change password on first boot
    old_vm=""
    [[ -f /appliance/passwd.stamp ]] && old_vm=`cat /appliance/passwd.stamp`
    vm="$__APP_NAME:$__COMP_NAME"
    if [ "$old_vm" != "$vm" ]; then
        PW=`cat /proc/sys/kernel/random/uuid`
        passwd root <<END
$PW
END
        RET=$?
        [ "$RET" = 0 ] && echo "$vm" >/appliance/passwd.stamp
    fi
    # disable fsck on boot.
    echo "this file disables fsck on boot" >/fastboot
    # start VM agent
    cd /appliance
    ./vma_ctl.sh start
    RET=$?
    [ "$RET" = 0 ] && touch /var/lock/subsys/applogic_vma
    eend $?
}
stop() {
    ebegin "Stopping ${SVCNAME}"
    cd /appliance
    ./vma_ctl.sh stop
    RET=$?
    [ "$RET" = 0 ] && rm -f /var/lock/subsys/applogic_vma
    eend $?
}
/var/log/dmesg doesn't seem to offer any clues as to what's wrong either...
[dmesg truncated due to forum post limit]
raid5: automatically using best checksumming function: pIII_sse
pIII_sse : 2650.000 MB/sec
raid5: using function: pIII_sse (2650.000 MB/sec)
md: md driver 0.90.1 MAX_MD_DEVS=256, MD_SB_DISKS=27
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
NET: Registered protocol family 2
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP established hash table entries: 2048 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
NET: Registered protocol family 1
NET: Registered protocol family 17
Bridge firewalling registered
Freeing unused kernel memory: 116k freed
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
***************************************************************
***************************************************************
** WARNING: Currently emulating unsupported memory accesses  **
**          in /lib/tls libraries. The emulation is very     **
**          slow. To ensure full performance you should      **
**          execute the following as root:                   **
**          mv /lib/tls /lib/tls.disabled                    **
***************************************************************
***************************************************************
Continuing...
EXT3 FS on hda1, internal journal
The dmesg log stops there.
I can only assume that some part of the init scripts are not sending the correct signal back to the controller to tell it that the application is running. If you guys could give me some clues as to where to look it'd be much appreciated.
Thanks,
Mark
mwadham
07-07-2007, 05:19 PM
You'll probably want to see this too..
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.3 1.8 1556 544 ? S Jul07 0:08 init [3]
root 2 0.0 0.0 0 0 ? SN Jul07 0:00 [ksoftirqd/0]
root 3 0.0 0.0 0 0 ? S< Jul07 0:00 [events/0]
root 4 0.0 0.0 0 0 ? S< Jul07 0:00 [khelper]
root 9 0.0 0.0 0 0 ? S< Jul07 0:00 [kthread]
root 12 0.0 0.0 0 0 ? S< Jul07 0:00 [kblockd/0]
root 35 0.0 0.0 0 0 ? S Jul07 0:00 [pdflush]
root 36 0.0 0.0 0 0 ? S Jul07 0:00 [pdflush]
root 38 0.0 0.0 0 0 ? S< Jul07 0:00 [aio/0]
root 37 0.0 0.0 0 0 ? S Jul07 0:00 [kswapd0]
root 579 0.0 0.0 0 0 ? S< Jul07 0:00 [kmirrord/0]
root 582 0.0 0.0 0 0 ? S Jul07 0:00 [kjournald]
root 654 0.0 2.8 1784 852 ? S<s Jul07 0:00 /sbin/udevd --daemon
root 2515 0.0 2.8 1928 840 ? Ss Jul07 0:00 /usr/sbin/syslog-ng
root 2735 0.0 5.2 105220 1576 ? Sl Jul07 0:00 ./vmad name=gentoo2.main.LINUX rem_ip=192.168.1.1 rem_port=11000 stats_enabled=1
root 2928 0.0 5.6 3956 1696 ? Ss Jul07 0:00 /usr/sbin/sshd
root 2985 0.0 2.5 1804 756 ? Ss Jul07 0:00 /usr/sbin/cron
root 3048 0.0 2.0 1592 624 tty1 Ss+ Jul07 0:00 /sbin/agetty 38400 tty1 linux
root 3387 0.0 7.4 6928 2228 ? Rs 00:07 0:00 sshd: root@pts/0
root 3392 0.0 5.1 2656 1544 pts/0 Ss 00:07 0:00 -bash
root 3486 0.0 1.8 1548 556 ? Ss 00:18 0:00 /sbin/agetty 38400 tty2 linux
root 3487 0.0 1.8 1548 556 ? Ss 00:18 0:00 /sbin/agetty 38400 tty3 linux
root 3488 0.0 1.8 1548 556 ? Ss 00:18 0:00 /sbin/agetty 38400 tty4 linux
root 3489 0.0 1.8 1548 556 ? Ss 00:18 0:00 /sbin/agetty 38400 tty5 linux
root 3490 0.0 1.8 1548 556 ? Ss 00:18 0:00 /sbin/agetty 38400 tty6 linux
root 3491 0.0 2.9 2184 872 pts/0 R+ 00:18 0:00 ps aux
PeterNic
07-07-2007, 05:40 PM
Mark,
Great job!
1. For AppLogic 1.2.14 you can disable and remove the applogic_cce (it is there but not used, replaced by applogic_cca that enables monitoring in 2.0)
2. The method by which AppLogic learns that the appliance has started OK is the following:
vmad (VM agent) is started after the network is started. It establishes communication with AppLogic; make sure that it can talk to the address you see on its command line (this address will be different every time; it always goes through the default address - the same that you use to ssh into the appliance). The fact that you were able to ssh into the appliance from the AppLogic shell means that this interface is configured well and not likely to be firewalled. Just in case, check 'netstat -n --tcp -p' -- you should see a connection to the IP address and port number shown on the vmad command line.
vme (VM event) can be run as many times as you like. There are 3 interesting events: started_ok, start_failed (you can specify reason for the failure in a text message, gets logged in the AppLogic log) and log (just log an event)
My guess is that vme is not invoked at the end of the boot -- it is usually invoked by the appliance-specific /appliance/appliance.sh script, after you check that everything is in order (the default script just says "we're OK"). Check the /appliance/appliance.sh script -- it should be invoked last with 'start' on the command line.
Regards,
--Peter
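The netstat check described above needs the address and port from the vmad command line. The sketch below extracts them from a command line in the form shown in the ps listing earlier in the thread; vmad_endpoint is an illustrative name, and the parsing assumes the rem_ip= and rem_port= arguments appear exactly as in that listing.

```shell
#!/bin/sh
# Sketch: pull rem_ip and rem_port out of a vmad command line so they can
# be matched against netstat output. vmad_endpoint is a hypothetical name.

vmad_endpoint() {
    # $1 is the full command line as shown by 'ps aux'
    ip=
    port=
    for word in $1; do
        case $word in
            rem_ip=*)   ip=${word#rem_ip=} ;;
            rem_port=*) port=${word#rem_port=} ;;
        esac
    done
    [ -n "$ip" ] && [ -n "$port" ] || return 1
    echo "$ip:$port"
}

cmdline='./vmad name=gentoo2.main.LINUX rem_ip=192.168.1.1 rem_port=11000 stats_enabled=1'
vmad_endpoint "$cmdline"
```

Inside the appliance, the printed address:port pair is what to look for in the output of netstat -n --tcp -p.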
PeterNic
07-07-2007, 05:44 PM
Mark,
Another issue, visible in the dmesg log -- you need to either disable the NPTL library (mv /lib/tls /lib/tls.disabled) or install a virtualization-friendly glibc.
(Without this, you will get a performance slowdown, as described in the big, hard-to-miss starred banner in the dmesg log)
If you are interested, I will post the options for building glibc -- but you can disable it for now.
Regards,
-- Peter
mwadham
07-07-2007, 06:12 PM
Hi Peter,
I did actually do mv /lib/tls /lib/tls.disabled as per your original instructions, but that message still comes up. I am definitely keen to get glibc working properly so if you could post the options that would be great.
The application is starting up properly now, so that's good. I will try this with Debian tomorrow and let you know how it goes.
Thanks for all your help
Mark
mwadham
07-08-2007, 01:48 AM
I think -mno-tls-direct-seg-refs in the CFLAGS should fix glibc
mwadham
07-08-2007, 07:49 AM
I'm having a little bit of trouble with the Debian image:
(root)TestGrid1> app start debian_testing
Building application...
Creating volume debian_testing/volcache:main.NET.boot...Done
*** configuration of volume debian_testing.class.LINUX.boot failed, error log follows:
error: cannot determine the type of OS booted from volume debian_testing.class.LINUX.boot
Failed to build application 'debian_testing' - see log for details.
It has all the initscripts, and /etc/applogic.sh and /etc/applogic.csh, but for some reason it can't seem to figure out that it's Linux. I looked in /var/log on the controller but there were no messages related to this.
Mark
LeoKalev
07-08-2007, 08:33 AM
Mark,
On a first glance I don't see anything wrong with your modified scripts.
You can use the '--debug' option when starting the application to leave the appliance running, even if it doesn't report successful start (you've probably done that, already).
A few things to check:
Does the "/sbin/runscript" program used to run the scripts turn on the shell's "exit on fail" option (set -e)? If it does, the scripts are likely to fail; they are not designed to run with this option enabled.
Look at the system log (/var/log/messages). 'dmesg' isn't helpful here: it has only kernel messages, and there isn't anything wrong with the kernel.
If you want me to help further, please do this:
add the following lines to each of the scripts /etc/init.d/applogic, /appliance/appliance.sh and /appliance/vma_ctl.sh
exec 2>>/tmp/startup.log
set -x
This will turn on shell tracing and redirect all 'stderr' output to /tmp/startup.log. You can then check it to see what went wrong - or send it to me by e-mail to review.
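Both checks can be combined in one debugging preamble: report whether errexit is active, then switch on tracing. The sketch below demonstrates this in a subshell with a temporary log file instead of /tmp/startup.log; report_errexit and trace_demo are illustrative names. It relies on the shell variable $-, which holds the current option letters ('e' means set -e is on).

```shell
#!/bin/sh
# Sketch: a debugging preamble for an init script.

report_errexit() {
    # Call this directly rather than inside $(...): bash clears -e in
    # command substitutions, which would hide the real setting.
    case $- in
        *e*) echo "errexit (set -e) is ON" ;;
        *)   echo "errexit (set -e) is off" ;;
    esac
}

trace_demo() {
    log=$1
    (
        exec 2>>"$log"          # send stderr, including the trace, to the log
        report_errexit >&2      # record whether the caller enabled set -e
        set -x                  # trace every command from here on
        true
    )
}

log=$(mktemp)
trace_demo "$log"
cat "$log"
rm -f "$log"
```

In a real init script the two preamble lines plus a report_errexit call would go at the top of the file, with the log path from the post above.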
LeoKalev
07-08-2007, 08:41 AM
mwadham wrote:
> I'm having a little bit of trouble with the Debian image:
> (root)TestGrid1> app start debian_testing
> Building application...
> Creating volume debian_testing/volcache:main.NET.boot...Done
> *** configuration of volume debian_testing.class.LINUX.boot failed, error log follows:
> error: cannot determine the type of OS booted from volume debian_testing.class.LINUX.boot
> Failed to build application 'debian_testing' - see log for details.
> Mark
You need to create an empty file named /etc/applogic_network.conf on the volume. This is noted in the procedure posted by Peter earlier in this thread, but if you used his older e-mail, this step was missing.
PeterNic
07-08-2007, 12:28 PM
My guess is that the started_ok event is not sent by the appliance at the end of startup (vme with the started_ok event does it; see how in /appliance/appliance.sh, which is usually the last init script to run).
mwadham
07-08-2007, 01:10 PM
Ahh yes, it did say that in the instructions. Thanks, it's now working :)
Mark
mwadham
07-09-2007, 01:45 PM
Hi,
I've noticed a few permissions issues with your instructions, so far /tmp had mode 755 instead of 777, and /dev/null had 600 instead of 666. It's my own fault for not checking but still, you might want to include some notes on permissions in the final draft of your instructions :)
Regards,
Mark
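The /tmp mode is consistent with how the image was built: a plain mkdir creates directories with the default mode (755 under the usual umask of 022), whereas /tmp is normally world-writable with the sticky bit set (mode 1777). A minimal sketch for restoring both modes on a mounted image follows; fix_image_perms is a hypothetical helper name, and it only touches paths that exist.

```shell
#!/bin/sh
# Sketch: restore the usual modes on a mounted image.
# fix_image_perms is a hypothetical helper; ROOT is the mount point.

fix_image_perms() {
    root=$1
    # /tmp is normally world-writable with the sticky bit set.
    if [ -d "$root/tmp" ]; then
        chmod 1777 "$root/tmp"
    fi
    # /dev/null must be readable and writable by everyone.
    if [ -e "$root/dev/null" ]; then
        chmod 666 "$root/dev/null"
    fi
}
```

Usage: fix_image_perms /mnt while the image is still mounted, before unmounting and compressing it.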
PeterNic
07-10-2007, 10:17 AM
Mark, it is weird that those are incorrect... we didn't change them from the distro... I wonder how they got this way.
In any case, thanks, we will include them in the instructions.
Is your appliance fully working now?
Regards,
--Peter
mwadham
07-11-2007, 02:08 AM
Hi Peter,
Perhaps copying a 777 directory between filesystems resets it to 755 for security purposes. The same thing seemed to happen on Gentoo, Debian Testing and Debian Sarge.
All three appliances work, the Debian Sarge one curiously shows 20 vmad processes in the process table despite having the exact same startup scripts as the Debian Testing appliance, although pstree shows that these are all threads of the same process so it must just be due to it being so old.
The only thing I did have trouble with was getting Plesk on Debian Sarge to work with AppLogic. I think I've just about managed to get it running now, I have the applogic_network and vma services starting first, then all the Plesk stuff, then after that the 'appliance start' script that reports back to the controller that it started ok. Any other order of services seems to cause the appliance to randomly either start or fail to start, depending on how it feels. I also had to increase the startup timeout for obvious reasons (Plesk has a ton of services).
Also, during the Plesk installation the 'Plesk VPN' package failed to install due to the 'tun' kernel device not being present. I would imagine this is a limitation of the Xen kernel that the appliances run, if there is an easy way to enable this please let me know. I doubt we'll ever use it, but it'd be handy to have it enabled just in case someone does want to.
Regards,
Mark
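On SysV-style distros such as Debian Sarge, a start order like the one described above is determined by the S-numbered symlinks in the runlevel directory, which are run in lexical order. The sketch below lists that order and tests whether one service starts before another; start_order and starts_before are illustrative names, and any Plesk service name passed to them is the reader's own.

```shell
#!/bin/sh
# Sketch: show the start order implied by a SysV runlevel directory
# (for example /etc/rc2.d) and test that one service starts before another.

start_order() {
    # S-prefixed links are started in lexical order; print names without SNN.
    ls "$1" | grep '^S[0-9][0-9]' | LC_ALL=C sort | sed 's/^S[0-9][0-9]//'
}

starts_before() {
    # usage: starts_before RCDIR FIRST SECOND
    first=$(start_order "$1" | grep -n -x "$2" | cut -d: -f1)
    second=$(start_order "$1" | grep -n -x "$3" | cut -d: -f1)
    [ -n "$first" ] && [ -n "$second" ] && [ "$first" -lt "$second" ]
}
```

Usage: starts_before /etc/rc2.d applogic_network applogic_appliance returns 0 when applogic_network is started first.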
PeterNic
07-11-2007, 10:50 AM
Mark,
I am glad you were able to get all 3 distros to work -- that was quick!
The "20 vmad processes" is due to a missing NPTL library (I don't know if Debian Sarge had NPTL in the first place). They are threads of the same process.
Network and plesk - I wonder whether this is related to setting the host name. Maybe Leo can help here if you need it.
The tun device not being there - it is a missing driver in the kernel, an omission in the configuration file. There is no inherent limitation -- you can simply add that driver or rebuild the kernel (some customers have already done that and got the tun driver in). Also, our next AppLogic release will have that driver (together with ocfs2, another missing driver) included in the standard appliance kernel.
Regards,
-- Peter
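A quick way to see whether a given appliance kernel has the tun driver is to look for it in /proc/misc, where the driver registers a misc device named "tun" once it is built in or loaded. A minimal sketch, with has_tun as an illustrative function name:

```shell
#!/bin/sh
# Sketch: report whether the tun driver is registered with the running kernel.
# The misc-device list is read from /proc/misc unless another file is given.

has_tun() {
    misc=${1:-/proc/misc}
    # Each line is "<minor> <name>"; the tun driver registers as "tun".
    awk '$2 == "tun" { found = 1 } END { exit !found }' "$misc" 2>/dev/null
}

if has_tun; then
    echo "tun driver is available"
else
    echo "tun driver is not registered (it may need to be loaded or built)"
fi
```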
PeterNic
07-11-2007, 10:55 AM
Mark,
Here is how we build the glibc to enable virtualization-friendly NPTL (this removes the need to rename /lib/tls, in fact it enables the usage of the NPTL threading, which is better/faster than the old linuxthreads package)
This procedure is for CentOS and rpm-based distros, but you can adapt it for any other distro:
get the glibc source rpm from the CentOS download site (whatever version you need)
install the rpm using "rpm -i xxx"
Patch the glibc spec file located in /usr/src/redhat/SPECS/glibc.spec (this path may be different depending on your RPM setup) - it should have the following lines (the -mno-tls-direct-seg-refs flag is what was added):
%ifarch %{ix86}
BuildFlags="-mno-tls-direct-seg-refs -march=%{_target_cpu}"
Build the glibc rpms: "rpmbuild -ba /usr/src/redhat/SPECS/glibc.spec"
The following is a list of RPMs that are needed from the build (they should be present in /usr/src/redhat/RPMS/i386):
glibc-<version>.i386.rpm
glibc-common-<version>.i386.rpm
nscd-<version>.i386.rpm
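The spec file patch can be applied with sed rather than by hand. The sketch below assumes GNU sed and assumes that the unpatched spec has a BuildFlags= line directly after the %ifarch %{ix86} line, as in the snippet above; real spec files differ between glibc versions, so check the result before building. patch_glibc_spec is an illustrative name.

```shell
#!/bin/sh
# Sketch: add -mno-tls-direct-seg-refs to the ix86 BuildFlags in glibc.spec.
# Assumes GNU sed and a BuildFlags= line right after '%ifarch %{ix86}'.

patch_glibc_spec() {
    spec=$1
    # Do nothing if the flag is already at the start of a BuildFlags line.
    if grep -q -e 'BuildFlags="-mno-tls-direct-seg-refs' "$spec"; then
        return 0
    fi
    # On the line following '%ifarch %{ix86}', prepend the flag.
    sed -i '/^%ifarch %{ix86}/{n;s/^BuildFlags="/BuildFlags="-mno-tls-direct-seg-refs /;}' "$spec"
}
```

Usage: patch_glibc_spec /usr/src/redhat/SPECS/glibc.spec, then run rpmbuild as described above.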
mwadham
07-12-2007, 01:35 AM
Thanks for this Peter, I already knew the compiler option as I have Gentoo VPS running on Xen which uses it, but I didn't know how to use it in rpm-based distros so that's very useful, thanks!
Mark
Jsmart
08-09-2007, 06:16 PM
There are a couple of things that could cause this.
#1 Are you starting any type of firewall inside the appliance that could prevent the agent from communicating with the controller on ports other than 22?
#2 The vma/vme scripts and executables may have come from a different version of AppLogic. Did you take these from another appliance on your current grid? (These are the /appliance/vma* and /appliance/vme files.)
#3 Something I'm missing in Gentoo. If the other two don't seem to apply, I would be interested in looking here to help resolve this issue so we can post the resolution.
In addition to this, you do need to disable TLS as there are known issues. Please move /lib/tls to /lib/tls.disabled.
--Jessie@3tera
jfchevrette
08-17-2007, 01:02 PM
Hi,
AppLogic 2.0.2
I have tried to build a custom CentOS5 appliance and so far it's not working. When I start the app, the LINUX appliance won't finish booting. The /var/log/messages log does not show any specific errors. The "app start" command keeps going until I stop it with Ctrl-C.
Starting the application with the --debug option did not show any difference.
Any ideas what might be wrong? I did the process twice and the result was the same.
Thanks
Jean-Francois
PeterNic
08-17-2007, 01:28 PM
Jean-Francois,
When you start the app with --debug it will appear to fail the same way; however, it will actually leave the app running. About 1/2 minute after starting it, you can open a second console window and try to login to the appliance -- and inspect what's going wrong.
Regards,
-- Peter
jfchevrette
08-17-2007, 01:41 PM
Hi,
unfortunately I had no luck connecting to the appliance.
Warning: component mydistro2:main.LINUX is in starting state
--- ssh may not be available; trying anyway...ssh: connect to host 10.8.18.1 port 22: No route to host
I am guessing this is a problem with the network or something similar, but I can't figure out what it is. I've deleted all ifcfg-eth* files and the applogic_network init script is set to start at boot.
What else can I do to debug further ?
EDIT: I can see the following under the /var/log/boot.log
Aug 17 16:36:11 centos5 vma_ctl.sh: SIOCADDRT: No such device
EDIT2: And this under /appliance/vmalog
Failed to add route for VMAD, err=7
SOLVED! The problem was that when I added the applogic_network script to my applogic-files.tgz archive, I lost the +x execute permission, and because of that the init script didn't start. I mounted the volume, changed the permissions and started my appliance successfully!
Regards,
Jean-Francois
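A lost execute bit like this can be caught before the image is built by listing the archive. The sketch below reads the mode string that tar prints in verbose listings; member_is_executable is an illustrative name, and the member path is given as it is stored in the archive (without a leading '/').

```shell
#!/bin/sh
# Sketch: confirm that a member of a .tgz archive carries the execute bit.

member_is_executable() {
    archive=$1
    member=$2       # path as stored in the archive, e.g. etc/init.d/applogic_network
    # 'tar tzvf' prints the mode string first and the member name last.
    mode=$(tar tzvf "$archive" |
        awk -v m="$member" '$NF == m || $NF == "./" m { print $1; exit }')
    case $mode in
        -??x*) return 0 ;;   # regular file with the owner execute bit set
        *)     return 1 ;;   # not found, or not executable
    esac
}
```

Usage: member_is_executable appliance-files.tgz etc/init.d/applogic_network || echo "fix the permissions first".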
PeterNic
08-18-2007, 09:28 PM
Jean-Francois,
Looking at the edit history of your last message, that's not what I meant when I said that posting on the forums helps resolve problems -- but I see it did work for you :)
Thank you for using the forums!
-- Peter
PeterNic
09-04-2008, 05:33 PM
The same process for AppLogic 2.3.9 beta is described in http://doc.3tera.com/AppLogic23/AdvAPKUserManual.html#OS_Install