
App fails with iptables failed to start


focher
03-13-2008, 08:56 PM
Hi,

I created a pretty simple application with an IN gateway appliance, a Linux appliance, and an OUT gateway appliance. I set the IP addresses on the IN and OUT appliances from my valid pool, along with the gateway and DNS on the OUT appliance.

When I issue the app start, it eventually fails while starting the OUT appliance, with the error that iptables failed to start. I ran app start with --debug and then checked vmalog, which says:

Failed to add route: ip=192.168.10.1, dflt iface=eth2, err=7

I assume that this IP is automatically issued by the grid for some purpose. I have added my ifconfig and route results below and that network is being forwarded through one of the automatically provisioned interfaces.

eth0 Link encap:Ethernet HWaddr F2:66:07:00:14:07
inet addr:xx.xx.xx.xx Bcast:xx.xx.xx.255 Mask:255.255.255.0
inet6 addr: fe80::f066:7ff:fe00:1407/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:323 errors:0 dropped:0 overruns:0 frame:0
TX packets:23 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:26033 (25.4 KiB) TX bytes:1584 (1.5 KiB)

eth1 Link encap:Ethernet HWaddr F2:66:07:00:14:04
inet addr:10.56.20.16 Bcast:0.0.0.0 Mask:255.255.255.255
inet6 addr: fe80::f066:7ff:fe00:1404/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5628 errors:0 dropped:0 overruns:0 frame:0
TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:334923 (327.0 KiB) TX bytes:468 (468.0 b)

eth2 Link encap:Ethernet HWaddr F2:66:07:00:14:05
inet addr:10.56.20.21 Bcast:0.0.0.0 Mask:255.255.255.255
inet6 addr: fe80::f066:7ff:fe00:1405/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8739 errors:0 dropped:0 overruns:0 frame:0
TX packets:3175 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:540751 (528.0 KiB) TX bytes:718243 (701.4 KiB)

eth3 Link encap:Ethernet HWaddr F2:66:07:00:14:06
inet addr:10.56.20.23 Bcast:0.0.0.0 Mask:255.248.0.0
inet6 addr: fe80::f066:7ff:fe00:1406/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:6568 errors:0 dropped:0 overruns:0 frame:0
TX packets:1771 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:443180 (432.7 KiB) TX bytes:260433 (254.3 KiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:13449 errors:0 dropped:0 overruns:0 frame:0
TX packets:13449 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:5139414 (4.9 MiB) TX bytes:5139414 (4.9 MiB)

Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.7.2     *               255.255.255.255 UH    0      0        0 eth3
mon             *               255.255.255.255 UH    0      0        0 eth2
10.56.20.15     *               255.255.255.255 UH    0      0        0 eth1
72.233.71.0     *               255.255.255.0   U     0      0        0 eth0
169.254.0.0     *               255.255.0.0     U     0      0        0 eth0
10.56.0.0       *               255.248.0.0     U     0      0        0 eth3
default         1.71.233.72.sta 0.0.0.0         UG    0      0        0 eth0

LeoKalev
03-14-2008, 10:58 AM
>> Failed to add route: ip=192.168.10.1, dflt iface=eth2, err=7

This vmalog entry is old, and likely is not related to the problem with starting the appliance. The correct service route used by the AppLogic "VM agent" daemon (vmad) does appear in your routing table (192.168.7.2 * 255.255.255.255 UH 0 0 0 eth3).
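A quick way to confirm that a host route like the one quoted above is present is to filter the routing table output. This is only a minimal sketch, not an AppLogic tool: `has_host_route` is a hypothetical helper name, and it assumes the eight-column net-tools `route` layout shown in the first post.

```shell
# Hypothetical helper (not part of AppLogic): succeed if the routing table
# read from stdin has a host route (mask 255.255.255.255) to $1 via iface $2.
has_host_route() {
  awk -v dst="$1" -v ifc="$2" '
    $1 == dst && $3 == "255.255.255.255" && $NF == ifc { found = 1 }
    END { exit !found }
  '
}

# Example use on the appliance (commented out so the sketch has no side effects):
# route -n | has_host_route 192.168.7.2 eth3 && echo "service route present"
```

The check reads from stdin, so it can also be run against a routing table pasted into a file.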

The likely problem is indeed with iptables. It would help if you could post the relevant system log entries, with time stamps, from around the time when the appliance was last started.

PeterNic
03-14-2008, 11:47 AM
focher,

The two most common reasons for gateways failing to start are:
- insufficient memory (we recommend 64M set as min and default; sometimes they don't start in the minimum of 48M which is in the system catalog)
- duplicate IP address (e.g., setting the same IP address both in the IN and in the OUT gateway -- those two IP addresses must be different)
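The duplicate-address case is easy to rule out mechanically once you have the addresses in hand. A minimal sketch, assuming you paste in the external IP addresses configured on each gateway; `find_duplicate_ips` is a hypothetical helper name and the addresses in the example are placeholders, not values from this grid.

```shell
# Hypothetical helper: print every address that appears more than once
# among the arguments (one per line); prints nothing if all are distinct.
find_duplicate_ips() {
  printf '%s\n' "$@" | sort | uniq -d
}

# Example with placeholder addresses standing in for the gateways' external IPs:
# find_duplicate_ips 192.0.2.10 192.0.2.11
```

Empty output means no two gateways share an address.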

If none of these help, please post the last 50 or so lines from /var/log/messages as Leo suggested, or contact support at your provider. Either way, please let us know if it is resolved and what the trouble was.

Best regards,
-- Peter

focher
03-16-2008, 03:39 PM
I am using two different IPs on the IN and NET gateways. I checked the memory allocation and the MIN and Default are set to 64M (which I didn't change, so those values must be there as defaults now).

This is the last set of lines from /var/log/messages in the NET gateway after doing an app start --debug. I don't see any details why iptables is failing.

Mar 16 14:29:01 internet kernel: NET: Registered protocol family 2
Mar 16 14:29:01 internet kernel: blkfront: hda1: barriers enabled
Mar 16 14:29:01 internet kernel: netfront: device eth0 has copying receive path.
Mar 16 14:29:01 internet kernel: netfront: device eth1 has copying receive path.
Mar 16 14:29:01 internet kernel: netfront: device eth2 has copying receive path.
Mar 16 14:29:01 internet kernel: netfront: device eth3 has copying receive path.
Mar 16 14:29:01 internet kernel: IP route cache hash table entries: 512 (order: -1, 2048 bytes)
Mar 16 14:29:01 internet kernel: TCP established hash table entries: 2048 (order: 2, 16384 bytes)
Mar 16 14:29:01 internet kernel: TCP bind hash table entries: 2048 (order: 2, 16384 bytes)
Mar 16 14:29:01 internet kernel: TCP: Hash tables configured (established 2048 bind 2048)
Mar 16 14:29:01 internet kernel: TCP reno registered
Mar 16 14:29:01 internet kernel: IPv4 over IPv4 tunneling driver
Mar 16 14:29:01 internet kernel: GRE over IPv4 tunneling driver
Mar 16 14:29:01 internet kernel: TCP bic registered
Mar 16 14:29:01 internet kernel: Initializing IPsec netlink socket
Mar 16 14:29:01 internet kernel: NET: Registered protocol family 1
Mar 16 14:29:01 internet kernel: NET: Registered protocol family 17
Mar 16 14:29:01 internet kernel: Using IPI No-Shortcut mode
Mar 16 14:29:01 internet kernel: Freeing unused kernel memory: 156k freed
Mar 16 14:29:01 internet kernel: kjournald starting. Commit interval 5 seconds
Mar 16 14:29:01 internet kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar 16 14:29:01 internet kernel: EXT3 FS on hda1, internal journal
Mar 16 14:29:01 internet kernel: device-mapper: 4.5.0-ioctl (2005-10-04) initialised: dm-devel@redhat.com
Mar 16 14:29:01 internet kernel: kjournald starting. Commit interval 5 seconds
Mar 16 14:29:01 internet kernel: EXT3-fs: mounted filesystem with ordered data mode.
Mar 16 14:29:01 internet kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 16 14:29:01 internet kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 16 14:29:01 internet kernel: NET: Registered protocol family 10
Mar 16 14:29:01 internet kernel: lo: Disabled Privacy Extensions
Mar 16 14:29:01 internet kernel: IPv6 over IPv4 tunneling driver

Issuing a /etc/init.d/iptables restart doesn't seem to do anything. Nothing even shows up in the messages log. Also, I did an iptables -L and everything is empty. No chains.

Should I have to do anything more than drag the appliance over, hook up the NET service port from my Linux appliance to the NET gateway's IN service port then set the IP address, gateway, and DNS info?

PeterNic
03-16-2008, 04:38 PM
focher,


>> Should I have to do anything more than drag the appliance over, hook up the NET service port from my Linux appliance to the NET gateway's IN service port then set the IP address, gateway, and DNS info?

Yup, that's all. Keep in mind that the 'out' terminal should be connected to an OUT gateway and the 'net' terminal to a NET gateway (a misconnection will not prevent the appliance from starting, but the application will not function well).


>> I don't see any details why iptables is failing.


What I see from the log above is that the appliance-specific services on the appliance did not start, which is weird. iptables starts but is not loaded with the correct ruleset; it appears that the appliance doesn't even try to start the services. You may want to refresh the appliance's boot volume, just to be sure that the volume has not been damaged in some way. To do this, stop the app, issue 'app clean <appname>' and then restart the app (during the 'clean' command, you will see that 'volcache' volumes are being deleted -- that's OK, they will be rebuilt on the next app start).

If that doesn't help, check out the following logs, in this order:

- On the appliance itself, check any log files in the /appliance directory
- The grid log (do 'list log n=20' in the grid shell, after app start fails)


Also, if you are familiar with the Unix System V init scripts system, you can try to track down why the applogic_xxx services don't start.

If this doesn't help, you have two options:

- continue this exchange on the forums
- contact your grid provider's support; the latter may give you a faster resolution


Best regards,
-- Peter

PeterNic
03-16-2008, 04:43 PM
The OUT gateway may not be able to start if its remote_host property is not set or the host name cannot be resolved.

(From your first message, I thought you had IN.out->LINUX.in, LINUX.out->OUT.in; your previous message talks about NET gateways, so I now assume IN.out->LINUX.in, LINUX.net->NET.in.)

-- Peter

PeterNic
03-16-2008, 04:48 PM
The properties you want to have set correctly on NET are

- ip_addr
- netmask
- gateway
- dns1


You can verify that they are all set correctly by, for example, pinging www.yahoo.com from inside the NET appliance. If it doesn't work, then something is not set correctly (check the external IP, netmask, and routing table, as well as /etc/resolv.conf). Once this works, the next step is to see the same behavior from the LINUX appliance that is connected to NET.

In LINUX, the routing table should show the default gateway as reachable through the 'net' terminal, with the NET.in IP address as the default gateway; /etc/resolv.conf should also list NET.in's IP address. You can find the IP address of any appliance and any terminal with 'iface info <appname>:main.<appliance>.<terminal>' in the grid shell.
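The two LINUX-side checks described above (default gateway and resolver) can be pulled out of the standard command output with a couple of small filters. A minimal sketch: `default_route` and `list_nameservers` are hypothetical helper names, and the first one assumes numeric `route -n` output, where the default route has destination 0.0.0.0.

```shell
# Hypothetical helpers; both read command output from stdin.

# Print "<gateway> <iface>" for each default route in `route -n` output.
default_route() {
  awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2, $NF }'
}

# Print the nameserver addresses listed in resolv.conf-style input.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }'
}

# Example use inside the LINUX appliance (commented out):
# route -n | default_route
# list_nameservers < /etc/resolv.conf
```

Both commands should print the address that 'iface info' reports for NET.in; if either differs, that is the setting to look at first.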

Regards,
-- Peter

focher
03-16-2008, 05:56 PM
Sorry, I confused the issue by mixing up the appliance types. I am not using the OUT gateway, so remote_host doesn't apply. I am definitely using the NET gateway appliance, so I have IN.out --> LINUX.in and LINUX.net --> NET.in.

On the NET appliance, I have set the ip (different than the one on the IN appliance) along with netmask, gateway and DNS. If I start with --debug and the app is in a failed state, I can ssh to the NET gateway and resolve DNS, ping external hosts, download etc.

After confirming I can access the Internet from the NET gateway, I tried from the LINUX appliance. The default route is set to "net", and "net" is defined on eth5, on the same subnet as eth3 on the NET appliance. My resolv.conf is also set to the NET appliance's IP address. I can ping the NET appliance, but traffic goes no further, whether I try an IP address or a DNS name. My current theory is that the iptables rules are not enabling NAT, so I will focus on that as the source of the problem.
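One way to test the NAT theory is to look at the nat table on the NET gateway directly. This is a minimal sketch under stated assumptions: `has_nat_rule` is a hypothetical helper name, and it assumes the gateway would implement outbound NAT with a MASQUERADE or SNAT rule, which nothing in this thread confirms.

```shell
# Hypothetical helper: succeed if the nat-table listing read from stdin
# contains at least one MASQUERADE or SNAT rule.
has_nat_rule() {
  awk '$1 == "MASQUERADE" || $1 == "SNAT" { found = 1 } END { exit !found }'
}

# Example use on the NET gateway as root (commented out):
# iptables -t nat -L POSTROUTING -n | has_nat_rule || echo "no source NAT rule loaded"
# cat /proc/sys/net/ipv4/ip_forward    # 1 means the kernel forwards IPv4 packets
```

If the nat table is empty or forwarding is off, pings to the gateway itself would still succeed while anything beyond it fails, which matches the symptom described.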

I also did the app clean and still see the same problem. I am just trying to get comfortable with the grid and this is just to play around anyway, so I will try to fix it without resorting to support (except for the forum, of course).

focher
03-16-2008, 07:40 PM
Quite strange. I finally tried just deleting the NET gateway appliance and dropping in a new one. Everything started fine after that.

PeterNic
03-21-2008, 03:54 PM
Do you have a copy of the old, non-working app?