Interface - both input and output possible?


earthgecko
04-22-2009, 04:12 AM
Hi

I am trying to implement drbd on a NAS/NAS HA setup; however, after overcoming a number of hurdles with the drbd kernel module, I am now stuck.

The problem, "I think", is that drbd needs to communicate in and out on the same interface, so adding drbd_out and drbd_in interfaces to the appliance does not seem to work. drbd just does not like it because, "I think", it expects communication from its peer node to come from the drbd_in IP address (which is configured for each peer in the conf and is each peer's in IP), but it is receiving it from the drbd_out IP. I can telnet to port 7788 on the drbd_in interface from both NASs.

I have even tried binding and routing the drbd traffic through the appliance's default interfaces, alas no :(

However that has presented another mystery according to the documentation: (http://doc.3tera.com/AppLogic24/AdvADLSpec.html)

The 'interface default' entry enables configuring an unused network interface for unrestricted use, with an automatically assigned IP address on the same subnet as the ones used for terminal connections. The assigned IP address is made available to the AppLogic controller as the IP address of the component; this can be used for maintenance logins.

From that, one would imagine an "unrestricted use" interface would allow two appliances in the same application to at least ping each other on the default interface IP as defined in /etc/applogic/appliance.desc.ctl; however, this does not seem to be the case. iptables is not running on either of the appliances as a service or kernel module. So am I totally wrong in my interpretation of the documentation?

Is there any way to add an interface that is both input and output on an appliance within a grid application?

Thanks again.

And PS - In relation to drbd and not being able to modify the filesystem as per http://forum.3tera.com/showpost.php?p=3881&postcount=2, I think I have managed to do it: you just do not let the appliance mount it and leave it for drbd to manage. That seems to allow for modifications, even wiping the first blocks (dd if=/dev/zero of=/dev/hda3 bs=1K count=100). drbd can attach, connect and access the device. As for sync'ing, I am not sure until the possible IP issue is resolved...

PeterNic
04-24-2009, 06:38 PM
earthgecko,

I have been a proponent of bidirectional terminals for a while -- and we just may add them after the next release. However, most of the function you need is probably there. Starting in 2.4.7, the protocol "any" allows establishing connections in both directions (e.g., what would be needed for ftp support); the only thing that is directional is the simple binding. If I recall drbd properly, it is asymmetrical, i.e., there is a master and a slave. You can create master and slave terminals (e.g., an output and an input; or vice-versa, depending on whether the master binds to the slave or vice-versa). Then you connect the master terminal to the slave; once the connection is initiated, both appliances will have the peer's IP and can establish new connections to each other via the same interface. In addition, with a bit more dipping in the new /etc/applogic/appliance.desc or one of its derivatives, you can get both the local and the remote IP address of connected appliances. (If this is the approach that can work for you, we can explain further if the APK documentation is insufficient.)
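To illustrate the "once the connection is initiated, both appliances will have the peer's IP" point, here is a minimal sketch in plain Python sockets. It runs entirely on the loopback interface and says nothing about AppLogic itself: the names `listen`, `recv_all` and `demo` are made up for the example, and on a real grid the reverse TCP connection would presumably still need protocol "any" on the terminals, as described above.

```python
import socket
import threading

def listen(host="127.0.0.1"):
    """Open a listening TCP socket on an ephemeral port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    return srv

def recv_all(conn):
    """Read from conn until the peer closes the connection."""
    chunks = []
    while True:
        data = conn.recv(1024)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)

def demo():
    slave_srv = listen()   # listener on the "input" (slave) side
    master_srv = listen()  # listener on the "output" (master) side
    learned = {}

    def slave():
        # The accepted connection reveals the address of whoever connected.
        conn, (peer_ip, _port) = slave_srv.accept()
        learned["peer_ip"] = peer_ip
        back_port = int(recv_all(conn).decode())  # master says where it listens
        conn.close()
        # New connection in the reverse direction, to the learned address.
        with socket.create_connection((peer_ip, back_port), timeout=5) as back:
            back.sendall(b"hello from slave")

    worker = threading.Thread(target=slave)
    worker.start()

    # The master initiates the first connection (output -> input).
    with socket.create_connection(slave_srv.getsockname(), timeout=5) as out:
        out.sendall(str(master_srv.getsockname()[1]).encode())

    master_srv.settimeout(5)
    conn, _addr = master_srv.accept()
    reply = recv_all(conn)
    conn.close()
    worker.join()
    slave_srv.close()
    master_srv.close()
    return learned["peer_ip"], reply

if __name__ == "__main__":
    print(demo())
```

The slave side never needs the master's address configured in advance; it takes it from the first accepted connection, which is the behaviour the paragraph above relies on.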

The default interface's description in the ADL reference is out-of-date by a few millennia (only with respect to it being "unrestricted"); thanks for pointing it out -- we'll fix it. The default interface is somewhat of a historical artifact; it is no longer optional and all appliances have it; it is used for very restricted communication between the appliance and the runtime (e.g., maintenance logins over ssh, getting IP addresses and appliance configuration); it cannot be used for data exchange between appliances (that's what terminals are for -- and we may expand the possible connection types in the future but they will all be explicit and via terminals).

Finally, failing all else, you can use the "external" interface of an appliance; it is also known as a "raw" interface, which provides an escape mechanism when the point-to-point terminal-based connections cannot be used (e.g., for groupcast communication or many-to-many connections); in a way, it is what it was intended to do in addition to being the uplink to the connected world. Some customers use RFC1918-type address ranges to arrange private communication between appliances of the same application and/or to communicate privately between applications on the same grid.

This is the last-resort method for what you're trying to do -- essentially it is what you would have done if you didn't have AppLogic. I hope the bidirectional communication across terminals will work out better, though.

Best regards,
-- Peter

PS: on modifying the block volume from under a filesystem -- yes, as long as the volume is not mounted as a filesystem, making changes at the block level is OK, provided you stop modifying it at the block level once you mount the block device as a filesystem (which I assume is what you do when the drbd slave becomes the master).

PeterNic
04-24-2009, 06:46 PM
Here's how the DRBD appliance and its normal connection method might look if using the bidirectional connections with terminals (see attached snapshot)

earthgecko
04-24-2009, 11:50 PM
Hi Peter

Thanks again :) I tried "any", but obviously had an in and an out terminal and did not use "any" as a bidirectional terminal. I will give that a try and see how it pans out. I may not update this post for a while with feedback on that.

earthgecko
04-25-2009, 02:20 AM
Hi Peter

You are a star. That worked, attaching NASR1's out (protocol any) to NASR2's in (protocol any). Although it is a strange setup to have out and in for out/in and in/out (you know what I mean). Thank you, now have drbd talking and sync'ing!

Great stuff.

PeterNic
04-25-2009, 01:58 PM
earthgecko,

Fantastic -- glad it works! I expect we will add the explicit notion of bidirectional terminals, in order to make this not strange, and more importantly, intuitive.

Thanks for your feedback

Best regards,
-- Peter

earthgecko
05-01-2009, 03:05 AM
Hi

After much more tinkering, we are now at a point where I am fairly certain that there is no way (that I know of) to overcome this one. A brief overview is that heartbeat is trying to bring up the virtual interface on which its configured service/s run, in this case nfs.

Getting the drbd working is solved, now even with shared /var/lib/nfs locks, etc so that both NASRs can failover and take over the locks so that the client does not need to reacquire any locks. There seems to be no issue with getting drbd using and mounting an app volume.

However, DHCP on the grid may be a showstopper in this type of implementation, a drbd/HA NASR1/NASR2 within an application. It may be time to consider static appliance addressing, unless you can provide some pointers? I have a feeling that static appliance addressing may be quite difficult, hence I have tried to use the AppLogic DHCP appliance addressing, fearing what happens when you go deeper into the rabbit hole.

Any suggestions would be greatly appreciated or pointing me at some specific documentation on how to use static IP for appliances/terminals for grid application appliances.

Here are the heartbeat /var/log/messages excerpts:

May 1 09:46:09 LUX heartbeat: [22693]: info: Version 2 support: false
May 1 09:46:09 LUX heartbeat: [22693]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
May 1 09:46:09 LUX heartbeat: [22693]: info: **************************
May 1 09:46:09 LUX heartbeat: [22693]: info: Configuration validated. Starting heartbeat 2.99.2
May 1 09:46:09 LUX heartbeat: [22694]: info: heartbeat: version 2.99.2
May 1 09:46:09 LUX heartbeat: [22694]: info: Heartbeat generation: 1241102532
May 1 09:46:09 LUX heartbeat: [22694]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth7
May 1 09:46:09 LUX heartbeat: [22694]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth7 - Status: 1
May 1 09:46:09 LUX heartbeat: [22694]: info: G_main_add_TriggerHandler: Added signal manual handler
May 1 09:46:09 LUX heartbeat: [22694]: info: G_main_add_TriggerHandler: Added signal manual handler
May 1 09:46:09 LUX heartbeat: [22694]: info: G_main_add_SignalHandler: Added signal handler for signal 17
May 1 09:46:09 LUX heartbeat: [22694]: info: Local status now set to: 'up'
May 1 09:46:19 LUX dhclient: DHCPREQUEST on eth10 to 10.47.255.254 port 67
May 1 09:46:30 LUX heartbeat: [22694]: WARN: node nasr2drbd: is dead
May 1 09:46:30 LUX heartbeat: [22694]: info: Comm_now_up(): updating status to active
May 1 09:46:30 LUX heartbeat: [22694]: info: Local status now set to: 'active'
May 1 09:46:30 LUX heartbeat: [22694]: WARN: No STONITH device configured.
May 1 09:46:30 LUX heartbeat: [22694]: WARN: Shared disks are not protected.
May 1 09:46:30 LUX heartbeat: [22694]: info: Resources being acquired from nasr2drbd.
May 1 09:46:30 LUX harc[22703]: info: Running /etc/ha.d/rc.d/status status
May 1 09:46:30 LUX mach_down[22738]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
May 1 09:46:30 LUX mach_down[22738]: info: mach_down takeover complete for node nasr2drbd.
May 1 09:46:30 LUX heartbeat: [22694]: info: Initial resource acquisition complete (T_RESOURCES(us))
May 1 09:46:30 LUX heartbeat: [22694]: info: mach_down takeover complete.
May 1 09:46:30 LUX IPaddr[22770]: INFO: Resource is stopped
May 1 09:46:30 LUX heartbeat: [22704]: info: Local Resource acquisition completed.
May 1 09:46:30 LUX harc[22841]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
May 1 09:46:30 LUX ip-request-resp[22841]: received ip-request-resp IPaddr::10.40.40.250/16/eth7 OK yes
May 1 09:46:30 LUX ResourceManager[22862]: info: Acquiring resource group: nasr1drbd IPaddr::10.40.40.250/16/eth7 drbddisk::r0 Filesystem::/dev/drbd0::/mnt/data::ext3 nfs
May 1 09:46:30 LUX IPaddr[22889]: INFO: Resource is stopped
May 1 09:46:31 LUX dhclient: DHCPREQUEST on eth10 to 10.47.255.254 port 67
May 1 09:46:31 LUX ResourceManager[22862]: info: Running /etc/ha.d/resource.d/IPaddr 10.40.40.250/16/eth7 start
May 1 09:46:31 LUX IPaddr[22984]: INFO: Using calculated netmask for 10.40.40.250: 255.255.0.0
May 1 09:46:31 LUX IPaddr[22984]: INFO: eval ifconfig eth7:0 10.40.40.250 netmask 255.255.0.0 broadcast 10.40.255.255
May 1 09:46:31 LUX IPaddr[22955]: INFO: Success
May 1 09:46:31 LUX Filesystem[23097]: INFO: Running OK
May 1 09:46:36 LUX ResourceManager[22862]: info: Running /etc/init.d/nfs start
May 1 09:46:40 LUX heartbeat: [22694]: info: Local Resource acquisition completed. (none)
May 1 09:46:40 LUX heartbeat: [22694]: info: local resource transition completed.
May 1 09:46:41 LUX kernel: NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
May 1 09:46:41 LUX kernel: NFSD: starting 90-second grace period
May 1 09:46:44 LUX dhclient: DHCPREQUEST on eth10 to 10.47.255.254 port 67
May 1 09:47:24 LUX last message repeated 3 times
May 1 09:48:27 LUX last message repeated 5 times

Just for clarification: HA does actually bring up the virtual interface, shown below as eth7:0. This is the desired result, and it is bound on eth7:0 because the interface that drbd listens on is eth7. However, it "seems" like eth10 (the default interface) is trying to broadcast a DHCP change????

eth7 Link encap:Ethernet HWaddr F2:02:05:00:28:07
inet addr:10.40.40.1 Bcast:0.0.0.0 Mask:255.255.255.255
inet6 addr: fe80::f002:5ff:fe00:2807/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:54085 errors:0 dropped:0 overruns:0 frame:0
TX packets:15913 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:6389793 (6.0 MiB) TX bytes:1098323 (1.0 MiB)

eth7:0 Link encap:Ethernet HWaddr F2:02:05:00:28:07
inet addr:10.40.40.250 Bcast:10.40.255.255 Mask:255.255.0.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

eth9 Link encap:Ethernet HWaddr F2:02:05:00:28:09
inet addr:10.40.40.3 Bcast:0.0.0.0 Mask:255.255.255.255
inet6 addr: fe80::f002:5ff:fe00:2809/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:47749 errors:0 dropped:0 overruns:0 frame:0
TX packets:3489 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:10516756 (10.0 MiB) TX bytes:229433 (224.0 KiB)

eth10 Link encap:Ethernet HWaddr F2:02:05:00:28:0A
inet addr:10.40.40.10 Bcast:10.47.255.255 Mask:255.248.0.0
inet6 addr: fe80::f002:5ff:fe00:280a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:15066 errors:0 dropped:0 overruns:0 frame:0
TX packets:3301 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1876576 (1.7 MiB) TX bytes:630682 (615.9 KiB)
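The addressing arithmetic in the log and ifconfig output above can be double-checked with Python's standard `ipaddress` module. This is only a sanity check of the numbers already shown (the eth10 address and mask, the eth7:0 address and mask, and the DHCP server address from the dhclient lines); it does not explain the DHCPREQUEST messages.

```python
import ipaddress

# Values copied from the ifconfig output and the dhclient log lines above.
default_if = ipaddress.ip_interface("10.40.40.10/255.248.0.0")   # eth10
virtual_if = ipaddress.ip_interface("10.40.40.250/255.255.0.0")  # eth7:0
dhcp_server = ipaddress.ip_address("10.47.255.254")

print(default_if.network)                    # 10.40.0.0/13
print(default_if.network.broadcast_address)  # 10.47.255.255
print(virtual_if.network)                    # 10.40.0.0/16
print(virtual_if.network.broadcast_address)  # 10.40.255.255

# The DHCP server that eth10 keeps asking sits inside eth10's own /13 ...
print(dhcp_server in default_if.network)                 # True
# ... and the /16 used for the heartbeat virtual IP lies inside that same /13.
print(virtual_if.network.subnet_of(default_if.network))  # True
```

So the virtual IP brought up by heartbeat is numbered out of the same 10.40.0.0/13 block that the default interface reports.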

himanshu
05-01-2009, 04:18 AM
Earthgecko -
It would help if you could outline how you built the DRBD modules. After installing DRBD, modprobe simply does not recognize those modules. Can you please post the procedure you followed to get to this point?
Thanks,
Himanshu

himanshu
05-01-2009, 04:22 AM
Here is something you can try, but it would obviously be unrecommended :)
The two appliances you are creating should be on different physical servers. Configure a virtual interface eth0:1 on each, give them 172.X segment IP addresses, and route them through the external interface. That will get you static IPs, but the communication will go through your external switch and not the backbone.
Thanks,
Himanshu

PeterNic
05-01-2009, 10:38 AM
earthgecko,

I am not sure I get what's not working... but here's what I can propose:

- if this is not related to the bi-directional terminal connections, let's move it to a separate post
- see my clarification below on how appliances work and where DHCP is used and where it isn't
- if you like, we can arrange a quick chat with a few of our engineers -- including the one that built the appliance kit (hence the DHCP support and all the other wizardry that makes appliances work inside) and the one that set up heartbeat for the INSSLR gateway (which uses heartbeat for its public interface to provide redundant hosts for the same IP)

With respect to the appliance network configuration -- for more details see the appliance kit doc -- here are the basics:
- each appliance has a "default" network interface that is automatically configured by AppLogic using DHCP on a private network (so DHCP is not exposed on the public network); the "default" name is a bit of a misnomer -- consider it the "internal" interface; it is used by the appliance to communicate with AppLogic itself, primarily to get its configuration and to send and receive system events (optional).
- each appliance has an optionally enabled "external" interface that works exactly as a regular NIC on a physical server. You are responsible for configuring it (and by you, I mean the appliance itself); usually IP addresses are fully static and specified through properties. In addition to using the external interface for communicating on the internet, many customers use it with RFC1918 addresses for internal traffic -- between appliances of the same app (where the terminal connections don't work, e.g., in many-to-many cases); between applications on the same grid; and, in rare cases, between grids that sit on the same L2 network (primarily for DR).
- each appliance has zero or more terminals; the terminals are directional, point-to-point named interfaces used to provide private communication between appliances of the same application. Their IP addresses are selected by AppLogic and provided to the appliance through the "default" network interface during boot. The appliance kit (APK) is the one that actually takes that information (a map of MAC-to-IP), and sets the IP addresses for the terminals. AppLogic enforces that MAC and IP address, similar to ACL filters on a L2/L3 switch, except those get automatically configured by AppLogic and no human or software ever has to deal with that.

In addition, for terminal-based connections, you have two types of terminals: inputs (for receiving requests for service) and outputs (for sending requests for service); as raised in this thread, there is a good case for having bidirectional terminals (which can be fully emulated as described in this thread). Note that traffic flows in both directions always -- the difference between inputs and outputs is only in who originates the connection (essentially, a binding method). Outputs have a variation called "gateway outputs" which is beyond the scope of this topic.

In addition to the MAC-to-IP map that the appliance gets for its terminal connections, it actually gets a full connection map -- the list of all IPs that connect to each input; and the IP to which each output is connected. Finally, for the IPs to which outputs are connected, APK creates host names, so that the name of the output terminal always resolves to the remote IP to which the output is connected.
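As a rough illustration of the "output terminal name resolves to the remote IP" idea, the sketch below looks up a terminal name in /etc/hosts-style text (later in this thread the fs and log terminal IPs are said to be recorded in /etc/hosts at startup). The function name `resolve_output`, the sample entries and their addresses are hypothetical; the exact format APK writes is not shown in this thread.

```python
def resolve_output(hosts_text, terminal):
    """Return the IP mapped to `terminal` in /etc/hosts-style text, or None."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        if terminal in names:
            return ip
    return None

# Hypothetical sample; real entries are generated on the appliance at boot.
sample = """
127.0.0.1    localhost
10.40.40.21  fs     # assumed entry for an output terminal named "fs"
10.40.40.22  log    # assumed entry for an output terminal named "log"
"""

print(resolve_output(sample, "fs"))       # 10.40.40.21
print(resolve_output(sample, "missing"))  # None
```

On a running appliance the ordinary resolver call, `socket.gethostbyname("fs")`, would normally do the same job; the parser above just makes the mapping visible without needing an appliance.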

There is no interaction whatsoever between the DHCP service the appliance uses for its default interface (i.e., its internal system interface) and any of the terminals or the external interface.

Let me know if any of the above helps; if not, whether you simply want to use the external interface (e.g., with RFC1918 addresses and/or public ones) or whether you want to have a quick chat and see if we can help you get the heartbeat to function properly (again, we use heartbeat in some of our DR appliances so we may have resolved things we don't even know were problems).

Best Regards,
-- Peter

earthgecko
05-05-2009, 12:24 AM
Hi Peter and Himanshu

I will have a go at configuring HA to use the external interface and see where I get to with that. Your comments above make sense and I can see how that would overcome the issue in theory. I will update with success (or failure) once I have reconfigured the appliances.

Thanks.

earthgecko
05-05-2009, 11:43 AM
Hi Peter and Himanshu

My last post must have been prophetic as I can report that I have managed to achieve both success and failure.

Peter, your comments were, as always, very useful. I have managed to get the external interfaces configured on both the NAS appliances and HA is now creating the virtual interface (as it was before); however, now the virtual interfaces on the two appliances can talk to each other. It works just dandy, as one would expect with heartbeat.

However, I now realise that none of my other appliances within the application can see or indeed talk to this "new" 192.168.10.x range I have just created on the NASR appliances, although the NASR appliances themselves can. The issue now is that the other appliances cannot use this range.

I hoped that this could be solved by putting the actual external interfaces on the "application" range, which is currently 10.40.40.x on the 255.248.0.0 subnet (I believe, as reported by the default interface). Configuring the eth0 interface as (example NASR1):

DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.40.40.20
NETMASK=255.248.0.0
(or a netmask of 255.255.255.255, like all the other interfaces for the terminal connectors)

The destination host is unreachable when pinging from NASR2 to NASR1 or from any other appliance. I assume this is blocked at the iptables-like level running on the grid controller. So does this mean that all my other appliances in the application need an external interface on the 192.x range (for example) so that they can communicate with NFS listening on heartbeat's virtual interface? This would also mean reconfiguring the 8 other appliances (probably the AppLogic appliance scripts) to cat/sed the grid-assigned IPs for the fs and log terminals, which are recorded in /etc/hosts at startup, and replace them with the external interface NAS IP, would it not?

In the interests of simplicity, I would prefer not to have to modify the AppLogic /appliance scripts and init.d files. However, at this point in time, modifying the other appliances seems to be the only way forward.

PeterNic
05-05-2009, 06:03 PM
earthgecko,

I am completely at a loss as to what you are trying to do... (BTW, you will not be able to assign any of the RFC1918 ranges that AppLogic uses and assigns internally -- it tracks those and will not let traffic without green lines)
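One way to avoid picking a clashing range for the external interface is to test the candidate against the ranges known to be in use. This is a minimal sketch under a loud assumption: the only internal range visible in this thread is the 10.40.0.0/13 network reported by the default interface (10.40.40.10, mask 255.248.0.0), so that is the only entry in the list below. AppLogic may well track other ranges; `INTERNAL_RANGES` and `clashes_with_internal` are names invented for the example.

```python
import ipaddress

# Assumption: taken from the default interface seen earlier in the thread;
# the full list of ranges AppLogic tracks is not given here.
INTERNAL_RANGES = [ipaddress.ip_network("10.40.0.0/13")]

def clashes_with_internal(candidate):
    """True if `candidate` (address/mask) overlaps a known internal range."""
    net = ipaddress.ip_interface(candidate).network
    return any(net.overlaps(internal) for internal in INTERNAL_RANGES)

print(clashes_with_internal("10.40.40.20/255.248.0.0"))      # True
print(clashes_with_internal("10.40.40.20/255.255.255.255"))  # True
print(clashes_with_internal("192.168.10.1/24"))              # False
```

Both netmask variants tried for 10.40.40.20 fall inside the assumed internal range, while the 192.168.10.x range used for the heartbeat test does not.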

My next guess is that you are trying to have two appliances that would serve the same IP address (like INSSLR already does), except provide this within the application; in this case, what it would really look like is that you will have the output of one appliance (e.g., fs of the WEB server) connected to two inputs at the same time (the input of each of the two failover NAS appliances). Is that what you're trying to achieve?

(My understanding is that the back-end synchronization using drbd works as stated and now you are trying to get the input to work properly).

If this is your goal, might I suggest using something like L3LB -- except you will modify it a bit to put priority on out1, so out2 will be used only if out1 is down. In fact, you may even be able to not modify it if your NAS+ appliances already decide by themselves which one is the master. I'll attach a proposed diagram shortly.

Regards,
-- Peter

PeterNic
05-05-2009, 07:15 PM
Here's how the HA may work between the two NAS appliances (here assuming your drbd-based NAS appliances shown as the imaginary appliance class NASplus).

The back end of the NASplus is the same as shown earlier in this thread -- just a master and slave connection for drbd.

The inputs of the NASplus appliances are front-ended by a switch -- here using unmodified L3LB. You will notice I have added "ctl" output terminals on NASplus (and moved them on the left side to emphasize they are feedback).

The idea here is that NASplus will not try to multiplex its input IP address by itself. Instead, when it determines it is the primary, it would send the simple control requests:

1. For the master:
http://ctl/api/enable?channel=out1
http://ctl/api/disable?channel=out2

2. For the slave (the opposite):
http://ctl/api/disable?channel=out1
http://ctl/api/enable?channel=out2
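The two request pairs above could be generated by a small helper like the one below. It is a sketch only: the URLs are exactly those listed in this post, but `control_urls` is a hypothetical name, and how the appliance decides its role (and whether the switch accepts these requests unmodified) is outside what is shown here.

```python
from urllib.parse import urlencode

def control_urls(role, base="http://ctl/api"):
    """Return the control request URLs listed above for `role`."""
    if role == "master":
        actions = [("enable", "out1"), ("disable", "out2")]
    elif role == "slave":
        actions = [("disable", "out1"), ("enable", "out2")]
    else:
        raise ValueError(f"unknown role: {role!r}")
    return [f"{base}/{action}?{urlencode({'channel': channel})}"
            for action, channel in actions]

for url in control_urls("master"):
    print(url)
# http://ctl/api/enable?channel=out1
# http://ctl/api/disable?channel=out2
```

Actually sending the requests would be a matter of calling `urllib.request.urlopen(url)` for each URL from the appliance, where the name "ctl" resolves through the output terminal.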

(This is, of course, not the only way to do it: I'd rather have a custom load balancer/switch, that will detect when the primary drops, and send a notification and start sending to the secondary server. Instead of the appliances deciding which is currently active, they are told by the switch. In that case, the ctl is actually an incoming terminal (input). Both approaches will work; the first is simpler, since you modify only one appliance; the second one is more elegant/better architecturally, since it leaves the controlling function in the dispatcher rather than in the actors.)

I have attached a sample app (e.g., a media server) with a load balanced web front end, using the HA storage on the back. I have also attached a "zoomed-in" structure of the L3LB plus the NASes -- this is the essence of the proposed approach.

Let me know if this works (and, even, if this was the problem you were trying to solve :) ). It looks like we should start a new forum for architecture discussions.

Best regards,
-- Peter

earthgecko
05-05-2009, 11:54 PM
Hi Peter

Yes, that would work, but it is a more resource-hungry method of achieving this. Furthermore, the L3LB introduces a single point of failure, and avoiding one is the fundamental reason for creating the HA NASR appliances in the first place.

You are also correct in that I now have two NAS appliances with drbd and HA working with failover; the stumbling block is the shared HA virtual IP that is required for this architecture to work within the application range. It is just like the shared HA IP in the INSSLR appliances, except that in this case it needs to be a private IP rather than a public one. You stated that the RFC1918 addresses cannot be used, however you stated, "it tracks those and will not let traffic without green lines". Does this mean that there is a way to let the traffic in with "green lines"? If so, could you let me know what green lines are?

I have PM'ed you a diagram of the application build. I apologise for not attaching it here; however, I do not think my client would want me to publish their application build (I will check). This will give you an idea of what I am trying to achieve: essentially, no single point of failure.

blate
08-08-2011, 05:13 PM
I agree that multi-directional terminals are not just desirable but *necessary* for certain applications. For example, I work with a number of apps that form trust relationships, in part, based on IP addresses. If I have two servers in the same application that need to talk to one another (A->B and B->A) I end up with two pairs of in/out terminals.

If A "registers" with B with source address 10.0.0.a, destination address 10.0.0.b, then B expects to be able to "register" back to A by sending a request to this same address and A expects to receive this request from address 10.0.0.b. But it doesn't end up working that way.

Worse still, suppose I would like to be able to have N machines with all-to-all communications -- e.g., for a shared coherent object cache... there does not appear to be any concept of a proper "network segment" (i.e., a bus). I'd like to be able to have a generic, "proper" interface that can be linked up in a many-to-many way with other hosts.

Is there a different AppLogic idiom for solving these sorts of problems? Or is this a structural limitation of the present network model?

PeterNic
08-08-2011, 05:30 PM
Blate,

I agree this is needed.

For essentially bi-directional one-to-one connections, you can use terminals. (If you need tcp connections going back, you will need to use protocol "any"; udp is unrestricted).

For many-to-many connections, you have to use "raw" interfaces - namely, the external interface.

We're looking into extending the terminal idioms to support "bus"-type connections; but the above is what's working in the current version.

(Please IM me if you want to discuss possible approaches to the bus-type connections, so I can understand some of the specific requirements and make sure we use them in validation)

Best regards,
- Peter