View Full Version : NFSv4
digerata
07-17-2008, 01:29 PM
Hi,
I'm finding very little information on using NFSv4 with the NAS appliance. It appears that it is using v3 by default. Is there any documentation for v4 and/or are there any caveats to exporting mounts as v4 to other appliances that have an FS terminal?
Thanks!
PeterNic
07-17-2008, 05:10 PM
Digerata,
I believe the NAS appliance provides only v3 of nfs; the other appliances (like WEBxxx and LINUX) have only v3 clients installed.
I see no reason why v4 will not work (at least as well as it works outside of appliances). Have you had good experience with nfsv4?
If you want to use nfsv4, you will have to branch the NAS appliance and install it there; also, you will have to branch whatever appliances you will connect to it to add the nfsv4 client. We and others on the forums can help you if you hit any issues.
Please post here your experience - I am sure others will want to know as well. We can also consider adding nfsv4 in future versions of the catalog appliances if it is stable and performs well.
Thanks,
-- Peter
digerata
07-21-2008, 08:17 AM
Thanks for the info, Peter.
I actually have no experience with any version of NFS. We are moving an application that typically used GFS as a clustered filesystem. We need reliable, networked file locks, but we don't need the fiber channel that GFS provided. In addition, it has some administration overhead and other costs. We'd rather skip GFS in AppLogic's environment so NFS seemed to be the way to go. NFSv3 is not reliable with multiple clients accessing the same filesets. However, from the NFSv4 specs, it appears that v4 does provide the reliability now.
NFSv4 on *Solaris is stable. On Linux, that was a different story. In my research, I've found these links to help assure us that NFSv4 on Linux is now stable:
http://devresources.linux-foundation.org/dev/nfsv4/site/index.php
http://www.linux.com/feature/138453
http://www.vnunet.com/vnunet/news/2167303/linux-file-system-catches-nfsv4
And a good reference on NFS in general:
http://nfs.sourceforge.net/
I'll definitely let you know how it goes.
PeterNic
07-21-2008, 10:45 AM
digerata,
Thanks for the links!
If you don't care about the Unix-style permissions, you can also try the cifs support -- CIFS has proper concurrent access & locking semantics, similar to local file systems.
In any case, if you are interested in exploring NFSv4, please do; let us know if we can help.
Regards,
-- Peter
digerata
07-22-2008, 09:05 AM
To my understanding, CIFS locking is not reliable in the event of a network or server failure. This is due to the state of a client being maintained by the use of TCP connections. If the TCP connection is broken, state is lost. Here is a link that goes into more detail:
http://blogs.netapp.com/eislers_nfs_blog/2008/07/part-ii-since-n.html
(Of course, it is a NetApp employee, so the article might come with a bit of bias. However, the technical details seem to back it up.)
We aren't basing our decision on Unix-style permissions as our application is the only user accessing the filesystem. If you have seen other information about CIFS reliability, we would definitely consider it.
Thanks!
PeterNic
07-22-2008, 09:40 AM
digerata,
Yes, this is correct, vendor bias notwithstanding :)
I believe there is a short reconnect window but a locked file doesn't get stuck; server reboot totally releases all files. At the same time, the access is over a local connection so it shouldn't be dropping. If the server restarts or the TCP connection drops, the client gets a status code when disconnect happens and can work with it (e.g., rollback); you can also use lock "files", whose presence indicates a lock. Without knowing much about your app, it would be hard to suggest something that will work.
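A minimal shell sketch of the lock "file" idea mentioned above, assuming a directory that every client has mounted. It uses mkdir because creating a directory either succeeds or fails as one operation, so only one caller can win. The function names, the lock path, and the choice of mkdir are illustrative assumptions, not something taken from AppLogic or from the application being discussed.

```shell
#!/bin/sh
# Hypothetical helpers: the lock is "held" for as long as the lock directory exists.

acquire_lock() {
    # mkdir fails if the directory already exists, so a second caller gets a non-zero status.
    mkdir "$1" 2>/dev/null
}

release_lock() {
    # Removing the directory releases the lock.
    rmdir "$1" 2>/dev/null
}

# Example use (the path is an assumption):
#   if acquire_lock /mnt/fs/myapp.lock; then
#       ... critical section ...
#       release_lock /mnt/fs/myapp.lock
#   fi
```

One thing to keep in mind with this approach: if the holder crashes, the lock directory stays behind, so the application needs its own rule for cleaning up stale locks.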
In any case, we're totally agnostic -- use one, the other, or both. Again, let me know if we can help in any way or if you get any results on NFSv4. You can also sign up for our beta if you prefer to use *Solaris, as AppLogic 2.3.9 introduced support for Solaris 10 and OpenSolaris.
Best regards,
-- Peter
digerata
07-22-2008, 10:18 AM
Ahh, yes, we are still not quite thinking the AppLogic way where the statement "The Network is the Computer" is actually true. (Obscure Sun marketing reference...) What you say makes sense.
We are currently running the Beta and are experimenting with the different OSes. If we go with NFS, we would likely run OpenSolaris as the NAS OS.
Thanks for the advice. I'll let you know how it goes.
digerata
07-31-2008, 03:18 PM
I'm just updating progress here. No need for a reply.
I'm surprised at the level of support between Linux and OpenSolaris implementations of NFSv4. So far, I have found that Linux is at least on par with OpenSolaris feature wise. Performance I haven't tested yet. We are interested in speedy deployment of our app right now and our expertise is Linux, so I'll be using Linux instead of OpenSolaris, for now. (Unless something comes up with NFS that requires OSOL)
I started out with the NAS appliance to see if I could simply "enable" nfsv4 on it and use the existing terminals and configuration. For a reason I could not determine, this does not work. After enabling a v4 mount on the NAS, I tried to mount the export from a LINUX64 client. The client simply hangs. Running the mount command from the server against localhost resulted in a hang as well. Luckily, it is not a complete freeze; Ctrl+C gets out of the mess. I did a bunch of troubleshooting, including increasing the log level via a few sysctl -w sunrpc.xxx_debug=1023 commands, but nothing out of the ordinary popped up. No matter what, mount -t nfs4 would hang. A few valuable links I've found that helped:
http://wiki.linux-nfs.org/wiki/index.php/General_troubleshooting_recommendations
http://www.citi.umich.edu/projects/nfsv4/linux/using-nfsv4.html
http://www.vanemery.com/Linux/NFSv4/NFSv4-no-rpcsec.html
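For anyone repeating the troubleshooting step above, here is a rough sketch of the sysctl calls behind "sunrpc.xxx_debug". It assumes the four debug masks the Linux kernel exposes under sunrpc (rpc, nfs, nfsd, nlm). The function name and the DRY_RUN switch are made up for illustration; by default the script only prints the commands.

```shell
#!/bin/sh
# Print (or run) the sysctl commands that set the kernel RPC/NFS debug masks.
# DRY_RUN=1 (the default here) only prints; set DRY_RUN=0 and run as root to apply.

set_nfs_debug() {
    mask="$1"    # 1023 as in the post; 0 switches debugging back off
    for name in rpc nfs nfsd nlm; do
        cmd="sysctl -w sunrpc.${name}_debug=${mask}"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            $cmd
        fi
    done
}

set_nfs_debug 1023
```

The extra messages go to the kernel log, so dmesg is the place to look. Calling set_nfs_debug 0 afterwards turns the noise off again.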
Frustrated, I yanked a Linux64 appliance off the shelf and quickly booted that up. A quick, mkdir /export ; chmod a+rwxt ; vi /etc/exports + fsid=0 and then a mount nfs4 showed that within minutes I *could* get NFSv4 working on a stock appliance! (Without even a single yum update)
As a quick sanity check, I ran nfsstat. Sure enough, the only thing showing up is nfs v4.
Amazing!
(A testament to 3tera... How often can you say "yanked an appliance off the shelf and quickly booted it up"? Yes, I have drunk the Kool-Aid and it is tasty.)
digerata
07-31-2008, 03:43 PM
I've found the "Implementation Design" section of the appliance catalog invaluable for building my own appliances. However, on the page for the NAS device (in 2.3.9) there are only four lines of summary, nothing like the detail given for other appliances.
Is there some other place I can find more information on how the NAS appliance was built?
digerata
08-01-2008, 11:48 AM
I only ask for how the NAS appliance was built because I was trying to use that as a base for NFSv4. However, it appears that so much was left out of the image, it cannot be upgraded. The current NAS appliance throws an error when running yum update:
Setting up Update Process
Setting up Repo: base
Cannot find a valid baseurl for repo: base
I did branch it and also made the usr volume read/write. Any ideas?
PeterNic
08-01-2008, 12:16 PM
Digerata,
The more detailed implementation design and build notes sections are something we started doing more recently -- NAS is one of the oldest appliances in the catalog.
The reason it may not have all the bells and whistles is that it was most likely started from LUX (which is a liposucted version of the LINUX base appliance); also, if I remember correctly, it is based on Fedora Core 3 (see /etc/issue).
When trying to upgrade it:
- did you temporarily add a gateway output terminal and connect it to a NET gateway (so you can access the yum download server)?
- did you start it with more memory and resize the volumes to provide enough space for yum (again, once the appliance is created it is brutally trimmed down to what is necessary to operate, so if you want to modify it, you usually have to make room)?
Even with the two changes above, you may end up getting a very old nfsv4 -- the appliance is on Fedora Core 3.
I would definitely recommend starting the appliance off LUX5 (which is CentOS 5-based).
The NAS appliance essentially has:
nfs server
samba server
apache server (for http)
The appliance startup scripts configure these services to listen to the right interfaces, point the storage to the placeholder volume and that's pretty much all. If you need a NFSv4-only appliance, it may be much easier to create by looking at the startup scripts in the /appliance directory, and the nfs_load.sh script in particular.
If you need additional info on how to set it up -- be it a complete NAS or just NFSv4 -- let us know, we will be happy to help.
Best Regards,
-- Peter
digerata
08-05-2008, 12:15 PM
Thanks for the info, Peter.
I had missed adding an output terminal in the beginning. But I had problems with yum before getting to the network and I'm not a yum hacker so I just used a Linux64 as a starting point.
(I've been avoiding the LUX appliances because of the lack of build tools. However, I now have an appliance specifically for building binaries and RPMs so I may come back and redo everything as LUX. Those guys are SMALL images! Nice work there.)
I have a fully functioning NFSv4 server now. I copied over the appliance scripts from the NAS server and just removed the particulars for SMB and Apache. It turns out the process was very straightforward and only required 2 lines of configuration. I'll detail that more in a second.
After I got the server and clients talking, I ran some performance benchmarks. I set up a server for each of CIFS, NFSv3, and NFSv4. I also ran the tests from a server to a locally mounted volume to get a baseline. I used bonnie++ to generate the load. For CIFS and NFSv3 I used the standard NAS appliance. I configured all servers with 1 GB of RAM, 500M bandwidth, and 1 CPU. The client and servers were created in the same AppLogic application and the application was running on one physical server. Both NFS mounts were exported with sync turned on. The bonnie++ parameters were:
bonnie++ -n 0 -u 0 -r 512 -s 4096 -f -b -d /mnt/local
And here are my results:
Version 1.03d       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
LOCAL            4G           38592   5 21835   0           62638   0 183.4   0
CIFS             4G           29232   0 17214   0           47509   0 136.8   0
NFS              4G           40356   4 14465   0           42868   0 265.9   0
NFSv4            4G           40143   3 20048   0           48825   0 204.3   0
These numbers are averaged over 4 runs.
I was quite surprised to find that both NFS versions were faster at writing than LOCAL. At first I thought maybe NFS was exported with the async flag, but I double-checked and I *did* specify sync. So I'm not sure how that could be. I'm open to suggestions on how to improve these benchmarks.
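To put the write numbers in proportion, here is a small sketch that expresses each block-write figure from the results as a percentage of LOCAL. The input values are copied from the results above; the function name is an illustrative assumption.

```shell
#!/bin/sh
# Read "name K/sec" pairs on stdin; the first row is treated as the baseline.
relative_to_local() {
    awk 'NR == 1 { base = $2 }
         { printf "%s %.1f%%\n", $1, 100 * $2 / base }'
}

# Block-write throughput (K/sec) from the bonnie++ results.
relative_to_local <<'EOF'
LOCAL 38592
CIFS 29232
NFS 40356
NFSv4 40143
EOF
```

This prints 100.0% for LOCAL, 75.7% for CIFS, 104.6% for NFS and 104.0% for NFSv4, so both NFS write results are within about 5% of the local baseline.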
The setup for the NFSv4 appliance goes roughly like this (from memory, may miss something):
- Branch desired base appliance. Edit class -> add volume placeholder for data. Add terminal 'nfs4' with any protocol. (I believe, but haven't verified, that specifying the nfs protocol would also work.)
- Create a data volume, add it to the appliance instance.
- Start appliance. Perform a `yum update`
- Copy over /appliance directory from NAS appliance. Also copy /etc/init.d/applogic_appliance from NAS appliance.
- Edit /etc/fstab, add mount point for the data directory (also, mkdir /mnt/data):
/dev/hda2 /mnt/data ext3 rw,acl
- Edit /etc/exports, add the NFSv4 specific export:
/mnt/data *(rw,fsid=0,insecure,no_subtree_check)
Exports explanation:
* says any IP
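fsid=0 marks this export as the root of the NFSv4 pseudo filesystem, which is why clients mount it as fs:/
insecure allows requests that come from non-privileged source ports (above 1023); it does not pick the authentication method (that is the sec= option, which defaults to normal OS authentication, sec=sys, rather than Kerberos/GSS)
no_subtree_check turns off the server's check that each requested file lies inside the exported subtree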
- That's it. You can do a `service nfs restart` or exportfs -rv to make those changes active.
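The export step above can be sketched as a small idempotent helper. It assumes the absolute path /mnt/data (export paths are absolute) and the option string from the post; the function name is hypothetical. The target file is a parameter, so the helper can be tried on a scratch file before touching /etc/exports.

```shell
#!/bin/sh
# Append the NFSv4 root export to an exports file unless the same line is already present.

add_nfs4_export() {
    dir="$1"     # directory to export, e.g. /mnt/data
    file="$2"    # exports file, e.g. /etc/exports (needs root) or a scratch copy
    line="$dir *(rw,fsid=0,insecure,no_subtree_check)"
    # -x matches the whole line, -F treats the text literally so '*' is not a pattern.
    if ! grep -qxF "$line" "$file" 2>/dev/null; then
        echo "$line" >> "$file"
    fi
}

# Example use on the server, as root:
#   add_nfs4_export /mnt/data /etc/exports
#   exportfs -rv
```

Running it twice leaves a single export line, which is handy if the call ends up in an appliance startup script.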
The client setup is just a one line change to /etc/fstab. Take any appliance that already supports mounting NFS and change the mount point to:
fs:/ /mnt/fs nfs4 rw,hard,intr,proto=tcp 0 0
You could probably leave off the proto=tcp parameter. Note that the mount source is fs:/ and not fs:/mnt/data. NFSv4 uses a "pseudo" filesystem mapping, so the client addresses the export relative to the fsid=0 root.
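As a rough illustration of that one-line change, the filter below rewrites an existing NFSv3 fstab entry into the NFSv4 form. It is only valid under the assumption made in this setup, namely that the exported directory is the fsid=0 root, so the whole path collapses to "/". The function name is made up for the example.

```shell
#!/bin/sh
# Read fstab lines on stdin. Entries of type "nfs" with a host:/path source are
# rewritten to host:/ with type nfs4; every other line passes through unchanged.

to_nfs4_entry() {
    awk '$3 == "nfs" && $1 ~ /^[^:]+:\// {
             sub(/:.*/, ":/", $1)    # fs:/mnt/data -> fs:/
             $3 = "nfs4"
         }
         { print }'
}

echo 'fs:/mnt/data /mnt/fs nfs rw,hard,intr,proto=tcp 0 0' | to_nfs4_entry
# prints: fs:/ /mnt/fs nfs4 rw,hard,intr,proto=tcp 0 0
```

For a one-off test without editing fstab, the equivalent manual command is mount -t nfs4 fs:/ /mnt/fs.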
The quick check to see if your export or mount is using NFSv4 protocol is `nfsstat`. You should see a listing that contains "XXXXX nfs v4." where XXXXX is Server or Client.
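On the client, a complementary check is to look at the filesystem type the kernel reports for the mount point. The sketch below assumes a Linux client, where /proc/mounts lists NFSv4 mounts with type nfs4. The function name and the optional second argument (an alternative mounts file, useful for testing) are illustrative.

```shell
#!/bin/sh
# Succeed if the given mount point is listed with filesystem type nfs4.

is_nfs4_mount() {
    mountpoint="$1"
    mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: source, mount point, filesystem type, options, ...
    awk -v mp="$mountpoint" '
        $2 == mp && $3 == "nfs4" { found = 1 }
        END { exit !found }' "$mounts_file"
}

# Example use:
#   if is_nfs4_mount /mnt/fs; then echo "/mnt/fs is NFSv4"; fi
```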
Seeing that it is only a few lines of configuration, I would say it would be cool if future versions of the AppLogic NAS appliance could add support for this. Maybe a third terminal, nfsv4, or a property that allows you to specify which protocol you would like to use over the nfs terminal?
-Mike
digerata
08-06-2008, 08:29 AM
As I was setting up a second client, I noticed that I may have missed something. There is a service that needs to be started in the default LINUX64 client, rpcidmapd. To make things easier, here are a series of chkconfig commands that should make sure everything is running.
For the server:
chkconfig --level 0123456 portmap off
chkconfig --level 345 portmap on
chkconfig --level 0123456 rpcidmapd off
chkconfig --level 345 rpcidmapd on
chkconfig --level 0123456 nfslock off
chkconfig --level 345 nfslock on
chkconfig --level 0123456 nfs off
chkconfig --level 345 nfs on
# For AppLogic, we aren't using the GSS service, shut it off
chkconfig --level 0123456 rpcgssd off
chkconfig --level 0123456 rpcsvcgssd off
For the client:
chkconfig --level 0123456 portmap off
chkconfig --level 345 portmap on
chkconfig --level 0123456 rpcidmapd off
chkconfig --level 345 rpcidmapd on
chkconfig --level 0123456 nfslock off
chkconfig --level 0123456 nfs off
chkconfig --level 0123456 rpcgssd off
chkconfig --level 0123456 rpcsvcgssd off
Make sure you either restart the appliance or start/stop those services manually after running those commands and before attempting to mount or export.
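The same two lists can be generated from a short loop, which is easier to keep in sync than the long form. This is only a sketch: the function name is made up, and it prints the commands rather than running them. The service names and run levels are the ones listed above.

```shell
#!/bin/sh
# Print the chkconfig commands for a role ("server" or "client").
# Pipe the output to sh as root to apply it.

nfs4_chkconfig() {
    role="$1"
    case "$role" in
        server) on="portmap rpcidmapd nfslock nfs"; off="rpcgssd rpcsvcgssd" ;;
        client) on="portmap rpcidmapd"; off="nfslock nfs rpcgssd rpcsvcgssd" ;;
        *) echo "usage: nfs4_chkconfig server|client" >&2; return 1 ;;
    esac
    # First switch every listed service off at all run levels ...
    for svc in $on $off; do
        echo "chkconfig --level 0123456 $svc off"
    done
    # ... then enable only the needed ones at run levels 3, 4 and 5.
    for svc in $on; do
        echo "chkconfig --level 345 $svc on"
    done
}

nfs4_chkconfig server
```

The order of the printed commands differs slightly from the lists above (all "off" lines come first), but the resulting service state is the same.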
PeterNic
08-06-2008, 09:34 PM
Mike,
(I've been avoiding the LUX appliances because of the lack of build tools. However, I now have an appliance specifically for building binaries and RPMs so I may come back and redo everything as LUX. Those guys are SMALL images! Nice work there.)
Yup, that's the general idea -- build your rpms/binaries on, say, GSC/VPS which have all the gcc bells and whistles, then copy only the binaries on LUX. This works well for appliances that don't run user code -- gateways, load balancers, routers, NAS, even database servers. The only ones that may not work well with the cut-down images are things like the WEBxxx servers where you expect to see a lot of optional libraries driven by whatever php/perl code people put on; similarly, various Java/app servers probably should use LINUX-type, bigger base images (actually, the LINUX appliance is the "minimal" install -- we consider it the "fat" image) :). I am wondering if we have posted the image liposuction instructions for converting a LINUX image into LUX -- if there is interest, I am sure we have them somewhere.
Two fun facts:
- starting in AppLogic 2.3, we now support appliances with read-only boot volumes (similar to the LiveCD distros, except they are fully functional appliances, take property and network configuration as all other appliances)
- the smallest appliance ever was created by one of our users using Busybox and could run in 16 MB RAM, and had 12 MB boot volume!
I was quite surprised to find that both NFS versions were faster writing than LOCAL. ... So I'm not sure how that could be. I'm open for suggestions on how to improve these benchmarks.
The difference doesn't appear to be big, though. Sometimes simply having another CPU to help can make this difference (since now your client runs in one appliance, the server in another).
I have seen slightly higher numbers for the LOCAL access -- depending on the hardware and some other issues (e.g., did you create the test volume with --prefill); if not, keep in mind that the first-pass write will have slightly longer time -- which your 4 tests average has probably eliminated sufficiently. I am glad to see that NFSv4 performs well; I hope it is also stable -- it has been around for a while so it should wipe out NFSv3, finally and forever (it doesn't hurt wishing).
We need to consider how to add NFSv4 in the future; my desire is to simply ditch NFSv3, and reconfigure the clients to use NFSv4 (I would hate to complicate things by adding properties or extra terminals). If we do it this way, it is bound to disappoint some and make others happy. My guess is that we'll add a v4 NAS appliance with some additional features (e.g., snapshot/backup/replication) while keeping the current v3 NAS appliance as well for backward compatibility. I am curious to know if there is an NFS mount with auto-version (I am almost sure there was such an option, definitely worth looking into -- if we can make the new clients all autoadjust, we can make any new NASxxx appliance v4-only).
Thanks for posting the benchmarks and the detailed appliance configuration steps!!!
Best regards,
-- Peter
digerata
08-07-2008, 09:07 AM
Read-only boot volumes are great! We are already on 2.3.9, I didn't notice that I could check read only there, nice. Is there anything special we need to do besides that or does it automatically handle things like /var?
I didn't use the --prefill argument, but was using the web based volume tool when I created the volumes. I thought with how long it takes to create a volume, that it probably was doing a --prefill. I'll likely play around more with the benchmarks to see how fast we can get things.
I'm all for completely wiping out NFSv3 :) But at least one part of the client fstab will have to be modified, even if auto is allowed. NFSv4 changed the mount syntax to where fs:/mnt/data turns into fs:/. Bummer there for you.
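One way a client could cope with both syntaxes is a small wrapper that tries NFSv4 first and falls back to NFSv3 with the full path. This is a sketch of that idea only, not an existing mount option or AppLogic feature. The function name and the MOUNT_CMD override (there so the logic can be exercised without root) are assumptions.

```shell
#!/bin/sh
# Try NFSv4 against the pseudo-filesystem root first, then fall back to NFSv3
# with the full export path. MOUNT_CMD defaults to the real mount command.

mount_nfs_auto() {
    server="$1"; v3_path="$2"; mountpoint="$3"
    mount_cmd="${MOUNT_CMD:-mount}"
    if $mount_cmd -t nfs4 "$server:/" "$mountpoint" 2>/dev/null; then
        echo "mounted $server:/ as nfs4"
    elif $mount_cmd -t nfs -o vers=3 "$server:$v3_path" "$mountpoint"; then
        echo "mounted $server:$v3_path as nfs (v3)"
    else
        echo "could not mount $server" >&2
        return 1
    fi
}

# Example use, as root:
#   mount_nfs_auto fs /mnt/data /mnt/fs
```

Since the nfs4 mount against the stock NAS appliance hung rather than failed in my earlier test, a real version of this would probably need a timeout around the first attempt.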
Thanks for the input!
-Mike
PeterNic
08-07-2008, 12:58 PM
Read-only boot volumes are great! We are already on 2.3.9, I didn't notice that I could check read only there, nice. Is there anything special we need to do besides that or does it automatically handle things like /var?
No, you are on your own as far as building the appliance image itself is concerned :) (we will gladly help, of course)
It is not harder than building any read-only Linux distro -- and you might be able to use a Live CD image to start from. Our Linux filers do boot this way; you can do "vol manage" on some volume and then look around how we made the /var and /etc work (or drag-and-branch a filer and customize it).
I am not even willing to try building a Windows appliance with read-only boot volume, though :D
Thanks for the input!
You are welcome!
Regards,
-- Peter