Possible I/O issues
Yazan
03-02-2008, 01:14 AM
Hey guys, first time poster so let me know if this is in the right place.
I'm running a big game server community with about 1600 users online at a time, and as soon as I put it on the grid I started to notice lag spikes. They happen a few times a day and last about 1 or 2 seconds. I think it might be an I/O issue; please let me know what you think.
Grid:
5 x Dell 2900 with dual 5410, 8 GB RAM, dual 320 GB SATA II
10 minute I/O activity of the database:
| Key_read_requests | 2030328 |
| Key_reads | 166540 |
| Key_write_requests | 313060 |
| Key_writes | 205639 |
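For reference, the counters above can be reduced to key cache ratios and per-second rates. This is a minimal sketch in Python, assuming the figures are deltas over the stated 10-minute window; the function and variable names are made up for illustration.

```python
# Counters as posted above; assumed to be deltas over a 10-minute window.
WINDOW_SECONDS = 10 * 60

counters = {
    "Key_read_requests": 2030328,
    "Key_reads": 166540,
    "Key_write_requests": 313060,
    "Key_writes": 205639,
}


def key_cache_summary(c, window_seconds):
    """Return disk ratios and physical I/O rates for the MyISAM key cache."""
    return {
        # Fraction of key block read requests that had to go to disk.
        "read_miss_ratio": c["Key_reads"] / c["Key_read_requests"],
        # Fraction of key block write requests that went to disk.
        "write_ratio": c["Key_writes"] / c["Key_write_requests"],
        # Physical key block reads and writes per second over the window.
        "disk_reads_per_sec": c["Key_reads"] / window_seconds,
        "disk_writes_per_sec": c["Key_writes"] / window_seconds,
    }


summary = key_cache_summary(counters, WINDOW_SECONDS)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

With these numbers, roughly 8% of key read requests and 66% of key write requests reach the disk, which is about 278 physical key reads and 343 key writes per second. As a rough yardstick, the MySQL manual says Key_reads/Key_read_requests should normally be below 0.01.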
Do you think RAID will solve this? Or maybe getting an external server with a RAID SCSI setup to host the database?
If anybody has had a similar experience or may have an idea how to solve the issue, I'm all ears.
Thank you for your time,
Yazan.
PeterNic
03-02-2008, 03:55 AM
Yazan, is there a particular time of day it happens? The grid does run several periodic checks (e.g., the volume check, run every 6 hours); none of these should impact performance, but it is worth checking. You can find the grid time with the 'grid info -v' command.
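One way to answer the time-of-day question is to bucket the spike timestamps by their offset within a 6-hour period, the volume check interval mentioned above. This is a minimal sketch in Python, assuming you have logged when each spike occurred, in grid time. The timestamps below are made-up placeholders, not data from this thread, and the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime

PERIOD_HOURS = 6  # interval of the periodic volume check mentioned above


def phase_histogram(spike_times, period_hours=PERIOD_HOURS):
    """Count spikes by their hour offset within each period.

    If most spikes share one offset, a periodic job is a plausible suspect.
    """
    return Counter(t.hour % period_hours for t in spike_times)


# Hypothetical spike log (placeholder values for illustration only).
spikes = [
    datetime(2008, 3, 1, 0, 5),
    datetime(2008, 3, 1, 6, 4),
    datetime(2008, 3, 1, 12, 6),
    datetime(2008, 3, 1, 15, 40),
]

hist = phase_histogram(spikes)
for offset, count in sorted(hist.items()):
    print(f"hour offset {offset} of {PERIOD_HOURS}: {count} spike(s)")
```

If the spikes are spread evenly across offsets, the periodic checks are probably not the cause.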
Are all software versions the same on and off the grid (incl. firewalls, web servers/app servers, version of MySQL, etc.)?
Another thing you can do: set up a monitoring dashboard for the application. Watch the traffic coming into and going out of the appliances (if your application consists of multiple appliances); this way you can narrow down the first appliance in which the problem appears.
Finally, it appears you are a grid maintainer; you may want to check the server HDDs for possible seek or read failures. If you see a large number of seek failures or other errors, this may indicate a disk is going bad. Retries, especially after seek errors, may take 1-2 seconds to recover from. (Very unlikely but easy to check; you can post questions about this on the grid maintainers forum.)
Regards,
-- Peter
Yazan
03-02-2008, 08:17 PM
Thanks for the reply Peter.
How can I check if the HDDs are failing on the grid?
Regards,
Yazan.
PeterNic
03-02-2008, 09:23 PM
Yazan,
I'll post the command for checking the HDDs shortly and will cross-link it to the maintainers' forum. If you are seeing the same effect on many servers, a failing disk is unlikely to be the cause (the volumes do reside on various servers; keep in mind that AppLogic always mirrors each volume on at least 2 servers, 2 by default).
Another thing that you should do if your application is very time sensitive and keeps writing data: prefill the data volume(s) when you create them. Use the --prefill option on 'volume create' (http://doc.3tera.net/AppLogic2/CliVolume.html#AnchorCreate). An alternative way to do this, which is easier when you already have the volume with the data, is to simply copy it (assuming volume "data" in application "myapp"):
app stop myapp
vol copy myapp:data myapp:data2
vol rename myapp:data olddata
vol rename myapp:data2 data
app start myapp
(you can later delete the old volume, 'vol destroy myapp:olddata')
The copy process does the same as --prefill during create (just don't specify the --fscpy option).
Prefilling the volume ensures that all blocks of the new volume are allocated when the volume is created, rather than when they are first written to. This gives more uniform write times for latency-sensitive volumes.
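To illustrate the idea outside AppLogic, here is a minimal sketch in Python that preallocates an ordinary file by writing zeros over its full length, so that on most conventional filesystems the blocks are allocated up front rather than on first write. This is not how AppLogic implements --prefill; the function name, path and sizes are assumptions for illustration only.

```python
import os
import tempfile


def prefill_file(path, size_bytes, chunk_bytes=1024 * 1024):
    """Write zeros across the whole file so its blocks are allocated now."""
    zeros = bytes(chunk_bytes)
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_bytes, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before returning


# Usage: prefill a 4 MiB scratch file (size chosen only for the demo).
demo_path = os.path.join(tempfile.mkdtemp(), "data.img")
prefill_file(demo_path, 4 * 1024 * 1024)
print(os.path.getsize(demo_path))
```

The cost of allocation is paid once, at creation time, instead of showing up as occasional slow writes while the application is live.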
Regards,
-- Peter
PeterNic
03-02-2008, 09:35 PM
Yazan, please also see http://support.3tera.net/showthread.php?p=754#post754 for details on how to check for a bad HDD.