Oversubscription of Network Bandwidth


bkonia
02-03-2009, 07:29 AM
I have installed hf2668 on my grid to enable oversubscription of network bandwidth. My reasoning was that enforcing bandwidth limits at the appliance level creates artificial bottlenecks. While there may be certain situations where it's beneficial to limit bandwidth at the appliance level, I believe that in most cases it's better to simply allow each appliance to use as much bandwidth as it needs.

On the other hand, I believe that it's highly beneficial to enforce bandwidth limits at the application level. Within a given application, all the appliances are working together to accomplish a common goal. Therefore, you want to allow bandwidth to flow to where it's needed most. However, enabling bandwidth oversubscription opens the door to one application hogging all the bandwidth, to the detriment of other applications running on the same grid. For example, I have one application that does streaming video and I can easily envision this application sucking up all the bandwidth on the grid.

The bottom line is that I would like to be able to limit the total bandwidth usage of an application while allowing bandwidth to flow unrestricted within each application. Additionally, it would be nice to be able to selectively enforce bandwidth limits on certain appliances while leaving other appliances within the same application unrestricted. For example, my video streaming application includes a Flash Media Server appliance. I might want to limit this appliance to a certain maximum bandwidth to ensure that the FMS doesn't suck up all the bandwidth and choke off the web server. However, I would want to allow traffic to flow freely between the web server, the NAS and the MySQL server.
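To make the request above concrete, here is a minimal sketch in Python of a two-level limit: an optional cap on selected appliances plus one cap on the application as a whole. This is not how AppLogic works today; the function name `enforce_caps`, the data layout, and the choice to scale everyone down proportionally when the application is over its limit are all assumptions made for illustration. Units are arbitrary (say Mbit/s).

```python
def enforce_caps(app_cap, appliance_caps, demands):
    """Return the rate each appliance is allowed, in the same unit as the inputs.

    app_cap        -- total limit for the whole application
    appliance_caps -- optional per-appliance limits; a missing entry means unrestricted
    demands        -- what each appliance is trying to send right now
    """
    # Step 1: clamp only the appliances that have an explicit cap.
    allowed = {
        name: min(demand, appliance_caps.get(name, float("inf")))
        for name, demand in demands.items()
    }
    # Step 2: if the application as a whole is over its limit,
    # scale every appliance down by the same factor (an assumed policy).
    total = sum(allowed.values())
    if total > app_cap:
        factor = app_cap / total
        allowed = {name: rate * factor for name, rate in allowed.items()}
    return allowed


# Hypothetical numbers: only the FMS appliance carries its own cap.
rates = enforce_caps(
    app_cap=1000,
    appliance_caps={"fms": 300},
    demands={"fms": 800, "web": 200, "nas": 400, "mysql": 100},
)
```

With these numbers the FMS is held to 300 while the web server, NAS and MySQL get what they ask for, because the application total stays within its limit of 1000.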

As I'm writing this, I'm thinking that perhaps the real problem is that AppLogic allocates the bandwidth whether you use it or not. So, if you allocate 100M bandwidth for an appliance, that amount gets deducted from your total available bandwidth. Then you run into situations where applications fail to start because there's not enough bandwidth resources available on the grid. So you then have to go around to each appliance and manually reduce its bandwidth allocation. Additionally, you're forced to make arbitrary choices of how bandwidth is allocated between appliances that act as a group, such as my example of Web+NAS+MySQL. In this example, even if you really knew for sure that the NAS required more bandwidth than MySQL, there would certainly be times when MySQL would be busier than NAS and due to the arbitrary bandwidth limit on MySQL, overall performance of the application would suffer.

Given all of the above considerations, I think the best solution would be to have the ability to limit maximum bandwidth at both the appliance level and at the application level, while allowing actual bandwidth usage to adjust dynamically, so that appliances experiencing peak loads could use the extra bandwidth as long as they didn't exceed their max. As a practical matter, this would mean that AppLogic would no longer enforce a total bandwidth limit for the grid as a whole. You could enter 1GB bandwidth for 100 different appliances and the only effect would be that each appliance would be limited to 1GB. The applications would still start even though the amount of bandwidth allocated would greatly exceed what the grid can handle. Of course, I realize that other users may think differently and would want the total bandwidth limits enforced. Therefore, I would suggest creating a preference setting in the grid configuration to enforce or not enforce total bandwidth usage.

digerata
02-03-2009, 10:26 AM
I really like the idea of application level bandwidth. We are constantly over-thinking the connections between each appliance and how much bandwidth each theoretically uses. Additionally, there is no easy method to evaluate and re-evaluate your bandwidth consumption versus allocation at the appliance level. (Sure, there are the monitor graphs to chart the bandwidth usage of a terminal, but they aren't ideally suited for this purpose.) So bandwidth at the appliance level is often a shot in the dark.

However, I think the maximum bandwidth value should still be enforced at the application level (i.e., allocating the max bandwidth at application start). In the same way you want to keep appliance level restrictions so as to prevent one appliance from swamping the others, you don't want one application to swamp others on the same server.

-Mike

PeterNic
02-03-2009, 10:18 PM
Brad, Mike,

Interesting discussion. We've been getting questions on this and I have had a few discussions with other customers who expressed a desire for advanced bandwidth constraints.

It looks like we have two approaches that can be combined:

1. Bandwidth range (needless to say, this can be applied to other types of resources)

Being able to define two parameters, not one: a "reserve" bandwidth and a "cap" bandwidth. The "reserve" is guaranteed, so the entity will never have less than the reserve available (so no matter what else is running wild out there, this entity will have at least that much). The "cap" is what this entity will not be allowed to exceed. This way, you can get adaptive bandwidth up to the cap, and a guaranteed amount. The only problem I have with this is that instead of having to divine one number, you now need to come up with two. There are some tools we could add that may help figure out what those values should be -- not unlike the way speed limits are determined on streets, essentially measuring within the range and coming up with proposed values.
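As a rough illustration of the reserve/cap idea, the Python sketch below first grants each entity its guaranteed reserve and then hands out the spare capacity in equal rounds until everyone is either satisfied or at its cap. The function name `allocate`, the tuple layout, and the equal-split policy for spare bandwidth are assumptions for the sake of the example, not a description of any existing AppLogic mechanism.

```python
def allocate(capacity, entities):
    """entities maps name -> (reserve, cap, demand); returns name -> granted rate."""
    if sum(reserve for reserve, _, _ in entities.values()) > capacity:
        raise ValueError("reserves exceed capacity, guarantees cannot be honoured")

    # An entity never gets more than it wants or more than its cap.
    limit = {name: min(demand, cap) for name, (_, cap, demand) in entities.items()}
    # Step 1: the guaranteed part, never more than the entity can use.
    grant = {name: min(entities[name][0], limit[name]) for name in entities}
    spare = capacity - sum(grant.values())

    # Step 2: split the spare capacity equally among entities that still want more.
    active = {name for name in entities if grant[name] < limit[name]}
    while active and spare > 1e-9:
        share = spare / len(active)
        for name in list(active):
            extra = min(share, limit[name] - grant[name])
            grant[name] += extra
            spare -= extra
            if limit[name] - grant[name] <= 1e-9:
                active.discard(name)
    return grant


# Hypothetical entities: (reserve, cap, demand), all in the same unit.
result = allocate(1000, {
    "fms": (100, 400, 900),   # wants a lot, but is capped at 400
    "web": (200, 1000, 250),
    "db":  (0, 1000, 50),
})
```

Here the FMS ends up at its cap of 400 and the other two get their full demand. Under contention, an entity with a reserve still receives at least that reserve before anything else is shared out.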

An alternative requiring less effort configuring would be to have a grid-wide parameter of "bandwidth oversubscription" ratio -- say 2x, 3x, or even 10x. This way you only configure one value; and you know what the oversubscription value is.

It may be that in enterprise scenarios the first mechanism would work better; in hosting, the second (just a guess).
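For comparison, the oversubscription-ratio alternative boils down to a one-line admission check. The Python sketch below is only an illustration: `can_start` and `worst_case_rate` are hypothetical names, and the worst-case figure assumes the grid is fully subscribed and that contended bandwidth is shared in proportion to the configured values.

```python
def can_start(physical, ratio, already_allocated, requested):
    """Admit a new allocation if the total stays within physical * ratio."""
    return already_allocated + requested <= physical * ratio


def worst_case_rate(configured, ratio):
    """Rate an entity would see if the grid were fully subscribed and everything
    peaked at once, assuming proportional sharing (an assumption, see above)."""
    return configured / ratio


# With no oversubscription a 200 request on top of 900 does not fit in 1000...
fits_1x = can_start(physical=1000, ratio=1, already_allocated=900, requested=200)
# ...but with a 3x ratio it does.
fits_3x = can_start(physical=1000, ratio=3, already_allocated=900, requested=200)
```

The second helper shows the trade-off: a value configured as 300 under a 3x ratio is only backed by 100 when every entity peaks at the same time.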

2. Per-appliance vs. per-application allocation

The per-application specification and enforcement of bandwidth is likely going to be a good way to simplify the assignment of resources. What concerns me a bit is that if you have a perfectly working and tuned app, and you add one more function -- say a MON appliance or another web server -- then keeping the app in the same tuned shape may be harder.

---

As a side question, do you think that using priority levels instead of actual bandwidth values would work better for your use cases? For example, if App A has priority Pa and app B has priority Pb, then at full blast, apps A and B would get Pa:Pb ratio of the available bandwidth.
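To spell out the arithmetic behind the priority idea: at full load each app would get its priority divided by the sum of all priorities, times the available bandwidth. A minimal Python sketch, where `split_by_priority` and the example priorities are made up for illustration:

```python
def split_by_priority(available, priorities):
    """Divide the available bandwidth in proportion to each app's priority."""
    total = sum(priorities.values())
    return {app: available * p / total for app, p in priorities.items()}


# Hypothetical priorities Pa = 3 and Pb = 1 on 1000 units of bandwidth.
shares = split_by_priority(1000, {"A": 3, "B": 1})
```

With a 3:1 ratio, app A gets 750 and app B gets 250 when both run at full blast. When one of them is idle, how the unused share is redistributed is a separate policy choice that this sketch does not cover.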


Cheers,
-- Peter

bkonia
02-04-2009, 08:10 AM
Originally posted by PeterNic:
Being able to define two parameters, not one: a "reserve" bandwidth and a "cap" bandwidth. The "reserve" is guaranteed, so the entity will never have less than the reserve available (so no matter what else is running wild out there, this entity will have at least that much). The "cap" is what this entity will not be allowed to exceed. This way, you can get adaptive bandwidth up to the cap, and a guaranteed amount. The only problem I have with this is that instead of having to divine one number, you now need to come up with two.


I like this approach, provided that:

1. I could set the reserve bandwidth to zero
2. The reserve bandwidth would be used when calculating how much total bandwidth is left on the grid, not the cap bandwidth
3. The cap bandwidth setting would be optional


In other words, let's say I had 5 applications, each one with 20 appliances. I could set reserve bandwidth to zero and cap bandwidth to 1GB for all 100 appliances. In this scenario, no appliance would be guaranteed bandwidth, but no appliance could use more than 1GB. In the AppLogic dashboard, it would show that zero bandwidth had been allocated on the grid.

This would make it very easy to configure new appliances and applications because you would not be assigning arbitrary amounts of bandwidth. Later on, if you notice that one particular appliance is not getting enough bandwidth, you could go back and fine tune it by giving some reserved bandwidth to that appliance.
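The accounting I have in mind can be sketched in a few lines of Python. This is only an illustration of the proposal, not existing AppLogic behavior; `grid_allocation`, `can_start` and the dictionary keys are hypothetical names. The point is that the grid total is computed from the reserves, so caps never count against it.

```python
def grid_allocation(appliances, basis="reserve"):
    """Sum the chosen field ("reserve" or "cap") over all appliances."""
    return sum(appliance[basis] for appliance in appliances)


def can_start(appliances, grid_capacity, basis="reserve"):
    """Applications may start as long as the counted total fits the grid."""
    return grid_allocation(appliances, basis) <= grid_capacity


# 5 applications x 20 appliances, each with reserve 0 and cap 1000 (i.e. 1GB).
appliances = [{"reserve": 0, "cap": 1000} for _ in range(5 * 20)]

allocated_by_reserve = grid_allocation(appliances, "reserve")
allocated_by_cap = grid_allocation(appliances, "cap")
```

Counted by reserve, the dashboard would show zero allocated and everything starts. Counted by cap, the same configuration adds up to 100000 and would be refused on any realistic grid.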


Originally posted by PeterNic:
An alternative requiring less effort configuring would be to have a grid-wide parameter of "bandwidth oversubscription" ratio -- say 2x, 3x, or even 10x. This way you only configure one value; and you know what the oversubscription value is.

I don't like the ratio approach at all. I feel that this approach would only complicate matters, because it adds an additional layer of calculations to convert from actual bandwidth to ratios and vice versa. Perhaps it would be simpler to configure, but it would be much more unwieldy to manage and fine tune. If I know that a particular appliance needs 1GB of bandwidth, I want to be able to assign exactly what it needs. I don't want to have to figure out what the assigned value will translate into after some ratio is applied.

Originally posted by PeterNic:
The per-application specification and enforcement of bandwidth is likely going to be a good way to simplify the assignment of resources. What concerns me a bit is that if you have a perfectly working and tuned app, and you add one more function -- say a MON appliance or another web server -- then keeping the app in the same tuned shape may be harder.

Not sure what you mean. Let's say you have an application with 1GB of bandwidth assigned to it. Now you add a new appliance. The application still has 1GB of bandwidth, so how does adding the appliance change anything? Why would this mess up the app? It seems to me that the whole benefit of managing bandwidth at the application level is that you no longer have to worry about the bandwidth allocations to individual appliances within the application. If you increase the application bandwidth by 10%, then that 10% increase becomes uniformly available to all the appliances. Now it may be that one appliance is a bandwidth hog and it sucks up the 10% increase, but then you would still have the option to apply the cap bandwidth at the appliance level.

Ideally, I would like to approach bandwidth configuration as follows:


1. Set reserve bandwidth to zero for all applications and for all appliances
2. Set a bandwidth cap for each of your applications
3. If you notice that certain appliances are bandwidth hogs, set bandwidth caps for those appliances
4. If you notice that a certain application is not getting enough bandwidth, assign some reserved bandwidth to that application
5. If you notice that a certain appliance is not getting enough bandwidth, assign some reserved bandwidth to that appliance


This way, you start out with the least restrictive configuration and gradually move to a more restrictive configuration after observing how the applications and appliances perform. This would avoid creating artificial bottlenecks, since you're only restricting bandwidth AFTER you observe that a restriction is needed.


Originally posted by PeterNic:
As a side question, do you think that using priority levels instead of actual bandwidth values would work better for your use cases? For example, if App A has priority Pa and app B has priority Pb, then at full blast, apps A and B would get Pa:Pb ratio of the available bandwidth.

I don't like priority levels for the same reason that I don't like ratios. Priorities and ratios are derivative measurements that are one step removed from the actual measurement. While this may simplify configuration for some people, I feel that it makes it more difficult to fine tune your configuration to get it exactly how you want it. Sorry, I guess I'm a control freak!