tmart
06-19-2009, 06:43 AM
As I understand the behavior of the AppLogic scheduler, it seems to follow something of a "backpack" (knapsack-style) dynamic programming model, wherein the scheduler fills servers with components so as to leave sufficient resources for future large components. While this can be a good thing when we don't know what we will need to start up tomorrow, it creates resource contention and works against spreading I/O load across the grid. Usually we size grids based upon our knowledge of the largest applications.
As an example, suppose we have 4 physical servers and 10 virtual appliances, each with 2 volumes. The volumes (including mirrors) will be spread across the physical servers fairly evenly (I think), but all 10 virtual appliances will be started on the minimum number of physical servers (presumably to leave full server resources available for larger resource hogs in the future). If all virtual appliances fit onto one physical server, then that one server will be drawing from 20 volumes distributed across all 4 physical servers, and the three "unused" servers will be doing nothing except serving volume I/O.
There are at least a couple of problems that I see with this scheduler bias:
High availability -- "failover groups" aside, the risk from server loss is hugely asymmetric: losing the one loaded server takes down every running appliance at once.
I/O contention -- setting inter-virtual appliance communication aside, the network contention for I/O seems too high, with all 20 volumes streaming into one physical server.
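To put rough numbers on those two problems, here is a minimal Python sketch of the 4-server, 10-appliance example. It is not how the AppLogic scheduler is implemented; the function names, the policy names "pack" and "spread", and the capacity units (100 per server, 10 per appliance) are all assumptions made up for illustration.

```python
from collections import Counter

def place(demands, n_servers, capacity, policy):
    """Assign each appliance (given as a resource demand) to a server index.

    Hypothetical policies, not AppLogic settings:
      "pack"   - first server that still has room (consolidate)
      "spread" - server with the lowest current load (balance)
    """
    load = [0] * n_servers
    placement = []
    for demand in demands:
        candidates = [s for s in range(n_servers) if load[s] + demand <= capacity]
        if not candidates:
            raise RuntimeError("no server has room for this appliance")
        if policy == "pack":
            chosen = candidates[0]
        elif policy == "spread":
            chosen = min(candidates, key=lambda s: load[s])
        else:
            raise ValueError(f"unknown policy: {policy}")
        load[chosen] += demand
        placement.append(chosen)
    return placement

def busiest_server_count(placement):
    """Appliances lost if the most heavily loaded server fails."""
    return max(Counter(placement).values())

VOLUMES_PER_APPLIANCE = 2   # from the example above
demands = [10] * 10         # assumed: 10 equal-sized appliances

packed = place(demands, n_servers=4, capacity=100, policy="pack")
spread = place(demands, n_servers=4, capacity=100, policy="spread")

for name, placement in (("pack", packed), ("spread", spread)):
    lost = busiest_server_count(placement)
    print(name, "appliances on busiest server:", lost,
          "volumes attached there:", lost * VOLUMES_PER_APPLIANCE)
```

With these assumed sizes, "pack" puts all 10 appliances (and all 20 volume streams) on one server, while "spread" leaves at most 3 appliances and 6 volume streams on any single server.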
I would suggest building several models for the scheduler and making them user-selectable. It might make sense to select a model at the application level or at the grid level, and I could even see a use for grouping physical servers into scheduler pools. Here are some models that you might consider:
One that provides the current "backpack" dynamic programming model.
Another that attempts to minimize risk due to single server loss.
Perhaps another that is biased toward small clusters of communicating components (e.g. IN -> Apache), which do not need to consume traffic on the physical NIC if the two components are scheduled to run on the same server.
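As a rough illustration of what the third model would be trying to minimize, the sketch below counts how many component-to-component links have to cross a physical NIC for a given placement. The component "DB", the server numbers, and the function name are hypothetical; only the IN -> Apache link comes from the example above.

```python
def nic_crossings(placement, links):
    """Count links whose two endpoints run on different physical servers."""
    return sum(1 for a, b in links if placement[a] != placement[b])

# Assumed application: IN -> Apache -> DB ("DB" is made up for illustration).
links = [("IN", "Apache"), ("Apache", "DB")]

# Placement as component name -> server index.
scattered = {"IN": 0, "Apache": 1, "DB": 2}    # every link crosses a NIC
clustered = {"IN": 0, "Apache": 0, "DB": 1}    # IN and Apache share a server

print("scattered:", nic_crossings(scattered, links))
print("clustered:", nic_crossings(clustered, links))
```

A communication-biased model would prefer the second placement, since the IN -> Apache traffic stays on one server.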
On grids running multiple applications, a single application will tend to be consolidated on the same physical servers, because its components will (normally) have been started within a relatively small window of time. This means that application start and stop times will suffer (everything starting on the same server), perhaps causing timeouts or degrading virtual appliances that are already running on that physical server.
Perhaps these scheduler models could be engaged by providing "hints", in a way similar to the current "failover groups" properties, but perhaps at the application level as well, and maybe also at the grid level, as described above.
Of course, we always need a means to relax any hard constraints induced by scheduler bias, similar to how we can now override failover groups resource constraints.
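One way the hint levels and the relaxation could fit together is sketched below. Nothing here is an existing AppLogic property or API; the hint names, the default, and both functions are assumptions used only to show the idea that the most specific hint wins and that a hard constraint can be relaxed on request.

```python
DEFAULT_POLICY = "pack"  # assumed default, standing in for the current behavior

def resolve_policy(grid_hint=None, app_hint=None, component_hint=None):
    """Return the most specific hint that is set: component, then app, then grid."""
    for hint in (component_hint, app_hint, grid_hint):
        if hint is not None:
            return hint
    return DEFAULT_POLICY

def choose_server(free, demand, avoid, relax=False):
    """Pick the server with the most free capacity that can hold `demand`.

    `free` maps server name -> free capacity. Servers in `avoid` are excluded
    (a hard constraint) unless `relax` is True and no other server fits.
    Returns None when nothing fits under the rules in force.
    """
    preferred = [s for s in free if s not in avoid and free[s] >= demand]
    if preferred:
        return max(preferred, key=free.get)
    if relax:
        fallback = [s for s in free if free[s] >= demand]
        if fallback:
            return max(fallback, key=free.get)
    return None

print(resolve_policy(grid_hint="spread"))                       # grid-level hint applies
print(resolve_policy(grid_hint="spread", app_hint="affinity"))  # application hint overrides it

free = {"srv1": 50, "srv2": 20}
print(choose_server(free, demand=30, avoid={"srv1"}))              # constraint holds, nothing fits
print(choose_server(free, demand=30, avoid={"srv1"}, relax=True))  # relaxed, falls back to srv1
```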