=================================
Allocation for Partitioned Ganeti
=================================

:Created: 2015-Jan-22
:Status: Implemented
:Ganeti-Version: 2.15.0

.. contents:: :depth: 4


Current state and shortcomings
==============================

The introduction of :doc:`design-partitioned` allowed to
dedicate resources, in particular storage, exclusively to
an instance. The advantage is that such instances have
guaranteed latency that is not affected by other
instances. Typically, those instances are created once
and never moved. Also, typically large chunks (full, half,
or quarter) of a node are handed out to individual
partitioned instances.

Ganeti's allocation strategy is to keep the cluster as
balanced as possible. In particular, as long as empty nodes
are available, new instances, regardless of their size,
will be placed there. Therefore, if a couple of small
instances are placed on the cluster first, it will no longer
be possible to place a big instance on the cluster despite
the total usage of the cluster being low.


Proposed changes
================

We propose to change the allocation strategy of hail for
node groups that have the ``exclusive_storage`` flag set,
as detailed below; nothing will be changed for non-exclusive
node groups. The new strategy will try to keep the cluster
as available for new instances as possible.

Dedicated Allocation Metric
---------------------------

The instance policy is a set of intervals in which the resources
of the instance have to be. Typical choices for dedicated clusters
have disjoint intervals with the same monotonicity in every dimension.
In this case, the order is obvious. In order to make it well-defined
in every case, we specify that we sort the intervals by the lower
bound of the disk size. This is motivated by the fact that disk is
the most critical aspect of partitioned Ganeti.

For a node the *allocation vector* is the vector of, for each
instance policy interval in decreasing order, the number of
instances minimally compliant with that interval that still
can be placed on that node. For the drbd template, it is assumed
that all newly placed instances have new secondaries.

The *lost-allocations vector* for an instance on a node is the
difference of the allocation vectors for that node before and
after placing that instance on that node. Lost-allocation vectors
are ordered lexicographically, i.e., a loss of an allocation
larger instance size dominates loss of allocations of smaller
instance sizes.

If allocating in a node group with ``exclusive_storage`` set
to true, hail will try to minimise the pair of the lost-allocations
vector and the remaining disk space on the node after, ordered
lexicographically.

Example
-------

Consider the already mentioned scenario were only full, half, and quarter
nodes are given to instances. Here, for the placement of a
quarter-node--sized instance we would prefer a three-quarter-filled node (lost
allocations: 0, 0, 1 and no left overs) over a quarter-filled node (lost
allocations: 0, 0, 1 and half a node left over)
over a half-filled node (lost allocations: 0, 1, 1) over an empty
node (lost allocations: 1, 1, 1). A half-node sized instance, however,
would prefer a half-filled node (lost allocations: 0, 1, 2 and no left-overs)
over a quarter-filled node (lost allocations: 0, 1, 2 and a quarter node left
over) over an empty node (lost allocations: 1, 1, 2).

Note that the presence of additional policy intervals affects the preferences
of instances of other sizes as well. This is by design, as additional available
instance sizes make additional remaining node sizes attractive. If, in the
given example, we would also allow three-quarter-node--sized instances, for
a quarter-node--sized instance it would now be better to be placed on a
half-full node (lost allocations: 0, 0, 1, 1) than on a quarter-filled
node (lost allocations: 0, 1, 0, 1).