DC/OS Multi-tenant Resource Isolation in DC/OS 1.10 and 1.11

Purpose

This document details various options available in DC/OS 1.10.x and 1.11.x to support isolating multiple tenants’ workloads on a single DC/OS cluster. Note that the general theme of the 1.12 release is “Multi-tenancy”, so all of this is subject to change in 1.12. More specifically, many new options regarding this topic may become available.


Background

Apache Mesos

DC/OS is meant to be a single operating system for a large, multi-node cluster. It’s built around the open source project Apache Mesos, which acts as the kernel of the cluster.

DC/OS Core Framework

A standard installation of DC/OS comes with two built-in frameworks (Marathon and Metronome), which register with the Apache Mesos cluster that is the central component of the DC/OS cluster.

DC/OS Catalog Frameworks (SDK Services)

In addition to the core (root) Marathon and Metronome services, certain packages from the DC/OS Catalog will install additional frameworks. For example:

Many of these services were built by a common framework building mechanism, called the DC/OS Service SDK or DC/OS Commons. For the remainder of this document, these will be referred to as SDK services.

Of note, while these frameworks register with Apache Mesos and request resources with a given role (for example, kafka-role), the framework scheduler process itself runs as a Marathon service. So you have this high-level architecture:

DC/OS Frameworks: Core + Catalog

So, when you have a DC/OS cluster and have installed several SDK services (such as Kafka and/or Elastic), you end up with a handful of Mesos frameworks, all competing for resources. Mesos will try to allocate resources approximately fairly across all of the frameworks.

Once the SDK services have all of the resources they need, they’ll tell the Mesos cluster that they don’t need any more resources, and the remainder of resources will be allocated to any frameworks still asking for resources.

Once all of the SDK services have fully deployed, Mesos will continue to send offers to the remaining frameworks that are requesting resources. For the purposes of this document, this is primarily Marathon.

The Issue

By default, there is only one Marathon running on a DC/OS cluster; all users of the cluster are able to submit whatever applications and services they would like to Marathon to run, and a given Marathon instance has no inherent prioritization of resources among its users.

Any user who has access to the core Marathon can therefore submit Marathon app definitions; Marathon will tell Mesos it’s looking for resources, and Mesos will essentially give Marathon all of its available resources.

This leads to a lack of granularity and segmentation of the cluster. You can specify Mesos attributes on specific nodes, and attribute constraints on individual Marathon apps, such that the only place Marathon will place instances of those apps is on nodes that meet the given constraints, but this will not prevent Marathon from placing unconstrained apps on those nodes.

For example, if a cluster has the following:

Then we could specify that app /baremetal-app must always run on a baremetal node, but there’s no (simple) way to reserve the expensive baremetal nodes for only a certain class of apps with the core Marathon, or to ensure that other applications or users stay on the virtual machine nodes.
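To make the constraint mechanism above concrete, here is a sketch of a Marathon app definition pinned to baremetal nodes. The node-type attribute name is an assumption for illustration: it would need to be set on the baremetal agents themselves (via the Mesos agent’s attributes configuration) before the constraint can match.

```json
{
  "id": "/baremetal-app",
  "cmd": "sleep 3600",
  "cpus": 1.0,
  "mem": 512,
  "instances": 2,
  "constraints": [["node-type", "CLUSTER", "baremetal"]]
}
```

The CLUSTER operator restricts instances to agents whose attribute matches the given value (LIKE/UNLIKE accept regular expressions instead). As noted above, this only constrains /baremetal-app itself; unconstrained apps may still land on those nodes.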


Resource Isolation - Multiple Frameworks

So how do we ensure that certain workloads (for example, prod) are guaranteed a certain set of resources? At a high level, there are two tools we can combine to achieve this:

Marathon-on-Marathon (MoM)

You can spin up additional Marathon frameworks, and have the actual framework process run on the root Marathon. This is called “Marathon-on-Marathon”, or MoM.

In the DC/OS Catalog, there is a package called “Marathon”; this can be used to install an OSS MoM. While starting up an OSS MoM, make sure to customize the following settings:
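As a sketch, such settings are typically supplied in an options file passed to `dcos package install marathon --options=marathon-prod.json`. The exact option keys vary by package version (check `dcos package describe marathon --config` for the authoritative schema), and the marathon-prod name and prod role below are illustrative:

```json
{
  "service": {
    "name": "marathon-prod"
  },
  "marathon": {
    "mesos-role": "prod",
    "default-accepted-resource-roles": "prod,*"
  }
}
```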

Once MoM is up and running, you can access the Marathon interface by clicking on the “Open Service” link next to the service, or by navigating to https://<dcos-url>/service/marathon-prod.

In addition to the OSS MoM, DC/OS users who have an enterprise license with Mesosphere can use the Enterprise Edition of Marathon-on-Marathon, which adds the following capabilities:

Installing OSS Marathon-on-Marathon

Use the directions here to install a Marathon-on-Marathon instance.

Installing Enterprise MoM

Use the directions here to install an Enterprise Marathon-on-Marathon instance.

Configuring Access to Enterprise MoM

Use the directions here to enable users to access Enterprise Marathon-on-Marathon.

Configuring Access within Enterprise MoM

Use the directions here to configure access to paths within Enterprise Marathon-on-Marathon instance.

Splitting up Cluster Resources

There are several options available to us now (in DC/OS 1.10.x and 1.11.x, which correspond to Apache Mesos 1.4.x and 1.5.x, respectively):
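A building block common to these options is the Mesos role. For example, a static reservation advertises some of an agent’s resources under a specific role. The Mesos agent’s resources configuration accepts a JSON list; the sketch below (the prod role name is illustrative) reserves 4 CPUs and 8 GB of memory for prod, with the agent’s remaining resources staying in the default unreserved (*) role.

```json
[
  {"name": "cpus", "type": "SCALAR", "scalar": {"value": 4}, "role": "prod"},
  {"name": "mem", "type": "SCALAR", "scalar": {"value": 8192}, "role": "prod"}
]
```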

Current Limitations

There are many features that are potentially on the roadmap for DC/OS and Apache Mesos. This is a non-authoritative and non-exhaustive list of features that may come in the future.

Please contact Mesosphere for official roadmap and release timeline.

Multi-role frameworks

Currently, Mesos frameworks can be configured to support multiple roles. For example, framework X could be designed to support roles A and B. Unfortunately, Marathon (the primary framework used in DC/OS) does not yet support multiple roles. See JIRA [MARATHON-2775](https://jira.mesosphere.com/browse/MARATHON-2775).

Quota Minimums and Maximums

Currently, Apache Mesos enforces a quota as both a minimum and a maximum. For example, if role prod is configured with a quota of 100 CPU cores, then prod will experience two behaviors:
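As a sketch of what that configuration looks like in these Mesos versions, a quota is set by POSTing a request like the following to the Mesos master’s /quota endpoint (on DC/OS, the master API is proxied under /mesos/ on the cluster URL):

```json
{
  "role": "prod",
  "guarantee": [
    {"name": "cpus", "type": "SCALAR", "scalar": {"value": 100}}
  ]
}
```

Because of the behavior described above, this single guarantee value acts as both the floor and the ceiling for the prod role.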

In the future, these may be configurable on a separate basis. For example, for a given role X, we could set a guarantee of 100 CPU cores and a limit of 200 CPU cores.

Hierarchical Reservations

Apache Mesos currently supports hierarchical roles (i.e., refining a portion of a given reservation for role X into reservations for child roles X/A and X/B). However, this currently has limited value; in the DC/OS ecosystem it is currently utilized only by the Kubernetes and Edge-LB Catalog services (which are built on the SDK). More importantly, Marathon does not currently support hierarchical roles.

Hierarchical Quotas

In addition to the above limitation regarding Framework support of hierarchical roles, quotas cannot currently be assigned to hierarchical roles.

Multi-role reservations

In the future, it may be possible to configure a given reservation such that it supports multiple roles. This may take one or more of several forms, which have not yet been fully defined:

Revocable Reservation

In the future, Apache Mesos may support a set of quotas and/or reservations for a set of roles such that resources currently in use for one role may be pre-empted or revoked by another role. For example, a task using the dev role may be paused and/or killed in favor of a task in a higher-priority prod role.


Load Balancing / Ingress

In addition to the above discussion about configuring resource allocation for services and tasks running in DC/OS, services often also have to be exposed to end-users. In a microservices architecture, automatic service discovery is important to provide a consistent endpoint for users and clients to access.

In DC/OS, there are two primary ingress mechanisms:

Here are the key differences (there are many others):

* Marathon-LB can only talk to one instance of Marathon. So if you have multiple instances of Marathon, then you need multiple Marathon-LBs.
* Marathon-LB listens on a port on the host. You can’t, for example, have multiple Marathon-LB instances all listening on ports 80 and 443 on the same node.
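For example, a second Marathon-LB pointed at a MoM would be installed with options along these lines. The option keys are from the marathon-lb Catalog package and may vary by version, and the marathon-prod naming (and its Mesos-DNS address) is illustrative; the instance would also need to be moved off ports 80/443, or placed on different nodes, if a default Marathon-LB is already running.

```json
{
  "marathon-lb": {
    "name": "marathonlb-prod",
    "marathon-uri": "http://marathon-prod.marathon.mesos:8080"
  }
}
```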

Configuring Marathon-LB with MoM

Use the directions here to configure Marathon-LB to work with MoM.

Configuring Edge-LB with MoM

Use the directions here to configure Edge-LB to work with MoM.