Marathon

A container orchestration platform for Mesos and DC/OS

Download Marathon v1.8.222

v1.8.222 SHA Checksum · v1.8.222 Release Notes

Overview

Marathon is a production-grade container orchestration platform for Mesosphere’s Datacenter Operating System (DC/OS) and Apache Mesos.

Features

  • High Availability. Marathon runs as an active/passive cluster with leader election for 100% uptime.
  • Multiple container runtimes. Marathon has first-class support for both Mesos containers (using cgroups) and Docker.
  • Stateful apps. Marathon can bind persistent storage volumes to your application. You can run databases like MySQL and Postgres, and have storage accounted for by Mesos.
  • Beautiful and powerful UI.
  • Constraints. Control where instances are placed, for example by allowing only one instance of an application per rack, node, etc.
  • Service Discovery & Load Balancing. Several methods available.
  • Health Checks. Evaluate your application’s health using HTTP or TCP checks.
  • Event Subscription. Supply an HTTP endpoint to receive notifications - for example to integrate with an external load balancer.
  • Metrics. Query them at /metrics in JSON format, push them to systems like Graphite, StatsD and DataDog, or scrape them using Prometheus.
  • Deprecated Metrics. Query them at /metrics in JSON format, or push them to systems like Graphite, StatsD and DataDog.
  • Complete REST API for easy integration and scriptability.
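As a rough illustration of how constraints and health checks from the list above fit together, the sketch below builds an app definition as a plain Python dictionary. The field names follow Marathon's JSON app definition format as commonly documented; the app id, command, resource sizes, the `/health` path, and the `rack_id` agent attribute are assumptions made for the example, and the helper `build_app_definition` is hypothetical. Check the REST API reference for your Marathon version before relying on any of it.

```python
import json


def build_app_definition(app_id, cmd, instances):
    """Return a Marathon-style app definition as a plain dict (hypothetical helper)."""
    return {
        "id": app_id,
        "cmd": cmd,
        "cpus": 0.25,  # assumed resource sizes, chosen only for illustration
        "mem": 128,
        "instances": instances,
        # One dynamically assigned port, referenced by the health check below.
        "portDefinitions": [{"port": 0, "name": "http"}],
        # At most one instance per distinct value of the agent attribute
        # "rack_id" (assumes agents were started with such an attribute).
        "constraints": [["rack_id", "UNIQUE"]],
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/health",  # assumed endpoint exposed by the app
                "portIndex": 0,
                "gracePeriodSeconds": 30,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


app = build_app_definition("/search", "python3 -m http.server $PORT0", 1)
print(json.dumps(app, indent=2))
```

The resulting JSON is the kind of document that would be submitted to the REST API to create an app.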

DC/OS features

Running on DC/OS, Marathon gains the following additional features:

  • Virtual IP routing. Allocate a dedicated, virtual address to your app. Your app is now reachable anywhere in the cluster, wherever it might be scheduled. Load balancing and rerouting around failures are done automatically.
  • Authorization (DC/OS Enterprise Edition only). True multitenancy with each user or group having access to their own applications and groups.

Examples

Marathon orchestrates both apps and frameworks

The graphic below shows how Marathon runs on Apache Mesos acting as the orchestrator for other applications and services.

Marathon is the first framework to be launched, running directly alongside Mesos. This means the Marathon scheduler processes are started directly using init, upstart, or a similar tool.

Marathon is a powerful way to run other Mesos frameworks: in this case, Chronos. Marathon launches two instances of the Chronos scheduler using the Docker image mesosphere/chronos. The Chronos instances appear in orange on the top row.

If either of the two Chronos containers fails for any reason, then Marathon will restart them on another agent. This approach ensures that two Chronos processes are always running.
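The Chronos setup described above could be expressed as an app definition along the following lines. This is a minimal sketch: the image name `mesosphere/chronos` and the instance count of two come from the text, while the app id and the CPU and memory values are assumptions, and a real deployment would likely need further settings (ports, arguments for Chronos) that are not shown here.

```python
import json

# App definition for running the Chronos scheduler under Marathon.
chronos_app = {
    "id": "/chronos",  # assumed app id
    "instances": 2,    # Marathon keeps two Chronos schedulers running
    "cpus": 0.5,       # assumed resource sizes
    "mem": 512,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "mesosphere/chronos"},
    },
}

print(json.dumps(chronos_app, indent=2))
```

Because `instances` is part of the declared state, Marathon starts a replacement task whenever fewer than two Chronos tasks are running.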

Since Chronos itself is a framework and receives resource offers, it can start tasks on Mesos. In the use case below, Chronos is running two scheduled jobs, shown in blue. One dumps a production MySQL database to S3, while another sends an email newsletter to all customers via Rake.

Meanwhile, Marathon also runs the other application containers - either Docker or Mesos - that make up our website: JBoss servers, Jetty, Sinatra, Rails, and so on.

We have shown that Marathon is responsible for running other frameworks, helps them maintain 100% uptime, and coexists with them as they create their own workloads in Mesos.

Scaling and fault recovery

The next three images illustrate scaling and container placement.

Below we see Marathon running three applications, each scaled to a different number of containers: Search (1), Jetty (3), and Rails (5).

As the website gains traction, we decide to scale out the Search service and our Rails-based application.

We use a Marathon REST API call to add more instances. Marathon will take care of placing the new containers on machines with spare capacity, honoring the constraints we previously set. We can see the containers are dynamically placed:

Finally, imagine that one of the datacenter workers trips over a power cord and a server is unplugged. No problem for Marathon: it moves the affected Search and Rails containers to a node that has spare capacity. Marathon has maintained our uptime in the face of machine failure.
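The scale-out step described earlier in this section, adding instances through the REST API, can be sketched as follows. The example only builds the HTTP request and does not send it. The host name `marathon.example.com`, the port, the app id `/rails`, and the target count of 8 are all assumptions for illustration, and `build_scale_request` is a hypothetical helper. Updating the `instances` field with a PUT to `/v2/apps/{appId}` reflects the Marathon API as commonly documented; verify it against the reference for your version.

```python
import json
import urllib.request


def build_scale_request(base_url, app_id, instances):
    """Build (but do not send) a PUT request that sets an app's instance count."""
    url = f"{base_url.rstrip('/')}/v2/apps/{app_id.strip('/')}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical Marathon endpoint and target instance count.
req = build_scale_request("http://marathon.example.com:8080", "/rails", 8)
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would submit the change to a reachable Marathon instance. The same request shape applies to recovery: the desired instance count stays declared in the app definition, so no further call is needed when a machine fails.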