MMCloud Architecture

Design

MMCloud enables users to run containerized applications on virtual machines leased from Cloud Service Providers. The virtual machines are usually Spot Instances; they can also be On-demand Instances (this is a per-job configuration option). All functions are controlled by the MMCloud Operations Center (OpCenter), which receives CLI commands from users (clients) and manages resources in the cloud, as shown in MMCloud Architecture.

Components

The MMCloud architecture includes the following components.

Users (Clients)
Using the MMCloud web interface or the MMCloud CLI, clients interact with the MMCloud OpCenter.
MMCloud OpCenter
This provides the core functionality that allows MMCloud to marshal resources for starting workloads and to migrate workloads if needed. If the OpCenter is not running, currently executing jobs continue (but are not migrated) and new jobs are not scheduled.
Application Library
A container image registry is a service for hosting and distributing images. The default registry for Docker images is Docker Hub. A repository is a collection of images within a registry (one registry hosts many repositories). A private repository requires a username and access token to post or retrieve images; a public repository does not. The Application Library contains a database of information for accessing container images in various repositories (public and private).
Worker Nodes
These are the compute engines provided by virtual machines running in the Cloud Service Provider's network. Worker nodes may have locally-mounted file systems and attached storage. On-demand Instances or Spot Instances can be used as worker nodes.

Workflow

The operation of MMCloud proceeds as follows. A client submits a job to the OpCenter, using CLI command options to select a container image from the Application Library and to specify the compute resources needed (the web interface generates the CLI command line automatically). The OpCenter uses this information to orchestrate the necessary resources in the Cloud Service Provider's network and schedules the job for execution. The cloud resources always include a compute node and may include block storage and file systems as well. One or more data sets usually accompany a job. The user-provided job script describes how these data sets are accessed, for example, by loading data from an AWS S3 bucket. The job script also describes where the output is placed — results are usually written to a persistent file system. When the job has run to completion, the user retrieves the results.
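As a rough illustration of the resource-planning step described above, the following Python sketch models a job request and the set of cloud resources that would be provisioned for it. It is not the OpCenter's implementation: the JobRequest fields, the plan_resources function, and the image name are all hypothetical names chosen for this example. The only behavior taken from the text is that a compute node is always included, while block storage and file systems are optional.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JobRequest:
    """Hypothetical stand-in for what a client submits to the OpCenter."""
    image: str                      # container image chosen from the Application Library
    min_vcpus: int                  # compute resources requested for the job
    min_mem_gib: int
    block_storage_gib: int = 0      # 0 means no extra block storage was requested
    file_systems: List[str] = field(default_factory=list)


def plan_resources(job: JobRequest) -> List[str]:
    """List the cloud resources to provision: always compute, optionally storage."""
    resources = [f"compute:{job.min_vcpus}vcpu/{job.min_mem_gib}GiB"]
    if job.block_storage_gib > 0:
        resources.append(f"block:{job.block_storage_gib}GiB")
    resources.extend(f"fs:{name}" for name in job.file_systems)
    return resources


# Example: a job that needs a compute node and one file system for its results.
job = JobRequest(image="example/app:1.0", min_vcpus=4, min_mem_gib=16,
                 file_systems=["results"])
print(plan_resources(job))
```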

Workload Continuity

Using the AppCapsule feature, the OpCenter automatically moves a job running on a Spot Instance to a new Spot Instance if the first Spot Instance is reclaimed, as illustrated in MMCloud Operation.

In the example shown, the job starts executing on Spot Instance A. If the Cloud Service Provider signals that it intends to reclaim Spot Instance A, the OpCenter triggers the AppCapsule feature to capture the state of the running job and export the checkpoint image to persistent storage. A new Spot Instance is started (Spot Instance B in this case), the checkpoint image is imported from persistent storage, and the job resumes execution.

The job continues to run in this manner until completion, at which point the user retrieves the final results.
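The checkpoint-and-resume cycle can be imitated with a toy simulation. The Python sketch below is only an analogy, not how AppCapsule works internally: the job is reduced to a step counter and a running total, the dictionary persistent_storage stands in for persistent storage, and the names run_steps and run_with_reclaims are invented for this example. What it shows is the property that matters: the final result is the same no matter how many times the instance is reclaimed.

```python
import json

persistent_storage = {}  # stand-in for a persistent file system or object store


def run_steps(state, until_step):
    """Advance the toy job: each step adds its step number to a running total."""
    while state["step"] < until_step:
        state["step"] += 1
        state["total"] += state["step"]
    return state


def run_with_reclaims(job_id, total_steps, reclaim_steps):
    """Run to completion, checkpointing and resuming at each simulated reclaim."""
    state = {"step": 0, "total": 0}
    instances_used = 1
    for reclaim_step in sorted({s for s in reclaim_steps if 0 < s < total_steps}):
        state = run_steps(state, reclaim_step)
        # Reclaim notice: capture the job state and export it to persistent storage.
        persistent_storage[job_id] = json.dumps(state)
        state = None  # the reclaimed instance and its memory are gone
        # New instance: import the checkpoint and resume where the job left off.
        state = json.loads(persistent_storage[job_id])
        instances_used += 1
    state = run_steps(state, total_steps)
    return state["total"], instances_used


# Two reclaims (after steps 3 and 7) mean the job runs on three instances in turn.
print(run_with_reclaims("job-1", total_steps=10, reclaim_steps=[3, 7]))
```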

Workload Mobility

For jobs running on Spot Instances, job migration occurs automatically if the Spot Instance is reclaimed. The new Spot Instance is usually of the same type as the one that was reclaimed, although the user can set a policy so that the new instance is different, for example, an On-demand Instance or a Spot Instance of a different size.

Using the CLI or the web interface, you can manually migrate a job from one virtual machine to another, for example, from a Spot Instance to an On-demand Instance of a different type. Using CLI command options, you can specify the new On-demand Instance by instance type (for example, c6i.large in AWS) or by ranges of memory size and number of virtual CPUs.
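To make the idea of selecting an instance by ranges concrete, the Python sketch below filters a small catalog by vCPU count and memory size. The catalog lists a handful of AWS instance types with their vCPU and memory sizes; the CATALOG and candidates names are hypothetical, and the sketch does not reproduce the matching logic or the CLI options that the OpCenter actually uses.

```python
# (instance type, vCPUs, memory in GiB): a tiny illustrative catalog.
CATALOG = [
    ("c6i.large", 2, 4),
    ("m6i.large", 2, 8),
    ("r6i.large", 2, 16),
    ("c6i.xlarge", 4, 8),
    ("m6i.xlarge", 4, 16),
    ("r6i.xlarge", 4, 32),
]


def candidates(vcpu_range, mem_range_gib):
    """Return the instance types whose vCPUs and memory fall inside both ranges."""
    min_vcpus, max_vcpus = vcpu_range
    min_mem, max_mem = mem_range_gib
    return [
        name
        for name, vcpus, mem_gib in CATALOG
        if min_vcpus <= vcpus <= max_vcpus and min_mem <= mem_gib <= max_mem
    ]


# Ask for 2 to 4 vCPUs and 8 to 16 GiB of memory (both ranges are inclusive).
print(candidates((2, 4), (8, 16)))
```

Specifying ranges rather than a single type leaves room to pick whichever matching instance is available, which matters most for Spot capacity.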

Workload mobility is useful for rightsizing compute platforms: jobs can pass through several execution stages where the compute requirements are different; for example, one stage might be memory-intensive whereas another stage might be compute-intensive. Workload mobility moves the job from one compute platform to another as the resource demands change.

Job migration can be initiated in three ways:

Manually, using CLI commands or the OpCenter web interface.
Automatically, using a rules-based policy driven by resource utilization, for example, CPU or memory utilization thresholds.
Programmatically, by inserting float migrate commands at breakpoints specified in the job script, for example, after loading data.
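A rules-based policy of the kind mentioned in the second item can be pictured as a simple threshold check. The Python sketch below is an assumption-laden illustration, not MMCloud's policy engine: the function name decide_migration, the action labels, and the default thresholds of 90% and 20% are all invented for this example.

```python
def decide_migration(cpu_util, mem_util, high=0.90, low=0.20):
    """Pick an action from utilization fractions between 0.0 and 1.0.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    if cpu_util >= high or mem_util >= high:
        return "scale_up"      # either resource is saturated: move to a larger instance
    if cpu_util <= low and mem_util <= low:
        return "scale_down"    # both resources are mostly idle: move to a smaller instance
    return "stay"


# A memory-intensive stage pushes memory past the upper threshold.
print(decide_migration(cpu_util=0.35, mem_util=0.95))
```

Scaling up when either resource is saturated but scaling down only when both are idle is a deliberately conservative choice; a real policy would probably also require the condition to hold for some time before acting, to avoid migrating on short spikes.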

Licensing

MemVerge maintains a portal where you obtain licenses for MMCloud. Subscribers to MMCloud choose from the following license plans.

MMCloud Pro: Monthly charges are based on the number of CPU core-hours used.
MMCloud Enterprise: All the features included in the Pro license plus professional support and custom SLAs. Charges are not usage-based; they are set by a contract between MemVerge and the account holder.

When you log in to the OpCenter web interface for the first time, you click a button to apply the license to your OpCenter instance. You can also apply the license using the CLI.