MMCloud Overview
What is Memory Machine™ Cloud?
Memory Machine Cloud (MMCloud) is a software platform that streamlines the way containerized applications are deployed in the cloud or in a hybrid cloud environment.
Based on a customizable policy, MMCloud selects and instantiates cloud resources on behalf of the user. MMCloud has a built-in job scheduler so users can deploy Docker containers (and other containers that comply with the Open Container Initiative image format specification) across a group of virtual machines.
MMCloud includes AppCapsule, MemVerge's checkpoint/restore (C/R) capability. An AppCapsule is a moment-in-time snapshot of the application instance, including in-memory state and relevant files. AppCapsule is used to support workload mobility and workload continuity. Workload mobility means that a job can move from one virtual machine to another, for example, to a more powerful virtual machine that is a better fit for the next stage of execution. Workload continuity provides high availability: if the underlying virtual machine is reclaimed, the workload automatically moves to a new virtual machine and resumes running.
Most of the time, hyperscale Cloud Service Providers (CSPs) have excess virtual machine capacity which they offer as Spot Instances at varying discounts — the average discount is around 80%. The trade-off is that any Spot Instance can be reclaimed with only nominal warning (typically, two minutes or less). MMCloud’s AppCapsule feature is triggered automatically when the CSP signals that it is reclaiming the Spot Instance. Job execution pauses and then resumes on a new Spot Instance, allowing users to take advantage of the reduced cost of Spot Instances without the risk of losing intermediate results before the job has run to completion.
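The trade-off described above can be made concrete with a little arithmetic. The following Python sketch is illustrative only: the function name, the hourly rate, the number of interruptions, and the per-interruption overhead are all hypothetical values, not figures published by MemVerge or any CSP. It assumes each reclaim adds a fixed amount of billed time for checkpointing and restoring.

```python
def spot_job_cost(on_demand_rate, hours, discount,
                  interruptions=0, overhead_hours_per_interruption=0.0):
    """Estimate the cost of running a job on Spot Instances.

    on_demand_rate: hourly On-demand price (hypothetical value)
    hours: run time of the job without interruptions
    discount: Spot discount as a fraction, e.g. 0.80 for 80%
    interruptions: how many times the instance is reclaimed (assumed)
    overhead_hours_per_interruption: extra billed time per reclaim for
        checkpoint and restore (assumed)
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    spot_rate = on_demand_rate * (1.0 - discount)
    billed_hours = hours + interruptions * overhead_hours_per_interruption
    return spot_rate * billed_hours


# Hypothetical job: 10 hours at $1.00/hour On-demand, 80% Spot discount,
# reclaimed 3 times with 3 minutes (0.05 h) of overhead each time.
on_demand_cost = 1.00 * 10
spot_cost = spot_job_cost(1.00, 10, 0.80, interruptions=3,
                          overhead_hours_per_interruption=0.05)
print(f"On-demand: ${on_demand_cost:.2f}, Spot: ${spot_cost:.2f}")
```

Under these assumptions the overhead of a few checkpoint/restore cycles is small compared with the saving from the discount, which is the case the AppCapsule feature is designed for.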
MMCloud Architecture
MMCloud has several components that interact with cloud services to ensure mobility for submitted jobs so that, for example, jobs run to completion on Spot Instances.
Design
MMCloud enables users to run containerized applications on virtual machines leased from Cloud Service Providers. The virtual machines are usually Spot Instances; they can also be On-demand Instances (this is a per-job configuration option). All functions are controlled by the MMCloud Operations Center (OpCenter), which receives CLI commands from users (clients) and manages resources in the cloud, as shown in the figure.
Components
The MMCloud architecture includes the following components.
- Users (Clients)
Using the MMCloud web interface or the MMCloud CLI, clients interact with the MMCloud OpCenter.
- MMCloud OpCenter
This provides the core functionality that allows MMCloud to marshal resources for starting workloads and to migrate workloads if needed. If the OpCenter stops running, currently executing jobs continue (but are not migrated) and new jobs are not scheduled.
- Application Library
A container image registry is a service for hosting and distributing images. The default registry for Docker images is Docker Hub. A repository is a collection of images within a registry (one registry hosts many repositories). A private repository requires a username and access token to post or retrieve images; a public repository does not. The Application Library contains a database of information for accessing container images in various repositories (public and private).
- Worker Nodes
These are the compute engines provided by virtual machines running in the Cloud Service Provider's network. Worker nodes may have locally-mounted file systems and attached storage. On-demand Instances or Spot Instances can be used as worker nodes.
Workflow
The operation of MMCloud proceeds as follows. A client submits a job to the OpCenter, using CLI command options to select a container image (from the Application Library or from another image repository) and to specify the compute resources needed. The web interface generates the CLI command line automatically. The OpCenter uses this information to orchestrate the necessary resources in the Cloud Service Provider's network and schedules the job for execution. The cloud resources always include a compute node and may include block storage and file systems as well. One or more data sets usually accompany a job. The user-provided job script describes how these data sets are accessed, for example, by loading data from an AWS S3 bucket. The job script also describes where the output is placed — results are usually written to a persistent file system. When the job has run to completion, the user retrieves the results.
Workload Continuity
Using the AppCapsule feature, OpCenter automatically moves a job running on a Spot Instance to a new Spot Instance if the first Spot Instance is reclaimed, as illustrated in the figure.
The job starts executing on Spot Instance A. If the Cloud Service Provider signals that it intends to reclaim Spot Instance A, the OpCenter triggers the AppCapsule feature to capture the state of the running job and export the checkpoint image to persistent storage. A new Spot Instance is started (Spot Instance B in this case), the checkpoint image is imported from persistent storage, and the job resumes execution.
The job continues to run in this manner until completion, at which point the user retrieves the final results.
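The checkpoint, export, import, and resume cycle described above can be sketched as a small simulation. This is a toy model written for illustration and is not how the OpCenter or AppCapsule is implemented: the function name, the dictionary standing in for persistent storage, and the instance names are all hypothetical.

```python
def run_with_continuity(total_steps, reclaim_at):
    """Simulate a job that survives Spot Instance reclaims.

    total_steps: units of work the job must complete
    reclaim_at: set of step counts at which the current instance
        receives a reclaim notice (hypothetical trigger)
    Returns (completed_steps, names of the instances used).
    """
    persistent_storage = {}           # stands in for durable checkpoint storage
    instances = ["spot-0"]            # the job starts on the first instance
    state = {"completed_steps": 0}    # in-memory state of the running job
    pending = set(reclaim_at)

    while state["completed_steps"] < total_steps:
        if state["completed_steps"] in pending:
            pending.discard(state["completed_steps"])
            # Reclaim notice: capture state and export it to storage.
            persistent_storage["checkpoint"] = dict(state)
            # The old instance disappears along with its memory.
            state = None
            # Start a new instance and import the checkpoint.
            instances.append(f"spot-{len(instances)}")
            state = dict(persistent_storage["checkpoint"])
            continue
        state["completed_steps"] += 1  # do one unit of work

    return state["completed_steps"], instances


completed, used = run_with_continuity(total_steps=10, reclaim_at={3, 7})
print(completed, used)
```

In this run the job is reclaimed twice and therefore uses three instances, but no completed step is repeated because each restart begins from the exported checkpoint.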
Workload Mobility
For jobs running on Spot Instances, job migration occurs automatically if the Spot Instance is reclaimed. The new Spot Instance is usually of the same type as the one that was reclaimed although the user can set a policy so that the new instance is different, for example, an On-demand Instance or a Spot Instance of a different size.
Using the CLI or the web interface, you can manually migrate a job from one virtual machine to another, for example, from a Spot Instance to an On-demand Instance, or from one instance type to another instance type. Using CLI command options, you can specify the new On-demand Instance by the instance type (for example, c6i.large in AWS) or by specifying ranges of memory size and number of virtual CPUs.
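Selecting an instance by ranges of memory and virtual CPUs amounts to filtering a catalog of instance types. The Python sketch below illustrates the idea with a made-up catalog; the instance names, sizes, and prices are hypothetical, and choosing the cheapest match is an assumption for the example rather than a documented MMCloud behavior.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: float
    hourly_price: float  # hypothetical price


# Hypothetical catalog; real CSP catalogs are far larger.
CATALOG = [
    InstanceType("small", 2, 4, 0.10),
    InstanceType("medium", 4, 16, 0.20),
    InstanceType("large", 8, 32, 0.40),
    InstanceType("memory-heavy", 4, 64, 0.35),
]


def pick_instance(catalog, vcpu_range, mem_range):
    """Return the cheapest instance type inside both ranges, or None."""
    min_cpu, max_cpu = vcpu_range
    min_mem, max_mem = mem_range
    candidates = [
        t for t in catalog
        if min_cpu <= t.vcpus <= max_cpu
        and min_mem <= t.memory_gib <= max_mem
    ]
    return min(candidates, key=lambda t: t.hourly_price, default=None)


choice = pick_instance(CATALOG, vcpu_range=(4, 8), mem_range=(16, 64))
print(choice.name if choice else "no match")
```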
Workload mobility is useful for rightsizing compute platforms: jobs can pass through several execution stages where the compute requirements are different; for example, one stage might be memory-intensive whereas another stage might be compute-intensive. Workload mobility moves the job from one compute platform to another as the resource demands change.
Job migration can be initiated in three ways:
- Manually, using CLI commands or the OpCenter web interface.
- Automatically, using a rules-based policy driven by resource utilization, for example, CPU or memory utilization thresholds.
- Programmatically, by inserting float migrate commands at breakpoints specified in the job script, for example, after loading data.
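A rules-based policy of the kind mentioned in the list above can be pictured as a comparison of measured utilization against thresholds. The following Python sketch is a hypothetical illustration: the threshold values, field names, and the three outcomes are assumptions made for the example and do not reflect MMCloud's actual policy syntax or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All values are fractions of capacity and are assumed for illustration.
    cpu_high: float = 0.90
    mem_high: float = 0.90
    cpu_low: float = 0.20
    mem_low: float = 0.20


def migration_decision(cpu_util, mem_util, t=Thresholds()):
    """Decide whether a job should move to a different instance size.

    Returns "scale_up" if either resource is saturated, "scale_down"
    if both are mostly idle, and "stay" otherwise.
    """
    if cpu_util >= t.cpu_high or mem_util >= t.mem_high:
        return "scale_up"
    if cpu_util <= t.cpu_low and mem_util <= t.mem_low:
        return "scale_down"
    return "stay"


# A memory-intensive stage followed by a mostly idle stage.
print(migration_decision(cpu_util=0.35, mem_util=0.95))
print(migration_decision(cpu_util=0.10, mem_util=0.15))
```

Scaling up when either resource is saturated, but scaling down only when both are idle, is a conservative choice that avoids starving a job of the one resource it is actively using.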
Licensing
MemVerge maintains a portal where you obtain licenses for MMCloud. Subscribers to MMCloud choose from the following license plans.
- MMCloud Pro: Monthly charges are based on the number of CPU core-hours used.
- MMCloud Enterprise: All the features included in the Pro license plus professional support and custom SLAs. Charges are not usage-based — charges are based on a contract between MemVerge and the account holder.
When you log in to the OpCenter web interface the first time, you click on a button to apply the license to your OpCenter instance. You can also apply the license using the CLI.