New Features in MMCloud Carmel 2.0 Release

Date Released

Released on 03-06-2023

Supported Clouds

MMCloud is designed to work on any cloud infrastructure. The Carmel release supports the following clouds:

New Features

General

  • MMCloud Gateway Service

    The Gateway Service provides a reverse proxy so that clients can access all servers in a server farm using a single IP address. If a server moves to a new instance and acquires a new IP address (for example, if a job running on a Spot instance migrates after a reclaim event), the migration is transparent to the client (user). The server farm can provide multiple services simultaneously, for example, RStudio and Jupyter Notebooks.

  • Job migration initiated by swap space usage

    If an application's virtual memory usage extends into swap space, it is often a sign that an out-of-memory (OOM) condition is imminent. MMCloud detects swap space usage and automatically migrates the job to a virtual machine with more memory.

  • Option to ignore AWS Rebalance Recommendation signal

    AWS sends the Rebalance Recommendation signal to indicate that a Spot Instance is at elevated risk of being reclaimed. For a job with a memory footprint of less than 128 GB, the Rebalance Recommendation signal is unnecessary because the OpCenter has enough time to move the job to a new instance within the two-minute Spot Instance reclaim window. Use the ignoreRebalance option with float submit to ignore any Rebalance Recommendation signals for the submitted job.

  • Integration with AliCloud Billing

    This integration allows AliCloud to deliver a customer invoice that includes the MemVerge license fee.
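
The swap-triggered migration and the 128 GB Rebalance Recommendation guideline described in this list can be illustrated with a small Python sketch. This is not MMCloud's implementation: the function names, the zero-kB swap threshold, and the sample data are assumptions made for illustration. The only external facts relied on are that Linux reports SwapTotal and SwapFree (in kB) in /proc/meminfo, and the 128 GB figure quoted above.

```python
# Illustrative sketch only. Function names and thresholds are hypothetical,
# not part of MMCloud or the float CLI.

REBALANCE_MEMORY_LIMIT_GB = 128  # figure quoted in the release notes


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_used_kb(meminfo: dict[str, int]) -> int:
    """Swap in use is SwapTotal minus SwapFree, floored at zero."""
    return max(meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0), 0)


def should_migrate_for_swap(meminfo: dict[str, int], threshold_kb: int = 0) -> bool:
    """Assumed policy: any swap usage above the threshold triggers migration."""
    return swap_used_kb(meminfo) > threshold_kb


def can_ignore_rebalance(job_memory_gb: float) -> bool:
    """Jobs under 128 GB can be moved within the two-minute reclaim window."""
    return job_memory_gb < REBALANCE_MEMORY_LIMIT_GB


# Made-up sample in the /proc/meminfo format, used instead of the live file
# so the sketch runs on any platform.
SAMPLE_MEMINFO = """MemTotal:       16384000 kB
MemFree:          120000 kB
SwapTotal:       2097148 kB
SwapFree:        1048572 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE_MEMINFO)
    print("swap used (kB):", swap_used_kb(info))
    print("migrate for swap:", should_migrate_for_swap(info))
    print("ignore rebalance for 64 GB job:", can_ignore_rebalance(64))
```

On a Linux host, the same parser could be fed the contents of /proc/meminfo in place of the sample string.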

User Experience Improvements

  • GUI Version 1

    From the OpCenter landing page, a user can launch the MMCloud GUI. With the GUI, the user can:
    • Submit jobs
    • Monitor and manage jobs (using filters to focus on particular jobs)
    • View logs and metrics
    • Analyze application behavior, e.g., examine a time series chart of CPU and memory usage
    • Manage the application library, including adding and deleting container images
  • Support for float df command

    The Unix df command is commonly used to show disk usage and file system mount points on a Unix server. The float df -j job_id command provides the same information on a per-job basis, which is useful for analyzing how an application uses disk space.

  • Default values shown in float command line help messages
  • "Starting" added as a job status value

    Job status now progresses from "Initializing" to "Starting" to "Executing" to more accurately describe when the job script is executing.

  • Image specified in float submit command by name or by URI
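
For readers unfamiliar with df, the kind of information that float df reports per job can be approximated for a single path with the Python standard library. This sketch does not call float or reproduce its output format. The function name and the returned field names are assumptions chosen for illustration.

```python
import shutil


def disk_report(path: str = ".") -> dict[str, float]:
    """Summarize usage of the file system that holds `path`.

    Hypothetical helper: it mirrors the totals df prints for one mount
    point, using shutil.disk_usage from the standard library.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_bytes": float(usage.total),
        "used_bytes": float(usage.used),
        "free_bytes": float(usage.free),
        "used_pct": used_pct,
    }


if __name__ == "__main__":
    for key, value in disk_report(".").items():
        print(f"{key}: {value:.1f}")
```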

Platform Improvements

  • New virtual machine instance created before job migration initiated

    This prevents a job from failing because no virtual machine is available to migrate to.

  • EBS volume optimization for large, long-running jobs

    Large-capacity, high-throughput EBS volumes are expensive. This optimization minimizes the time that EBS volumes of this type (needed to persist AppCapsules) are instantiated.

  • Support for configurable discounts in MMCloud licenses
  • Support for the Hangzhou, Beijing, and Shanghai regions in AliCloud

Bug Fixes

  • Fall back to current VM if new host not available

    When a manual job migration begins, a new host may not be available. In earlier releases, this caused the job to fail. In the Carmel release, the current VM is not deleted until the new VM instance has started successfully. If a new VM is not available, job migration stops and the job continues running on the current VM.
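
The ordering described above (start the new VM first, release the current VM only on success) can be sketched in Python as follows. This is a minimal model under stated assumptions, not MMCloud code: the VM class, the migrate_job function, and the start_new_vm callback are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class VM:
    """Hypothetical stand-in for a cloud virtual machine."""
    name: str
    running: bool = True


def migrate_job(current: VM, start_new_vm: Callable[[], Optional[VM]]) -> VM:
    """Return the VM that hosts the job after a migration attempt.

    start_new_vm is an assumed callback that returns a started VM, or
    None when no host is available.
    """
    new_vm = start_new_vm()
    if new_vm is None or not new_vm.running:
        # No replacement host: stop the migration and keep the job
        # running on the current VM, which has not been touched.
        return current
    # The replacement is up, so it is now safe to release the old VM.
    current.running = False
    return new_vm


if __name__ == "__main__":
    old = VM("vm-old")
    host = migrate_job(old, lambda: None)  # simulate "no host available"
    print(host.name, old.running)
```

The key design point is that the old VM's state changes only after the new VM is confirmed to be running, so a failed allocation leaves the job untouched.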