New Features in MMCloud Fire Island 2.3 Release

Date Released

Released on 07-20-2023

Supported Clouds

MMCloud is designed to work on any cloud infrastructure. The Fire Island release supports the following clouds:

New Features

General

  • Job Suspend and Resume

    Two new actions, Suspend and Resume, can be applied to a job. These are in addition to the existing Migrate, Modify, and Cancel actions. Suspend has two modes: Performance, which maintains the storage volumes, and Eco, which releases the storage volumes after capturing volume snapshots.

  • Carbon Emissions Metric

    From the perspective of the user, services purchased from a Cloud Service Provider fall into Scope 3 emissions (using the terminology of the Greenhouse Gas Protocol). An estimate of the Scope 3 emissions associated with each job is available. Aggregate measures of carbon emissions are also calculated, both by job and by time period.

  • Google Cloud Readiness

    The status of Google Cloud support changes from Experimental to General Availability. Improvements include the following:

    • Deployment with Terraform

      A revised Terraform template is available from the Google Cloud Marketplace.

    • Periodic snapshots

      Spot VMs can be reclaimed by Google Cloud with a thirty-second warning. For most jobs, this is not enough time to capture the current memory footprint. To provide job continuity on Spot VMs, periodic snapshots must be used. Support for periodic snapshots in Google Cloud has been improved.

    • Support for Google Cloud-specific features: firewall tags and labels
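
The by-job and by-time-period aggregation described under Carbon Emissions Metric can be pictured with a small sketch. Everything here is an assumption made for illustration: the JobEmission record, its field names, the gram unit, and the choice of calendar month as the time period are hypothetical and do not reflect how MMCloud stores or reports its estimates.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class JobEmission:
    """Hypothetical record: one emissions estimate for one job on one day."""
    job_id: str
    day: date
    grams_co2e: float


def total_by_job(records):
    """Sum the estimates per job ID."""
    totals = defaultdict(float)
    for r in records:
        totals[r.job_id] += r.grams_co2e
    return dict(totals)


def total_by_month(records):
    """Sum the estimates per (year, month); month is an assumed time period."""
    totals = defaultdict(float)
    for r in records:
        totals[(r.day.year, r.day.month)] += r.grams_co2e
    return dict(totals)


# Made-up sample data, used only to exercise the two functions.
sample = [
    JobEmission("job-a", date(2023, 7, 1), 10.0),
    JobEmission("job-a", date(2023, 8, 2), 5.0),
    JobEmission("job-b", date(2023, 7, 15), 20.0),
]
per_job = total_by_job(sample)
per_month = total_by_month(sample)
```

With this sample, job-a accumulates estimates across two months, so it contributes to both monthly totals but appears once in the per-job totals.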

Functional Improvements

  • Separate CPU and Memory Thresholds for WaveRider Policy

    In earlier releases, job migration was triggered when either a CPU or a memory utilization threshold was crossed. In the Fire Island release, the policy can be set so that only a CPU threshold crossing triggers a migration, or so that only a memory threshold crossing triggers a migration. As in earlier releases, the policy can also be set so that either a CPU or a memory threshold crossing triggers a migration.

  • Out-of-Memory Protection Enabled by Default

    By default, the WaveRider feature is enabled with the following settings: evadeOOM set to true, and CPU and memory threshold triggers turned off.

  • CPU Compatibility Check

    A snapshot created on a CPU with one architecture must be restored on a CPU with a compatible architecture. For example, a snapshot created on an Intel Xeon Platinum 8000 series processor cannot be restored on a Graviton processor. Before attempting to migrate a job, the compatibility between the current CPU and the target CPU is checked.
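
The WaveRider trigger options and defaults described earlier in this section can be summarized in a minimal sketch. This is an illustrative model, not the OpCenter implementation: the Trigger and Policy names, the threshold values, and the assumption that a "crossing" means utilization reaching an upper bound are all hypothetical. Only the evadeOOM setting and the three trigger choices come from the release notes.

```python
from dataclasses import dataclass
from enum import Enum


class Trigger(Enum):
    """Hypothetical names for which threshold crossings trigger a migration."""
    NONE = "none"            # CPU and memory threshold triggers turned off
    CPU_ONLY = "cpu"         # only a CPU threshold crossing triggers
    MEMORY_ONLY = "memory"   # only a memory threshold crossing triggers
    EITHER = "either"        # behavior of earlier releases


@dataclass(frozen=True)
class Policy:
    # Defaults mirror the release notes: evadeOOM true, threshold triggers off.
    trigger: Trigger = Trigger.NONE
    evade_oom: bool = True
    # Threshold values are illustrative assumptions, in percent utilization.
    cpu_threshold: float = 90.0
    mem_threshold: float = 90.0


def threshold_migration_due(policy, cpu_pct, mem_pct):
    """Return True if a threshold crossing should trigger a migration.

    Models only the threshold triggers; out-of-memory evasion (evade_oom)
    is carried as a setting but not simulated here.
    """
    cpu_crossed = cpu_pct >= policy.cpu_threshold
    mem_crossed = mem_pct >= policy.mem_threshold
    if policy.trigger is Trigger.CPU_ONLY:
        return cpu_crossed
    if policy.trigger is Trigger.MEMORY_ONLY:
        return mem_crossed
    if policy.trigger is Trigger.EITHER:
        return cpu_crossed or mem_crossed
    return False  # Trigger.NONE: thresholds never cause a migration
```

Under the default Policy(), no utilization level causes a threshold-driven migration. Selecting Trigger.CPU_ONLY or Trigger.MEMORY_ONLY makes the other resource's utilization irrelevant to the decision.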

Experimental Features

  • WaveWatcher Insight

    WaveWatcher provides real-time measurements of resource utilization (CPU, memory, network, etc.) as a job executes. WaveWatcher Insight adds a button to display suggestions for improving resource efficiency.

  • WaveRider Simulator

    When WaveRider is enabled for a job, the OpCenter monitors CPU and memory utilization as the job executes, and migrates the job as determined by the user-supplied policy. The WaveRider Simulator allows the user to conduct a what-if analysis after a job completes without WaveRider enabled. The simulator allows the user to investigate the effects of WaveRider on cost and wall clock time.

  • Jenkins Plugin

    Jenkins is a server-based platform used to automate building, testing, and deploying software. Jenkins uses plugins to integrate with software projects not written in Java and with a variety of version control and bug reporting systems. The Fire Island release provides a plugin so that Jenkins can schedule jobs to run on OpCenter and manage the jobs once they are running.