Latest Release

A brief introduction to MMBatch followed by what's new in the latest release.

Overview

Memory Machine Batch (MMBatch) captures the entire running state of a Batch Job into a consistent image and restores the Job on a new Compute Instance without losing any work progress. This allows a high quality of service at the Batch level while using low-cost but unreliable Spot-based Compute Instances. For more details, visit the MMBatch website.

New in the 1.2 Release

The MMBatch 1.2 release adds new features, improves the user experience, and enhances the security of the product.

  • Workflow Manager Support

    • Cromwell and miniWDL are supported starting in 1.2.
    • To use MMBatch with the Cromwell workflow manager, see here.
    • To use MMBatch with the miniWDL workflow manager, see here.
    • To use MMBatch with the Nextflow workflow manager, see here.
  • MMBatch Management Server

    • The MMBatch Management Server is supported starting in 1.2. It provides a browser-based dashboard that shows reporting metrics.
    • To access the dashboard, and for the definitions of the reporting metrics, see here.
    • To install a stand-alone MMBatch Management Server, see here.
  • AWS Cognito Support

    • Starting in 1.2, users can sign in to the MMBatch Management Server with existing AWS Cognito credentials.
    • To enable or disable Cognito, and to sign in with it, see here.
  • Security

  • RESTful API

    • MMBatch provides RESTful APIs starting in 1.2.
    • For the API reference guide, see here.

New in the 1.2.1 Release

  • Three new metrics added to the Management Server dashboard:
    • Total Cost: Estimated total cost for all jobs.
    • EC2 On-Demand Savings: Estimated savings when replacing on-demand instances with spot instances.
    • EC2 Spot Savings: Estimated savings when restoring a job from a preempted spot instance, assuming each spot instance runs only one job.
  • Enhanced AWS Cognito settings:
    • Users can now specify an admin group in AWS Cognito settings.
    • Only users in the designated group will be recognized as admins.
    • Only admins can enable or disable AWS Cognito.

New in the 1.2.2 Release

  • Dashboard metrics updates -
    • "Total Job Runtime" → "Total Job CPU Hours": Total number of CPU hours requested by all jobs. It is based on CPU requested, not the actual CPU usage.
    • "Time Saved" → "Job CPU Hours Saved": Estimated job CPU hours savings when restoring a job from a preempted spot instance, assuming each instance runs only one job.
  • Dashboard metrics terminology renamed for clarity -
    • "Total Cost" → "Total EC2 Instance Cost"
    • "Cost Saved" (under "EC2 Spot Savings") → "Instance Cost Saved"
    • "Cost Saved" (under "EC2 On-Demand Savings") → "Instance Cost Saved"
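
The "Total Job CPU Hours" definition above can be read as requested CPUs multiplied by runtime, summed over all jobs. The following is a minimal sketch under that assumption. The `Job` record and its field names are hypothetical; they are not MMBatch's data model, and the dashboard's exact formula may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Job:
    """Hypothetical job record; field names are illustrative only."""
    requested_vcpus: int   # CPUs requested by the job, not CPUs actually used
    runtime_hours: float   # wall-clock runtime of the job in hours


def total_job_cpu_hours(jobs: Iterable[Job]) -> float:
    """Sum of requested CPUs x runtime over all jobs (assumed interpretation)."""
    return sum(job.requested_vcpus * job.runtime_hours for job in jobs)


example_jobs = [Job(requested_vcpus=4, runtime_hours=2.5),
                Job(requested_vcpus=8, runtime_hours=1.0)]
print(total_job_cpu_hours(example_jobs))
```

Because the metric is based on requested CPUs, a job that requests 8 CPUs but keeps only 2 busy still contributes 8 CPU hours per hour of runtime in this sketch.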

Known Limitations

  • When an AWS Spot Interruption event is triggered, the AWS instance is terminated after 2 minutes. Depending on the checkpoint volume storage configuration, around 30 GB of data can be saved during this period. This data is the sum of memory usage and container filesystem usage. The actual amount of data saved depends on multiple factors, including storage type, storage bandwidth, and the number of jobs running on the AWS instance being reclaimed. It is recommended to attach a shared filesystem for container scratch files instead of writing to the container filesystem, because data on an attached shared filesystem does not need to be saved during checkpoint. The recommended shared filesystem is JuiceFS with S3 backing.
  • When signing up for the AWS Cognito service from the MMBatch Management Server with a username alone, both email address and phone number are required attributes.
  • In some cases, applications may fail immediately after restore (for example, during an S3 upload). These failures can be critical, but they are still counted as successful restores, which can lead to overestimated savings.
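
The first limitation above implies a rough budget: saving about 30 GB within the 2-minute interruption window requires a sustained write rate of about 250 MB/s. The following is a minimal sketch of that arithmetic, assuming decimal units (1 GB = 1000 MB). The function names and the per-job tuple layout are hypothetical and are not part of MMBatch; real throughput depends on the storage type and bandwidth.

```python
from typing import Iterable, Tuple

# AWS terminates the instance 2 minutes after the Spot interruption event.
SPOT_NOTICE_SECONDS = 120


def required_throughput_mb_s(checkpoint_gb: float,
                             window_s: float = SPOT_NOTICE_SECONDS) -> float:
    """Sustained write rate (MB/s) needed to save checkpoint_gb within window_s."""
    return checkpoint_gb * 1000 / window_s


def fits_in_window(jobs: Iterable[Tuple[float, float]],
                   throughput_mb_s: float,
                   window_s: float = SPOT_NOTICE_SECONDS) -> bool:
    """Check whether all jobs on one instance can be checkpointed in time.

    jobs: (memory_gb, container_fs_gb) per job on the instance being reclaimed.
    throughput_mb_s: assumed sustained write rate of the checkpoint volume.
    """
    total_gb = sum(memory_gb + fs_gb for memory_gb, fs_gb in jobs)
    return total_gb * 1000 / throughput_mb_s <= window_s


print(required_throughput_mb_s(30))                 # rate implied by 30 GB in 2 minutes
print(fits_in_window([(20, 10)], 250))              # one job, 30 GB in total
print(fits_in_window([(20, 10), (8, 2)], 250))      # two jobs, 40 GB in total
```

In this sketch, moving scratch files to a shared filesystem lowers `container_fs_gb` for each job, which is why the recommendation above makes the checkpoint more likely to finish in time.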