
MMBatch Management Server

The MMBatch Management Server functions as both an API server and a central point for metrics collection and global checkpoint configuration. Metrics from worker nodes are displayed in an interactive single-page application (SPA).

In this section, we break down the MMBatch Management Server browser GUI into its primary tabs: Dashboard and Jobs.

Reporting

The MMBatch Management Server reports on the spot reclaim protections it has provided, as well as the estimated time and cost savings.

Example: the Dashboard tab

The metrics are defined as follows:

  • Total Jobs - Total number of AWS Batch jobs submitted with MMBatch installed and enabled

  • Spot Protections - Total number of spot reclaim protections provided by MMBatch

  • Total Job CPU Hours - Total number of CPU hours requested by all AWS Batch jobs submitted with MMBatch installed and enabled. This is based on the CPUs requested, not on actual CPU usage.

The CPU hours of an individual job are computed as follows:

CPU time (CPU-hours) = Job requested CPUs × Job runtime

The runtime of an individual job is computed as follows:

Job runtime = Job complete − Job start

  • Total EC2 Instance Cost - Estimated total cost for all jobs.

  • EC2 Spot Savings - Instance Cost Saved: Estimated cost savings when restoring a job from a preempted spot instance, assuming each spot instance runs only one job.

  • EC2 Spot Savings - Job CPU Hours Saved: Estimated Job CPU Hours savings when restoring a job from a preempted spot instance, assuming each spot instance runs only one job.

  • EC2 On-demand Savings - Instance Cost Saved: Estimated savings when replacing on-demand instances with spot instances.
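As an illustration of the Total Job CPU Hours definition above, the following minimal Python sketch applies the two formulas (CPU-hours = requested CPUs × runtime, and runtime = complete − start) to a couple of made-up jobs. The function names and the sample job data are hypothetical and are not part of MMBatch; the sketch only shows the arithmetic, not how the Management Server computes the figure internally.

```python
from datetime import datetime


def job_runtime_hours(start: datetime, complete: datetime) -> float:
    """Job runtime = Job complete - Job start, expressed in hours."""
    return (complete - start).total_seconds() / 3600.0


def job_cpu_hours(requested_cpus: float, start: datetime, complete: datetime) -> float:
    """CPU-hours = requested CPUs x runtime (requested, not actual, CPU usage)."""
    return requested_cpus * job_runtime_hours(start, complete)


# Hypothetical sample jobs: (requested CPUs, start time, completion time)
jobs = [
    (4, datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 12, 30)),  # 2.5 h runtime
    (2, datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 45)),    # 0.75 h runtime
]

# Total Job CPU Hours is the sum over all jobs.
total_cpu_hours = sum(job_cpu_hours(cpus, start, done) for cpus, start, done in jobs)
print(f"Total Job CPU Hours: {total_cpu_hours}")
```

Because the calculation uses requested CPUs, a job that requests 4 CPUs but keeps only one busy still contributes 4 CPU-hours per hour of runtime.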

Users can also view the AWS queues in the MMBatch Management Server, including the number of jobs submitted to each.

Configuration

The MMBatch Management Server also provides a GUI for easy setup.

Checkpoint

Users can enable or disable checkpointing for spot reclaim protection, as well as configure the interval between checkpoints. Check out our Configuration Guide.


Job EBS Volume

Users can enable or disable managed EBS features, as well as configure the EBS volume type, size, mount path, and custom tags.


Space Considerations for using Managed EBS

When restoring a job with Managed EBS, the new instance must first do a docker pull of the container image. Because the docker image is saved to the root volume, the root volume must be large enough to hold the image. For large containers, such as those used for GPU workloads, this additional docker pull increases the time required for checkpoint-restore by 5-10 minutes, as measured with a 45 GB image. Managed EBS does not use space in the root volume for application data.

Log

Users can configure the log level and log file sizes.


Server

Users can configure the port used to access the MMBatch Management Server, as well as the file paths of the certificate and private key.


Cognito

Users can sign in to the MMBatch Management Server with existing AWS Cognito credentials. Check out our Guide for how to enable, disable, and sign in with Cognito.
