Dashboard Overview
The Dashboard view provides a high-level overview of your Memory Machine Batch operations, offering quick insights into job progress, resource utilization, and cost savings.
Let's break down each section of the Dashboard View and describe what valuable information each provides.
Dashboard Header
At the top of the dashboard, you'll find:
- Memory Machine Batch Title: Identifies the product.
- Dashboard / Jobs Tabs: Allows you to navigate between the main Dashboard overview and the detailed Jobs list.
- Date Range Filter: Click the calendar icon to select a custom start and end date for the data displayed on the dashboard.
- Refresh Button: Click this to update the dashboard with the latest data.
- User Menu Icons: Provides access to user settings, notifications, and other account-related functions.
Cognito Icon 
Allows a user to enable or disable Amazon Cognito. If you disable Cognito, you will be logged out of the Management Server. Check out our Guide for how to enable, disable, and sign in to Cognito.
Configuration Icon 
Although small, this icon houses all configuration capabilities within the Management Server. Once clicked, it will provide you with five separate areas for configuration.
Note
The Cancel and Save buttons on the top right of the Configurations View are key to this part of the Management Server. Any changes made in any of the five configuration areas will need to be saved by clicking on the blue Save button.
The default view is of the Checkpoint configuration area.
Checkpoint
Users can enable or disable checkpointing for spot reclaim protection and configure the interval between checkpoints. Check out our Configuration Guide.
Checkpoint is divided into two sections, Basic and Advanced:
Basic Section:
Within the Basic Section you will find the following configuration options:
- Mode: Choose between None, Iterative, and Periodic.
- Interval: Choose the interval you wish to have by hours, minutes, and seconds.
- Image Path: Choose which image you want Management Server to point to.
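As a rough illustration of how the three Basic options fit together, the sketch below models them in Python. Only the mode names come from the list above; the class name, field names, validation rules, and placeholder path are assumptions for illustration and are not part of MMBatch's configuration schema.

```python
from dataclasses import dataclass
from datetime import timedelta

# Mode names come from the Mode option above; everything else is hypothetical.
VALID_MODES = {"None", "Iterative", "Periodic"}


def interval_from_fields(hours: int, minutes: int, seconds: int) -> timedelta:
    """Combine the three Interval fields (hours, minutes, seconds) into one duration."""
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


@dataclass(frozen=True)
class CheckpointBasic:
    """Hypothetical in-memory view of the Basic section (not an MMBatch schema)."""
    mode: str
    interval: timedelta
    image_path: str

    def __post_init__(self):
        # Reject mode names that the dropdown does not offer.
        if self.mode not in VALID_MODES:
            raise ValueError(f"unknown checkpoint mode: {self.mode!r}")
        # Assumption: a negative interval is never meaningful.
        if self.interval < timedelta(0):
            raise ValueError("interval must not be negative")


settings = CheckpointBasic(
    mode="Periodic",
    interval=interval_from_fields(0, 30, 0),
    image_path="/example/checkpoint/images",  # placeholder path, not a real default
)
print(settings.interval.total_seconds())
```

Treating the interval as a single duration keeps the hours, minutes, and seconds fields from being compared or stored separately.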
Advanced Section:
Within the Advanced Section you will find the following configuration options:
- Checkpoint Copy File Paths: Add checkpoint copy file paths here.
- Root File System Difference: Click this checkbox if you wish to see differences in root file systems.
- Close TCP Connection: Click this checkbox if you wish to close the TCP connection.
- Diagnosis Mode: Toggle this switch if you wish to be in Diagnosis mode for error detection and troubleshooting activities.
Job EBS Volume
Users can enable or disable managed EBS features, as well as configure the EBS volume type, size, mount path, and custom tags.
- Volume Type: MMBatch offers seven different options via a dropdown menu to best fit the type of volume you have:
  - General Purpose SSD (gp3, gp2)
  - Provisioned IOPS SSD (io2, io1)
  - Throughput Optimized HDD (st1)
  - Cold HDD (sc1)
  - Magnetic/Previous Generation (standard)
- Volume Size: Choose which size of volume you have within EBS.
- EBS Mount Path: Add the mount path you have for EBS.
- Retention Interval: Choose which interval you wish to retain data for.
- Successful Job TTL: Set the duration for which the EBS Volume (and the data on it) associated with a successfully completed batch job will persist before being automatically deleted or de-provisioned.
- Failed Job TTL: Set the duration for which the EBS Volume (and the data on it) associated with a failed batch job will persist before being automatically deleted or de-provisioned.
- Custom Tags: Set custom tags that match the tags you use in AWS Cost Explorer.
- EBS Per Job: Enable or disable by toggling the switch.
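The two TTL settings can be read as a simple rule: the applicable TTL depends on whether the job succeeded, and the volume becomes eligible for cleanup once that much time has passed since the job finished. The sketch below expresses that reading in Python. The function name, its parameters, and the choice to measure from the job's finish time are assumptions for illustration, not MMBatch's implementation.

```python
from datetime import datetime, timedelta


def volume_expired(
    job_succeeded: bool,
    finished_at: datetime,
    now: datetime,
    successful_ttl: timedelta,
    failed_ttl: timedelta,
) -> bool:
    """Return True once the TTL that applies to this job's outcome has elapsed.

    Hypothetical helper: assumes the TTL clock starts when the job finishes.
    """
    ttl = successful_ttl if job_succeeded else failed_ttl
    return now - finished_at >= ttl


finished = datetime(2025, 1, 1, 12, 0, 0)
check_time = finished + timedelta(hours=2)

# With a 1-hour TTL for successful jobs and a 24-hour TTL for failed jobs,
# a successful job's volume is past its TTL after 2 hours, a failed job's is not.
print(volume_expired(True, finished, check_time, timedelta(hours=1), timedelta(hours=24)))
print(volume_expired(False, finished, check_time, timedelta(hours=1), timedelta(hours=24)))
```

A longer Failed Job TTL than Successful Job TTL, as in this example, leaves more time to inspect the data from a failed run.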
Space Considerations for using Managed EBS
During a restore with Managed EBS, the new instance must first run docker pull to fetch the container image. Because the Docker image is saved to the root volume, the root volume must be large enough to hold the image. For large containers, such as those for GPU workloads, the additional docker pull increases the time required to perform checkpoint-restore by 5-10 minutes, as measured with a 45 GB image. Managed EBS will not use space in the root volume for application data.
Log
The Log area allows for configuration of logging associated with MMBatch jobs.
- Log File: This field is not editable and reports the name of the log file.
- Log Level: Log files record application events, categorized by levels that indicate severity or detail. This helps you understand system behavior and troubleshoot. Select the logging level you wish to report to your log file from a dropdown of five options:
  - Error: Critical issue preventing an operation or function. Indicates a serious failure requiring immediate attention.
  - Warning: An unexpected event or potential problem. The application continues, but this might lead to future issues.
  - Info: General, high-level messages about normal application progress and significant events.
  - Debug: Detailed internal information for developers or advanced troubleshooting. Can be very verbose.
  - Trace: Extremely fine-grained details for deep diagnostics. The most verbose level, often impacting performance.
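Log levels are conventionally ordered from least to most verbose, and a message is written only if its level is no more verbose than the configured one. The Python sketch below illustrates that convention with the five level names listed above. The numeric values and the is_recorded helper are assumptions for illustration, and MMBatch's filtering is assumed (not confirmed) to follow the same convention.

```python
from enum import IntEnum


class LogLevel(IntEnum):
    """The five levels from the dropdown, ordered from least to most verbose."""
    ERROR = 0
    WARNING = 1
    INFO = 2
    DEBUG = 3
    TRACE = 4


def is_recorded(message_level: LogLevel, configured_level: LogLevel) -> bool:
    """A message is kept when it is no more verbose than the configured level."""
    return message_level <= configured_level


# With the level set to Info, errors are kept and debug messages are dropped.
print(is_recorded(LogLevel.ERROR, LogLevel.INFO))
print(is_recorded(LogLevel.DEBUG, LogLevel.INFO))
```

Under this convention, selecting Trace records every message, while selecting Error records only the most severe ones.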
Server
The Server configuration area allows users to establish access addresses and enter certificate and private key paths.
- Server ID: This non-editable field displays your Server UUID.
- Access Address: Set a network address, HTTP type, and port if you choose not to use the default settings.
- Certificate File Path: The location on your computer's or server's file system where a digital certificate file is stored. Software uses this path to find and access the certificate, which is essential for secure communication, authentication, or encryption processes.
- Private Key File Path: The location on your computer's or server's file system where a private key file is stored. Software uses this path to find and access the private key, which is essential for secure communication, authentication, or encryption processes.
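To make the role of these fields concrete, the following Python sketch shows how generic server software might combine an address, HTTP type, and port into a URL, and how it might load a certificate and private key from the two file paths using the standard ssl module. The function names and example values are hypothetical, and nothing here describes how the Management Server itself is implemented.

```python
import ssl
from pathlib import Path


def access_url(scheme: str, host: str, port: int) -> str:
    """Build a URL from an HTTP type, a network address, and a port.

    Hypothetical helper; does not handle IPv6 literals.
    """
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    return f"{scheme}://{host}:{port}"


def tls_context(cert_path: str, key_path: str) -> ssl.SSLContext:
    """Load a certificate and private key from the given paths into a server-side TLS context."""
    for path in (cert_path, key_path):
        if not Path(path).is_file():
            raise FileNotFoundError(path)
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return context


# Example values are placeholders, not MMBatch defaults.
print(access_url("https", "10.0.0.5", 8080))
```

Checking that both files exist before loading them gives a clearer error than a failed TLS handshake later on.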
Cognito
The Cognito configuration area allows users to use existing AWS Cognito credentials to sign in to MMBatch's Management Server. Check out our Guide for how to enable, disable, and sign in to Cognito.
- ON Toggle: This toggle shows that Cognito is currently On.
- User Pool ID: The ID that uniquely identifies a specific user pool (user directory) within your AWS account and a particular AWS region.
- Client ID: The ID of the app client, the entity that is authorized to call the Amazon Cognito User Pool API operations.
- Identity Pool ID: The ID of the identity pool, the component of Amazon Cognito that allows your users to obtain temporary AWS credentials to access AWS services directly.
- Admin Groups: Any administration group associated with this AWS Cognito configuration.
Note
The Cognito area is greyed out and does not allow any changes within it when Cognito is enabled.
Light/Dark Mode 
The Light/Dark Mode icon lets you choose whether to view the Management Server in Light mode or Dark mode. The icon changes to reflect the mode you are currently viewing.
User Icon 
Clicking the User icon opens a small pop-up asking whether you wish to log out.
Key Performance Indicators (KPIs) - Summary Cards
The top section features several cards displaying critical summary metrics:
- Total Jobs: The total number of AWS Batch jobs submitted with MMBatch installed and enabled that have been processed within the selected date range.
- Total Job CPU Hours: The cumulative CPU hours consumed by all AWS Batch jobs submitted with MMBatch installed and enabled. It is based on the amount of CPU time requested, not the actual CPU usage, and is computed for each job from that job's runtime.
- Total EC2 Instance Cost: An estimated total cost of the EC2 instances used for your batch jobs.
- Spot Protections: The number of jobs that utilized Spot Protection, preventing interruptions.
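Because Total Job CPU Hours is based on requested CPU rather than measured usage, one way to reason about it is requested vCPUs multiplied by runtime in hours, summed over all jobs. The Python sketch below works through that arithmetic. The formula, function name, and sample jobs are assumptions for illustration; the dashboard's exact computation may differ.

```python
def job_cpu_hours(requested_vcpus: float, runtime_seconds: float) -> float:
    """Assumed formula: requested vCPUs multiplied by runtime expressed in hours."""
    return requested_vcpus * runtime_seconds / 3600.0


# Hypothetical jobs as (requested vCPUs, runtime in seconds).
jobs = [
    (4, 7200),  # 4 vCPUs for 2 hours   -> 8.0 CPU hours
    (2, 1800),  # 2 vCPUs for 0.5 hours -> 1.0 CPU hour
]

total = sum(job_cpu_hours(vcpus, seconds) for vcpus, seconds in jobs)
print(total)  # 9.0
```

Under this reading, a job that requests 4 vCPUs but keeps only one busy still counts as 4 CPU hours per hour of runtime.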
Cost Savings Overview
This section highlights the cost efficiencies achieved:
- EC2 Spot Savings: Estimated cost savings when restoring a job from a preempted spot instance, assuming each spot instance runs only one job.
- EC2 On-demand Savings: Estimated savings when replacing on-demand instances with spot instances.
- Job CPU Hours Saved: Estimated Job CPU Hours savings when restoring a job from a preempted spot instance, assuming each spot instance runs only one job.
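As a back-of-the-envelope illustration of the on-demand comparison, savings from running on spot instead of on-demand can be approximated as the hourly price difference multiplied by the instance hours used. The Python sketch below shows that arithmetic. The formula, function name, and prices are assumptions for illustration and are not the dashboard's estimation method.

```python
def on_demand_savings(
    on_demand_hourly: float, spot_hourly: float, instance_hours: float
) -> float:
    """Assumed estimate: hourly price difference multiplied by instance hours."""
    return (on_demand_hourly - spot_hourly) * instance_hours


# Hypothetical prices in USD per hour; real prices vary by instance type and region.
print(on_demand_savings(on_demand_hourly=1.00, spot_hourly=0.25, instance_hours=8))
```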
Checkpoint and Restore Activity Summaries
These two prominent panels provide an immediate glance at the health and success rate of your Checkpoint and Restore operations:
- Attempts: The total number of times the operation (Checkpoint or Restore) was attempted.
- Succeeded: The number of successful operations, with a green bar indicating progress.
- Failed: The number of failed operations, highlighted with an orange bar for easy identification. Clicking the chain icon takes you to the Jobs view, filtered to show only failed jobs.
- Success Rate: The percentage of successful attempts out of the total attempts.
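The Success Rate value follows directly from the Attempts and Succeeded counts. A minimal Python sketch of that calculation is shown below. The function name and the choice to report 0.0 when there are no attempts are assumptions for illustration; the dashboard may display that case differently.

```python
def success_rate(succeeded: int, attempts: int) -> float:
    """Percentage of successful attempts out of total attempts."""
    if attempts == 0:
        # Assumption: report 0.0 rather than dividing by zero when nothing was attempted.
        return 0.0
    return 100.0 * succeeded / attempts


print(success_rate(45, 50))  # 90.0
```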
Queue Details Table
Below the summaries, a detailed table provides granular data for each of your queues:
- QUEUE Name: The identifier for each individual queue.
- JOB Submitted: Number of Jobs submitted for the given queue.
- CHECKPOINT Columns: For each queue, this section shows:
  - Attempts: Checkpoint attempts for jobs in that queue.
  - Success Rate: The success rate of checkpoint operations within that queue.
  - Succeeded: The count of successful checkpoints.
  - Failed: The count of failed checkpoints. Clicking the chain icon takes you to the Jobs view, filtered to show only failed jobs.
- RESTORE Columns: (Identical to Checkpoint columns) For each queue, this section shows:
  - Attempts: Restore attempts for jobs in that queue.
  - Success Rate: The success rate of restore operations within that queue.
  - Succeeded: The count of successful restores.
  - Failed: The count of failed restores. Clicking the chain icon takes you to the Jobs view, filtered to show only failed jobs.
- Pagination: At the bottom right of the Queues table, pagination controls (e.g., "1 / 10 page") allow you to navigate through multiple pages of queue data if your list exceeds a single view.