Memory Machine Batch (MMBatch) - AWS CloudFormation Deployment

Overview

Memory Machine Batch (MMBatch) is a high-performance batch processing solution that leverages AWS Batch for scalable compute resources with advanced checkpointing capabilities. The solution supports two deployment architectures based on different storage backends:

  • EFS-Based Deployment: Uses Amazon EFS for checkpoint storage
  • JuiceFS-Based Deployment: Uses JuiceFS backed by Amazon S3 and Redis (MemoryDB) for enhanced performance and scalability

Architecture

EFS-Based Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Management    │    │   AWS Batch     │    │   Amazon EFS    │
│   Server        │    │   Compute       │    │   Checkpoint    │
│   (EC2)         │◄──►│   Environment   │◄──►│   Storage       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   IAM Roles     │    │   Launch        │    │   Security      │
│   & Policies    │    │   Template      │    │   Groups        │
└─────────────────┘    └─────────────────┘    └─────────────────┘

JuiceFS-Based Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Management    │    │   AWS Batch     │    │   JuiceFS       │
│   Server        │    │   Compute       │    │   (S3 + Redis)  │
│   (EC2)         │◄──►│   Environment   │◄──►│   Checkpoint    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   IAM Roles     │    │   Launch        │    │   MemoryDB      │
│   & Policies    │    │   Template      │    │   (Redis)       │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   S3 Buckets    │    │   Security      │    │   Multi-AZ      │
│   (Scratch &    │    │   Groups        │    │   Deployment    │
│   Checkpoint)   │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘

Key Features

EFS-Based Solution

  • Simple Setup: Minimal external dependencies
  • Native AWS Integration: Leverages Amazon EFS for checkpoint storage
  • Cost Effective: Pay-per-use storage with no upfront costs
  • Managed Service: AWS handles EFS maintenance and scaling
  • Encrypted Storage: Built-in encryption for data at rest

JuiceFS-Based Solution

  • Advanced Caching: Redis-based metadata caching for improved performance
  • Scalable Storage: S3-backed storage with virtually unlimited capacity
  • Multi-AZ Resilience: Redis clusters across multiple availability zones

Prerequisites

AWS Account Requirements

  • AWS account with appropriate IAM permissions
  • CloudFormation service enabled
  • S3 access for template hosting and artifact storage

Network Infrastructure

  • VPC: Existing VPC in your target region
  • Subnets:
    • EFS: Single subnet for compute resources
    • JuiceFS: Multiple subnets across at least 2 AZs for Redis high availability
  • Security Groups: Existing security group allowing ports 22, 443, 8080-8086
  • Internet Gateway: For public access (if not using private-only deployment)
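
As a minimal sketch of how the port requirement above could be checked before deployment, the following Python helper compares a security group's ingress rules with the required ports (22, 443, and 8080-8086). It assumes rules shaped like the `IpPermissions` entries in an EC2 DescribeSecurityGroups response (`IpProtocol`, `FromPort`, `ToPort`). The name `missing_ports` and the example rules are illustrative and not part of MMBatch.

```python
# Ports the prerequisites list for the existing security group.
REQUIRED_PORTS = [22, 443] + list(range(8080, 8087))  # 8080-8086 inclusive


def missing_ports(ip_permissions, required=REQUIRED_PORTS):
    """Return the required TCP ports not covered by any ingress rule.

    `ip_permissions` is assumed to follow the shape of the `IpPermissions`
    list in an EC2 DescribeSecurityGroups response.
    """
    missing = []
    for port in required:
        covered = False
        for rule in ip_permissions:
            protocol = rule.get("IpProtocol")
            if protocol == "-1":  # "-1" means all protocols and all ports
                covered = True
            elif protocol == "tcp":
                if rule.get("FromPort", 0) <= port <= rule.get("ToPort", -1):
                    covered = True
            if covered:
                break
        if not covered:
            missing.append(port)
    return missing


# Example rules: SSH and HTTPS are open, but only part of the 8080-8086 range.
example_rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443},
    {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8083},
]
print(missing_ports(example_rules))  # prints [8084, 8085, 8086]
```

The check looks only at port coverage. It does not inspect the source CIDR ranges of each rule, so those still need to be reviewed separately.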

EC2 Resources

  • Key Pair: Existing EC2 key pair for SSH access
  • AMI IDs: Region-specific AMI IDs for compute and management instances
    • For Management Server Node: Choose the Ubuntu Image (Use for Standalone AMI)
    • For Compute Server Node: Choose the ECS Optimized AL2023 Image

JuiceFS-Specific Requirements

  • MemoryDB Support: Ensure MemoryDB is available in your target region
  • Multi-AZ Subnets: At least 2 subnets in different AZs for Redis clusters
  • S3 Permissions: IAM roles with S3 read/write access for bucket operations

Deployment Options

For more information on instance types, see our Prerequisites and Considerations page.

1. EFS-Based Deployment

Key Parameters

Parameter          Description                      Default
UniquePrefix       Resource naming prefix           Required
VpcId              VPC for deployment               Required
SubnetId           Subnet for compute resources     Required
SecurityGroupId    Security group for resources     Required
KeyName            EC2 key pair name                Required
InstanceTypes      Comma-separated instance types   m6i.large,m6i.xlarge,m6i.2xlarge,m6i.4xlarge,m6i.8xlarge,m6i.12xlarge,m6i.16xlarge
MinvCpus           Minimum vCPUs                    0
MaxvCpus           Maximum vCPUs                    256
MMABVersion        MMAB release version             1.4.0-release
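
As a rough illustration of how these parameters might be supplied programmatically, the sketch below turns a plain dictionary into the `ParameterKey`/`ParameterValue` list that the CloudFormation API accepts, and rejects input that omits a parameter marked Required. The helper name `build_parameters` and all resource IDs are placeholders chosen for this example.

```python
# Parameters marked "Required" in the EFS-based parameter list (no default).
EFS_REQUIRED = ["UniquePrefix", "VpcId", "SubnetId", "SecurityGroupId", "KeyName"]


def build_parameters(values, required=EFS_REQUIRED):
    """Convert a dict into the Parameters list CloudFormation expects.

    Raises ValueError if a required parameter is missing or empty.
    Parameters that are omitted fall back to the template defaults.
    """
    absent = [name for name in required if not values.get(name)]
    if absent:
        raise ValueError("missing required parameters: " + ", ".join(absent))
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


# Placeholder IDs for illustration only; substitute your own resources.
params = build_parameters({
    "UniquePrefix": "mmbatch-demo",
    "VpcId": "vpc-0123456789abcdef0",
    "SubnetId": "subnet-0123456789abcdef0",
    "SecurityGroupId": "sg-0123456789abcdef0",
    "KeyName": "my-key-pair",
    "MaxvCpus": 128,  # override the default of 256
})
print(params[-1])  # prints {'ParameterKey': 'MaxvCpus', 'ParameterValue': '128'}
```

The resulting list can be passed as the `Parameters` argument of boto3's `create_stack`, together with the template location, which this page does not specify. Because the stack creates IAM roles, the call will likely also need an IAM capability acknowledgment such as `CAPABILITY_NAMED_IAM`.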

Deployed Resources

  • EC2 Management Instance: Runs MMAB service for job management
  • EFS File System: Encrypted checkpoint storage with access point
  • IAM Roles: Instance, Batch service, and Batch instance roles
  • Security Groups: For the EFS mount target and the compute resources
  • AWS Batch: Compute environment(s) and job queue(s)
  • Launch Template: EC2 instance configuration for batch jobs

2. JuiceFS-Based Deployment

Key Parameters

Parameter          Description                              Default
UniquePrefix       Resource naming prefix                   Required
VPCID              VPC for deployment                       Required
SubnetId           Subnet for management instance           Required
SubnetIds          Comma-separated subnets for Redis/Batch  Required
SecurityGroupId    Security group for resources             Required
KeyName            EC2 key pair name                        Required
RedisNodeType      MemoryDB node type                       db.t4g.small
InstanceTypes      Comma-separated instance types           m6i.large,m6i.xlarge,m6i.2xlarge,m6i.4xlarge,m6i.8xlarge,m6i.12xlarge,m6i.16xlarge
MinvCpus           Minimum vCPUs                            0
MaxvCpus           Maximum vCPUs                            256
MMABVersion        MMAB release version                     1.4.0-release
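
Since `SubnetIds` must cover at least 2 Availability Zones for the Redis clusters, it can help to validate the value before launching the stack. The following is a minimal sketch of such a check. It assumes you already have a mapping from subnet ID to Availability Zone (for example, built from an EC2 DescribeSubnets response). The function name `check_multi_az` and the subnet IDs are illustrative only.

```python
def check_multi_az(subnet_ids, subnet_to_az):
    """Validate a comma-separated SubnetIds value against the 2-AZ requirement.

    `subnet_to_az` maps subnet ID -> Availability Zone name.
    Returns the sorted list of distinct AZs, or raises ValueError.
    """
    ids = [s.strip() for s in subnet_ids.split(",") if s.strip()]
    unknown = [s for s in ids if s not in subnet_to_az]
    if unknown:
        raise ValueError("unknown subnets: " + ", ".join(unknown))
    zones = sorted({subnet_to_az[s] for s in ids})
    if len(zones) < 2:
        raise ValueError(
            "SubnetIds must span at least 2 AZs, found: " + ", ".join(zones)
        )
    return zones


# Placeholder subnet IDs and zones for illustration.
az_lookup = {
    "subnet-aaa": "us-east-1a",
    "subnet-bbb": "us-east-1b",
    "subnet-ccc": "us-east-1a",
}
print(check_multi_az("subnet-aaa, subnet-bbb", az_lookup))
# prints ['us-east-1a', 'us-east-1b']
```

Passing `"subnet-aaa,subnet-ccc"` would raise a `ValueError`, because both subnets sit in the same zone.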

Deployed Resources

  • EC2 Management Instance: Runs MMAB service with enhanced IAM permissions
  • S3 Buckets: Scratch and checkpoint storage buckets
  • Redis Clusters: MemoryDB clusters for JuiceFS metadata (scratch and checkpoint)
  • IAM Roles: Enhanced roles with S3 and Redis access permissions
  • Launch Template: Configured with JuiceFS mounting and ECS agent tuning
  • AWS Batch: Compute environment(s) and job queue(s) with multi-queue support
  • Temporary Setup Instance: Self-terminating EC2 instance for JuiceFS formatting

Multi-Queue Configuration

Both deployment options support advanced multi-queue configurations for workload optimization:

Queue Structure

  • Jq1: High-priority, compute-optimized instances (c5 family)
  • Jq2: General-purpose workloads (m5 family)
  • Jq3: Memory-optimized workloads (m5 family with larger instances)

Enable Multi-Queue

Set EnableMultiQueue=true and configure instance types for each queue.
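
The sketch below shows one way the multi-queue setting could be expressed as CloudFormation parameters. Only `EnableMultiQueue` comes from this page. The per-queue parameter names used in the example (`Jq1InstanceTypes` and so on) are hypothetical stand-ins, so check the template for the actual names before using anything like this.

```python
def multi_queue_parameters(queue_instance_types):
    """Build CloudFormation parameters that enable multi-queue mode.

    `queue_instance_types` maps a template parameter name to a list of
    instance types. The parameter names are supplied by the caller because
    this page does not list them; check the template for the real names.
    """
    params = [{"ParameterKey": "EnableMultiQueue", "ParameterValue": "true"}]
    for name, types in queue_instance_types.items():
        params.append({"ParameterKey": name, "ParameterValue": ",".join(types)})
    return params


# HYPOTHETICAL parameter names (Jq1InstanceTypes, ...): not confirmed by this page.
example = multi_queue_parameters({
    "Jq1InstanceTypes": ["c5.xlarge", "c5.2xlarge"],    # compute-optimized
    "Jq2InstanceTypes": ["m5.large", "m5.xlarge"],      # general purpose
    "Jq3InstanceTypes": ["m5.8xlarge", "m5.16xlarge"],  # larger m5 sizes
})
for p in example:
    print(p["ParameterKey"], "=", p["ParameterValue"])
```

The instance families mirror the queue structure described above; the specific sizes are arbitrary examples.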

Security Considerations

Network Security

  • Security Groups: Restrict access to necessary ports only
  • VPC CIDR: Configure appropriate CIDR blocks for ingress rules
  • Private IP Option: Deploy with private IPs only for enhanced security

IAM Security

  • Least Privilege: IAM roles with minimal required permissions
  • Instance Profiles: Secure credential management for EC2 instances
  • Service Roles: Dedicated roles for AWS Batch operations

Data Security

  • Encryption: EFS and S3 data encrypted at rest
  • Access Control: IAM-based access control for all resources

Monitoring and Management

Management Server

  • Web Interface: Accessible on port 8080
  • API Endpoints: RESTful API for job management
  • Configuration: KV store for runtime configuration

Spot Instances

  • Default Configuration: Uses Spot instances for cost savings
  • Instance Diversification: Multiple instance types for availability

Storage Optimization

  • EFS: Pay-per-use with lifecycle policies
  • S3: Intelligent tiering and lifecycle management
  • JuiceFS: Efficient caching reduces S3 access costs

Troubleshooting

Common Issues

  1. AMI Compatibility: Ensure AMI IDs are valid for your region
  2. Subnet Configuration: Verify subnets are in the correct VPC and AZs
  3. Security Group Rules: Check that required ports are open
  4. IAM Permissions: Verify CloudFormation has necessary permissions
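
When a deployment fails for one of these reasons, the CloudFormation stack events usually name the resource and the cause. As a minimal sketch, the helper below filters a list of events down to the failures. It assumes the events have the shape of the `StackEvents` list in a CloudFormation DescribeStackEvents response. The function name, the logical resource IDs, and the reason text in the sample are invented for illustration.

```python
FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def failed_events(stack_events):
    """Return (logical ID, status, reason) for each failed resource event.

    `stack_events` is assumed to be the `StackEvents` list from a
    CloudFormation DescribeStackEvents response.
    """
    return [
        (
            event["LogicalResourceId"],
            event["ResourceStatus"],
            event.get("ResourceStatusReason", "no reason given"),
        )
        for event in stack_events
        if event.get("ResourceStatus") in FAILED_STATUSES
    ]


# Invented sample events for illustration; real responses carry more fields.
sample = [
    {"LogicalResourceId": "ManagementInstance", "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "The image id does not exist"},
    {"LogicalResourceId": "LaunchTemplate", "ResourceStatus": "CREATE_COMPLETE"},
]
for logical_id, status, reason in failed_events(sample):
    print(f"{logical_id}: {status} ({reason})")
```

With boto3, the input could come from `describe_stack_events(StackName=...)["StackEvents"]`; the earliest failure in the list is often the root cause, with later failures being rollback side effects.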

Log Locations

  • Management Server: /var/log/mmab/
  • Batch Jobs: CloudWatch Logs
  • System Logs: /var/log/ on EC2 instances