Memory Machine Batch (MMBatch) - AWS CloudFormation Deployment
Overview
Memory Machine Batch (MMBatch) is a high-performance batch processing solution that leverages AWS Batch for scalable compute resources with advanced checkpointing capabilities. The solution supports two deployment architectures based on different storage backends:
- EFS-Based Deployment: Uses Amazon EFS for checkpoint storage
- JuiceFS-Based Deployment: Uses JuiceFS backed by Amazon S3 and Redis (MemoryDB) for enhanced performance and scalability
Architecture
EFS-Based Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Management    │    │    AWS Batch    │    │   Amazon EFS    │
│     Server      │    │     Compute     │    │   Checkpoint    │
│      (EC2)      │◄──►│   Environment   │◄──►│     Storage     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    IAM Roles    │    │     Launch      │    │    Security     │
│   & Policies    │    │    Template     │    │     Groups      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
JuiceFS-Based Architecture
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Management    │    │    AWS Batch    │    │     JuiceFS     │
│     Server      │    │     Compute     │    │  (S3 + Redis)   │
│      (EC2)      │◄──►│   Environment   │◄──►│   Checkpoint    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    IAM Roles    │    │     Launch      │    │    MemoryDB     │
│   & Policies    │    │    Template     │    │     (Redis)     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         │                      │                      │
         ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   S3 Buckets    │    │    Security     │    │    Multi-AZ     │
│   (Scratch &    │    │     Groups      │    │   Deployment    │
│   Checkpoint)   │    │                 │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
Key Features
EFS-Based Solution
- Simple Setup: Minimal external dependencies
- Native AWS Integration: Leverages Amazon EFS for checkpoint storage
- Cost Effective: Pay-per-use storage with no upfront costs
- Managed Service: AWS handles EFS maintenance and scaling
- Encrypted Storage: Built-in encryption for data at rest
JuiceFS-Based Solution
- Advanced Caching: Redis-based metadata caching for improved performance
- Scalable Storage: S3-backed storage with unlimited capacity
- Multi-AZ Resilience: Redis clusters across multiple availability zones
Prerequisites
AWS Account Requirements
- AWS account with appropriate IAM permissions
- CloudFormation service enabled
- S3 access for template hosting and artifact storage
Network Infrastructure
- VPC: Existing VPC in your target region
- Subnets:
  - EFS: Single subnet for compute resources
  - JuiceFS: Multiple subnets across at least 2 AZs for Redis high availability
- Security Groups: Existing security group allowing ports 22, 443, 8080-8086
- Internet Gateway: For public access (if not using private-only deployment)
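The port requirements above can be written down as ingress rules. The following is a minimal sketch that builds them in the `IpPermissions` shape accepted by the EC2 `authorize_security_group_ingress` API. The function name and the CIDR value are placeholders for illustration, not part of the MMBatch templates.

```python
def build_ingress_rules(cidr):
    """Return IpPermissions entries for TCP ports 22, 443, and 8080-8086."""
    port_ranges = [(22, 22), (443, 443), (8080, 8086)]
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": low,
            "ToPort": high,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for low, high in port_ranges
    ]


# Placeholder CIDR: replace with the range that should reach the instances.
rules = build_ingress_rules("10.0.0.0/16")
for rule in rules:
    print(rule["FromPort"], rule["ToPort"])
```

With boto3, a list like `rules` could be passed as `IpPermissions` together with the `GroupId` of the existing security group. Prefer a narrow CIDR over `0.0.0.0/0`, especially for port 22.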
EC2 Resources
- Key Pair: Existing EC2 key pair for SSH access
- AMI IDs: Region-specific AMI IDs for compute and management instances
  - For Management Server Node: Choose the Ubuntu Image (Use for Standalone AMI)
  - For Compute Server Node: Choose the ECS Optimized AL2023 Image
JuiceFS-Specific Requirements
- MemoryDB Support: Ensure MemoryDB is available in your target region
- Multi-AZ Subnets: At least 2 subnets in different AZs for Redis clusters
- S3 Permissions: IAM roles with S3 read/write access for bucket operations
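Before deploying the JuiceFS variant, it can help to confirm that the chosen subnets really cover at least two Availability Zones. The sketch below assumes subnet records shaped like the entries under `Subnets` in the EC2 `describe_subnets` response. The helper name and the sample IDs are hypothetical.

```python
def spans_multiple_azs(subnets, minimum=2):
    """Return True if the subnets cover at least `minimum` distinct AZs.

    Each item is a dict with an 'AvailabilityZone' key, as in the entries
    returned under 'Subnets' by EC2 describe_subnets.
    """
    zones = {subnet["AvailabilityZone"] for subnet in subnets}
    return len(zones) >= minimum


# Sample data with placeholder IDs; real records come from describe_subnets.
candidate_subnets = [
    {"SubnetId": "subnet-aaaa1111", "AvailabilityZone": "us-east-1a"},
    {"SubnetId": "subnet-bbbb2222", "AvailabilityZone": "us-east-1b"},
]
print(spans_multiple_azs(candidate_subnets))
```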
Deployment Options
For more information on instance types, see our Prerequisites and Considerations page.
1. EFS-Based Deployment
Key Parameters
Parameter | Description | Default |
---|---|---|
UniquePrefix | Resource naming prefix | Required |
VpcId | VPC for deployment | Required |
SubnetId | Subnet for compute resources | Required |
SecurityGroupId | Security group for resources | Required |
KeyName | EC2 key pair name | Required |
InstanceTypes | Comma-separated instance types | m6i.large,m6i.xlarge,m6i.2xlarge,m6i.4xlarge,m6i.8xlarge,m6i.12xlarge,m6i.16xlarge |
MinvCpus | Minimum vCPUs | 0 |
MaxvCpus | Maximum vCPUs | 256 |
MMABVersion | MMAB release version | 1.4.0-release |
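As an illustration of how the parameters above might be supplied programmatically, the sketch below checks that the required EFS parameters are present and converts a plain dictionary into the `ParameterKey`/`ParameterValue` list that CloudFormation expects. The helper name and all resource IDs are placeholders; only the parameter names come from the table.

```python
EFS_REQUIRED = ("UniquePrefix", "VpcId", "SubnetId", "SecurityGroupId", "KeyName")


def to_cfn_parameters(values, required=EFS_REQUIRED):
    """Validate required keys and return a CloudFormation Parameters list."""
    missing = [name for name in required if not values.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # CloudFormation parameter values are always passed as strings.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


# Placeholder IDs; parameters left out fall back to the template defaults.
example = {
    "UniquePrefix": "demo",
    "VpcId": "vpc-0123456789abcdef0",
    "SubnetId": "subnet-0123456789abcdef0",
    "SecurityGroupId": "sg-0123456789abcdef0",
    "KeyName": "my-key-pair",
    "MaxvCpus": 256,
}
params = to_cfn_parameters(example)
print(len(params), "parameters prepared")
```

The resulting list could be passed as `Parameters` to the boto3 CloudFormation `create_stack` call. Because the stack creates IAM roles, that call will most likely also need an IAM capability acknowledgement in `Capabilities`.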
Deployed Resources
- EC2 Management Instance: Runs MMAB service for job management
- EFS File System: Encrypted checkpoint storage with access point
- IAM Roles: Instance, Batch service, and Batch instance roles
- Security Groups: EFS mount target and compute resource security
- AWS Batch: Compute environment(s) and job queue(s)
- Launch Template: EC2 instance configuration for batch jobs
2. JuiceFS-Based Deployment
Key Parameters
Parameter | Description | Default |
---|---|---|
UniquePrefix | Resource naming prefix | Required |
VPCID | VPC for deployment | Required |
SubnetId | Subnet for management instance | Required |
SubnetIds | Comma-separated subnets for Redis/Batch | Required |
SecurityGroupId | Security group for resources | Required |
KeyName | EC2 key pair name | Required |
RedisNodeType | MemoryDB node type | db.t4g.small |
InstanceTypes | Comma-separated instance types | m6i.large,m6i.xlarge,m6i.2xlarge,m6i.4xlarge,m6i.8xlarge,m6i.12xlarge,m6i.16xlarge |
MinvCpus | Minimum vCPUs | 0 |
MaxvCpus | Maximum vCPUs | 256 |
MMABVersion | MMAB release version | 1.4.0-release |
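The JuiceFS parameters differ from the EFS ones mainly in the `VPCID` key and the comma-separated `SubnetIds` value. The sketch below assembles the required parameters from Python values and rejects a subnet list with fewer than two distinct entries. The function name and the IDs are placeholders, and the check only counts subnets; it does not verify that they sit in different AZs.

```python
def juicefs_parameters(prefix, vpc_id, mgmt_subnet, batch_subnets,
                       security_group, key_name):
    """Return the required JuiceFS stack parameters as a CloudFormation list."""
    if len(set(batch_subnets)) < 2:
        raise ValueError("SubnetIds needs at least two distinct subnets")
    values = {
        "UniquePrefix": prefix,
        "VPCID": vpc_id,
        "SubnetId": mgmt_subnet,
        # List-style parameters are passed as one comma-separated string.
        "SubnetIds": ",".join(batch_subnets),
        "SecurityGroupId": security_group,
        "KeyName": key_name,
    }
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


# Placeholder IDs for illustration only.
jfs_params = juicefs_parameters(
    "demo",
    "vpc-0123456789abcdef0",
    "subnet-aaaa1111",
    ["subnet-aaaa1111", "subnet-bbbb2222"],
    "sg-0123456789abcdef0",
    "my-key-pair",
)
print(jfs_params[3]["ParameterValue"])
```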
Deployed Resources
- EC2 Management Instance: Runs MMAB service with enhanced IAM permissions
- S3 Buckets: Scratch and checkpoint storage buckets
- Redis Clusters: MemoryDB clusters for JuiceFS metadata (scratch and checkpoint)
- IAM Roles: Enhanced roles with S3 and Redis access permissions
- Launch Template: Configured with JuiceFS mounting and ECS agent tuning
- AWS Batch: Compute environment(s) and job queue(s) with multi-queue support
- Temporary Setup Instance: Self-terminating EC2 instance for JuiceFS formatting
Multi-Queue Configuration
Both deployment options support advanced multi-queue configurations for workload optimization:
Queue Structure
- Jq1: High-priority, compute-optimized instances (c5 family)
- Jq2: General-purpose workloads (m5 family)
- Jq3: Memory-optimized workloads (m5 family with larger instances)
Enable Multi-Queue
Set EnableMultiQueue=true and configure instance types for each queue.
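One way to use the three queues is to route each job by its workload profile. The sketch below is purely illustrative: the profile labels are invented for this example, and the queue names are written as `Jq1` to `Jq3` as listed above, although the names actually created by the stack may differ (for instance, by carrying the `UniquePrefix`).

```python
# Hypothetical profile labels mapped to the queues described above.
QUEUE_BY_PROFILE = {
    "compute": "Jq1",  # high-priority, compute-optimized
    "general": "Jq2",  # general-purpose
    "memory": "Jq3",   # memory-optimized
}


def pick_queue(profile):
    """Return the queue name for a workload profile, or raise ValueError."""
    try:
        return QUEUE_BY_PROFILE[profile]
    except KeyError:
        raise ValueError(f"unknown workload profile: {profile!r}") from None


print(pick_queue("memory"))
```

The returned name could then be supplied as `jobQueue` when submitting a job through the AWS Batch `submit_job` API.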
Security Considerations
Network Security
- Security Groups: Restrict access to necessary ports only
- VPC CIDR: Configure appropriate CIDR blocks for ingress rules
- Private IP Option: Deploy with private IPs only for enhanced security
IAM Security
- Least Privilege: IAM roles with minimal required permissions
- Instance Profiles: Secure credential management for EC2 instances
- Service Roles: Dedicated roles for AWS Batch operations
Data Security
- Encryption: EFS and S3 data encrypted at rest
- Access Control: IAM-based access control for all resources
Monitoring and Management
Management Server
- Web Interface: Accessible on port 8080
- API Endpoints: RESTful API for job management
- Configuration: KV store for runtime configuration
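A quick way to confirm that the management server's web interface is reachable is a plain TCP connection test on port 8080. The sketch below uses only the Python standard library. The helper name is hypothetical, and it checks connectivity only, not whether the MMAB service behind the port is healthy.

```python
import socket


def port_is_open(host, port=8080, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example (placeholder address): port_is_open("10.0.1.25")
```

A `False` result usually points at security group rules, a private-only deployment without a route from the caller, or a service that has not started yet.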
Spot Instances
- Default Configuration: Uses Spot instances for cost savings
- Instance Diversification: Multiple instance types for availability
Storage Optimization
- EFS: Pay-per-use with lifecycle policies
- S3: Intelligent tiering and lifecycle management
- JuiceFS: Efficient caching reduces S3 access costs
Troubleshooting
Common Issues
- AMI Compatibility: Ensure AMI IDs are valid for your region
- Subnet Configuration: Verify subnets are in the correct VPC and AZs
- Security Group Rules: Check that required ports are open
- IAM Permissions: Verify CloudFormation has necessary permissions
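When a deployment fails for one of the reasons above, the CloudFormation stack events usually name the resource and the cause. The sketch below filters a list of events, shaped like the `StackEvents` entries returned by the CloudFormation `describe_stack_events` API, down to the failures. The helper name and the sample events are made up for illustration.

```python
FAILED_STATUSES = {"CREATE_FAILED", "UPDATE_FAILED", "DELETE_FAILED"}


def failed_events(stack_events):
    """Return (logical resource ID, reason) pairs for failed stack events."""
    return [
        (event["LogicalResourceId"], event.get("ResourceStatusReason", ""))
        for event in stack_events
        if event["ResourceStatus"] in FAILED_STATUSES
    ]


# Made-up sample events; real ones come from describe_stack_events.
sample_events = [
    {"LogicalResourceId": "ManagementInstance",
     "ResourceStatus": "CREATE_FAILED",
     "ResourceStatusReason": "The image id does not exist"},
    {"LogicalResourceId": "LaunchTemplate",
     "ResourceStatus": "CREATE_COMPLETE"},
]
for resource, reason in failed_events(sample_events):
    print(resource, "->", reason)
```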
Log Locations
- Management Server: /var/log/mmab/
- Batch Jobs: CloudWatch Logs
- System Logs: /var/log/ on EC2 instances