Storage
Configuring Storage for MMBatch Checkpoints
The following storage types and file systems are supported for storing checkpoint data:
- AWS EFS
- JuiceFS with AWS S3
- AWS FSx Lustre

Each of these can be configured as a mount point in the EC2 Launch Template.
Examples:
- AWS EFS: code block below. See here for a complete CloudFormation example.
- JuiceFS with AWS S3: code blocks below. See here for a complete CloudFormation example.
  - Create IAM roles, S3, Redis and Required Infra for JuiceFS

        BatchInstanceRole:
          Type: AWS::IAM::Role
          Properties:
            AssumeRolePolicyDocument:
              Version: "2012-10-17"
              Statement:
                - Effect: Allow
                  Principal:
                    Service: ec2.amazonaws.com
                  Action: sts:AssumeRole
            Path: "/"
            ManagedPolicyArns:
              - arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role
              - arn:aws:iam::aws:policy/AmazonS3FullAccess
              - arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy
            Policies:
              - PolicyName: "JuiceFSpolicy"
                PolicyDocument:
                  Version: "2012-10-17"
                  Statement:
                    - Effect: Allow
                      Action:
                        - "elasticache:*"
                      Resource: !Sub "arn:aws:elasticache:${AWS::Region}:${AWS::AccountId}:cluster/mm-engine-${UniquePrefix}"
                    - Effect: Allow
                      Action:
                        - "s3:*"
                      Resource: !Sub "arn:aws:s3:::mm-engine-juice-fs-${UniquePrefix}/*"
            RoleName: !Sub "mm-batch-instance-role-${UniquePrefix}"

        JuiceFSS3Bucket:
          Type: AWS::S3::Bucket
          Properties:
            BucketName: !Sub "mm-engine-juice-fs-${UniquePrefix}"
            BucketEncryption:
              ServerSideEncryptionConfiguration:
                - ServerSideEncryptionByDefault:
                    SSEAlgorithm: AES256
            PublicAccessBlockConfiguration:
              BlockPublicAcls: true
              BlockPublicPolicy: true
              IgnorePublicAcls: true
              RestrictPublicBuckets: true

  - Launch Template

        mkdir -p /mmc-checkpoint
        chmod 777 /mmc-checkpoint
        curl -sSL https://d.juicefs.com/install | sh -

        # Format and mount JuiceFS
        /usr/local/bin/juicefs format \
          --storage s3 \
          --bucket https://${JuiceFSS3BucketName}.s3.${AWS::Region}.amazonaws.com \
          --trash-days=0 \
          "rediss://${RedisClusterEndpoint}:6379/1" \
          juicefs-metadata
        nohup /usr/local/bin/juicefs mount \
          "rediss://${RedisClusterEndpoint}:6379/1" \
          --cache-dir /mnt/jfs_cache \
          --cache-size 102400 \
          /mnt/jfs > /tmp/juicefs-mount.log 2>&1 &

        echo "Waiting for /mnt/jfs to be mounted..."
        while ! mountpoint -q /mnt/jfs; do
          sleep 2
          echo "Still waiting for /mnt/jfs..."
        done
        echo "/mnt/jfs is now mounted."

        MOUNTPOINT=/mnt/jfs
        CHECKPOINT_DIR=$MOUNTPOINT/mmc-checkpoint

        # Ensure mount point and subdirectories exist
        mkdir -p $CHECKPOINT_DIR
        chmod 777 $CHECKPOINT_DIR

        # Handle /mmc-checkpoint symlink
        if [ -e /mmc-checkpoint ]; then
          echo "/mmc-checkpoint exists. Deleting it to recreate as symlink."
          rm -rf /mmc-checkpoint
        fi
        ln -s $CHECKPOINT_DIR /mmc-checkpoint
        echo "Symlink created: /mmc-checkpoint -> $CHECKPOINT_DIR"

- AWS FSx Lustre

In these examples, the checkpoint path /mmc-checkpoint can be configured through the RESTful API (see here for reference).
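For AWS EFS, the mount step in the Launch Template user data might look roughly like the sketch below. It is not taken from the MMBatch CloudFormation example. It assumes an NFS client is already installed on the instance, and the file system ID, region, and the RUN_MOUNT guard variable are hypothetical placeholders. The helper names efs_dns_name and wait_for_mount are introduced here for illustration only. The NFS options are the ones AWS generally recommends for EFS.

```shell
#!/bin/sh
# Sketch only: mount an EFS file system at the checkpoint path over NFSv4.1.
# EFS_ID, REGION and RUN_MOUNT are hypothetical placeholders, not MMBatch settings.

# Build the regional DNS name of an EFS file system.
# $1 = file system ID, $2 = AWS region
efs_dns_name() {
  printf '%s.efs.%s.amazonaws.com' "$1" "$2"
}

# Wait until a path is a mount point, giving up after a timeout.
# $1 = path, $2 = timeout in seconds
wait_for_mount() {
  waited=0
  while ! mountpoint -q "$1"; do
    if [ "$waited" -ge "$2" ]; then
      echo "Timed out waiting for $1" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}

# The mount itself only runs when RUN_MOUNT=1, so the helpers can be
# sourced or tested on a machine without access to EFS.
if [ "$RUN_MOUNT" = "1" ]; then
  EFS_ID="fs-0123456789abcdef0"   # hypothetical file system ID
  REGION="us-east-1"              # hypothetical region
  CHECKPOINT_DIR="/mmc-checkpoint"

  mkdir -p "$CHECKPOINT_DIR"
  mount -t nfs4 \
    -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
    "$(efs_dns_name "$EFS_ID" "$REGION"):/" "$CHECKPOINT_DIR"

  # Confirm the mount is visible before making the directory writable.
  wait_for_mount "$CHECKPOINT_DIR" 60 || exit 1
  chmod 777 "$CHECKPOINT_DIR"
fi
```

Unlike the JuiceFS example, this sketch mounts the file system directly at /mmc-checkpoint, so no symlink is needed. The bounded wait is a design choice: it makes the script fail instead of looping indefinitely if the mount never appears.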
Supported Storage for User Scratch Data
Types of storage and file systems supported for user scratch data:
- EBS
- JuiceFS with AWS S3
- AWS FSx Lustre