Using miniWDL with MMBatch
Deploy miniWDL Environment
We'll use the miniwdl-aws-terraform stack to deploy an AWS environment to run miniWDL in.
Install Terraform
Please find the installation instructions here.
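For example, on macOS with Homebrew (one of several documented install paths):
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform version   # verify the installation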
Apply Terraform
After Terraform is installed, we clone the upstream repository and initialize Terraform.
$ git clone https://github.com/miniwdl-ext/miniwdl-aws-terraform.git
Cloning into 'miniwdl-aws-terraform'...
remote: Enumerating objects: 78, done.
remote: Counting objects: 100% (78/78), done.
remote: Compressing objects: 100% (54/54), done.
remote: Total 78 (delta 43), reused 54 (delta 24), pack-reused 0 (from 0)
Receiving objects: 100% (78/78), 20.31 KiB | 990.00 KiB/s, done.
Resolving deltas: 100% (43/43), done.
$ cd miniwdl-aws-terraform
$ terraform init
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/aws...
- Finding latest version of hashicorp/cloudinit...
- Installing hashicorp/aws v5.84.0...
- Installed hashicorp/aws v5.84.0 (signed by HashiCorp)
- Installing hashicorp/cloudinit v2.3.5...
- Installed hashicorp/cloudinit v2.3.5 (signed by HashiCorp)
Terraform has created a lock file .terraform.lock.hcl to record the provider
selections it made above. Include this file in your version control repository
so that Terraform can guarantee to make the same selections by default when
you run "terraform init" in the future.
Terraform has been successfully initialized!
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.
If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.
Adjust Stack
Cloud Init script
Add a new variable, api_address, to the file variables.tf; the inlined cloud-init script below uses it to reach the MMBatch management server.
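A minimal sketch of that variable definition (the description text is our suggestion; adapt as needed):
variable "api_address" {
  description = "Address of the MMBatch management server, e.g. http://WW.XX.YY.ZZ:8080"
  type        = string
}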
The script in main.tf has a section that imports a script.
data "cloudinit_config" "task" {
gzip = false
# enable EC2 Instance Connect for troubleshooting (if security group allows inbound SSH)
part {
content_type = "text/x-shellscript"
content = "yum install -y ec2-instance-connect"
}
part {
content_type = "text/x-shellscript"
content = file("${path.module}/assets/init_docker_instance_storage.sh")
}
}
To access the DNS name of the EFS volume from within the script, we inline it instead of importing it from a file.
data "cloudinit_config" "task" {
gzip = false
# enable EC2 Instance Connect for troubleshooting (if security group allows inbound SSH)
part {
content_type = "text/x-shellscript"
content = "yum install -y ec2-instance-connect"
}
part {
content_type = "text/x-shellscript"
content = <<-EOT
#!/bin/bash
# To run on first boot of an EC2 instance with NVMe instance storage volumes:
# 1) Assembles them into a RAID0 array, formats with XFS, and mounts to /mnt/scratch
# 2) Replaces /var/lib/docker with a symlink to /mnt/scratch/docker so that docker images and
# container file systems use this high-performance scratch space. (restarts docker)
# The configuration persists through reboots (but not instance stop).
# logs go to /var/log/cloud-init-output.log
# refs:
# https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html
# https://github.com/kislyuk/aegea/blob/master/aegea/rootfs.skel/usr/bin/aegea-format-ephemeral-storage
set -euxo pipefail
shopt -s nullglob
mkdir -p /mnt/scratch/tmp
systemctl stop docker || true
if [ -d /var/lib/docker ] && [ ! -L /var/lib/docker ]; then
mv /var/lib/docker /mnt/scratch
fi
mkdir -p /mnt/scratch/docker
ln -s /mnt/scratch/docker /var/lib/docker
# Create checkpoint dir
mkdir -p /mmc-checkpoint
# Mount EFS filesystem
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport ${aws_efs_file_system.efs.dns_name}:/ /mmc-checkpoint
## create a subdir to rebount under the subdir
mkdir -p /mmc-checkpoint/checkpoints
umount /mmc-checkpoint
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport ${aws_efs_file_system.efs.dns_name}:/checkpoints /mmc-checkpoint
# Install MM batch engine
curl -k ${var.api_address}/api/v1/scripts/install-pagent | bash
systemctl restart docker || true
systemctl restart --no-block ecs
EOT
}
}
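After a worker instance boots, you can verify that the script ran cleanly, e.g. via EC2 Instance Connect (standard commands, nothing stack-specific):
tail -n 50 /var/log/cloud-init-output.log   # cloud-init log referenced in the script
mount | grep /mmc-checkpoint                # is the EFS checkpoint volume mounted?
readlink /var/lib/docker                    # should point to /mnt/scratch/docker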
Add SSH KeyPair
In case you want to log into a worker node, you'll need to add key_name = "<KEY_PAIR_NAME>" to the resource "aws_launch_template" "task" within the main.tf file, as shown below.
resource "aws_launch_template" "task" {
name = "${var.environment_tag}-task"
update_default_version = true
iam_instance_profile {
name = aws_iam_instance_profile.task.name
}
key_name ="KEY_PAIR_NAME"
block_device_mappings {
device_name = "/dev/xvda"
ebs {
volume_type = "gp3"
volume_size = 40
# ^ Large docker images may need more root EBS volume space on worker instances
}
}
user_data = data.cloudinit_config.task.rendered
}
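If you don't have a key pair in the target region yet, you can create one with the AWS CLI (KEY_PAIR_NAME is a placeholder):
aws ec2 create-key-pair --key-name KEY_PAIR_NAME \
  --query 'KeyMaterial' --output text > KEY_PAIR_NAME.pem
chmod 400 KEY_PAIR_NAME.pem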
The SSH section within the security group looks similar to the sketch below.
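A sketch of such an ingress rule (the exact block in the stack may differ; restrict the CIDR range to your own network):
ingress {
  description = "SSH"
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"] # restrict to your own IP range
}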
Existing VPC
If you want to use an existing VPC, replace the network resources in main.tf with your existing VPC, subnets, and security groups. While doing this, please make sure the EFS port (2049) is open to the worker nodes, for example with the rule sketched below.
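An ingress rule on the EFS security group similar to this sketch opens the port (aws_security_group.task is a hypothetical name for your worker node security group):
ingress {
  description     = "NFS from worker nodes"
  from_port       = 2049
  to_port         = 2049
  protocol        = "tcp"
  security_groups = [aws_security_group.task.id] # hypothetical worker security group
}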
Service Role
Warning
The stack will deploy a role AWSServiceRoleForEC2Spot which can only be created once per AWS account. If you already have this role, you need to set the following value to false in the variables.tf file.
variable "create_spot_service_roles" {
description = "Create account-wide spot service roles (disable if they already exist)"
type = bool
default = false
}
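You can check whether the role already exists in your account:
aws iam get-role --role-name AWSServiceRoleForEC2Spot
If the command returns the role, keep create_spot_service_roles set to false.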
Apply
Apply the terraform template with your owner_tag and s3upload_buckets. Note that here we use miniwdl as the environment tag, which affects the names of the resources created by this template, such as compute environment names, queue names, and launch template names.
Pass the following variables:
environment_tag=miniwdl: tag that will be used for all resources created
owner_tag=me@example.com: tag to identify the owner of the resources
s3upload_buckets=["MY-BUCKET"]: please use a bucket in the region you are deploying the stack in
api_address=http://WW.XX.YY.ZZ:8080: address of the MMBatch management server
terraform apply \
-var='environment_tag=miniwdl' \
-var='owner_tag=me@example.com' \
-var='s3upload_buckets=["MY-BUCKET"]' \
-var='api_address=http://WW.XX.YY.ZZ:8080'
Once applied, you should see output like this.
Apply complete! Resources: 12 added, 0 changed, 0 destroyed.
Outputs:
fs = "fs-03XYZ1"
fsap = "fsap-0aXYZ1"
security_group = "sg-0XYZ1"
subnets = [
"subnet-0eXYZ1",
"subnet-0aXYZ2",
"subnet-00XYZ3",
]
workflow_queue = "miniwdl-workflow"
Create Management Server
Run MiniWDL
Install
MiniWDL Plugin
miniWDL usually does not use AWS Batch job attempts; instead, it creates a new AWS Batch job if a job fails. To enable checkpoint/restore with MMBatch, we created a miniWDL plugin that assigns each job a persistent environment variable, so that a retry can be identified even though the AWS Batch job ID changes (because it is a new job).
To use the plugin, please use our public image, either via an environment variable or a flag to miniwdl-aws.
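A sketch of a submission that overrides the image (the --image flag name is an assumption, so check miniwdl-aws-submit --help for the exact option; PUBLIC_MMBATCH_PLUGIN_IMAGE is a placeholder for the image reference we publish):
miniwdl-aws-submit hello.wdl \
  --workflow-queue miniwdl-workflow \
  --image PUBLIC_MMBATCH_PLUGIN_IMAGE \
  name=world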
Test Env
Let's create a simple WDL workflow.
workflow helloWorld {
  String name
  call sayHello { input: name=name }
}

task sayHello {
  String name
  command {
    for i in $(seq 1 30); do
      printf "# Iteration $i: hello to ${name} on $(date)\n"
      sleep 10
    done
  }
  output {
    String out = read_string(stdout())
  }
  runtime {
    docker: "archlinux:latest"
    maxRetries: 3
  }
}
Submit workflow
$ miniwdl-aws-submit hello.wdl --workflow-queue miniwdl-workflow name=world
2025-01-23 11:30:16.978 miniwdl-zip hello.wdl <= /Users/kniepbert/data/temp/memverge/miniwdl/hello.wdl
2025-01-23 11:30:16.979 miniwdl-zip Prepare archive /var/folders/tg/x8qd961x4xq98g35631w4t0r0000gn/T/miniwdl_zip_jz__13sj/hello.wdl.zip from directory /var/folders/tg/x8qd961x4xq98g35631w4t0r0000gn/T/miniwdl_zip_rdt43szb
2025-01-23 11:30:16.980 miniwdl-zip Move archive to destination /var/folders/tg/x8qd961x4xq98g35631w4t0r0000gn/T/tmp5wqw7sno/hello.wdl.zip
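While the workflow runs, you can watch the submitted jobs on the Batch queue (standard AWS CLI; the queue name matches the workflow_queue output above):
aws batch list-jobs --job-queue miniwdl-workflow --job-status RUNNING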