Using Cromwell with MMCloud
Summary
Workflows are computational pipelines, such as those found in bioinformatics, where there are a series of — sometimes interconnected — steps, each of which may involve different software and dependencies. Cromwell is an execution engine that allows users to run workflows written in the Workflow Description Language (WDL, pronounced "widdle"). WDL is a domain-specific language (DSL), that is, a language with features customized for particular applications, in this case, genomic analyses.
Cromwell is distributed as a Java ARchive (JAR) file, so running a workflow defined in a .wdl file requires a Java runtime engine to execute the Cromwell Java package. In a manner similar to Nextflow, the wdl file describes the steps in the workflow and how each step must be executed. The execution environment for each step in the analysis is described in Cromwell terminology as a "backend." The analogous concept in Nextflow is "executor."
By including a MemVerge-provided configuration file with the Java runtime, Cromwell can use Memory Machine Cloud (MMCloud) as a "backend." From an MMCloud point of view, the execution step assigned to it is an independent job that it runs like any other batch job, which means that all the MMCloud features, such as SpotSurfer and WaveRider, are available. The benefits to the Cromwell user include cost savings and cloud resource rightsizing.
This document describes how to use Cromwell with MMCloud so that Cromwell can schedule one or more (or all) of the tasks in a workflow to run on MMCloud. Examples are used to demonstrate the principles; you can adapt and modify as needed to fit your workflow.
Configuration
The Cromwell Host is the host where you install Java and load the Cromwell JAR file. To use MMCloud as a backend, you must include a MemVerge-provided configuration file when you run the Cromwell JAR file. The configuration file contains the logic that translates the Cromwell task commands into a job file that is submitted (using the float CLI) to the MMCloud OpCenter. The OpCenter instantiates a Worker Node (a container running in its own virtual machine) for each task in the workflow that uses MMCloud as a backend.
The Cromwell configuration file describes the environment for each backend. To use MMCloud as a backend, the configuration file must contain definitions for:
- IP address of the OpCenter
- Login credentials for the OpCenter
- Default number of vCPUs for the Worker Node (value can be left blank)
- Default memory capacity for the Worker Node (value can be left blank)
- Default container image (value can be left blank)
If a value is left blank, it must be provided in the runtime section of the wdl file.
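For example, if the default container image is left blank in the configuration file, each task that targets MMCloud must name one in its runtime section. The attribute names shown here (f_docker, f_cpu, f_memory) are the ones defined in the MemVerge configuration file described later in this document; the image name is only an illustration:
runtime {
    f_docker: "ubuntu"   # container image; required here because the configuration default is blank
    f_cpu: "2"           # number of vCPUs for the Worker Node
    f_memory: "4"        # memory capacity (in GB) for the Worker Node
}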
Operation
The Cromwell job file (a file with extension .wdl) describes the workflow and specifies the backend for each task (the default backend is the local host). When the user submits a job using the java -jar cromwell.jar run command, any task with the backend defined as "float" is scheduled on the OpCenter. Combining information from the configuration file and the job file produces a float submit command that is sent to the OpCenter. This procedure is repeated for every task in the workflow that has "float" as the backend.
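For example, a task that requests 2 vCPUs, 4 GB of memory, and the cactus container image (the values used in the example wdl file later in this document) results in a command of roughly this form — a sketch, not literal OpCenter output:
float submit -i cactus -j /path/to/execution/float-script.sh --cpu 2 --mem 4 --name addList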
Requirements
To use Cromwell with MMCloud, you need the following:
- MMCloud Carmel 2.0 release or later
- OpCenter instance with valid license
- Cromwell Host (can be your local computer or a Linux virtual machine running in the same VPC as the OpCenter)
- Cromwell Host with the following:
- Java 11
- Cromwell jar file
- MemVerge's Cromwell configuration file
- Job file in wdl format
- Input file(s) in json format
- Options file in json format (optional)
- MMCloud CLI binary. You can download it from the OpCenter.
Prepare the Cromwell Host
The Cromwell Host can be any computer that has access to the MMCloud OpCenter. For complicated workflows, it is likely that the wdl file references objects in S3 buckets. For this reason, and to comply with the same security policies that apply to the OpCenter, the instructions described here assume that the Cromwell Host is a Linux virtual machine running in the same VPC as the OpCenter. You can view instructions on how to create an AWS EC2 instance here. If the Cromwell Host is in a different VPC subnet, check that the Cromwell Host can reach the OpCenter. Ensure that any firewall rules allow access to ports 22 (the port used by ssh), 80, 443, and 8000.
- Check the version of java installed on the Cromwell Host by entering:
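  java -version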
- If needed, install Java 11. Commercial users of Oracle Java need a subscription. Alternatively, you can install OpenJDK under an open-source license by entering (on a Red Hat-based Linux system):
sudo dnf install java-11-openjdk
- Create a directory called cromwell and cd to it.
- Download the Cromwell jar file (85 is a recent version) from here and place it in the cromwell directory.
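For example (the download URL shown follows the Cromwell GitHub release naming convention; substitute the version you want):
  mkdir cromwell && cd cromwell
  curl -LO https://github.com/broadinstitute/cromwell/releases/download/85/cromwell-85.jar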
- Check the Cromwell jar file by entering:
$ java -jar cromwell-85.jar --help
cromwell 85
Usage: java -jar /path/to/cromwell.jar [server|run|submit] [options] <args>...

  --help                   Cromwell - Workflow Execution Engine
  --version

Command: server
Starts a web server on port 8000. See the web server documentation for more details about the API endpoints.

Command: run [options] workflow-source
Run the workflow and print out the outputs in JSON format.
  workflow-source          Workflow source file or workflow url.
  --workflow-root <value>  Workflow root.
  -i, --inputs <value>     Workflow inputs file.
  -o, --options <value>    Workflow options file.
  -t, --type <value>       Workflow type.
  -v, --type-version <value>
                           Workflow type version.
  -l, --labels <value>     Workflow labels file.
  -p, --imports <value>    A zip file to search for workflow imports.
  -m, --metadata-output <value>
                           An optional JSON file path to output metadata.

Command: submit [options] workflow-source
Submit the workflow to a Cromwell server.
  workflow-source          Workflow source file or workflow url.
  --workflow-root <value>  Workflow root.
  -i, --inputs <value>     Workflow inputs file.
  -o, --options <value>    Workflow options file.
  -t, --type <value>       Workflow type.
  -v, --type-version <value>
                           Workflow type version.
  -l, --labels <value>     Workflow labels file.
  -p, --imports <value>    A zip file to search for workflow imports.
  -h, --host <value>       Cromwell server URL.
- Download the OpCenter CLI binary for Linux hosts from the following URL:
  https://<op_center_ip_address>/float
  Replace <op_center_ip_address> with the public (if you are outside the VPC) or private (if you are inside the VPC) IP address of the OpCenter. If you download the CLI binary (called float) to your local machine, move the file to the Cromwell Host.
- Make the CLI binary file executable and add the path to the CLI binary file to your PATH variable.
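For example, assuming the binary is downloaded directly to the current directory on the Cromwell Host (the --no-check-certificate flag is needed only if the OpCenter uses a self-signed certificate):
  wget --no-check-certificate https://<op_center_ip_address>/float
  chmod +x float
  export PATH=$PATH:$(pwd)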
- Open a file called cromwell-float.conf and insert the following contents.

include required(classpath("application"))

# This is an example of how you can use Cromwell to interact with float.

backend {
  default = float

  providers {
    float {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes="""
        String f_cpu = "MINVCPUS"
        String f_memory = "MINMEMORY"
        String f_docker = ""
        String f_extra = ""
        """

        # If an 'exit-code-timeout-seconds' value is specified:
        # - check-alive will be run at this interval for every job
        # - if a job is found to be not alive, and no RC file appears after this interval
        # - Then it will be marked as Failed.
        # Warning: If set, Cromwell will run 'check-alive' for every job at this interval
        exit-code-timeout-seconds = 30

        submit = """
mkdir -p ${cwd}/execution
echo "set -e" > ${cwd}/execution/float-script.sh
echo "cd ${cwd}/execution" >> ${cwd}/execution/float-script.sh
tail -n +22 ${script} > ${cwd}/execution/no-header.sh
head -n $(($(wc -l < ${cwd}/execution/no-header.sh) - 14)) ${cwd}/execution/no-header.sh >> ${cwd}/execution/float-script.sh
float submit -i ${f_docker} -j ${cwd}/execution/float-script.sh --cpu ${f_cpu} --mem ${f_memory} ${f_extra} > ${cwd}/execution/sbatch.out 2>&1
cat ${cwd}/execution/sbatch.out | sed -n 's/id: \(.*\)/\1/p' > ${cwd}/execution/job_id.txt
echo "receive float job id: "
cat ${cwd}/execution/job_id.txt
JOB_SCRIPT_DIR=float-jobs/$(cat ${cwd}/execution/job_id.txt)
mkdir -p $JOB_SCRIPT_DIR
cd $JOB_SCRIPT_DIR
# create the check alive script
cat <<EOF > float-check-alive.sh
SCRIPT_DIR=$(pwd)
cd ${cwd}/execution
float show -j \$1 --runningOnly > job-status.yaml
if [[ -s job-status.yaml ]]; then
    cat job-status.yaml
else
    float show -j \$1 | grep rc: | tr -cd '[:digit:]' > rc
    if [ ! -s rc ]; then
        # If the rc file is empty, write the default value (e.g., 0)
        echo "127" > rc
    fi
    float log cat -j \$1 stdout.autosave > stdout
    float log cat -j \$1 stderr.autosave > stderr
fi
cd $SCRIPT_DIR
EOF
# create the kill script
cat <<EOF > float-kill.sh
SCRIPT_DIR=$(pwd)
cd ${cwd}/execution
float scancel -f -j \$1
cd $SCRIPT_DIR
EOF
cat ${cwd}/execution/sbatch.out
        """

        kill = """
        source float-jobs/${job_id}/float-kill.sh ${job_id}
        """

        check-alive = """
        source float-jobs/${job_id}/float-check-alive.sh ${job_id}
        """

        job-id-regex = "id: (\\w+)\\n"
      }
    }
  }
}
- Replace the following (keep the quotation marks).
  MINVCPUS: minimum number of vCPUs to use as the default for a Worker Node.
  MINMEMORY: minimum memory capacity (in GB) to use as the default for a Worker Node.
- The string following f_extra is combined with the string following f_extra in the wdl file and appended to the float submit command sent to the OpCenter. The f_extra string in the wdl file overrides the f_extra string in the configuration file if they are in conflict.
- The string following f_docker is the name of a default docker image used in case the task in the wdl file does not specify a docker image. You can leave this blank.
- Use float login to log in to your OpCenter.
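For example (substitute the OpCenter address and your own credentials; admin is shown only as a placeholder username):
  float login -a <op_center_ip_address> -u admin -p <password>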
Run a Simple Cromwell Workflow
The default backend for Cromwell is the local host. Run a simple "hello world" workflow on the local host by completing the following steps.
- Create a file called "helloworld.wdl" and insert:
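A minimal workflow consistent with the output shown below (a workflow named myWorkflow that calls a single task named myTask and prints "hello world") is sketched here:
workflow myWorkflow {
    call myTask
}

task myTask {
    command {
        echo "hello world"
    }
    output {
        String out = read_string(stdout())
    }
}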
- Run the "hello world" job by entering:
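For example, using the jar downloaded earlier:
  java -jar cromwell-85.jar run helloworld.wdl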
The output from Cromwell is verbose. To confirm that the job ran successfully and to view the output, look for the following section:
[2023-05-08 21:47:01,69] [info] BackgroundConfigAsyncJobExecutionActor [cb7c6fd7myWorkflow.myTask:NA:1]: Status change from - to Done
[2023-05-08 21:47:03,27] [info] WorkflowExecutionActor-cb7c6fd7-9020-41cf-841f-c95f43ce86da [cb7c6fd7]: Workflow myWorkflow complete. Final Outputs:
{
  "myWorkflow.myTask.out": "hello world"
}
[2023-05-08 21:47:06,67] [info] WorkflowManagerActor: Workflow actor for cb7c6fd7-9020-41cf-841f-c95f43ce86da completed with status 'Succeeded'. The workflow will be removed from the workflow store.
[2023-05-08 21:47:09,04] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "myWorkflow.myTask.out": "hello world"
  },
  "id": "cb7c6fd7-9020-41cf-841f-c95f43ce86da"
}
Run a Workflow using OpCenter as a Backend
A Cromwell workflow file can call many tasks. If the tasks are independent, Cromwell can scatter the tasks into parallel "shards." Once the shards are complete, the results can be gathered into a single output. The following example demonstrates different aspects of Cromwell:
- Task executed using the Local backend
- Task executed using OpCenter as the backend
- Parallel sharding
- JSON files to supply input data and parameters
To run this example, complete the following steps:
- Create a file called 3step.wdl and insert the following content.
workflow scatterGather {
    Array[String] names
    call intro
    scatter (name in names) {
        call addList { input: name=name }
    }
    call compileList { input: items=addList.out }
}

task intro {
    command {
        echo "Starting to compile the grocery list"
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        backend: "Local"
    }
}

task addList {
    String name
    command {
        printf "[cromwell-addList] Add ${name} to list\n"
        sleep 30
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        f_docker: "cactus"
        f_cpu: "2"
        f_memory: "4"
        f_extra: "--name addList"
    }
}

task compileList {
    Array[String] items
    command {
        printf "[cromwell-compileList] These items are on the grocery list:\n" > my_file.txt
        sleep 1
        echo ${sep=". " items} >> my_file.txt
        cat my_file.txt
        sleep 30
    }
    output {
        String out = read_string(stdout())
    }
    runtime {
        f_docker: "cactus"
        f_extra: "--name compileList"
    }
}
- Create an input file called 3step.json and insert the following content.
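The inputs file supplies the names array consumed by the scatter block; the values below match the grocery items that appear in the example output later in this section:
{
    "scatterGather.names": ["Apples", "Bananas", "Milk", "Bread"]
}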
- Create an options file called options.json and insert the following content.
- Run the workflow by entering the following command.
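For example, a command of the following form loads the MemVerge configuration file (so that tasks with the float backend reach the OpCenter) and passes the inputs and options files:
  java -Dconfig.file=cromwell-float.conf -jar cromwell-85.jar run 3step.wdl -i 3step.json -o options.json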
The task called intro runs locally. The task called addList is scattered into four parallel shards that run on OpCenter. Finally, the outputs from the four shards are gathered by the task called compileList, which runs on OpCenter.
The Cromwell output is voluminous. Check for key events during the run.
- Tasks assigned to backends.
- Parallel shards created.
[2023-05-09 20:45:22,46] [info] WorkflowExecutionActor-0b810f55-0d90-4c00-bf41-066e486e45b6 [0b810f55]: Starting scatterGather.addList (4 shards)
[2023-05-09 20:45:26,26] [info] Assigned new job execution tokens to the following groups: 0b810f55: 5
[2023-05-09 20:45:26,52] [info] BackgroundConfigAsyncJobExecutionActor [0b810f55scatterGather.intro:NA:1]: echo "Starting to compile the grocery list"
[2023-05-09 20:45:26,53] [info] DispatchedConfigAsyncJobExecutionActor [0b810f55scatterGather.addList:2:1]: printf "[cromwell-addList] Add Milk to list\n"
sleep 30
[2023-05-09 20:45:26,53] [info] DispatchedConfigAsyncJobExecutionActor [0b810f55scatterGather.addList:3:1]: printf "[cromwell-addList] Add Bread to list\n"
sleep 30
[2023-05-09 20:45:26,54] [info] DispatchedConfigAsyncJobExecutionActor [0b810f55scatterGather.addList:0:1]: printf "[cromwell-addList] Add Apples to list\n"
sleep 30
[2023-05-09 20:45:26,54] [info] DispatchedConfigAsyncJobExecutionActor [0b810f55scatterGather.addList:1:1]: printf "[cromwell-addList] Add Bananas to list\n"
- Results gathered.
- Job concluded successfully.
[2023-05-09 20:54:38,68] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "scatterGather.compileList.out": "[cromwell-compileList] These items are on the grocery list:\n[cromwell-addList] Add Apples to list. [cromwell-addList] Add Bananas to list. [cromwell-addList] Add Milk to list. [cromwell-addList] Add Bread to list",
    "scatterGather.intro.out": "Starting to compile the grocery list",
    "scatterGather.addList.out": ["[cromwell-addList] Add Apples to list", "[cromwell-addList] Add Bananas to list", "[cromwell-addList] Add Milk to list", "[cromwell-addList] Add Bread to list"]
  },
  "id": "0b810f55-0d90-4c00-bf41-066e486e45b6"
}
OpCenter shows the five tasks (four "scatter" tasks and one "gather" task).
$ float squeue -A -f name=List -d
+-----------------------+-------------+-------+-----------+----------------------+----------+------------+
| ID | NAME | USER | STATUS | SUBMIT TIME | DURATION | COST |
+-----------------------+-------------+-------+-----------+----------------------+----------+------------+
| fihe9H73Mw7stcdCN1JPD | compileList | admin | Completed | 2023-05-09T20:51:17Z | 2m37s | 0.0089 USD |
| JO2HXLD0WJLYVPW2npISI | addList | admin | Completed | 2023-05-09T20:45:29Z | 4m3s | 0.0025 USD |
| N8JDsSywHuKa3c7s8FZrl | addList | admin | Completed | 2023-05-09T20:45:29Z | 3m59s | 0.0024 USD |
| 4t9qV52z8B7423YtfLbve | addList | admin | Completed | 2023-05-09T20:45:29Z | 3m59s | 0.0024 USD |
| HsBFzGvPvhfekjFPSJ97A | addList | admin | Completed | 2023-05-09T20:45:29Z | 4m5s | 0.0025 USD |
Run Cromwell in Server Mode
When used in run mode, Cromwell launches a single workflow from the command line. Run mode is typically used for development. In server mode, Cromwell starts a web server that supports a feature-rich REST API. By default, the web server listens on all interfaces (0.0.0.0) and port 8000 (these can be changed in the configuration file).
To start a Cromwell server on a local Cromwell Host, perform the following steps.
- On the Cromwell Host, enter the following command.
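For example, a command of the following form starts the server with the MemVerge configuration file loaded so that float is available as a backend:
  java -Dconfig.file=cromwell-float.conf -jar cromwell-85.jar server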
- Check that the Cromwell Host's inbound filtering rules allow access to port 8000.
- Open a browser and go to http://<cromwell_host_ip>:8000, where <cromwell_host_ip> is the public (private) IP address of the Cromwell Host if you are outside (inside) the VPC.
To submit a job using the Cromwell web interface, complete the following steps.
- Expand the Workflows section and then click Submit a workflow for execution.
- Click Try it out (on the right-hand side).
- Browse your local computer for a workflowSource wdl file.
- Browse your local computer for a workflowInputs json file.
- Browse your local computer for a workflowOptions json file.
- Click Execute (at the bottom of the section). If the job is accepted, the server returns a code 201.
- Copy the workflow ID.
- Click Get the outputs of a workflow.
- Click Try it out and then paste the workflow ID into the id box.
- Click Execute.
Troubleshooting
As the Cromwell workflow runs, detailed log messages are written to the terminal.