
Using Nextflow with MMCloud

Summary

Nextflow is a workflow manager used to run and manage computational pipelines such as those found in bioinformatics. Workflow managers make it easier to run complex analyses where there are a series of — sometimes interconnected — tasks, each of which may involve different software and dependencies.

Nextflow provides a framework for describing how a workflow must be executed and includes a CLI for issuing nextflow commands. The execution environment for each task is described using the Nextflow DSL (Domain-Specific Language). In Nextflow terminology, each task is assigned to an "executor," which is a complete environment for running that step in the analysis.

By attaching a MemVerge-developed plugin to a workflow, Nextflow can use MMCloud as an "executor." From an MMCloud point of view, the execution task that Nextflow assigns to it is an independent job that it runs just like any other batch job. The Nextflow user gets the benefits of running on MMCloud, such as reduced costs and shorter execution times.

This document describes how to use Nextflow with MMCloud so that Nextflow can schedule one or more (or all) of the tasks in a workflow to run on MMCloud. Examples are used to demonstrate the principles; you can adapt and modify as needed to fit your workflow.

Configuration

A Nextflow workflow requires a working directory where temporary files are stored and where a process can access the output of an earlier step. When MMCloud is engaged as an executor, the OpCenter instantiates a Worker Node (a container running in its own virtual machine) for each step in the process pipeline. Every Worker Node and the Nextflow Host (the host where the nextflow binary is installed) must have access to the same working directory — for example, the working directory can be an NFS-mounted directory or an S3 bucket. The figure below shows a configuration where an NFS Server provides shared access to the working directory and also acts as repository for input data and the final output.

Nextflow Configuration using NFS Server

The Nextflow configuration file describes the environment for each executor. To use MMCloud as an executor, the configuration file must contain definitions for:

  • Nextflow plugin (source code and documentation are available here)
  • Working directory (directory where the Nextflow Host and all the Worker Nodes have r/w access)
  • IP address of the OpCenter
  • Login credentials for the OpCenter (login credentials can also be provided using environment variables or the OpCenter secret manager)
  • Location of the shared directory if using an NFS server (not needed if using S3)

The operation of Nextflow using an S3 bucket as the working directory is shown in the following figure.

Nextflow Configuration using S3 Bucket

Operation

The Nextflow job file (a file with extension .nf) describes the workflow and specifies the executor for each process. When the user submits a job using the nextflow run command (as shown in the figure), any process with executor defined as "float" is scheduled for the OpCenter. Combining information from the configuration file and the job file, the Nextflow plugin formulates a float submit command string and submits the job to the OpCenter. This procedure is repeated for every task in the workflow that has "float" as the executor. Every Worker Node mounts the same working directory so that the Nextflow Host and all the Worker Nodes read from, and write to, the same shared directory.

Note

CSPs impose limits on services instantiated by each account. In AWS, these limits are called "service quotas" and apply to every AWS service, generally on a region-by-region basis. Some Nextflow pipelines instantiate a number of compute instances large enough to exceed the AWS EC2 service quota. If this happens, increase your AWS EC2 service quota and rerun the pipeline.
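
For example, assuming the AWS CLI is installed and configured, you can review the current EC2 quotas for a region before launching a large pipeline (substitute your own region; quota names vary by account):

aws service-quotas list-service-quotas --service-code ec2 --region us-east-1 --output table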

Nextflow Operation with MMCloud

Requirements

To use Nextflow with MMCloud, you need the following:

  • MMCloud Carmel 2.0 release or later
  • Running instance of OpCenter with valid license
  • Linux virtual machine running in the same VPC as the OpCenter (call this the Nextflow Host)
  • On the Nextflow Host:
    • Java 11 or later release (the latest Long Term Support release is Java 17)
    • MMCloud CLI binary. You can download it from the OpCenter.
    • Nextflow
    • Nextflow configuration file
    • Nextflow job file
  • (Optional) NFS Server to provide shared working directory. There are other possibilities; for example, the shared working directory can be hosted on the Nextflow Host or the shared working directory can be mounted directly from AWS S3.

Prepare the Nextflow Host

The Nextflow Host is a Linux virtual machine running in the same VPC as the OpCenter. If the Nextflow host is in a different VPC subnet, ensure that the Nextflow host can reach the OpCenter and that it can mount the file system from the NFS Server (if used).

All network communication among the OpCenter, the Nextflow Host, NFS Server (if used), and Worker Nodes must use private IP addresses. If the Nextflow Host uses an NFS-mounted file system as the working directory, ensure that any firewall rules allow access to port 2049 (the port used by NFS).
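
For example, a quick way to verify connectivity from the Nextflow Host (assuming the nc utility is installed; substitute your private IP addresses):

# Check that the OpCenter is reachable over HTTPS and that the NFS port is open
nc -zv <op_center_private_ip> 443
nc -zv <nfs_server_private_ip> 2049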

  • Check the version of java installed on the Nextflow Host by entering:

    $ java -version
    openjdk version "17.0.6-ea" 2023-01-17 LTS
    OpenJDK Runtime Environment (Red_Hat-17.0.6.0.9-0.4.ea.el9) (build 17.0.6-ea+9-LTS)
    OpenJDK 64-Bit Server VM (Red_Hat-17.0.6.0.9-0.4.ea.el9) (build 17.0.6-ea+9-LTS, mixed mode, sharing)
    
  • If needed, install Java 11 or later. Commercial users of Oracle Java need a subscription. Alternatively, you can install OpenJDK under an open-source license by entering (on a Red Hat-based Linux system):

    sudo dnf install java-17-openjdk

  • Install nextflow by entering:

    curl -s https://get.nextflow.io | bash

    This installs nextflow in the current directory. The installation described here assumes that you install nextflow in your home directory. You can also create a directory for your nextflow installation, for example, mkdir ~/nextflow.

  • Check your nextflow installation by entering:

    $ ./nextflow run hello
    N E X T F L O W  ~  version 23.04.2
    Launching `https://github.com/nextflow-io/hello` [voluminous_liskov] DSL2 - revision: 1d71f857bb [master]
    executor >  local (4)
    [13/1bb6ed] process > sayHello (3) [100%] 4 of 4 ✔
    Bonjour world!
    
    Ciao world!
    
    Hola world!
    
    Hello world!
    

    If this job does not run, check the log called .nextflow.log.

  • Upgrade to the latest version of Nextflow by entering ./nextflow self-update

  • Download the OpCenter CLI binary for Linux hosts from the following URL:

    https://<op_center_ip_address>/float

    Replace <op_center_ip_address> with the public (if you are outside the VPC) or private (if you are inside the VPC) IP address for the OpCenter. You can click on the link to download the CLI binary (called float) or you can enter the following.

    wget https://<op_center_ip_address>/float --no-check-certificate

    If you download the CLI binary to your local machine, move the file to the Nextflow Host.

  • Make the CLI binary file executable and add the path to the CLI binary file to your PATH variable.
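
    For example, if you downloaded the float binary to your home directory:

    chmod +x ~/float
    echo 'export PATH=$HOME:$PATH' >> ~/.bashrc
    source ~/.bashrc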

Note

You can use the float submit --template nextflow:jfs command to create a Nextflow host with all the required software installed (including JuiceFS). Contact your MemVerge support team for additional details.

(Optional) Prepare the Working Directory Host

The Nextflow Host and the Worker Nodes must have access to a shared working directory. There are several ways to achieve this. In the example shown here, a separate Linux virtual machine (the NFS Server) is started in the same VPC as the OpCenter.

Alternatively, you can edit the Nextflow configuration file to automatically mount an S3 bucket as a filesystem. Instructions on how to do this are in the next section titled "Use S3 Bucket as Filesystem."

You can obtain instructions on turning a generic CentOS-based server into an NFS server from this link. NFS uses port 2049 for connections, so ensure that any firewall rules allow access to port 2049. If the Working Directory Host is in a different VPC subnet, ensure that it can reach the Nextflow host and Worker Nodes. Set the subnet mask in /etc/exports to allow the Nextflow Host and Worker Nodes to mount file systems from the Working Directory Host.

Example:

$ cat /etc/exports
/mnt/memverge/shared 172.31.0.0/16(rw,sync,no_root_squash)
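
If you are configuring the NFS server from scratch on a Red Hat-based system, a minimal sketch of the server-side setup is shown below; run the exportfs command again after you create the shared directory and edit /etc/exports.

sudo dnf install nfs-utils
sudo systemctl enable --now nfs-server
sudo exportfs -ra
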
  • Log in to the NFS Server and create the shared working directory.

    $ sudo mkdir -p /mnt/memverge/shared
    $ sudo chmod ugo+rwx /mnt/memverge/shared
    
  • Log in to the Nextflow Host and mount the shared working directory (use the NFS Server's private IP address). Use df to check that the volume mounted successfully.

    $ sudo mkdir -p /mnt/memverge/shared
    $ sudo mount -t nfs <nfs_server_ip_address>:/mnt/memverge/shared /mnt/memverge/shared
    $ df
    

(Optional) Use S3 Bucket as Filesystem

Some workflows initiate hundreds or even thousands of tasks simultaneously. If all these tasks access the NFS server at the same time, a bottleneck can occur. For these workflows, it can help to use an S3 bucket as the working directory.

Note

When used with the appropriate configuration file, the Nextflow Host and the Worker Nodes automatically mount the S3 bucket as a Linux file system.

Complete the following steps.

  • Log in to your AWS Management Console.

    • Open the Amazon S3 console.
    • From the left-hand panel, select Buckets.
    • On the right-hand side, click Create bucket and follow the instructions.

      You must choose a bucket name (nfshareddir is used as a placeholder in this document) that is unique across all AWS accounts and regions in the standard AWS partition (the China and AWS GovCloud regions are separate partitions). Buckets are accessible across regions.

    • On the navigation bar, all the way to the right, click your username and go to Security credentials.

    • Scroll down the page to the section called Access keys and click Create access key.
    • Download the csv file.

      The csv file has two entries, one called Access key ID and one called Secret access key. You enter these in the Nextflow configuration file.
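
To confirm that the bucket and the access keys work together, you can optionally use the AWS CLI (if installed) on the Nextflow Host to list the bucket (nfshareddir is the placeholder bucket name used in this document):

aws configure        # paste the Access key ID and Secret access key when prompted
aws s3 ls s3://nfshareddir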

(Optional) Use Distributed File System

NFS and S3 are viable options for providing the shared working directory, but performance can be unacceptable for pipelines that generate high-volume or high-throughput I/O. For these pipelines, a high-performance distributed file system is recommended. OpCenter supports two distributed file systems.

  • Fusion

    Fusion is a POSIX-compliant distributed file system optimized for Nextflow pipelines. Fusion requires the use of Wave containers. A description of how to use the Fusion file system with MMCloud is available here.

  • JuiceFS

    JuiceFS is an open-source distributed file system that provides an API to access a POSIX-compliant file system built on top of a range of cloud storage services. If you use the float submit --template nextflow:jfs option to create a Nextflow host, the JuiceFS environment is automatically created.

Prepare the Configuration File

Nextflow configuration files can be extensive — they can include profiles for many executors. Create a simple configuration for using MMCloud as the sole executor by following these steps.

In the directory where you installed nextflow, create a file called nextflownfs.config. When a parameter requires an IP address, use a private IP address. The following configuration file uses the NFS server as the shared working directory.

$ cat nextflownfs.config
plugins {
  id 'nf-float'
}
workDir = '/mnt/memverge/shared'
podman.registry = 'quay.io'
executor {
  queueSize = 100
}
float {
  address = 'OPCENTER_IP_ADDRESS'
  username = 'USERNAME'
  password = 'PASSWORD'
  nfs = 'nfs://NFS_SERVER_IP_ADDRESS/mnt/memverge/shared'
}
process {
  executor = 'float'
  container = 'docker.io/cactus'
}

Replace the following (keep the quotation marks).

  • USERNAME: username to log in to the OpCenter. If absent, the value of the environment variable MMC_USERNAME is used.
  • PASSWORD: password to log in to the OpCenter. If absent, the value of the environment variable MMC_PASSWORD is used.
  • OPCENTER_IP_ADDRESS: private IP address of the OpCenter. If absent, the value of the environment variable MMC_ADDRESS is used. If using multiple OpCenters, separate the entries with a comma.
  • NFS_SERVER_IP_ADDRESS: private IP address of the NFS server.

Nextflow secrets can supply values for USERNAME and PASSWORD as follows.

  • Set the values

    $ nextflow secrets set MMC_USERNAME "..."
    $ nextflow secrets set MMC_PASSWORD "..."
    
  • Insert in Nextflow configuration file.

    float {
        username = secrets.MMC_USERNAME
        password = secrets.MMC_PASSWORD
    }
    

Explanations of the parameters in the configuration file follow.

  • plugins section

    The MemVerge plugin called nf-float is included in the Nextflow Plugins index. This means that the reference to nf-float resolves to "nf-float version: latest" and Nextflow automatically downloads the latest version of the nf-float plugin. The nf-float plugin is updated frequently, and configuration parameters change or new ones are added. See the nf-float README on GitHub for the latest details.

  • workDir specifies the path to the shared directory if the directory is NFS-mounted. If an S3 bucket is used, workDir specifies the bucket and the path to the folder in the format s3://bucket_name/folder

  • podman.registry specifies the default container registry (the choices are usually quay.io or docker.io). If docker.io is specified, then /memverge/ is prepended to the container name. For example, 'cactus' becomes 'docker.io/memverge/cactus'.
  • executor section

    queueSize limits the maximum number of concurrent requests sent to the OpCenter (default 100).

  • float section (all options are listed below)

    • address: private IP address of the OpCenter. Specify multiple OpCenters using the format 'A.B.C.D', 'E.F.G.H' or 'A.B.C.D, E.F.G.H'.
    • username, password: credentials for logging in to the OpCenter.
    • nfs: parameter describing (only if using NFS) where the shared directory must be mounted from. Do not use with S3.
    • migratePolicy: the migration policy used by WaveRider, specified as a map. Refer to the CLI command reference for the list of available options.
    • vmPolicy: the VM creation policy, specified as a map. Refer to the CLI command reference for the list of available options.
    • timeFactor: a number (default value is 1) that multiplies the time supplied by the Nextflow task. Use it to prevent task timeouts.
    • maxCpuFactor: a number (default value is 4) used to calculate the maximum number of CPU cores for the instance, namely, the maximum number of CPU cores is set to maxCpuFactor multiplied by the cpus specified for the task.
    • maxMemoryFactor: a number (default value is 4) used to calculate the maximum memory for the instance, namely, the maximum memory is set to maxMemoryFactor multiplied by the memory specified for the task.
    • commonExtra: a string of additional float submit command options. This string is appended to every float submit command sent to the OpCenter; you can use any float command option (a configuration sketch showing these options appears after this parameter list).
  • process section

    The nextflow language defines process "directives," which are optional parameters that influence the execution environment for tasks in the nextflow job file. If the nextflow job file does not specify a value for a process directive, the default value is used. Use the configuration file to override the nextflow defaults.

    If a value for container is not specified and the task does not specify a container value, the job fails.

    If scratch is set to true, process execution occurs in a temporary folder that is local to the execution node. For MMCloud, the execution node is a container running in a virtual machine. The process executes in the container's root volume (default 6GB). Increase the root volume size (to create space for the temporary folder) by including the extra directive.

    For example,

    process {
        scratch = true
        extra = '--imageVolSize 120'
        ...
    }


  • aws section

    Specifies the credentials for accessing the S3 buckets if used.
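
As an illustration of the float-section options described above, the following sketch appends map-style policy settings and a commonExtra string to the NFS example configuration (the values are placeholders; Nextflow merges configuration scopes that share the same name, or you can edit the existing float block directly):

cat >> nextflownfs.config << 'EOF'
float {
  vmPolicy = [ onDemand: true, retryLimit: 3, retryInterval: '10m' ]
  migratePolicy = [ disable: true ]
  commonExtra = '--imageVolSize 120'
}
EOF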

The following configuration file uses the S3 bucket as the shared working directory. The Nextflow host and all worker nodes automatically mount the S3 bucket as a file system.

$ cat nextflows3.config
plugins {
  id 'nf-float'
}
workDir = 's3://S3BUCKET'
podman.registry = 'quay.io'
executor {
  queueSize = 100
}
float {
  address = 'OPCENTER_IP_ADDRESS'
  username = 'USERNAME'
  password = 'PASSWORD'
}
process {
  executor = 'float'
}
aws {
  accessKey = 'ACCESS_KEY'
  secretKey = 'SECRET_KEY'
  region = 'REGION'
}

Replace the following (keep the quotation marks).

  • S3BUCKET: name of the S3 bucket to use as the shared working directory (nfshareddir is used as a placeholder in this document)
  • OPCENTER_IP_ADDRESS: as described previously
  • USERNAME: as described previously
  • PASSWORD: as described previously
  • ACCESS_KEY: access key ID for your AWS account. See below for options for providing the access key ID.
  • SECRET_KEY: secret access key for your AWS account. See below for options for providing the secret access key.
  • REGION: region in which the S3 bucket is located.

Provide the access key ID and secret access key using one of the following methods.

  • Enter the access key ID and secret access key as cleartext
  • Set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • Set the environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY
  • Use the AWS CLI command aws configure to populate the default profile in the AWS credentials file located at ~/.aws/credentials
  • Use the temporary AWS credentials provided by an IAM instance role. See IAM Roles documentation for details.
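
For example, to use environment variables (the values shown are placeholders for your own keys):

export AWS_ACCESS_KEY_ID=<your_access_key_id>
export AWS_SECRET_ACCESS_KEY=<your_secret_access_key>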

Prepare the Nextflow Job File

The nextflow job file describes the workflow and how the workflow must be executed. To demonstrate how a simple workflow executes, follow these steps (example is adapted from nextflow.io.)

  • Create a sample fasta file called sample.fa and place it in the shared working directory, for example, in /mnt/memverge/shared if you are using the NFS server, or in s3://nfshareddir if you are using an S3 bucket (you can use any S3 bucket where you have r/w access — it doesn't have to be the shared directory specified in the nextflow configuration file).

    $ cat /mnt/memverge/shared/sample.fa
    >seq0
    FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
    >seq1
    KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
    >seq2
    EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
    >seq3
    MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
    >seq4
    EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
    >seq5
    SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
    >seq6
    FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
    >seq7
    SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
    >seq8
    SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
    >seq9
    KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
    >seq10
    FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK
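
    If you are following the S3 bucket example, you can copy the file into the bucket from the Nextflow Host (assuming the AWS CLI is configured with keys that have write access):

    aws s3 cp sample.fa s3://nfshareddir/sample.fa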
    
  • Build a workflow that splits the fasta sequences into separate files and reverses each sequence by creating a nextflow job file called splitfanfs.nf.

    If you are using the S3 bucket, replace params.in = "/mnt/memverge/shared/sample.fa" with params.in = "s3://nfshareddir/sample.fa" (use the name of the S3 bucket where you placed the sample fasta file if you did not place the file in nfshareddir).

    $ cat splitfanfs.nf
    #!/usr/bin/env nextflow
    
    params.in = "/mnt/memverge/shared/sample.fa"
    
    /*
     * Split a fasta file into multiple files
     */
    process splitSequences {
        executor = 'float'
        container = 'docker.io/memverge/cactus'
        cpus = '4'
        memory = '8 GB'
        extra = '--vmPolicy [spotOnly=true]'
    
        input:
        path 'input.fa'
    
        output:
        path 'seq_*'
    
        """
        awk '/^>/{f="seq_"++d} {print > f}' < input.fa
        """
    }
    
    /*
     * Reverse the sequences
     */
    process reverse {
        executor = 'float'
        container = 'docker.io/memverge/cactus'
        cpus = '4'
        memory = '8 GB'
    
        input:
        path x
    
        output:
        stdout
    
        """
        cat $x | rev
        """
    }
    
    /*
     * Define the workflow
     */
    workflow {
        splitSequences(params.in) \
          | reverse \
          | view
    }
    

    The string following extra is combined with the string following commonExtra in the config file and appended to the float command submitted to the OpCenter. The string value shown here after extra is an example: use any float command option. The extra setting overrides the commonExtra setting if they are in conflict.

Run a Workflow using Nextflow

Run a simple workflow by entering:

$ ./nextflow run splitfanfs.nf -c nextflownfs.config -cache false
N E X T F L O W  ~  version 23.04.2
Launching `splitfanfs.nf` [prickly_watson] DSL2 - revision: dadefd0d0b
executor >  float (2)
[67/59f2c6] process > splitSequences [100%] 1 of 1 ✔
[f6/32fab1] process > reverse        [100%] 1 of 1 ✔
0qes>
FKEIKKVDQAQDTRYVLCVLDDTVKICLNGDVHRYKLVVRVKMPDALYLKEAARSFEEWTQF
9qes>
KKLDPMNEARYQVCKVNDSMKLLIHGKSHVYKMTYRCNQPNAMYMNEAAIEFDEWNK
01qes>
KEMKKADQAQDTKFKLCEHNDTVKLVLKGECHRYKVVYRTTDPHNRFLEVSKSVFEDWSDF
1qes>
MLTFFINNLKEMKKAEQAQDTKFKLCEKNDTVKL EMLRMLQSHFKEIKKVDQAQDTRYLLCVVDDTVKICLNGDCHRYKLVVRVKMPDAQYLKEAARTFEEWTRYK
2qes>
KGHLKEVKKVDQAQDTKYQLCVADDTVKMCLNGDCHRYKLVVRVKMPDTLYLKEAARAFEEWTQYEE
3qes>
KVDQAQDTKYQLCVSNDTVKICLNGDCHRYKLVVRVKMPDTLYLKEVARSFEEWVQYM
4qes>
LLSHELAFNDKQVGFLRMEYSVVSNDTVKICLNGDCHRYKLVVRVKMPDTLYLKEVARSFEE
5qes>
RMLLTTLKEIKKVDQAMDTVYKLVTHNDTLKVVLKHDVHRYKTCMRCKMPDELYLVEAAKAFEEWS
6qes>
ISRLLTSSLKELKKVDQLQNTSYQLCVVDDTLKLVLEGKTHNYKTVFRCKEPNASHLREAAKAFEEWNTF
7qes>
FFINNLKEMKKADQAQDTKFKLCERDDTVKLVLKGECHRYKMVYRTANPDGRFLQVSREVFEEWS
8qes>
MLTFFINNLKEMKKAEQAQDTKFKLCEKNDTVKLVLKGDCHRYKMVYRTSEPDARFLQVSRDVFEDWS

Completed at: 02-Aug-2023 20:04:20
Duration    : 9m 23s
CPU hours   : (a few seconds)
Succeeded   : 2

This nextflow job file defines two processes that use MMCloud as the executor. Using the float squeue command or the OpCenter GUI, you can view the two processes executed by OpCenter.

$ float squeue -f image='cactus' -f status='Completed'
+--------+------------------+-------+-----------+-------------+----------+------------+
|  ID    |       NAME       | USER  |  STATUS   |SUBMIT TIME  | DURATION |    COST    |
+--------+------------------+-------+-----------+-------------+----------+------------+
| O7w... | cactus-m5.xlarge | admin | Completed | ...9:54:59Z | 2m54s    | 0.0031 USD |
| L17... | cactus-m5.xlarge | admin | Completed | ...9:58:22Z | 5m21s    | 0.0061 USD |
+--------+------------------+-------+-----------+-------------+----------+------------+
(edited)
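
You can also review the run from the Nextflow side. The standard nextflow log command lists recent runs, and passing a run name (for example, prickly_watson from the output above) lists the work directories of its tasks:

$ ./nextflow log
$ ./nextflow log prickly_watson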

Run an RNA Sequencing Workflow

This example (adapted from nextflow.io) uses publicly available data. Get the data here. For simple configuration, place this data in the shared working directory in a folder called /mnt/memverge/shared/nextflowtest/data/ggal if you are following the NFS example or in s3://nfshareddir/ggal if you are following the S3 bucket example. In general, input data is not stored in the shared working directory; for example, input data is often located in a publicly accessible S3 bucket.
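
For example, after downloading and unpacking the data on the Nextflow Host, you can copy the ggal folder into place (the paths match the NFS and S3 examples in this document; adjust them to suit your setup):

# NFS example
mkdir -p /mnt/memverge/shared/nextflowtest/data
cp -r ggal /mnt/memverge/shared/nextflowtest/data/

# S3 example
aws s3 cp ggal s3://nfshareddir/ggal --recursive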

  • Build a nextflow job file called rnaseqnfs.nf with the following content if you are following the NFS server example. If you are following the S3 example, create a folder called results in s3://nfshareddir and replace /mnt/memverge/shared/nextflowtest/data with s3://nfshareddir and replace /mnt/memverge/shared/results with s3://nfshareddir/results.

    The rnaseq-nf image is not a "built-in" image in the OpCenter App Library. Specifying the URI for the rnaseq-nf image in the job file causes the OpCenter to pull the latest version of the image from the default registry.

    $ cat rnaseqnfs.nf
    #!/usr/bin/env nextflow
    
    params.reads = "/mnt/memverge/shared/nextflowtest/data/ggal/ggal_gut_{1,2}.fq"
    params.transcriptome = "/mnt/memverge/shared/nextflowtest/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
    params.outdir = "/mnt/memverge/shared/results"
    
    workflow {
        read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
    
        INDEX(params.transcriptome)
        FASTQC(read_pairs_ch)
        QUANT(INDEX.out, read_pairs_ch)
    }
    
    process INDEX {
        executor = 'float'
        container = 'nextflow/rnaseq-nf'
        cpus = '4'
        memory = '16 GB'
        tag "$transcriptome.simpleName"
    
        input:
        path transcriptome
    
        output:
        path 'index'
    
        script:
        """
        salmon index --threads $task.cpus -t $transcriptome -i index
        """
    }
    
    process FASTQC {
        executor = 'float'
        container = 'nextflow/rnaseq-nf'
        cpus = '4'
        memory = '16 GB'
        tag "FASTQC on $sample_id"
        publishDir params.outdir
    
        input:
        tuple val(sample_id), path(reads)
    
        output:
        path "fastqc_${sample_id}_logs"
    
        script:
        """
        mkdir fastqc_${sample_id}_logs
        fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
        """
    
    }
    
    process QUANT {
        executor = 'float'
        container = 'nextflow/rnaseq-nf'
        cpus = '4'
        memory = '16 GB'
        tag "$pair_id"
        publishDir params.outdir
    
        input:
        path index
        tuple val(pair_id), path(reads)
    
        output:
        path pair_id
    
        script:
        """
        salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
        """
    }
    
  • Run the workflow by entering

    $ ./nextflow run rnaseqnfs.nf -c nextflownfs.config -cache false
    N E X T F L O W  ~  version 23.04.2
    Launching `rnaseqnfs.nf` [soggy_mccarthy] DSL2 - revision: fca2fdc7d3
    executor >  float (3)
    [10/7c2cc3] process > INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
    [d6/8c9f6b] process > FASTQC (FASTQC on ggal_gut)      [100%] 1 of 1 ✔
    [2f/378946] process > QUANT (ggal_gut)                 [100%] 1 of 1 ✔
    Completed at: 02-Aug-2023 21:38:56
    Duration    : 6m 16s
    CPU hours   : 0.2
    Succeeded   : 3
    

    This nextflow job file defines three processes that use MMCloud as the executor. You can confirm that these processes execute on MMCloud.

    $ float squeue -f image='rnaseq-nf-np6l7z' -f status='Completed'
    +-----------+------------+-------+-----------+-----------+---------+----------+
    |  ID       | NAME       | USER  |  STATUS   |SUBMIT TIME| DURATION|   COST   |
    +-----------+------------+-------+-----------+-----------+---------+----------+
    | IvwWWY... |rnaseq-nf...| admin | Completed |21:52:33Z  | 2m38s   |0.0051 USD|
    | lakXQg... |rnaseq-nf...| admin | Completed |21:52:35Z  | 2m39s   |0.0052 USD|
    | SyVSmL... |rnaseq-nf...| admin | Completed |21:55:08Z  | 2m28s   |0.0048 USD|
    +-----------+------------+-------+-----------+-----------+---------+----------+
    (edited)
    
  • View the output at /mnt/memverge/shared/results (or s3://nfshareddir/results).

    $ ls
    fastqc_ggal_gut_logs  ggal_gut
    

    Some of the output is in HTML format, for example:

    Example Output from RNA Sequencing Workflow

Integration with Seqera Platform

Seqera Platform is a product from Seqera Labs that is used to launch, monitor, and manage computational pipelines from a web interface. You can also launch a Nextflow pipeline using the CLI on the Nextflow Host and monitor the progress in a cloud-hosted Seqera Platform instance provided by Seqera Labs. Instructions are available here.

  • Sign in to Seqera Platform. If you do not have an account, follow the instructions to register.
  • Create an access token using the procedure described here. Copy the access token to your clipboard.
  • From a terminal on the Nextflow Host, enter:

    $ export TOWER_ACCESS_TOKEN=eyxxxxxxxxxxxxxxxQ1ZTE=
    

    Replace eyxxxxxxxxxxxxxxxQ1ZTE= with the access token you copied to the clipboard.

  • Launch your Nextflow pipeline with the -with-tower flag. For example:

    $ nextflow run rnaseqs3.nf -c nextflows3.config -cache false -with-tower
    N E X T F L O W  ~  version 23.04.2
    Launching `rnaseqs3.nf` [big_ramanujan] DSL2 - revision: 9c7f478123
    Monitor the execution with Nextflow Tower using this URL: https://tower.nf/user/user_name/watch/1rjpVakrhQ3wAf
    executor >  float (3)
    [31/94c152] process > INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
    [1e/3c939b] process > FASTQC (FASTQC on ggal_gut)      [100%] 1 of 1 ✔
    [e7/6ceef4] process > QUANT (ggal_gut)                 [100%] 1 of 1 ✔
    Completed at: 02-Aug-2023 23:35:06
    Duration    : 5m 34s
    CPU hours   : (a few seconds)
    Succeeded   : 3
    
  • Open a browser and go to the URL.

Seqera Platform

Running the Nextflow Host as an MMCloud Job

Nextflow requires a Nextflow host to run the Nextflow executable. There are multiple ways of creating a Nextflow host:

  • Standalone Linux server that you instantiate
  • A containerized application that runs as an MMCloud job until you cancel the job (call this the "persistent Nextflow host")
  • A containerized application that runs as an MMCloud job for the duration of the Nextflow pipeline and is then automatically canceled (call this the "transient Nextflow host")

MMCloud includes a template for creating a persistent Nextflow host and a container image for creating a transient Nextflow host. Both these solutions automatically create a JuiceFS file system as the shared work space required by Nextflow. With a configuration change, Fusion can be used instead of JuiceFS.

Persistent Nextflow Host in AWS

To deploy a persistent Nextflow host with JuiceFS enabled in AWS, complete the following steps.

  • Log in to your AWS Management console
  • Create a security group to allow inbound access to port 6868 (the port used by JuiceFS). Copy the security group ID (it is a string that looks like sg-0054f1eaadec3bc76).
  • Create an S3 bucket to support JuiceFS. Copy the URL for the S3 bucket (for example, https://juicyfsforcedric.s3.amazonaws.com)

    Note

    Do not include any folders in the S3 bucket URL.

  • Start the Nextflow host by entering the following command.

    float submit --template nextflow:jfs -n JOBNAME -e BUCKET=BUCKETURL --migratePolicy [disable=true] --securityGroup SG_ID

    Replace:

    • JOBNAME: name to associate with job
    • BUCKETURL: URL to locate S3 bucket
    • SG_ID: security group ID

    Example:

    float submit --template nextflow:jfs -n tjfs -e BUCKET=https://juicyfsforcedric.s3.amazonaws.com --migratePolicy [disable=true] --securityGroup sg-0054f1eaadec3bc76
    
  • If security credentials are required to access the S3 bucket, add the following options to the float submit command.

    -e BUCKET_ACCESS_KEY={secret:KEY_NAME} -e BUCKET_SECRET_KEY={secret:SECRET_NAME}

    and replace:

    • KEY_NAME: name associated with access key ID
    • SECRET_NAME: name associated with the secret access key
  • Keep entering float list until the status of the job with the name JOBNAME changes to executing. Copy the ID associated with this job.
  • Retrieve the ssh key for this host by entering the following command.

    float secret get JOB_ID_SSHKEY > jfs_ssh.key
    

    Replace JOB_ID with the job ID associated with this job (the job ID is prepended to '_SSHKEY').

    Example:

    float secret get S2zliQLp7NnNjFeUeVjOe_SSHKEY > jfs_ssh.key
    
  • Change the permissions on the ssh key file by entering the following.

    chmod 600 jfs_ssh.key
    
  • Establish an ssh session with the Nextflow host by entering the following.

    ssh -i jfs_ssh.key USER@NEXTFLOW_HOST_IP
    

    Replace:

    • USER: username of the user who submitted the job to create the Nextflow host. If admin submitted the job, use nextflow as the username.
    • NEXTFLOW_HOST_IP: public IP address of the Nextflow host.
  • Check that you are in the correct working directory, that the environment variables are set, and that the configuration template is available.
    # pwd
    /mnt/jfs/nextflow
    # env|grep HOME
    HOME=/mnt/jfs/nextflow
    NXF_HOME=/mnt/jfs/nextflow
    # ls
    mmcloud.config.template
    
  • Make a copy of the template file by entering the following.

    cp mmcloud.config.template mmcloud.config
    
  • Edit the config file as follows.

    # cat mmcloud.config
    plugins {
      id 'nf-float'
    }
    
    workDir = '/mnt/jfs/nextflow'
    
    process {
        executor = 'float'
        errorStrategy = 'retry'
        extra ='  --dataVolume [opts=" --cache-dir /mnt/jfs_cache "]jfs://NEXTFLOW_HOST_PRIVATE_IP:6868/1:/mnt/jfs --dataVolume [size=120]:/mnt/jfs_cache'
    }
    
    podman.registry = 'quay.io'
    
    float {
        address = 'OPCENTER_PRIVATE_IP:443'
        username = 'USERNAME'
        password = 'PASSWORD'
    }
    
    // AWS access info if needed
    aws {
      client {
        maxConnections = 20
        connectionTimeout = 300000
      }
    /*
      accessKey = 'BUCKET_ACCESS_KEY'
      secretKey = 'BUCKET_SECRET_KEY'
    */
    }
    

    Replace:

    • NEXTFLOW_HOST_PRIVATE_IP: private IP address of the Nextflow host.
    • OPCENTER_PRIVATE_IP: private IP address of the OpCenter.
    • USERNAME and PASSWORD: credentials to log in to the OpCenter.
    • If needed, uncomment the block containing the S3 bucket credentials and insert values for BUCKET_ACCESS_KEY and BUCKET_SECRET_KEY.

You are now ready to submit a Nextflow pipeline following the usual procedure.
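
For example (PIPELINE is a placeholder for your Nextflow job file or a remote pipeline such as an nf-core pipeline; nextflow is already on the PATH of a template-created host):

cd /mnt/jfs/nextflow
nextflow run PIPELINE -c mmcloud.config -cache false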

Note

Upon completion, each Nextflow pipeline leaves a working directory and other related files and directories in the JuiceFS file system, which maps to many small data chunks in the specified S3 bucket. When the Nextflow host is deleted, these data chunks remain in the S3 bucket, but are not readable. It is recommended that you periodically delete the working directory and related files and directories. Delete all files and directories before deleting the Nextflow host.

Example: Running an nf-core/sarek pipeline

  • Sign in to Nextflow Tower. If you do not have an account, follow the instructions to register.
  • Create an access token using the procedure described here. Copy the access token to your clipboard.
  • Start a tmux session by entering the following.

    tmux new -s SESSION_NAME
    

    Replace SESSION_NAME with name to associate with tmux session.

    Example:

    tmux new -s nfjob
    

    Note

    If the ssh session disconnects, re-establish the connection and reattach to the tmux session by entering the following.

    tmux attach -t SESSION_NAME
    
  • At the terminal prompt, enter:

    export TOWER_ACCESS_TOKEN=eyxxxxxxxxxxxxxxxQ1ZTE=
    

    where eyxxxxxxxxxxxxxxxQ1ZTE= is the access token you copied to the clipboard.

  • Run the pipeline by entering the following command.

    nextflow run nf-core/sarek -c mmcloud.config -profile test_full --outdir 's3://OUTPUT_BUCKET' -cache false -with-tower
    

    Replace OUTPUT_BUCKET with the S3 bucket (or S3 bucket/folder) where output is written to (you must have rw access to this bucket).

  • Open a browser and go to the Tower monitoring console.

  • Click the Runs tab and select your job.
  • (Optional) When the pipeline completes, delete the working directory and related files.

    Example:

    rm -r 0*
    rm *.tsv
    

Before deleting the Nextflow host, delete all files in the JuiceFS file system and then unmount the JuiceFS file system by entering the following commands.

rm -rf /mnt/jfs/*
umount /mnt/jfs

Transient Nextflow Host in AWS

To deploy a transient Nextflow host with JuiceFS enabled in AWS, complete the following steps.

  • Log in to your AWS Management console.
  • Create a security group to allow inbound access to port 6868 (the port used by JuiceFS). Copy the security group ID (it is a string that looks like sg-0054f1eaadec3bc76).
  • Create an S3 bucket to support JuiceFS.
  • On your local machine, create a directory to act as the home directory for your Nextflow pipeline, and then cd to this directory.
  • Download the host-init script from MemVerge's public repository by entering the following command (you don't need to edit this script, but you use it later).

    wget https://mmce-data.s3.amazonaws.com/juiceflow/v1/aws/transient_JFS_AWS.sh

  • Download the job submit script from MemVerge's public repository by entering the following command (you need to edit this script).

    wget https://mmce-data.s3.amazonaws.com/juiceflow/v1/aws/job_submit_AWS.sh

  • Edit job_submit_AWS.sh to customize it for your Nextflow pipeline. Here is an example that runs a simple "Hello World" pipeline.

    Note

    The transient Nextflow host runs inside a container in a worker node. The job_submit_AWS.sh script executes a nextflow run command with its associated Nextflow script and configuration files. The Nextflow script and configuration files must be accessible inside the container. One way to accomplish this is to copy the Nextflow script and configuration files from an S3 bucket to a local volume mounted by the container. In the example shown, the files are embedded in job_submit_AWS.sh as here-documents.

    #!/bin/bash
    
    # ---- User Configuration Section ----
    # These configurations must be set by the user before running the script.
    
    # ---- Optional Configuration Section ----
    # These configurations are optional and can be customized as needed.
    
    # JFS (JuiceFS) Private IP: Retrieved from the WORKER_ADDR environment variable.
    jfs_private_ip=$(echo $WORKER_ADDR)
    
    # Work Directory: Defines the root directory for working files. Optional suffix can be added.
    workDir_suffix=''
    workDir='/mnt/jfs/'$workDir_suffix
    mkdir -p $workDir  # Ensures the working directory exists.
    cd $workDir  # Changes to the working directory.
    export NXF_HOME=$workDir  # Sets the NXF_HOME environment variable to the working directory.
    
    # ---- Nextflow Configuration File Creation ----
    # This section creates a Nextflow configuration file with various settings for the pipeline execution.
    
    outbucket=$(echo $OUTBUCKET)
    
    # Use cat to create or overwrite the mmc.config file with the desired Nextflow configurations.
    # NOTE: S3 keys and OpCenter information appended to the end of the config file. No need to add them now
    # Modify the vmPolicy parameters as needed
    cat > mmc.config << EOF
    // enable nf-float plugin.
    plugins {
        id 'nf-float'
    }
    
    // Process settings: Executor, error strategy, and resource allocation specifics.
    process {
        executor = 'float'
        errorStrategy = 'retry'
        extra = '--dataVolume [opts=" --cache-dir /mnt/jfs_cache "]jfs://${jfs_private_ip}:6868/1:/mnt/jfs --dataVolume [size=120]:/mnt/jfs_cache --vmPolicy [spotOnly=true,retryLimit=10,retryInterval=300s]'
    }
    
    // Directories for Nextflow execution.
    workDir = '${workDir}'
    launchDir = '${workDir}'
    
    EOF
    
    
    cat > hw.nf << EOF
    #!/usr/bin/env nextflow
    
    process sayHello {
        container = 'docker.io/memverge/cactus'
        cpus = '4'
        memory = '8 GB'
    
        publishDir '${outbucket}/hwout', mode: 'copy', overwrite: true
    
        output:
            path 'sequences.txt'
    
        """
        echo 'Hello World! This is a test of JuiceFlow using a transient head node.' > sequences.txt
        """
    }
    
    
    workflow {
        sayHello()
    }
    EOF
    
    # ---- Data Preparation ----
    # Use this section to copy essential files from S3 to the working directory.
    
    # For example, copy the sample sheet and params.yml from S3 to the current working directory.
    # aws s3 cp s3://nextflow-input/samplesheet.csv .
    # aws s3 cp s3://nextflow-input/scripts/params.yml .
    # Copy your nextflow job file (with extension nf) into the container (for example, from S3)
    # The example shown uses a here file to create a simple hello world job
    
    # ---- Nextflow Command Setup ----
    # Important: The -c option appends the mmc config file and soft overrides the nextflow configuration.
    
    # Assembles the Nextflow command with all necessary options and parameters.
    # This example uses a simple hello world job
    nextflow_command='nextflow run hw.nf \
    --outdir $outbucket \
    -c mmc.config '
    
    # -------------------------------------
    # ---- DO NOT EDIT BELOW THIS LINE ----
    # -------------------------------------
    # The following section contains functions and commands that should not be modified by the user.
    
    function install_float {
    # Install float
    local address=$(echo "$FLOAT_ADDR" | cut -d':' -f1)
    wget https://$address/float --no-check-certificate --quiet
    chmod +x float
    }
    
    function get_secret {
    input_string=$1
    local address=$(echo "$FLOAT_ADDR" | cut -d':' -f1)
    secret_value=$(./float secret get $input_string -a $address)
    if [[ $? -eq 0 ]]; then
        # Have this secret, will use the secret value
        echo $secret_value
        return
    else
        # Don't have this secret, will still use the input string
        echo $1
    fi
    }
    
    function remove_old_metadata () {
    echo $(date): "First finding and removing old metadata..."
    if [[ $BUCKET == *"amazonaws.com"* ]]; then
        # If default `amazonaws.com` endpoint url
        S3_MOUNT=s3://$(echo $BUCKET | sed 's:.*/::' | awk -F'[/.]s3.' '{print $1}')
    else
        # If no 'amazonaws.com,' the bucket is using a custom endpoint
        local bucket_name=$(echo $BUCKET | sed 's:.*/::' | awk -F'[/.]s3.' '{print $1}')
        S3_MOUNT="--endpoint-url $(echo "${BUCKET//$bucket_name.}") s3://$bucket_name"
    fi
    # If a previous job id was given, we use that as the old metadata
    if [[ ! -z $PREVIOUS_JOB_ID ]]; then
        echo $(date): "Previous job id $PREVIOUS_JOB_ID specified. Looking for metadata file in bucket..."
        FOUND_METADATA=$(aws s3 ls $S3_MOUNT | grep "$PREVIOUS_JOB_ID.meta.json.gz" | awk '{print $4}')
    fi
    
    if [[ -z "$FOUND_METADATA" ]]; then
        # If no previous job id was given, there is no old metadata to remove.
        echo $(date): "No previous metadata dump found. Continuing with dumping current JuiceFs"
    else
        echo $(date): "Previous metadata dump found! Removing $FOUND_METADATA"
        aws s3 rm $S3_MOUNT/$FOUND_METADATA
        echo $(date): "Previous metadata $FOUND_METADATA removed"
    fi
    
    }
    
    function dump_and_cp_metadata() {
    echo $(date): "Attempting to dump JuiceFS data"
    
    if [[ -z "$FOUND_METADATA" ]]; then
        # If no previous metadata was found, use the current job id
        juicefs dump redis://$(echo $WORKER_ADDR):6868/1 $(echo $FLOAT_JOB_ID).meta.json.gz --keep-secret-key
        echo $(date): "JuiceFS metadata $FLOAT_JOB_ID.meta.json.gz created. Copying to JuiceFS Bucket"
        aws s3 cp "$(echo $FLOAT_JOB_ID).meta.json.gz" $S3_MOUNT
    else
        # If previous metadata was found, use the id of the previous metadata
        # This means for all jobs that use the same mount, their id will always be their first job id
        metadata_name=$PREVIOUS_JOB_ID
        juicefs dump redis://$(echo $WORKER_ADDR):6868/1 $(echo $metadata_name).meta.json.gz --keep-secret-key
        echo $(date): "JuiceFS metadata $metadata_name.meta.json.gz created. Copying to JuiceFS Bucket"
        aws s3 cp "$(echo $metadata_name).meta.json.gz" $S3_MOUNT
    fi
    
    echo $(date): "Copying to JuiceFS Bucket complete!"
    }
    
    function copy_nextflow_log() {
    echo $(date): "Copying .nextflow.log to bucket.."
    if [[ ! -z $PREVIOUS_JOB_ID ]]; then
        aws s3 cp ".nextflow.log" $S3_MOUNT/$PREVIOUS_JOB_ID.nextflow.log
        echo $(date): "Copying .nextflow.log complete! You can find it with aws s3 ls $S3_MOUNT/$PREVIOUS_JOB_ID.nextflow.log"
    else
        aws s3 cp ".nextflow.log" $S3_MOUNT/$(echo $FLOAT_JOB_ID).nextflow.log
        echo $(date): "Copying .nextflow.log complete! You can find it with aws s3 ls $S3_MOUNT/$(echo $FLOAT_JOB_ID).nextflow.log"
    fi
    }
    
    # Variables
    S3_MOUNT=""
    FOUND_METADATA=""
    
    # Functions pre-Nextflow run
    # AWS S3 Access and Secret Keys: For accessing S3 buckets.
    install_float 
    access_key=$(get_secret AWS_BUCKET_ACCESS_KEY)
    secret_key=$(get_secret AWS_BUCKET_SECRET_KEY)
    export AWS_ACCESS_KEY_ID=$access_key
    export AWS_SECRET_ACCESS_KEY=$secret_key
    
    opcenter_ip_address=$(get_secret OPCENTER_IP_ADDRESS)
    opcenter_username=$(get_secret OPCENTER_USERNAME)
    opcenter_password=$(get_secret OPCENTER_PASSWORD)
    
    # Append to config file
    cat <<EOT >> mmc.config
    
    // OpCenter connection settings.
    float {
        address = '${opcenter_ip_address}'
        username = '${opcenter_username}'
        password = '${opcenter_password}'
    }
    
    // AWS S3 Client configuration.
    aws {
    client {
        maxConnections = 20
        connectionTimeout = 300000
    }
    accessKey = '${access_key}'
    secretKey = '${secret_key}'
    }
    EOT
    
    # Create side script to tag head node - exits when properly tagged
    cat > tag_nextflow_head.sh << EOF
    #!/bin/bash
    
    runname="\$(cat .nextflow.log 2>/dev/null | grep nextflow-io-run-name | head -n 1 | grep -oP '(?<=nextflow-io-run-name:)[^ ]+')"
    workflowname="\$(cat .nextflow.log 2>/dev/null | grep nextflow-io-project-name | head -n 1 | grep -oP '(?<=nextflow-io-project-name:)[^ ]+')"
    
    while true; do
    
    # Runname and workflowname will be populated at the same time
    # If the variables are populated, tag the head node and break out of the loop
    if [ ! -z \$runname ]; then
        ./float modify -j "$(echo $FLOAT_JOB_ID)" --addCustomTag run-name:\$runname 2>/dev/null
        ./float modify -j "$(echo $FLOAT_JOB_ID)" --addCustomTag workflow-name:\$workflowname 2>/dev/null
        break
    fi
    
    runname="\$(cat .nextflow.log 2>/dev/null | grep nextflow-io-run-name | head -n 1 | grep -oP '(?<=nextflow-io-run-name:)[^ ]+')"
    workflowname="\$(cat .nextflow.log 2>/dev/null | grep nextflow-io-project-name | head -n 1 | grep -oP '(?<=nextflow-io-project-name:)[^ ]+')"
    
    sleep 1s
    
    done
    EOF
    
    # Start tagging side-script
    chmod +x ./tag_nextflow_head.sh
    ./tag_nextflow_head.sh &
    
    # Start Nextflow run
    $nextflow_command
    
    if [[ $? -ne 0 ]]; then
    echo $(date): "Nextflow command failed."
    remove_old_metadata
    dump_and_cp_metadata
    copy_nextflow_log
    exit 1
    else 
    echo $(date): "Nextflow command succeeded."
    remove_old_metadata
    dump_and_cp_metadata
    copy_nextflow_log
    exit 0
    fi
    
  • Use the float secret command to store sensitive variables.

    float secret set OPCENTER_IP_ADDRESS OC_PRIVATE_IP
    float secret set OPCENTER_USERNAME NAME
    float secret set OPCENTER_PASSWORD PASSWORD
    float secret set AWS_BUCKET_ACCESS_KEY KEY
    float secret set AWS_BUCKET_SECRET_KEY SECRET
    
    Replace:

    • OC_PRIVATE_IP: private IP address of OpCenter
    • NAME and PASSWORD: credentials to access OpCenter
    • KEY and SECRET: credentials to access S3 bucket
  • Submit the Nextflow pipeline as an MMCloud job. For simplicity, you can insert the float submit command (with options) into a shell script.

    $ cat run_flow.sh
    float submit --hostInit transient_JFS_AWS.sh \
    -i docker.io/memverge/juiceflow \
    --vmPolicy '[onDemand=true]' \
    --migratePolicy '[disable=true]' \
    --dataVolume '[size=60]:/mnt/jfs_cache' \
    --dirMap /mnt/jfs:/mnt/jfs \
    -c 2 -m 4 \
    -n JOB_NAME \
    --securityGroup SG_ID \
    --env BUCKET=https://BUCKET_NAME.s3.REGION.amazonaws.com \
    --env 'OUTBUCKET=s3://OUTBUCKET_NAME' \
    -j job_submit_AWS.sh
    $ chmod +x run_flow.sh
    $ ./run_flow.sh
    

    Replace:

    • JOB_NAME: name to associate with transient Nextflow host
    • SG_ID: security group ID to open port 6868
    • BUCKET_NAME: S3 bucket you created for this pipeline
    • REGION: region where S3 bucket is located
    • OUTBUCKET_NAME: S3 bucket where results are written
  • Check that the Nextflow pipeline completes successfully.

    The Nextflow log is written to the S3 bucket you created for this pipeline. Find the name of the log file by viewing the contents of stdout.autosave for JOB_NAME, for example,

    ...
    Wed Jul 24 21:30:52 UTC 2024: Copying .nextflow.log to bucket..
    Completed 29.3 KiB/29.3 KiB (240.7 KiB/s) with 1 file(s) remaining
    upload: ./.nextflow.log to s3://welcometojuicefs/21yjg22ze9qlt74ls69k5.nextflow.log
    Wed Jul 24 21:30:56 UTC 2024: Copying .nextflow.log complete! You can find it with aws s3 ls s3://welcometojuicefs/21yjg22ze9qlt74ls69k5.nextflow.log
    [edited]
    

    For this simple "Hello World" pipeline, the Nextflow job creates one executor to run the "Hello World" script. Check the contents of hwout/sequences.txt in the S3 bucket you used for OUTBUCKET_NAME.

    Hello World! This is a test of JuiceFlow using a transient head node.
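
    For example, assuming the AWS CLI is configured, you can list the output folder and stream the file to the terminal:

    aws s3 ls s3://OUTBUCKET_NAME/hwout/
    aws s3 cp s3://OUTBUCKET_NAME/hwout/sequences.txt -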
    

Using Fusion with MMCloud

Fusion is only used with Nextflow pipelines. Using Nextflow with MMCloud requires nf-float, the Nextflow plugin for MMCloud. A description of nf-float, its use, and the configuration changes required to use Fusion is available here.

When Fusion is used with MMCloud, SpotSurfer and WaveRider are not supported. To turn off WaveRider and to specify On-demand Instances when using Fusion, use the following Nextflow configuration file.

plugins {
  id 'nf-float'
}

workDir = 's3://S3_BUCKET'

// limit concurrent requests sent to MMCE
// by default, it's 100
executor {
    queueSize = 20
}

podman.registry = 'quay.io'

process {
    executor = 'float'
    errorStrategy = 'retry'
    cpus = 2
    memory = '4 GB'
}

wave {
  enabled = true
}

fusion {
  enabled                  = true
  exportStorageCredentials = true
  exportAwsAccessKeys      = true
}

float {
    address = 'OPCENTER_PRIVATE_IP'
    username = 'USERNAME'
    password = 'PASSWORD'
    vmPolicy = [
        onDemand: true,
        retryLimit: 3,
        retryInterval: '10m'
    ]
    migratePolicy = [
        disable: true
    ]
}

aws {
    accessKey = 'BUCKET_ACCESS_KEY'
    secretKey = 'BUCKET_SECRET_KEY'
}

Troubleshooting

As the nextflow job runs, log messages are written to a log file called .nextflow.log, which is created in the directory from which the nextflow run command is launched.
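
For example, to follow the log while a job runs or to search it afterward for problems:

tail -f .nextflow.log
grep -iE 'error|warn' .nextflow.log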