Using Nextflow with MMCloud

Summary

Nextflow is a workflow manager used to run and manage computational pipelines such as those found in bioinformatics. Workflow managers make it easier to run complex analyses where there are a series of — sometimes interconnected — tasks, each of which may involve different software and dependencies.

Nextflow provides a framework for describing how a workflow must be executed and includes a CLI for issuing nextflow commands. The execution environment for each task is described using the Nextflow DSL (Domain-Specific Language). In Nextflow terminology, each task is assigned to an "executor," the component that determines where and how that step in the analysis runs.

By attaching a MemVerge-developed plugin to a workflow, Nextflow can use MMCloud as an "executor." From an MMCloud point of view, the execution task that Nextflow assigns to it is an independent job that it runs just like any other batch job. The Nextflow user gains the benefits of MMCloud, such as reduced costs and shorter execution times.

This document describes how to use Nextflow with MMCloud so that Nextflow can schedule one or more (or all) of the tasks in a workflow to run on MMCloud. Examples are used to demonstrate the principles; you can adapt and modify as needed to fit your workflow.

Configuration

A Nextflow workflow requires a working directory where temporary files are stored and where a process can access the output of an earlier step. When MMCloud is engaged as an executor, the OpCenter instantiates a Worker Node (a container running in its own virtual machine) for each step in the process pipeline. Every Worker Node and the Nextflow Host (the host where the nextflow binary is installed) must have access to the same working directory — for example, the working directory can be an NFS-mounted directory or an S3 bucket. The figure below shows a configuration where an NFS Server provides shared access to the working directory and also acts as a repository for input data and the final output.

Figure 1. Nextflow Configuration using NFS Server

The Nextflow configuration file describes the environment for each executor. To use MMCloud as an executor, the configuration file must contain definitions for:
  • Nextflow plugin (source code and documentation are available here)
  • Working directory (directory where the Nextflow Host and all the Worker Nodes have r/w access)
  • IP address of the OpCenter
  • Login credentials for the OpCenter (login credentials can also be provided using environment variables or the OpCenter secret manager)
  • Location of the shared directory if using an NFS server (not needed if using S3)

The operation of Nextflow using an S3 bucket as the working directory is shown in the following figure.

Figure 2. Nextflow Configuration using S3 Bucket

Operation

The Nextflow job file (a file with extension .nf) describes the workflow and specifies the executor for each process. When the user submits a job using the nextflow run command (as shown in the figure), any process with executor defined as "float" is scheduled for the OpCenter. Combining information from the configuration file and the job file, the Nextflow plugin formulates a float submit command string and submits the job to the OpCenter. This procedure is repeated for every task in the workflow that has "float" as the executor. Every Worker Node mounts the same working directory so that the Nextflow Host and all the Worker Nodes read from, and write to, the same shared directory.
Note: CSPs impose limits on services instantiated by each account. In AWS, these limits are called "service quotas" and apply to every AWS service, generally on a region-by-region basis. Some Nextflow pipelines instantiate enough compute instances to exceed the AWS EC2 service quota. If this happens, increase your AWS EC2 service quota and rerun the pipeline.
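If you hit this limit, you can inspect your quotas and request an increase from the AWS console or with the AWS CLI. The commands below are a sketch using the standard Service Quotas CLI; the quota code and desired value are placeholders that you supply for your account and region.

# List the EC2 service quotas in the active region.
aws service-quotas list-service-quotas --service-code ec2
# Request an increase for a specific quota.
aws service-quotas request-service-quota-increase --service-code ec2 --quota-code <quota_code> --desired-value <new_limit>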
Figure 3. Nextflow Operation with MMCloud

Requirements

To use Nextflow with MMCloud, you need the following:
  • MMCloud Carmel 2.0 release or later
  • Running instance of OpCenter with valid license
  • Linux virtual machine running in the same VPC as the OpCenter (call this the Nextflow Host)
  • On the Nextflow Host:
    • Java 11 or later release (the latest Long Term Support release is Java 17)
    • MMCloud CLI binary. You can download it from the OpCenter.
    • Nextflow
    • Nextflow configuration file
    • Nextflow job file
  • (Optional) NFS Server to provide shared working directory. There are other possibilities; for example, the shared working directory can be hosted on the Nextflow Host or the shared working directory can be mounted directly from AWS S3.

Prepare the Nextflow Host

The Nextflow Host is a Linux virtual machine running in the same VPC as the OpCenter. If the Nextflow Host is in a different VPC subnet, ensure that the Nextflow Host can reach the OpCenter and that it can mount the file system from the NFS Server (if used).

All network communication among the OpCenter, the Nextflow Host, NFS Server (if used), and Worker Nodes must use private IP addresses. If the Nextflow Host uses an NFS-mounted file system as the working directory, ensure that any firewall rules allow access to port 2049 (the port used by NFS).
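Before installing anything, you can optionally verify basic reachability from the Nextflow Host. The commands below are a generic sketch; replace the placeholders with your private IP addresses.

# Confirm that the OpCenter web service responds (self-signed certificates are common, hence -k).
curl -k -s -o /dev/null -w "%{http_code}\n" https://<op_center_ip_address>
# Confirm that the NFS port is reachable on the NFS Server (if used).
nc -zv <nfs_server_ip_address> 2049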

  • Check the version of java installed on the Nextflow Host by entering:
    java -version
    openjdk version "17.0.6-ea" 2023-01-17 LTS
    OpenJDK Runtime Environment (Red_Hat-17.0.6.0.9-0.4.ea.el9) (build 17.0.6-ea+9-LTS)
    OpenJDK 64-Bit Server VM (Red_Hat-17.0.6.0.9-0.4.ea.el9) (build 17.0.6-ea+9-LTS, mixed mode, sharing)
  • If needed, install Java 11 or later. Commercial users of Oracle Java need a subscription. Alternatively, you can install OpenJDK under an open-source license by entering (on a Red Hat-based Linux system):

    sudo dnf install java-17-openjdk

  • Install nextflow by entering:

    sudo curl -s https://get.nextflow.io | bash

    This installs nextflow in the current directory. The installation described here assumes that you install nextflow in your home directory. You can also create a separate directory for your nextflow installation, for example, mkdir ~/nextflow

  • Check your nextflow installation by entering:
    ./nextflow run hello
    N E X T F L O W  ~  version 23.04.2
    Launching `https://github.com/nextflow-io/hello` [voluminous_liskov] DSL2 - revision: 1d71f857bb [master]
    executor >  local (4)
    [13/1bb6ed] process > sayHello (3) [100%] 4 of 4 ✔
    Bonjour world!
    
    Ciao world!
    
    Hola world!
    
    Hello world!
    

    If this job does not run, check the log called .nextflow.log.

  • Upgrade to the latest version of Nextflow by entering ./nextflow self-update
  • Download the OpCenter CLI binary for Linux hosts from the following URL:

    https://<op_center_ip_address>/float

    where <op_center_ip_address> is the public (if you are outside the VPC) or private (if you are inside the VPC) IP address of the OpCenter. You can click the link to download the CLI binary (called float), or you can enter:

    wget https://<op_center_ip_address>/float --no-check-certificate

    If you download the CLI binary to your local machine, move the file to the Nextflow Host.

  • Make the CLI binary file executable and add the path to the CLI binary file to your PATH variable.
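
    One way to do this, assuming the float binary is in your current directory (a sketch; adjust paths to your setup):

    # Make the binary executable.
    chmod +x float
    # Move it to a directory that is already on PATH.
    sudo mv float /usr/local/bin/
    # Alternatively, keep float where it is and extend PATH in your shell profile:
    # export PATH=$PWD:$PATH
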
Note: You can use the float submit --template nextflow:jfs command to create a Nextflow host with all the required software installed (including JuiceFS). Contact your MemVerge support team for additional details.

(Optional) Prepare the Working Directory Host

The Nextflow Host and the Worker Nodes must have access to a shared working directory. There are several ways to achieve this. In the example shown here, a separate Linux virtual machine (the NFS Server) is started in the same VPC as the OpCenter.

Alternatively, you can edit the Nextflow configuration file to automatically mount an S3 bucket as a filesystem. Instructions on how to do this are in the next section.

You can obtain instructions on turning a generic CentOS-based server into an NFS server from this link. NFS uses port 2049 for connections, so ensure that any firewall rules allow access to port 2049. If the Working Directory Host is in a different VPC subnet, ensure that it can reach the Nextflow Host and the Worker Nodes. Set the subnet mask in /etc/exports to allow the Nextflow Host and Worker Nodes to mount file systems from the Working Directory Host.

For example:
cat /etc/exports
/mnt/memverge/shared 172.31.0.0/16(rw,sync,no_root_squash)
  • Log in to the NFS Server and create the shared working directory.
    sudo mkdir /mnt/memverge/shared
    sudo chmod ugo+r+w+x /mnt/memverge/shared
  • Log in to the Nextflow Host and mount the shared working directory (use the NFS Server's private IP address). Use df to check that the volume mounted successfully.
    sudo mkdir /mnt/memverge/shared
    sudo mount -t nfs <nfs_server_ip_address>:/mnt/memverge/shared /mnt/memverge/shared
    df
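
    To make the mount persist across reboots, you can optionally add a line like the following to /etc/fstab on the Nextflow Host (a sketch; replace the placeholder with the NFS Server's private IP address):

    <nfs_server_ip_address>:/mnt/memverge/shared  /mnt/memverge/shared  nfs  defaults,_netdev  0  0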

(Optional) Use S3 Bucket as Filesystem

Some workflows initiate hundreds or even thousands of tasks simultaneously. If all these tasks access the NFS server at the same time, a bottleneck can occur. For these workflows, it can help to use an S3 bucket as the working directory.

Note: When used with the appropriate configuration file, the Nextflow Host and the Worker Nodes automatically mount the S3 bucket as a Linux file system.

Complete the following steps.

  • Log in to your AWS Management Console.
    • Open the Amazon S3 console.
    • From the left-hand panel, select Buckets.
    • On the right-hand side, click Create bucket and follow the instructions.

      You must choose a bucket name (nfshareddir is used as a placeholder in this document) that is unique across all AWS regions (the China and AWS GovCloud regions are separate partitions with their own namespaces). Buckets are accessible from any region.

    • On the navigation bar, all the way to the right, click your username and go to Security credentials.
    • Scroll down the page to the section called Access keys and click Create access key.
    • Download the csv file.

      The csv file has two entries, one called Access key ID and one called Secret access key. You enter these in the Nextflow configuration file.
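
If you prefer the command line, you can also create and verify the bucket with the AWS CLI (a sketch; nfshareddir is the placeholder bucket name used in this document and the region is an example):

# Create the bucket (bucket names must be globally unique).
aws s3 mb s3://nfshareddir --region us-east-1
# Confirm that the bucket exists.
aws s3 ls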

(Optional) Use Distributed File System

While NFS and S3 are viable options for providing the shared working directory, performance may be unacceptable for pipelines that generate high-volume or high-throughput I/O. For these pipelines, a high-performance distributed file system is recommended. OpCenter supports two distributed file systems.
  • Fusion

    Fusion is a POSIX-compliant distributed file system optimized for Nextflow pipelines. Fusion requires the use of Wave containers. A description of how to use the Fusion file system with MMCloud is available here.

  • JuiceFS

    JuiceFS is an open-source distributed file system that provides an API to access a POSIX-compliant file system built on top of a range of cloud storage services. If you use the float submit --template nextflow:jfs option to create a Nextflow host, the JuiceFS environment is automatically created.

Prepare the Configuration File

Nextflow configuration files can be extensive — they can include profiles for many executors. Create a simple configuration for using MMCloud as the sole executor by following these steps.

In the directory where you installed nextflow, create a file called nextflownfs.config. When a parameter requires an IP address, use a private IP address. The following configuration file uses the NFS server as the shared working directory.
cat nextflownfs.config
plugins {
  id 'nf-float'
}
workDir = '/mnt/memverge/shared'
podman.registry = 'quay.io'
executor {
  queueSize = 100
}
float {
  address = 'OPCENTER_IP_ADDRESS'
  username = 'USERNAME'
  password = 'PASSWORD'
  nfs = 'nfs://NFS_SERVER_IP_ADDRESS/mnt/memverge/shared'
}
process {
  executor = 'float'
  container = 'docker.io/cactus'
}
Replace the following (keep the quotation marks).
  • USERNAME: username to log in to the OpCenter. If absent, the value of the environment variable MMC_USERNAME is used.
  • PASSWORD: password to log in to the OpCenter. If absent, the value of the environment variable MMC_PASSWORD is used.
  • OPCENTER_IP_ADDRESS: private IP address of the OpCenter. If absent, the value of the environment variable MMC_ADDRESS is used. If using multiple OpCenters, separate entries with a comma.
  • NFS_SERVER_IP_ADDRESS: private IP address of the NFS server.
Nextflow secrets can supply values for USERNAME and PASSWORD as follows.
  • Set the values
    nextflow secrets set MMC_USERNAME "..."
    nextflow secrets set MMC_PASSWORD "..."
  • Insert in Nextflow configuration file.
    float {
        username = secrets.MMC_USERNAME
        password = secrets.MMC_PASSWORD
    }
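  • (Optional) Confirm that the secrets are stored by listing them.
    ./nextflow secrets list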
Explanations of the parameters in the configuration file follow.
  • plugins section

    The MemVerge plugin called nf-float is included in the Nextflow Plugins index. This means that the reference to nf-float resolves to "nf-float version: latest" and Nextflow automatically downloads the latest version of the nf-float plugin. The nf-float plugin is updated frequently, and configuration parameters change or new ones are added. See the nf-float README on GitHub for the latest details.

  • workDir specifies the path to the shared directory if the directory is NFS-mounted. If an S3 bucket is used, workDir specifies the bucket and path to the folder in the format s3://bucket_name/folder
  • podman.registry specifies the default container registry (the choices are usually quay.io or docker.io). If docker.io is specified, then /memverge/ is prepended to the container name. For example, 'cactus' becomes 'docker.io/memverge/cactus'.
  • executor section

    queueSize limits the maximum number of concurrent requests sent to the OpCenter (default 100).

  • float section (all options are listed below; a combined example follows this list)
    • address: private IP address of the OpCenter. Specify multiple OpCenters using the format 'A.B.C.D', 'E.F.G.H' or 'A.B.C.D, E.F.G.H'.
    • username, password: credentials for logging in to the OpCenter.
    • nfs: (only if using NFS) specifies where the shared working directory is mounted from. Do not use with S3.
    • migratePolicy: the migration policy used by WaveRider, specified as a map. Refer to the CLI command reference for the list of available options.
    • vmPolicy: the VM creation policy, specified as a map. Refer to the CLI command reference for the list of available options.
    • timeFactor: a number (default value is 1) that multiplies the time limit supplied by the Nextflow task. Use it to prevent task timeouts.
    • maxCpuFactor: a number (default value is 4) used to calculate the maximum number of CPU cores for the instance, namely, the maximum number of CPU cores is set to maxCpuFactor * the cpus value specified for the task.
    • maxMemoryFactor: a number (default value is 4) used to calculate the maximum memory for the instance, namely, the maximum memory is set to maxMemoryFactor * the memory value specified for the task.
    • commonExtra: a string of additional float command options that is appended to every float submit command generated by the plugin. You can use any float command option.
  • process section

    Specifies the default directives (in Nextflow terminology) for tasks in the Nextflow job file. If a value for container is not specified here and the task does not specify a container value, the job fails.

  • aws section

    Specifies the credentials for accessing the S3 buckets if used.
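
The configuration sketch below combines several of these optional float parameters in one file. It is illustrative only: the version-pinning comment, the timeFactor value, and the commonExtra string are assumptions that you would adapt to your pipeline (see the nf-float README for the parameters supported by your plugin version).

plugins {
  id 'nf-float'                                // append @<version> to pin a specific plugin release
}
workDir = '/mnt/memverge/shared'
podman.registry = 'quay.io'
float {
  address     = 'OPCENTER_IP_ADDRESS'
  username    = secrets.MMC_USERNAME
  password    = secrets.MMC_PASSWORD
  nfs         = 'nfs://NFS_SERVER_IP_ADDRESS/mnt/memverge/shared'
  timeFactor  = 2                              // double the task time limit supplied by Nextflow
  commonExtra = '--vmPolicy [spotOnly=true]'   // appended to every float submit command
}
process {
  executor  = 'float'
  container = 'docker.io/memverge/cactus'      // default image for tasks that do not set one
}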

The following configuration file uses an S3 bucket as the shared working directory. The Nextflow Host and all Worker Nodes automatically mount the S3 bucket as a file system.
cat nextflows3.config
plugins {
  id 'nf-float'
}
workDir = 's3://[S3BUCKET]'
podman.registry = 'quay.io'
executor {
  queueSize = 100
}
float {
  address = 'OPCENTER_IP_ADDRESS'
  username = 'USERNAME'
  password = 'PASSWORD'
}
process {
  executor = 'float'
}
aws {
  accessKey = 'ACCESS_KEY'
  secretKey = 'SECRET_KEY'
  region = 'REGION'
}
Replace the following (keep the quotation marks).
  • S3BUCKET: name of the S3 bucket to use as the shared working directory (nfshareddir is used as a placeholder in this document)
  • OPCENTER_IP_ADDRESS: as described previously
  • USERNAME: as described previously
  • PASSWORD: as described previously
  • ACCESS_KEY: access key ID for your AWS account. See below for options for providing the access key ID.
  • SECRET_KEY: secret access key for your AWS account. See below for options for providing the secret access key.
  • REGION: region in which the S3 bucket is located.
Provide the access key ID and secret access key using one of the following methods.
  • Enter the access key ID and secret access key as cleartext
  • Set the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
  • Set the environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY
  • Use the AWS CLI command aws configure to populate the default profile in the AWS credentials file located at ~/.aws/credentials
  • Use the temporary AWS credentials provided by an IAM instance role. See IAM Roles documentation for details.
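
For example, to use environment variables or the AWS CLI default profile instead of cleartext keys in the configuration file (the key values shown are placeholders):

# Set environment variables before running nextflow.
export AWS_ACCESS_KEY_ID=<access_key_id>
export AWS_SECRET_ACCESS_KEY=<secret_access_key>
# Or populate the default profile in ~/.aws/credentials interactively.
aws configure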

Prepare the Nextflow Job File

The nextflow job file describes the workflow and how the workflow must be executed. To demonstrate how a simple workflow executes, follow these steps (the example is adapted from nextflow.io).
  • Create a sample fasta file called sample.fa and place it in the shared working directory, for example, in /mnt/memverge/shared if you are using the NFS server, or in s3://nfshareddir if you are using an S3 bucket (you can use any S3 bucket where you have r/w access; it doesn't have to be the shared directory specified in the nextflow configuration file).
    cat /mnt/memverge/shared/sample.fa
    >seq0
    FQTWEEFSRAAEKLYLADPMKVRVVLKYRHVDGNLCIKVTDDLVCLVYRTDQAQDVKKIEKF
    >seq1
    KYRTWEEFTRAAEKLYQADPMKVRVVLKYRHCDGNLCIKVTDDVVCLLYRTDQAQDVKKIEKFHSQLMRLME LKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
    >seq2
    EEYQTWEEFARAAEKLYLTDPMKVRVVLKYRHCDGNLCMKVTDDAVCLQYKTDQAQDVKKVEKLHGK
    >seq3
    MYQVWEEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVCLQYKTDQAQDVK
    >seq4
    EEFSRAVEKLYLTDPMKVRVVLKYRHCDGNLCIKVTDNSVVSYEMRLFGVQKDNFALEHSLL
    >seq5
    SWEEFAKAAEVLYLEDPMKCRMCTKYRHVDHKLVVKLTDNHTVLKYVTDMAQDVKKIEKLTTLLMR
    >seq6
    FTNWEEFAKAAERLHSANPEKCRFVTKYNHTKGELVLKLTDDVVCLQYSTNQLQDVKKLEKLSSTLLRSI
    >seq7
    SWEEFVERSVQLFRGDPNATRYVMKYRHCEGKLVLKVTDDRECLKFKTDQAQDAKKMEKLNNIFF
    >seq8
    SWDEFVDRSVQLFRADPESTRYVMKYRHCDGKLVLKVTDNKECLKFKTDQAQEAKKMEKLNNIFFTLM
    >seq9
    KNWEDFEIAAENMYMANPQNCRYTMKYVHSKGHILLKMSDNVKCVQYRAENMPDLKK
    >seq10
    FDSWDEFVSKSVELFRNHPDTTRYVVKYRHCEGKLVLKVTDNHECLKFKTDQAQDAKKMEK
  • Build a workflow that splits the fasta sequences into separate files and then reverses each sequence by creating a nextflow job file called splitfanfs.nf.

    If you are using the S3 bucket, replace params.in = "/mnt/memverge/shared/sample.fa" with params.in = "s3://nfshareddir/sample.fa" (use the name of the S3 bucket where you placed the sample fasta file if you did not place the file in nfshareddir).

    cat splitfanfs.nf
    #!/usr/bin/env nextflow
     
    params.in = "/mnt/memverge/shared/sample.fa"
     
    /*
     * Split a fasta file into multiple files
     */
    process splitSequences {
        executor = 'float'
        container = 'docker.io/memverge/cactus'
        cpus = '4'
        memory = '8 GB'
        extra = '--vmPolicy [spotOnly=true]'
     
        input:
        path 'input.fa'
     
        output:
        path 'seq_*'
     
        """
        awk '/^>/{f="seq_"++d} {print > f}' < input.fa
        """
    }
     
    /*
     * Reverse the sequences
     */
    process reverse {
        executor = 'float'
        container = 'docker.io/memverge/cactus'
        cpus = '4'
        memory = '8 GB'
     
        input:
        path x
     
        output:
        stdout
     
        """
        cat $x | rev
        """
    }
     
    /*
     * Define the workflow
     */
    workflow {
        splitSequences(params.in) \
          | reverse \
          | view
    }
    

    The string following extra is combined with the string following commonExtra in the config file and appended to the float command submitted to the OpCenter. The string value shown here after extra is an example: use any float command option. The extra setting overrides the commonExtra setting if they are in conflict.
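
    For example, with the hypothetical fragments below, the plugin combines the commonExtra string with the extra string and appends both to the float submit command it generates for splitSequences; where the two conflict, the extra setting takes precedence.

    // In the configuration file:
    float {
        commonExtra = '<options appended to every float submit command>'
    }

    // In the job file:
    process splitSequences {
        executor = 'float'
        extra    = '--vmPolicy [spotOnly=true]'   // applied only to this task
    }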

Run a Workflow using Nextflow

Run a simple workflow by entering:
./nextflow run splitfanfs.nf -c nextflownfs.config -cache false
N E X T F L O W  ~  version 23.04.2
Launching `splitfanfs.nf` [prickly_watson] DSL2 - revision: dadefd0d0b
executor >  float (2)
[67/59f2c6] process > splitSequences [100%] 1 of 1 ✔
[f6/32fab1] process > reverse        [100%] 1 of 1 ✔
0qes>
FKEIKKVDQAQDTRYVLCVLDDTVKICLNGDVHRYKLVVRVKMPDALYLKEAARSFEEWTQF
9qes>
KKLDPMNEARYQVCKVNDSMKLLIHGKSHVYKMTYRCNQPNAMYMNEAAIEFDEWNK
01qes>
KEMKKADQAQDTKFKLCEHNDTVKLVLKGECHRYKVVYRTTDPHNRFLEVSKSVFEDWSDF
1qes>
MLTFFINNLKEMKKAEQAQDTKFKLCEKNDTVKL EMLRMLQSHFKEIKKVDQAQDTRYLLCVVDDTVKICLNGDCHRYKLVVRVKMPDAQYLKEAARTFEEWTRYK
2qes>
KGHLKEVKKVDQAQDTKYQLCVADDTVKMCLNGDCHRYKLVVRVKMPDTLYLKEAARAFEEWTQYEE
3qes>
KVDQAQDTKYQLCVSNDTVKICLNGDCHRYKLVVRVKMPDTLYLKEVARSFEEWVQYM
4qes>
LLSHELAFNDKQVGFLRMEYSVVSNDTVKICLNGDCHRYKLVVRVKMPDTLYLKEVARSFEE
5qes>
RMLLTTLKEIKKVDQAMDTVYKLVTHNDTLKVVLKHDVHRYKTCMRCKMPDELYLVEAAKAFEEWS
6qes>
ISRLLTSSLKELKKVDQLQNTSYQLCVVDDTLKLVLEGKTHNYKTVFRCKEPNASHLREAAKAFEEWNTF
7qes>
FFINNLKEMKKADQAQDTKFKLCERDDTVKLVLKGECHRYKMVYRTANPDGRFLQVSREVFEEWS
8qes>
MLTFFINNLKEMKKAEQAQDTKFKLCEKNDTVKLVLKGDCHRYKMVYRTSEPDARFLQVSRDVFEDWS

Completed at: 02-Aug-2023 20:04:20
Duration    : 9m 23s
CPU hours   : (a few seconds)
Succeeded   : 2

This nextflow job file defines two processes that use MMCloud as the executor. Using the float squeue command or the OpCenter GUI, you can view the two processes executed by the OpCenter.

float squeue -f image='cactus' -f status='Completed'
+--------+------------------+-------+-----------+-------------+----------+------------+
|  ID    |       NAME       | USER  |  STATUS   |SUBMIT TIME  | DURATION |    COST    |
+--------+------------------+-------+-----------+-------------+----------+------------+
| O7w... | cactus-m5.xlarge | admin | Completed | ...9:54:59Z | 2m54s    | 0.0031 USD |
| L17... | cactus-m5.xlarge | admin | Completed | ...9:58:22Z | 5m21s    | 0.0061 USD |
+--------+------------------+-------+-----------+-------------+----------+------------+
(output edited for brevity)

Run an RNA Sequencing Workflow

This example (adapted from nextflow.io) uses publicly available data. Get the data here. For simple configuration, place this data in the shared working directory in a folder called /mnt/memverge/shared/nextflowtest/data/ggal if you are following the NFS example or in s3://nfshareddir/ggal if you are following the S3 bucket example. In general, input data is not stored in the shared working directory; for example, input data is often located in a publicly accessible S3 bucket.

  • Build a nextflow job file called rnaseqnfs.nf with the following content if you are following the NFS server example. If you are following the S3 example, create a folder called results in s3://nfshareddir and replace /mnt/memverge/shared/nextflowtest/data with s3://nfshareddir and replace /mnt/memverge/shared/results with s3://nfshareddir/results.

    The rnaseq-nf image is not a "built-in" image in the OpCenter App Library. Specifying the URI for the rnaseq-nf image in the job file causes the OpCenter to pull the latest version of the image from the default registry.

    cat rnaseqnfs.nf
    #!/usr/bin/env nextflow
    
    params.reads = "/mnt/memverge/shared/nextflowtest/data/ggal/ggal_gut_{1,2}.fq"
    params.transcriptome = "/mnt/memverge/shared/nextflowtest/data/ggal/ggal_1_48850000_49020000.Ggal71.500bpflank.fa"
    params.outdir = "/mnt/memverge/shared/results"
     
    workflow {
        read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
     
        INDEX(params.transcriptome)
        FASTQC(read_pairs_ch)
        QUANT(INDEX.out, read_pairs_ch)
    }
     
    process INDEX {
        executor = 'float'
        container = 'nextflow/rnaseq-nf'
        cpus = '4'
        memory = '16 GB'
        tag "$transcriptome.simpleName"
     
        input:
        path transcriptome
     
        output:
        path 'index'
     
        script:
        """
        salmon index --threads $task.cpus -t $transcriptome -i index
        """
    }
     
    process FASTQC {
        executor = 'float'
        container = 'nextflow/rnaseq-nf'
        cpus = '4'
        memory = '16 GB'
        tag "FASTQC on $sample_id"
        publishDir params.outdir
     
        input:
        tuple val(sample_id), path(reads)
     
        output:
        path "fastqc_${sample_id}_logs"
     
        script:
        """
        mkdir fastqc_${sample_id}_logs
        fastqc -o fastqc_${sample_id}_logs -f fastq -q ${reads}
        """
    
    }
     
    process QUANT {
        executor = 'float'
        container = 'nextflow/rnaseq-nf'
        cpus = '4'
        memory = '16 GB'
        tag "$pair_id"
        publishDir params.outdir
     
        input:
        path index
        tuple val(pair_id), path(reads)
     
        output:
        path pair_id
     
        script:
        """
        salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
        """
    }
  • Run the workflow by entering
    ./nextflow run rnaseqnfs.nf -c nextflownfs.config -cache false
    N E X T F L O W  ~  version 23.04.2
    Launching `rnaseqnfs.nf` [soggy_mccarthy] DSL2 - revision: fca2fdc7d3
    executor >  float (3)
    [10/7c2cc3] process > INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
    [d6/8c9f6b] process > FASTQC (FASTQC on ggal_gut)      [100%] 1 of 1 ✔
    [2f/378946] process > QUANT (ggal_gut)                 [100%] 1 of 1 ✔
    Completed at: 02-Aug-2023 21:38:56
    Duration    : 6m 16s
    CPU hours   : 0.2
    Succeeded   : 3

    This nextflow job file defines three processes that use MMCloud as the executor. You can confirm that these processes execute on MMCloud.

    float squeue -f image='rnaseq-nf-np6l7z' -f status='Completed'
    +-----------+------------+-------+-----------+-----------+---------+----------+
    |  ID       | NAME       | USER  |  STATUS   |SUBMIT TIME| DURATION|   COST   |
    +-----------+------------+-------+-----------+-----------+---------+----------+
    | IvwWWY... |rnaseq-nf...| admin | Completed |21:52:33Z  | 2m38s   |0.0051 USD|
    | lakXQg... |rnaseq-nf...| admin | Completed |21:52:35Z  | 2m39s   |0.0052 USD|
    | SyVSmL... |rnaseq-nf...| admin | Completed |21:55:08Z  | 2m28s   |0.0048 USD|
    +-----------+------------+-------+-----------+-----------+---------+----------+
    (output edited for brevity)
  • View the output at /mnt/memverge/shared/results (or s3://nfshareddir/results).
    ls
    fastqc_ggal_gut_logs  ggal_gut

    Some of the output is in html format, for example:

    Figure 4. Example Output from RNA Sequencing Workflow

Integration with Nextflow Tower

Nextflow Tower is a product from Seqera Labs that is used to launch, monitor, and manage computational pipelines from a web interface. You can also launch a Nextflow pipeline using the CLI on the Nextflow Host and monitor the progress in a cloud-hosted Nextflow Tower instance provided by Seqera Labs. Instructions are available here.

  • Sign in to Nextflow Tower. If you do not have an account, follow the instructions to register.
  • Create an access token using the procedure described here. Copy the access token to your clipboard.
  • From a terminal on the Nextflow Host, enter:
    export TOWER_ACCESS_TOKEN=eyxxxxxxxxxxxxxxxQ1ZTE=
    where eyxxxxxxxxxxxxxxxQ1ZTE= is the access token you copied to the clipboard.
  • Launch your Nextflow pipeline with the -with-tower flag. For example:
    nextflow run rnaseqs3.nf -c nextflows3.config -cache false -with-tower
    N E X T F L O W  ~  version 23.04.2
    Launching `rnaseqs3.nf` [big_ramanujan] DSL2 - revision: 9c7f478123
    Monitor the execution with Nextflow Tower using this URL: https://tower.nf/user/user_name/watch/1rjpVakrhQ3wAf
    executor >  float (3)
    [31/94c152] process > INDEX (ggal_1_48850000_49020000) [100%] 1 of 1 ✔
    [1e/3c939b] process > FASTQC (FASTQC on ggal_gut)      [100%] 1 of 1 ✔
    [e7/6ceef4] process > QUANT (ggal_gut)                 [100%] 1 of 1 ✔
    Completed at: 02-Aug-2023 23:35:06
    Duration    : 5m 34s
    CPU hours   : (a few seconds)
    Succeeded   : 3
  • Open a browser and go to the URL.

Troubleshooting

As the nextflow job runs, log messages are written to a log file called ".nextflow.log", created in the directory where the nextflow job is running.
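
For example, to inspect the most recent entries or search for errors (standard shell commands; run them from the directory where you launched the nextflow job):

# Show the last 100 log lines.
tail -n 100 .nextflow.log
# Search the log for error messages.
grep -i error .nextflow.log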