Using RStudio with MMCloud

Summary

MMCloud is a software product from MemVerge that is used to deploy containerized applications in public clouds. Most often, the containers run batch jobs, that is, jobs that run without an attached terminal: results are written to one or more files and retrieved when the job completes. RStudio is an integrated development environment (IDE) for the R programming language, which means that an RStudio user engages in an interactive session. This document describes how to use MMCloud to deploy a container that supports an interactive, browser-based RStudio session. The container can run on a Spot Instance, in which case MMCloud seamlessly migrates the job to a new virtual machine if the Spot Instance is reclaimed.

MMCloud's support for RStudio sessions includes a gateway, which is a reverse proxy that presents the RStudio session to clients at a fixed IP address and port. This address does not change, even when the RStudio session moves to a different virtual machine.

Architecture

As shown in Figure 1, the gateway has two sides: the client side and the server side. On the client side, the gateway represents all back-end servers as a single IP address. Multiple servers, hosting the same or different services, can connect to the same gateway. For example, RStudio servers and Jupyter Notebook servers can run simultaneously in a server farm. On the client side, each service (for example, an RStudio session) is represented by an IP address:port combination.

If a Spot Instance is reclaimed, the RStudio session moves to a new virtual machine instance (Spot or On-demand, depending on your policy). The new virtual machine has a different IP address from the reclaimed Spot Instance, but the change is transparent to the client: the client session continues without the connection breaking.

Figure 1. RStudio Session Connected to Gateway
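The reverse-proxy behavior described above can be modeled with a small lookup table. The following Python sketch is a toy illustration, not MemVerge code: the class names, the job ID, and the private back-end IP addresses (10.0.0.x) are hypothetical. It shows the key property: the client-side address (gateway IP and port) stays fixed while the back-end address is updated when a job migrates.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Where a service currently runs (hypothetical model, not MMCloud's API)."""
    job_id: str
    host_ip: str       # IP address of the VM running the job; changes on migration
    target_port: int   # port the service listens on, e.g. 8787 for RStudio


class GatewayTable:
    """Toy model of the gateway's client-side mapping.

    Clients always address the gateway's fixed public IP and a client-side
    port; the gateway forwards to whichever back end is currently registered.
    """

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self._routes: dict[int, Backend] = {}

    def connect(self, client_port: int, backend: Backend) -> str:
        # Register a back end and return the address clients should use.
        self._routes[client_port] = backend
        return f"{self.public_ip}:{client_port}"

    def migrate(self, job_id: str, new_host_ip: str) -> None:
        # Only the back-end address changes; the client-side port is untouched.
        for backend in self._routes.values():
            if backend.job_id == job_id:
                backend.host_ip = new_host_ip

    def resolve(self, client_port: int) -> str:
        # Where traffic arriving on client_port is forwarded right now.
        b = self._routes[client_port]
        return f"{b.host_ip}:{b.target_port}"


gw = GatewayTable("54.175.247.146")
client_addr = gw.connect(10001, Backend("job-1", "10.0.0.5", 8787))
gw.migrate("job-1", "10.0.0.9")  # e.g. after a Spot reclaim
print(client_addr, "->", gw.resolve(10001))
```

The client keeps using the same address throughout; only the forwarding target behind it changes.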

Operation

To the OpCenter, the RStudio session is a job, identified by a Job ID, that continues running until it is manually canceled. This job can run on a Spot Instance or an On-demand Instance. The gateway is a specialized server, started by the OpCenter, that runs on an On-demand Instance and continues running until it is manually canceled. The gateway is identified by a gateway ID.

The gateway and the RStudio server connect via a TCP port (the default port number for RStudio is 8787). To identify the RStudio server to clients, the gateway selects the first available port from the range of ports configured on the client side of the gateway. In Figure 1, the gateway uses port 10001 to identify the RStudio server to clients. When a client opens an HTTP session on the client side of the gateway, it uses port 10001 to indicate which RStudio server to connect to.
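The "first available port" rule can be expressed as a linear scan over the configured range. The hypothetical helper below is a sketch of the behavior described above, not the gateway's actual implementation; the set of ports already in use is an assumed input.

```python
def first_available_port(min_port: int, max_port: int, in_use: set[int]) -> int:
    """Return the lowest port in [min_port, max_port] that is not in use.

    Hypothetical model of the gateway's port selection; not MMCloud source.
    """
    for port in range(min_port, max_port + 1):  # range end is exclusive, so +1
        if port not in in_use:
            return port
    raise RuntimeError(f"no free port in {min_port}-{max_port}")


# If port 10000 is already assigned to another service, the next one gets 10001.
print(first_available_port(10000, 10500, {10000}))
```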

Prerequisites

To use RStudio and the gateway with MMCloud, you need the following:
  • MMCloud Carmel 2.0 release or later
  • A running instance of OpCenter with a valid license

Create Inbound Firewall Rules

To access RStudio running in a container, you publish a port on the container, that is, you map a port on the container host to a port on the container. This lets a browser establish a TCP connection with the container. To open the port on the container host, create an inbound firewall rule by following these steps.

  • Log in to your Cloud Service Provider (CSP)'s Management Console and follow the steps to create an inbound rule. The steps are different for each CSP. The steps shown here are for AWS (firewall rules are called security groups in AWS).
    • Open the EC2 Dashboard and click Security Groups.
    • On the upper right, click Create security group. The Create security group screen opens.
    • Enter a name and a description for the security group.
    • In the Inbound rules box, click Add rule.
    • In the Port range box, enter 8787 (this is the default port number for using http to connect to RStudio).
    • In the Source box, enter 0.0.0.0/0 (this allows access from any host).
    • Scroll to the bottom of the page and click Create security group.
    • After the security group is created, it appears in the table of Security Groups. Note the entry in the column titled Security group ID. It has the format sg-xxxx.
  • Repeat these steps to create a second security group with an inbound rule that allows access to the range of ports on the client side of the gateway. For example, if the gateway assigns ports in the range from 10000 to 10500, enter 10000-10500 in the Port range box. Note the Security group ID for this security group.
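If you prefer to script the AWS steps, each inbound rule above corresponds to one entry in the IpPermissions structure accepted by the EC2 AuthorizeSecurityGroupIngress API. The sketch below only builds those structures. The commented-out boto3 calls assume that boto3 is installed and AWS credentials are configured; the group name "rstudio-8787" and the "vpc-..." value are hypothetical placeholders to replace with your own.

```python
def tcp_ingress(from_port: int, to_port: int, cidr: str = "0.0.0.0/0") -> dict:
    """Build one inbound TCP rule in the IpPermissions shape used by the
    EC2 AuthorizeSecurityGroupIngress API (boto3 / AWS SDK)."""
    if not (0 <= from_port <= to_port <= 65535):
        raise ValueError(f"invalid port range {from_port}-{to_port}")
    return {
        "IpProtocol": "tcp",
        "FromPort": from_port,
        "ToPort": to_port,
        "IpRanges": [{"CidrIp": cidr}],  # 0.0.0.0/0 allows any source host
    }


# Rule for the RStudio server (default port 8787).
rstudio_rule = tcp_ingress(8787, 8787)
# Rule for the gateway's client-side port range (example range 10000-10500).
gateway_rule = tcp_ingress(10000, 10500)

# Assumption: boto3 installed and credentials configured; not executed here.
# import boto3
# ec2 = boto3.client("ec2")
# sg = ec2.create_security_group(GroupName="rstudio-8787",
#                                Description="RStudio port 8787",
#                                VpcId="vpc-...")
# ec2.authorize_security_group_ingress(GroupId=sg["GroupId"],
#                                      IpPermissions=[rstudio_rule])
```

Repeat the create and authorize calls with gateway_rule for the second security group; the returned GroupId is the sg-xxxx value used in later commands.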

Start a Gateway using the CLI

  • Log in to OpCenter.
  • Enter float gateway create -h to show the options that are available when starting a new gateway. For most deployments, the default options are sufficient except for portRange and securityGroup, which must be provided. The port range must be between 10000 and 65535 (inclusive).
  • Create a gateway by entering the following command:

    float gateway create --portRange <minport>:<maxport> --securityGroup <sg-yyyy>

    where <minport> is the lowest port number in the range, <maxport> is the highest port number in the range, and <sg-yyyy> is the ID of the security group that allows access to all the ports in the range. For example:

    float gateway create --portRange 10000:10500 --securityGroup sg-0fbb6a83983183364
  • Show all running gateways by entering float gateway list. For example:
    float gateway list
    +----------+------+----------+---------------+------------+--------+-----+-------+
    |   ID     |STATUS| CONFIG   |    PUBLICIP   |  PORTRANGE | START  | JOBS|  COST |
    +----------+------+----------+---------------+------------+--------+-----+-------+
    | g-4Ui... | Ready| 2Cores4GB| 54.175.247.146| 10000-10500|16:10:09| 1   | 14.20 |
    +----------+------+----------+---------------+------------+--------+-----+-------+
    (edited for clarity)
  • To add a security group to a gateway, or to remove one from it, enter one of the following:

    float gateway modify -g <gw_id> --addSecurityGroup <sg-zzzz>
    float gateway modify -g <gw_id> --rmSecurityGroup <sg-zzzz>

    where <gw_id> is the ID of the gateway to modify and <sg-zzzz> is the ID of the security group to add or remove.
  • To disconnect a job from a gateway, enter float gateway disconnect -g <gw_id> -j <job_id> --port <target_port>, where <gw_id> is the ID of the gateway, <job_id> is the ID of the job to disconnect, and <target_port> is the job's target port to disconnect (for example, 8787).
  • To delete a gateway, enter float gateway destroy -g <gw_id>, where <gw_id> is the ID of the gateway to delete. Disconnect all jobs before deleting the gateway.
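The gateway commands above can also be driven from a script. This Python sketch, a hypothetical helper rather than part of the float CLI, checks the documented port-range constraint (10000 to 65535, inclusive) before building the float gateway create command line. It also extracts fields from one row of the float gateway list table, assuming the pipe-delimited layout shown above.

```python
import shlex


def gateway_create_argv(port_range: str, security_group: str) -> list[str]:
    """Build the `float gateway create` command after validating
    <minport>:<maxport> against the 10000-65535 (inclusive) limit."""
    lo_s, sep, hi_s = port_range.partition(":")
    if not sep:
        raise ValueError("portRange must look like <minport>:<maxport>")
    lo, hi = int(lo_s), int(hi_s)
    if not (10000 <= lo <= hi <= 65535):
        raise ValueError(f"port range {lo}:{hi} must lie within 10000-65535")
    if not security_group.startswith("sg-"):
        raise ValueError("security group IDs have the form sg-xxxx")
    return ["float", "gateway", "create",
            "--portRange", f"{lo}:{hi}",
            "--securityGroup", security_group]


def parse_table_row(header: str, row: str) -> dict[str, str]:
    """Map one row of the `float gateway list` ASCII table to {column: value}.
    Assumes the pipe-delimited layout shown above."""
    cols = [c.strip() for c in header.strip().strip("|").split("|")]
    vals = [v.strip() for v in row.strip().strip("|").split("|")]
    return dict(zip(cols, vals))


argv = gateway_create_argv("10000:10500", "sg-0fbb6a83983183364")
print(shlex.join(argv))
# On the OpCenter host you could then run: subprocess.run(argv, check=True)
```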

Start a Gateway using the Web Interface

  • On the left-hand panel, go to SERVICE > Gateways.
  • Click Create Gateway (top, right-hand side).
  • In the pop-up window, fill in the required fields (including the security group to open ports on the client side of the gateway) and then click Create (bottom right).
  • Manage the gateway using the Gateways screen.

Start an RStudio Server using the CLI

  • To view the default settings included in the rstudio template, enter:
    float template show -T rstudio
    name: rstudio
    tag: "2022.12"
    type: JobTemplate
    lastUpdated: 2023-06-30T13:50:51Z
    jobParams:
        - image: rstudio:latest
          cpu: "2"
          mem: "4"
          env:
            - RSTUDIO_USER=rstudio
            - RSTUDIO_PASS=Welcome123!
          publish:
            - 8787:8787
          extraOptions: --irmap-scan-path /home/rstudio/
          withRoot: true
  • Enter float template deploy -h to view the available options (including parameters that can be overwritten).
  • To use the default settings, enter:

    float template deploy -T rstudio --gateway auto --targetPort 8787 --securityGroup sg-<xxxx>

    where sg-<xxxx> is the ID of the security group that opens port 8787.

    Example:

    float template deploy -T rstudio --targetPort 8787 --securityGroup sg-0b23d4d7482ddb825 --gateway auto
    id: u6grzQqzwmnu8JFXTRq5N
    name: rstudio
    user: admin
    imageID: docker.io/memverge/rstudio:latest
    status: Submitted
    ...
    gateway:
        gatewayID: g-4Ui24EjdZXJAyeTHClufF
        IPAddress: 54.175.247.146
        portMappings:
            - 8787 -> 54.175.247.146:10000
  • Repeat float squeue -j <job_id> (the job ID is u6grzQqzwmnu8JFXTRq5N in the example) until the job status changes to Executing.
  • Open a browser and go to http://<gw_ip_address>:<port> to start a session on the RStudio server. In the example above, go to 54.175.247.146:10000.
  • Log in with the default username/password combination rstudio/Welcome123!.
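The manual steps of polling the job status and assembling the browser URL can be sketched in Python. access_url parses a portMappings entry in the format shown in the deploy output above. Because the output format of float squeue is not shown here, wait_until_executing takes a hypothetical get_status callable that you supply (for example, one that runs the CLI and extracts the status); the interval and timeout defaults are arbitrary assumptions.

```python
import re
import time
from typing import Callable

# Matches a portMappings entry such as "8787 -> 54.175.247.146:10000".
MAPPING_RE = re.compile(r"(\d+)\s*->\s*([\d.]+):(\d+)")


def access_url(mapping_line: str) -> str:
    """Return the browser URL for one gateway port-mapping line."""
    m = MAPPING_RE.search(mapping_line)
    if m is None:
        raise ValueError(f"not a port mapping: {mapping_line!r}")
    _target_port, gw_ip, gw_port = m.groups()
    return f"http://{gw_ip}:{gw_port}"


def wait_until_executing(get_status: Callable[[], str],
                         interval_s: float = 10.0,
                         timeout_s: float = 900.0) -> None:
    """Poll get_status() until it returns "Executing", mirroring the manual
    `float squeue -j <job_id>` loop. get_status is supplied by the caller."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "Executing":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s}s")
        time.sleep(interval_s)


print(access_url("- 8787 -> 54.175.247.146:10000"))
```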

Start an RStudio Server using the Web Interface

  • On the left-hand panel, select Job Templates and then click rstudio.
  • Click Submit Job (right-hand side). The Submit Job pop-up window opens to the Apply a Template tab.
  • On the right-hand side of the Apply a Template screen, populate the following fields.
    • Security Group: enter the ID of the security group that opens port 8787
    • Gateway: enter the word "auto"
    • Target Port: enter "8787"
  • Click Submit (bottom, right-hand side).
  • On the left-hand panel, select Jobs.
  • Identify your job and then click the Refresh button until the status of your job changes to Executing.
  • On the left-hand panel, go to SERVICE > Gateways.
  • In the Gateways screen, click the ID associated with your gateway.
  • In the Connected Jobs table, copy the entry in the Access URL column and paste it into your browser's address bar.
  • Log in with the default username/password combination rstudio/Welcome123! (or the username/password combination you configured using the Environmental Variables fields in the Apply a Template screen).

Managing an RStudio Job

Running an RStudio job on an On-demand Instance is no different from running it on a Spot Instance. On either instance type, you can manually migrate the RStudio job to a new instance of any type.

If the RStudio job runs on a Spot Instance, the CSP can reclaim the Spot Instance. If this happens, OpCenter automatically migrates the RStudio job to a new instance and resumes execution without losing any work in progress, so the RStudio session effectively continues uninterrupted.

When your browser connects to the gateway, the gateway places a web widget in the top right-hand corner of the page (you can move the widget).

  • Click the widget to open the login screen.
  • Log in to the OpCenter with your credentials.

Using the widget, you can Migrate or Suspend a running job, Restore a suspended job, or Cancel a running or suspended job.
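The widget actions form a small state machine. The sketch below encodes only the action/state combinations listed above; the state names other than Executing are illustrative labels, not necessarily the exact status strings that OpCenter displays.

```python
from enum import Enum


class JobState(Enum):
    EXECUTING = "Executing"
    SUSPENDED = "Suspended"   # illustrative label
    CANCELLED = "Cancelled"   # illustrative label


# (action, current state) -> next state, per the widget actions described above.
# Migrate leaves the job running; only the underlying VM (and its IP) changes.
TRANSITIONS = {
    ("Migrate", JobState.EXECUTING): JobState.EXECUTING,
    ("Suspend", JobState.EXECUTING): JobState.SUSPENDED,
    ("Restore", JobState.SUSPENDED): JobState.EXECUTING,
    ("Cancel", JobState.EXECUTING): JobState.CANCELLED,
    ("Cancel", JobState.SUSPENDED): JobState.CANCELLED,
}


def apply_action(state: JobState, action: str) -> JobState:
    """Return the state after a widget action, or reject an invalid action."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"{action} is not allowed on a {state.value} job") from None


print(apply_action(JobState.EXECUTING, "Suspend").value)
```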

Whether the RStudio job is migrated manually or automatically, the new instance has a different IP address. However, the ID of the RStudio job does not change, and neither do the client-side gateway IP address and port number. The RStudio job remains connected to the gateway, so your browser session is not interrupted and you do not have to log in again.