New Features in MMCloud Goa 2.4 Release

Date Released

Goa 2.4 was released on 12-05-2023.

Supported Clouds

MMCloud is designed to work on any cloud infrastructure. The Goa 2.4 release supports the following clouds:

New Features in Goa 2.4 Release

Type Domain Description

Feature

AppCapsule

The default behavior of AppCapsule is to capture a snapshot only when a migration or spot reclaim event is triggered. If periodic snapshots are enabled, snapshots are taken at fixed intervals and when a migration or spot reclaim event is triggered. When a migration or reclaim event is triggered, the most recent complete snapshot is saved to disk and used to resume the job on a new instance.

AppCapsule++ improves on AppCapsule, especially for workloads with large memory footprints, by incorporating the following performance-enhancing capabilities.
  • Instead of taking a complete snapshot every time, only incremental changes are captured after the first snapshot is taken.
  • Instead of taking snapshots at fixed intervals, AppCapsule++ only takes an incremental snapshot when the number of changed memory pages reaches a certain limit (the limit depends on the I/O bandwidth of the device on which the snapshot is stored).
  • The application and the process that captures the snapshot (and writes it to disk) are asynchronous, which means that the application continues to run while the snapshot is saved.

The complete snapshot is assembled only when the migration or reclaim event is triggered.
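The incremental behavior described above can be illustrated with a toy model. This is a minimal sketch, not MMCloud's implementation: the class name, the page-table dictionary, and the `dirty_page_limit` parameter are all hypothetical, the sketch runs synchronously (it does not model the asynchronous writer), and it assumes that pages changed since the last incremental snapshot are captured when the event fires.

```python
class IncrementalSnapshotter:
    """Toy model of threshold-triggered incremental snapshots (hypothetical)."""

    def __init__(self, memory, dirty_page_limit):
        self.memory = memory                      # live pages: page number -> contents
        self.dirty_page_limit = dirty_page_limit  # assumed fixed; the real limit depends on I/O bandwidth
        self.base = dict(memory)                  # the first snapshot is a complete copy
        self.deltas = []                          # later snapshots hold changed pages only
        self.dirty = set()                        # pages written since the last snapshot

    def write(self, page, data):
        """Application write: update memory and record the page as dirty."""
        self.memory[page] = data
        self.dirty.add(page)
        if len(self.dirty) >= self.dirty_page_limit:
            self._capture_delta()

    def _capture_delta(self):
        """Store only the pages that changed since the previous snapshot."""
        if self.dirty:
            self.deltas.append({p: self.memory[p] for p in self.dirty})
            self.dirty.clear()

    def assemble(self):
        """On a migration or reclaim event, build the complete snapshot."""
        self._capture_delta()                     # assumption: flush uncaptured pages first
        snapshot = dict(self.base)
        for delta in self.deltas:                 # replay deltas from oldest to newest
            snapshot.update(delta)
        return snapshot


snap = IncrementalSnapshotter({0: "a", 1: "b", 2: "c"}, dirty_page_limit=2)
snap.write(0, "A")         # one dirty page: below the limit, nothing captured
snap.write(1, "B")         # limit reached: one incremental snapshot captured
snap.write(2, "C")         # one dirty page again
complete = snap.assemble() # the event triggers assembly of the full snapshot
```

The point of the design is that each incremental snapshot is proportional to the number of changed pages rather than to the total memory footprint, which is why the benefit is largest for memory-heavy workloads.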

Enable AppCapsule++ on AWS from the Submit Job screen in the web interface by selecting the Misc. tab and setting Incremental Snapshot to On. From the CLI, use the float submit command with the --dumpMode incremental option. AppCapsule++ is not currently supported on Google Cloud.

Feature

License

The Essentials plan is discontinued. The Pro plan changes its method of calculating usage charges. Instead of charging based on savings relative to on-demand prices, the Pro plan now charges based on a fixed rate per CPU core hour. The OpCenter is not subject to usage charges. The Enterprise plan continues unchanged.

Feature

Platform

When submitting a job, the user can specify an "allow list" of compute instances (the default is to allow all instances). The OpCenter applies this list when selecting a compute instance for the job. The Goa 2.4 release adds a "deny list": the OpCenter excludes instances on the deny list when selecting a compute instance for the job (the default is not to exclude any instances). If an instance appears on both the allow list and the deny list, the deny list takes precedence and the OpCenter excludes the instance.
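The precedence rule can be expressed in a few lines. The following is a sketch under stated assumptions, not the OpCenter's selection logic: the function name is hypothetical, the instance type names are illustrative, and matching is by exact name only.

```python
def eligible_instances(available, allow=None, deny=None):
    """Return the instance types a job may use, preserving input order.

    allow=None models the default "allow all instances";
    deny=None models the default "exclude no instances".
    """
    allowed = set(available) if allow is None else set(allow)
    denied = set() if deny is None else set(deny)
    # An instance must be allowed AND not denied, so the deny list wins on conflict.
    return [name for name in available if name in allowed and name not in denied]


types = ["c5.large", "m5.xlarge", "r5.2xlarge"]
both_lists = eligible_instances(
    types, allow=["c5.large", "m5.xlarge"], deny=["m5.xlarge"]
)
```

Here `m5.xlarge` appears on both lists and is therefore excluded, leaving only `c5.large`.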

Feature

Web interface

A user with admin privileges can add, delete, and modify users and groups that have access to the OpCenter. In earlier releases, these actions could only be completed from the CLI. The Goa 2.4 release provides this capability in a Users and Groups screen in the web interface.

Feature

Web interface

The OpCenter has a list of parameters that apply to the system configuration (for example, login session timeout) or that are used as defaults for every job submitted (for example, VM creation policy). In earlier releases, these parameters could only be modified from the CLI. The Goa 2.4 release provides this capability in a System Settings screen in the web interface.

Feature

Web interface

A redesigned Submit Job screen improves usability: for example, it is easier to start a job from a template and the grouping of job submit options is more intuitive.

Feature

Web interface/Platform

The jobs display can now be filtered by the name of the user who submitted the job, either from the Jobs screen or with the float list --filter owner=[username] command.

Feature

WaveRider

WaveRider (automigration) and automatic calculation of the migration step size are now "On" by default. The step size is calculated with an improved algorithm.

Feature

Google Cloud

Google Cloud FUSE is an open-source adapter that allows Google Cloud users to access Cloud Storage buckets as local file systems. The Goa 2.4 release enables containers deployed by the OpCenter in Google Cloud to use gcsfuse to mount Cloud Storage buckets natively.

Feature

Platform

To issue float commands from a terminal session, the user must first log in to the OpCenter. If the float version and the OpCenter version differ, a warning message prompts the user to enter float release sync, which makes the float version match the OpCenter version.

Feature

Platform

An option to select a specific subnet is available when submitting a job. Use the --subnet [subnet_ID] option in the CLI or enter the subnet_ID in the Misc. tab in the web interface's Submit Job screen. The default behavior is for the OpCenter to automatically choose a subnet.
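For scripted submissions, the command line can be assembled programmatically. The helper below is a hypothetical sketch that only builds an argument list and does not run anything. It uses the three option names that appear in these release notes (--subnet, --dumpMode, --errPolicy); any other arguments a job needs are passed through unchanged in `extra_args`, and the subnet ID shown is a placeholder.

```python
import shlex


def float_submit_args(extra_args=(), subnet_id=None, dump_mode=None, err_policy=None):
    """Build an argument list for `float submit` (hypothetical helper).

    Options left as None are omitted, so the OpCenter defaults apply
    (for the subnet, automatic selection by the OpCenter).
    """
    args = ["float", "submit", *extra_args]
    if subnet_id is not None:
        args += ["--subnet", subnet_id]
    if dump_mode is not None:
        args += ["--dumpMode", dump_mode]
    if err_policy is not None:
        args += ["--errPolicy", err_policy]
    return args


# Placeholder subnet ID; substitute a real one from your cloud account.
command = float_submit_args(subnet_id="subnet-0abc1234", dump_mode="incremental")
printable = shlex.join(command)  # shell-quoted string, for logging or review
```

Keeping the command as a list until the last moment avoids shell-quoting mistakes if the list is later handed to `subprocess.run`.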

Feature

Nextflow

Wave is a container-provisioning service (developed by Seqera) that dynamically selects the container image based on task requirements. When a Nextflow pipeline runs with Wave enabled, each image is used only once, so there is no need for the OpCenter to cache it. This feature adds an option to turn off container image caching for a job.

Feature

Preview

The Job Template library includes a template to create a Script-of-Scripts (SOS) polyglot notebook server.

Feature

Preview

Some applications, such as RStudio, can be customized for the user's needs. The configuration details and the application state are stored in local storage volumes. By default, if the application fails (which is not the same as a spot reclaim event), the OpCenter reclaims all the cloud resources associated with the job. If a job is submitted with the --errPolicy retainVolumes option, the OpCenter instead retains all volumes associated with the job (including the root volume). This means that when the application restarts, all the volumes are remounted in the state that existed when the failure occurred.

If the --errPolicy reclaimAll option (the default) is used, the OpCenter reclaims all cloud resources if the application fails.
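The difference between the two policies can be summarized as a small decision function. This is an illustrative sketch only: the function name and the `(kind, resource_id)` tuples are hypothetical, and it assumes that under retainVolumes everything other than volumes (for example, the compute instance) is still reclaimed.

```python
def resources_to_reclaim(resources, err_policy="reclaimAll"):
    """Return the resources reclaimed after an application failure.

    resources: list of (kind, resource_id) tuples, kind is "instance" or "volume".
    """
    if err_policy == "reclaimAll":
        return list(resources)                 # default: reclaim everything
    if err_policy == "retainVolumes":
        # Keep every volume, including the root volume, so it can be remounted.
        return [r for r in resources if r[0] != "volume"]
    raise ValueError(f"policy not covered by this sketch: {err_policy}")


job_resources = [("instance", "i-1"), ("volume", "vol-root"), ("volume", "vol-data")]
reclaimed = resources_to_reclaim(job_resources, err_policy="retainVolumes")
```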

Bug fix

Platform

Resource cleanup service: The OpCenter maintains a database that tracks the cloud resources (compute instances and storage volumes) that it manages. In some circumstances, the state of a resource in the database diverges from the true state of the resource (for example, the resource is missing from the database or is tagged as inactive). As a result, resources can continue to run in an unmanaged state. The Goa 2.4 release introduces an automatic resource cleanup service that periodically (the interval is configurable) audits all resources and removes any that are unmanaged.
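The audit step amounts to reconciling what is actually running against what the database records. The sketch below shows one way to express that comparison. It is not the OpCenter's implementation: the function name, the resource IDs, and the "active"/"inactive" state strings are assumptions made for illustration.

```python
def find_unmanaged(running_resources, database):
    """Return IDs of resources that are running but not tracked as active.

    running_resources: iterable of resource IDs observed in the cloud account.
    database: dict mapping resource ID -> recorded state ("active" or "inactive").
    """
    # A running resource is unmanaged if the database has no entry for it
    # or records it as anything other than active.
    return sorted(rid for rid in running_resources if database.get(rid) != "active")


running = {"i-1", "i-2", "vol-3"}
tracked = {"i-1": "active", "vol-3": "inactive"}
to_remove = find_unmanaged(running, tracked)  # "i-2" is untracked, "vol-3" is inactive
```

A periodic service would call a function like this at the configured interval and remove each resource it returns.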