New Features in MMCloud Jericho 3.1 Release

Date Released

Jericho 3.1 was released on April 1, 2025.

Supported Clouds

MMCloud is designed to work on any cloud infrastructure. The Jericho 3.1 release supports the following clouds:

  • AWS
  • Google Cloud (except for HPC mode)
  • Alibaba Cloud
  • AWS China
  • Tencent Cloud

New Features in Jericho 3.1 Release

Type Domain Description
Feature Platform High Performance Computing mode (HPC mode) allows an MMCloud subscriber to use the OpCenter to create (and manage) a cluster of compute nodes that execute jobs scheduled from an input queue.
  • HPC mode is supported on all cloud services except Google Cloud
  • The HPC cluster comprises three types of nodes.
    • One or more Login nodes from which users submit and manage jobs
    • One Head node (also known as the Control Node) that provides the scheduling service (SLURM)
    • One or more Compute nodes, which are dynamically created and destroyed in response to the workload
  • Two services are mandatory for the operation of the cluster.
    • LDAP service for authentication and access control
    • Shared Storage so that all nodes have access to the same tools and directories
Feature UX System Status Dashboard, available in the web interface, allows an admin user to see, on a single screen, the current status of jobs, resources, instances, and utilization. Aggregated totals over past periods (this week, this month, and so on) are also available. If the OpCenter is used in HPC mode, information on resource utilization in clusters is also shown.
Feature Security Multiple LDAP and NIS server registration allows the OpCenter to provide redundant distributed directory services to ensure consistent system configuration data (such as user and host names).
  • Multiple LDAP servers are used for redundancy and load sharing as well as partitioning of network resources into domains.
  • Network Information Service (NIS) evolved from Sun Microsystems' YP (Yellow Pages) service for distributing configuration information. LDAP has largely replaced NIS today.
Feature Platform Container image caching using volume snapshots decreases the time to instantiate a worker node compared with loading the container image directly from S3 or an NFS server.
  • Every worker node requires a storage volume to act as the container root volume. The container image is loaded directly from the OpCenter or from a cache maintained on an NFS server or in AWS S3.
  • If the container image is large (multiple GB), the time to load the image into the root volume becomes a significant fraction of the time to instantiate a worker node.
  • To overcome this, the 3.1 release allows a user to snapshot the container root volume using AWS's EBS snapshot service. When a new worker node starts with the same image, the volume snapshot is used to instantiate a new container image volume. This process is highly optimized so the time to start a new worker node is not impacted by the size of the container image.
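The caching decision described above can be pictured as a lookup keyed by container image. The following is a minimal sketch only, not the OpCenter implementation: the class `SnapshotCache`, its methods, the plan labels, and the snapshot identifiers are all hypothetical, and no cloud API is called.

```python
from typing import Dict, Optional, Tuple


class SnapshotCache:
    """Hypothetical map from container image reference to a root-volume snapshot ID."""

    def __init__(self) -> None:
        self._by_image: Dict[str, str] = {}

    def register(self, image: str, snapshot_id: str) -> None:
        # Record that a snapshot of the root volume holding `image` exists.
        self._by_image[image] = snapshot_id

    def root_volume_plan(self, image: str) -> Tuple[str, Optional[str]]:
        # Cache hit: build the new root volume from the snapshot.
        # Cache miss: fall back to loading the image into an empty volume.
        snapshot_id = self._by_image.get(image)
        if snapshot_id is not None:
            return ("restore_from_snapshot", snapshot_id)
        return ("load_image", None)
```

In this sketch, the first worker node that uses an image takes the slower load path; once a snapshot is registered, later worker nodes that use the same image take the restore path.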
Feature Platform Instance cost as a trigger for job migration avoids situations where jobs migrate to high cost instances and remain there until the jobs complete.
  • Overutilization of resources (for example, CPU or memory) can trigger job migration to a virtual machine with more resources. When selecting a new instance, cost is not considered, so it is possible for a job to migrate to a high cost virtual machine and remain there until the job completes.
  • This feature allows the user to specify a price floor so that if a job migrates to a virtual machine with an hourly rate higher than the price floor, the OpCenter will look for a less costly virtual machine (that still meets the resource requirements).
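The cost trigger described above can be sketched as a simple check. This is an illustration under stated assumptions, not the OpCenter's actual selection logic: the `Instance` type, the function `cheaper_alternative`, its parameters, and the choice of the lowest-priced fitting candidate are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    name: str
    vcpus: int
    mem_gb: int
    hourly_rate: float  # price per hour, in any consistent currency


def cheaper_alternative(
    current: Instance,
    candidates: List[Instance],
    price_floor: float,
    need_vcpus: int,
    need_mem_gb: int,
) -> Optional[Instance]:
    """Return a less costly instance that still meets the requirements, or None."""
    # The trigger fires only when the current hourly rate exceeds the price floor.
    if current.hourly_rate <= price_floor:
        return None
    # Keep candidates that satisfy the resource requirements and cost less.
    fitting = [
        c
        for c in candidates
        if c.vcpus >= need_vcpus
        and c.mem_gb >= need_mem_gb
        and c.hourly_rate < current.hourly_rate
    ]
    # Assumption: prefer the cheapest fitting candidate.
    return min(fitting, key=lambda c: c.hourly_rate, default=None)
```

A return value of `None` means the job stays where it is, either because the current rate is within the price floor or because no cheaper instance meets the resource requirements.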
Feature Platform Per availability zone quarantining of instance types improves global spot instance availability by restricting quarantining of spot instance types to the availability zones where those spot instances are reclaimed.
  • If the cloud service provider reclaims a spot instance, it can mean that the instance type is at an increased risk of reclamation.
  • To avoid multiple job migrations because of spot instance reclaims, previous releases quarantine an instance type after a reclaim event. However, spot instance reclaims do not occur uniformly across availability zones, so a global quarantine policy sometimes limits the availability of an instance type.
  • In the 3.1 release, the quarantine policy is limited to the availability zone in which the reclaim event occurs.
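The difference between a global quarantine and a per availability zone quarantine comes down to the key used to record a reclaim event. The sketch below is hypothetical: the class `ZoneQuarantine`, its methods, and the expiry period `ttl_seconds` are assumptions made for illustration, and the release notes do not state how long a quarantine lasts.

```python
import time
from typing import Dict, List, Optional, Tuple


class ZoneQuarantine:
    """Hypothetical quarantine table keyed by (instance type, availability zone)."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        # Assumption: a quarantine expires after a fixed period.
        self.ttl = ttl_seconds
        self._until: Dict[Tuple[str, str], float] = {}

    def record_reclaim(self, instance_type: str, zone: str,
                       now: Optional[float] = None) -> None:
        # Quarantine the instance type only in the zone where the reclaim occurred.
        now = time.time() if now is None else now
        self._until[(instance_type, zone)] = now + self.ttl

    def is_quarantined(self, instance_type: str, zone: str,
                       now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        expiry = self._until.get((instance_type, zone))
        return expiry is not None and now < expiry

    def usable_zones(self, instance_type: str, zones: List[str],
                     now: Optional[float] = None) -> List[str]:
        # Zones where the instance type can still be requested.
        return [z for z in zones if not self.is_quarantined(instance_type, z, now)]
```

Keying on the instance type alone would reproduce the global policy of earlier releases; adding the zone to the key leaves the same instance type available in zones that saw no reclaim event.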
Feature UX Improved Cost Summary page, available to admin users of the web interface, clarifies how cloud costs are generated and savings calculated.
Feature Platform Refactoring of OpCenter software, especially with regard to data structures and data retrieval, improves performance and reduces memory usage in the OpCenter.

To implement the new data structures, users upgrading to the 3.1 release from an earlier release must follow the prescribed upgrade procedure. Alternatively, users can start a new OpCenter, already running the 3.1 release, from the AWS Marketplace.
Feature Platform OpCenter manages up to twenty million jobs as a result of software refactoring. Note that the total number of jobs includes running jobs, completed jobs, stopped jobs, initializing jobs, failed jobs, and so on.