GPU Cluster Manager v0.5.0 Release Notes

Overview

GPU Cluster Manager has been re-architected and re-implemented since the previous release.

Highlights

  • Enhances Fractional GPU (NVIDIA) support
  • Adds a new upgrade path from 0.4.0 to 0.5.0
  • Versions prior to 0.4.0 must be uninstalled completely before the new version is installed.
  • Introduces Multi-GPU workload checkpoint/restore
  • Adds AMD GPU telemetry, metrics collection, and visualization support in the Management Server UI
  • General bug fixes and improvements that could not be implemented in the previous release.

GPU Feature Matrix: NVIDIA and AMD Support

Below is a summary of current feature support for MemVerge GPU Cluster Manager on NVIDIA and AMD GPUs:

| Feature | NVIDIA GPUs | AMD GPUs |
|---|---|---|
| Fractional GPU Allocation | ✔️ | X (Planned for a future release) |
| GPU Partitioning | ✔️ | No direct equivalent |
| Transparent Checkpointing | ✔️ [3] | X |
| GPU Orchestration/Sharing | ✔️ | ✔️ |
| Workload Priority/Preemption | ✔️ | ✔️ |
| Real-Time Utilization Metrics | ✔️ | ✔️ |
| AI Training & Inference Jobs | ✔️ | ✔️ |

[3] Transparent Checkpointing for NVIDIA: Requires NVIDIA DataCenter Driver version 575 or higher.
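
The driver requirement in the note above can be checked before enabling Transparent Checkpointing. The following is a minimal sketch, not part of GPU Cluster Manager: the function names are hypothetical, and it assumes the driver version is available as a dotted string such as the one printed by `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum NVIDIA driver major version for Transparent Checkpointing,
# taken from the note above.
MIN_DRIVER_MAJOR = 575


def driver_major(driver_version: str) -> int:
    """Return the major component of a dotted driver version string."""
    return int(driver_version.strip().split(".")[0])


def meets_checkpoint_driver_requirement(driver_version: str) -> bool:
    """True if the driver is version 575 or higher (hypothetical helper)."""
    return driver_major(driver_version) >= MIN_DRIVER_MAJOR


if __name__ == "__main__":
    # Sample version strings, used here for illustration only.
    for version in ("570.124.06", "575.57.08"):
        print(version, meets_checkpoint_driver_requirement(version))
```

Only the major component is compared, because the note states the requirement as "575 or higher" without a minor version.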

Bug Fixes

This release includes numerous bug fixes and additional content.

Known Issues

See Known Issues for a complete list.

Deprecations

There are no known deprecations for this release.

Upgrade Instructions

It is now possible to upgrade from 0.4.0 to 0.5.0. See the Install Guide - Upgrading GPU Cluster Manager for step-by-step instructions to upgrade to this release of MemVerge.ai. If you are running a version earlier than 0.4.0, you must uninstall your existing version and then install 0.5.0. See the Install Guide for step-by-step instructions.
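
The rule above can be expressed as a small version check, for example in a pre-upgrade script. This is an illustrative sketch only: the function names and return values are hypothetical, and treating any 0.4.x patch release like 0.4.0 is an assumption, since the notes only mention 0.4.0 explicitly.

```python
TARGET = (0, 5, 0)          # this release
MIN_UPGRADABLE = (0, 4, 0)  # oldest version with an in-place upgrade path


def parse_version(version: str) -> tuple:
    """Turn '0.4.0' or 'v0.4.0' into a tuple of ints, e.g. (0, 4, 0)."""
    return tuple(int(part) for part in version.strip().lstrip("vV").split("."))


def upgrade_path(installed: str) -> str:
    """Pick the path to 0.5.0 for the installed version (hypothetical helper)."""
    current = parse_version(installed)
    if current >= TARGET:
        return "up_to_date"
    if current >= MIN_UPGRADABLE:
        # Assumption: 0.4.x patch releases are handled like 0.4.0.
        return "upgrade"
    return "uninstall_and_reinstall"


if __name__ == "__main__":
    for version in ("0.3.2", "0.4.0", "0.5.0"):
        print(version, upgrade_path(version))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically. The Install Guide remains the authoritative source for the actual steps.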

Feedback & Support

We value your input and want to ensure a smooth experience with our platform. If you have questions, encounter any issues, or wish to suggest new features, please reach out to us:

  • File an Issue or Support Request:
      • Visit the Support page to learn how to submit feedback, open a support ticket, or request additional documentation.
      • Directly contact your Sales Manager, Product Manager, or Support Engineer.
  • Known Issues:
      • Check the Known Issues before reporting a new issue.

Your insights help us prioritize enhancements and guide future development, ensuring the platform remains powerful, user-friendly, and aligned with your organization’s AI goals.