GPU Cluster Manager v0.4.0 Release Notes

Overview

This release re-architects and re-implements the previous release.

Highlights

  • Fractional GPU support for Checkpoint/Restore functionality
    • Requires NVIDIA GPUs running driver version 575.x.x or higher.
  • Enhanced reporting for Pods, Storages, Metrics, and Conditions
  • Support for AMD GPUs:
    • New Metrics and Telemetry
    • AMD Docker Images for Workspaces
    • Automation of AMD GPU Driver and Operator Installation/Management via Helm
    • Integration of AMD GPU Usage into the MemVerge.AI Billing Engine
    • Update mvaictl to support AMD GPUs
    • Extension of MemVerge AI REST API to Include AMD GPU Information
    • Automation of a Default AMD GPU Node Group Creation upon Installation
    • Enhancement of UI Workload Creation to Specify GPU Vendor
    • AMD GPU Compatibility for Resource Reservations
  • Enhanced uninstallation support with a supplementary mvai-cleanup.sh script.

GPU Feature Matrix: NVIDIA and AMD Support

Below is a summary of current feature support for MemVerge GPU Cluster Manager on NVIDIA and AMD GPUs:

| Feature                       | NVIDIA GPUs        | AMD GPUs                         |
|-------------------------------|--------------------|----------------------------------|
| Fractional GPU Allocation     | ✔️ [1]             | X (Planned for a future release) |
| GPU Partitioning              | X (NVIDIA MIG) [2] | No direct equivalent             |
| Transparent Checkpointing     | ✔️ [3]             | X                                |
| GPU Orchestration/Sharing     | ✔️                 | ✔️                               |
| Workload Priority/Preemption  | ✔️                 | ✔️                               |
| Real-Time Utilization Metrics | ✔️                 | ✔️                               |
| AI Training & Inference Jobs  | ✔️                 | ✔️                               |

[1] Fractional GPU Allocation for NVIDIA: MemVerge GPU Cluster Manager v0.5.0 delivers improved support.

[2] GPU Partitioning for NVIDIA: NVIDIA’s Multi-Instance GPU (MIG) is not supported in this release. This is planned for a future release.

[3] Transparent Checkpointing for NVIDIA: Requires NVIDIA DataCenter Driver version 575 or higher.
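
The driver requirement for Transparent Checkpointing can be checked on a node before enabling the feature. The following is a minimal sketch, not part of the product: the function names and the `MIN_DRIVER_MAJOR` constant are hypothetical, and it assumes `nvidia-smi` is on the PATH and reports versions in the usual `major.minor.patch` form.

```python
import subprocess

# Assumption taken from the release notes: driver version 575 or higher.
MIN_DRIVER_MAJOR = 575


def driver_major(version: str) -> int:
    """Return the major component of a driver version such as '575.57.08'."""
    return int(version.strip().split(".")[0])


def meets_checkpoint_requirement(version: str) -> bool:
    """True if the driver's major version is at least MIN_DRIVER_MAJOR."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def installed_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version reported by each GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi exits with an error
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def report() -> None:
    """Print one line per GPU stating whether its driver is new enough."""
    for version in installed_driver_versions():
        status = "OK" if meets_checkpoint_requirement(version) else "too old"
        print(f"driver {version}: {status}")
```

Calling `report()` on a GPU node prints one line per GPU. Only the major version is compared, which matches the "575 or higher" wording above; a stricter check would need the exact minimum version from the Install Guide.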

Bug Fixes

This release includes numerous bug fixes and additional content.

Known Issues

See Known Issues for a complete list.

Deprecations

There are no known deprecations for this release.

Upgrade Instructions

It is not possible to upgrade from a previous release. See the Install Guide for step-by-step instructions to install this release of MemVerge.ai.

Feedback & Support

We value your input and want to ensure a smooth experience with our platform. If you have questions, encounter any issues, or wish to suggest new features, please reach out to us:

  • File an Issue or Support Request:
    • Visit the Support page to learn how to submit feedback, open a support ticket, or request additional documentation.
    • Directly contact your Sales Manager, Product Manager, or Support Engineer.
  • Known Issues:
    • Check the Known Issues before reporting a new issue.

Your insights help us prioritize enhancements and guide future development, ensuring the platform remains powerful, user-friendly, and aligned with your organization’s AI goals.