GPU Cluster Manager v0.4.0 Release Notes

Overview

This release is a re-architected and re-implemented successor to the previous release.

Highlights

  • Fractional GPU support for Checkpoint/Restore functionality
    • Requires NVIDIA GPUs running driver version 575.x.x or higher.
  • Enhanced reporting for Pods, Storages, Metrics, and Conditions
  • Support for AMD GPUs:
    • New Metrics and Telemetry
    • AMD Docker Images for Workspaces
    • Automation of AMD GPU Driver and Operator Installation/Management via Helm
    • Integration of AMD GPU Usage into the MemVerge.AI Billing Engine
    • Update of mvaictl to Support AMD GPUs
    • Extension of MemVerge AI REST API to Include AMD GPU Information
    • Automation of a Default AMD GPU Node Group Creation upon Installation
    • Enhancement of UI Workload Creation to Specify GPU Vendor
    • AMD GPU Compatibility for Resource Reservations
  • Enhanced uninstallation support with a supplementary cleanup.sh script.
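
The 575.x.x requirement listed above for fractional GPU Checkpoint/Restore can be checked on a node before installation. The following is a minimal sketch, not part of the product: it assumes that "575.x.x" refers to the NVIDIA driver version and that comparing the major version number is sufficient. The function names and the MIN_DRIVER_MAJOR constant are hypothetical; the nvidia-smi query flags are the standard ones for reporting the driver version.

```python
import subprocess
from typing import List

# Assumption: the "575.x.x or higher" requirement is compared on the major version only.
MIN_DRIVER_MAJOR = 575


def driver_meets_minimum(version: str, minimum_major: int = MIN_DRIVER_MAJOR) -> bool:
    """Return True if a driver version string such as '575.57.08' meets the minimum major version."""
    major = int(version.strip().split(".")[0])
    return major >= minimum_major


def installed_driver_versions() -> List[str]:
    """Ask nvidia-smi for the driver version reported by each GPU on this node."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi is missing or fails
    )
    # One line per GPU; skip any blank lines.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a node with NVIDIA GPUs, calling driver_meets_minimum on each entry returned by installed_driver_versions() indicates whether that GPU's driver satisfies the requirement.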

Bug Fixes

This release includes numerous bug fixes and additional content.

Known Issues

See Known Issues for a complete list.

Deprecations

There are no known deprecations for this release.

Upgrade Instructions

Upgrading from a previous release is not supported; this release must be installed fresh. See the Install Guide for step-by-step instructions to install this release of MemVerge.ai.

Feedback & Support

We value your input and want to ensure a smooth experience with our platform. If you have questions, encounter any issues, or wish to suggest new features, please reach out to us:

  • File an Issue or Support Request:
    • Visit the Support page to learn how to submit feedback, open a support ticket, or request additional documentation.
    • Directly contact your Sales Manager, Product Manager, or Support Engineer.
  • Known Issues:
    • Check the Known Issues before reporting a new issue.

Your insights help us prioritize enhancements and guide future development, ensuring the platform remains powerful, user-friendly, and aligned with your organization’s AI goals.