GPU Cluster Manager v0.4.0 Release Notes¶
Overview¶
This release is a complete re-architecture and re-implementation of the previous release.
Highlights¶
- Fractional GPU support for Checkpoint/Restore functionality
  - Requires NVIDIA GPUs running driver version 575.x.x or higher.
- Enhanced reporting for Pods, Storages, Metrics, and Conditions
- Support for AMD GPUs:
  - New Metrics and Telemetry
  - AMD Docker Images for Workspaces
  - Automation of AMD GPU Driver and Operator Installation/Management via Helm
  - Integration of AMD GPU Usage into the MemVerge.ai Billing Engine
  - Update of mvaictl to Support AMD GPUs
  - Extension of MemVerge.ai REST API to Include AMD GPU Information
  - Automation of Default AMD GPU Node Group Creation upon Installation
  - Enhancement of UI Workload Creation to Specify GPU Vendor
  - AMD GPU Compatibility for Resource Reservations
- Enhanced uninstallation support with a supplementary `cleanup.sh` script.
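
The fractional GPU Checkpoint/Restore requirement above can be checked on a node before installation. The following is a minimal sketch, not part of the product: it assumes the 575.x.x requirement refers to the NVIDIA driver version as reported by `nvidia-smi`, and the function names `meets_minimum` and `installed_driver_versions` are hypothetical helpers introduced only for illustration.

```python
import shutil
import subprocess

# Assumption: "575.x.x or higher" in the release notes refers to the
# NVIDIA driver branch (the first component of the driver version).
MIN_DRIVER_BRANCH = 575


def meets_minimum(version: str, minimum_branch: int = MIN_DRIVER_BRANCH) -> bool:
    """Return True if the leading component of a driver version is >= minimum_branch."""
    branch = version.strip().split(".")[0]
    return branch.isdigit() and int(branch) >= minimum_branch


def installed_driver_versions() -> list[str]:
    """Return the driver version nvidia-smi reports for each GPU on this node."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA tooling found on this node
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
        )
    except subprocess.CalledProcessError:
        return []  # nvidia-smi is present but could not query the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("nvidia-smi not found or no GPUs reported")
    for v in versions:
        status = "OK" if meets_minimum(v) else "older than the 575 branch"
        print(f"driver {v}: {status}")
```

Only the leading version component is compared, because the requirement is stated as a branch (575.x.x) rather than a specific build. Run the script on each GPU node; a node that reports an older branch would need a driver update before using fractional GPUs with Checkpoint/Restore.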
Bug Fixes¶
This release includes numerous bug fixes and additional content.
Known Issues¶
See Known Issues for a complete list.
Deprecations¶
There are no known deprecations for this release.
Upgrade Instructions¶
Upgrading from a previous release is not supported. See the Install Guide for step-by-step instructions to install this release of MemVerge.ai.
Feedback & Support¶
We value your input and want to ensure a smooth experience with our platform. If you have questions, encounter any issues, or wish to suggest new features, please reach out to us:
- File an Issue or Support Request:
  - Visit the Support page to learn how to submit feedback, open a support ticket, or request additional documentation.
  - Directly contact your Sales Manager, Product Manager, or Support Engineer.
- Known Issues:
  - Check the Known Issues before reporting a new issue.
Your insights help us prioritize enhancements and guide future development, ensuring the platform remains powerful, user-friendly, and aligned with your organization’s AI goals.