GPU Cluster Manager v0.5.0 Release Notes
Overview
GPU Cluster Manager v0.5.0 has been re-architected and re-implemented since the previous release.
Highlights
- Enhances Fractional GPU (NVIDIA) support
- Adds an upgrade path from 0.4.0 to 0.5.0
  - Installations running a version earlier than 0.4.0 must be uninstalled completely before installing 0.5.0.
- Introduces Multi-GPU workload checkpoint/restore
- Adds AMD GPU Telemetry, Metrics Collection, and Visualization support in the Management Server UI
- General bug fixes and improvements that could not be implemented in the previous release.
GPU Feature Matrix: NVIDIA and AMD Support
Below is a summary of current feature support for MemVerge GPU Cluster Manager on NVIDIA and AMD GPUs:
Feature | NVIDIA GPUs | AMD GPUs |
---|---|---|
Fractional GPU Allocation | ✔️ | X (Planned for a future release) |
GPU Partitioning | ✔️ | No direct equivalent |
Transparent Checkpointing | ✔️ [3] | X |
GPU Orchestration/Sharing | ✔️ | ✔️ |
Workload Priority/Preemption | ✔️ | ✔️ |
Real-Time Utilization Metrics | ✔️ | ✔️ |
AI Training & Inference Jobs | ✔️ | ✔️ |
[3] Transparent Checkpointing on NVIDIA GPUs requires NVIDIA Data Center Driver version 575 or higher.
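The driver requirement in note 3 can be checked before enabling Transparent Checkpointing. The following is a minimal sketch, not part of GPU Cluster Manager: the function names are hypothetical, and it assumes the driver version is a dotted string whose first component is the major version (for example `575.57.08`). The optional helper reads the version from `nvidia-smi`, which must be on the `PATH`.

```python
import subprocess

# Minimum major driver version, taken from note 3 in the feature matrix.
MIN_DRIVER_MAJOR = 575


def driver_major(version: str) -> int:
    """Return the major component of a dotted driver version string."""
    return int(version.strip().split(".")[0])


def supports_transparent_checkpointing(version: str) -> bool:
    """True if the driver version meets the 575-or-higher requirement."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def installed_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version reported for each GPU.

    Hypothetical helper: raises if nvidia-smi is missing or fails.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

On a GPU node, `all(supports_transparent_checkpointing(v) for v in installed_driver_versions())` would report whether every GPU on that node meets the requirement.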
Bug Fixes
This release includes numerous bug fixes and additional content.
Known Issues
See Known Issues for a complete list.
Deprecations
There are no known deprecations for this release.
Upgrade Instructions
You can now upgrade from 0.4.0 to 0.5.0. See the Install Guide - Upgrading GPU Cluster Manager for step-by-step instructions on upgrading to this release of MemVerge.ai. If you are running a version earlier than 0.4.0, you must uninstall your existing version completely and then install 0.5.0. See the Install Guide for step-by-step instructions.
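The upgrade rule above can be expressed as a small decision function. This is an illustrative sketch only: `parse_version` and `upgrade_action` are hypothetical names, not part of the product, and treating any 0.4.x version like 0.4.0 is an assumption, since these notes only mention 0.4.0 explicitly.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'v0.4.0' or '0.4.0' into a tuple of at least three integers."""
    parts = [int(p) for p in text.strip().lstrip("v").split(".")]
    # Pad short versions such as '0.4' so tuple comparison behaves as expected.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def upgrade_action(installed: str) -> str:
    """Return the action needed to reach 0.5.0 from the installed version."""
    version = parse_version(installed)
    if version >= (0, 5, 0):
        return "none"  # already on 0.5.0 or later
    if version >= (0, 4, 0):
        # Assumption: every 0.4.x version follows the 0.4.0 upgrade path.
        return "upgrade"
    # Versions earlier than 0.4.0: uninstall completely, then install 0.5.0.
    return "uninstall-and-reinstall"
```

For example, `upgrade_action("0.3.2")` selects the uninstall-and-reinstall path, while `upgrade_action("v0.4.0")` selects the upgrade path described in the Install Guide.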
Feedback & Support
We value your input and want to ensure a smooth experience with our platform. If you have questions, encounter any issues, or wish to suggest new features, please reach out to us:
- File an Issue or Support Request:
  - Visit the Support page to learn how to submit feedback, open a support ticket, or request additional documentation.
  - Directly contact your Sales Manager, Product Manager, or Support Engineer.
- Known Issues:
  - Check the Known Issues before reporting a new issue.
Your insights help us prioritize enhancements and guide future development, ensuring the platform remains powerful, user-friendly, and aligned with your organization’s AI goals.