GPU Cluster Manager v1.0.0 Release Notes

Overview

This release is a re-architected and re-implemented version of the previous release.

Highlights

Enhances Fractional GPU (NVIDIA) support
Introduces Multi-GPU workload checkpoint/restore
Adds AMD GPU telemetry, metrics collection, and visualization support in the Management Server UI
Includes general bug fixes and improvements that could not be implemented in the previous release

GPU Feature Matrix: NVIDIA and AMD Support

Below is a summary of current feature support for MemVerge GPU Cluster Manager on NVIDIA and AMD GPUs:

Feature	NVIDIA GPUs	AMD GPUs
Fractional GPU Allocation	✔️	X (Planned for a future release)
GPU Partitioning	✔️	No direct equivalent
Transparent Checkpointing	✔️¹	X
GPU Orchestration/Sharing	✔️	✔️
Workload Priority/Preemption	✔️	✔️
Real-Time Utilization Metrics	✔️	✔️
AI Training & Inference Jobs	✔️	✔️

¹ Transparent Checkpointing for NVIDIA: Requires NVIDIA Data Center Driver version 575 or later.
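
The driver requirement in the footnote above can be checked before enabling checkpointing on a node. The following is a minimal sketch, not part of GPU Cluster Manager: the function names and the `MIN_DRIVER_MAJOR` constant are hypothetical, and it assumes the driver version is reported as a dotted string (for example `575.57.08`) whose first component is the major version. It reads the version with the standard `nvidia-smi --query-gpu=driver_version --format=csv,noheader` query and returns an empty list when `nvidia-smi` is not available.

```python
import shutil
import subprocess
from typing import List

# Minimum major driver version taken from the footnote above.
MIN_DRIVER_MAJOR = 575


def driver_major(version: str) -> int:
    """Return the major component of a dotted driver version string."""
    return int(version.strip().split(".")[0])


def supports_transparent_checkpointing(version: str) -> bool:
    """True if the driver version meets the documented minimum (hypothetical helper)."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def installed_driver_versions() -> List[str]:
    """Ask nvidia-smi for the driver version of each GPU; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("No NVIDIA driver detected (nvidia-smi unavailable or failed).")
    for index, version in enumerate(versions):
        status = "meets" if supports_transparent_checkpointing(version) else "is below"
        print(f"GPU {index}: driver {version} {status} the 575 minimum")
```

Running the script on a GPU node prints one line per GPU. The comparison uses only the major version, which is sufficient for the "575 or later" requirement as stated.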

Bug Fixes

This release includes numerous bug fixes and additional content.

Known Issues

See Known Issues for a complete list.

Deprecations

There are no known deprecations for this release.

Feedback & Support

We value your input and want to ensure a smooth experience with our platform. If you have questions, encounter any issues, or wish to suggest new features, please reach out to us:

File an Issue or Support Request:
Visit the Support page to learn how to submit feedback, open a support ticket, or request additional documentation.
Contact your Sales Manager, Product Manager, or Support Engineer directly.
Known Issues:
Check the Known Issues before reporting a new issue.

Your insights help us prioritize enhancements and guide future development, ensuring the platform remains powerful, user-friendly, and aligned with your organization’s AI goals.