GPU Cluster Manager Admin Guide¶
Welcome to the GPU Cluster Manager Admin Guide, which provides detailed instructions on managing your AI environment, including cluster scaling, node management, and other administrative tasks.
Contents¶
Getting Started¶
- Initial Administrator Login
- How to find your password and log in, reset your password, and verify or configure your server URL.
- Using the Dashboard
- A guide to the components and areas of the GPU Cluster Manager dashboard and how to quickly navigate its many features.
- Navigation Bar
- An explanation of each icon in the left-side navigation bar and its associated dashboard.
Kubernetes Cluster Management¶
- Adding Nodes to the Cluster
- Step-by-step instructions on adding new worker nodes to your K3s cluster. This guide covers node configuration, joining the cluster, and verifying GPU availability.
- Removing Nodes from the Cluster
- Instructions for safely removing worker nodes from the K3s cluster, ensuring minimal disruption to running workloads.
- Renaming Clusters
- Rename clusters for easy identification and management.
Infrastructure Map¶
- Infrastructure Map Dashboard
- View your GPU management infrastructure relationships, inspect individual components, and more.
Node & Node Group Management¶
- Managing Nodes
- An introduction to Nodes and Node Groups and managing these entities within a MemVerge AI cluster.
- Managing Node Groups
- Guidance on creating, configuring, and managing node groups within the UI. Node groups allow you to organize and target workloads to specific nodes based on hardware or software configurations.
Managing Departments¶
- Managing Departments
- An introduction to departments.
- Creating Departments
- Instructions for creating new departments.
- Deleting Departments
- Instructions for deleting departments.
Managing Projects Overview¶
- Projects Overview Dashboard
- View all projects and obtain key information about each.
Managing Projects¶
- Managing Projects
- An introduction to projects.
- Creating Projects
- Create a new project and manage the priority and resources assigned to workloads that run within the project.
- Modifying Projects
- Change project settings.
- Deleting Projects
- Delete a project.
Managing Storage Volumes (Persistent Volume Claims - PVC)¶
- Managing Storage Volumes
- An introduction to Storage Volumes used by User Workspaces.
- Creating Storage Volumes
- Create a new Storage Volume for user workloads.
- Deleting Storage Volumes
- Delete a Storage Volume.
Managing User Workspaces¶
- Managing User Workspaces
- An introduction to User Workspaces.
- Creating Workspaces
- Create a new user Workspace.
- View Workspace Details and Telemetry
- Monitor workspaces.
- Connecting to a Workspace
- Access the UI of a user workspace, for example Jupyter Notebook or VSCode.
- Stopping and Starting Workspaces
- Stop and start workspaces.
- Deleting Workspaces
- Delete a Workspace.
Billing¶
- Managing Billing
- Review cross-charging between departments and get detailed reports on project-level costs and charges.