GPU Cluster Manager Administrator Guide

Welcome to the GPU Cluster Manager Admin Guide, which provides detailed instructions on managing your AI environment, including cluster scaling, node management, and other administrative tasks.

Contents

Getting Started

  • Initial Administrator Login
    • How to find your password and log in, reset your password, and verify or configure your server URL.
  • Using the Dashboard
    • A guide to the components and areas of GPU Cluster Manager's dashboard and how to quickly navigate its many features.
  • Navigation Bar
    • An explanation of each icon in the left-side navigation bar and its associated dashboard.

Kubernetes Cluster Management

  • Adding Nodes to the Cluster
    • Step-by-step instructions on adding new worker nodes to your K3s cluster. This guide covers node configuration, joining the cluster, and verifying GPU availability.
  • Removing Nodes from the Cluster
    • Instructions for safely removing worker nodes from the K3s cluster, ensuring minimal disruption to running workloads.
  • Renaming Clusters
    • Rename clusters for easy identification and management.

Infrastructure Map

Node & Node Group Management

  • Managing Nodes
    • An introduction to nodes and node groups, and how to manage them within a MemVerge AI cluster.
  • Managing Node Groups
    • Guidance on creating, configuring, and managing node groups within the UI. Node groups allow you to organize and target workloads to specific nodes based on hardware or software configurations.

Managing Departments

Managing Projects Overview

Managing Projects

Managing Storage Volumes (Persistent Volume Claims - PVC)

Managing User Workspaces

Billing

  • Managing Billing
    • Review cross-charging between departments and get detailed reports on project-level costs and charges.

Users and Authentication