Managing Node Groups

What is a Node Group?

A Node Group is a logical collection of Kubernetes nodes that share common hardware characteristics, such as the manufacturer and model of their CPUs and GPUs. Node groups enable administrators to:

  • Target Workloads: Schedule specific applications (e.g., GPU-intensive AI jobs) to nodes with compatible hardware.
  • Simplify Management: Apply policies, quotas, or updates to groups of nodes collectively.
  • Optimize Costs: Isolate workloads to nodes with cost-effective hardware for their requirements.
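The idea of collecting nodes that share hardware characteristics can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` record, its field names, and the `group_by_hardware` helper are hypothetical and are not part of the product or of Kubernetes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node record; the field names are illustrative only.
    name: str
    cpu_vendor: str
    gpu_model: str


def group_by_hardware(nodes):
    """Bucket node names by their (CPU vendor, GPU model) pair."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node.cpu_vendor, node.gpu_model)].append(node.name)
    return dict(groups)


nodes = [
    Node("node-1", "amd", "nvidia-a10g"),
    Node("node-2", "amd", "nvidia-a10g"),
    Node("node-3", "intel", "none"),
]
node_groups = group_by_hardware(nodes)
```

Here `node-1` and `node-2` land in the same bucket because they share a CPU vendor and GPU model, while `node-3` forms a bucket of its own. A GPU-intensive job could then be directed at the first bucket only.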

Key Features

  • Resource Allocation: Reserve nodes for critical workloads or teams.
  • Scalability: Dynamically add/remove nodes as demand changes.

Node Group Dashboard

The Node Group dashboard has two primary sections:

Node Group Management Controls:

  • "Search Node Groups" Field: An input field with a magnifying glass icon, enabling administrators to search for specific node groups by name.
  • "+ New Node Group" Button: A button used to initiate the process of creating a new node group.

Node Group Card:

Example: ng-amd-23.49-nvidia-a10g-12.4

Each card represents a single node group and displays its essential information and management options:

  • Node Group Icon: A visual identifier for the node group (e.g., stylized cube).
  • Node Group Name: The unique name of the node group (e.g., ng-amd-23.49-nvidia-a10g-12.4).
  • Node Group Tag: A custom tag associated with the node group (e.g., corporate), useful for categorization.
  • Node Health Summary:

    • "Nodes" Count: The total number of nodes within this group (e.g., 1).
    • Status Indicators: Small circles representing the health status of nodes within the group. Each status has its own icon:
      • Ready: Green checkmark icon.
      • Not Ready: Red exclamation point icon.
      • Unknown: Grey question mark icon.
  • Aggregated Resource Summary:

    • GPUs: Total number of GPUs available across all nodes in the group.
    • CPU Cores: Total number of CPU cores available across all nodes in the group.
    • CPU Memory: Total amount of memory (in GiB) available across all nodes in the group.
  • Action Icons:
    • Edit Icon (Pen): Opens the node group's settings so administrators can modify them.
    • Delete Icon (Trash Can): Removes the node group from the system.
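The node count, status indicators, and aggregated resource figures on a card are simple roll-ups over the nodes in the group. The following is a minimal sketch of that arithmetic, assuming a hypothetical `NodeInfo` record and `summarize` helper. How the dashboard actually computes these values is not described here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class NodeInfo:
    # Hypothetical per-node record; names and fields are assumptions.
    name: str
    status: str        # "Ready", "Not Ready", or "Unknown"
    gpus: int
    cpu_cores: int
    memory_gib: float


def summarize(nodes):
    """Roll per-node values up into the figures a card displays."""
    status_counts = Counter(n.status for n in nodes)
    return {
        "nodes": len(nodes),
        "status": {
            s: status_counts.get(s, 0)
            for s in ("Ready", "Not Ready", "Unknown")
        },
        "gpus": sum(n.gpus for n in nodes),
        "cpu_cores": sum(n.cpu_cores for n in nodes),
        "memory_gib": sum(n.memory_gib for n in nodes),
    }


summary = summarize([
    NodeInfo("node-1", "Ready", 4, 48, 192.0),
    NodeInfo("node-2", "Not Ready", 4, 48, 192.0),
])
```

In this sketch the resource totals include every node regardless of its status, following the "across all nodes in the group" wording above.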

Node Group Operations

  • Create a Node Group:
  • Remove a Node Group:
    • Remove a node group when it’s no longer needed. This does not delete the nodes themselves.
  • Edit a Node Group:
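The note under "Remove a Node Group" (removing a group does not delete its nodes) can be pictured with a small in-memory model. The `Cluster` class and its method names are hypothetical, written only to illustrate the behavior. They are not the product's API.

```python
class Cluster:
    """Hypothetical in-memory model; not the product's actual API."""

    def __init__(self):
        self.nodes = set()       # every node known to the cluster
        self.node_groups = {}    # group name -> set of member node names

    def create_node_group(self, name, members):
        self.nodes.update(members)
        self.node_groups[name] = set(members)

    def remove_node_group(self, name):
        # Only the grouping is dropped; the nodes stay in the cluster.
        return self.node_groups.pop(name)


cluster = Cluster()
cluster.create_node_group("ng-example", {"node-1", "node-2"})
released = cluster.remove_node_group("ng-example")
```

After the call, `ng-example` no longer exists as a group, but `cluster.nodes` still contains both nodes.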