Managing Node Groups
What is a Node Group?
A Node Group is a logical collection of Kubernetes nodes that share common hardware characteristics, such as CPU and GPU manufacturer/model. Node groups enable administrators to:
- Target Workloads: Schedule specific applications (e.g., GPU-intensive AI jobs) to nodes with compatible hardware.
- Simplify Management: Apply policies, quotas, or updates to groups of nodes collectively.
- Optimize Costs: Isolate workloads to nodes with cost-effective hardware for their requirements.
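To make the grouping idea concrete, here is a minimal sketch that buckets nodes by shared hardware characteristics. The `Node` record, its field names, and the sample node names are hypothetical and exist only for illustration; they are not taken from the product's data model or API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node record; field names are illustrative only."""
    name: str
    cpu_vendor: str
    gpu_model: str


def group_by_hardware(nodes):
    """Bucket node names by their (cpu_vendor, gpu_model) pair."""
    groups = defaultdict(list)
    for node in nodes:
        groups[(node.cpu_vendor, node.gpu_model)].append(node.name)
    return dict(groups)


# Made-up sample nodes, not real inventory.
nodes = [
    Node("node-a", "amd", "nvidia-a10g"),
    Node("node-b", "amd", "nvidia-a10g"),
    Node("node-c", "intel", "none"),
]
hardware_groups = group_by_hardware(nodes)

# A GPU job would target the bucket whose key matches its hardware needs.
gpu_targets = hardware_groups[("amd", "nvidia-a10g")]
```

Keying each bucket on a tuple of hardware attributes mirrors the definition above: nodes belong together when those attributes match.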
Key Features
- Resource Allocation: Reserve nodes for critical workloads or teams.
- Scalability: Dynamically add/remove nodes as demand changes.
Node Group Dashboard
The Node Group dashboard has two primary sections:
Node Group Management Controls:
- "Search Node Groups" Field: An input field with a magnifying glass icon, enabling administrators to search for specific node groups by name.
- "+ New Node Group" Button: A button used to initiate the process of creating a new node group.
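The search behavior can be pictured with a small sketch. The matching rule is an assumption: this page does not say whether the dashboard matches by substring, prefix, or exact name, so the function below uses a case-insensitive substring match purely for illustration. The second group name is hypothetical.

```python
def search_node_groups(names, query):
    """Return the names that contain the query, ignoring case.

    Assumption: substring matching; the dashboard's real rule may differ.
    """
    needle = query.strip().lower()
    if not needle:
        return list(names)  # an empty search shows every group
    return [name for name in names if needle in name.lower()]


# The first name comes from this page; the second is a made-up example.
group_names = ["ng-amd-23.49-nvidia-a10g-12.4", "ng-example-cpu-only"]
matches = search_node_groups(group_names, "A10G")
```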
Node Group Card:
Each card represents a single node group (example: ng-amd-23.49-nvidia-a10g-12.4) and displays its essential information and management options:
- Node Group Icon: A visual identifier for the node group (e.g., stylized cube).
- Node Group Name: The unique name of the node group (e.g., ng-amd-23.49-nvidia-a10g-12.4).
- Node Group Tag: A custom tag associated with the node group (e.g., corporate), useful for categorization.
- Node Health Summary:
- "Nodes" Count: The total number of nodes within this group (e.g., 1).
- Status Indicators: Small circles representing the health status of nodes within the group. Three statuses are distinguished, each with its own icon: Ready, Not Ready, and Unknown.
- Aggregated Resource Summary:
- GPUs: Total number of GPUs available across all nodes in the group.
- CPU Cores: Total number of CPU cores available across all nodes in the group.
- CPU Memory: Total amount of memory (in GiB) available across all nodes in the group.
- Action Icons:
- Edit Icon (Pen): Allows administrators to modify the settings or properties of the node group.
- Delete Icon (Trash Can): Used to remove or delete the node group from the system.
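The health and resource figures on a card are aggregates over the group's nodes. The sketch below shows one way such a summary could be computed. The `NodeInfo` record and its fields are hypothetical, and the sketch assumes that every node counts toward the totals regardless of its status, which this page does not confirm.

```python
from collections import Counter
from dataclasses import dataclass

STATUSES = ("Ready", "Not Ready", "Unknown")


@dataclass
class NodeInfo:
    """Hypothetical per-node record; not the product's actual schema."""
    name: str
    status: str  # one of STATUSES
    gpus: int
    cpu_cores: int
    memory_gib: float


def summarize(nodes):
    """Build the card-level summary: node count, status counts, resource totals."""
    status_counts = Counter(node.status for node in nodes)
    return {
        "nodes": len(nodes),
        "status": {status: status_counts.get(status, 0) for status in STATUSES},
        # Assumption: totals include every node, whatever its status.
        "gpus": sum(node.gpus for node in nodes),
        "cpu_cores": sum(node.cpu_cores for node in nodes),
        "memory_gib": sum(node.memory_gib for node in nodes),
    }


# Made-up sample values for illustration.
summary = summarize([
    NodeInfo("node-a", "Ready", gpus=1, cpu_cores=8, memory_gib=32.0),
    NodeInfo("node-b", "Not Ready", gpus=1, cpu_cores=8, memory_gib=32.0),
])
```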
Node Group Operations
- Create a Node Group: Create a new node group to organize nodes with shared attributes (e.g., GPU type, memory capacity).
- Remove a Node Group: Remove a node group when it’s no longer needed. This does not delete the nodes themselves.
- Edit a Node Group: Modify group properties, such as adding nodes to or removing nodes from the group.
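The three operations can be modeled with a small in-memory sketch. `NodeGroupRegistry` and its method names are hypothetical and do not correspond to any real API of the product. The point of the sketch is the relationship described above: a group only references nodes, so removing the group leaves the nodes in place.

```python
class NodeGroupRegistry:
    """Hypothetical in-memory model of node groups; not a real product API."""

    def __init__(self, nodes):
        self.nodes = set(nodes)  # nodes exist independently of any group
        self.groups = {}         # group name -> set of member node names

    def _check_known(self, members):
        unknown = set(members) - self.nodes
        if unknown:
            raise ValueError(f"unknown nodes: {sorted(unknown)}")

    def create_group(self, name, members=()):
        if name in self.groups:
            raise ValueError(f"node group already exists: {name}")
        self._check_known(members)
        self.groups[name] = set(members)

    def edit_group(self, name, add=(), remove=()):
        self._check_known(add)
        self.groups[name] = (self.groups[name] | set(add)) - set(remove)

    def remove_group(self, name):
        # Only the grouping is deleted; self.nodes is left untouched.
        del self.groups[name]


registry = NodeGroupRegistry(["node-a", "node-b"])
registry.create_group("ng-example", members=["node-a"])
registry.edit_group("ng-example", add=["node-b"], remove=["node-a"])
members_after_edit = set(registry.groups["ng-example"])
registry.remove_group("ng-example")
```

After `remove_group`, `registry.nodes` still contains both nodes, which matches the note that removing a node group does not delete the nodes themselves.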