Viewing Detailed Workspace Information¶
Once you have one or more Workspaces running, you can drill down into each Workspace’s details to monitor resource usage, health, and configuration. This section describes the various tabs and panels that provide insights into your Workspace’s status, including GPU utilization, node allocation, and telemetry data.
Open the Workspace Dashboard¶
- Navigate to the Workspaces Dashboard
  - In the left navigation bar, select the Workspaces icon.
  - You will land on the Workspaces dashboard, which shows all current Workspaces.
- Select a Workspace
  - Locate the Workspace you want to inspect.
  - Click the Workspace card to open its detailed view.
Overview Tab¶
The Workspace Details screen provides granular insight into a specific workspace.
Workspace Summary Header¶
- Workspace Name & Status: e.g., Workspace: red-ws Ready Checkpointing.
- UID: Internal workspace identifier (e.g., proj-red/red-ws).
- Quick Actions: Stop, Connect, Delete buttons.
Workspace Resource & Metadata Overview¶
Summary of workspace configuration and state:
- Project: Project name (e.g., proj-red, lowest).
- Requested Resources: GPUs (e.g., 1), CPU Cores (e.g., 3), Memory (e.g., 4GiB).
- Utilization: Current GPU and CPU usage percentages.
- Created At: Timestamp of workspace creation.
- Age: Duration since creation.
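Age is the time elapsed since Created At. The relationship can be sketched as follows, assuming the creation timestamp is available as an ISO 8601 string; the function name `workspace_age` and the compact output format are illustrative and not taken from the product.

```python
from datetime import datetime, timezone


def workspace_age(created_at: str, now: datetime) -> str:
    """Return a compact age string (e.g. '2d2h') for an ISO 8601 creation time."""
    created = datetime.fromisoformat(created_at)
    total_minutes = int((now - created).total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    if days:
        return f"{days}d{hours}h"
    if hours:
        return f"{hours}h{minutes}m"
    return f"{minutes}m"


# Hypothetical timestamps, chosen only to illustrate the calculation.
now = datetime(2024, 5, 3, 12, 0, tzinfo=timezone.utc)
print(workspace_age("2024-05-01T09:30:00+00:00", now))  # prints 2d2h
```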
Workspace Detail Tabs¶
Navigation tabs for more details, with Pods currently active:
- Pods (Active): Lists the workspace's Kubernetes pods.
- GPUs In Use
- Nodes
- Node Groups
- Storages
- Metrics
- Conditions
Pagination Controls¶
- Navigation: Arrows to browse pages of pods.
- Items Per Page: Displays current page and allows adjustment (e.g., 1 / 10 / page).
Pods Tab¶
The Pods tab focuses on the underlying computational units (Kubernetes pods) that form the workspace.
Pods List Table¶
Details for each pod associated with the workspace:
- Name: Pod's unique name (e.g., red-ws-0).
- Ready Containers: Ready containers / Total containers (e.g., 2/2).
- Status: Pod's current operational state (e.g., Running).
- Restarts: Number of times the pod has restarted.
- IP: Pod's internal IP address.
- Node: The node where the pod is running.
Benefits¶
- Monitor Pod Health: Verify status and readiness.
- Troubleshoot Workloads: Check restarts and node placement for issues.
- Verify Allocation: Confirm pods are running on expected resources.
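The checks above can be expressed as a small routine over the columns of the Pods table. This is a minimal sketch, assuming the rows have been copied out by hand or exported in some form; the `PodRow` type, the `pod_issues` function, and the restart threshold are hypothetical and not part of the product.

```python
from dataclasses import dataclass


@dataclass
class PodRow:
    """One row of the Pods table (hypothetical representation)."""
    name: str
    ready: str      # "ready/total" containers, e.g. "2/2"
    status: str     # e.g. "Running"
    restarts: int
    node: str


def pod_issues(pod: PodRow, max_restarts: int = 3) -> list[str]:
    """Return human-readable problems found in a single pod row."""
    ready, total = (int(part) for part in pod.ready.split("/"))
    issues = []
    if pod.status != "Running":
        issues.append(f"status is {pod.status}")
    if ready < total:
        issues.append(f"only {ready} of {total} containers ready")
    if pod.restarts > max_restarts:
        issues.append(f"{pod.restarts} restarts")
    return issues


# Illustrative rows; the node name is made up.
rows = [
    PodRow("red-ws-0", "2/2", "Running", 0, "node-a"),
    PodRow("red-ws-1", "1/2", "Running", 5, "node-a"),
]
for row in rows:
    print(row.name, pod_issues(row) or "healthy")
```

An empty list means the row passed all three checks. The threshold of three restarts is arbitrary and should be tuned to the workload.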
'GPUs In Use' Tab¶
The 'GPUs In Use' tab of Workspace Details shows the following:
- GPU Allocation
  - Lists the GPUs assigned to this Workspace, including their UUID, Instance ID, Node Group, Node, Model, GPU Utilization, and GPU Memory Utilization.
  - Shows real-time or near-real-time graphs of GPU usage (percent busy) and GPU memory consumption.
  - Helps identify whether your AI model is using GPU resources efficiently or leaving capacity unused.
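One way to spot unused capacity is to flag GPUs whose compute and memory utilization are both low. The sketch below assumes the table values have been collected into dictionaries; the key names, the UUIDs, and the 20% thresholds are illustrative assumptions, not values defined by the product.

```python
def underused_gpus(rows, util_threshold=20.0, mem_threshold=20.0):
    """Return UUIDs of GPUs whose compute and memory utilization (in percent)
    are both below the given thresholds."""
    return [
        row["uuid"]
        for row in rows
        if row["gpu_util"] < util_threshold and row["gpu_mem_util"] < mem_threshold
    ]


# Hypothetical readings for two GPUs assigned to one Workspace.
rows = [
    {"uuid": "GPU-example-0", "gpu_util": 87.0, "gpu_mem_util": 64.0},
    {"uuid": "GPU-example-1", "gpu_util": 3.0, "gpu_mem_util": 1.5},
]
print(underused_gpus(rows))  # prints ['GPU-example-1']
```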
Nodes Tab¶
- Node Association
  - Shows the node(s) where this Workspace’s pods are running, including Node Name and Status.
- Node Metrics
  - Displays GPUs, CPU Cores, Memory, Node Group, and Internal/External IP.
Node Groups Tab¶
- Group Membership
  - If multiple Node Groups are available, this tab indicates which group currently serves the Workspace.
- Resource Overview
  - Summarizes how many reserved and allocated GPUs, CPU cores, and memory resources each Node Group provides.
  - Helpful for understanding how workloads are distributed among different Node Groups in your cluster.
Storage Tab¶
- Storage Information
  - Shows the storage associated with your workspace.
  - For each entry, lists the storage name, status, volume name, capacity, access modes, storage class, volume mode, and mount path.
Metrics Tab¶
- GPU, CPU, and Memory Graphs
  - Show usage trends over time so you can identify performance bottlenecks or resource spikes.
  - Let you see whether usage patterns changed after a model update or after processing a new dataset.
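A resource spike can be made precise as a sample that sits far above the series average. The following sketch uses a simple z-score over a list of utilization samples; the function name `find_spikes`, the cutoff of 2 standard deviations, and the sample values are assumptions for illustration, and the Metrics tab itself may define spikes differently or not at all.

```python
from statistics import mean, pstdev


def find_spikes(samples, z_cutoff=2.0):
    """Return indexes of samples more than z_cutoff standard deviations
    above the mean of the series."""
    mu = mean(samples)
    sigma = pstdev(samples)
    if sigma == 0:
        return []  # a flat series has no spikes
    return [i for i, value in enumerate(samples) if (value - mu) / sigma > z_cutoff]


# Hypothetical GPU utilization samples (percent), one per interval.
gpu_util = [30, 32, 31, 29, 30, 95, 31, 30]
print(find_spikes(gpu_util))  # prints [5]
```

To compare behavior before and after a model update, the same function can be run on the two halves of the series separately.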
Conditions Tab¶
- Health Checks
  - Displays any current warnings or errors reported by Kubernetes or the workspace environment.
  - Examples include node pressure conditions, scheduling failures, or pod restart loops.
- System Events
  - Shows a timeline of events like when the Workspace started, stopped, or encountered any scheduling issues.
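Kubernetes reports conditions as type and status pairs, and the healthy status depends on the type: Ready or PodScheduled should be True, while pressure conditions such as MemoryPressure should be False. The sketch below applies that rule to a list of conditions. The dictionary layout and the function name `unhealthy_conditions` are assumptions; how the Conditions tab exposes its data is not specified here.

```python
# Condition types for which "True" is the healthy status. Any other type
# (for example MemoryPressure or DiskPressure) is treated as healthy when "False".
POSITIVE_TYPES = {"Ready", "ContainersReady", "PodScheduled", "Initialized"}


def unhealthy_conditions(conditions):
    """Return the types of conditions whose status differs from the healthy value."""
    flagged = []
    for condition in conditions:
        healthy = "True" if condition["type"] in POSITIVE_TYPES else "False"
        if condition["status"] != healthy:
            flagged.append(condition["type"])
    return flagged


# Hypothetical conditions for one workspace.
conditions = [
    {"type": "Ready", "status": "True"},
    {"type": "MemoryPressure", "status": "True"},
    {"type": "PodScheduled", "status": "False"},
]
print(unhealthy_conditions(conditions))  # prints ['MemoryPressure', 'PodScheduled']
```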
Best Practices¶
- Regular Monitoring: Check GPU, CPU, and memory usage periodically to ensure you’re using the correct resource configurations.
- Investigate Conditions: Address warnings or errors promptly to maintain a healthy Workspace environment.
- Manage Storage: Attach or resize volumes as your dataset grows.
- Optimize Node Selection: If performance is lagging, consider adjusting Node Group assignments or upgrading hardware resources.