Known Issues

This section outlines the currently known issues in this release. We are actively working on resolving these issues and will provide updates in future releases.

Reporting New Issues

If you encounter an issue not listed here, please report it to our support team. Visit the Support Page for more information.

Known Issue List

Non-Admin Users May Not see some Telemetry Data
Issue #: AIP-572
Severity: Low
Affected Components: Telemetry/Metrics
Impacted Versions: 0.3.0
Description: In some screens, tabs, or panes, non-admin users may not see telemetry data. For example, the detailed workspace tab may not correctly display GPU Utilization metrics.
Cause: RBAC permissions for non-admin users prevent access to some telemetry metrics.
Workaround: None
Solution: This issue will be fixed in a future release.

Install or Uninstall of MemVerge.ai may take several hours
Issue #: AIP-551
Severity: Low
Affected Components: Install/Uninstall
Impacted Versions: 0.3.0
Description: In some cases, the installation or uninstallation process can take a very long time. During the installation phase, several components, some quite large, are downloaded from the Internet. If your Internet connection is slow or throttled by firewalls, this will impact the time to install the product.
Workaround: None. Do not cancel the operation; it will eventually complete.
Cause: Unknown. Each situation is different and specific to the environment.
Solution: Improvements to logging will be delivered in a future release that will assist in troubleshooting install/uninstall issues.

NVIDIA GPU PIDs/TIDs or Usage may not be Shown in the Workspace Terminal
Issue #: AIP-509
Severity: Low
Affected Components: Workspaces
Impacted Versions: 0.3.0+
Description: Inside a user workspace, running nvidia-smi shows some, but not all, of the information.
Cause: This issue is a known security limitation of the NVIDIA Operator and Containers. See "Cannot see gpu threads in container" for more information.
Workaround: None
Solution: None

Persistent Volume Claims (PVCs) using a 'local' StorageClass may not be Reusable once the Workspace is Deleted
Issue #: AIP-482
Severity: Low
Affected Components: Workspace Volumes
Impacted Versions: 0.3.0+
Description: If a Volume is created using the local StorageClass (e.g., a local NVMe SSD) and used by a workspace, then once the workspace is stopped and deleted, the volume may not be usable by any other workspace.
Cause: Once a PVC is claimed by a workspace pod, the status becomes Bound. If the pod is deleted, a new workload cannot reuse the PVC. If the workspace pod attempts to start on another node, it won't have access to the assigned PVC residing on another worker node.
Workaround: None
Solution: Always use storage accessible by all worker nodes in the cluster. NFS or similar should be used to avoid this issue.
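
To check whether an existing volume is tied to a single worker node, you can inspect the node affinity of its PersistentVolume. The following is a minimal Python sketch and is not part of MemVerge.ai. It assumes the PV was exported with `kubectl get pv <pv-name> -o json` and that the provisioner records the node under the standard `kubernetes.io/hostname` key. The function name `pinned_hostnames` is hypothetical.

```python
import json


def pinned_hostnames(pv: dict) -> set:
    """Return the hostnames a PersistentVolume is restricted to.

    An empty set means no hostname-based node affinity was found,
    which is typical for shared storage such as NFS.
    """
    terms = (
        pv.get("spec", {})
        .get("nodeAffinity", {})
        .get("required", {})
        .get("nodeSelectorTerms", [])
    )
    hosts = set()
    for term in terms:
        for expr in term.get("matchExpressions", []):
            # Assumption: the node is identified by the well-known hostname label.
            if expr.get("key") == "kubernetes.io/hostname" and expr.get("operator") == "In":
                hosts.update(expr.get("values", []))
    return hosts


if __name__ == "__main__":
    import sys

    # Usage: kubectl get pv <pv-name> -o json | python check_pv.py
    pv = json.load(sys.stdin)
    hosts = pinned_hostnames(pv)
    if hosts:
        print("Volume is pinned to:", ", ".join(sorted(hosts)))
    else:
        print("No hostname affinity found on this volume.")
```

A non-empty result indicates the volume can only be mounted by pods scheduled on the listed nodes.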

All Fields in the 'Edit Volume' screen may be disabled, or Modifications made using the 'Edit Volume' screen may not appear to take effect.
Issue #: AIP-480
Severity: Low
Affected Components: Workspace Volumes
Impacted Versions: 0.3.0+
Description: When editing a Volume, all fields may be disabled (greyed out) or changes to the size/capacity may not take effect.
Cause: The behavior of this issue depends on whether the Volume has an assigned Workspace and on the state of that Workspace (actively running, stopped, suspended due to a checkpoint, and so on). Additionally, changing the volume capacity changes the PVC capacity. Initially you will see that the requested size is the new capacity, but the underlying volume may not change immediately, or at all, depending on the storage class type. For example, after changing a PVC from 5Gi to 10Gi, you will see a requested size of 10Gi but a capacity of 5Gi.
Workaround: Changes to Volumes should be done when there are no workspaces using the volume.
Solution: In the future, we will improve this workflow.
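
The gap between requested size and actual capacity can be checked directly on the PVC object. The sketch below is illustrative only and is not a MemVerge.ai tool. It assumes the PVC was exported with `kubectl get pvc <pvc-name> -o json`, and it parses only a subset of Kubernetes quantity notation (plain numbers plus the common decimal and binary suffixes). The names `parse_quantity` and `capacity_lags_request` are hypothetical.

```python
import re

# Subset of Kubernetes quantity suffixes; exponent and milli forms are not handled.
_UNITS = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity string such as '10Gi' to a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _UNITS:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def capacity_lags_request(pvc: dict) -> bool:
    """Return True if the PVC reports less capacity than was requested.

    A PVC with no reported capacity (for example, not yet bound) also
    returns True, because the request has not been satisfied.
    """
    requested = pvc["spec"]["resources"]["requests"]["storage"]
    capacity = pvc.get("status", {}).get("capacity", {}).get("storage")
    if capacity is None:
        return True
    return parse_quantity(requested) > parse_quantity(capacity)


# Example matching the 5Gi -> 10Gi case described above.
example = {
    "spec": {"resources": {"requests": {"storage": "10Gi"}}},
    "status": {"capacity": {"storage": "5Gi"}},
}
print(capacity_lags_request(example))
```

If the function keeps returning True for a volume, the storage class may not support expansion, or the resize may be waiting for the workspace pod to restart.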

Long Node Group, Department, or Project Names may be Truncated in some areas of the UI
Issue #: AIP-472
Severity: Low
Affected Components: UI
Impacted Versions: 0.3.0+
Description: On small or low-resolution screens, long entity names may become truncated on some pages in the UI.
Cause: Some pages, such as 'Create Project', contain a lot of information and tables with many fields. To fit as much information as possible into the available screen real estate, some fields are narrowed, which truncates long names such as the Node Group name.
Workaround: If possible, shrink the text size in your browser to fit more information on the screen.
Solution: In the future, we will improve this workflow.

User Account Retention Policy is Disabled
Issue #: AIP-469
Severity: Low
Affected Components: RBAC/Security
Impacted Versions: 0.3.0+
Description: The user account retention policy, which automatically disables dormant accounts, has been disabled.
Cause: N/A
Workaround: None
Solution: In the future, we will improve this workflow and feature to allow dormant accounts to automatically become disabled, preventing those users from logging in. This is a security improvement.

Workspace Storage Usage using NFS is not Enforced
Issue #: AIP-454
Severity: Low
Affected Components: Workspace Volumes/Storage
Impacted Versions: 0.3.0+
Description: When workspace volumes are backed by NFS, users may be able to write more data than was requested. For example, if the user creates a 10GB volume, more than 10GB can be written without ENOSPC or other error messages.
Cause: This is a known issue. See nfs-subdir-external-provisioner: No restrictions on PVC
Workaround: None
Solution: None
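
Because the limit is not enforced, usage has to be monitored separately. As one possible approach, the Python sketch below adds up the file sizes under a mounted volume and compares the total with the requested size. It is not part of MemVerge.ai, the function names are hypothetical, and it measures apparent file sizes rather than blocks consumed on the NFS server, so treat the result as an estimate.

```python
import os


def directory_usage_bytes(path: str) -> int:
    """Sum the apparent sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symbolic links so their targets are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def over_requested(path: str, requested_bytes: int) -> bool:
    """Return True if the data under path exceeds the requested size."""
    return directory_usage_bytes(path) > requested_bytes


if __name__ == "__main__":
    import sys

    # Usage: python check_usage.py <mount-path> <requested-bytes>
    mount_path, requested = sys.argv[1], int(sys.argv[2])
    used = directory_usage_bytes(mount_path)
    print(f"{used} bytes used of {requested} requested")
    if used > requested:
        print("Volume holds more data than was requested.")
```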

Multi-Pod Workloads/Workspaces may not be correctly admitted to the Kubernetes cluster
Issue #: AIP-394, AIP-397, AIP-395
Severity: Low
Affected Components: Workspaces
Impacted Versions: 0.3.0+
Description: When a workspace with multiple Kubernetes pods is created, it is possible that only some of the pods are deployed.
Cause: If there are not enough resources in the chosen project, a workload that needs to span multiple projects won't be scheduled.
Workaround: None
Solution: In the future, we will improve this workflow and feature to allow multi-pod workloads to borrow resources from other projects when resources are unavailable in the primary project.

Creating a Node Group using the same GPU Make/Model in different Modes is not Supported
Issue #: AIP-392
Severity: Low
Affected Components: Node Group
Impacted Versions: 0.3.0+
Description: Creating a Node Group from GPUs of the same underlying Make/Model that are in different modes (for example, some in NVIDIA MIG mode and some not) is not supported. Node Group creation requires GPUs of the same make/model and mode.
Cause: Hybrid GPU configurations are not supported in the same Node Group.
Workaround: None
Solution: Ensure all GPUs have been configured in the same mode before creating a Node Group.
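
One way to verify that candidate nodes share the same GPU configuration before creating a Node Group is to compare their node labels. The Python sketch below is an illustration only. It assumes the node list comes from the `items` array of `kubectl get nodes -o json` and that the nodes carry the `nvidia.com/gpu.product` and `nvidia.com/mig.strategy` labels published by NVIDIA GPU Feature Discovery. Whether MemVerge.ai itself relies on these labels is not stated here, and the function names are hypothetical.

```python
from collections import defaultdict

# Assumption: these labels are present on GPU nodes in your cluster.
GPU_LABELS = ("nvidia.com/gpu.product", "nvidia.com/mig.strategy")


def group_nodes_by_gpu_config(nodes, label_keys=GPU_LABELS):
    """Map each distinct tuple of label values to the node names that have it."""
    groups = defaultdict(list)
    for node in nodes:
        metadata = node.get("metadata", {})
        labels = metadata.get("labels", {})
        # Missing labels are recorded as empty strings so they form their own group.
        key = tuple(labels.get(k, "") for k in label_keys)
        groups[key].append(metadata.get("name", "<unnamed>"))
    return dict(groups)


def is_uniform(nodes, label_keys=GPU_LABELS):
    """Return True if all nodes share one GPU product and MIG strategy."""
    return len(group_nodes_by_gpu_config(nodes, label_keys)) <= 1


if __name__ == "__main__":
    import json
    import sys

    # Usage: kubectl get nodes -o json | python check_gpu_modes.py
    items = json.load(sys.stdin)["items"]
    for config, names in group_nodes_by_gpu_config(items).items():
        print(config, "->", ", ".join(names))
```

If more than one group is printed for the nodes you intend to use, reconfigure the GPUs so that every node reports the same values before creating the Node Group.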

A Default Node Group is not Automatically Created when the Kubernetes Cluster is a Single Server, or when the Management/Control Plane Node has GPUs
Issue #: AIP-388
Severity: Low
Affected Components: Node Group
Impacted Versions: 0.3.0+
Description: MemVerge.ai will create a default node group after installation. In very small clusters where the cluster is a single node or when the control plane is installed on a GPU Worker, a default node group may not be automatically created.
Cause: This is expected behavior, because workloads running on the control/management node may cause performance issues when the node is under high workload demand.
Workaround: None
Solution: Manually create node groups using the available node or install the control/management plane on a dedicated CPU-only host.

Cannot Create a Node Group with the same Name assigned to Different Departments
Issue #: AIP-535
Severity: Low
Affected Components: Node Group
Impacted Versions: 0.3.0+
Description: Creating a new Node Group with the same name as an existing Node Group will fail, even if the existing Node Group is assigned to a different Department.
Cause: Names are globally scoped across the cluster rather than at a Department level/scope.
Workaround: None
Solution: Ensure Node Group names are unique across the entire cluster.

Creating a new Department using a previously deleted Department Name will have old Billing/Usage Information
Issue #: AIP-609
Severity: Low
Affected Components: Billing/Department
Impacted Versions: 0.3.0+
Description: When creating a new Department using the same name as a previously deleted department, the billing information from the previous department may be accessible.
Cause: This is by design. Billing information primarily uses the Name for uniqueness. Reusing an old name will show any historical information.
Workaround: None
Solution: Ensure Department names are unique, unless this is desired.

OAuth does not work with Enterprise GitHub Accounts
Issue #: AIP-503
Severity: Low
Affected Components: Security
Impacted Versions: 0.3.0+
Description: When the GitHub OAuth Provider is enabled, Enterprise GitHub users cannot log in.
Cause: This is an issue in the Rancher OAuth provider.
Workaround: None
Solution: A fix will be made available in a future release.