Understanding Workload Priority and Preemption¶
Preemption ensures that higher-priority workloads can acquire resources even if lower-priority workloads are already using them. In this simple demonstration, you will create two Projects with different priority levels and observe how a running medium-priority workload is paused when a higher-priority workload starts—and how it automatically resumes once the higher-priority job finishes.
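The pause-and-resume behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the platform's scheduler: the `Workload` and `SingleGpuNode` classes, the numeric priority ranks, and the `Pending` and `Stopped` status names are hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed numeric ranks for the Priority choices named in this guide.
# The real platform's internal values are not documented here.
PRIORITY = {"Medium": 1, "High": 2, "Highest": 3}


@dataclass
class Workload:
    name: str
    priority: str
    status: str = "Pending"


@dataclass
class SingleGpuNode:
    """Toy model of a node group with one GPU (hypothetical, for illustration)."""
    running: Optional[Workload] = None
    preempted: List[Workload] = field(default_factory=list)

    def start(self, w: Workload) -> None:
        if self.running is None:
            w.status = "Running"
            self.running = w
        elif PRIORITY[w.priority] > PRIORITY[self.running.priority]:
            # Pause the lower-priority workload and hand the GPU over.
            self.running.status = "Preempted"
            self.preempted.append(self.running)
            w.status = "Running"
            self.running = w
        else:
            w.status = "Pending"  # cannot take the GPU from an equal or higher priority

    def stop(self, w: Workload) -> None:
        if self.running is not w:
            return
        w.status = "Stopped"
        self.running = None
        if self.preempted:
            # Resume the highest-priority paused workload on the freed GPU.
            nxt = max(self.preempted, key=lambda x: PRIORITY[x.priority])
            self.preempted.remove(nxt)
            nxt.status = "Running"
            self.running = nxt


node = SingleGpuNode()
medium = Workload("medium-ws", "Medium")
high = Workload("high-ws", "High")

node.start(medium)
node.start(high)
print(medium.status, high.status)  # Preempted Running

node.stop(high)
print(medium.status, high.status)  # Running Stopped
```

The model keeps a single GPU slot, which matches the single-GPU Node Group used in this demonstration. A real scheduler would also account for partial resource requests and multiple nodes.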
Prerequisites¶
- A Node Group with a single GPU that will be used by workloads in the two Projects, one at a time.
- At least one non-admin user exists. Create a new local user if required.
1. Create Two Projects with Different Priorities¶
- Open the Projects Dashboard
- Log in to the MemVerge.ai UI as a Platform Administrator.
- Select Projects from the left navigation bar.
- Click + Create Project to add your first Project
- Complete the required fields, naming the Project medium-priority-project (the name used later in this guide). See Managing Projects for more information.
- Assign Priority
- In the Priority field, choose Medium for this first Project.
- Create the Second Project
- Repeat the process for a second Project (e.g., high-priority-project).
- Set the Priority to High or Highest.
2. Assign the Same User to Both Projects¶
There are two methods to assign users to a project:
Method 1:
- Go to Users & Authentication → Users.
- Click on the username that will run both workloads. The user details page will open.
- Click + Add to Project
- Select each newly created Project (medium priority, high priority) from the dropdown.
- Click Confirm
- Repeat the + Add to Project step as needed until the user has permission to create Workspaces in both Projects.
Method 2:
- Navigate to Projects
- Click the Name of the project to see the project details
- Navigate to the Members tab
- Click + Add Members
- Search or select the user
- Click Confirm
3. Log In as the User¶
- Log out of the AI Platform as the Platform Admin.
- Log in as the user you assigned to both Projects.
4. Prepare a Storage Volume¶
- Create or Use an Existing Volume
- From Storage → Volumes, choose + New Volume.
- Enter a name (e.g., ml-training-vol) and allocate a size suitable for your AI workloads.
- This volume can be used by both Projects if attached to each Workspace configuration.
5. Create and Start a Workspace in the Medium-Priority Project¶
- Select the Medium Project
- In the Projects dashboard, find medium-priority-project.
- Create a Workspace
- Click + Create Workspace. Assign the new workspace to the medium-priority-project.
- Attach the ml-training-vol volume if your workload needs its data.
- Launch the Workspace
- Wait for it to reach a Running or Ready state.
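This guide checks the workspace state in the UI. If you have some programmatic way to read a workspace's status (an assumption, since this guide does not describe one), the wait can be sketched as a generic polling loop. The `get_status` callable below is a hypothetical placeholder for whatever status source you have; it is not a MemVerge.ai API.

```python
import time
from typing import Callable, Iterable


def wait_for_status(
    get_status: Callable[[], str],
    targets: Iterable[str] = ("Running", "Ready"),
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns one of the target states.

    get_status is a caller-supplied function (hypothetical here); this
    helper knows nothing about how the status is actually obtained.
    """
    wanted = set(targets)
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in wanted:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"last status {status!r}; wanted one of {sorted(wanted)}"
            )
        sleep(interval_s)


# Demonstration with a canned status sequence instead of a real platform.
fake_statuses = iter(["Pending", "Pending", "Running"])
result = wait_for_status(
    lambda: next(fake_statuses), interval_s=0, sleep=lambda _: None
)
print(result)  # Running
```

Injecting `sleep` keeps the helper easy to exercise without real delays. The same function can wait for Preempted or NotReady by passing different `targets`.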
6. Create and Start a Workspace in the High-Priority Project¶
- Switch Projects
- In Projects, find high-priority-project.
- Create a Workspace
- Click + Create Workspace and assign the new workspace to high-priority-project. Attach any required volume (the same volume or a new one).
- Observe Preemption
- When this high-priority workspace launches, the platform pauses (preempts) the medium-priority workspace’s job if cluster resources are insufficient to run both simultaneously.
- The medium-priority workspace status transitions to Preempted or NotReady, indicating it has been paused.
7. Stop the High-Priority Workload¶
- Stop the High-Priority Workspace
- In the Workspaces dashboard, locate the workspace running under high-priority-project.
- Click Stop, then confirm in the popup dialog.
- Automatic Resume
- Once the high-priority workload fully stops, the platform detects freed resources.
- The medium-priority workspace automatically resumes from its preempted state.
8. Confirm the Medium-Priority Workspace Has Resumed¶
- Check Status
- Refresh the Workspaces dashboard. The medium-priority workspace should revert from Preempted or NotReady back to Running or Ready.
- Verify Workload Progress
- Connect to the resumed workspace (if interactive) or check logs to ensure the job continues from where it left off.
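If you note the medium-priority workspace's status at intervals during the demonstration, the expected pattern is an active state, then a paused state, then an active state again. A minimal sketch of that check follows. The status names come from this guide; the function and the recorded list are hypothetical.

```python
ACTIVE = {"Running", "Ready"}
PAUSED = {"Preempted", "NotReady"}


def was_preempted_and_resumed(statuses):
    """Return True if the history shows active -> paused -> active, in order."""
    phase = 0  # 0: expect active, 1: expect paused, 2: expect active again
    for status in statuses:
        if phase == 0 and status in ACTIVE:
            phase = 1
        elif phase == 1 and status in PAUSED:
            phase = 2
        elif phase == 2 and status in ACTIVE:
            return True
    return False


# Hand-recorded example history (not real platform output).
observed = ["Running", "Running", "Preempted", "Preempted", "Running"]
print(was_preempted_and_resumed(observed))              # True
print(was_preempted_and_resumed(["Running", "Ready"]))  # False
```

A history that never leaves the active states returns False, which would suggest the cluster had enough resources to run both workspaces and no preemption occurred.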
Congratulations! You have successfully demonstrated a simple preemption scenario using two workloads in Projects with different priorities. With this setup, higher-priority jobs can acquire resources held by lower-priority jobs, while lower-priority jobs automatically pause and resume as availability changes.