Enabling Transparent Checkpointing
The MemVerge Transparent Checkpoint Operator is activated by adding specific labels to your Kubernetes Pod specifications. You can apply these labels directly in your YAML manifests or dynamically using kubectl.
Applying Labels in Pod Specifications
To enable checkpointing for a specific Pod, add the memverge.ai/checkpoint-mode: 'true' label to the metadata.labels section of your Pod specification, or to the template.metadata.labels section of your workload controller (e.g., Deployment, StatefulSet, Job).
Example: Enabling checkpointing for a Job
apiVersion: batch/v1
kind: Job
metadata:
  name: my-checkpointed-job
spec:
  template:
    metadata:
      labels:
        memverge.ai/checkpoint-mode: 'true'      # Enable checkpointing for pods created by this Job
        memverge.ai/checkpoint-volume-size: 2Gi  # Optional: Specify checkpoint volume size
    spec:
      containers:
        - name: my-container
          image: my-image:latest
      restartPolicy: Never
When the Job creates a Pod, the memverge.ai/checkpoint-mode: 'true' label instructs the operator to automatically checkpoint the Pod's state when it is deleted (e.g., upon successful completion or failure). If the Pod is recreated, the operator automatically restores its state from the latest checkpoint.
Important Note for Workload Controllers: For controllers like Deployments, StatefulSets, and Jobs, apply the MemVerge labels to the template.metadata.labels section. This ensures that all Pods created by the controller inherit these labels. Labels set on the controller object itself (its top-level metadata.labels) are not copied to Pods, so adding them there has no effect on the Pods the controller manages.
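The same placement rule can be illustrated with a Deployment. This is a minimal sketch: the names my-checkpointed-deployment and my-app are hypothetical, and the image is reused from the Job example. Only the label under spec.template.metadata.labels reaches the Pods.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-checkpointed-deployment   # hypothetical name
  labels:
    app: my-app                      # describes the Deployment only; not copied to Pods
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app                    # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: my-app
        memverge.ai/checkpoint-mode: 'true'   # inherited by every Pod this Deployment creates
    spec:
      containers:
        - name: my-container
          image: my-image:latest
```

Note that the label value is quoted: Kubernetes label values must be strings, and an unquoted true would be parsed by YAML as a boolean.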
Setting Default Labels for Future Pods in a Namespace (using Mutating Admission Webhooks)
While not a direct kubectl command, you can configure Mutating Admission Webhooks (if your Kubernetes cluster supports them) to automatically add MemVerge labels to newly created Pods within a specific namespace. This approach ensures that all future Pods in that namespace have checkpointing enabled by default. The configuration of such webhooks is beyond the scope of this basic user guide, but it is a powerful way to enforce checkpointing policies.
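Although a full setup is outside this guide, the registration side of such a webhook might look roughly like the sketch below. Everything named here is an assumption for illustration: the webhook name, the label-injector Service, the webhook-system and my-namespace namespaces, and the /mutate path are all hypothetical. The manifest only tells the API server when to call the webhook. You would still need to run your own webhook server that responds with a patch adding the memverge.ai/checkpoint-mode label, and supply a real CA bundle for it.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: checkpoint-label-injector            # hypothetical name
webhooks:
  - name: checkpoint-labels.example.com      # hypothetical; must be a fully qualified name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore                    # do not block Pod creation if the webhook is down
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: my-namespace   # limit the webhook to one namespace
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]               # only newly created Pods are mutated
        resources: ["pods"]
    clientConfig:
      service:
        name: label-injector                 # hypothetical Service fronting your webhook server
        namespace: webhook-system            # hypothetical namespace
        path: /mutate                        # hypothetical endpoint
      # caBundle: base64-encoded CA certificate that signed the webhook server's TLS certificate
```

Because the rule matches only CREATE operations, Pods that already exist in the namespace are left unchanged, which is consistent with the "future Pods" behavior described above.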