MemVerge Transparent Checkpoint Operator User Guide
This guide explains how to use the MemVerge Transparent Checkpoint Operator to enable automatic checkpointing and restoration for your Kubernetes workloads. By applying specific labels to your Pod specifications, you can instruct the operator to manage the lifecycle of your application's running state.
Enabling Transparent Checkpointing
The MemVerge Transparent Checkpoint Operator is activated by adding specific labels to your Kubernetes Pod specifications. These labels can be applied directly in your YAML manifests or dynamically using kubectl.
Applying Labels in Pod Specifications
To enable checkpointing for a specific pod, add the memverge.ai/checkpoint-mode: 'true' label to the metadata.labels section of your Pod specification, or to the template.metadata.labels section of your workload controller (e.g., Deployment, StatefulSet, Job).
Example: Enabling checkpointing for a Job
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: my-checkpointed-job
    spec:
      template:
        metadata:
          labels:
            memverge.ai/checkpoint-mode: 'true'  # Enable checkpointing for pods created by this Job
            memverge.ai/checkpoint-volume-size: 2Gi  # Optional: Specify checkpoint volume size
        spec:
          containers:
          - name: my-container
            image: my-image:latest
          restartPolicy: Never
When the Job creates a Pod, the memverge.ai/checkpoint-mode: 'true' label instructs the operator to automatically checkpoint the pod's state when it is deleted (e.g., upon successful completion or failure). If the pod is recreated, the operator automatically restores its state from the latest checkpoint.
Important Note for Workload Controllers: For controllers like Deployments, StatefulSets, and Jobs, apply the MemVerge labels to the template.metadata.labels section. This ensures that all Pods created by the controller inherit these labels. Labels set on the controller's own metadata.labels are not propagated to its Pods.
Applying Labels to Existing Pods using kubectl label
You can also add MemVerge labels to running Pods using the kubectl label command. This is useful for enabling checkpointing for existing workloads without modifying their original specifications.
Example: Enabling checkpointing for an existing Pod named my-running-pod
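A minimal sketch of this step, assuming `kubectl` is configured for the target cluster and the Pod lives in the current namespace. The helper name `enable_checkpointing` is hypothetical; it only wraps the standard `kubectl label` command.

```shell
# Hypothetical helper: add the checkpoint-mode label to one existing Pod.
# Assumes kubectl already points at the right cluster and namespace.
enable_checkpointing() {
  kubectl label pod "$1" memverge.ai/checkpoint-mode=true
}

# Usage: enable_checkpointing my-running-pod
```

Calling `enable_checkpointing my-running-pod` runs `kubectl label pod my-running-pod memverge.ai/checkpoint-mode=true`. If the label already exists with a different value, `kubectl label` refuses to change it unless `--overwrite` is added.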
To specify a checkpoint volume size for the same pod:
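A sketch under the same assumptions (`kubectl` configured, Pod in the current namespace). The helper name `set_checkpoint_volume_size` is hypothetical, and the `2Gi` value in the usage line is borrowed from the Job example earlier in this guide.

```shell
# Hypothetical helper: set the checkpoint volume size label on one Pod.
# $1 = Pod name, $2 = size in Kubernetes quantity notation (e.g. 2Gi).
set_checkpoint_volume_size() {
  kubectl label pod "$1" "memverge.ai/checkpoint-volume-size=$2"
}

# Usage: set_checkpoint_volume_size my-running-pod 2Gi
```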
Note: Labels applied using kubectl label are live changes to the Pod object. However, if the Pod is managed by a controller, the label is lost whenever the controller replaces the Pod, because replacement Pods are created from the controller's Pod template. For persistent label changes in managed Pods, update the controller's Pod template instead.
Applying Labels to All Pods in a Namespace
You can apply a label to all existing Pods within a specific namespace using kubectl label with the --all flag.
Example: Enabling checkpointing for all Pods in the default namespace
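A minimal sketch, assuming `kubectl` is configured for the target cluster. The helper name `label_all_pods` is hypothetical; the `--all` and `--namespace` flags are standard `kubectl label` options.

```shell
# Hypothetical helper: enable checkpointing on every existing Pod
# in the namespace given as $1.
label_all_pods() {
  kubectl label pods --all memverge.ai/checkpoint-mode=true --namespace "$1"
}

# Usage: label_all_pods default
```

Only Pods that exist when the command runs are labeled; Pods created afterwards are not affected.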
Caution: Applying labels to all Pods in a namespace can have unintended consequences if not done carefully. Ensure you understand the impact on all applications running in that namespace before executing such a command.
Applying Labels to Pods Based on Existing Selectors
You can target a specific set of Pods based on their existing labels using a selector with kubectl label.
Example: Enabling checkpointing for all Pods with the label app=my-app
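A minimal sketch, assuming `kubectl` is configured and the matching Pods are in the current namespace. The helper name `enable_checkpointing_by_selector` is hypothetical; `--selector` (short form `-l`) is the standard label-selector flag.

```shell
# Hypothetical helper: enable checkpointing on all Pods that match
# the label selector given as $1 (for example app=my-app).
enable_checkpointing_by_selector() {
  kubectl label pods --selector "$1" memverge.ai/checkpoint-mode=true
}

# Usage: enable_checkpointing_by_selector app=my-app
```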
Setting Default Labels for Future Pods in a Namespace (using Mutating Admission Webhooks)
While not a direct kubectl command, you can configure Mutating Admission Webhooks (if your Kubernetes cluster supports them) to automatically add MemVerge labels to newly created Pods within a specific namespace. This approach ensures that all future Pods in that namespace have checkpointing enabled by default. Configuring such webhooks is beyond the scope of this guide, but it is a powerful way to enforce checkpointing policies.
Removing Checkpointing Labels
To disable transparent checkpointing for a Pod, you can remove the MemVerge-related labels.
Removing Labels from Specific Pods using kubectl label
Use the kubectl label command with a hyphen (-) appended to the label key to remove the label. The --overwrite flag is not needed for removal.
Example: Disabling checkpointing for a Pod named my-checkpointed-pod
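A minimal sketch, assuming `kubectl` is configured and the Pod is in the current namespace. The helper name `disable_checkpointing` is hypothetical; the trailing hyphen after the label key is the standard `kubectl label` removal syntax.

```shell
# Hypothetical helper: remove the checkpoint-mode label from one Pod.
# The trailing "-" after the key tells kubectl to delete the label.
disable_checkpointing() {
  kubectl label pod "$1" memverge.ai/checkpoint-mode-
}

# Usage: disable_checkpointing my-checkpointed-pod
```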
To remove the checkpoint volume size label as well:
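A sketch under the same assumptions; the helper name `remove_checkpoint_volume_size` is hypothetical.

```shell
# Hypothetical helper: remove the checkpoint volume size label from one Pod.
remove_checkpoint_volume_size() {
  kubectl label pod "$1" memverge.ai/checkpoint-volume-size-
}

# Usage: remove_checkpoint_volume_size my-checkpointed-pod
```

`kubectl label` also accepts several keys in one invocation, so both labels can be removed together by listing `memverge.ai/checkpoint-mode-` and `memverge.ai/checkpoint-volume-size-` in the same command.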
Removing Labels from Workload Controller Templates
To permanently disable checkpointing for Pods managed by a controller, you need to remove the MemVerge labels from the template.metadata.labels section of the controller's specification and then apply the updated specification. Existing Pods will retain the label until they are recreated or updated by the controller.
Example: Disabling checkpointing in a Deployment
- Edit the Deployment, for example with kubectl edit.
- Remove memverge.ai/checkpoint-mode and any other MemVerge labels from the template.metadata.labels section.
- Save and close the editor. The Deployment will reconcile, and new Pods will be created without the checkpointing labels. You might need to manually delete existing Pods for the changes to take full effect on all instances.
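The steps above can be sketched as follows, assuming `kubectl` is configured and the Deployment is in the current namespace. The Deployment name `my-deployment` is a placeholder, and the helper `remove_checkpoint_label_from_deployment` is a hypothetical non-interactive alternative to the editor, not something the operator provides.

```shell
# Interactive route (my-deployment is a placeholder name):
#   kubectl edit deployment my-deployment
#
# Hypothetical non-interactive alternative: remove the label from the
# Pod template with a JSON patch. In a JSON Pointer path, the "/" inside
# the label key must be escaped as "~1".
remove_checkpoint_label_from_deployment() {
  kubectl patch deployment "$1" --type=json \
    -p '[{"op": "remove", "path": "/spec/template/metadata/labels/memverge.ai~1checkpoint-mode"}]'
}

# Usage: remove_checkpoint_label_from_deployment my-deployment
```

A JSON patch `remove` operation fails if the label is not present in the template, so the helper reports an error instead of silently doing nothing.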
Removing Labels from All Pods in a Namespace
Similar to adding labels, you can remove a label from all Pods in a namespace using kubectl label with the --all flag and the label name followed by a hyphen.
Example: Disabling checkpointing for all Pods in the mynamespace namespace
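A minimal sketch, assuming `kubectl` is configured for the target cluster. The helper name `unlabel_all_pods` is hypothetical.

```shell
# Hypothetical helper: remove the checkpoint-mode label from every
# existing Pod in the namespace given as $1.
unlabel_all_pods() {
  kubectl label pods --all memverge.ai/checkpoint-mode- --namespace "$1"
}

# Usage: unlabel_all_pods mynamespace
```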
Caution: Exercise caution when removing labels from all Pods in a namespace, as it will affect all applications running there.
Removing Labels from Pods Based on Selectors
You can remove labels from a specific set of Pods based on their existing labels.
Example: Disabling checkpointing for all Pods with the label app=legacy-app
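A minimal sketch, assuming `kubectl` is configured and the matching Pods are in the current namespace. The helper name `disable_checkpointing_by_selector` is hypothetical.

```shell
# Hypothetical helper: remove the checkpoint-mode label from all Pods
# that match the label selector given as $1 (for example app=legacy-app).
disable_checkpointing_by_selector() {
  kubectl label pods --selector "$1" memverge.ai/checkpoint-mode-
}

# Usage: disable_checkpointing_by_selector app=legacy-app
```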
Complete List of Labels
The following table describes the labels supported by the MemVerge Transparent Checkpoint Operator:
Label | Description |
---|---|
memverge.ai/checkpoint-mode | Set to true to enable the MemVerge transparent checkpoint/restore service. |
memverge.ai/checkpoint-containers | Comma-delimited list of container names to be checkpointed. If not set, all containers except istio-proxy and nginx-proxy are checkpointed. |
memverge.ai/checkpoint-storage-volume | An existing volume in the pod used for checkpoint storage. If not set, a dynamically provisioned PV is used for checkpoint storage. The PV's lifecycle is controlled by the operator, which requires that the pod has a controller. This option is required for plain pods (pods with no workload controller such as a StatefulSet, Job, or CronJob). The user must manage the lifecycle of the volume. |
memverge.ai/checkpoint-storage-class | The StorageClass name used to dynamically provision the Persistent Volume for checkpoint storage. If not set, the default StorageClass is used. Ignored if memverge.ai/checkpoint-storage-volume is set. |
memverge.ai/checkpoint-volume-size | The size of the Persistent Volume for checkpoint storage. If not set, it is computed as the sum of the memory limits of all containers in the pod. Ignored if memverge.ai/checkpoint-storage-volume is set. |
memverge.ai/checkpoint-files | Comma-delimited list of files/directories to be checkpointed. |
memverge.ai/irmap-scan-paths | Comma-delimited list of paths for irmap scan. |
memverge.ai/checkpoint-diagnosis | Set to true to preserve checkpoint images and logs for diagnostic purposes. |
By understanding and applying these labels, you can effectively manage the checkpointing behavior of your Kubernetes applications using the MemVerge Transparent Checkpoint Operator. Remember to consult the operator's logs and Kubernetes events for detailed information.
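As a starting point for that kind of inspection, here is a small sketch that uses only standard `kubectl get` options. The helper names `show_checkpoint_labels` and `show_pod_events` are hypothetical. Operator logs are not covered because the operator's namespace and workload name depend on how it was installed.

```shell
# Hypothetical helper: list Pods in namespace $1 with the value of the
# checkpoint-mode label shown as an extra column.
show_checkpoint_labels() {
  kubectl get pods --namespace "$1" --label-columns memverge.ai/checkpoint-mode
}

# Hypothetical helper: list Kubernetes events that refer to Pod $2
# in namespace $1.
show_pod_events() {
  kubectl get events --namespace "$1" --field-selector "involvedObject.name=$2"
}

# Usage:
#   show_checkpoint_labels default
#   show_pod_events default my-running-pod
```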