OpCenter Configuration Parameters

Configurable parameters control the behavior of the OpCenter.

Introduction

The OpCenter has configuration parameters that apply to the operation of the OpCenter server or that provide the default settings for jobs submitted to the OpCenter. You can change the values for most of the configuration parameters.

You can view or change parameter values using the CLI or the web interface. For some changes to take effect, you must restart the OpCenter.
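As an illustration of the restart rule, the following minimal Python sketch checks a proposed set of changes against a few rows copied from the parameter table in this document. The `PARAMS` dictionary and the `review_changes` function are hypothetical names used only for this example. They are not part of the OpCenter CLI or API.

```python
# Hypothetical helper: not part of the OpCenter CLI or API.
# A few rows copied from the parameter table in this document:
# key -> (editable, restart_required). None stands for "NA".
PARAMS = {
    "address": (True, True),
    "maxProc": (True, True),
    "sessionTTL": (True, False),
    "cloud.instTypeBanTTL": (True, True),
    "log.file": (False, None),
    "log.level": (True, False),
}


def review_changes(keys):
    """Split keys into accepted and rejected, and flag whether a restart is needed."""
    # Reject keys that are unknown to this sample or marked as not editable.
    rejected = [k for k in keys if k not in PARAMS or not PARAMS[k][0]]
    accepted = [k for k in keys if k not in rejected]
    # A restart is needed if any accepted key is marked "Restart Required: Yes".
    restart_needed = any(PARAMS[k][1] for k in accepted)
    return accepted, rejected, restart_needed


if __name__ == "__main__":
    accepted, rejected, restart_needed = review_changes(
        ["log.level", "maxProc", "log.file"]
    )
    print("accepted:", accepted)
    print("rejected:", rejected)
    print("restart needed:", restart_needed)
```

Because default values and flags may differ between OpCenter releases, a real check would read the flags from the running OpCenter rather than from a hard-coded sample.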

Configuration Parameters

The following table shows the OpCenter configuration parameters.

Note

Default values for parameters may differ between OpCenter releases.

Key	Default Value	Editable	Restart Required?	Definition
address	0.0.0.0:443	Yes	Yes	Address(es) that OpCenter listens on to receive HTTPS requests. The default value means all interfaces.
maxProc	2	Yes	Yes	Maximum number of virtual CPUs used by OpCenter processes
sessionTTL	168h0m0s	Yes	No	Duration until login token becomes invalid
sessionTimeout	1h0m0s	Yes	No	Maximum session inactivity time (minimum value allowed is 1h)
cloud.candidateInstTypeLimit	16	Yes	Yes	Maximum number of compute instance types selected that match the CPU and memory constraints
cloud.cpuCompatibleMode	loose	Yes	No	Policy for checking whether CPU of new VM is compatible with the snapshot image from previous VM
cloud.createVMPolicy	spotFirst	Yes	No	Policy that defines which VM pay type is selected first
cloud.createVMPriceLimit	0	Yes	No	Maximum hourly rate above which VM is not a candidate for the job
cloud.createVMPriceLimitPercent	0	Yes	No	Maximum spot instance hourly rate, measured as a percentage of the equivalent on-demand hourly rate, above which VM is not a candidate for the job
cloud.createVMRetryInterval	10m0s	Yes	No	If VM creation fails in one cycle of attempts, wait this interval before retrying
cloud.createVMRetryLimit	1	Yes	No	Number of VM creation attempt cycles before VM creation is abandoned
cloud.createVolumeRetryLimit	6	Yes	No	Maximum number of attempts to create storage volume
cloud.enableCarbonEmission	true	Yes	No	Enable the carbon emissions calculator
cloud.floatVolSizeLimit	128	Yes	No	Worker node memory size threshold below which OpCenter automatically creates a volume to store snapshot images
cloud.handleRebalanceMemThreshold	64G	Yes	No	Threshold memory size above which AWS Rebalance Recommendation signal triggers job migration
cloud.imageVolumeSize	6	Yes	No	Size of volume used as container root volume
cloud.imageVolumeType	gp2	Yes	No	Type of volume used as container root volume
cloud.instTypeBanTTL	3h	Yes	Yes	Duration for which a VM instance is quarantined after an attempt to create VM of this type fails
cloud.instTypeCachePath		Yes	No	File path (local or in the cloud) to store instance type information. Use in cases where the OpCenter is deployed in a private VPC and cannot query the cloud API to get instance types. In such a case, the OpCenter retrieves instance type information from the cache.
cloud.instTypeOrderMethod	price	Yes	Yes	Criterion used to order candidate instance types
cloud.instTypeRetryLimit	10	Yes	No	Number of VM instance types included in one VM creation attempt cycle
cloud.interruptionPoss	true	Yes	No	When set to true, check the likelihood of spot instance reclaim when determining candidate instance types. (Not used currently.)
cloud.maxSpotReclaim	3	Yes	No	Number of spot reclaim events allowed for each job before the job moves to an on-demand instance
cloud.miUpdateInterval	1h0m0s	Yes	No	Interval between checks for updates to the machine image repository
cloud.nameserver		Yes	No	IP address of domain name server (use to override the default domain name server)
cloud.recreateVMRetryLimit	120h0m0s	Yes	No	Maximum time allowed for OpCenter to create a VM to restore a snapshot to a running state
cloud.reqRetryInterval	1s	Yes	Yes	Interval between VM instance creation attempts within a cycle
cloud.reqRetryLimit	6	Yes	Yes	Maximum number of VM creation attempts within one cycle
cloud.securityGroups	sg-***	Yes	No	Security group(s) applied to every worker node
cloud.securityRole	[OpCenter_name]-mvWorkerNodeProfile-**	Yes	No	IAM role assigned to worker node
cloud.snapLocation	local	Yes	No	Location where snapshot images are stored
cloud.snapSkipOpenFileList		Yes	No	On checkpoint or restore, skip file size checks on files in this list (wildcards supported)
cloud.subnetIPCountLowerLimit	5	Yes	Yes	Lower limit on the number of available IP addresses in a subnet. If a subnet has fewer available IP addresses than this limit, the subnet is ignored.
cloud.subnetList	[ ]	Yes	No	List of subnets in which to create VMs for jobs
cloud.swapDurationOnOOM	0s	Yes	No	Time threshold to trigger OOM migration after swap space usage passes threshold
cloud.swapFileSize	4G	Yes	No	Size of swap space configured for each worker node
cloud.swapUsageOnOOM	0.5	Yes	No	Amount of swap space (measured as a fraction of the total swap space) that must be used before the OOM duration counter starts. If both thresholds are crossed, OOM migration is triggered.
cloud.swapVolSizeLimit	16	Yes	No	Maximum swap capacity (in GB) above which a dedicated volume for swap space is created
cloud.swapVolType	gp3	Yes	No	Type of volume used if a dedicated swap space volume is created
cloud.vmInitTimeout	20m0s	Yes	No	Maximum time allowed to create VM
gui.autoRefreshInterval	5m0s	Yes	No	Interval between automatic refreshes of OpCenter web interface display
gui.defaultJobFilterUpdate	2016h0m0s	Yes	No	Default filter applied to job listings in the Jobs dashboard is `update<=duration` where `duration` is specified by `gui.defaultJobFilterUpdate`
history.enabled	true	Yes	No	Flag to enable (or disable) the service that compiles a history of job metadata
image.cachePath	file:///mnt/memverge/image	Yes	No	Location where container images are cached
image.defaultFilter		Yes	No	Default filter applied to listing of container images, for example, "category=data_science"
image.imageUpdateInterval	10m	Yes	Yes	Interval between refreshes of the container image library
license.licenseCheckInterval	30m0s	Yes	Yes	Interval between checks of license status
license.licenseServer	https://license.memverge.com	Yes	No	URL to access MemVerge license server
log.file	/var/log/memverge/opcenter.log	No	NA	Path to OpCenter log file (on OpCenter server)
log.hostLogRetainTime	168h0m0s	Yes	No	Maximum age of any host log (older logs are automatically deleted)
log.level	info	Yes	No	Linux-style log level for recording OpCenter events
log.logPruneFreeSpaceRatio	0.4	Yes	No	Minimum free space ratio on the disk supporting logs (`/mnt/memverge`). Log files are pruned when free space crosses this threshold.
log.logPruneMinSpaceRatio	3	Yes	No	Minimum free disk space capacity (in GB) that triggers log pruning
log.maxBackups	10	Yes	No	Maximum number of logs of each type
log.maxSize	10	Yes	No	Maximum size of each log (in GB)
metrics.ocMetricsInterval	10s	Yes	No	Interval between updates to the OpCenter metrics
metrics.ocMetricsRetention	2160h0m0s	Yes	Yes	Maximum age of OpCenter metric files (files older than this value are deleted)
migrate.abortUnderOOMKiller	false	Yes	No	If set to false, OpCenter ignores OOM scores assigned by the Linux kernel. If set to true, OpCenter kills jobs with high OOM scores rather than migrating them because of an OOM trigger.
migrate.cpuDisable	true	Yes	No	Option to disable (or enable if set to false) WaveRider based on CPU utilization
migrate.cpuLimit	0	Yes	No	Upper limit on the number of virtual CPUs when migrating to a larger CPU (0 means no limit)
migrate.cpuLowerBoundDuration	5m0s	Yes	No	Time that CPU utilization must remain below the lower threshold for CPU utilization to trigger job migration to a smaller CPU
migrate.cpuLowerBoundRatio	5	Yes	No	Lower threshold (measured as a percentage of the maximum utilization) for CPU utilization
migrate.cpuLowerLimit	0	Yes	No	Lower limit on the number of virtual CPUs when migrating to a smaller CPU (0 means no limit)
migrate.cpuMigrateStep	50	Yes	No	Percentage increase (or decrease) in the number of virtual CPUs when migrating to a larger (or smaller) CPU
migrate.cpuUpperBoundDuration	2m0s	Yes	No	Time that CPU utilization must remain above the upper threshold for CPU utilization to trigger job migration to a larger CPU
migrate.cpuUpperBoundRatio	90	Yes	No	Upper threshold (measured as a percentage of the maximum utilization) for CPU utilization
migrate.createVMFirst	true	Yes	No	Option to create new VM instance before capturing snapshot. Setting this to false means that the snapshot is captured before the new VM instance is created.
migrate.diskReadyTimeout	10m0s	Yes	No	Maximum time allowed to attach a volume to store snapshot images in cases where the snapshot volume is not created automatically when the job starts
migrate.enableAutoMigrate	true	Yes	No	Option to turn WaveRider on (or off)
migrate.evadeOOM	true	Yes	No	Option to turn out-of-memory (OOM) protection on (or off). OOM protection means that use of memory swap space triggers a job migration to a VM with more memory.
migrate.incompatibleInstTypeRetryLimit	16	Yes	No	Maximum number of attempts to create a compatible VM instance when migrating a job
migrate.memDisable	true	Yes	No	Option to ignore (true) or respond to (false) memory utilization when evaluating whether to migrate job
migrate.memLimit	0	Yes	No	Upper limit on memory size when migrating to a VM with more memory (0 means no limit)
migrate.memLowerBoundDuration	5m0s	Yes	No	Time that memory utilization must remain below the lower threshold for memory utilization to trigger job migration to a VM with less memory
migrate.memLowerBoundRatio	5	Yes	No	Lower threshold (measured as a percentage of the maximum utilization) for memory utilization
migrate.memLowerLimit	0	Yes	No	Lower limit on memory size when migrating to a VM with less memory (0 means no limit)
migrate.memMigrateStep	50	Yes	No	Percentage increase (or decrease) in memory size when migrating to a VM with more (or less) memory
migrate.memUpperBoundDuration	2m0s	Yes	No	Time that memory utilization must remain above the upper threshold for memory utilization to trigger job migration to a VM with more memory
migrate.memUpperBoundRatio	90	Yes	No	Upper threshold (measured as a percentage of the maximum utilization) for memory utilization
migrate.oomCheckpointTimeout	1h0m0s	Yes	No	Maximum time allowed to capture a memory snapshot in cases where OOM protection triggers job migration
migrate.oomNoInstanceTypePolicy		Yes	No	Action taken when OOM protection is triggered and no suitable VM instance is found to migrate to. An example action is "autoSuspend".
migrate.optimizeCost	false	Yes	No	If set to true, enable cost optimization policy.
migrate.optimizeThreshold	0.9	Yes	No	Migrate job to a new instance if current instance cost is more than `migrate.optimizeThreshold` (measured in $ per hour)
migrate.stepAuto	true	Yes	No	Automatically calculate the step size (in the number of virtual CPUs or memory size) when migrating to a larger (or smaller) VM
provider.allowList	[*]	Yes	No	List of VM instance types that specifies which instances are allowed when creating a new VM
provider.denyList	[ ]	Yes	No	List of VM instance types that specifies which instances are NOT allowed when creating a new VM
provider.gpuNameAllowList	[h100 v100 a100 t4 t4g m60 a10g]	Yes	No	List of GPU types that specifies which instances are allowed when creating a new VM
provider.gpuVendorAllowList	[nvidia]	Yes	No	List of GPU vendors that specifies which instances are allowed when creating a new VM
quota.autoResume	true	Yes	No	Whether a job that was suspended because the quota limit was reached resumes automatically after the quota is replenished
quota.calcInterval	1h0m0s	Yes	No	Interval between checks of the current job cost against the quota limit
quota.coldSuspend	false	Yes	No	Type of suspend mode applied when quota limit reached or exceeded
quota.notifyThreshold	80	Yes	No	Threshold (measured as a percentage of the quota limit) that triggers an alert to users that the quota limit is approaching
quota.overageAction	cancel	Yes	No	Action applied to job when quota limit reached or exceeded
report.customerDefFile		Yes	Yes	Path to file that defines customer-specific ratios used to calculate customer bill. Ratios are: scheduler, compute, and storage.
report.externalCostFolder		Yes	Yes	Location of information included in external cost in customer bills
report.timeDiff	0s	Yes	No	Adjustment to UTC to produce reports specific to customer's time zone
report.updateInterval	1h0m0s	Yes	No	Interval between reports of job usage metrics (core hours) to the license server
scheduler.cloudParamsTTL	3m	Yes	No	Lifetime of cache that stores the mapping of job parameters (CPU and memory) to VM instance type (used to create VM when job state changes from "submitted" to "initializing")
scheduler.defaultDumpMode	full	Yes	No	Type of memory snapshot (full or incremental)
scheduler.dirtyPageCheckInterval	10s	Yes	No	Interval between checks of the dirty memory page count (dirty pages are pages whose content has changed)
scheduler.dirtyPageThreshold	9G	Yes	No	Threshold (determined by aggregate size of dirty memory pages) that triggers an incremental memory snapshot
scheduler.enableResourceCleanup	true	Yes	No	Enable resource clean-up service (checks that all resources associated with completed, failed or canceled jobs are deleted)
scheduler.executorPollInterval	10ms	Yes	No	Interval between checks of the OpCenter executor status to ensure that queued jobs can be processed in time
scheduler.extWorkPath		Yes	No	URI that identifies path to external jobs
scheduler.jobArchiveInterval	30m0s	Yes	No	Maximum age of jobs in the "normal" state. Jobs older than this are in the "archive" state.
scheduler.jobCleanupInterval	1m0s	Yes	No	Interval between runs of the job resource clean-up service
scheduler.jobCloudParamsCacheTTL	1h	Yes	No	Lifetime of cache that stores verified job parameters
scheduler.jobExecutorLimit	128	Yes	No	Maximum number of jobs processed in parallel
scheduler.jobOptimizeInterval	10m0s	Yes	No	Interval between attempts to migrate a job from an on-demand instance to a spot instance
scheduler.jobTTL	8640h0m0s	Yes	No	Maximum duration allowed for any job
scheduler.jobUpdateInterval	10s	Yes	No	Interval between checks of job status
scheduler.resourceCleanupInterval	24h0m0s	Yes	No	Interval between runs of the OpCenter resource clean-up service
scheduler.workPath	/mnt/memverge/slurm/work	No	No	Path to NFS-shared directory required by the Slurm scheduler
security.cacheTTL	1m0s	Yes	Yes	Lifetime of cache holding authentication tokens
security.certificateFolder	/etc/memverge/certs	Yes	Yes	Path to folder where security certificates are stored
security.inlineUidBoundary	0	Yes	No	Offset applied to user UID when mapping user UID on OpCenter to user UID on worker node
security.persistToken	false	Yes	No	Action applied to login authentication tokens when OpCenter restarts. Persist means tokens are saved.
storage.updateInterval	1h	Yes	No	Duration after which an inactive file system based on a registered storage service is unmounted by OpCenter
template.templateSyncInterval	24h0m0s	Yes	No	Interval between synchronization checks between OpCenter and MemVerge template repository
template.templateUri	s3://mmce-data/templates-production	Yes	No	Location of MemVerge template repository
upgrade.cacheFolder	/tmp/opcenter_builds	Yes	No	Path to stage new release (and associated metadata) before upgrading
upgrade.checkInterval	1h0m0s	Yes	No	Interval between checks for new OpCenter releases
upgrade.cloudStorePath	s3://opcenter-bucket-***	Yes	Yes	Location where OpCenter upgrade package is cached so it can be downloaded by worker nodes
upgrade.releaseUri	s3://float-package	Yes	No	Location where available float releases are stored
workflow.updateInterval	5s	Yes	No	Interval between updates to the workflow view displayed in the OpCenter web interface
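
Many defaults in the table are durations written as a number followed by a unit, for example `168h0m0s`, `3h`, or `10ms`. This notation resembles the Go duration string format. The following minimal Python sketch converts such values to seconds, which can help when comparing settings such as `sessionTimeout` against its stated minimum of 1h. The `parse_duration` function is a hypothetical helper written for this example, and it handles only the unit suffixes that appear in the table (`h`, `m`, `s`, `ms`).

```python
import re

# Seconds per unit for the duration suffixes that appear in the table.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}

# A number followed by a unit. "ms" is listed before "m" so that "10ms"
# is not read as 10 minutes followed by a stray "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a value such as '168h0m0s' or '10ms' to seconds."""
    pos = 0
    total = 0.0
    for match in _TOKEN.finditer(text):
        # Every token must start where the previous one ended.
        if match.start() != pos:
            raise ValueError(f"unexpected text in duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    # Reject empty input and trailing characters that are not part of a token.
    if pos == 0 or pos != len(text):
        raise ValueError(f"not a duration: {text!r}")
    return total


if __name__ == "__main__":
    print(parse_duration("168h0m0s"))  # sessionTTL default: 604800.0 seconds
    # The table states that sessionTimeout may not be set below 1h.
    proposed = "45m0s"
    if parse_duration(proposed) < parse_duration("1h"):
        print(f"{proposed} is below the 1h minimum for sessionTimeout")
```

The sketch rejects anything it does not recognize instead of guessing, so a value in a different notation (for example `64G` or `0.5`) raises `ValueError` rather than being silently misread.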