OpCenter Configuration Parameters

Configurable parameters control the behavior of the OpCenter.

Introduction

The OpCenter has configuration parameters that apply to the operation of the OpCenter server or that provide the default settings for jobs submitted to the OpCenter. You can change the values for most of the configuration parameters.

You can view or change parameter values using the CLI or the web interface. For some changes to take effect, you must restart the OpCenter.
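
As a rough illustration of the restart rule, the sketch below checks a set of planned changes against the "Restart Required?" column. The dictionary is a small subset copied by hand from the table in the next section, and `needs_restart` is a hypothetical helper written for this example. It is not part of OpCenter, its CLI, or its web interface.

```python
# Subset of the "Restart Required?" column, copied by hand from the
# configuration parameter table. Values may differ between OpCenter releases.
RESTART_REQUIRED = {
    "address": True,
    "maxProc": True,
    "sessionTTL": False,
    "cloud.instTypeBanTTL": True,
    "cloud.vmInitTimeout": False,
    "log.level": False,
    "security.cacheTTL": True,
}


def needs_restart(changed_keys):
    """Return the sorted list of changed keys that require an OpCenter restart."""
    unknown = [key for key in changed_keys if key not in RESTART_REQUIRED]
    if unknown:
        # Refuse to guess for keys missing from this local subset.
        raise KeyError(f"not in local table subset: {unknown}")
    return sorted(key for key in changed_keys if RESTART_REQUIRED[key])


if __name__ == "__main__":
    # Only "address" is marked as requiring a restart in this subset.
    print(needs_restart(["log.level", "address", "sessionTTL"]))
```

An empty result means that, according to the table, none of the planned changes requires a restart.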

Configuration Parameters

The following table shows the OpCenter configuration parameters.

Note

Default values for parameters may differ between OpCenter releases.

| Key | Default Value | Editable | Restart Required? | Definition |
| --- | --- | --- | --- | --- |
| address | 0.0.0.0:443 | Yes | Yes | Address(es) on which OpCenter listens for HTTPS requests. The default means all interfaces. |
| maxProc | 2 | Yes | Yes | Maximum number of virtual CPUs used by OpCenter processes |
| sessionTTL | 168h0m0s | Yes | No | Duration until login token becomes invalid |
| sessionTimeout | 1h0m0s | Yes | No | Maximum session inactivity time (minimum value allowed is 1h) |
| cloud.candidateInstTypeLimit | 16 | Yes | Yes | Maximum number of compute instance types selected that match the CPU and memory constraints |
| cloud.cpuCompatibleMode | loose | Yes | No | Policy for checking whether the CPU of a new VM is compatible with the snapshot image from the previous VM |
| cloud.createVMPolicy | spotFirst | Yes | No | Policy that defines which VM pay type is selected first |
| cloud.createVMRetryInterval | 10m0s | Yes | No | If VM creation fails in one cycle of attempts, wait this interval before retrying |
| cloud.createVMRetryLimit | 3 | Yes | No | Number of VM creation attempt cycles before VM creation is abandoned |
| cloud.createVolumeRetryLimit | 6 | Yes | No | Maximum number of attempts to create a storage volume |
| cloud.enableCarbonEmission | true | Yes | No | Enable the carbon emissions calculator |
| cloud.floatVolSizeLimit | 128 | Yes | No | Worker node memory size threshold below which OpCenter automatically creates a volume to store snapshot images |
| cloud.handleRebalanceMemThreshold | 64G | Yes | No | Threshold memory size above which an AWS Rebalance Recommendation signal triggers job migration |
| cloud.imageVolumeSize | 6 | Yes | No | Size of volume used as container root volume |
| cloud.imageVolumeType | gp2 | Yes | No | Type of volume used as container root volume |
| cloud.instTypeBanTTL | 3h | Yes | Yes | Duration for which a VM instance type is quarantined after an attempt to create a VM of this type fails |
| cloud.instTypeRetryLimit | 10 | Yes | No | Number of VM instance types included in one VM creation attempt cycle |
| cloud.maxSpotReclaim | 0 (unlimited) | Yes | No | Number of spot reclaim events allowed for each job |
| cloud.miUpdateInterval | 1h0m0s | Yes | No | Interval between checks for updates to the machine image repository |
| cloud.recreateVMRetryLimit | 120h0m0s | Yes | No | Maximum time allowed for OpCenter to create a VM to restore a snapshot to a running state |
| cloud.reqRetryInterval | 1s | Yes | Yes | Interval between VM instance creation attempts within a cycle |
| cloud.reqRetryLimit | 6 | Yes | Yes | Maximum number of VM creation attempts within one cycle |
| cloud.securityGroup | ssg-*** | Yes | No | Security group(s) applied to every worker node |
| cloud.securityRole | quota-mvWorkerNodeProfile-** | Yes | No | IAM role assigned to worker node |
| cloud.snapLocation | local | Yes | No | Location where snapshot images are stored |
| cloud.swapFileSize | 4G | Yes | No | Size of memory swap space configured for each worker node |
| cloud.vmInitTimeout | 20m0s | Yes | No | Maximum time allowed to create a VM |
| gui.autoRefreshInterval | 5m0s | Yes | No | Interval between automatic refreshes of the OpCenter web interface display |
| history.enabled | true | Yes | No | Flag to enable (or disable) the service that compiles a history of job metadata |
| image.cachePath | s3://opcenter-bucket-***/images | Yes | No | Location where container images are cached |
| image.imageUpdateInterval | 10m | Yes | Yes | Interval between refreshes of the container image library |
| license.licenseCheckInterval | 30m0s | Yes | Yes | Interval between checks of license status |
| license.licenseServer | https://license.memverge.com | Yes | No | URL to access MemVerge license server |
| log.file | /var/log/memverge/opcenter.log | No | N/A | Path to OpCenter log file (on OpCenter server) |
| log.hostLogRetainTime | 168h0m0s | Yes | No | Maximum age of any host log (older logs are automatically deleted) |
| log.level | info | Yes | No | Linux-style log level for recording OpCenter events |
| log.maxBackups | 10 | Yes | No | Maximum number of logs of each type |
| log.maxSize | 10 | Yes | No | Maximum size of each log |
| metrics.ocMetricsInterval | 10s | Yes | No | Interval between updates to the OpCenter metrics |
| ocMetricsRetention | 2160h0m0s | Yes | Yes | Maximum age of OpCenter metric files (files older than this value are deleted) |
| migrate.cpuDisable | true | Yes | No | Option to disable (or enable if set to false) WaveRider based on CPU utilization |
| migrate.cpuLimit | 0 | Yes | No | Upper limit on the number of virtual CPUs when migrating to a larger CPU (0 means no limit) |
| migrate.cpuLowerBoundDuration | 5m0s | Yes | No | Time that CPU utilization must remain below the lower threshold for CPU utilization to trigger job migration to a smaller CPU |
| migrate.cpuLowerBoundRatio | 5 | Yes | No | Lower threshold (measured as a percentage of the maximum utilization) for CPU utilization |
| migrate.cpuLowerLimit | 0 | Yes | No | Lower limit on the number of virtual CPUs when migrating to a smaller CPU (0 means no limit) |
| migrate.cpuMigrateStep | 50 | Yes | No | Percentage increase (or decrease) in the number of virtual CPUs when migrating to a larger (or smaller) CPU |
| migrate.cpuUpperBoundDuration | 2m0s | Yes | No | Time that CPU utilization must remain above the upper threshold for CPU utilization to trigger job migration to a larger CPU |
| migrate.cpuUpperBoundRatio | 90 | Yes | No | Upper threshold (measured as a percentage of the maximum utilization) for CPU utilization |
| migrate.createVMFirst | true | Yes | No | Option to create the new VM instance before capturing the snapshot. Setting this to false means that the snapshot is captured before the new VM instance is created. |
| migrate.diskReadyTimeout | 10m0s | Yes | No | Maximum time allowed to attach a volume to store snapshot images in cases where the snapshot volume is not created automatically when the job starts |
| migrate.enableAutoMigrate | true | Yes | No | Option to turn WaveRider on (or off) |
| migrate.evadeOOM | true | Yes | No | Option to turn out-of-memory (OOM) protection on (or off). OOM protection means that any use of memory swap space triggers a job migration to a VM with more memory. |
| migrate.memDisable | true | Yes | No | Option to ignore (true) or respond to (false) memory utilization when evaluating whether to migrate a job |
| migrate.memLimit | 0 | Yes | No | Upper limit on memory size when migrating to a VM with more memory (0 means no limit) |
| migrate.memLowerBoundDuration | 5m0s | Yes | No | Time that memory utilization must remain below the lower threshold for memory utilization to trigger job migration to a VM with less memory |
| migrate.memLowerBoundRatio | 5 | Yes | No | Lower threshold (measured as a percentage of the maximum utilization) for memory utilization |
| migrate.memLowerLimit | 0 | Yes | No | Lower limit on memory size when migrating to a VM with less memory (0 means no limit) |
| migrate.memMigrateStep | 50 | Yes | No | Percentage increase (or decrease) in memory size when migrating to a VM with more (or less) memory |
| migrate.memUpperBoundDuration | 2m0s | Yes | No | Time that memory utilization must remain above the upper threshold for memory utilization to trigger job migration to a VM with more memory |
| migrate.memUpperBoundRatio | 90 | Yes | No | Upper threshold (measured as a percentage of the maximum utilization) for memory utilization |
| migrate.oomCheckpointTimeout | 1h0m0s | Yes | No | Maximum time allowed to capture a memory snapshot in cases where OOM protection triggers job migration |
| migrate.oomNoInstanceTypePolicy |  | Yes | No | Action taken when OOM protection is triggered and no suitable VM instance is found to migrate to. An example action is "autoSuspend". |
| migrate.stepAuto | true | Yes | No | Automatically calculate the step size (in the number of virtual CPUs or memory size) when migrating to a larger (or smaller) VM |
| provider.allowList | [*] | Yes | No | List of VM instance types that specifies which instances are allowed when creating a new VM |
| provider.denyList | [ ] | Yes | No | List of VM instance types that specifies which instances are NOT allowed when creating a new VM |
| provider.gpuNameAllowList | [h100 v100 a100 t4 t4g m60 a10g] | Yes | No | List of GPU types that specifies which instances are allowed when creating a new VM |
| provider.gpuVendorAllowList | [nvidia] | Yes | No | List of GPU vendors that specifies which instances are allowed when creating a new VM |
| quota.autoResume | true | Yes | No | Action applied, after the quota is replenished, to a job that was suspended because the quota limit was reached |
| quota.calcInterval | 1h0m0s | Yes | No | Interval between checks of the current job cost against the quota limit |
| quota.coldSuspend | false | Yes | No | Type of suspend mode applied when the quota limit is reached or exceeded |
| quota.notifyThreshold | 80 | Yes | No | Threshold (measured as a percentage of the quota limit) that triggers an alert to users that the quota limit is approaching |
| quota.overageAction | cancel | Yes | No | Action applied to a job when the quota limit is reached or exceeded |
| report.updateInterval | 1h0m0s | Yes | No | Interval between reports of job usage metrics (core hours) to the license server |
| scheduler.cloudParamsTTL | 3m | Yes | No | Lifetime of cache that stores the mapping of job parameters (CPU and memory) to VM instance type (used to create the VM when the job state changes from "submitted" to "initializing") |
| scheduler.defaultDumpMode | full | Yes | No | Type of memory snapshot (full or incremental) |
| scheduler.dirtyPageCheckInterval | 10s | Yes | No | Interval between checks of the dirty memory page count (dirty pages are pages whose content has changed) |
| scheduler.dirtyPageThreshold | 9G | Yes | No | Threshold (determined by aggregate size of dirty memory pages) that triggers an incremental memory snapshot |
| scheduler.enableResourceCleanup | true | Yes | No | Enable resource clean-up service (checks that all resources associated with completed, failed, or canceled jobs are deleted) |
| scheduler.executorPollInterval | 10ms | Yes | No | Interval between checks of the OpCenter executor status to ensure that queued jobs can be processed in time |
| scheduler.extWorkPath |  | Yes | No | URI that identifies path to external jobs |
| scheduler.jobArchiveInterval | 168h0m0s | Yes | No | Maximum age of jobs in the "normal" state. Jobs older than this are in the "archive" state. |
| scheduler.jobCleanupInterval | 1m0s | Yes | No | Interval between runs of the job resource clean-up service |
| scheduler.jobCloudParamsCacheTTL | 1h | Yes | No | Lifetime of cache that stores verified job parameters |
| scheduler.jobExecutorLimit | 128 | Yes | No | Maximum number of jobs processed in parallel |
| scheduler.jobOptimizeInterval | 10m0s | Yes | No | Interval between attempts to migrate a job from an on-demand instance to a spot instance |
| scheduler.jobTTL | 8640h0m0s | Yes | No | Maximum duration allowed for any job |
| scheduler.jobUpdateInterval | 10s | Yes | No | Interval between checks of job status |
| scheduler.resourceCleanupInterval | 24h0m0s | Yes | No | Interval between runs of the OpCenter resource clean-up service |
| scheduler.workPath | /mnt/memverge/slurm/work | No | No | Path to NFS-shared directory required by the Slurm scheduler |
| security.cacheTTL | 1m0s | Yes | Yes | Lifetime of cache holding authentication tokens |
| security.certificateFolder | /etc/memverge/certs | Yes | Yes | Path to folder where security certificates are stored |
| security.inlineUidBoundary | 0 | Yes | No | Offset applied to user UID when mapping user UID on OpCenter to user UID on worker node |
| security.persistToken | false | Yes | No | Action applied to login authentication tokens when OpCenter restarts. Persist means tokens are saved. |
| storage.updateInterval | 1h | Yes | No | Duration after which an inactive file system based on a registered storage service is unmounted by OpCenter |
| template.templateSyncInterval | 24h0m0s | Yes | No | Interval between synchronization checks between OpCenter and the MemVerge template repository |
| template.templateUri | s3://mmce-data/templates-production | Yes | No | Location of MemVerge template repository |
| upgrade.cacheFolder | /tmp/opcenter_builds | Yes | No | Path to stage new release (and associated metadata) before upgrading |
| upgrade.checkInterval | 1h0m0s | Yes | No | Interval between checks for new OpCenter releases |
| upgrade.cloudStorePath | s3://opcenter-bucket-*** | Yes | Yes | Location where the OpCenter upgrade package is cached so it can be downloaded by worker nodes |
| upgrade.releaseUri | s3://float-package | Yes | No | Location where available float releases are stored |
| workflow.updateInterval | 5s | Yes | No | Interval between updates to the workflow view displayed in the OpCenter web interface |
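
Many defaults in the table are durations written as hour, minute, second, or millisecond components, such as 168h0m0s, 3h, or 10ms. The sketch below converts such strings into Python `timedelta` objects so that they are easier to compare. It assumes the values follow the Go-style duration syntax that the defaults appear to use, and it handles only the units that occur in the table. `parse_duration` is a hypothetical helper, not an OpCenter API.

```python
import re
from datetime import timedelta

# Seconds per unit. Only the units that appear in the table are handled.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}

# "ms" is listed before "m" so that "10ms" is read as milliseconds.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text):
    """Convert a string such as '168h0m0s' or '10ms' into a timedelta."""
    text = text.strip()
    position = 0
    total_seconds = 0.0
    for match in _TOKEN.finditer(text):
        if match.start() != position:
            # Characters between two components could not be parsed.
            raise ValueError(f"unrecognized duration: {text!r}")
        total_seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        position = match.end()
    if position == 0 or position != len(text):
        raise ValueError(f"unrecognized duration: {text!r}")
    return timedelta(seconds=total_seconds)


if __name__ == "__main__":
    # Defaults copied from the table above.
    for key, value in [
        ("sessionTTL", "168h0m0s"),
        ("ocMetricsRetention", "2160h0m0s"),
        ("scheduler.executorPollInterval", "10ms"),
    ]:
        print(key, parse_duration(value))
```

Read this way, the 168h0m0s default for sessionTTL is 7 days, and the 2160h0m0s default for ocMetricsRetention is 90 days.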