Policy-driven Job Migration

When enabled, WaveRider uses a rules-based policy to determine when to migrate a job to a different virtual machine.

Overview

Use the --migratePolicy option with the float submit command to define the rules that determine when a job migrates to a new virtual machine. To add a policy to a running job, or to change the policy already associated with one, use the float modify command. Equivalent actions are available in the web interface.

The policy works as follows. If the upper threshold for CPU or memory utilization is crossed (the two thresholds are treated independently) and utilization remains elevated for a specified interval, the job is migrated to a virtual machine that has more virtual CPUs or more memory. The size increase is a percentage of the current virtual machine's size, set by the step parameter (cpu.step or mem.step). The lower threshold behaves the same way in reverse: if it is crossed and utilization remains low for a specified interval, the job is moved to a smaller virtual machine.
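
The threshold-plus-duration rule described above can be sketched in Python. This is an illustration only, not WaveRider's implementation: the sampling format, the rule that any in-range sample resets the timer, and the names Bound, sustained_breach, and stepped_size are all assumptions. How a stepped size maps to an actual instance type is not modeled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Bound:
    """One threshold rule (hypothetical helper type)."""
    ratio: float       # percent, e.g. 90 for cpu.upperBoundRatio
    duration_s: float  # seconds, e.g. 30 for cpu.upperBoundDuration


def sustained_breach(samples: Iterable[Tuple[float, float]],
                     bound: Bound,
                     upper: bool = True) -> Optional[float]:
    """Return the time at which a breach has lasted bound.duration_s, else None.

    samples is a time-ordered sequence of (seconds, utilization_percent).
    Assumption: a single sample back inside the threshold resets the timer.
    """
    start = None
    for t, util in samples:
        breached = util > bound.ratio if upper else util < bound.ratio
        if not breached:
            start = None  # utilization recovered, so start over
            continue
        if start is None:
            start = t     # first sample of a new breach
        if t - start >= bound.duration_s:
            return t
    return None


def stepped_size(current: float, step_percent: float, grow: bool = True) -> float:
    """Size after applying the step as a percentage of the current size."""
    factor = 1 + step_percent / 100 if grow else 1 - step_percent / 100
    return current * factor


upper = Bound(ratio=90, duration_s=30)
steady = [(0, 50), (10, 95), (20, 96), (30, 97), (40, 98)]
dipped = [(0, 95), (10, 95), (20, 80), (30, 95), (40, 95), (50, 95)]
print(sustained_breach(steady, upper))  # breach began at t=10 and has lasted 30 s by t=40
print(sustained_breach(dipped, upper))  # the dip at t=20 reset the timer
print(stepped_size(4, 50))              # 4 vCPUs stepped up by 50 percent
```

With the default step of 50, a job on 4 vCPUs would target 6 vCPUs when scaling up and 2 when scaling down under this sketch.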

If stepAuto is set to true, the manual step settings are ignored and the step size is calculated each time a threshold is crossed. The calculation depends on the specifications for the current virtual machine.

You can enable (or disable) out-of-memory (OOM) protection independently of the CPU and memory threshold triggers. If OOM protection is enabled, job migration is triggered as soon as the application touches the memory swap space.

Procedure

  1. Turn policy-driven migration on.

    • CLI: To enable policy-driven migration, use --migratePolicy [cpu.disable=false,mem.disable=false].
    • Web interface: On the Submit Job screen, go to the Start from Scratch tab, then click the WaveRider tab, and then toggle the WaveRider, CPU, and Memory settings from Off to On.
  2. If needed, override default values for policy rules.

    • CLI: Attach parameters to --migratePolicy as a string enclosed in square brackets (only include parameters that have values different from the default).

      The following parameters can be included in the string (default values listed in parentheses). Unless a unit is shown or the description states otherwise, the value is a percentage of the maximum possible.

      • disable (false): if true, do not migrate
      • cpu.disable (true): if true, ignore CPU threshold crossings; if false, react to CPU threshold crossings
      • cpu.upperBoundRatio (90): upper threshold for utilization per virtual CPU (percentage)
      • cpu.lowerBoundRatio (5): lower threshold for utilization per virtual CPU (percentage)
      • cpu.upperBoundDuration (30s): time that utilization per virtual CPU must remain above the upper threshold before migration is triggered
      • cpu.lowerBoundDuration (5m0s): time that utilization per virtual CPU must remain below the lower threshold before migration is triggered
      • cpu.step (50): percentage increase (or decrease) in the number of virtual CPUs in the new virtual machine versus the original virtual machine
      • cpu.limit (use 0 for no limit): maximum number of vCPUs allowed. If a job migrates to a VM with this number of vCPUs, then migration to a VM with more vCPUs is not permitted. Migration to a VM with fewer vCPUs is permitted.
      • cpu.lowerLimit (use 0 for no limit): minimum number of vCPUs allowed. If a job migrates to a VM with this number of vCPUs, then migration to a VM with fewer vCPUs is not permitted. Migration to a VM with more vCPUs is permitted.
      • mem.disable (true): if true, ignore memory threshold crossings; if false, react to memory threshold crossings
      • mem.upperBoundRatio (90): upper threshold for memory utilization (percentage)
      • mem.lowerBoundRatio (5): lower threshold for memory utilization (percentage)
      • mem.upperBoundDuration (30s): time that memory utilization must remain above the upper threshold before migration is triggered
      • mem.lowerBoundDuration (5m0s): time that memory utilization must remain below the lower threshold before migration is triggered
      • mem.step (50): percentage increase (or decrease) in memory capacity of the new virtual machine versus the original virtual machine
      • mem.limit (use 0 for no limit): maximum memory capacity (in GB) allowed. If a job migrates to a VM with this memory capacity, then migration to a VM with more memory is not permitted. Migration to a VM with less memory is permitted.
      • mem.lowerLimit (use 0 for no limit): minimum memory capacity (in GB) allowed. If a job migrates to a VM with this memory capacity, then migration to a VM with less memory is not permitted. Migration to a VM with more memory is permitted.
      • stepAuto (false): if stepAuto is set to true, then values for cpu.step and mem.step are calculated dynamically before each migration. Overrides values set with cpu.step and mem.step.
      • evade.OOM (true): if set to true and the application touches swap space, the job migrates to a VM with more memory. The values set for cpu.limit and mem.limit are overridden. Job migration can continue until the largest VM offered by the CSP is reached.

        Note

        The limits set by cpu.limit, cpu.lowerLimit, mem.limit, and mem.lowerLimit only apply to WaveRiding events. They do not apply to the initial selection of an instance to run the job.

    • Web interface: In the Submit Job screen, click the Start from Scratch tab, click the WaveRider tab, and then change the values in fields pre-populated with default values. Out-of-memory protection and automatic calculation of the migration steps are enabled by default. Uncheck the respective box to change the setting. You can view the changes in the generated command line on the right-hand side.
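
For the CLI route in step 2, the policy string can be assembled programmatically. The following Python sketch keeps only the parameters that differ from the defaults listed above and formats them as a bracketed, comma-separated string. The names DEFAULTS and build_policy are hypothetical, and parameters whose defaults are not stated as a single value in the list (the limits and the OOM setting) are deliberately left out of DEFAULTS, so they are always passed through when supplied.

```python
# Defaults copied from the parameter list in this section.
DEFAULTS = {
    "disable": False,
    "cpu.disable": True,
    "cpu.upperBoundRatio": 90,
    "cpu.lowerBoundRatio": 5,
    "cpu.upperBoundDuration": "30s",
    "cpu.lowerBoundDuration": "5m0s",
    "cpu.step": 50,
    "mem.disable": True,
    "mem.upperBoundRatio": 90,
    "mem.lowerBoundRatio": 5,
    "mem.upperBoundDuration": "30s",
    "mem.lowerBoundDuration": "5m0s",
    "mem.step": 50,
    "stepAuto": False,
}


def format_value(value) -> str:
    """Render booleans as lowercase true/false, everything else via str()."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_policy(settings: dict) -> str:
    """Build a --migratePolicy string, dropping values equal to the defaults."""
    parts = []
    for key, value in settings.items():
        if key in DEFAULTS and DEFAULTS[key] == value:
            continue  # same as the default, so there is no need to send it
        parts.append(f"{key}={format_value(value)}")
    return "[" + ",".join(parts) + "]"


print(build_policy({
    "mem.disable": False,
    "mem.upperBoundRatio": 60,
    "cpu.disable": True,   # default value, so it is omitted from the result
}))
```

Including a default value explicitly, as the submit example in step 3 does with cpu.disable=true, is also accepted by the CLI according to that example; omitting it simply keeps the string shorter.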

  3. Submit the job with the auto-migration policy included.

    • CLI: Use the float submit command with the --migratePolicy option.

      Example:

      $ float submit -i python -j ./python_job_script.sh --dataVolume [size=10]:/data \
      -c 4 -m 8 --migratePolicy [mem.disable=false,mem.upperBoundRatio=60,cpu.disable=true]
      id: kNuBDaAPZ3CacpCy16heA
      name: python
      user: admin
      imageID: docker.io/bitnami/python:latest
      status: Submitted
      submitTime: "2023-08-08T01:03:21Z"
      duration: 0s
      cost: 0.0000 USD
      inputArgs: -j ./python_job_script.sh -i python --migratePolicy [mem.disable=false,mem.upperBoundRatio=60,cpu.disable=true] -m 8 -c 4 --dataVolume [size=10]:/data
      vmPolicy:
          policy: spotFirst
          retryLimit: 3
          optimize: true
          retryInterval: 10m0s
      migratePolicy:
          evadeOOM: true
          cpu:
              upperBoundRatio: 90
              lowerBoundRatio: 5
              upperBoundDuration: 30s
              lowerBoundDuration: 5m0s
              step: 50
              disable: true
          mem:
              upperBoundRatio: 60
              lowerBoundRatio: 5
              upperBoundDuration: 30s
              lowerBoundDuration: 5m0s
              step: 50
      
    • Web interface: In the Submit Job screen, click the Start from Scratch tab, and fill in the required fields in the Basic tab. In the WaveRider tab, change the default settings, for example, toggle the Memory setting to On and change Upper Bound Utilization to 60. Click Submit.
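
A script that submits a job usually needs the job ID for later float modify or float log calls. The Python sketch below pulls it out of captured submit output, assuming the output has the same "id: <value>" line as the example in this step. The function name extract_job_id is hypothetical, and the output format may differ between releases.

```python
import re
from typing import Optional


def extract_job_id(submit_output: str) -> Optional[str]:
    """Return the value of the first line that starts with 'id:', or None."""
    # Anchoring at the start of the line keeps 'imageID:' from matching.
    match = re.search(r"^[ \t]*id:[ \t]*(\S+)", submit_output, flags=re.MULTILINE)
    return match.group(1) if match else None


# Abbreviated copy of the output shown in the example above.
sample_output = """id: kNuBDaAPZ3CacpCy16heA
name: python
user: admin
imageID: docker.io/bitnami/python:latest
status: Submitted
"""
print(extract_job_id(sample_output))
```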

  4. Modify the auto-migration policy associated with a running job.

    • CLI: To modify the auto-migration policy associated with a running job, or to turn auto-migration on, use float modify --migratePolicy <policy-string> -j <job_id>.

      Example (change a parameter from its default value):

      $ float modify --migratePolicy [cpu.disable=false] -j kNuBDaAPZ3CacpCy16heA
      Warning: Are you sure you want to modify kNuBDaAPZ3CacpCy16heA?
      New migratePolicy may impact auto-migration behavior.(yes/No): yes
      Successfully modified kNuBDaAPZ3CacpCy16heA:  --migratePolicy [cpu.disable=false]
      
    • Web interface: Go to the Jobs screen and locate your job by ID or Name. Click the ID to open the Job Details screen. Click Modify Job (top, right-hand side). Fill out the fields in the pop-up screen (for example, toggle CPU auto-migration policy to On) and then click Modify.
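
When the modify command in step 4 is driven from a script, the square brackets in the policy string deserve some care: some shells treat an unquoted [...] as a filename pattern. The Python sketch below builds the command as an argument list, which sidesteps shell quoting, and prints a shell-safe rendering for reference. The function name modify_policy_argv is hypothetical; only the flags shown in this section are used. The command also asks for confirmation, as the transcript above shows, and how to answer that prompt unattended is not covered here.

```python
import shlex
from typing import List


def modify_policy_argv(job_id: str, policy: str) -> List[str]:
    """Build the argument list for 'float modify --migratePolicy ... -j ...'."""
    if not (policy.startswith("[") and policy.endswith("]")):
        raise ValueError("policy must be enclosed in square brackets")
    return ["float", "modify", "--migratePolicy", policy, "-j", job_id]


argv = modify_policy_argv("kNuBDaAPZ3CacpCy16heA", "[cpu.disable=false]")
# shlex.join quotes the bracketed policy so it can be pasted into a shell.
print(shlex.join(argv))
```

An argument list like this can be passed directly to a process launcher such as subprocess.run without going through a shell.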

  5. View a record of any migration events.

    • CLI: Use the float log cat job.events -j <job_id> command.
    • Web interface: On the Jobs screen, click your job, and then go to the Instances tab. If more than one instance is shown, the job migrated at least once. Go to the Attachments tab. Click the Preview icon next to the job.events log to see details.