Manual Job Migration

You can manually migrate a running job at any time, using either the CLI or the web interface. For example, you might do this to change the job's compute platform or its pay type.

Procedure

  1. Log in to the OpCenter.
    • If you are using the web interface, enter your credentials on the landing page.
    • If you are using the web CLI, you are already logged in.
    • If you are using a remote terminal or a terminal session on the OpCenter server, enter the following command:
      float login -u <username> -p <password> -a <OpCenter_ip_address>
      where <username> and <password> are your login credentials, and <OpCenter_ip_address> is the OpCenter's private IP address if you are inside your organization's virtual private cloud (VPC), or its public IP address if you are outside the VPC.
      Note: The OpCenter IP address is cached, so you can omit it when you log back in. After the cache expires, the IP address defaults to localhost. If you get a "connection refused" error, retry with the -a <OpCenter_ip_address> option included.
  2. Migrate a running job to a new virtual machine instance.
    Note: You can modify the vmPolicy option of a running job. If the new policy is incompatible with the old one (priceLimit is not considered), job migration is triggered automatically. For example, if the new policy does not allow Spot Instances, a job running on a Spot Instance migrates immediately to an On-demand Instance.
    • CLI: To migrate a job to a specific VM type, enter the following command:
      float migrate -t <instance_type> -j <job_id>
      where <instance_type> is a virtual machine type (for example, c4.xlarge in AWS) and <job_id> is the job identifier.

      To migrate the job to an instance whose CPU and memory capacities you specify, enter the following command:

      float migrate --cpu <minCPU>:<maxCPU> --mem <minMem>:<maxMem> -j <job_id>
      where <minCPU>:<maxCPU> is the allowable range for the number of virtual CPUs, <minMem>:<maxMem> is the allowable range for the memory size in GB, and <job_id> is the job identifier. You can omit the upper bounds for the number of virtual CPUs and the memory size, in which case only the lower bounds apply.
    • Web interface: Go to the Jobs screen and locate your job by ID or name. In the Actions column, click the Migrate Job icon. Fill out the fields in the pop-up window, and then click Migrate.
  3. Check whether the job migrates successfully.
    • CLI: Run the float squeue command repeatedly until the job state changes from "Floating" to "Executing."
    • Web interface: On the Jobs screen, click the Refresh button repeatedly until the job state changes from "Floating" to "Executing."
  4. View the record of migration events.
    • CLI: Enter the float log cat job.events -j <job_id> command.
    • Web interface: On the Jobs screen, click your job, and then go to the Attachments tab. Click the Preview icon next to the job.events log.
    Example of a manual migration:
    float migrate -f -t t3a.large -j Xx0r4CYE7X6MRmivjoITf
    float log cat job.events -j Xx0r4CYE7X6MRmivjoITf
    2023-04-19T14:42:13.334: Ready to migrate with spec job: lWpXdspWZcWESBMMy9Nbm, instType: t3a.large, CPU: 0:0, Memory: 0:0, zone: , payType: 
    2023-04-19T14:42:13.334: Attempt to find instance type for spec InstType:t3a.large,CPU:2 ~ 0,Memory:4 ~ 0,Zone:us-east-1b,CPUVendor:AuthenticAMD,priceLimit:0,priceLimitPerc:0
    2023-04-19T14:42:13.389: Determined instance params: Zone:us-east-1b,InstType:t3a.large,CPU:2,Memory:8
    2023-04-19T14:42:13.389: Ready to migrate with instance type: t3a.large, cpu: 2, memory: 8, zone: us-east-1b, last instance type: t3a.medium(Spot)
    2023-04-19T14:42:13.389: Ready to checkpoint host i-0d7877d1978a4db4c
    2023-04-19T14:42:14.759: Checkpointed host i-0d7877d1978a4db4c, result: [container: c63784ae91f26cc4b2f8d1980f84ce14bc17ae5b8478953c4a2cb06c61e3a8b3, checkpoint file: ], duration 1.369989165s
    2023-04-19T14:42:27.401: Detached volume vol-0fc557b252c9a497c from host i-0d7877d1978a4db4c
    2023-04-19T14:42:33.819: Detached volume vol-0ea186ddf37c64396 from host i-0d7877d1978a4db4c
    2023-04-19T14:42:40.212: Detached volume vol-05bf0027cef28aa4f from host i-0d7877d1978a4db4c
    2023-04-19T14:42:40.212: Ready to create new host to recover
    2023-04-19T14:42:46.999: Created instance i-06107ed1970156e8d at us-east-1b, waiting for it to initialize
    2023-04-19T14:45:02.014: Mounted vol-0fc557b252c9a497c:/mnt/float-data to i-06107ed1970156e8d
    2023-04-19T14:45:02.014: Mounted vol-0ea186ddf37c64396:/mnt/float-image to i-06107ed1970156e8d
    2023-04-19T14:45:02.014: Mounted vol-05bf0027cef28aa4f:/data to i-06107ed1970156e8d
    2023-04-19T14:45:02.015: Created new host: i-06107ed1970156e8d(Spot)
    2023-04-19T14:45:02.248: Got 1 containers on host i-06107ed1970156e8d
    2023-04-19T14:45:02.248: Ready to recover {ID:c63784ae91f2,Checkpointed:true,Running:false} on host i-06107ed1970156e8d
    2023-04-19T14:45:02.264: Job floated to instance i-06107ed1970156e8d (2 CPU/8 GB) (Spot)
    2023-04-19T14:45:02.893: Migrated to new VM: i-06107ed1970156e8d
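For scripted use, the commands in steps 1 and 2 can be assembled programmatically. The following Python sketch builds argument lists using only the options documented on this page (-u, -p, -a, -t, --cpu, --mem, -j). The function names are hypothetical helpers, not part of the float CLI. The sketch also assumes that an omitted upper bound is written as the lower bound alone (for example "2"). This page does not specify that syntax, so confirm it against your float version.

```python
import shlex
from typing import List, Optional


def login_argv(username: str, password: str, address: Optional[str] = None) -> List[str]:
    """Build a `float login` command.

    If address is None, -a is omitted so the cached OpCenter address is used.
    """
    argv = ["float", "login", "-u", username, "-p", password]
    if address is not None:
        argv += ["-a", address]
    return argv


def migrate_to_type_argv(job_id: str, instance_type: str) -> List[str]:
    """Build `float migrate -t <instance_type> -j <job_id>`."""
    return ["float", "migrate", "-t", instance_type, "-j", job_id]


def _bounds(lower: int, upper: Optional[int]) -> str:
    """Format a <min>:<max> range for --cpu or --mem.

    ASSUMPTION: an omitted upper bound is expressed as the lower bound alone.
    """
    if upper is None:
        return str(lower)
    if upper < lower:
        raise ValueError(f"upper bound {upper} is below lower bound {lower}")
    return f"{lower}:{upper}"


def migrate_to_capacity_argv(job_id: str, min_cpu: int, max_cpu: Optional[int],
                             min_mem_gb: int, max_mem_gb: Optional[int]) -> List[str]:
    """Build `float migrate --cpu <minCPU>:<maxCPU> --mem <minMem>:<maxMem> -j <job_id>`."""
    return ["float", "migrate",
            "--cpu", _bounds(min_cpu, max_cpu),
            "--mem", _bounds(min_mem_gb, max_mem_gb),
            "-j", job_id]


if __name__ == "__main__":
    # Print the commands rather than running them.
    print(shlex.join(migrate_to_type_argv("Xx0r4CYE7X6MRmivjoITf", "t3a.large")))
    print(shlex.join(migrate_to_capacity_argv("Xx0r4CYE7X6MRmivjoITf", 2, 4, 8, None)))
```

To run a command, pass its argument list to subprocess.run(argv, check=True) on a machine where float is installed. Building a list rather than a single string avoids shell-quoting problems with passwords or job IDs.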
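Step 3 asks you to check the job state repeatedly until it changes from "Floating" to "Executing." The following Python sketch automates that polling loop. Because this page does not show the output format of float squeue, the sketch takes a caller-supplied get_state function (hypothetical) that returns the current state string. The sleep function is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_migration(get_state: Callable[[], str],
                       interval_s: float = 30.0,
                       timeout_s: float = 1800.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the job state goes from "Floating" to "Executing".

    get_state is a hypothetical, caller-supplied function that returns the
    job's current state (for example, by parsing `float squeue` output).
    Returns True once "Executing" is seen after "Floating", or False after
    roughly timeout_s seconds of sleeping between polls.
    """
    seen_floating = False
    waited = 0.0
    while True:
        state = get_state()
        if state == "Floating":
            seen_floating = True
        elif state == "Executing" and seen_floating:
            return True
        # An "Executing" state seen before any "Floating" state is ignored,
        # because the migration may not have started yet.
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Requiring "Floating" before "Executing" avoids reporting success before the migration has started. The trade-off is that if a migration finishes entirely between two polls, the function times out. In that case, check the job.events log described in step 4.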
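Each line of the job.events output in step 4 has a "timestamp: message" layout. The following Python sketch parses that layout and summarizes a migration: the total time covered by the log, the interval from "Ready to checkpoint host" to "Migrated to new VM", the new instance ID, and the detached volumes. It matches message prefixes exactly as they appear in the example on this page. Other float versions may word these messages differently, so treat the prefixes as assumptions. SAMPLE is a subset of the example log.

```python
import re
from datetime import datetime
from typing import Dict, List, Tuple

# Matches lines such as "2023-04-19T14:42:13.334: Ready to checkpoint host ..."
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}): (.*)$")


def parse_events(text: str) -> List[Tuple[datetime, str]]:
    """Return (timestamp, message) pairs for every timestamped line in text."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((datetime.fromisoformat(match.group(1)), match.group(2)))
    return events


def _first(events, prefix):
    """Return the first (timestamp, message) whose message starts with prefix."""
    for ts, msg in events:
        if msg.startswith(prefix):
            return ts, msg
    return None


def summarize_migration(text: str) -> Dict[str, object]:
    """Summarize a job.events log using the message prefixes seen in the example."""
    events = parse_events(text)
    if not events:
        raise ValueError("no timestamped events found")
    checkpoint = _first(events, "Ready to checkpoint host")
    migrated = _first(events, "Migrated to new VM:")
    return {
        "total_s": (events[-1][0] - events[0][0]).total_seconds(),
        "checkpoint_to_migrated_s": (
            (migrated[0] - checkpoint[0]).total_seconds()
            if checkpoint and migrated else None),
        "new_instance": migrated[1].split(":", 1)[1].strip() if migrated else None,
        # "Detached volume <vol-id> from host <host-id>": the volume ID is the third word.
        "detached_volumes": [msg.split()[2] for _, msg in events
                             if msg.startswith("Detached volume")],
    }


SAMPLE = """\
2023-04-19T14:42:13.334: Ready to migrate with spec job: lWpXdspWZcWESBMMy9Nbm, instType: t3a.large, CPU: 0:0, Memory: 0:0, zone: , payType:
2023-04-19T14:42:13.389: Ready to checkpoint host i-0d7877d1978a4db4c
2023-04-19T14:42:27.401: Detached volume vol-0fc557b252c9a497c from host i-0d7877d1978a4db4c
2023-04-19T14:45:02.893: Migrated to new VM: i-06107ed1970156e8d
"""

if __name__ == "__main__":
    print(summarize_migration(SAMPLE))
```

To summarize a real log, save it first, for example with float log cat job.events -j <job_id> > job.events.txt, and then pass the file's contents to summarize_migration.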