Manual Job Migration

Overview

You can manually migrate a running job at any time, using either the CLI or the OpCenter web interface. Typical reasons include changing the compute platform or the pay type.

Procedure

  1. Log in to the OpCenter.

    • If you are using the web interface, enter your credentials on the landing page.
    • If you are using the web CLI shell, you are already logged in.
    • If you are using a remote terminal or a terminal session on the OpCenter server, enter the following command:

      float login -u <username> -p <password> -a <OpCenter_ip_address>
      

      Replace <username> and <password> with the login credentials, and <OpCenter_ip_address> with the OpCenter's private IP address if you are within your organization's virtual private cloud (VPC) or public IP address if you are outside the VPC.

      Note

      The OpCenter IP address is cached, so you can omit it when you log back in. After the cache expires, the IP address defaults to localhost. If you get a "connection refused" error, retry with the -a <OpCenter_ip_address> option included.

  2. Migrate a running job to a new virtual machine instance.

    Note

    You can modify the vmPolicy setting associated with a running job. If the new setting is incompatible with the old setting (ignoring priceLimit), job migration is triggered automatically. For example, a job running on a Spot Instance migrates immediately to an On-demand Instance.

    • CLI: To migrate a job to a specific VM type, enter the following command:

      float migrate -t <instance_type> -j <job_id>
      

      Replace <instance_type> with a virtual machine type (for example, c5.xlarge in AWS) and <job_id> with the job identifier.

      To migrate to an instance whose capacities you define, enter the following command:

      float migrate --cpu <minCPU>:<maxCPU> --mem <minMem>:<maxMem> -j <job_id>
      

      Replace <minCPU>:<maxCPU> with the allowable range for the number of virtual CPUs, <minMem>:<maxMem> with the allowable range for the memory size in GB, and <job_id> with the job identifier. The upper bounds for the number of virtual CPUs and the memory size can be omitted, in which case only the lower bounds apply.

      Note

      To show all parameters that can be modified when the job migrates, enter the following command:

      float migrate -h
      

    • Web interface: Go to the Jobs screen and locate your job by ID or Name. Click the ID to open the Job Details screen. Click Migrate Job (top, right-hand side). Fill out the fields in the pop-up screen and then click Migrate.

  3. Check that the job migrates successfully.

    • CLI: Run the float squeue command repeatedly until the job state changes from "Floating" to "Executing".
    • Web interface: On the Jobs screen, click the Refresh button until the job state changes from "Floating" to "Executing".
  4. View a record of any migration events.

    • CLI: Use the float log cat job.events -j <job_id> command.
    • Web interface: On the Jobs screen, click your job, and then go to the Instances tab. If more than one instance is shown, the job migrated at least once. Go to the Attachments tab and click the Preview icon next to the job.events log to see details.
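
Rather than re-entering float squeue by hand in step 3, the check can be wrapped in a small polling function. The following is a minimal sketch, not part of the product: the function name wait_until_executing, the poll interval, and the retry limit are hypothetical, and it assumes that float squeue lists each job on a single line containing both the job ID and its state. Adjust the matching if your output looks different.

```shell
# Poll `float squeue` until the line for the given job contains "Executing".
# Assumption: each job appears on one line that includes its ID and state.
# Usage: wait_until_executing <job_id> [interval_seconds] [max_polls]
wait_until_executing() {
  job_id=$1
  interval=${2:-10}    # seconds between polls (hypothetical default)
  max_tries=${3:-60}   # give up after this many polls (hypothetical default)
  tries=0
  while [ "$tries" -lt "$max_tries" ]; do
    # -F treats the job ID as a literal string, not a regular expression.
    if float squeue | grep -F -- "$job_id" | grep -q "Executing"; then
      echo "Job $job_id is Executing"
      return 0
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done
  echo "Job $job_id did not reach Executing after $max_tries polls" >&2
  return 1
}
```

After issuing float migrate, call the function with the job ID, for example wait_until_executing <job_id> 15, and then inspect the job.events log as described in step 4.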

Example

$ float migrate -f -t t3a.large -j Xx0r4CYE7X6MRmivjoITf
$
$ float log cat job.events -j Xx0r4CYE7X6MRmivjoITf
2023-04-19T14:42:13.334: Ready to migrate with spec job: lWpXdspWZcWESBMMy9Nbm, instType: t3a.large, CPU: 0:0, Memory: 0:0, zone: , payType: 
2023-04-19T14:42:13.334: Attempt to find instance type for spec InstType:t3a.large,CPU:2 ~ 0,Memory:4 ~ 0,Zone:us-east-1b,CPUVendor:AuthenticAMD,priceLimit:0,priceLimitPerc:0
2023-04-19T14:42:13.389: Determined instance params: Zone:us-east-1b,InstType:t3a.large,CPU:2,Memory:8
2023-04-19T14:42:13.389: Ready to migrate with instance type: t3a.large, cpu: 2, memory: 8, zone: us-east-1b, last instance type: t3a.medium(Spot)
2023-04-19T14:42:13.389: Ready to checkpoint host i-0d7877d1978a4db4c
2023-04-19T14:42:14.759: Checkpointed host i-0d7877d1978a4db4c, result: [container: c63784ae91f26cc4b2f8d1980f84ce14bc17ae5b8478953c4a2cb06c61e3a8b3, checkpoint file: ], duration 1.369989165s
2023-04-19T14:42:27.401: Detached volume vol-0fc557b252c9a497c from host i-0d7877d1978a4db4c
2023-04-19T14:42:33.819: Detached volume vol-0ea186ddf37c64396 from host i-0d7877d1978a4db4c
2023-04-19T14:42:40.212: Detached volume vol-05bf0027cef28aa4f from host i-0d7877d1978a4db4c
2023-04-19T14:42:40.212: Ready to create new host to recover
2023-04-19T14:42:46.999: Created instance i-06107ed1970156e8d at us-east-1b, waiting for it to initialize
2023-04-19T14:45:02.014: Mounted vol-0fc557b252c9a497c:/mnt/float-data to i-06107ed1970156e8d
2023-04-19T14:45:02.014: Mounted vol-0ea186ddf37c64396:/mnt/float-image to i-06107ed1970156e8d
2023-04-19T14:45:02.014: Mounted vol-05bf0027cef28aa4f:/data to i-06107ed1970156e8d
2023-04-19T14:45:02.015: Created new host: i-06107ed1970156e8d(Spot)
2023-04-19T14:45:02.248: Got 1 containers on host i-06107ed1970156e8d
2023-04-19T14:45:02.248: Ready to recover {ID:c63784ae91f2,Checkpointed:true,Running:false} on host i-06107ed1970156e8d
2023-04-19T14:45:02.264: Job floated to instance i-06107ed1970156e8d (2 CPU/8 GB) (Spot)
2023-04-19T14:45:02.893: Migrated to new VM: i-06107ed1970156e8d
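
To see how long a migration took, the timestamps in the job.events log can be compared. The sketch below is illustrative only: the function name migration_seconds is hypothetical, it relies on the timestamp layout and the "Ready to migrate" and "Migrated to new VM" messages shown in the example above, and it assumes both events fall on the same calendar day.

```shell
# Read a job.events log on stdin and print the seconds elapsed between the
# first "Ready to migrate" event and the last "Migrated to new VM" event.
# Assumption: both events occur on the same calendar day.
migration_seconds() {
  awk '
    # Convert the time of day in a leading timestamp such as
    # "2023-04-19T14:42:13.334:" to seconds since midnight.
    function tod(ts,    parts, hms) {
      split(ts, parts, "T")
      split(parts[2], hms, ":")
      return hms[1] * 3600 + hms[2] * 60 + hms[3]
    }
    /Ready to migrate/ && !started { start = tod($1); started = 1 }
    /Migrated to new VM/           { finish = tod($1); finished = 1 }
    END {
      if (started && finished) printf "%.3f\n", finish - start
      else exit 1   # one of the two events was not found
    }
  '
}
```

A typical invocation pipes the log into the function: float log cat job.events -j <job_id> | migration_seconds. For the log in the example above, the first "Ready to migrate" event is at 14:42:13.334 and the "Migrated to new VM" event is at 14:45:02.893, so the function prints 169.559.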