Running a Batch Job

The Memory Machine Cloud Edition web interface and CLI have a rich set of options that allow you to customize the container runtime environment. The command strings shown here are simplified.

Procedure

  1. Log in to the OpCenter.
    • If you are using the web interface, enter your credentials on the landing page.
    • If you are using the web CLI, you are already logged in.
    • If you are using a remote terminal or a terminal session on the OpCenter server, enter the following command:
      float login -u <username> -p <password> -a <OpCenter_ip_address>
      where <username> and <password> are login credentials, and <OpCenter_ip_address> is the OpCenter's private IP address if you are within your organization's virtual private cloud (VPC), or public IP address if you are outside the VPC.
      Note: The OpCenter IP address is cached, so you can omit it when you log back in. After the cache expires, the IP address defaults to localhost. If you get a "connection refused" error, retry with the -a <OpCenter_ip_address> option included.
  2. Display the images that are available in the OpCenter library.
    • CLI: Enter the following command:
      float image list
    • Web interface: Select Images from the side panel.
  3. If the required image is not available in the OpCenter library, upload the image.
    • CLI: If the image is available from a public or private repository, enter the following command:
      float image add <image_name> <image_URI> --user <repo_access_user> --token <repo_access_token>
      where <repo_access_user> and <repo_access_token> are the credentials for accessing the repository if it is private (obtain these from the repository owner). For example:
      float image add python docker.io/bitnami/python
      If you are using the CLI in a terminal session, you can upload an image from a local directory by entering the following command:
      float image upload <image_name> --path </path/to/image>
      where </path/to/image> is the path to the directory where the image tar file is located on your local machine.
    • Web interface: If the image is available from a public or private repository, on the Images screen, select Private > Add Image. Fill in the pop-up form and click on Add.
  4. Prepare the job script for the workload.
    Note: The job script can be a local file, but there are other access options, such as from an S3 bucket or from a web server.
  5. Submit a job to the OpCenter.
    • CLI: Enter the following command:
      float sbatch -i <image_name> -j <job_script> --cpu <num_cpu> --mem <mem_size> --dataVolume [size=<vol_size>]:/<mnt_point>
      where:
      • <image_name> is the Docker image used to run the job
      • <job_script> is the job script to execute (include the complete path to the job script file)
      • <num_cpu> is the minimum number of virtual CPUs to use (can also be specified as a range in the form min:max)
      • <mem_size> is the minimum memory capacity to use, in GB (can also be specified as a range in the form min:max)
      • <vol_size> is the capacity of the data directory, in GB
      • <mnt_point> is the mount point for the data directory
      Example:
      float sbatch -i python -j ./python_job_script.sh --cpu 4 --mem 8 --dataVolume [size=10]:/data
      Alternatively, you can use a definition file in conjunction with the float sbatch command. For example, to submit the same job, enter:
      float sbatch -d ./def1.yaml
      where def1.yaml is a file with the following contents:
      image: python
      job: python_job_script.sh 
      cpu: 4
      mem: 8
      dataVolume:
        - "[size=10]:/data"
    • Web interface: Click on Submit Job (left-hand panel), fill in the fields in the pop-up form, and then click on Submit.

      As you fill in the fields, a CLI command string is generated in the right-hand panel, which helps you understand the effect of each field.

  6. Check job status.
    • CLI: Enter the following command:
      float squeue
      The first column shows the unique job identifier associated with each job.
    • Web interface: Click on Jobs (left-hand panel).
  7. Display detailed information on the job you submitted.
    • CLI: Enter the following command:
      float show -j <job_id>
      where <job_id> is the unique job identifier shown in the previous step.
    • Web interface: Click on Jobs (left-hand panel) and then click on the ID associated with your job (identify your job by ID or name). A screen entitled Job Details - <job_name> is displayed.
  8. Display the logs associated with a job.
    • CLI: Enter the following command:
      float log ls <job_id>

      where <job_id> is the job identifier.

    • Web interface: On the Job Details - <job_name> screen, click on the tab entitled Attachments.
  9. View log file contents.
    • CLI: Enter the following command:
      float log tail --follow <log_file_name> -j <job_id>

      where <log_file_name> is the name of the log file you specified in the job script and <job_id> is the job identifier.

      Note: As the job runs, logs are written to a directory mounted from the OpCenter server. When the job ends, the OpCenter automatically saves the logs. For example, stderr and stdout are saved as stderr.autosave and stdout.autosave, respectively.
    • Web interface: On the Job Details - <job_name> screen, click on the tab entitled Attachments and then click on the Preview icon next to the log you want to view. Click on the Refresh button to update the display.
  10. Modify parameters (security groups, migration policy, VM creation policy, or periodic snapshots) associated with a running job.
    • CLI: Enter the following command:
      float modify -j <job_id> --<option> <option_string>
      where <option> is one of the following:
      • addSecurityGroup (<option_string> is the identifier of the security group to add)
      • rmSecurityGroup (<option_string> is the identifier of the security group to remove)
      • migratePolicy (<option_string> is the new migration policy to apply)
      • snapshotInterval (<option_string> is the new periodic snapshot interval to apply)
      • vmPolicy (<option_string> is the new VM creation policy to apply)
    • Web interface: On the Jobs screen, identify your job by ID or name. Under the Actions column, click on the Modify Job icon associated with your job. In the dialog box, fill in the new parameter values to apply to the job and then click on Modify.
  11. View the contents of the submitted job file (recent or archived jobs).
    • CLI: Enter the following command:
      float show -c -j <job_id>
    • Web interface: Not supported.
  12. Cancel a running job.
    • CLI: Enter the following command:
      float scancel -j <job_id>
    • Web interface: On the Jobs screen, identify your job by ID or name. Under the Actions column, click on the Cancel icon associated with your job.
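
The CLI steps above can be chained in a shell script. The following is a minimal sketch, not an official workflow. It uses only the float subcommands and options shown in this procedure. The helper names (run_float, write_def_file, submit_job, inspect_job) and the DRY_RUN variable are hypothetical and are not part of the float CLI. With DRY_RUN=1 (the default in this sketch), each command is printed instead of executed so that you can review it first.

```shell
#!/bin/bash
# Sketch only: the helper names and DRY_RUN are illustrative assumptions,
# not features of the float CLI.

DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, 0 = execute them

# Print the command in dry-run mode; otherwise pass it to the float CLI.
run_float() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "float $*"
    else
        float "$@"
    fi
}

# Write the job definition file from step 5 (same values as the example).
write_def_file() {
    cat > "$1" <<'EOF'
image: python
job: python_job_script.sh
cpu: 4
mem: 8
dataVolume:
  - "[size=10]:/data"
EOF
}

# Step 5: submit the job described by a definition file.
submit_job() {
    run_float sbatch -d "$1"
}

# Steps 6 to 9: check status, show details, list logs, and follow one log.
inspect_job() {
    local job_id="$1" log_file="$2"
    run_float squeue
    run_float show -j "$job_id"
    run_float log ls "$job_id"
    run_float log tail --follow "$log_file" -j "$job_id"
}
```

For example, calling write_def_file ./def1.yaml followed by submit_job ./def1.yaml prints float sbatch -d ./def1.yaml in dry-run mode. Set DRY_RUN=0 to run the commands for real, and pass the job identifier reported by float squeue, together with a log file name, to inspect_job.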

What to do next

If the job runs on a Spot Instance that is reclaimed, the job will "float" to a new Spot Instance and continue running. After the job has run to completion, retrieve your results from the location you specified in the job script.