Running a Batch Job

The Memory Machine Cloud web interface and CLI have a rich set of options that allow you to customize the container runtime environment.

Procedure

  1. Log in to the OpCenter.
    • If you use the web interface, enter your credentials on the landing page.
    • If you use the web CLI shell, you are already logged in.
    • If you use a remote terminal or a terminal session on the OpCenter server, enter the following command:
      float login -u <username> -p <password> -a <OpCenter_ip_address>
      where <username> and <password> are login credentials, and <OpCenter_ip_address> is the OpCenter's private IP address if you are within your organization's virtual private cloud (VPC), or public IP address if you are outside the VPC.
      Note: The OpCenter IP address is cached, so you can omit it when you log back in. After the cache expires, the IP address defaults to localhost. If you get a "connection refused" error, retry with the -a <OpCenter_ip_address> option included.
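The retry described in the note can be scripted. The following is a minimal sketch, not part of the product: the function name login_with_fallback is hypothetical, and it assumes that float login exits with a non-zero status when the connection is refused.

```shell
# Hypothetical helper: log in using the cached OpCenter address first,
# then retry with an explicit -a option if the first attempt fails.
# Assumption: `float login` exits non-zero on a "connection refused" error.
login_with_fallback() {
    user=$1
    password=$2
    address=$3

    if float login -u "$user" -p "$password"; then
        return 0
    fi
    echo "Login without -a failed; retrying with -a $address" >&2
    float login -u "$user" -p "$password" -a "$address"
}
```

Call it as login_with_fallback <username> <password> <OpCenter_ip_address>. The explicit address is used only when the first attempt fails.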
  2. Display the images that are available in the OpCenter library.
    • CLI: Enter the following command:
      float image list
    • Web interface: Click App Library from the side panel.
  3. If the required image is not available in the OpCenter library, upload the image.
    • CLI: If the image is available from a public or private repository, enter the following command:
      float image add <image_name> <image_URI> --user <repo_access_user> --token <repo_access_token>
      where <repo_access_user> and <repo_access_token> are the credentials for accessing the repository if it is private (obtain these from the repository owner). For example:
      float image add python docker.io/bitnami/python
      Note: When using the CLI, you can skip loading the image into the OpCenter. If you specify the image URI in the float submit command, the OpCenter automatically pulls the image (see step 5).
      You can upload an image from a local directory by entering the following CLI command in a local terminal window.
      float image upload <image_name> --path </path/to/image>
      where </path/to/image> is the path to the directory where the image tar file is located on your local machine.
    • Web interface: If the image is available from a public or private repository, on the App Library screen, select Private > Add Image. Fill in the pop-up form and click Add.

      To upload an image from a local directory, click Upload Image and follow the instructions in the pop-up window.

      Note: Jobs submitted using the web interface must use images from the App Library (either Built-in or Private).
  4. Prepare the job script for the workload.
    Note: The job script can be a local file, but there are other access options, such as a file from an S3 bucket or from a web server.
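As an illustration only, the following sketch writes a small job script to the current directory. The script contents are hypothetical: they assume that the container image provides python3 and that the job is later submitted with a data volume mounted at /data, as in the example in step 5.

```shell
# Write a hypothetical job script to the current directory.
# Assumptions: the container image provides python3, and the job is
# submitted with a data volume mounted at /data.
cat > python_job_script.sh <<'EOF'
#!/bin/bash
set -e
# Run the workload and keep the result on the data volume.
python3 -c 'print(sum(range(10)))' > /data/result.txt
EOF
```

Writing results to the mounted data volume keeps them in a known location after the job ends.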
  5. Submit a job to the OpCenter.
    • CLI: Enter the following command:
      float sbatch -i <image_name>|<image_uri> -j <job_script> --cpu <num_cpu> --mem <mem_size> --dataVolume [size=<vol_size>]:/<mnt_point>
      where:
      • <image_name> is the name of the Docker image used to run the job
      • <image_uri> is the URI to pull the image to run the job
      • <job_script> is the job script to execute (include the complete path to the job script file)
      • <num_cpu> is the minimum number of virtual CPUs to use (can also be specified as a range in the form min:max)
      • <mem_size> is the minimum memory capacity to use, in GB (can also be specified as a range in the form min:max)
      • <vol_size> is the capacity of the data directory (in GB)
      • <mnt_point> is the mount point for the data directory.
      Note: If you are using the CLI, the image does not have to be in the App Library. Choose -i <image_uri> instead of -i <image_name>. The image is automatically loaded into the Private image library as an entry called <image_name>-<random_string> or image-<random_string>.
      Example:
      float sbatch -i /bitnami/python -j python_job_script.sh --cpu 4 --mem 8 --dataVolume [size=10]:/data
      id: EOgf83Ru3U5jf9u2JK6Gp
      name: image-xorwav-c5d.large
      user: admin
      imageID: docker.io/bitnami/python:latest
      status: Initializing
      ...(edited)
      Alternatively, you can use a definition file in conjunction with the float sbatch command. For example, to submit the same job, enter:
      float sbatch -d ./def1.yaml
      where def1.yaml is a file with the following contents:
      image: "bitnami/python"
      job: python_job_script.sh 
      cpu: 4
      mem: 8
      dataVolume:
       -  "[size=10]:/data"
    • Web interface: Click Submit Job (left-hand panel) and follow these steps.
      • Click the Start from Scratch tab
      • Click the Basic tab
      • Fill in the required fields
      • (Optional) In the Basic tab and all other tabs, fill in optional fields or modify prepopulated fields
      • Click Submit.
      As you fill in the fields, a CLI command string is generated (see the right-hand panel), which helps in understanding what effect the information in the fields has.
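Later CLI steps need the job identifier, so it can be convenient to capture it at submission time. The following is a minimal sketch: the function name submit_job is hypothetical, the resource values are copied from the CLI example in this step, and it assumes that the submit output contains a line of the form "id: <job_id>", as in the sample output shown there.

```shell
# Hypothetical helper: submit a job and print only its job ID.
# Assumption: the submit output contains a line of the form "id: <job_id>",
# as in the sample output for the CLI example.
submit_job() {
    image=$1
    script=$2
    float sbatch -i "$image" -j "$script" --cpu 4 --mem 8 \
        --dataVolume "[size=10]:/data" |
        awk '$1 == "id:" { print $2; exit }'
}
```

For example, job_id=$(submit_job <image_name> <job_script>) stores the identifier for use with the commands in the following steps.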
  6. Check job status.
    • CLI: Enter the following command:
      float squeue
      The first column shows the unique job identifier associated with each job.
    • Web interface: Click Jobs (left-hand panel).
  7. Display detailed information on the job you submitted.
    • CLI: Enter the following command:
      float show -j <job_id>
      where <job_id> is the unique job identifier shown in the previous step.
    • Web interface: Click Jobs (left-hand panel) and then click the ID associated with your job (identify your job by ID or name). A screen entitled Job Details - <job_name> is displayed.
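To wait for a job to reach a particular state from a script, the detail command can be polled. The following is a sketch under stated assumptions: the function name wait_for_status is hypothetical, and it assumes that float show -j <job_id> prints a "status: <value>" line like the one in the sample submit output. The status name to wait for is supplied by the caller, because status names other than Initializing are not listed here.

```shell
# Hypothetical helper: poll until the job reports the wanted status.
# Assumption: `float show -j <job_id>` prints a "status: <value>" line.
wait_for_status() {
    job_id=$1
    wanted=$2
    interval=${3:-30}     # seconds between polls
    max_tries=${4:-20}    # give up after this many polls

    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        status=$(float show -j "$job_id" | awk '$1 == "status:" { print $2; exit }')
        echo "job $job_id: ${status:-unknown}" >&2
        if [ "$status" = "$wanted" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

The function returns 0 as soon as the wanted status is seen and 1 if the poll limit is reached first.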
  8. Display the logs associated with a job.
    • CLI: Enter the following command:
      float log ls <job_id>

      where <job_id> is the job identifier.

    • Web interface: On the Job Details - <job_name> screen, click the Attachments tab.
  9. View log file contents.
    • CLI: Enter the following command:
      float log tail --follow <log_file_name> -j <job_id>

      where <log_file_name> is the name of the log file you specified in the job script and <job_id> is the job identifier.

      Note: As the job runs, logs are written to a directory mounted from the OpCenter server. When the job ends, the OpCenter automatically saves logs such as stderr and stdout as stderr.autosave and stdout.autosave, respectively.
    • Web interface: On the Job Details - <job_name> screen, click the Attachments tab and then click the Preview icon next to the log you want to view. Click the Refresh button to update the display.
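Steps 8 and 9 are often run back to back. The following sketch chains the two CLI commands; the function name follow_job_log is hypothetical, and the log file name must be one that the listing actually shows for the job.

```shell
# Hypothetical helper: list a job's logs, then stream one of them.
follow_job_log() {
    job_id=$1
    log_file=$2

    # Step 8: show which log files exist for the job.
    float log ls "$job_id"
    # Step 9: stream the chosen log file as it is written.
    float log tail --follow "$log_file" -j "$job_id"
}
```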
  10. Modify parameters (security groups, migration policy, VM creation policy, error policy, or periodic snapshots) associated with a running job.
    • CLI: Enter the following command:
      float modify -j <job_id> --<option> <option_string>
      where <option> is one of the following:
      • addSecurityGroup (<option_string> is the identifier of the security group to add)
      • rmSecurityGroup (<option_string> is the identifier of the security group to remove)
      • migratePolicy (<option_string> is the new migration policy to apply)
      • snapshotInterval (<option_string> is the new periodic snapshot interval to apply; use "0" or "disable" to turn off)
      • vmPolicy (<option_string> is the new VM creation policy to apply)
      • errPolicy (<option_string> is the new error policy to apply)
    • Web interface: On the Jobs screen, identify your job by ID or name. Under the Actions column, click the Modify Job icon associated with your job. In the dialog box, fill in the new parameter values to apply to the job and then click Modify.
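When float modify is called from a script, a small wrapper can catch mistyped option names before the command is sent. The following is a minimal sketch: the function name modify_job is hypothetical, and the accepted names are exactly the six options listed in this step.

```shell
# Hypothetical helper: accept only the options listed in step 10,
# then pass them through to `float modify`.
modify_job() {
    job_id=$1
    option=$2
    value=$3

    case $option in
        addSecurityGroup|rmSecurityGroup|migratePolicy|snapshotInterval|vmPolicy|errPolicy)
            float modify -j "$job_id" "--$option" "$value"
            ;;
        *)
            echo "unsupported option: $option" >&2
            return 2
            ;;
    esac
}
```

For example, modify_job <job_id> snapshotInterval 0 turns off periodic snapshots for the job.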
  11. View the contents of the job script, if one was submitted with the job (recent or archived jobs).
    • CLI: Enter the following command:
      float show -c -j <job_id>
    • Web interface: On the Jobs screen, identify your job by ID (or name) and click the entry. On the Job Details - <job_name> screen, click the tab entitled Settings. Scroll down to the Job Script section.
  12. Cancel a running job.
    • CLI: Enter the following command:
      float scancel -j <job_id>
    • Web interface: On the Jobs screen, identify your job by ID or name. Under the Actions column, click the More icon associated with your job and then select Cancel.
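To cancel several jobs at once from the CLI, the cancel command can be wrapped in a loop. The following is a sketch only: the function name cancel_jobs is hypothetical, and it assumes that float scancel runs without an interactive prompt and exits with a non-zero status on failure.

```shell
# Hypothetical helper: cancel every job ID given as an argument.
# Assumption: `float scancel` is non-interactive and exits non-zero on failure.
cancel_jobs() {
    for job_id in "$@"; do
        float scancel -j "$job_id" || echo "could not cancel $job_id" >&2
    done
}
```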

What to do next

After the job runs to completion, retrieve your results from the location you specified in the job script.