CLI Command Reference
Use float CLI commands to interact with the OpCenter; for example, to submit and manage jobs.
Version
The MMCloud CLI commands described here are consistent with the OpCenter release shown in the table.
MMCloud CLI | OpCenter Release | Date |
---|---|---|
FLOAT_v3.0.0-69ce0c9-Imperia.bin | FLOAT_v3.0.0-69ce0c9-Imperia.bin | 2024-07-31T08:34:11Z |
Usage
Use float in the following format.
float [global_flags] [command] [subcommand] [options]
Global Flags
Use global flags with any float command or subcommand.
Flag | Usage | Definition |
---|---|---|
-a, --address ip_address | Connect to OpCenter server | IP address of OpCenter (default: localhost or last OpCenter IP address used) |
-F, --format json \| yaml \| table | Specify format for output | Output format (default: yaml) |
-h, --help | Display help | Help for MMCloud CLI |
--logLevel log_level | Specify log level | Log level (default: info) |
-p, --password password | Log in to OpCenter | Login password |
--scroll | Enable scroll mode for multiple page output | Enable navigation (up, down, left, right) for displays that span multiple pages |
-u, --username user_name | Log in to OpCenter | Login username |
-v, --verbose on \| off | Turn verbose mode on or off | Verbose mode setting (default: off) |
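As a rough illustration of the usage format, the Python sketch below assembles a float command line in the documented order: global flags first, then the command, subcommand, and options. The helper name build_float_argv and the address 192.0.2.10 are hypothetical; only the -a and -F flags and their allowed values come from the table above.

```python
from typing import List, Optional, Sequence


def build_float_argv(
    command: str,
    subcommand: Optional[str] = None,
    options: Sequence[str] = (),
    address: Optional[str] = None,
    output_format: Optional[str] = None,
) -> List[str]:
    """Assemble: float [global_flags] [command] [subcommand] [options]."""
    argv = ["float"]
    # Global flags are placed before the command, as in the usage format.
    if address is not None:
        argv += ["-a", address]
    if output_format is not None:
        if output_format not in ("json", "yaml", "table"):
            raise ValueError("format must be json, yaml, or table")
        argv += ["-F", output_format]
    argv.append(command)
    if subcommand is not None:
        argv.append(subcommand)
    argv.extend(options)
    return argv


# Hypothetical OpCenter address; the list could be passed to subprocess.run().
print(build_float_argv("config", "ls", ["--scope", "session"],
                       address="192.0.2.10", output_format="json"))
```

Returning a list rather than a single string avoids shell quoting problems when an option value contains spaces or wildcard characters.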
Categories
The MMCloud CLI commands are grouped into categories as shown in the table. An alphabetical listing of the commands follows.
Category | Commands |
---|---|
Job Management | list, show, submit, cancel, migrate, modify, suspend, resume, snapshot, rerun, promote, hosts |
Job Status Monitoring | log, ps, top, df |
Authentication & Authorization | login, logout, secret |
User Management | user, group, quota-policy |
OpCenter Core Services | image, gateway, template, config, license, report, storage |
OpCenter Management Operations | status, version, release, restart |
Other | completion |
float cancel (scancel)
Use the float cancel (or float scancel) command to cancel a job. The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Cancel job | --filter filter | Filter to select jobs if --job is not specified. A simple filter is [attribute][operator][value]. Use --filter multiple times to apply multiple filters combined with the "and" operator. Create complex filters by combining simple filters using parentheses and operators (and, or). Use "?" to match a single character and "*" to match multiple characters. Values are strings, datetimes, or numbers. |
| | | -f, --force | Automatically answer "yes" at confirmation prompt |
| | | -j, --job job_id | Job to cancel if --filter is not specified |
Example
$ float cancel -j ctZLDo7OFG4BuJ8ytiTem
Warning: Are you sure you want to cancel this job?
ID: ctZLDo7OFG4BuJ8ytiTem
Name: python-c5d.large
All the related resources will be released. (yes/No): y
Request to cancel ctZLDo7OFG4BuJ8ytiTem has been submitted
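To cancel several jobs in one call, --filter can take the place of --job. The Python sketch below builds such a command line. The helper names are hypothetical, and the attribute names user and status are assumptions borrowed from the float list examples, not a confirmed list of filter attributes for float cancel. In float cancel, -f means --force, so the filter option is written out in full.

```python
from typing import List, Sequence


def any_of(*simple_filters: str) -> str:
    """Combine simple filters into one complex filter with the "or" operator."""
    return " or ".join(simple_filters)


def build_cancel_argv(filters: Sequence[str], force: bool = False) -> List[str]:
    """Build a float cancel command that selects jobs by filter.

    Each entry becomes its own --filter option; repeated --filter
    options are combined with the "and" operator.
    """
    if not filters:
        raise ValueError("give at least one filter (or use --job instead)")
    argv = ["float", "cancel"]
    if force:
        # -f answers "yes" at the confirmation prompt.
        argv.append("-f")
    for simple_filter in filters:
        argv += ["--filter", simple_filter]
    return argv


# Hypothetical attribute names and values, for illustration only.
print(build_cancel_argv(
    ["user=bean", any_of("status=Executing", "status=Submitted")]
))
```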
float completion
Use the float completion command to generate an auto-completion script for use in the current shell or in every shell started subsequently. Use a subcommand with the -h flag to display help on how to use the auto-completion script.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| bash | Generate auto-completion script for bash | --no-descriptions | Disable completion descriptions |
| fish | Generate auto-completion script for fish | --no-descriptions | Disable completion descriptions |
| powershell | Generate auto-completion script for powershell | --no-descriptions | Disable completion descriptions |
| zsh | Generate auto-completion script for zsh | --no-descriptions | Disable completion descriptions |
float config
Use the float config command to view or change the OpCenter configuration. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| cert | Set server certificate and key (must supply both) | -c, --cert /path/to/cert | Path to certificate file |
| | | -k, --key /path/to/key | Path to key file |
| get | Show runtime value of a single OpCenter configuration parameter | config_parameter | Configuration parameter to display, for example, sessionTimeout |
| ldap | Set values for LDAP configuration parameters | --addr ldap_server_address | IPv4 address of LDAP server |
| | | --adminGroup ldap_admin_group | LDAP admin group |
| | | --anonymous=true \| false | LDAP anonymous bind (default false) |
| | | --base base_DN | LDAP base Distinguished Name (DN) |
| | | --bindDN bind_DN | LDAP bind DN |
| | | --bindPW bind_pw | LDAP bind password |
| | | --cert /path/to/cert | Path to LDAP certificate file |
| | | --conf /path/to/conf_file | Path to LDAP configuration file |
| | | --connTimeout duration | Duration (format: HhMmSs) until LDAP connection times out (default 10s) |
| | | --enable=true \| false | Enable LDAP (default true) |
| | | --groupOU group_OU | LDAP group Organizational Unit (OU) (default "Group") |
| | | --key /path/to/key | Path to LDAP key file |
| | | --network tcp \| udp | LDAP connection protocol (default "tcp") |
| | | --peopleOU people_OU | LDAP people OU (default "People") |
| | | --reset | Reset all LDAP parameters to empty (null) |
| | | --tls=true \| false | LDAP use tls (default true) |
| list, ls | List configuration parameters for OpCenter (shows if a parameter can be changed and if a change requires a system restart) | -s, --scope filter | Condition(s) to filter the list of configuration parameters (default: list all parameters) |
| mset | Set runtime values of multiple configuration parameters | config_parm1=parm_value1 config_parm2=parm_value2 ... | Array of configuration parameters and the values to set them to. If the word "default" is used for `parm_value`, the configuration parameter is reset to its default value. |
| set | Set runtime value of a single configuration parameter | config_parm parm_value | Configuration parameter and the value to set it to. If the word "default" is used for `parm_value`, the configuration parameter is reset to its default value. |
| | | --file value.txt | Text file containing value for `config_parameter` (see example) |
Examples
$ float config set sessionTimeout 48h
sessionTimeout is set to 48h0m0s
$ float config set sessionTimeout default
sessionTimeout is set to 1h0m0s
$ float config ls --scope "session"
+----------------+----------+----------+--------------+
| KEY | VALUE | EDITABLE | NEED RESTART |
+----------------+----------+----------+--------------+
| sessionTTL | 168h0m0s | Y | |
| sessionTimeout | 1h0m0s | Y | |
+----------------+----------+----------+--------------+
$ echo -n 48h > session.txt
$ float config set sessionTimeout --file session.txt
The value of sessionTimeout is set to '48h0m0s'.
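Several parameters can be changed in one call with float config mset. The following Python sketch turns a dictionary into the key=value arguments that mset expects, writing None as the word "default" to request a reset. The helper name is hypothetical; sessionTimeout and sessionTTL are the parameters shown in the listing above.

```python
from typing import Dict, List, Optional


def build_mset_argv(params: Dict[str, Optional[str]]) -> List[str]:
    """Build a float config mset command from parameter/value pairs.

    A value of None is written as the word "default", which resets
    the parameter to its default value.
    """
    if not params:
        raise ValueError("at least one parameter is required")
    argv = ["float", "config", "mset"]
    for name, value in params.items():
        argv.append(f"{name}={'default' if value is None else value}")
    return argv


print(build_mset_argv({"sessionTimeout": "48h", "sessionTTL": None}))
```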
float df
Use the float df command to display the file systems mounted by an executing job (includes used and available disk space). The Linux command df must be installed in the container. This command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Display mounted file systems and associated disk space | --args df_args | Arguments to pass to df command |
| | | -j, --jobId job_id | Job to query |
Example
$ float df --args "-h" -j WIY92p0jWyaCMP0CNQYjC
Filesystem Size Used Avail Use% Mounted on
overlay 6.0G 1.1G 5.0G 17% /
/dev/nvme3n1 10G 105M 9.9G 2% /data
tmpfs 64M 0 64M 0% /dev
172.31.81.17:/mnt/memverge/slurm/work/nzG6oM5DoLCysXAkoZFCA/app 50G 20G 31G 39% /mmce
/dev/nvme2n1 6.0G 1.1G 5.0G 17% /etc/hosts
/dev/nvme0n1p2 40G 7.5G 33G 19% /opt/aws
shm 63M 0 63M 0% /dev/shm
devtmpfs 1.7G 0 1.7G 0% /proc/key
float gateway
Use the float gateway command to create and manage a reverse proxy server. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| connect | Connect a running job to the gateway (reverse proxy) | -g, --gateway gw_id | ID of gateway to connect job to (format is g- followed by a fixed-length character string) |
| | | -j, --job job_id | ID of job to connect to gateway |
| | | --targetPort port_number | Port used to connect to job, for example, 8787 for RStudio. Include --targetPort multiple times to connect multiple ports from a single server to the gateway. |
| create | Create a gateway | --bandwidth bw | Minimum gateway bandwidth in Mbps (default: 25). Only use with AliCloud. |
| | | -c, --cpu min_cpu | Minimum number of virtual CPUs for gateway (default: 0) |
| | | -t, --instType instance_type | VM instance type for gateway, for example, c5.xlarge for AWS. Do not combine with --cpu or --mem options. |
| | | -m, --mem min_mem | Minimum memory capacity in GB for gateway (default: 0) |
| | | -n, --name gw_name | Name to associate with gateway |
| | | --noPublicIP | Create gateway with private IP address only. Ensure that gateway is reachable from the hosts that need to connect. |
| | | --portRange min:max | Range of client-side ports opened on gateway (allowed range is between 1000 and 65535) |
| | | --securityGroup sec_group | AWS security group (or tag in GCP) applied to gateway. Include option multiple times to apply multiple security groups. |
| | | -z, --zone availability_zone | Availability zone in which to create gateway |
| destroy | Destroy a gateway | -g, --gateway gw_id | ID of gateway to destroy (format is g- followed by a fixed-length character string) |
| | | -f, --force | Automatically answer "yes" at confirmation prompt |
| disconnect | Disconnect a job from a gateway | -g, --gateway gw_id | ID of gateway to query (format is g- followed by a fixed-length character string) |
| | | -j, --job job_id | ID of job to disconnect from gateway |
| | | --port port_num | Server-side port to disconnect from gateway. If gateway connects to multiple ports on server, include --port for each port. |
| info | Display information about a gateway (including IP address and connected jobs) | -g, --gateway gw_id | ID of gateway to query (format is g- followed by a fixed-length character string) |
| list | List all running gateways (optionally include stopped gateways) | -A, --showAll | Option to list stopped as well as running gateways (overrides filters) |
| | | -f, --filter filter | Filter(s) to apply to list of gateways. Use option multiple times to apply multiple filters combined with "and" operator. See details in Working with OpCenter/Filters/Gateway Filters. |
| | | -o, --orderBy attribute | Attribute used to order gateway listing (prepend with "-" to reverse order; default: lastUpdate) |
| modify | Modify attributes of a gateway | --addSecurityGroup sec_group | AWS security group (or tag in GCP) added to gateway. Include option multiple times for multiple security groups. |
| | | -g, --gateway gw_id | ID of gateway to modify |
| | | --rmSecurityGroup sec_group | AWS security group (or tag in GCP) removed from gateway. Include option multiple times for multiple security groups. |
Example
$ float gateway create -n NewGateway --securityGroup sg-0fbb6a83983183364 --portRange 10000:10500
id: g-wsfthf3z8zeb0cyoc6dsb
name: NewGateway
status: Creating
configuration: ""
IPAddress: ""
portRange: 10000-10500
startTime: "2024-01-01T16:33:20Z"
usedPorts: 0
cost: ""
clientJobs: {}
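Two constraints from the option table are easy to violate when scripting float gateway create: the port range must stay within 1000 to 65535, and --instType must not be combined with --cpu or --mem. The Python sketch below checks both before building the command. The helper name is hypothetical, and treating both range bounds as inclusive is an assumption.

```python
from typing import List, Optional


def build_gateway_create_argv(
    name: str,
    port_min: int,
    port_max: int,
    inst_type: Optional[str] = None,
    min_cpu: Optional[int] = None,
    min_mem: Optional[int] = None,
) -> List[str]:
    """Build a float gateway create command, checking documented limits."""
    # Allowed client-side port range (bounds assumed inclusive).
    if not (1000 <= port_min <= port_max <= 65535):
        raise ValueError("portRange must satisfy 1000 <= min <= max <= 65535")
    # --instType must not be combined with --cpu or --mem.
    if inst_type is not None and (min_cpu is not None or min_mem is not None):
        raise ValueError("--instType cannot be combined with --cpu or --mem")
    argv = ["float", "gateway", "create", "-n", name,
            "--portRange", f"{port_min}:{port_max}"]
    if inst_type is not None:
        argv += ["-t", inst_type]
    if min_cpu is not None:
        argv += ["-c", str(min_cpu)]
    if min_mem is not None:
        argv += ["-m", str(min_mem)]
    return argv


print(build_gateway_create_argv("NewGateway", 10000, 10500))
```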
float group
Use the float group command to manage OpCenter user groups. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| add, create | Create a new group | new_group | Name of new group |
| | | --admin user1, user2... | User(s) given admin role in new group |
| | | --gid GID | Group ID for this group (default is next available gid) |
| | | --user user1, user2... | User(s) added to new group |
| add --ldap | Associate an LDAP group with an LDAP directory | ldap_group_name | Name of group to associate with LDAP directory |
| delete, remove, rm | Delete a group | group_name \| group_id | Name or id of group to delete |
| info, show | Display information about group including members | group_name \| group_id | Name or id of group to query |
| list, ls | List groups that current user belongs to | | |
| update | Update attributes of a group | group_name \| group_id | Name or id of group to update |
| | | --add user1, user2... | User(s) added to group |
| | | --admin | Flag to indicate that any username listed after --add is given admin role in group and any username listed after --remove has admin role in group removed |
| | | --gid GID | New group ID for this group |
| | | --name group_name | New name for group |
| | | --remove user1, user2... | User(s) removed from group |
Examples
$ float group update crops --admin --remove barley --add wheat
name: crops
gid: 2007
admins: wheat
users: wheat,barley
type: builtin
float hosts (sinfo)
Use the float hosts (or float sinfo) command to show details and current status of current (or all) worker nodes. The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Show current (or all) worker node details including status | -A, --all | Clear all filters to show all jobs |
| | | -f, --filter filter | Filter(s) to apply to jobs whose associated hosts are displayed. Use option multiple times to apply multiple filters combined by "and" operator. See details in Working with OpCenter/Filters/Host Filters. The default filter is: -f "running=true or update<=1h". Example: -f status=executing -f timeRange=2010-10-22~ |
| | | -o, --orderBy attribute | Attribute used to order host listing (prepend with "-" to reverse order; default: start) |
| | | -w, --windowSize num_hosts | Number of hosts displayed before header row is repeated (default: 512) |
float image
Use the float image command to manage container images on OpCenter. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| add | Add image pull information to OpCenter | image_name image_uri | Name to associate with image in the App Library and URI to pull image from repository. Do not use image_uri if --link specified. |
| | | --import-all | Import all tags for this image from repository |
| | | --link file_link | Link to access AWS S3 or AliCloud OSS file, for example, s3://bucket_name/file_path. Do not use if image_uri specified. |
| | | --token repo_access_token | Token to pull image from private repository |
| | | --user repo_access_user | Username to pull image from private repository |
| cache | Add (delete) image to (from) cache configured for OpCenter (NFS-mounted directory or S3 bucket) | image_name | Container image to add to or delete from cache |
| | | -d, --delete | If option included, delete image from cache |
| | | -f, --force | Overwrite existing cached version of image |
| | | --tag tag_name | Tag to select specific container image (default: "latest") |
| delete, rm, remove | Delete container image(s) and one or more tags | image1 image2... or image1:tag1 image2:tag2... | Container image(s) to delete. If no tags are specified, then all tags are removed. |
| | | -f, --force | Automatically answer "yes" at confirmation prompt |
| | | --tag tag_name | Tag associated with image1 image2... to remove. If this is the only tag associated with images, images are removed. |
| list, ls | Show all images available on OpCenter | | |
| tags, tag | Display tags associated with image and image status | image_name | Container image name |
| update | Update information associated with image | image_name | Container image name |
| | | --addtag image_tag1, image_tag2... | Additional tags to associate with image |
| | | --name image_name | New name to identify image |
| | | --token repo_access_token | New token required to access private repository |
| | | --user repo_access_user | New username associated with token required to access private repository |
| upload | Load image from local server | image_name | Name to identify image in App Library |
| | | -i, --image local_name | Name of image in local repository (cannot use if --path included) |
| | | --path /path/to/image | Path to where image is located (cannot use if --image included) |
Examples
$ float image list
+------------+-------------------------------+--------+-------------+
| NAME | URI | TAGS | ACCESS USER |
+------------+-------------------------------+--------+-------------+
| python | docker.io/bitnami/python | latest | |
| r-base | docker.io/rocker/r-base | latest | |
.....(edited)
+------------+-------------------------------+--------+-------------+
$ float image add r-base docker.io/rocker/r-base
name: r-base
uri: docker.io/rocker/r-base
owner: admin
tags:
latest:
status: Available
locked: false
lastUpdated: 2023-06-15T16:12:41.739947963Z
size: Unknown
$ float image cache blast
Request to cache image blast (tag: latest) has been submitted
$ float image cache blast --delete
Request to clean image blast (tag: latest) has been submitted
$ float image tags blast
+--------------------------+--------+-----------+----------------------+
| URI | TAG | STATUS | LAST UPDATED |
+--------------------------+--------+-----------+----------------------+
| docker.io/memverge/blast | 2.14.0 | Available | 2023-06-26T05:15:02Z |
| | latest | Available | 2024-08-13T15:51:33Z |
+--------------------------+--------+-----------+----------------------+
$ float image upload test2 --path /tmp/hello-world-latest.tar
Start uploading image test2 (localhost/hello-world:latest) from /tmp/hello-world-latest.tar
Progress: |>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 100.00% Complete (ETA. 0s)
Uploaded image /tmp/hello-world-latest.tar, time spent: 0s
name: test2
uri: localhost/hello-world
owner: admin
tags:
latest:
status: Ready
uri: file:///mnt/memverge/images/test2-latest.tar
locked: false
lastUpdated: 2023-04-17T19:26:38.584557866Z
lastPushed: 2023-04-17T19:26:38.584557866Z
size: Unknown
$ float image upload testinggzupload --path ./testinggzupload.tar.gz
Start uploading image testinggzupload (docker.io/library/testinggzupload:latest) from ./testinggzupload.tar.gz
Progress: |>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 100.00% Complete (ETA. 0s)
Uploaded image ./testinggzupload.tar.gz, time spent: 31m20s
name: testinggzupload
uri: docker.io/library/testinggzupload
owner: admin
tags:
latest:
status: Ready
uri: s3://opcenter-bucket-594424d0-c760-11ee-8abd-128ad93e7ec9/images/ntltcroq2g.tar
locked: false
lastUpdated: 2024-04-11T15:39:29.686393328Z
lastPushed: 2024-04-11T15:39:29.686393428Z
size: 1.33 GB
Note
The user that queries the local repository for the IMAGE ID must be the same user that executes the float image upload command. In this example, the user is root.
$ sudo podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/podman/hello latest 39ae24b9cabf 3 days ago 1.7 MB
$ sudo /opt/memverge/bin/float image upload testimage --image 39ae24b9cabf
Found image quay.io/podman/hello:latest, using it as repository and tag
Using /bin/podman to save image quay.io/podman/hello:latest to testimage.tar
Start uploading image testimage (quay.io/podman/hello:latest) from testimage.tar
...[edited]
float license
Use the float license command to manage OpCenter licenses. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| acquire | Acquire license from MMCloud Portal | -A, --account acct_name | Username (email address) to access the MMCloud Portal |
| | | -P, --password passwd | Password to access the MMCloud Portal |
| info | Display license information and status | | |
float list (squeue)
Use the float list (or float squeue) command to show a filtered list of queued jobs. The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Show filtered list of queued jobs (default: list jobs from oldest to newest) | -A, --all | Clear all filters to show all jobs |
| | | -f, --filter filter | Filter(s) to apply to list of queued jobs. Use option multiple times to apply multiple filters combined with "and" operator. See details in Working with OpCenter/Filters/Job Filters. Example: -f status=executing -f timeRange=2023-10-22~ -f tags=kind:training,project:finance-\* |
| | | -o, --orderBy attribute | Attribute used to order job listing (prepend with "-" to reverse order; default: start) |
| | | -w, --windowSize num_jobs | Number of jobs displayed before header row is repeated (default: 512) |
Examples
$ float squeue -f "imageID=*blast*"
+-------+------------------+...+-------+-----------+----------+----------------------+------------+
| ID | NAME |...| USER | STATUS | DURATION | SUBMIT TIME | COST |
+-------+------------------+...+-------+-----------+----------+----------------------+------------+
| m8t...| blast-c5.9xlarge | | admin | Completed | 8h44m18s | 2024-02-14T04:07:38Z | 5.4851 USD |
| w70...| blast-c5.9xlarge | | bean | Completed | 9h3m16s | 2024-02-14T04:22:45Z | 5.3502 USD |
+-------+------------------+...+-------+-----------+----------+----------------------+------------+
...[edited for clarity]
$ float squeue -f "imageID=*blast*" -f user=bean
+-------+------------------+...+-------+-----------+----------+----------------------+------------+
| ID | NAME |...| USER | STATUS | DURATION | SUBMIT TIME | COST |
+-------+------------------+...+-------+-----------+----------+----------------------+------------+
| w70...| blast-c5.9xlarge | | bean | Completed | 9h3m16s | 2024-02-14T04:22:45Z | 5.3502 USD |
+-------+------------------+...+-------+-----------+----------+----------------------+------------+
...[edited for clarity]
$ float squeue -o name
+--------+----------------------+------------------+-------+-----------+--------------+----------+------------+
| ID | NAME | WORKING HOST | USER | STATUS | SUBMIT TIME | DURATION | COST |
+--------+----------------------+------------------+-------+-----------+--------------+----------+------------+
| SZd... | apple | 54.163.154.116...| admin | Executing | ...19:11:47Z | 1h35m1s | 0.1143 USD |
| btC... | banana | 34.234.64.157... | admin | Executing | ...19:12:19Z | 1h34m30s | 0.0926 USD |
| hnP... | camel | | admin | Completed | ...20:43:01Z | 3m28s | 0.0031 USD |
| 5d1... | cherry | 54.89.203.124... | admin | Executing | ...19:12:27Z | 1h34m22s | 0.0924 USD |
| grd... | python-t3a.xlarge | 34.229.17.82... | admin | Completed | ...20:43:48Z | 2m51s | 0.0026 USD |
| nVD... | tidyverse-t3a.xlarge | 3.89.224.246... | admin | Executing | ...19:14:39Z | 1h32m10s | 0.0903 USD |
+--------+----------------------+------------------+-------+-----------+--------------+----------+------------+
...[edited for clarity]
$ float squeue -f "imageID=*python:latest and status=Executing"
+-----------------------+-------------------------+------------------------------+--------+-----------+...
| ID | NAME | WORKING HOST | USER | STATUS |...
+-----------------------+-------------------------+------------------------------+--------+-----------+...
| w2763vpgwuy9cksihxbcd | python-d72rur-t3.medium | 44.220.82.26 (2Core4GB/Spot) | banana | Executing |...
| q6r4ome7wyywk33b5jo9g | testpy | 3.239.38.179 (2Core4GB/Spot) | bean | Executing |...
+-----------------------+-------------------------+------------------------------+--------+-----------+...
...[edited for clarity]
$ float squeue -f "imageID=*python* or imageID=*tidyverse*"
+-----------------------+-------------------------+------------------------------+--------+-----------+...
| ID | NAME | WORKING HOST | USER | STATUS |...
+-----------------------+-------------------------+------------------------------+--------+-----------+...
| w2763vpgwuy9cksihxbcd | python-d72rur-t3.medium | 44.220.82.26 (2Core4GB/Spot) | banana | Executing |...
| q6r4ome7wyywk33b5jo9g | testpy | 3.239.38.179 (2Core4GB/Spot) | bean | Executing |...
| bxulaob02n1d91cmymvxk | Rtest | 44.212.52.61 (2Core4GB/Spot) | bean | Executing |...
+-----------------------+-------------------------+------------------------------+--------+-----------+...
...[edited for clarity]
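The wildcard filters in these examples use "?" for a single character and "*" for multiple characters. To preview locally which values a pattern would select, Python's fnmatch module gives a reasonable approximation. The imageID strings below are hypothetical (the exact imageID format is not shown here), and fnmatch also accepts bracket expressions that OpCenter filters may not support.

```python
from fnmatch import fnmatchcase
from typing import List

# Hypothetical imageID values, for illustration only.
image_ids = [
    "docker.io/memverge/blast:latest",
    "docker.io/bitnami/python:latest",
    "docker.io/rocker/r-base:latest",
]


def match(pattern: str, values: List[str]) -> List[str]:
    """Return the values selected by a case-sensitive ?/* pattern."""
    return [value for value in values if fnmatchcase(value, pattern)]


print(match("*blast*", image_ids))         # pattern with * on both sides
print(match("*python:latest", image_ids))  # pattern anchored at the end
print(match("*r-bas?:latest", image_ids))  # ? stands for one character
```

Because the shell also expands "*" and "?", quote the filter on the command line, as the examples above do.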
float log
Use the float log command to view and manage log files. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| cat | Write log file contents to standard output | log_file | Log file whose contents are displayed |
| | | -i, --hid host_id | Host whose logs are displayed (default: OpCenter server) |
| | | -j, --job job_id | Job whose logs are displayed (default: OpCenter) |
| download | Download a zip file of selected logs associated with job | -i, --include logsinzip1, logsinzip2... | Logs included in zip file (default: "all") |
| | | -j, --job job_id | Job whose log files are included |
| | | --path /path/to/dir | Path to save zip file (default: "./") |
| list, ls | List all logs associated with target | -i, --hid host_id | Host whose log files are listed (default: OpCenter server) |
| | | -j, --job job_id | Job whose log files are listed (default: OpCenter) |
| | | -H, --readable | Display log size in human-readable format |
| rm | Remove all logs associated with target | -i, --hid host_id | Host whose log files are removed (default: OpCenter server) |
| | | -j, --job job_id | Job whose log files are removed (default: OpCenter) |
| tail | Write last n lines of log file to standard output | log_file | Log file to display |
| | | -f, --follow | Display new lines as they are appended to log file |
| | | -i, --hid host_id | Host whose log file lines are displayed (default: OpCenter server) |
| | | -j, --job job_id | Job whose log file lines are displayed (default: OpCenter) |
| | | -n, --num n | Number of lines to display (default: 100) |
Examples
$ float log tail --follow output -j XGiUDRto7kwofWBNPkiW5
Ready to prepare source data
Ready to download pbmc_1k_v3_fastqs from s3
Ready to download refdata-gex-GRCh38-2020-A from s3
Ready to run test
...[output edited]
$ float log ls -H
+---------------------+---------------+----------------------+
| LOG NAME | READABLE SIZE | LAST UPDATE TIME |
+---------------------+---------------+----------------------+
| opcenter.access_log | 7.40 MB | 2024-02-14T19:02:02Z |
| opcenter.log | 712.73 KB | 2024-02-14T18:55:28Z |
| upgrade.log | 1.92 KB | 2024-02-11T03:54:27Z |
| messages | 220.56 KB | 2024-02-14T18:48:48Z |
+---------------------+---------------+----------------------+
float login
Use the float login command, with valid username and password, to log in to OpCenter. The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Log in to OpCenter | --info | Display login status |
float logout
Use the float logout command to log the current user out of the OpCenter and invalidate the authorization token. The command has no subcommands. Use the command with the -h flag to list the options.
Example
$ float logout
Logout Succeeded!
$ float login --info
Error: cannot find any login session (code: 2004)
float migrate
Use the float migrate command to move a job from one VM instance to another VM instance of the same type or of a different type. The command has no subcommands. Use the command with the -h flag to display the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Migrate job to a new VM instance. If no options used, migrate to identical instance in same availability zone. | -c, --cpu min_cpu:max_cpu | Range of number of virtual CPUs to select from for new VM instance (can omit max_cpu) |
| | | -e, --env | Environment variables that apply when the migrated job resumes. Use format env_key=env_value |
| | | -f, --force | Automatically answer "yes" at confirmation prompt |
| | | --gpu-count min_gpu:max_gpu | Range of number of GPUs to select from for new VM instance |
| | | --gpu-mem min_gpumem:max_gpumem | Range of memory size (in GB) of GPUs to select from for new VM instance |
| | | --gpu-vendor gpu_vendor | GPU vendor name (for example, nvidia or amd) |
| | | -t, --instType instance_type | VM instance type to migrate to (e.g., c5.xlarge in AWS). Do not combine with --cpu or --mem options. |
| | | -j, --job job_id | Job to migrate |
| | | -m, --mem min_mem:max_mem | Range of memory size (in GB) to select from for new VM instance (can omit max_mem) |
| | | -P, --payType spot \| ondemand | Pricing tier for VM instance (Spot or On-demand) |
| | | --sync | Block all terminal input until job has migrated |
| | | --rerun | Ignore snapshot and restart job from the beginning on new VM instance (job must be running) |
| | | -z, --zone availability_zone | Availability zone in which to execute job |
Examples
$ float migrate -j lL07E84pQQpYqCQ88xeIQ
entity:
id: i-0cfbd3f1f82087dd5
type: host
name: 192.168.0.2
status: normal
instanceType: c5.xlarge
startTime: "2022-09-23T15:09:09Z"
downTime: ""
$ float migrate -f --sync --payType ondemand -j tqrGc4Z6g18nkphzTeaxM
tqrGc4Z6g18nkphzTeaxM is now migrating...
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
tqrGc4Z6g18nkphzTeaxM has been migrated to 44.212.92.162 (4Core16GB/OnDemand).
float modify
Use the float modify command to change a subset of attributes associated with a running job. The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Modify certain attributes associated with running job | --addCustomTag tagName:tagValue | Tag name and value to associate with running job. Separate multiple tags with "," or include option multiple times to add multiple tags. |
| | | --addSecurityGroup sec_group | Security group (or tag in GCP) added to VM instance for this job. Include option multiple times to add multiple security groups. |
| | | --errPolicy err_policy | The policy to use if the job fails |
| | | --force | Automatically answer "yes" at confirmation prompt |
| | | -j, --job job_id | Job to apply changes to |
| | | -M, --migratePolicy migrate_policy | New migrate policy to apply. See float submit for format. |
| | | --rmSecurityGroup sec_group | Security group (or tag in GCP) to remove from VM instance for this job. Include option multiple times to remove multiple security groups. |
| | | --snapshotInterval snapshot_interval | New periodic snapshot interval for this job. Use "disable" or "0" to turn off periodic snapshots. |
| | | -V, --vmPolicy vm_policy | New VM creation policy to apply. See float submit for format. |
Example
$ float modify --vmPolicy [SpotOnly=true] -j PF0bgCvlpdJkog0RCBZPg
Warning: Are you sure you want to modify PF0bgCvlpdJkog0RCBZPg?
New vmPolicy may trigger migration.(yes/No): yes
Successfully modified PF0bgCvlpdJkog0RCBZPg: --vmPolicy [SpotOnly=true]
$ float modify -j y78iefqeo8k6ng2rpvduj --addCustomTag run-name=task1
Successfully modified y78iefqeo8k6ng2rpvduj: --addCustomTag run-name=task1
$ float show -j y78iefqeo8k6ng2rpvduj
...
customTags:
run-name=task1: ""
...[edited]
$ float modify -j y78iefqeo8k6ng2rpvduj --addCustomTag project=RNA-sequencing,dept=Research
Successfully modified y78iefqeo8k6ng2rpvduj: --addCustomTag project=RNA-sequencing,dept=Research
$ float modify -j y78iefqeo8k6ng2rpvduj --addCustomTag PI=jones --addCustomTag funding=NHS
Successfully modified y78iefqeo8k6ng2rpvduj: --addCustomTag PI=jones --addCustomTag funding=NHS
$ float show -j y78iefqeo8k6ng2rpvduj
...
customTags:
PI=jones: ""
funding=NHS: ""
project=RNA-sequencing,dept=Research: ""
run-name=task1: ""
...[edited]
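The option table gives the tag format as tagName:tagValue, with "," separating multiple tags, whereas the examples above pass name=value strings, which float show reports as tag names with empty values. The Python sketch below joins several tags into the colon-separated form for a single --addCustomTag option. The helper name is hypothetical, and rejecting ":" and "," inside names or values is a conservative assumption, since escaping rules are not described here.

```python
from typing import Dict, List


def build_add_tags_argv(job_id: str, tags: Dict[str, str]) -> List[str]:
    """Build a float modify command that adds custom tags to a job."""
    if not tags:
        raise ValueError("at least one tag is required")
    for name, value in tags.items():
        # Assumption: separator characters cannot be escaped, so refuse them.
        if any(sep in name or sep in value for sep in (":", ",")):
            raise ValueError(f"tag {name!r} contains a separator character")
    joined = ",".join(f"{name}:{value}" for name, value in tags.items())
    return ["float", "modify", "-j", job_id, "--addCustomTag", joined]


print(build_add_tags_argv(
    "y78iefqeo8k6ng2rpvduj",
    {"project": "RNA-sequencing", "dept": "Research"},
))
```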
float promote (boost)
Use the float promote or float boost command (as an admin user) to move a job in the "Submitted" state to the front of the scheduling queue. The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Promote a job in the "Submitted" state to the front of the scheduling queue. Must be admin user. | -j, --job job_id | Job to promote |
float ps
Use the float ps command to show the complete process tree of a running job (the Linux command ps must be installed in the container). The command has no subcommands. Use the command with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| | Show the complete process tree of job | --args podman_args | Arguments passed to podman |
| | | -j, --jobId job_id | Job to query |
float quota-policy
Use the float quota-policy command to define and manage quota (SurfZone) policies. Use a subcommand with the -h flag to list the options.
| Subcommands | Usage | Option | Option Definition |
|---|---|---|---|
| add, create | Create quota policy | policy_name | Name to associate with the quota policy |
| | | --limit budget_limit | Maximum amount (in $) allowed to spend in one month |
| | | --action cancel \| suspend | Action taken when quota limit reached (default: cancel) |
| | | --auto-resume=true \| false | Action taken on suspended job when quota replenished (default: true) |
| | | --threshold threshold | Percentage of quota (budget) consumed to trigger alert (default: 80) |
| info, show | Display quota policy details | policy_id | ID that identifies quota policy |
| list | List available quota policies | | |
| update, modify | Update parameters in existing quota policy | policy_id | ID that identifies quota policy |
| | | --name policy_name | New name to associate with the quota policy |
| | | --limit budget_limit | Updated maximum amount (in $) allowed to spend in one month |
| | | --action cancel \| suspend | Updated action taken when quota limit reached |
| | | --autoresume true \| false | Updated action taken on suspended job when quota replenished |
| | | --threshold threshold | Updated percentage of quota (budget) consumed to trigger alert |
| delete, rm, remove | Delete a quota policy | policy_id | ID to identify quota policy |
Examples
$ float quota-policy list
- id: u2w02ntshv61mxc7muaz0
name: fruit
metric: Cost
overageAction: cancel
autoResume: false
threshold: 80%
limit: 37
- id: 2f4qaesir4wxz71mazstq
name: legume
metric: Cost
overageAction: suspend
autoResume: true
threshold: 80%
limit: 35
$ float quota-policy update 2f4qaesir4wxz71mazstq --limit 50
id: 2f4qaesir4wxz71mazstq
name: legume
metric: Cost
overageAction: suspend
autoResume: true
threshold: 80%
limit: 50
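As a worked example of how the threshold relates to the limit, the sketch below computes the spend at which an alert would be triggered for the updated policy. It assumes, as the option table states, that the threshold is a percentage of the monthly limit. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Values taken from the "legume" policy after the update shown above.
limit=50        # monthly budget in dollars (--limit)
threshold=80    # percentage of budget that triggers an alert (--threshold)

# Integer arithmetic: dollars spent at which the alert would be triggered.
alert_at=$(( limit * threshold / 100 ))
echo "Alert at \$${alert_at} of \$${limit}"
```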
float release
Use the float release
command to manage the OpCenter software. Use a subcommand with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
info
| Display information regarding new features and bug fixes in OpCenter release | -r, --release version | The release to display information about |
list , ls
| List available OpCenter releases | ||
sync
| Sync CLI version with OpCenter version | ||
upgrade
| Upgrade OpCenter software | --force | Automatically answer "yes" at confirmation prompt |
-r, --release version |
The release to upgrade to (default: "latest"). Only the admin user can upgrade software. Cannot upgrade using web CLI console.
| ||
--sync | Wait for upgrade to complete and then sync the CLI to the new release |
Examples
$ float release ls
+----------+--------------------------------------+----------------------+-----------+
| VERSION | RELEASE | RELEASE TIME | SIZE |
+----------+--------------------------------------+----------------------+-----------+
| * v2.5.0 | FLOAT_v2.5.0-171088e-HalfMoonBay.bin | 2024-02-09T03:31:15Z | 220.65 MB |
| v2.4.1 | FLOAT_v2.4.1-0803674-Goa.bin | 2024-01-09T18:29:33Z | 219.32 MB |
+----------+--------------------------------------+----------------------+-----------+
$ float release sync
downloading ...
Progress: |>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>| 100.00% Complete (ETA. 0s)
The float binary is synced up with opcenter
Note
If you place the float
binary in a directory that is only writable by the root user, use sudo float release sync.
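The note above can be turned into a small check. This is a sketch only: the helper name sync_command_for is hypothetical, the fallback path /usr/local/bin is an assumption, and the script prints the suggested command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sync command suited to the given directory.
sync_command_for() {
  local bin_dir="$1"
  if [ -w "$bin_dir" ]; then
    echo "float release sync"
  else
    echo "sudo float release sync"
  fi
}

# Locate the float binary if it is on PATH; fall back to /usr/local/bin (an assumption).
float_path="$(command -v float || true)"
bin_dir="$(dirname "${float_path:-/usr/local/bin/float}")"
sync_command_for "$bin_dir"
```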
float report
Use the float report
command to download reports from OpCenter. Use a subcommand with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
download
| Download OpCenter usage history report | --path /path/to/dir | Path to save report on local computer |
get
| Generate and display usage report with filter(s) applied | report_name | Name of report, e.g., usage_report_by_job |
-A, --all | Compile report from all usage data retained by this OpCenter. If -A not used, default filter applied (different for each report). | ||
-d, --date date_string | Date used to filter reports (default: current report) | ||
--filter filter | Filter(s) to apply to usage report. Use option multiple times to apply multiple filters. See Working with OpCenter/Filters/Job Filters for details. Example: timeRange=2010-10-22~ | ||
-l, --limit num_records | Maximum number of records to display (default: 0 which means unlimited) | ||
-o, --orderBy field | Field to order by. Use 'cost' or 'jobs' (default). The column entitled 'JOB COUNT' is used to order entries when 'jobs' specified. | ||
-r, --refresh | Force the refresh of report | ||
ls , list
| List all available usage reports |
Examples:
$ float report get usage_report_by_user -A -f status=Cancelled
+-----------+-----+-----------+------------+--------------+----------------+--------------------+...
| USER NAME | FEE | JOB COUNT | WALL TIME | COMPUTE TIME | SPOT INSTANCES | ONDEMAND INSTANCES |...
+-----------+-----+-----------+------------+--------------+----------------+--------------------+...
| admin | 0 | 69 | 617h57m42s | 615h42m7s | 47 | 11 |...
| apple | 0 | 1 | 6h33m59s | 6h53m31s | 8 | 0 |...
...
(edited)
$ float report download --path ./temp.gzip
Downloaded to ./temp.gzip
$ file temp.gzip
temp.gzip: gzip compressed data, original size modulo 2^32 14336
float rerun (requeue, resubmit)
Use the float rerun
(or requeue
or resubmit
) command to re-submit a completed job, a canceled job, or a job that failed to complete. The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Re-submit completed, canceled, or failed job | -j, --job job_id | Job to re-submit |
float restart (reboot)
Use the float restart
(or reboot
) command to restart OpCenter (terminates all login sessions). The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Restart OpCenter | -f, --force | Automatically answer "yes" at confirmation prompt |
Example
$ float list
+-----+---------------------+...+-----------+----------------------+----------+------------+
| ID | NAME |...| STATUS | SUBMIT TIME | DURATION | COST |
+-----+---------------------+...+-----------+----------------------+----------+------------+
| g...| tidyverse-t3.medium |...| Executing | 2023-12-11T23:22:17Z | 20m37s | 0.0066 USD |
+---------------------------+...+-----------+----------------------+----------+------------+
$ float restart
Warning: There are running jobs in server, do you want to restart forcibly?
Some job may fail if you do that!(yes/No): yes
Warning: Are you sure you want to restart OpCenter? All active sessions will be inactivated.(yes/No): yes
OpCenter is now restarting and will be online soon.
Pause briefly and then log back in
$ float list
+-----+---------------------+...+-----------+----------------------+----------+------------+
| ID | NAME |...| STATUS | SUBMIT TIME | DURATION | COST |
+-----+---------------------+...+-----------+----------------------+----------+------------+
| g...| tidyverse-t3.medium |...| Executing | 2023-12-11T23:22:17Z | 21m42s | 0.0072 USD |
+---------------------------+...+-----------+----------------------+----------+------------+
(edited for clarity)
float resume (recover, restore)
Use the float resume
(or recover
or restore
) command to resume a suspended job (see float suspend
command). The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Resume a suspended job. If no options are used, the job resumes on an instance identical to the original. | -c, --cpu min_cpu:max_cpu | Range of number of virtual CPUs to select from for new VM instance (can omit max_cpu). | |
-t, --instType instance_type |
Instance type for new VM (e.g., c4.xlarge in AWS). Do not combine with --cpu or --mem options.
| ||
-j, --job job_id | Job to resume | ||
-m, --mem min_mem:max_mem | Range of memory size (in GB) to select from for new VM instance (can omit max_mem). | ||
-P, --payType spot | ondemand | Pricing tier for VM instance (Spot or On-demand). |
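The table states that --instType must not be combined with --cpu or --mem. A minimal pre-flight check along those lines is sketched below. The function name check_resume_opts and the JOB_ID placeholder are hypothetical, and the check only recognizes options passed as separate words (not the --cpu=4 form).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for float resume arguments.
# Returns 0 if the options are compatible, 1 if --instType is mixed with --cpu/--mem.
check_resume_opts() {
  local has_type=0 has_size=0 arg
  for arg in "$@"; do
    case "$arg" in
      -t|--instType) has_type=1 ;;
      -c|--cpu|-m|--mem) has_size=1 ;;
    esac
  done
  if [ "$has_type" -eq 1 ] && [ "$has_size" -eq 1 ]; then
    echo "error: do not combine --instType with --cpu or --mem" >&2
    return 1
  fi
  return 0
}

check_resume_opts -j JOB_ID -t c4.xlarge && echo "ok: instance type only"
check_resume_opts -j JOB_ID -c 4:8 -m 16 && echo "ok: cpu and mem ranges"
check_resume_opts -j JOB_ID -t c4.xlarge -c 4 || echo "rejected: mixed options"
```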
float sbatch (submit)
The float sbatch
command is the same as float submit
.
float scancel (cancel)
The float scancel
command is the same as float cancel
.
float secret
Use the float secret
command to interact with the secret manager. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
get
| Retrieve secret from the secret manager | secret_name | Name associated with secret |
ls , list
| List all secrets created by current user | ||
set , put
| Insert {name, value} pair into secret manager database | secret_name secret_value | {name, value} pair to insert |
--file /path/to/file | File containing secret_value. Use instead of specifying secret_value on command line. | ||
-f, --force | Overwrite existing secret value | ||
unset , rm , del , remove , delete
| Delete {name, value} pair from secret manager database | secret_name | Name associated with {secret_name, secret_value} pair to delete |
Example
$ float secret set s3access 1234567
Set s3access successfully
$ float secret ls
+----------+
| NAME |
+----------+
| s3access |
+----------+
$ float secret unset s3access
unset secret s3access successfully
$ cat value.txt
secret123
$ float secret set usersecret --file ./value.txt
Set usersecret successfully
float show
Use the float show
command to display the status of a job and content of job scripts. The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Show current status of job or VM instance, or show contents of scripts used for job | --containerInitScript | Display contents of container init script | |
-c, --content | Display contents of job script | ||
-i, --hid host_id | VM instance to query | ||
--hostInitScript | Display contents of host init script | ||
--hostTerminateScript | Display contents of host terminate script | ||
-j, --job job_id | Job to query |
Example
$ float show -j ctZLDo7OFG4BuJ8ytiTem
id: ctZLDo7OFG4BuJ8ytiTem
name: python-c5d.large
workingHost: 52.7.123.178 (2Core4GB/Spot)
user: admin
imageID: docker.io/bitnami/python:latest
imageDigest: sha256:24c1d45bf41c396184bd9808b307c67267a809754cd176ac8d91cceb47d0f3ef
output: |-
Getting image source signatures
Copying blob sha256:1acb894a7ceb1ba5362fb85123b0248a064ed3195aaa24af74e8cb710ca1c5a4
Copying config sha256:23bacce690702dac91557ef74ab312cb3db5a2b4bb54ada968d1352e7d9a110a
Writing manifest to image destination
Storing signatures
Loaded image: docker.io/bitnami/python:latest
First submit job ctZLDo7OFG4BuJ8ytiTem, call podman directly
No cmd args provided, launch job directly
4eda6b0d471fb31eecfd06b59e1d8c527310861009fa4baf967d834397934bac
status: Executing
.....[output edited]
$ float show -c -j S0JPnWd3a2hQmVmVWCMWc
#!/usr/bin/bash
LOG_PATH=$1
LOG_FILE=$LOG_PATH/output
touch $LOG_FILE
exec >$LOG_FILE 2>&1
echo "Congratulations! You have submitted your first job"
for(( c=1; c<3; c++))
do
if [[ $(($c % 3)) == 1 ]]; then
echo "Hello World!"
else
echo "Your next job will be more interesting" >&2
fi
sleep 20s
done
echo "Job complete"
float sinfo (hosts)
The float sinfo
command is the same as float hosts
.
float snapshot
Use the float snapshot
command to display information about snapshots. Use a subcommand with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
list
| List available snapshots | -A, --all | Show all snapshots associated with job |
-f, --filter filter | Filter to apply to snapshots, for example: status=normal. The default filter is active=true. Use option multiple times to apply multiple filters combined with "and" operator. | ||
-j, --jobID job_id | Job to query | ||
show , info
| Display information about snapshot | -s, --snapID snapshot_id | ID of snapshot to query |
Example:
$ float snapshot show -s jobSnap-Kp89d7qzJjiuhNeaOUfsA
id: jobSnap-Kp89d7qzJjiuhNeaOUfsA
jobID: h3xg4jpC8ydcgkVWBKj4S
volumeSnapshots:
- zone: us-east-1b
volumeId: vol-02786458cebdeca43
volumeSize: 6
status: completed
snapshotId: snap-08e5b6ed8b7834b96
createTime: "2023-04-18T15:51:17Z"
mountPoint: /mnt/float-data
cost: 0.0000 USD
- zone: us-east-1b
volumeId: vol-06dd42bf6b091466a
volumeSize: 6
status: completed
snapshotId: snap-0f2833cdb639fa4b5
createTime: "2023-04-18T15:51:17Z"
mountPoint: /mnt/float-image
cost: 0.0000 USD
- zone: us-east-1b
volumeId: vol-023c2e91e34f37335
volumeSize: 10
status: completed
snapshotId: snap-025a733db664e9054
createTime: "2023-04-18T15:51:17Z"
mountPoint: /data
cost: 0.0000 USD
status: normal
createTime: 2023-04-18T15:51:17.434161661Z
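Because the output is plain YAML, standard text tools can summarize it. The sketch below sums the volumeSize fields with awk. The function name total_volume_size is hypothetical, and the sample input reproduces only the volumeSize lines from the example above.

```shell
#!/usr/bin/env bash
# Sum the volumeSize fields from `float snapshot show` output read on stdin.
total_volume_size() {
  awk '$1 == "volumeSize:" { sum += $2 } END { print sum + 0 }'
}

# Sample input: the volumeSize lines from the example above.
total_volume_size <<'EOF'
volumeSize: 6
volumeSize: 6
volumeSize: 10
EOF
```

For the sample input this prints 22. With a live OpCenter, the same function could read the output of float snapshot show -s snapshot_id through a pipe, assuming the default yaml output format.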
float squeue (list)
The float squeue
command is the same as float list
.
float status
Use the float status
command to show current status of the OpCenter. The command has no subcommands. Use the command with the -h
flag to list the options.
Example
$ float status
id: ac300c37-b0da-4c7b-8a50-d3fc49f6ad2a
Server Status: normal
API Request Status: normal
License Status: valid
Init Time: "2024-02-09T15:36:39Z"
Up Time: "2024-02-11T03:54:28Z"
Current Time: "2024-02-15T15:57:02Z"
float storage
Use the float storage
command to "register" (that is, pre-configure) storage services for use when submitting jobs. Use a subcommand with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
delete , rm , remove | Delete a registered storage service | storageID | Identifier associated with the storage service to delete |
info , show | Display information about a registered storage service | storageID | Identifier associated with the storage service to query |
list | Display filtered list of registered storage services | -f, --filter filter |
Filter to apply to registered storage services. Use option multiple times to apply multiple filters combined with "and" operator. Simple filters consist of an attribute, operator, and value. Supported attributes include:
|
-o, --orderBy attribute | Attribute used to order listing (prepend with "-" to reverse order). Use any filter attribute (default: createdTime). Create nested ordering by connecting multiple attributes with ",". | ||
register volume , add volume | Register an existing volume as a storage service | -n, --name storage_name | Name to associate with this storage service |
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
--id volume_id | Volume ID that identifies volume (for example, an identifier like vol-0da677a1f4967350a in AWS) | ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
register nfs , add nfs | Register NFS-exported directory as a storage service | -n, --name storage_name | Name to associate with this storage service |
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--url nfs://nfs_server_ip/exported_dir | IP address of NFS server and exported directory, for example 192.168.1.1/home | ||
register lustre , add lustre | Register Lustre file system as a storage service | -n, --name storage_name | Name to associate with this storage service |
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--options string | Mount options for lustre file system, for example, [opts=rw, noexec, abort_recov, recovery_time_hard=120s] | ||
--url lustre://dns-name/mountname/exported_dir | DNS name of lustre server, file system's mount name, and exported directory, respectively. The mount name is the name you assign the file system when configuring FSx for Lustre on AWS. Naming the file system makes it easier to find. | ||
register s3 , add s3 | Register S3 bucket as a storage service | -n, --name storage_name | Name to associate with this storage service |
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--bucket s3://bucket_name/folder | S3 bucket name and folder (optional) to use for storage service | ||
-c, --credential [accessKey=key_value, secret=secret_value,] | Access key string and access secret value, respectively | ||
--endpoint s3_endpoint | Endpoint for S3 bucket, for example, s3.us-east-1.amazonaws.com | ||
register gs , add gs | Register Google Cloud Storage as a storage service | -n, --name storage_name | Name to associate with this storage service |
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--bucket gs://bucket_name/folder | Google Cloud Storage bucket name and folder (optional) to use for storage service | ||
-c, --credential [accessKey=key_value, secret=secret_value,] | Access key string and access secret value, respectively | ||
--endpoint gs_endpoint | Endpoint for GCS bucket, for example, storage.googleapis.com | ||
register | Register storage service using dataVolume format (instead of register volume|nfs|lustre|s3|gs ). | --dataVolume [string] | Parameters that define the storage service. Use the following format.
|
update volume | Update an existing volume as a storage service | storageID | Identifier associated with this storage service |
-n, --name storage_name | Name to associate with this storage service | ||
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
--id volume_id | Volume ID that identifies volume (for example, an identifier like vol-0da677a1f4967350a in AWS) | ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
update nfs | Update NFS-exported directory as a storage service | storageID | Identifier associated with this storage service |
-n, --name storage_name | Name to associate with this storage service | ||
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--url nfs://nfs_server_ip/exported_dir | IP address of NFS server and exported directory, for example 192.168.1.1/home | ||
update lustre | Update Lustre file system as a storage service | storageID | Identifier associated with this storage service |
-n, --name storage_name | Name to associate with this storage service | ||
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--options string | Mount options for lustre file system, for example, [opts=rw, noexec, abort_recov, recovery_time_hard=120s] | ||
--url lustre://dns-name/mountname/exported_dir | DNS name of lustre server, file system's mount name, and exported directory, respectively. The mount name is the name you assign the file system when configuring FSx for Lustre on AWS. Naming the file system makes it easier to find. | ||
update s3 | Update S3 bucket as a storage service | storageID | Identifier associated with this storage service |
-n, --name storage_name | Name to associate with this storage service | ||
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--bucket s3://bucket_name/folder | S3 bucket name and folder (optional) to use for storage service | ||
-c, --credential [accessKey=key_value, secret=secret_value,] | Access key string and access secret value, respectively | ||
--endpoint s3_endpoint | Endpoint for S3 bucket, for example, s3.us-east-1.amazonaws.com | ||
update gs | Update Google Cloud Storage as a storage service | storageID | Identifier associated with this storage service |
-n, --name storage_name | Name to associate with this storage service | ||
--permision normal | public | Storage service access permission
| ||
--mode ro | rw | Access mode
| ||
-m, --mountPoint path | Path to directory where volume is mounted by container | ||
--bucket gs://bucket_name/folder | Google Cloud Storage bucket name and folder (optional) to use for storage service | ||
-c, --credential [accessKey=key_value, secret=secret_value,] | Access key string and access secret value, respectively | ||
--endpoint gs_endpoint | Endpoint for GCS bucket, for example, storage.googleapis.com |
float submit
Use the float submit
(or float sbatch
) command to submit a job. The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Submit job for execution | --allowList instance_type |
Allowed instance type(s). Use format like "c5*" which allows any VM type in the c5 family. Include --allowList multiple times to add other VM instance types to the allow list. Default is [ ] which allows all types.
| |
--bandwidth bandwidth | Bandwidth (in Mbps) required for workload (only use in AliCloud) | ||
-A, --cmdArgs command_string | Commands that are executed immediately after container starts (can be used with or without a job script) | ||
-c, --cpu min_cpu:max_cpu | Range of number of virtual CPUs to select from to run job (can omit max_cpu) | ||
--cpuVendor cpu_vendor_name | CPU vendor of allowed VM instances. Allowed values are amd, intel, or " ". Default is " " which allows any vendor. Use with AWS only. | ||
--customTag tagName:tagValue |
Tag customized by each MMCloud subscriber, for example, to generate customized usage reports. Multiple custom tags can be applied by using --customTag multiple times in a single command.
| ||
-D, --dataVolume [size=vol_size,throughput=rate]:/data_dir or vol_id:/container_mnt_pt/path/to/dir or nfs://ip_address/server_export_dir: /container_mnt_pt/path/to/dir or [accesskey=x,secret=y,mode=rw]s3://bucketname/path:/container_mnt_pt/path/to/dir or [opts=xxx,yy]lustre://dnsname/mountname/path:/container_mnt_pt/path/to/dir or [accesskey=xxx,secret=yyy]jfss3://bucketname:/container_mnt_pt/path/to/dir |
Data volume mounted by container. Use option multiple times to attach multiple data volumes.
| ||
-d, --def definition_file |
Path to file if using definition file (yaml or json format) to provide input parameters to submit command
| ||
--denyList instance_type | Excluded instance type(s). Use format like "c5*" which excludes all VM types in the c5 family. Include --denyList multiple times to exclude other VM instance types. Default is [ ] which does not exclude any types. | ||
--dumpMode full | incremental | Setting for snapshot type. Full means complete memory snapshot taken every time. Incremental means only incremental changes captured after initial snapshot. Default is full. | ||
-e, --env env_key=env_value |
Environment variable setting for the job. Include --env multiple times to add multiple variables.
| ||
-E, --errPolicy err_policy |
Policy (AWS only) to use if the job fails. The choices are:
| ||
--extraContainerOpts | Extra options passed directly to container (enclose in quotes) | ||
-f, --force | Automatically answer "yes" at confirmation prompt | ||
--gateway gateway_id | ID of gateway to connect job to. Replacing gateway_id with the word "auto" directs the OpCenter to select a gateway automatically. | ||
--gpu-count min_gpu : max_gpu | Range of number of GPUs to select from to run job | ||
--gpu-disable | Disallow the use of GPUs for this job | ||
--gpu-mem min_gpu_mem : max_gpu_mem | Range of size of GPU memory to select from to run job | ||
--gpu-name GPU_NAME | Specific GPU to use for this job, for example, m60 or h100 or a100, and so on. | ||
--gpu-vendor GPU_VENDOR | Allowed GPU vendor to use for this job, for example, nvidia or amd. | ||
-i, --image image_name | image_URI | Image name or image URI to pull container image for job | ||
--imageVolSize image_vol_size | Size of volume (in GB) to act as root volume for image (default: 6) | ||
--imageVolType image_vol_type | (AWS only) Type of EBS volume to use for image volume, for example, gp2 or gp3. | ||
-t, --instType instance_type |
VM instance type, for example, c5.4xlarge for AWS. Overrides --cpu or --mem options.
| ||
-j, --job job_script | Job script to run workload. Use format: path to local file, s3 file, OSS file or https | http file. | ||
-k, --keepDump | Save dump log during job migration | ||
--mem min_mem:max_mem | Range of memory size (in GB) to select from to run job (can omit max_mem) | ||
--metricsInterval metrics_int | Time in seconds between queries to obtain container metrics (default: 10s) | ||
-M, --migratePolicy [migrate_policy] |
Policy to determine auto-migration behavior. Format is [option1=value1,option2=value2...] . The available options with their default values are (if no units are attached, the value is a percentage):
| ||
--miTag [tag1:value1 tag2:value2...] | Tag(s) to select virtual machine image. Use option multiple times to submit multiple tag,value pairs. | ||
-n, --name job_name | Name to associate with job | ||
--noPublicIP |
No public IP address assigned to container host. Ensure that host can reach AWS services. AliCloud users must include --endpoint option in their job scripts.
| ||
-o, --output /path/to/dir |
Folder to save stdout and stderr as files with names: stdout.autosave.$jobid and stderr.autosave.$jobid. Options for path are:
| ||
--outputFlag | If included (set to true), job status is included in job output folder. | ||
-P, --publish host_port:container_port |
Rule for publishing container port to container host port, for example, 8080:80 . Include option multiple times to publish multiple ports.
| ||
--rootVolSize root_vol_size | Root volume size in GB to load base OS (default: 40) | ||
--securityGroup sec_group | AWS security group (or tag in GCP) added to VM instance for this job. Use option multiple times to apply multiple security groups. | ||
--shmSize shm_size |
Size of /dev/shm in format nu where n is a number and u is b, k, m, or g for bytes, KiB, MiB or GiB, respectively (default: 64m)
| ||
--snapLocation local | s3://bucketname |
Location to save snapshot image and metadata. Choices are:
| ||
--storage id | name:/container_mnt_pt/path/to/dir | Registered storage service included with job. Specify storage service by name or ID. Specify path to where storage service is mounted on container. | ||
-I, --snapshotInterval snap_int | Time between periodic snapshots in format XhYmZs, where X, Y, and Z are numbers and h, m, and s denote hours, minutes, and seconds, respectively. Default is 0 (disable). Minimum is 10m. | ||
-s, --subnet subnet_ID | AWS subnet ID in which to execute the job. Default is that the OpCenter automatically selects the subnet. | ||
--tag tag_name | Tag to select image version for job. Default is "latest" or only tag. | ||
--targetPort port_num | Port to connect job to on gateway. Use option multiple times to connect multiple ports. | ||
-T, --template template_name |
Template in format name:tag to use for submitting this job. Default tag is only tag or "latest".
| ||
-l, --timeLimit max_time | Maximum time that a job is allowed to run. Use format XhYmZs, where X, Y, and Z are numbers and h, m, and s denote hours, minutes, and seconds, respectively. Default is unlimited (use 0). | ||
-V, --vmPolicy [vm_policy] |
VM creation policy. Format is [key1=value1,key2=value2...] . The allowed keys (and values) are:
| ||
--withRoot | Run job with root privileges | ||
-z, --zone availability_zone | Availability zone in which to execute job |
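To illustrate the --shmSize format described in the table (a number followed by b, k, m, or g), the sketch below converts a value to bytes. The helper name shm_to_bytes is hypothetical and is not part of the float CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a --shmSize value such as 64m to bytes.
shm_to_bytes() {
  local value="$1"
  local number="${value%[bkmg]}"   # strip the trailing unit letter
  local unit="${value: -1}"        # last character is the unit
  case "$unit" in
    b) echo "$number" ;;
    k) echo $(( number * 1024 )) ;;
    m) echo $(( number * 1024 * 1024 )) ;;
    g) echo $(( number * 1024 * 1024 * 1024 )) ;;
    *) echo "unknown unit in '$value'" >&2; return 1 ;;
  esac
}

shm_to_bytes 64m   # default value from the table: 67108864
shm_to_bytes 2g    # 2147483648
```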
Examples
$ float submit -i tidyverse -j run_genericr.sh --cpu 2 --mem 4 --dataVolume [size=10]:/data
id: pw6nupnmbej0qedlvz3dx
name: tidyverse
user: admin
imageID: docker.io/rocker/tidyverse:latest
status: Submitted
submitTime: "2023-12-13T19:08:27Z"
duration: 0s
queueTime: 0s
cost: 0.0000 USD
inputArgs: -i tidyverse -c 2 -m 4 --dataVolume [size=10]:/data
cpu: 2
memGB: 4
vmPolicy:
policy: spotFirst
retryLimit: 3
optimize: true
retryInterval: 10m0s
migratePolicy:
evadeOOM: true
stepAuto: true
cpu:
upperBoundRatio: 90
lowerBoundRatio: 5
upperBoundDuration: 2m0s
lowerBoundDuration: 5m0s
step: 50
disable: true
mem:
upperBoundRatio: 90
lowerBoundRatio: 5
upperBoundDuration: 2m0s
lowerBoundDuration: 5m0s
step: 50
disable: true
actualEmission: 0.0000 g
baselineEmission: 0.0000 g
$ float submit -i docker.io/centos -j helloworld.sh -c 2 -m 4 --dataVolume [size=10]:/data --name hw -o nfs://172.31.81.17/mnt/memverge/shared
Log in to NFS server and check that output files are populated.
$ ls -l /mnt/memverge/shared/*
-rw-r--r-- 1 5001 5001 0 Feb 15 20:02 /mnt/memverge/shared/hw.08ogctot3ufqnyf9k3wg8.stderr.autosave
-rw-r--r-- 1 5001 5001 116 Feb 15 20:02 /mnt/memverge/shared/hw.08ogctot3ufqnyf9k3wg8.stdout.autosave
float suspend (hibernate)
Use the float suspend
(or hibernate
) command to temporarily suspend a job. In-memory state and relevant files are saved so that the job can resume executing at a later time. The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Suspend a running job | --cold | With this option, storage resources are reclaimed, which may save cost, but it slows the suspend operation (and the associated resume operation). Use if suspending for extended period. | |
-f, --force | Automatically answer "yes" to confirmation prompts | ||
-j, --job job_id | ID of job to suspend |
float template
Use the float template
command to use a template or to display information about templates. Use a subcommand with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
delete , rm , remove
| Delete a template | -f | Force deletion of template |
-T, --template name:tag | Template name and tag. If tag omitted, default is "latest." | ||
deploy
| Submit a job using a template | -c, --cpu min_cpu:max_cpu | Range of number of virtual CPUs to select from to run job (can omit max_cpu, in which case max_cpu is set to twice the value of min_cpu). |
--gateway gw_id | ID of gateway to connect job to (if applicable) | ||
-t, --instType instance_type | VM instance type (e.g., c5.xlarge for AWS). If used, this option overrides -c and -m options. | ||
-m, --mem min_mem:max_mem | Range of memory size (in GB) to select from to run job (can omit max_mem , in which case max_mem is set to twice the value of min_mem). | ||
--noPublicIP | No public IP address assigned to container host. Make sure host is reachable from within the VPC. | ||
--securityGroup sec_group | AWS security group (or tag in GCP) applied to VM instance for this job. Include option multiple times to apply multiple security groups (or tags). | ||
--targetPort port_number |
Port used to connect to job, for example, 8787 for RStudio. Include --targetPort multiple times to connect multiple ports from a single server to the gateway
| ||
-T, --template name:tag |
Template (in format name:tag ) to use for submitting this job. Default tag is only tag or "latest".
| ||
info , show
| Display information about a template | -T, --template name:tag |
Template (in format name:tag ) to query. Default tag is only tag or "latest".
|
list , ls
| Display available templates. Templates labeled "official" are from the MemVerge template repository. | ||
save
| Save submit string from an earlier job as a template | -f, --force | Force template save |
-j, --job job_id | ID of job whose submit string is saved as template | ||
--overwrite | Overwrite existing template | ||
-T, --template name:tag | Name and tag to save template as | ||
sync
| Sync OpCenter template library with the MemVerge template repository |
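The deploy subcommand's -c and -m options accept either min or min:max, with the maximum defaulting to twice the minimum when omitted. The sketch below shows the effective range under that rule. The helper name effective_range is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the effective min:max range for -c or -m in float template deploy.
effective_range() {
  local spec="$1"
  local min="${spec%%:*}"
  local max
  if [[ "$spec" == *:* ]]; then
    max="${spec#*:}"
  else
    max=$(( min * 2 ))   # documented default: twice the minimum
  fi
  echo "${min}:${max}"
}

effective_range 4      # prints 4:8
effective_range 4:6    # prints 4:6
```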
float top
Use the float top
command to show a sorted list of information (including CPU and memory utilization) about current jobs. The display is updated at regular intervals. Enter q
to stop the display. The command has no subcommands. Use the command with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
Show utilization and other information about running jobs. Display updates continually. Enter q to exit.
| --interval top_int | Time in seconds (3 or greater) between queries to obtain container metrics (default: 3s). Format is *n*s where *n* is integer. | |
-j, --job_Id job_id | Job to query |
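As an illustration of the two options above, the following sketch wraps float top in a helper that checks the interval before starting the display. The function name top_job is hypothetical and the job ID is a placeholder; the check only mirrors the documented rule that the interval is an integer number of seconds, 3 or greater.

```shell
# Sketch: monitor one job with a custom refresh interval.
# Arguments: $1 = job ID (placeholder), $2 = interval in whole seconds (default 3).
top_job() {
  interval="${2:-3}"
  # Reject anything that is not a plain non-negative integer.
  case "$interval" in
    ''|*[!0-9]*)
      echo "interval must be an integer number of seconds" >&2
      return 1
      ;;
  esac
  # The reference states the interval must be 3 seconds or greater.
  if [ "$interval" -lt 3 ]; then
    echo "interval must be 3 or greater" >&2
    return 1
  fi
  # The option expects the form <n>s, for example 5s.
  float top --interval "${interval}s" -j "$1"
}
```

For example, top_job job_id 5 refreshes the display for that job every 5 seconds. Enter q to exit, as with any float top display.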
float user
Use the float user
command to manage OpCenter users. Use a subcommand with the -h
flag to list the options.
Subcommands | Usage | Option | Option Definition |
add , create
| Add a new user | new_username | Username for new user |
--create | Create group if the group does not exist (default: false) | ||
--email email_address | Email address for new user | ||
--gid gid | Group ID to use for group. If no value is provided, a gid is assigned automatically. | ||
-g, --groups group_name | Group associated with new user (default: no group). Use option multiple times to include multiple groups. | ||
--ldap | Associate username with LDAP directory (default: false) | ||
--passwd password | Password for new user (default: "memverge") | ||
--quota-policy policy_name | Name of quota (SurfZone) policy to associate with this user | ||
--uid uint32 | User Identifier (uid) for new user. For regular user accounts, uid normally starts at 1000. | ||
delete , remove , rm
| Delete a user | user_name | user_id | Username or uid of user to delete |
-f, --force | Remove all active tokens belonging to user and delete user immediately | ||
disable
| Disable a user's account without deleting it | user_name | user_id | Username or id of user to disable |
enable
| Enable a user's account | user_name | user_id | Username or id of user to enable |
info , show
| Display information about user | user_name | user_id | Username or id of user to query |
list , ls
| List groups that current user belongs to | ||
passwd
| Reset user's password | user_name | user_id | Username or id of user to update |
--passwd new_password | New password for user | ||
update
| Update information associated with a user | user_name | user_id | Username or id of user to update |
--email email_address | New email address for user | ||
--gid gid | New group ID to use for group | ||
-g, --group group_name | New group associated with user | ||
--name user_name | New username for user | ||
--passwd password | New password for user | ||
--quota-policy policy_name | New quota (SurfZone) policy to associate with this user | ||
--uid uint32 | New uid for user |
Examples
$ float user add testcase2 --passwd secret123 --groups crops
username: testcase2
uid: 5007
gid: 5007
role: normal
group: crops
email: ""
type: builtin
enabled: true
ownGroup: ""
$ float group update crops --add testcase2 --admin
$ float user info testcase2
username: testcase2
uid: 5007
gid: 5007
role: normal
group: crops
email: ""
type: builtin
enabled: true
ownGroup: crops
$ float user update testcase2 --group default
username: testcase2
uid: 5007
gid: 5007
role: normal
group: default,crops
email: ""
type: builtin
enabled: true
ownGroup: crops
$ float user info user-five
username: user-five
uid: 2012
gid: 2010
role: normal
group: group-five
email: ""
type: ldap
enabled: true
ownGroup: ""
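The examples above cover add, info, and update. The disable, enable, and passwd subcommands can be sketched in the same way. The helper names lock_user and unlock_user are hypothetical, and the username and password arguments are placeholders; only the subcommands and the --passwd option listed in the table are used.

```shell
# Sketch: temporarily lock an account, then restore access with a new password.

# $1 = username or id of the account to disable (the account is not deleted).
lock_user() {
  float user disable "$1"
}

# $1 = username or id, $2 = new password to set after re-enabling.
unlock_user() {
  # Reset the password only if the account was re-enabled successfully.
  float user enable "$1" &&
    float user passwd "$1" --passwd "$2"
}
```

For example, lock_user testcase2 followed later by unlock_user testcase2 new_password. Passing a password as an argument exposes it in the shell history, so this form suits test accounts better than production ones.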
float version
Use the float version
command to display the version of the float
CLI client and the version of the OpCenter it is connected to. The command has no subcommands. Use the command with the -h
flag to list the options.
Example