Support and Troubleshooting

Support

Support for Memory Machine CE software is available 24 hours a day, 7 days a week, 365 days a year. To contact MemVerge support, send an email to support@memverge.com.

Logs

OpCenter compiles logs of events related to its operation as well as logs that are specific to a particular job or a particular host (worker node).

To see the logs that pertain to the operation of the OpCenter, enter the following:
float log ls
+---------------------+---------+----------------------+
|      LOG NAME       |  SIZE   |   LAST UPDATE TIME   |
+---------------------+---------+----------------------+
| etcd.log            |  239504 | 2023-02-02T01:46:04Z |
| opcenter.access_log | 2811941 | 2023-02-02T02:31:43Z |
| opcenter.log        |  756620 | 2023-02-02T02:30:25Z |
| upgrade.log         |    1974 | 2023-01-26T15:40:39Z |
+---------------------+---------+----------------------+

The most useful log for troubleshooting is opcenter.log.
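
When an OpCenter holds many logs, it can help to see which ones changed most recently. The following is a minimal sketch and not part of the float CLI: it assumes the table layout shown above and reads the output of float log ls from standard input. The function name logs_by_update_time is hypothetical.

```shell
# Hypothetical helper (not a float subcommand).
# Reads the ASCII table printed by `float log ls` from stdin and prints
# "<last update time> <size> <log name>", newest first.
logs_by_update_time() {
  awk -F'|' '
    # Data rows contain "|" separators; border rows ("+---+") do not.
    NF >= 4 && $2 !~ /LOG NAME/ {
      name = $2; size = $3; time = $4
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
      gsub(/[ \t]/, "", size)
      gsub(/[ \t]/, "", time)
      print time, size, name
    }' | sort -r   # ISO 8601 timestamps sort correctly as plain text
}
```

A possible use is float log ls | logs_by_update_time, which prints one line per log with the most recently updated log first.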

To see the logs that pertain to a particular job, enter the following:
float log ls -j <job_id>
Example:
float log ls -j TQ9PUJhoY0XLtL58KcU1M 
+--------------------------+-------+----------------------+
|         LOG NAME         | SIZE  |   LAST UPDATE TIME   |
+--------------------------+-------+----------------------+
| environments             |   330 | 2023-02-02T02:21:04Z |
| job.events               |  3009 | 2023-02-02T02:27:02Z |
| metrics-a1e1970a868a.txt | 12232 | 2023-02-02T02:46:35Z |
| output                   | 25963 | 2023-02-02T02:46:40Z |
| stderr.autosave          |     0 | 2023-02-02T02:21:27Z |
| stdout.autosave          |     0 | 2023-02-02T02:21:27Z |
+--------------------------+-------+----------------------+

The job script used to submit this job redirects stderr and stdout to a file called output (that is why stderr.autosave and stdout.autosave have zero size). The log called job.events is useful for troubleshooting.

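Because redirected streams leave zero-size logs behind, it can be useful to list only the logs that actually contain data. The sketch below assumes the table layout shown above and reads float log ls output from standard input. The function name nonempty_logs is hypothetical and not part of the float CLI.

```shell
# Hypothetical helper (not a float subcommand).
# Reads a `float log ls` table from stdin and prints the names of logs
# whose SIZE column is greater than zero.
nonempty_logs() {
  awk -F'|' '
    # Skip border rows (no "|" fields) and the header row.
    NF >= 4 && $2 !~ /LOG NAME/ {
      name = $2
      gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim padding around the name
      if ($3 + 0 > 0) print name          # numeric comparison on SIZE
    }'
}
```

A possible use is float log ls -j <job_id> | nonempty_logs. For the example job above, this would omit stderr.autosave and stdout.autosave and list the remaining four logs.
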
To see the logs that pertain to a particular host, enter the following:
float log ls -i <host_id>
Example:
float log ls -i i-0e40db7d105cb6793 
+-------------------+--------+----------------------+
|     LOG NAME      |  SIZE  |   LAST UPDATE TIME   |
+-------------------+--------+----------------------+
| fagent.access_log |   8494 | 2023-02-02T17:10:10Z |
| fagent.log        |  11735 | 2023-02-02T17:08:00Z |
| fagent_init.log   |    639 | 2023-02-02T17:06:34Z |
| internal_output   |   1116 | 2023-02-02T17:07:59Z |
| messages          | 154889 | 2023-02-02T17:10:01Z |
+-------------------+--------+----------------------+

The log called messages is useful for troubleshooting. The log called fagent.log records interactions between the worker node and the OpCenter.

Troubleshooting

The following table shows commonly encountered errors and how to fix them.

Table 1. Troubleshooting OpCenter Issues
Error: float login returns 'Get "https://127.0.0.1/api/v1/login": dial tcp 127.0.0.1:443: connect: connection refused'
Cause: Incorrect IP address for OpCenter or the OpCenter IP address has aged out of the local float cache.
Remedy: Check OpCenter IP address and try again with float login -a <ip_address>

Error: float command returns "Error: Session timeout (code: 2001)"
Cause: Current OpCenter session timed out
Remedy: Log in to OpCenter

Error: float image add or float submit returns "Error: Authentication failed, incorrect username or password (code: 2003)"
Cause: Attempt to access private repository with incorrect or missing credentials
Remedy: Rerun command with valid credentials to access repository

Error: float submit returns "Error: Unsupported argument, No instance types meet combined --cpu and --mem constraints. (code: 1027)"
Cause: No VM instance found that meets all requirements including price limit for Spot Instance
Remedy: Resubmit with higher price limit if it is Spot Instance. Else, resubmit with different memory and vCPU ranges.

Error: float squeue shows status as "WaitingForLicense"
Cause: OpCenter cannot retrieve valid license
Remedy: Check license status on MemVerge License Server. Obtain new license or activate existing license.

Error: float log cat <logfile> returns "Error: Invalid argument, No such log"
Cause: Delay in writing to log file after the container starts running
Remedy: Wait and then retry command

Error: float ps returns "Error: Job is not executing"
Cause: Job is either initializing or it has completed
Remedy: Use float squeue to determine status of job

Error: float squeue returns "No jobs" although jobs have been submitted and completed
Cause: float squeue queries job history in a fixed time interval (default: last hour)
Remedy: Use float squeue -A to include all jobs
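
The "wait and then retry" remedy for float log cat can be scripted. The following is a minimal sketch of a generic retry wrapper. The function name retry is hypothetical, and the sketch assumes that the float command exits with a nonzero status when it reports an error, which the text above does not confirm.

```shell
# Hypothetical helper: retry a command until it succeeds or the attempt
# limit is reached.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while :; do
    # Run the command; stop as soon as it succeeds.
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "retry: giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}
```

A possible use is retry 5 10 float log cat <logfile>, which tries the command up to five times with a 10-second pause between attempts. A fixed delay keeps the sketch short; a longer or increasing delay may suit jobs whose containers start slowly.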