Support and Troubleshooting

Support

Support for MMCloud software is available 24 x 7 x 365. Contact MemVerge support by sending email to support@memverge.com.

Viewing Logs using the CLI

OpCenter compiles logs of events related to its operation as well as logs that are specific to a particular job or a particular host (worker node).

To see the logs that pertain to the operation of the OpCenter, enter the following:
float log ls
+---------------------+---------+----------------------+
|      LOG NAME       |  SIZE   |   LAST UPDATE TIME   |
+---------------------+---------+----------------------+
| etcd.log            |  239504 | 2023-02-02T01:46:04Z |
| opcenter.access_log | 2811941 | 2023-02-02T02:31:43Z |
| opcenter.log        |  756620 | 2023-02-02T02:30:25Z |
| upgrade.log         |    1974 | 2023-01-26T15:40:39Z |
+---------------------+---------+----------------------+

The most useful log for troubleshooting is opcenter.log.

To view the content of a log, enter float log cat <log_name>. Combine this command with Linux commands to view the output one screen at a time or to redirect it to a file, for example:
float log cat opcenter.log | more
float log cat upgrade.log > temp
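
Because float log cat writes the log to standard output, ordinary text tools can also filter it. The following is a minimal sketch, not part of the float CLI: the function name recent_matches is hypothetical, and it assumes that the lines of interest contain a plain-text keyword such as "error" (the log format is not described here).

```shell
# Hypothetical helper: print the last N lines of an OpCenter log that match a pattern.
# Arguments: <log_name> <pattern> [count]  (count defaults to 20)
# Assumption: relevant lines contain the pattern as plain text; the match is case-insensitive.
recent_matches() {
    float log cat "$1" | grep -i -- "$2" | tail -n "${3:-20}"
}
```

For example, recent_matches opcenter.log error 50 would print up to the last 50 matching lines.
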
To see the logs that pertain to a particular job, enter the following:
float log ls -j <job_id>
Example:
float log ls -j TQ9PUJhoY0XLtL58KcU1M 
+--------------------------+-------+----------------------+
|         LOG NAME         | SIZE  |   LAST UPDATE TIME   |
+--------------------------+-------+----------------------+
| environments             |   330 | 2023-02-02T02:21:04Z |
| job.events               |  3009 | 2023-02-02T02:27:02Z |
| metrics-a1e1970a868a.txt | 12232 | 2023-02-02T02:46:35Z |
| output                   | 25963 | 2023-02-02T02:46:40Z |
| stderr.autosave          |     0 | 2023-02-02T02:21:27Z |
| stdout.autosave          |     0 | 2023-02-02T02:21:27Z |
+--------------------------+-------+----------------------+
The job script used to submit this job redirects stderr and stdout to a file called output (that is why stderr.autosave and stdout.autosave have zero size). The log called job.events is useful for troubleshooting.
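
Zero-size entries such as stderr.autosave in the example above can be filtered out before deciding which log to open. This sketch is illustrative only: the function name nonempty_logs is hypothetical, and it assumes the listing keeps the "| name | size | time |" layout shown in the example.

```shell
# Hypothetical helper: read a "float log ls" listing on standard input and
# print the names of logs whose SIZE column is greater than zero.
# Assumption: data rows look like "| name | size | time |", as in the example listing.
nonempty_logs() {
    awk -F'|' '
        # Keep data rows only: border rows have no pipes, the header has a non-numeric SIZE.
        NF >= 4 && $3 ~ /^[ ]*[0-9]+[ ]*$/ {
            name = $2
            gsub(/^[ ]+|[ ]+$/, "", name)   # trim the padding around the name
            if ($3 + 0 > 0) print name
        }'
}
```

It would be used as float log ls -j <job_id> | nonempty_logs.
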
To see the logs that pertain to a particular host, enter the following:
float log ls -i <host_id>
Example:
float log ls -i i-0e40db7d105cb6793 
+-------------------+--------+----------------------+
|     LOG NAME      |  SIZE  |   LAST UPDATE TIME   |
+-------------------+--------+----------------------+
| fagent.access_log |   8494 | 2023-02-02T17:10:10Z |
| fagent.log        |  11735 | 2023-02-02T17:08:00Z |
| fagent_init.log   |    639 | 2023-02-02T17:06:34Z |
| internal_output   |   1116 | 2023-02-02T17:07:59Z |
| messages          | 154889 | 2023-02-02T17:10:01Z |
+-------------------+--------+----------------------+
The log called messages is useful for troubleshooting. The log called fagent.log records interactions between the worker node and the OpCenter.
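
When it is unclear which host log is still being written, the listing can be ordered by its LAST UPDATE TIME column. The following is a rough sketch under stated assumptions: the function name logs_by_update_time is hypothetical, and it relies on the timestamps being in the fixed UTC format shown above, so that a plain text sort is also a chronological sort.

```shell
# Hypothetical helper: read a "float log ls" listing on standard input and
# print "timestamp name" pairs, newest first.
# Assumption: timestamps use the fixed format shown above (for example,
# 2023-02-02T17:10:10Z), so sorting them as text orders them by time.
logs_by_update_time() {
    awk -F'|' '
        # Keep data rows only (numeric SIZE column).
        NF >= 5 && $3 ~ /^[ ]*[0-9]+[ ]*$/ {
            name = $2; stamp = $4
            gsub(/^[ ]+|[ ]+$/, "", name)    # trim padding around the name
            gsub(/^[ ]+|[ ]+$/, "", stamp)   # trim padding around the timestamp
            print stamp, name
        }' | LC_ALL=C sort -r
}
```

It would be used as float log ls -i <host_id> | logs_by_update_time.
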

Viewing Logs using the Web Interface

To use the OpCenter web interface, open a browser and go to the public (if you are outside the VPC) or private (if you are inside the VPC) IP address associated with the OpCenter. Enter your credentials at the login screen.

To view OpCenter logs, complete the following steps.
  • From any screen, click the OpCenter Logs icon at the top right of the screen.
  • To download a log, click the Download icon.
  • To view a log, click the Preview icon.
To view logs associated with a particular job, complete the following steps.
  • From the left-hand panel, click Jobs.
  • On the Jobs screen, click the ID of the job whose logs you want to view.
  • Click the Attachments tab to display the available logs.
To view logs associated with the host that is running a particular job, complete the following steps.
  • From the left-hand panel, click Jobs.
  • On the Jobs screen, click the ID of the job of interest.
  • Click the Instances tab to display the current host as well as any previous hosts (these are hosts from which the job migrated).
  • Click the Logs icon next to a host ID to display the logs associated with that host.

Troubleshooting

The following table shows commonly encountered errors and how to fix them.

Table 1. Troubleshooting OpCenter Issues
| Error | Cause | Remedy |
| float login returns 'Get "https://127.0.0.1/api/v1/login": dial tcp 127.0.0.1:443: connect: connection refused' | Incorrect IP address for OpCenter or the OpCenter IP address has aged out of the local float cache. | Check OpCenter IP address and try again with float login -a <ip_address> |
| float command returns "Error: Session timeout (code: 2001)" | Current OpCenter session timed out | Log in to OpCenter |
| float image add or float submit returns "Error: Authentication failed, incorrect username or password (code: 2003)" | Attempt to access private repository with incorrect or missing credentials | Rerun command with valid credentials to access repository |
| float submit returns "Error: Unsupported argument, No instance types meet combined --cpu and --mem constraints. (code: 1027)" | No VM instance found that meets all requirements including price limit for Spot Instance | Resubmit with higher price limit if it is Spot Instance. Else, resubmit with different memory and vCPU ranges. |
| float squeue shows status as "WaitingForLicense" | OpCenter cannot retrieve valid license | Check license status by typing float license info or by clicking on the license icon on the web interface or by logging in to the MemVerge Account Server. Obtain new license or apply an existing license. |
| float log cat <logfile> returns "Error: Invalid argument, No such log" | Delay in writing to log file after the container starts running | Wait and then retry command |
| float ps returns "Error: Job is not executing" | Job is either initializing or it has completed | Use float squeue to determine status of job |
| float squeue returns "No jobs" although jobs have been submitted and completed | float squeue queries job history in a fixed time interval (default: last hour) | Use float squeue -A to include all jobs |
| Job status shows "FailToComplete" and job.events log shows "Failed to determine instance params, error: No instance types meet combined --cpu and --mem constraints (code: 5143)". | CSPs impose limits on services instantiated by each account. In AWS, these limits are called "service quotas" and apply to every AWS service, generally on a region by region basis. This error is often seen when the EC2 service quota is exceeded. | Wait until the number of EC2 instances falls below your service quota or request an increase in your EC2 service quota. |
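
Where the remedy is to wait and retry, as with "No such log" shortly after a container starts, the wait can be scripted. The following is a minimal sketch rather than a feature of the float CLI: the function name retry is hypothetical, and it assumes that float exits with a non-zero status when it prints an error, which this document does not confirm.

```shell
# Hypothetical helper: run a command until it succeeds, pausing between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
# Assumption: the command signals failure with a non-zero exit status.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, retry 5 10 float log cat <log_name> would try up to five times, ten seconds apart.
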