API Documentation

GET /api/v1/log/config

Description

Retrieve the configuration of the server log

Sample output

{
  "level": "info",
  "maxSizeMB": 10,
  "maxBackups": 10
}

PUT /api/v1/log/config

Description

Update the configuration of the server log

Request Parameters

The request body should contain the properties to change, e.g.

{
  "level": "debug"
}

Sample output

The same as the GET request.
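
As a minimal sketch, the update request above could be prepared in Python with only the standard library. The base URL, the Content-Type header, and the helper name build_log_config_update are assumptions made for illustration; this document does not specify them, and any authentication the server requires is not shown.

```python
import json
import urllib.request

# Hypothetical base URL; replace it with the address of your own server.
BASE_URL = "https://mmab.example.com:8081"


def build_log_config_update(changes: dict) -> urllib.request.Request:
    """Build a PUT request whose body holds only the log properties to change."""
    body = json.dumps(changes).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/log/config",
        data=body,
        method="PUT",
        # Assumption: the server expects a JSON content type.
        headers={"Content-Type": "application/json"},
    )


req = build_log_config_update({"level": "debug"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The request is only built here, not sent. Passing it to urllib.request.urlopen would send it, provided TLS and authentication are set up for your deployment.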

GET /api/v1/config

Description

Retrieve the configuration of the server and application

Sample output

{
  "addr": "https://0.0.0.0:8081",
  "id": "f3c3be48-0f71-4a0c-b4e7-8fd36f37061c",
  "staticFolder": "mmabWeb",
  "security": {
    "certFile": "/home/ec2-user/.memverge/mmab/conf/server.crt",
    "keyFile": "/home/ec2-user/.memverge/mmab/conf/server.pem",
    "cognito": {
      "enabled": false,
      "userPoolID": "",
      "identityPoolID": "",
      "clientID": "",
      "adminGroups": [
        "admin"
      ]
    }
  },
  "ckpt": {
    "ckptMode": "iterative",
    "ckptImagePath": "/mmc-checkpoint",
    "ckptInterval": "1h0m0s",
    "ckptFiles": [],
    "IRMapScanPaths": [],
    "ckptOnSigTerm": false,
    "diagnosisMode": true,
    "rootFSDiff": false,
    "cloudWatchMode": false,
    "tcpClose": false
  },
  "node": {
    "heartbeat": "30s",
    "ttl": "5m0s",
    "maxPerLogSizeMB": 2,
    "maxNodeLogTotalMB": 1024,
    "cleanLogInterval": "12h0m0s"
  },
  "job": {
    "ebsPerJob": true,
    "customTags": {
      "owner": "cedric",
      "team": "engineer"
    },
    "ebsMountPath": "/mnt/mmab",
    "diskType": "gp3",
    "diskSizeGB": 100,
    "storCleanIntvl": "24h0m0s",
    "retentionPolicy": "time",
    "retentionInterval": "1h0m0s",
    "successTTL": "72h0m0s",
    "failureTTL": "168h0m0s"
  }
}

PUT /api/v1/configKV

Change specific configuration values. The request body has the form:

{"kvMap": {"<key>":"<value>"}}

You can specify multiple keys and values in the same map.

Use the full path of the configuration key, such as "node.ttl". Use the string version of the value, e.g.

  • "5m" for 5 minutes
  • "true" for true
  • "500" for 500

When setting string array values, use , to separate them, e.g.

{"kvMap": {"security.cognito.adminGroups":"admin,root"}}

When setting string maps, use : to separate each key from its value and , to separate pairs, e.g.

{"kvMap": {"job.customTags": "owner:name,team:engineer"}}

Sample output

The same as the GET request
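
The string encodings described for this request (booleans, numbers, string arrays, and string maps) can be sketched as a small Python helper. The function names encode_value and build_kv_body are hypothetical, and encoding False as "false" is an assumption inferred from the "true" example; only the kvMap body shape and the separators come from the text above.

```python
import json


def encode_value(value) -> str:
    """Encode a Python value as the string form expected in kvMap."""
    if isinstance(value, bool):
        # Checked before numbers because bool is a subclass of int.
        # Assumption: false is written as "false", mirroring "true".
        return "true" if value else "false"
    if isinstance(value, dict):
        # String map: key:value pairs separated by commas.
        return ",".join(f"{k}:{v}" for k, v in value.items())
    if isinstance(value, (list, tuple)):
        # String array: items separated by commas.
        return ",".join(str(item) for item in value)
    return str(value)


def build_kv_body(changes: dict) -> str:
    """Build the JSON request body for PUT /api/v1/configKV."""
    return json.dumps({"kvMap": {key: encode_value(val) for key, val in changes.items()}})


body = build_kv_body({
    "node.ttl": "5m",
    "job.diskSizeGB": 500,
    "security.cognito.adminGroups": ["admin", "root"],
    "job.customTags": {"owner": "name", "team": "engineer"},
})
print(body)
```

This sketch does not escape commas or colons inside values, because the text above does not say how the server would treat them.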

Property description:

  • addr - address of the server
  • id - should not be changed
  • staticFolder - the location of the web frontend folder, should not be changed
  • security - security settings
    • certFile - location of the certificate file
    • keyFile - location of the private key file
    • cognito - Cognito-related settings
      • enabled - whether to enable Cognito authentication
      • userPoolID - the user pool ID in Cognito
      • identityPoolID - the identity pool ID in Cognito
      • clientID - ID of the client for this application, must be a single-page application (SPA)
      • adminGroups - users in these groups are treated as administrators
  • ckpt - checkpoint settings
    • ckptMode - checkpoint mode, can be iterative or none
    • ckptImagePath - path of the folder in which to store checkpoint images
    • ckptInterval - interval between checkpoints
    • ckptFiles - extra files to copy during checkpointing
    • IRMapScanPaths - paths to scan for IR map
    • rootFSDiff - whether to include the root filesystem in the checkpoint
    • tcpClose - whether to close all TCP connections
  • node - worker node settings
    • heartbeat - interval between heartbeats
    • ttl - time to live for a node
  • job - per-job settings
    • ebsPerJob - whether to create a new EBS volume for each job
    • ebsMountPath - the mount path of the EBS volume
    • diskType - the type of the EBS volume
    • diskSizeGB - the size of the EBS volume in GB
    • diskThroughputMB - the throughput of the EBS volume in MB
    • retentionPolicy - the job retention policy, can be time or memory (not supported yet)
    • retentionInterval - the retention check interval
    • successTTL - time to live for successful jobs, default 3 days
    • failureTTL - time to live for failed jobs, default 7 days
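
The configuration returned by GET /api/v1/config is nested, while PUT /api/v1/configKV takes full key paths such as "node.ttl". The following sketch shows one way to list those paths from a retrieved configuration. The names flatten_config and MAP_VALUED_KEYS are hypothetical; job.customTags is kept as a single key because this document sets it as one string map.

```python
# Paths whose values are string maps and are therefore addressed as one key.
# "job.customTags" is the only such key shown in this document.
MAP_VALUED_KEYS = {"job.customTags"}


def flatten_config(config: dict, prefix: str = "") -> dict:
    """Return {full.key.path: value} for every leaf of a nested config."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and path not in MAP_VALUED_KEYS:
            # Descend into nested sections such as "security.cognito".
            flat.update(flatten_config(value, path))
        else:
            flat[path] = value
    return flat


sample = {
    "node": {"heartbeat": "30s", "ttl": "5m0s"},
    "security": {"cognito": {"enabled": False}},
    "job": {"customTags": {"owner": "cedric"}},
}
for path, value in flatten_config(sample).items():
    print(path, "=", value)
```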

GET /api/v1/node

Description

List the running worker nodes

Sample Output

[
  {
    "id": "0e0c7b08-32cc-4737-a2ea-9e2a2bfc7fd1",
    "ips": [
      "172.31.1.234"
    ],
    "hostName": "ip-172-31-1-234.us-west-1.compute.internal",
    "cloud": "aws",
    "arch": "x86_64",
    "cores": 2,
    "cpuModel": "Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz",
    "cpuVendor": "GenuineIntel",
    "memoryInMB": 7638,
    "instance": {
      "zone": "us-west-1a",
      "instanceId": "i-044f3323bfd228fd6",
      "instanceType": "m6i.large",
      "region": "us-west-1",
      "createTime": "2025-03-11T23:05:22Z",
      "payType": "Spot"
    },
    "lastHeartbeat": "2025-03-11T23:12:14.07346093Z"
  }
]

GET /api/v1/nodes/nodeID/files

Description

List the available log files of a node. nodeID is the id field returned by the list nodes request (GET /api/v1/node).

Sample Output

[
  "/nodes/<nodeID>/var/log/cloud-init-output.log",
  "/nodes/<nodeID>/var/log/memverge/mmrunc.log",
  "/nodes/<nodeID>/var/log/memverge/pagent.log"
]

GET /nodes/nodeID/filePath

Description

Retrieve the content of a log file of the node. Use a path returned by the node files request, e.g.

GET /nodes/<nodeID>/var/log/memverge/mmrunc.log

Output

Content of the log file
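
The two node log requests chain together: the file list returns paths that are then requested as-is from the server root. A minimal sketch of building both URLs follows. BASE_URL, files_url, and log_url are hypothetical names, and the requests are only constructed, not sent.

```python
from urllib.parse import quote

# Hypothetical base URL; replace it with the address of your own server.
BASE_URL = "https://mmab.example.com:8081"


def files_url(node_id: str) -> str:
    """URL that lists the log files available for a node."""
    return f"{BASE_URL}/api/v1/nodes/{quote(node_id, safe='')}/files"


def log_url(file_path: str) -> str:
    """URL for one log file, given a path returned by the files request."""
    # quote() keeps "/" by default, so the path structure is preserved.
    return BASE_URL + quote(file_path)


node_id = "0e0c7b08-32cc-4737-a2ea-9e2a2bfc7fd1"
print(files_url(node_id))
print(log_url(f"/nodes/{node_id}/var/log/memverge/mmrunc.log"))
```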

GET /mmab.log

Description

Retrieve the latest mmab server log. Note that if you change the log file name in the log config, the path will also be changed to /<log-name>.log

Output

Plain text containing the content of the log

GET /mmab-access.log

Description

Retrieve the access log of the mmab server. Note that if you change the log file name in the log config, the path will also be changed to /<log-name>-access.log

GET /api/v1/metric

Description

Retrieve the list of all metrics available on the system.

Sample output

[
  {
    "name": "Total runtime of jobs",
    "id": "metricDef-runtime.system-total",
    "definition": {
      "id": "metricDef-runtime",
      "description": "Total runtime of jobs",
      "labels": [
        "duration"
      ]
    },
    "object": {
      "id": "system-total",
      "type": "system-total",
      "name": "system-total"
    },
    "levels": [
      {
        "interval": "1m0s",
        "retention": "168h0m0s"
      },
      {
        "interval": "24h0m0s",
        "retention": "18000h0m0s"
      },
      {
        "interval": "30m0s",
        "retention": "2160h0m0s"
      }
    ]
  }
]

GET /api/v1/metricValue/ObjectID/MetricDefinitionID

Description

Retrieve the metric values of a specific metric

Query Parameters

Parameter  Type       Required  Description
interval   string     No        The interval of the metric values.
start      time.Time  No        The start of the time range for metrics.
end        time.Time  No        The end of the time range for metrics.

Sample Request:

/api/v1/metricValue/system-total/metricDef-volumeAttachTime?interval=1m&end=2025-03-15T00:00:00Z

Sample Output

{
  "id": "metricDef-volumeAttachTime.system-total",
  "points": [
    {
      "time": "2025-03-13T04:51:00Z",
      "value": 6.741566292
    },
    {
      "time": "2025-03-13T05:40:00Z",
      "value": 6.554704421
    }
  ],
  "metaData": {

  }
}
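
A request URL like the sample above can be assembled from an object ID, a metric definition ID, and the optional query parameters. The sketch below assumes that the ObjectID and MetricDefinitionID path segments correspond to the object.id and definition.id fields of an entry from GET /api/v1/metric. BASE_URL and both function names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of your own server.
BASE_URL = "https://mmab.example.com:8081"


def metric_value_url(object_id, metric_def_id, interval=None, start=None, end=None) -> str:
    """Build the metricValue URL, adding only the query parameters that are set."""
    url = f"{BASE_URL}/api/v1/metricValue/{object_id}/{metric_def_id}"
    params = {}
    if interval is not None:
        params["interval"] = interval
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if params:
        # safe=":" keeps timestamps readable, matching the sample request.
        url += "?" + urlencode(params, safe=":")
    return url


def url_for_metric(entry: dict, **query) -> str:
    """Build the URL from one entry of the metric list (assumed field mapping)."""
    return metric_value_url(entry["object"]["id"], entry["definition"]["id"], **query)


print(metric_value_url("system-total", "metricDef-volumeAttachTime",
                       interval="1m", end="2025-03-15T00:00:00Z"))
```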

GET /api/v1/metrics/summary

Description

Retrieves a summary of metrics within a specified time range.


Request Parameters

Query Parameters

Parameter  Type       Required  Description
start      time.Time  No        The start of the time range for metrics.
end        time.Time  No        The end of the time range for metrics.

Example Request

GET /api/v1/metrics/summary?start=2025-01-01T00:00:00Z&end=2025-01-07T23:59:59Z

Response Format

Response Body

The response is a JSON object containing a mapping of metric definitions to their corresponding summary items.

{
  "items": {
    "jobSubmitted": [
      {
        "id": "jobSubmitted",
        "point": {
          "time": "2025-01-01T12:00:00Z",
          "value": 150.0
        },
        "metaData": {
          "queueName": "queue1"
        }
      }
    ],
    "runtime": [
      {
        "id": "runtime",
        "point": {
          "time": "2025-01-02T12:00:00Z",
          "value": 300.5
        },
        "metaData": {
          "queueName": "queue1"
        }
      }
    ], 
    "spotProtection": [
      {
        "id": "spotProtection",
        "point": {
          "time": "2025-01-02T12:00:00Z",
          "value": 3
        },
        "metaData": {
          "queueName": "queue1"
        }
      }
    ],
    "timeSaved": [
      {
        "id": "timeSaved",
        "point": {
          "time": "2025-01-02T12:00:00Z",
          "value": 120.5
        },
        "metaData": {
          "queueName": "queue1"
        }
      }
    ]
  }
}
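
To look up a summary value by metric and queue, the response above can be re-indexed as in the following sketch. The function name index_summary is hypothetical, and the code assumes at most one item per queue for each metric; if a queue appears more than once, the later item overwrites the earlier one.

```python
def index_summary(response: dict) -> dict:
    """Return {metric: {queueName: value}} from a metrics summary response."""
    indexed = {}
    for metric, items in response.get("items", {}).items():
        per_queue = indexed.setdefault(metric, {})
        for item in items:
            # Items without a queueName are stored under the empty string.
            queue = item.get("metaData", {}).get("queueName", "")
            per_queue[queue] = item["point"]["value"]
    return indexed


summary = {
    "items": {
        "jobSubmitted": [
            {"id": "jobSubmitted",
             "point": {"time": "2025-01-01T12:00:00Z", "value": 150.0},
             "metaData": {"queueName": "queue1"}}
        ],
        "spotProtection": [
            {"id": "spotProtection",
             "point": {"time": "2025-01-02T12:00:00Z", "value": 3},
             "metaData": {"queueName": "queue1"}}
        ],
    }
}
print(index_summary(summary)["jobSubmitted"]["queue1"])
```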

GET /api/v1/job

Description

List all jobs

Sample Output

[
  {
    "id": "8252d813-e5fb-4dad-b397-cb518fd0fc41",
    "queueName": "jacky-test",
    "createdAt": "2025-05-15T05:29:06.354967467Z",
    "updatedAt": "2025-05-15T06:35:48.011660015Z",
    "status": "Succeeded",
    "nodeOid": "i-03d7eb4de55c4355f",
    "containerId": "6fc76e79d59123f6f35a7c9d7507885664d6983a64ebe231fa1391505a520c40",
    "spotProtCount": 1,
    "batchJobIds": [
      "8252d813-e5fb-4dad-b397-cb518fd0fc41"
    ],
    "volumeIds": [
      "vol-022893f19655019e0"
    ],
    "events": [
      {
        "timestamp": "2025-05-15T05:29:06.354967467Z",
        "eventType": "Job-Creating",
        "nodeOid": "i-0bf682f18f3cb557e",
        "containerId": "9098f44bfd84597ce82f13974ab8400b894a3595fdb51ec0ff5c835e4bf1fedf",
        "batchJobId": "8252d813-e5fb-4dad-b397-cb518fd0fc41"
      },
      {
        "timestamp": "2025-05-15T05:29:12.49788509Z",
        "eventType": "Volume-Created",
        "volumnId": "vol-022893f19655019e0"
      },
      {
        "timestamp": "2025-05-15T05:29:19.49788509Z",
        "eventType": "Volume-Attached",
        "volumnId": "vol-022893f19655019e0"
      },
      {
        "timestamp": "2025-05-15T05:29:20.49788509Z",
        "eventType": "Job-Created"
      },
      {
        "timestamp": "2025-05-15T05:29:20.675862549Z",
        "eventType": "Job-Running"
      },
      {
        "timestamp": "2025-05-15T05:55:06.194790675Z",
        "eventType": "Job-Checkpointing"
      },
      {
        "timestamp": "2025-05-15T05:55:06.894972615Z",
        "eventType": "Job-CheckpointSucceeded"
      },
      {
        "timestamp": "2025-05-15T06:00:22.278832037Z",
        "eventType": "Job-Restoring",
        "nodeOid": "i-03d7eb4de55c4355f",
        "containerId": "6fc76e79d59123f6f35a7c9d7507885664d6983a64ebe231fa1391505a520c40"
      },
      {
        "timestamp": "2025-05-15T06:01:26.010124985Z",
        "eventType": "Job-RestoreSucceeded"
      },
      {
        "timestamp": "2025-05-15T06:01:26.083797074Z",
        "eventType": "Job-Running"
      },
      {
        "timestamp": "2025-05-15T06:35:48.011660015Z",
        "eventType": "Job-Succeeded"
      }
    ]
  }
]
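
The events array makes it possible to measure how long a phase of a job took. The sketch below parses the timestamps shown above, which carry up to nine fractional digits, by truncating them to microseconds, and then measures the time between two event types. The names parse_timestamp and seconds_between are hypothetical, and the code assumes every timestamp is in UTC with a trailing Z, as in the sample.

```python
import re
from datetime import datetime, timezone

_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z$")


def parse_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp, truncating the fraction to microseconds."""
    match = _TIMESTAMP.match(text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base, fraction = match.groups()
    # Pad or cut the fraction to exactly six digits (microseconds).
    micros = int((fraction or "").ljust(6, "0")[:6])
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def seconds_between(events, start_type, end_type):
    """Seconds from the first start_type event to the next end_type event, or None."""
    start = None
    for event in events:
        if start is None and event["eventType"] == start_type:
            start = parse_timestamp(event["timestamp"])
        elif start is not None and event["eventType"] == end_type:
            return (parse_timestamp(event["timestamp"]) - start).total_seconds()
    return None


events = [
    {"timestamp": "2025-05-15T05:29:06.354967467Z", "eventType": "Job-Creating"},
    {"timestamp": "2025-05-15T05:55:06.194790675Z", "eventType": "Job-Checkpointing"},
    {"timestamp": "2025-05-15T05:55:06.894972615Z", "eventType": "Job-CheckpointSucceeded"},
    {"timestamp": "2025-05-15T06:35:48.011660015Z", "eventType": "Job-Succeeded"},
]
print(seconds_between(events, "Job-Checkpointing", "Job-CheckpointSucceeded"))
print(seconds_between(events, "Job-Creating", "Job-Succeeded"))
```

Truncating to microseconds loses sub-microsecond precision, which is usually acceptable when measuring phases that last seconds or longer.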