API Documentation
GET /api/v1/log/config
Description
Retrieve the configuration of the server log
Sample output
PUT /api/v1/log/config
Description
Update the configuration of the server log
Request Parameters
The request body should contain the properties to change.
Sample output
The same as the GET request.
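The two log configuration calls can be sketched with Python's standard `urllib`. This is a minimal sketch, not an official client. The server address is a placeholder, the JSON content type is an assumption, and authentication (for example when Cognito is enabled) is left out. `someProperty` is a hypothetical key, because the log configuration keys are not listed here.

```python
import json
import urllib.request

# Hypothetical address; use the "addr" of your own mmab server.
BASE_URL = "https://mmab.example.com:8081"


def get_log_config_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request that retrieves the server log configuration."""
    return urllib.request.Request(f"{base_url}/api/v1/log/config", method="GET")


def put_log_config_request(changes: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a PUT request whose JSON body holds only the properties to change."""
    return urllib.request.Request(
        f"{base_url}/api/v1/log/config",
        data=json.dumps(changes).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


# "someProperty" is a placeholder; copy a real key from the GET response.
req = put_log_config_request({"someProperty": "new-value"})
print(req.get_method(), req.full_url)
# Sending it, e.g. urllib.request.urlopen(req), needs a reachable server.
```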
GET /api/v1/config
Description
Retrieve the configuration of the server and application
Sample output
{
"addr": "https://0.0.0.0:8081",
"id": "f3c3be48-0f71-4a0c-b4e7-8fd36f37061c",
"staticFolder": "mmabWeb",
"security": {
"certFile": "/home/ec2-user/.memverge/mmab/conf/server.crt",
"keyFile": "/home/ec2-user/.memverge/mmab/conf/server.pem",
"cognito": {
"enabled": false,
"userPoolID": "",
"identityPoolID": "",
"clientID": "",
"adminGroups": [
"admin"
]
}
},
"ckpt": {
"ckptMode": "iterative",
"ckptImagePath": "/mmc-checkpoint",
"ckptInterval": "1h0m0s",
"ckptFiles": [],
"IRMapScanPaths": [],
"ckptOnSigTerm": false,
"diagnosisMode": true,
"rootFSDiff": false,
"cloudWatchMode": false,
"tcpClose": false
},
"node": {
"heartbeat": "30s",
"ttl": "5m0s",
"maxPerLogSizeMB": 2,
"maxNodeLogTotalMB": 1024,
"cleanLogInterval": "12h0m0s"
},
"job": {
"ebsPerJob": true,
"customTags": {
"owner": "cedric",
"team": "engineer"
},
"ebsMountPath": "/mnt/mmab",
"diskType": "gp3",
"diskSizeGB": 100,
"storCleanIntvl": "24h0m0s",
"retentionPolicy": "time",
"retentionInterval": "1h0m0s",
"successTTL": "72h0m0s",
"failureTTL": "168h0m0s"
}
}
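The configKV endpoint addresses settings by their full dotted path, such as "node.ttl". A small helper can read the same path from this response. The sketch below assumes that each dot in the path corresponds to one level of nesting in the JSON above, so "node.ttl" maps to the "ttl" field inside "node".

```python
import json


def get_config_value(config: dict, key_path: str):
    """Follow a dotted key path such as "node.ttl" through the nested config.

    Raises KeyError if any segment along the path is missing.
    """
    value = config
    for segment in key_path.split("."):
        if not isinstance(value, dict) or segment not in value:
            raise KeyError(key_path)
        value = value[segment]
    return value


# An excerpt of the sample output above.
sample = json.loads("""
{
  "node": {"heartbeat": "30s", "ttl": "5m0s"},
  "security": {"cognito": {"enabled": false, "adminGroups": ["admin"]}}
}
""")

print(get_config_value(sample, "node.ttl"))
print(get_config_value(sample, "security.cognito.adminGroups"))
```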
PUT /api/v1/configKV
Description
Change specific configuration values. The request body is a map from configuration keys to values:
* You can specify multiple keys and values in the same map.
* Use the full path of the configuration key, such as "node.ttl".
* Use the string version of the value, e.g. "5m" for 5 minutes, "true" for true, and "500" for 500.
* When setting string array values, use "," to separate the elements.
* When setting string maps, use ":" to separate each key from its value, and "," to separate the pairs.
Sample output
The same as the GET /api/v1/config request.
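Following the rules above, a client can convert ordinary values into the expected strings before sending them. This is a sketch under assumptions: the body is sent as a JSON object (the original body example is not shown), the server address is hypothetical, and no authentication is added. Because the encoding follows only the rules listed above, it has no way to express a "," or ":" inside an individual value.

```python
import json
import urllib.request


def to_config_string(value) -> str:
    """Convert a Python value into the string form that configKV expects."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):  # string arrays: "a,b,c"
        return ",".join(str(item) for item in value)
    if isinstance(value, dict):  # string maps: "k1:v1,k2:v2"
        return ",".join(f"{key}:{val}" for key, val in value.items())
    return str(value)  # numbers, and strings such as "5m", pass through


def config_kv_request(changes: dict, base_url: str) -> urllib.request.Request:
    """Build a PUT /api/v1/configKV request that changes several keys at once."""
    body = {key: to_config_string(val) for key, val in changes.items()}
    return urllib.request.Request(
        f"{base_url}/api/v1/configKV",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},  # assumed content type
    )


req = config_kv_request(
    {
        "node.ttl": "5m",
        "ckpt.ckptOnSigTerm": True,
        "job.diskSizeGB": 500,
        "job.customTags": {"owner": "cedric", "team": "engineer"},
    },
    base_url="https://mmab.example.com:8081",  # hypothetical server address
)
print(req.data.decode("utf-8"))
```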
Property description:
* addr - address of the server
* id - should not be changed
* staticFolder - location of the web frontend folder; should not be changed
* security - security settings
  * certFile - location of the certificate file
  * keyFile - location of the private key file
  * cognito - Cognito-related settings
    * enabled - whether to enable Cognito authentication
    * userPoolID - ID of the user pool in Cognito
    * identityPoolID - ID of the identity pool in Cognito
    * clientID - ID of the client for this application; the client must be a single-page application (SPA)
    * adminGroups - users in these groups are treated as administrators
* ckpt - checkpoint settings
  * ckptMode - checkpoint mode, either iterative or none
  * ckptImagePath - path of the folder that stores checkpoint images
  * ckptInterval - interval between checkpoints
  * ckptFiles - extra files to copy during checkpointing
  * IRMapScanPaths - paths to scan for the IR map
  * rootFSDiff - whether to include the root filesystem in the checkpoint
  * tcpClose - whether to close all TCP connections
* node - worker node settings
  * heartbeat - interval between heartbeats
  * ttl - time to live for a node
* job - per-job settings
  * ebsPerJob - whether to create a new EBS volume for each job
  * ebsMountPath - mount path of the EBS volume
  * diskType - type of the EBS volume
  * diskSizeGB - size of the EBS volume, in GB
  * diskThroughputMB - throughput of the EBS volume, in MB/s
  * retentionPolicy - job retention policy, either time or memory (memory is not supported yet)
  * retentionInterval - interval between retention checks
  * successTTL - time to live for a succeeded job; default 3 days
  * failureTTL - time to live for a failed job; default 7 days
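Several properties above (ckptInterval, heartbeat, ttl, retentionInterval, successTTL, failureTTL) hold durations such as "72h0m0s". These appear to use Go's `time.Duration` string format. A client that needs numeric values could convert them with a small parser like the one below. It is a sketch that handles only the h, m, s, and ms units; Go's own parser also accepts ns, us, and signed values.

```python
import re

# Seconds per unit; only the units needed for values like "1h0m0s" and "30s".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is tried before "m" so that "250ms" parses as milliseconds.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def duration_seconds(text: str) -> float:
    """Convert a duration string such as "72h0m0s" into seconds."""
    if not text:
        raise ValueError("empty duration")
    total, pos = 0.0, 0
    for match in _TOKEN.finditer(text):
        if match.start() != pos:  # unparsed characters before this token
            raise ValueError(f"invalid duration: {text!r}")
        number, unit = match.groups()
        total += float(number) * _UNIT_SECONDS[unit]
        pos = match.end()
    if pos != len(text):  # trailing characters that are not a number+unit pair
        raise ValueError(f"invalid duration: {text!r}")
    return total


for value in ("30s", "5m0s", "72h0m0s", "168h0m0s"):
    print(value, duration_seconds(value))
```

With this parser, the sample successTTL of "72h0m0s" is 259200 seconds, which matches the documented 3-day default.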
GET /api/v1/node
Description
List the running worker nodes
Sample Output
[
{
"id": "0e0c7b08-32cc-4737-a2ea-9e2a2bfc7fd1",
"ips": [
"172.31.1.234"
],
"hostName": "ip-172-31-1-234.us-west-1.compute.internal",
"cloud": "aws",
"arch": "x86_64",
"cores": 2,
"cpuModel": "Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz",
"cpuVendor": "GenuineIntel",
"memoryInMB": 7638,
"instance": {
"zone": "us-west-1a",
"instanceId": "i-044f3323bfd228fd6",
"instanceType": "m6i.large",
"region": "us-west-1",
"createTime": "2025-03-11T23:05:22Z",
"payType": "Spot"
},
"lastHeartbeat": "2025-03-11T23:12:14.07346093Z"
}
]
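A client could use lastHeartbeat to spot nodes that have stopped reporting. The sketch below flags nodes whose last heartbeat is older than a chosen threshold, for example the 5-minute node.ttl from the sample configuration. This is a client-side check under that assumption, not a description of how the server itself expires nodes. The timestamp parser handles only the UTC "Z" form seen in the sample output and truncates the nanosecond fraction to microseconds, because Python's `datetime` stops at microseconds.

```python
import json
import re
from datetime import datetime, timedelta, timezone


def parse_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp such as "2025-03-11T23:12:14.07346093Z".

    datetime stores microseconds at most, so the fraction is cut (or
    zero-padded) to six digits before parsing.
    """
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base, fraction = match.groups()
    micros = (fraction or "0")[:6].ljust(6, "0")
    return datetime.fromisoformat(f"{base}.{micros}+00:00")


def stale_nodes(nodes: list, now: datetime, max_age: timedelta) -> list:
    """Return the ids of nodes whose lastHeartbeat is older than max_age."""
    return [
        node["id"]
        for node in nodes
        if now - parse_timestamp(node["lastHeartbeat"]) > max_age
    ]


nodes = json.loads("""[{"id": "0e0c7b08-32cc-4737-a2ea-9e2a2bfc7fd1",
                        "lastHeartbeat": "2025-03-11T23:12:14.07346093Z"}]""")
# A fixed "now" keeps the example reproducible; normally use datetime.now(timezone.utc).
now = datetime(2025, 3, 11, 23, 20, tzinfo=timezone.utc)
print(stale_nodes(nodes, now, max_age=timedelta(minutes=5)))
```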
GET /api/v1/nodes/nodeID/files
Description
List the available log files of a node. nodeID is the id field from the list node response.
Sample Output
[
"/nodes/<nodeID>/var/log/cloud-init-output.log",
"/nodes/<nodeID>/var/log/memverge/mmrunc.log",
"/nodes/<nodeID>/var/log/memverge/pagent.log"
]
GET /nodes/nodeID/filePath
Description
Retrieve the content of a node log file. Use a path returned by the node files request, e.g.
GET /nodes/<nodeID>/var/log/memverge/mmrunc.log
Output
Content of the log file
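Because the file listing returns paths that already begin with /nodes/<nodeID>/, fetching a node log is a two-step lookup. The minimal sketch below assumes a hypothetical server address and no authentication. It only builds the URLs, which could then be passed to `urllib.request.urlopen`.

```python
from urllib.parse import quote

# Hypothetical address; use the "addr" of your own mmab server.
BASE_URL = "https://mmab.example.com:8081"


def node_files_url(node_id: str, base_url: str = BASE_URL) -> str:
    """URL of the log file listing for one node (id from GET /api/v1/node)."""
    return f"{base_url}/api/v1/nodes/{quote(node_id, safe='')}/files"


def log_file_urls(file_paths: list, base_url: str = BASE_URL) -> list:
    """Turn paths from the listing into download URLs.

    The listed paths already start with /nodes/<nodeID>/, so they are
    appended to the server address unchanged.
    """
    return [base_url + path for path in file_paths]


node_id = "0e0c7b08-32cc-4737-a2ea-9e2a2bfc7fd1"
print(node_files_url(node_id))
print(log_file_urls([f"/nodes/{node_id}/var/log/memverge/mmrunc.log"]))
```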
GET /mmab.log
Description
Retrieve the latest mmab server log.
Note that if you change the log file name in the log config, the
path will also be changed to /<log-name>.log
Output
Text containing the content of the log
GET /mmab-access.log
Description
Retrieve the access log of the mmab server.
Note that if you change the log file name in the log config, the
path will also be changed to /<log-name>-access.log
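Both log paths derive from the log file name, so a client that lets users rename the log can compute them together. This sketch assumes the default name is mmab (as the default paths suggest) and uses a hypothetical server address. `my-server` stands in for whatever name is set in the log config.

```python
def server_log_urls(base_url: str, log_name: str = "mmab") -> dict:
    """Return the URLs of the server log and the access log for a log name."""
    return {
        "server": f"{base_url}/{log_name}.log",
        "access": f"{base_url}/{log_name}-access.log",
    }


print(server_log_urls("https://mmab.example.com:8081"))
print(server_log_urls("https://mmab.example.com:8081", log_name="my-server"))
```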
GET /api/v1/metric
Description
Retrieve the list of all metrics available on the system.
Sample output
[
{
"name": "Total runtime of jobs",
"id": "metricDef-runtime.system-total",
"definition": {
"id": "metricDef-runtime",
"description": "Total runtime of jobs",
"labels": [
"duration"
]
},
"object": {
"id": "system-total",
"type": "system-total",
"name": "system-total"
},
"levels": [
{
"interval": "1m0s",
"retention": "168h0m0s"
},
{
"interval": "24h0m0s",
"retention": "18000h0m0s"
},
{
"interval": "30m0s",
"retention": "2160h0m0s"
}
]
}
]
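Each entry in this list carries what the metricValue endpoint below needs. Judging from the sample ids (metricDef-runtime.system-total is the definition id, a dot, then the object id), ObjectID appears to be object.id and MetricDefinitionID appears to be definition.id. The sketch relies on that inference. It also lists each metric's level intervals, which look like candidate values for the interval query parameter; that is also an assumption.

```python
import json


def metric_value_path(metric: dict) -> str:
    """Build /api/v1/metricValue/<ObjectID>/<MetricDefinitionID> for one metric."""
    return f"/api/v1/metricValue/{metric['object']['id']}/{metric['definition']['id']}"


def level_intervals(metric: dict) -> list:
    """List the intervals of the metric's storage levels, in the order returned."""
    return [level["interval"] for level in metric["levels"]]


# An excerpt of the sample output above.
metrics = json.loads("""[{
  "id": "metricDef-runtime.system-total",
  "definition": {"id": "metricDef-runtime"},
  "object": {"id": "system-total"},
  "levels": [{"interval": "1m0s"}, {"interval": "24h0m0s"}, {"interval": "30m0s"}]
}]""")
for metric in metrics:
    print(metric_value_path(metric), level_intervals(metric))
```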
GET /api/v1/metricValue/ObjectID/MetricDefinitionID
Description
Retrieve the metric values of a specific metric
Query Parameters
Parameter | Type | Required | Description |
---|---|---|---|
interval | string | No | The interval of the metric values. |
start | time.Time | No | The start of the time range for metrics. |
end | time.Time | No | The end of the time range for metrics. |
Sample Request:
Sample Output
{
"id": "metricDef-volumeAttachTime.system-total",
"points": [
{
"time": "2025-03-13T04:51:00Z",
"value": 6.741566292
},
{
"time": "2025-03-13T05:40:00Z",
"value": 6.554704421
}
],
"metaData": {
}
}
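A request for one series combines the path segments with the optional query parameters. This sketch makes two assumptions: the time.Time parameters accept RFC 3339 strings (the format used for times in the sample output), and interval takes a duration string such as 1m0s from the metric's levels. The server address is hypothetical. `urlencode` percent-encodes the colons in the timestamps, and standard URL decoding on the server should reverse this.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def metric_value_url(base_url, object_id, definition_id,
                     interval=None, start=None, end=None) -> str:
    """Build a metricValue URL, adding only the query parameters that are set."""
    params = {}
    if interval is not None:
        params["interval"] = interval
    if start is not None:
        params["start"] = rfc3339(start)
    if end is not None:
        params["end"] = rfc3339(end)
    url = (f"{base_url}/api/v1/metricValue/"
           f"{quote(object_id, safe='')}/{quote(definition_id, safe='')}")
    return f"{url}?{urlencode(params)}" if params else url


print(metric_value_url(
    "https://mmab.example.com:8081",  # hypothetical server address
    "system-total", "metricDef-volumeAttachTime",
    interval="1m0s",
    start=datetime(2025, 3, 13, 4, 0, tzinfo=timezone.utc),
    end=datetime(2025, 3, 13, 6, 0, tzinfo=timezone.utc),
))
```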
GET /api/v1/metrics/summary
Description
Retrieve a summary of metrics within a specified time range.
Request Parameters
Query Parameters
Parameter | Type | Required | Description |
---|---|---|---|
start | time.Time | No | The start of the time range for metrics. |
end | time.Time | No | The end of the time range for metrics. |
Example Request
Response Format
Response Body
The response is a JSON object whose items field maps each metric definition to its summary items.
{
"items": {
"jobSubmitted": [
{
"id": "jobSubmitted",
"point": {
"time": "2025-01-01T12:00:00Z",
"value": 150.0
},
"metaData": {
"queueName": "queue1"
}
}
],
"runtime": [
{
"id": "runtime",
"point": {
"time": "2025-01-02T12:00:00Z",
"value": 300.5
},
"metaData": {
"queueName": "queue1"
}
}
],
"spotProtection": [
{
"id": "spotProtection",
"point": {
"time": "2025-01-02T12:00:00Z",
"value": 3
},
"metaData": {
"queueName": "queue1"
}
}
],
"timeSaved": [
{
"id": "timeSaved",
"point": {
"time": "2025-01-02T12:00:00Z",
"value": 120.5
},
"metaData": {
"queueName": "queue1"
}
}
]
}
}
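A client can request a summary window and total the items under each metric name. This is a sketch under assumptions: start and end are passed as RFC 3339 strings, the server address is hypothetical, and summing is meaningful because each item is a separate contribution (for example, one per queue, as the queueName metadata suggests). Those assumptions are not confirmed here.

```python
import json
from urllib.parse import urlencode


def summary_url(base_url: str, start: str, end: str) -> str:
    """Build the summary URL; start and end are RFC 3339 strings (assumed format)."""
    return f"{base_url}/api/v1/metrics/summary?{urlencode({'start': start, 'end': end})}"


def totals_by_metric(summary: dict) -> dict:
    """Sum the point values of all items listed under each metric name."""
    return {
        name: sum(item["point"]["value"] for item in items)
        for name, items in summary["items"].items()
    }


# An excerpt of the sample response above.
summary = json.loads("""{"items": {
  "jobSubmitted": [{"id": "jobSubmitted",
                    "point": {"time": "2025-01-01T12:00:00Z", "value": 150.0},
                    "metaData": {"queueName": "queue1"}}],
  "spotProtection": [{"id": "spotProtection",
                      "point": {"time": "2025-01-02T12:00:00Z", "value": 3},
                      "metaData": {"queueName": "queue1"}}]
}}""")
print(summary_url("https://mmab.example.com:8081",  # hypothetical server address
                  "2025-01-01T00:00:00Z", "2025-01-03T00:00:00Z"))
print(totals_by_metric(summary))
```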
GET /api/v1/job
Description
List all jobs
Sample Output
[
{
"id": "8252d813-e5fb-4dad-b397-cb518fd0fc41",
"queueName": "jacky-test",
"createdAt": "2025-05-15T05:29:06.354967467Z",
"updatedAt": "2025-05-15T06:35:48.011660015Z",
"status": "Succeeded",
"nodeOid": "i-03d7eb4de55c4355f",
"containerId": "6fc76e79d59123f6f35a7c9d7507885664d6983a64ebe231fa1391505a520c40",
"spotProtCount": 1,
"batchJobIds": [
"8252d813-e5fb-4dad-b397-cb518fd0fc41"
],
"volumeIds": [
"vol-022893f19655019e0"
],
"events": [
{
"timestamp": "2025-05-15T05:29:06.354967467Z",
"eventType": "Job-Creating",
"nodeOid": "i-0bf682f18f3cb557e",
"containerId": "9098f44bfd84597ce82f13974ab8400b894a3595fdb51ec0ff5c835e4bf1fedf",
"batchJobId": "8252d813-e5fb-4dad-b397-cb518fd0fc41"
},
{
"timestamp": "2025-05-15T05:29:12.49788509Z",
"eventType": "Volume-Created",
"volumnId": "vol-022893f19655019e0"
},
{
"timestamp": "2025-05-15T05:29:19.49788509Z",
"eventType": "Volume-Attached",
"volumnId": "vol-022893f19655019e0"
},
{
"timestamp": "2025-05-15T05:29:20.49788509Z",
"eventType": "Job-Created"
},
{
"timestamp": "2025-05-15T05:29:20.675862549Z",
"eventType": "Job-Running"
},
{
"timestamp": "2025-05-15T05:55:06.194790675Z",
"eventType": "Job-Checkpointing"
},
{
"timestamp": "2025-05-15T05:55:06.894972615Z",
"eventType": "Job-CheckpointSucceeded"
},
{
"timestamp": "2025-05-15T06:00:22.278832037Z",
"eventType": "Job-Restoring",
"nodeOid": "i-03d7eb4de55c4355f",
"containerId": "6fc76e79d59123f6f35a7c9d7507885664d6983a64ebe231fa1391505a520c40"
},
{
"timestamp": "2025-05-15T06:01:26.010124985Z",
"eventType": "Job-RestoreSucceeded"
},
{
"timestamp": "2025-05-15T06:01:26.083797074Z",
"eventType": "Job-Running"
},
{
"timestamp": "2025-05-15T06:35:48.011660015Z",
"eventType": "Job-Succeeded"
}
]
}
]
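The event list records a job's lifecycle, including checkpoints and restores. The minimal sketch below summarizes one job from a response like the sample above. Treating the time between the first and last event as the job's elapsed time is an assumption, since it includes setup steps such as volume creation and attachment. The parser handles only the UTC "Z" timestamps shown.

```python
import re
from datetime import datetime


def parse_timestamp(text: str) -> datetime:
    """Parse a UTC timestamp like "2025-05-15T05:29:06.354967467Z".

    datetime keeps microseconds only, so extra fraction digits are dropped.
    """
    match = re.fullmatch(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d+))?Z", text)
    if match is None:
        raise ValueError(f"unsupported timestamp: {text!r}")
    base, fraction = match.groups()
    micros = (fraction or "0")[:6].ljust(6, "0")
    return datetime.fromisoformat(f"{base}.{micros}+00:00")


def summarize_job(job: dict) -> dict:
    """Count checkpoint/restore events and measure first-to-last event time."""
    events = sorted(job["events"], key=lambda e: parse_timestamp(e["timestamp"]))
    types = [event["eventType"] for event in events]
    span = parse_timestamp(events[-1]["timestamp"]) - parse_timestamp(events[0]["timestamp"])
    return {
        "status": job["status"],
        "checkpoints": types.count("Job-CheckpointSucceeded"),
        "restores": types.count("Job-RestoreSucceeded"),
        "elapsedSeconds": span.total_seconds(),
    }


# A subset of the events from the sample job above.
job = {
    "status": "Succeeded",
    "events": [
        {"timestamp": "2025-05-15T05:29:06.354967467Z", "eventType": "Job-Creating"},
        {"timestamp": "2025-05-15T05:55:06.894972615Z", "eventType": "Job-CheckpointSucceeded"},
        {"timestamp": "2025-05-15T06:01:26.010124985Z", "eventType": "Job-RestoreSucceeded"},
        {"timestamp": "2025-05-15T06:35:48.011660015Z", "eventType": "Job-Succeeded"},
    ],
}
print(summarize_job(job))
```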