Jobs Overview

The Jobs view provides a comprehensive list of all batch jobs processed by Memory Machine Batch, allowing you to monitor their status, track details, and filter records for specific inquiries. This view is essential for detailed operational oversight and troubleshooting.

Jobs View

Let's go through each sub-section and describe what you can see and do:

Action Bar

Jobs View Action Bar

Located above the filters, this bar provides quick actions:

  • Export to CSV: Click this button to download the currently filtered and displayed job data as a .csv (Comma Separated Values) file for offline analysis in spreadsheet programs or scripts (see the sketch after this list).
  • Refresh: Click to update the Jobs table with the latest available data.
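
For analysis beyond a spreadsheet, the exported file is straightforward to process in a script. Below is a minimal sketch; it assumes the export's column headers mirror the Jobs table columns described later on this page, and the header names and the jobs.csv file name are assumptions to adjust for your own export.

```python
# A minimal sketch of offline analysis on an exported jobs CSV, assuming
# the column headers mirror the Jobs table (e.g. "ID", "Queue Name",
# "Status"); adjust the names to match your actual file.
import csv
from collections import Counter

def status_breakdown(path: str) -> Counter:
    """Count exported jobs by the value in their Status column."""
    with open(path, newline="") as f:
        return Counter(row["Status"] for row in csv.DictReader(f))

if __name__ == "__main__":
    for status, count in status_breakdown("jobs.csv").most_common():
        print(f"{status:<20} {count}")
```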

Filtering and Search Options

Jobs View Filtering and Search Options

This section enables you to narrow down the list of jobs displayed in the table:

  • Select Queue: A dropdown menu that allows you to filter jobs by their assigned queue name.
  • Select Status: A dropdown menu to filter jobs based on their current status:
    • Creating: The job is in the process of being created.
    • Created: The job has been created.
    • Running: The job is currently running.
    • Failed: The job has failed.
    • Succeeded: The job has successfully run and is now complete.
    • Restoring: The job is currently restoring from a checkpoint.
    • Restore Succeeded: The restore for the job has succeeded, and the job continues to run.
    • Restore Failed: The restore for this job has failed.
    • Stopped: The job has stopped running.
    • Checkpointing: A checkpoint is currently being generated for the job.
    • Checkpoint Succeeded: The checkpoint for the job has succeeded.
    • Checkpoint Failed: The checkpoint for the job has failed.
    • Volume Unready: Managed EBS is enabled for the job, and the volume was not available when the job attempted to start.
    • Restore Volume Unready: Managed EBS is enabled for the job but the volume could not be found during restore.
  • Select Spot Protection Failures: A dropdown filter for identifying jobs that experienced failures related to spot instance protection. It offers two options:
    • Checkpoint Failed: Filters to show all jobs that have a checkpoint which has failed.
    • Restore Failed: Filters to show all jobs that have a restore which has failed.
  • Created From / Created To: Date selectors that enable you to filter jobs based on their creation date within a specified range.
  • Search Jobs: An input field where you can type keywords (e.g., a Job ID, a portion of a name) to quickly find matching jobs. Click the magnifying glass icon or press Enter to initiate the search.
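
These filters can also be reproduced offline against an exported CSV, for example when auditing a date range after the fact. The sketch below makes two assumptions to verify against your own export: that field names such as "Queue Name", "Status", and "Created" match the table columns, and that the Created column holds ISO-format timestamps.

```python
# A sketch of applying the same queue/status/date/keyword filters offline
# to rows from an exported CSV. The field names and the ISO timestamp
# format are assumptions; adjust them to match your actual export.
import csv
from datetime import datetime

def filter_jobs(rows, queue=None, status=None,
                created_from=None, created_to=None, search=None):
    for row in rows:
        if queue and row["Queue Name"] != queue:
            continue
        if status and row["Status"] != status:
            continue
        created = datetime.fromisoformat(row["Created"])
        if created_from and created < created_from:
            continue
        if created_to and created > created_to:
            continue
        if search and search.lower() not in " ".join(row.values()).lower():
            continue
        yield row

with open("jobs.csv", newline="") as f:
    failed = list(filter_jobs(csv.DictReader(f), status="Failed"))
print(f"{len(failed)} failed jobs")
```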

Jobs Table

This is the main area where job details are displayed. Each row represents a single job, and the columns provide various pieces of information:

  • ID: The unique identifier for each job.
  • Queue Name: The name of the queue to which the job belongs.
  • Status: The current operational status of the job (e.g., Running, Succeeded, Failed).
  • Node ID: The identifier of the compute node where the job is or was running.
  • Container ID: The identifier of the container in which the job's workload is executed.
  • Spot Protections: Indicates whether spot protection was applied to the job. For a job where spot protection was engaged, hover over the Spot Protections area to open a small pop-up that holds checkpoint and restore information specific to the job:

    Jobs View Spot Protections

  • Batch Job IDs: Identifiers for underlying batch jobs, if applicable.

  • Max Disk Used: The maximum amount of disk space consumed by the job during its execution.
  • Created: The date and time when the job was initially created.
  • Updated: The date and time when the job's status or details were last updated.
  • Events: Click the view icon to see key information on the events the individual job has gone through. The resulting report shows each job status and the corresponding timestamp of the status change, as in the image below: Jobs View Events
  • Logs: The last column contains an eye icon and a download icon:

    • Eye Icon: When clicked, this icon presents a list of hyperlinks to all log files associated with the chosen job. It looks like this: Jobs View Log Eye

    • Download Icon: When clicked, MMBatch creates a compressed tarball of all log files associated with the chosen job and adds it to the downloads folder of the browser being used. This compressed file holds all of the files listed under the Eye Icon.
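
Once downloaded, the bundle can be unpacked like any gzip-compressed tarball, for example to search across all of a job's logs at once. Below is a minimal sketch; the job-logs.tar.gz and job-logs names are placeholders, since the actual bundle name MMBatch produces may differ.

```python
# A minimal sketch of unpacking a downloaded log bundle for inspection.
# "job-logs.tar.gz" is a placeholder; the actual bundle name may differ.
import tarfile
from pathlib import Path

dest = Path("job-logs")
with tarfile.open("job-logs.tar.gz", "r:gz") as tar:
    tar.extractall(dest)              # unpack every log file in the bundle
for path in sorted(dest.rglob("*")):
    print(path)                       # list the extracted contents
```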

Log Bundle

The logs associated with a given job provide a view of activity within MMBatch at the Server, Node, and Job levels.

The table below explains each log collected, what level in the system it reports data on, and why it is useful to collect:

| File Name (Format) | Level | What it Collects | Helpful for Troubleshooting |
| --- | --- | --- | --- |
| (mmbatch server name)-access.log (Access Log) | Server | Tracks the requests sent to the mmbatch server. | |
| (mmbatch server name).log (Server Log) | Server | Collects everything that happens within the mmbatch server. | |
| pagent.log (Agent Log) | Node | Log of activity performed by the Agent between the node and the management server. | Tracks the reported status of the jobs on the node. |
| mmrunc.log (Container File) | Node | Contains the events of the job container. | |
| output.log (Container Output File) | Job | Consists of the last 100 lines of container output. | Assists in understanding the container's status when used in conjunction with the Container File. |
| restore.log (Restore Log) | Job | Log of the restore activities. | Provides key information regarding restore success or failure. |
| ...dump.log (Dump File) | Job | Details activity associated with the checkpoint function. | Provides key information regarding checkpoint success or failure. |
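
After extracting a bundle, the table's level assignments can be applied programmatically as a quick orientation pass. The sketch below is based only on the file-name patterns in the table above; the matching rules are illustrative assumptions, not an MMBatch API.

```python
# A sketch that sorts extracted bundle files into the Server/Node/Job
# levels from the table above. The name patterns come from that table;
# treat them as illustrative rather than authoritative.
from pathlib import Path

def level_of(name: str) -> str:
    if name in ("pagent.log", "mmrunc.log"):
        return "Node"
    if name in ("output.log", "restore.log") or name.endswith("dump.log"):
        return "Job"
    return "Server"   # remaining .log files come from the mmbatch server

for log in sorted(Path("job-logs").rglob("*.log")):
    print(f"{level_of(log.name):<6}  {log.name}")
```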

Understanding Event-Driven Reporting

Event-driven reporting captures data instantly as discrete events happen (e.g., job completion, login). This differs from traditional reports that process data in batches.

Pros of Event-Driven Reporting

  • Real-time Insights: See what's happening now for quicker responses.
  • Granular Detail: Captures every single event, offering precise, detailed records.
  • High Fidelity: Reflects the exact sequence of operations as they occurred.
  • Scales for Volume: Handles massive data streams effectively.

Cons of Event-Driven Reporting

The main challenge with event-driven reporting is potential data mismatch with other parts of your application or dependent systems.

  • Data Inconsistency: Event data is immediate. Other layers might be "eventually consistent" or have processing delays, meaning their reports could temporarily lag or show different numbers.
  • Differing Logic: Other systems might aggregate data differently or apply unique transformations, leading to discrepancies in reported figures.
  • Complexity: Building and maintaining these systems can be more complex than traditional reporting.
  • Data Volume: Capturing every event can generate huge amounts of data, impacting storage and processing costs.
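
To make the consistency gap concrete, here is a toy sketch (not MMBatch-specific) in which an event-driven counter updates per event while a batch report recomputes only every third event, so the two views briefly disagree.

```python
# A toy sketch contrasting an event-driven counter with a delayed batch
# report. Purely illustrative; nothing here models MMBatch internals.
events = []        # the event stream: one record per completed job
live_count = 0     # event-driven view: updated the instant an event lands
batch_count = 0    # batch view: recomputed only every third event

for i in range(1, 8):
    events.append({"job_id": i, "status": "Succeeded"})
    live_count += 1                     # immediate, per-event update
    if i % 3 == 0:
        batch_count = len(events)       # periodic batch recomputation
    lag = " (batch lagging)" if batch_count != live_count else ""
    print(f"event {i}: live={live_count} batch={batch_count}{lag}")
```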

In short, event-driven reporting gives you powerful, real-time detail. Just be aware that its immediacy can lead to differences when comparing reports from other applications working at different layers of your processing stack.