
WaveWatcher

WaveWatcher displays real-time resource utilization graphs on a per-job basis.

Feature Description

Resource utilization (CPU, memory, storage, and network bandwidth) is not constant while a job runs. When running a new application, or a familiar application with a new dataset, users often do not know how resource utilization will vary over time. This can lead to suboptimal provisioning: over-provisioning leaves paid-for resources idle, while under-provisioning can slow the job down or cause it to fail.

WaveWatcher lets users view resource utilization in real time and decide whether to act proactively (for example, by manually migrating the job) or to provision resources differently for subsequent runs.

If a job runs to completion without any migration events, the user can use the built-in WaveRider simulator to analyze what effect enabling WaveRider would have had on this job. The simulator also lets the user vary WaveRider parameters to investigate further.

Operation

At fixed intervals (20s by default, configurable per job), the OpCenter queries the container in which the job is running and retrieves a range of resource utilization metrics. The results are displayed as time series that you can view in the OpCenter web interface.
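Because metrics are sampled at a fixed interval, the number of data points in each graph is roughly the job's runtime divided by the interval. The shell function below is a hypothetical helper, not part of the float CLI, that performs this estimate. It assumes both values are given in whole seconds and ignores any extra sample taken at job start.

```shell
# Hypothetical helper (not part of the float CLI): estimate how many
# metric samples are collected over a job's runtime.
# Usage: estimate_samples RUNTIME_SECONDS [INTERVAL_SECONDS]
# INTERVAL_SECONDS defaults to 20, matching the default query interval.
estimate_samples() {
    runtime=$1
    interval=${2:-20}
    # Integer division: roughly one sample per completed interval.
    echo $(( runtime / interval ))
}

estimate_samples 7200      # 2-hour job at the default interval; prints 360
estimate_samples 7200 60   # same job sampled once per minute; prints 120
```

A shorter interval gives finer-grained graphs at the cost of more frequent queries to the container.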

Select Jobs from the left-hand panel and then click the job you want to examine. From the Job Details screen, select the WaveWatcher tab to show the graphical displays. Scroll down the page to show all the utilization graphs. The x-axis (time) can be expanded.

Examples are shown in the figures. The two icons at the top right open the WaveRider simulator and the WaveRider Insights pop-up, respectively. The WaveRider simulator icon is shown only if the job runs to completion on a single instance.

WaveWatcher Graphical Display


Configuration

WaveWatcher is enabled by default. The OpCenter retrieves metrics from the container at fixed intervals. To change the interval for a particular job from the default of 20s, submit the job using float submit with the --metricsInterval INTERVAL option, where INTERVAL has the format MmSs and M and S are the number of minutes and seconds between queries, respectively (for example, 1m30s).
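If you think of the interval as a number of seconds, the following shell function converts it into the MmSs form. It is a hypothetical helper, not part of the float CLI, and it assumes the CLI accepts a zero minute field (for example, 0m45s) as a valid instance of the MmSs format.

```shell
# Hypothetical helper (not part of the float CLI): convert a number of
# seconds into the MmSs form used by the --metricsInterval option.
to_metrics_interval() {
    total=$1
    minutes=$(( total / 60 ))   # whole minutes
    seconds=$(( total % 60 ))   # remaining seconds
    printf '%dm%ds\n' "$minutes" "$seconds"
}

to_metrics_interval 90    # prints 1m30s
to_metrics_interval 45    # prints 0m45s
```

The printed string can then be passed as the INTERVAL value of --metricsInterval when submitting the job with float submit, alongside the job's other submit options.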

The metrics are available for download as a CSV file. From the web interface, select the job and then go to the Attachments tab. Click the download icon next to the file named metrics.csv.

To download the metrics.csv file using the CLI, enter the following.

float log cat metrics.csv -j JOB_ID > metrics.csv

Replace:

JOB_ID: ID of the job associated with the metrics file
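Once downloaded, metrics.csv can be inspected with standard tools, for example to find the peak value of a metric when deciding how to provision the next run. The function below is a hypothetical sketch, not part of the float CLI. The column names in metrics.csv are not documented here, so it takes the column name as an argument; it also assumes the first row is a header and that no field contains an embedded comma.

```shell
# Hypothetical helper: print the peak value of one column in a CSV file
# such as metrics.csv. Assumes a header row and no quoted/embedded commas.
# Check the available column names first, e.g. with: head -n 1 metrics.csv
# Usage: csv_column_max FILE COLUMN_NAME
csv_column_max() {
    awk -F',' -v col="$2" '
        NR == 1 {
            # Locate the requested column by its header name.
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        # Track the largest numeric value seen in that column.
        $idx != "" && (!seen || $idx + 0 > max) { max = $idx + 0; seen = 1 }
        END { if (seen) print max }
    ' "$1"
}
```

For example, after downloading the file, csv_column_max metrics.csv COLUMN prints the peak value of COLUMN, where COLUMN is a header name taken from your own file. If the name is not found, the function reports an error and returns a nonzero exit status.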