Wednesday 15 August 2007

New release: v0.9

Yes, v0.9 is finally released.

This new version is the first to feature Torque and Maui monitoring plugins and includes a better Ganglia plugin.

At the moment, the Torque plugin is limited to monitoring the number of jobs in each queue (and queue-group) and the job efficiency (CPU time / wallclock time), reported in 5 bands (0%-20%, 20%-40%, etc.).
I'm hoping to add support for asynchronous monitoring by watching the accounting log files. MonAMI already has a generic file-watcher component, so this should be fairly straightforward.
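
To make the efficiency bands concrete, here's a rough sketch in Python of how jobs could be sorted into the 5 bands. This is not MonAMI's code: the function names are made up for illustration, and putting jobs above 100% (e.g. multi-threaded ones) into the top band is my assumption.

```python
def efficiency_band(cpu_seconds, wallclock_seconds, bands=5):
    """Return the band index (0 .. bands-1) for one job.

    Hypothetical helper: with bands=5, index 0 is 0%-20%, index 1 is
    20%-40%, and so on.
    """
    if wallclock_seconds <= 0:
        raise ValueError("wallclock time must be positive")
    efficiency = cpu_seconds / wallclock_seconds
    index = int(efficiency * bands)
    # Assumption: anything at or above 100% is counted in the top band.
    return max(0, min(index, bands - 1))


def count_by_band(jobs, bands=5):
    """Count jobs per band; jobs is an iterable of (cpu, wallclock) pairs."""
    counts = [0] * bands
    for cpu_seconds, wallclock_seconds in jobs:
        counts[efficiency_band(cpu_seconds, wallclock_seconds, bands)] += 1
    return counts
```

The per-band counts are what would end up as metrics, one number per band.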

The Maui plugin is quite primitive compared to what it could monitor. At the moment it's limited to providing just the fair-share information (still very useful!), but I'd guess there's more information that could be gathered.

The Ganglia plugin is now looking pretty nice. It sets a dmax value (so Ganglia will purge old metrics automatically) that is based on how long, in practice, it took to gather the data. So, if the computer slows down substantially, the dmax value adjusts accordingly and the plugin carries on working.
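
As an illustration of the dmax idea, here's a small Python sketch. The formula is only my guess at a reasonable one, not what the plugin actually does: the function name, the safety factor and the lower limit are all assumptions.

```python
import math


def estimate_dmax(poll_interval, gather_seconds, safety_factor=2.0, floor=60):
    """Suggest a dmax (in seconds) from the observed gathering time.

    Hypothetical formula: the next update is expected after roughly
    poll_interval + gather_seconds, so allow a multiple of that before
    the metric is considered stale, and never go below a fixed floor.
    """
    expected_gap = poll_interval + gather_seconds
    return max(floor, math.ceil(expected_gap * safety_factor))
```

With these made-up defaults, a 60 second poll that took 5 seconds to gather gives a dmax of 130 seconds; if gathering slows to 120 seconds, dmax rises to 360 seconds, so the metrics aren't purged while the next update is still on its way.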

The plugin also has a number of work-arounds for problems with Ganglia. For example, when MonAMI is monitoring Torque and Maui, it can provide hundreds of metrics. If gmond (the Ganglia daemon) doesn't consume them fast enough, some will be lost, so MonAMI pauses whilst sending the metrics, giving gmond time to catch up and so preventing metric loss.
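
The pausing work-around can be sketched like this in Python. Again, this is not MonAMI's code: the batch size, the pause length and the send callback are placeholders I've picked for the example.

```python
import time


def send_paced(metrics, send, batch_size=20, pause_seconds=0.1, sleep=time.sleep):
    """Send each metric via send(), pausing after every batch_size metrics.

    The pause gives the receiver (gmond, in this scenario) a chance to
    drain its input before more metrics arrive. The sleep function is a
    parameter so the pacing can be tested without really waiting.
    Returns the number of metrics sent.
    """
    sent = 0
    for metric in metrics:
        send(metric)
        sent += 1
        if sent % batch_size == 0:
            sleep(pause_seconds)
    return sent
```

The trade-off is that a full update takes a little longer, which is another reason to base dmax on the time actually taken.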
