Monday, 11 August 2008

Drawing graphs with graphite

Work continues apace ... well, kinda. I've added a new reporting plugin for a new monitoring system: Graphite [see launch-pad and wiki sites]. If you've not heard of it, Graphite is a funky new monitoring system that does away with the traditional RRDTool and does everything in Python.

There are two main components to Graphite: carbon and graphite.

Carbon is a recording daemon (in fact, a set of three daemons) that stores information efficiently on disk (using a custom format) and maintains a fast in-memory cache. Sending new metric values to the carbon agent is very simple.
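
To give a feel for how little is involved, here's a minimal sketch in Python. It assumes Carbon's plaintext line protocol (one "path value timestamp" line per datapoint) and a carbon agent listening on localhost port 2003; the host, port and metric name are illustrative assumptions, and a given install (v0.9.3 included) may be configured differently.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one plaintext datapoint line: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003, timestamp=None):
    """Send a single datapoint to a carbon agent over TCP.

    host and port are assumptions; adjust them to match the local setup.
    """
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name, chosen only for illustration.
    send_metric("monami.example.load", 0.5)
```

Each call opens a connection, writes one line and closes again; something that sends data regularly would more likely keep the connection open and write many lines down it.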

Graphite is a Python web front-end that uses the Django framework and the ExtJS AJAX toolkit. Graph rendering is achieved using cairo (via Python's cairo bindings). It's possible to run Graphite (Django) in stand-alone mode, but I guess most people will use mod_python and Apache. Although there's a simple drag-and-drop compositor, the real power comes when using the CLI. There, each logged-in user can create their own custom graphs (multiple can be opened concurrently). These can be arranged on the screen and the resulting view saved for later recall.

It's a bit of a faff to set up (although better with v0.9.3) and there are a few rough edges (again, better with v0.9.3). That said, it's already usable and the AJAX interface is pretty nice. It's early days, so I'm not sure where it will fit within the monitoring eco-system compared to established projects (e.g., ganglia, munin, cacti). I guess time will tell.

Because of the way Graphite (and Carbon in particular) is designed, adding the MonAMI plugin to send it data is very easy. The code is now in CVS, ready for the next release. I've included a few screen shots that show the graph compositor.

Wednesday, 30 April 2008

Trouble at Mill

With some unfortunate timing, it looks like the "Axis of Openness" webpages (SourceForge, Slashdot, Freshmeat, ...) have gone for a burton. There seem to be some networking problems with these sites, with web traffic timing out. Assuming traceroute output is valid, the problem appears soon after traffic leaves the Santa Clara location of the Savvis network [dead router(s)?]

This is a pain because we've just done the v0.10 release of MonAMI and both the website and the file download locations are hosted by SourceForge. Whilst SourceForge is down, no one can download MonAMI!

If you're keen to try MonAMI in the meantime, you can download the RPMs from the (rough and ready) dev. site:

The above site is generously hosted by the ScotGrid project [their blog].

Thanks guys!

Monday, 28 April 2008

Version 0.10 has left the building

After many months of work, v0.10 has been tagged and source-/binary-RPMs and tar-balls are available.

This is a major release with many enhancements to MonAMI. Perhaps the two improvements that top the list are:
  • adaptive monitoring,
  • writing monitoring data into a database.
Some other note-worthy changes include:
  • New plugins:
  • Updates to existing plugins:
    • maui
      • support for QoS (a maui term) monitoring added,
      • added a timeout option (maui can take ages to reply sometimes).
    • Torque
      • better error handling (the library has a somewhat amusing way of reporting problems),
      • enforce thread-safety (some of the torque library API isn't thread-safe).
    • Ganglia
      • fixed gmond.conf parser,
      • transmission now less bursty (reduces likelihood of overloading gmond)
      • unicast support: data can now be sent to just the one gmond. Support for multiple gmonds (for failover in unicast deployments) is pencilled in for the next release.
    • null
      • adjustable time delay (useful when playing with adaptive monitoring)
    • MySQL
      • added per-Table monitoring statistics (also can now act as a reporting plugin).
  • Other changes:
    • Added the "MonAMI by Example" tutorial (has been available from the web for a while)
    • MonAMI-core will use the recent history of a monitoring target's response time when estimating how long future requests will take. This uses quite a nice algorithm, which responds quickly to a service suddenly taking a longer time to respond, but isn't fooled if a service responds very quickly.
    • Added per-Thread CPU profiling. This is so, if someone says "MonAMI is consuming vast amounts of CPU" we can figure out why.
    • Spring-clean of user-guide and tutorial: lots of effort has gone into this, mostly in ensuring a consistency in the typesetting. The document should look a lot nicer now and hopefully be easier to read.
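
To illustrate the response-time estimation mentioned above, here's a rough sketch of one way to get that behaviour: a smoothed average that weights slow responses heavily and fast ones lightly. This is not MonAMI's actual algorithm (MonAMI-core isn't written in Python, and the details aren't given here); the class name and the rise/fall weights are made up for the example.

```python
class ResponseTimeEstimator:
    """Asymmetric smoothing of observed response times (illustrative only).

    rise: weight applied when a response is slower than the current estimate.
    fall: weight applied when a response is faster than the current estimate.
    Both values are assumptions picked for the example.
    """

    def __init__(self, rise=0.7, fall=0.1):
        self.rise = rise
        self.fall = fall
        self.estimate = None

    def update(self, observed):
        """Fold one observed response time (in seconds) into the estimate."""
        if self.estimate is None:
            # First observation: nothing to smooth against yet.
            self.estimate = observed
        else:
            weight = self.rise if observed > self.estimate else self.fall
            self.estimate += weight * (observed - self.estimate)
        return self.estimate


if __name__ == "__main__":
    est = ResponseTimeEstimator()
    for seconds in (1.0, 1.0, 8.0, 0.2, 0.2):
        print(seconds, est.update(seconds))
```

With rise larger than fall, a sudden slow response pulls the estimate up sharply, whereas a run of quick responses only lowers it gradually.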
You can download MonAMI from the SourceForge page:
or configure your YUM to download it automatically. Details are available here: