Monday, 11 August 2008

Drawing graphs with graphite

Work continues apace ... well, kinda. I've added a new reporting plugin for a new monitoring system: Graphite [see launch-pad and wiki sites]. If you've not heard of it, Graphite is a funky new monitoring system that does away with the traditional RRDTool and does everything in Python.

There are two main components to Graphite: carbon and graphite.

Carbon is a recording daemon (in fact, a set of three daemons) that stores information efficiently on disk (using a custom format) and maintains a fast in-memory cache. Sending new metric values to the carbon agent is very simple.
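
To give a feel for how simple, here is a minimal Python sketch of sending one value to carbon. It assumes carbon's plaintext line protocol (one "path value timestamp" line per metric) on TCP port 2003 of the local machine; the host, port and metric name are assumptions for illustration, and this is not the MonAMI plugin code.

```python
import socket
import time

# Assumptions for illustration: a carbon agent listening on localhost,
# on the port commonly used for its plaintext protocol.
CARBON_HOST = "localhost"
CARBON_PORT = 2003


def format_metric(path, value, timestamp=None):
    """Build one plaintext line: '<metric path> <value> <unix timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, int(timestamp))


def send_metric(path, value, timestamp=None,
                host=CARBON_HOST, port=CARBON_PORT):
    """Open a TCP connection, send a single metric line, then close."""
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example call (hypothetical metric name; needs a running carbon agent):
# send_metric("monami.torque.jobs.running", 42)
```

Keeping the formatting separate from the sending makes the line format easy to check without a carbon agent running.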

Graphite is a Python web front-end that uses the Django framework and the ExtJS AJAX toolkit. Graph rendering is achieved using Cairo (via Python's Cairo bindings). It's possible to run Graphite (Django) in stand-alone mode, but I guess most people will use mod_python and Apache. Although there's a simple drag-and-drop compositor, the real power comes when using the CLI. There, each logged-in user can create their own custom graphs (multiple graphs can be open concurrently). These can be arranged on the screen and the resulting view saved for later recall.

It's a bit of a faff to set up (although better with v0.9.3) and there are a few rough edges (again, better with v0.9.3). That said, it's already usable and the AJAX interface is pretty nice. It's early days, so I'm not sure where it will fit within the monitoring eco-system compared to established projects (e.g., ganglia, munin, cacti). I guess time will tell.

Because of the way Graphite (and Carbon in particular) is designed, adding the MonAMI plugin to send it data is very easy. The code is now in CVS, ready for the next release. I've included a few screen shots that show the graph compositor.

Wednesday, 30 April 2008

Trouble at Mill

With some unfortunate timing, it looks like the "Axis of Openness" webpages (SourceForge, Slashdot, Freshmeat, ...) have gone for a burton. There seem to be some networking problems with these sites, with web traffic timing out. Assuming traceroute output is valid, the problem appears soon after traffic leaves the Santa Clara location of the Savvis network [dead router(s)?].

This is a pain because we've just done the v0.10 release of MonAMI and both the website and the file download locations are hosted by SourceForge. Whilst SourceForge is down, no one can download MonAMI!

If you're keen to try MonAMI, in the meantime, you can download the RPMs from the (rough and ready) dev. site:
http://monami.scotgrid.ac.uk/

The above site is generously hosted by the ScotGrid project [their blog].

Thanks guys!

Monday, 28 April 2008

Version 0.10 has left the building

After many months of work, v0.10 has been tagged and source-/binary-RPMs and tar-balls are available.

This is a major release with many enhancements to MonAMI. Perhaps the two improvements that top the list are:
  • adaptive monitoring,
  • writing monitoring data into a database.
Some other note-worthy changes include:
  • New plugins:
  • Updates to existing plugins:
    • maui
      • support for QoS (a maui term) monitoring added,
      • added a timeout option (maui can take ages to reply sometimes).
    • Torque
      • better error handling (the library has a somewhat amusing way of reporting problems),
      • enforce thread-safety (some of the torque library API isn't thread-safe).
    • Ganglia
      • fixed gmond.conf parser,
      • transmission now less bursty (reduces likelihood of overloading gmond)
      • unicast support: data can now be sent to just the one gmond. Support for multiple gmonds (for failover in unicast deployments) is pencilled in for the next release.
    • null
      • adjustable time delay (useful when playing with adaptive monitoring)
    • MySQL
      • added per-Table monitoring statistics (also can now act as a reporting plugin).
  • Other changes:
    • Added the "MonAMI by Example" tutorial (has been available from the web for a while)
    • MonAMI-core will use the recent history of a monitoring target's response time when estimating how long future requests will take. This uses quite a nice algorithm, which responds quickly to a service suddenly taking a longer time to respond, but isn't fooled if a service responds very quickly.
    • Added per-Thread CPU profiling. This is so, if someone says "MonAMI is consuming vast amounts of CPU" we can figure out why.
    • Spring-clean of user-guide and tutorial: lots of effort has gone into this, mostly in ensuring a consistency in the typesetting. The document should look a lot nicer now and hopefully be easier to read.
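
As an aside on the response-time estimation mentioned in the list above: the behaviour described (react at once to a slow response, but don't be fooled by a fast one) can be obtained with an asymmetric smoothing rule. The Python sketch below is only an illustration of that idea under assumed names and an assumed decay factor; it is not MonAMI's actual algorithm.

```python
class ResponseTimeEstimator:
    """Asymmetric estimate of how long a target takes to respond.

    Illustrative sketch only: the class name and the decay factor are
    assumptions, not taken from the MonAMI source.
    """

    def __init__(self, decay=0.1):
        self.decay = decay      # how quickly the estimate drifts downwards
        self.estimate = None    # no observations yet

    def update(self, sample):
        """Feed in one observed response time; return the new estimate."""
        if self.estimate is None or sample > self.estimate:
            # Slower than expected: jump straight up to the new value.
            self.estimate = sample
        else:
            # Faster than expected: only drift down a little, so a single
            # quick response does not drag the estimate down.
            self.estimate += self.decay * (sample - self.estimate)
        return self.estimate
```

With decay=0.1, a service that suddenly takes 5 seconds raises the estimate to 5 immediately, whereas one quick 0.1 second response afterwards only lowers it to about 4.5.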
You can download MonAMI from the SourceForge page:
http://sourceforge.net/project/showfiles.php?group_id=151885
or configure your YUM to download it automatically. Details are available here:
http://monami.scotgrid.ac.uk/

Enjoy!

Monday, 12 November 2007

New output plugin: grmonitor

Ladies and Gentlemen, MonAMI now has a new output plugin: grmonitor. This allows the latest version of gr_monitor (available from the project's home page) to connect to MonAMI and fetch the data it then plots.

gr_monitor plots data in 3D using an OpenGL library (e.g. the open-source Mesa). This allows you to pan around and see the live data from different points of view. On the right is a screen snapshot showing several Torque metrics.

gr_monitor expects data in a series of regular n-by-m grids. This is quite different to how MonAMI sees data (a tree structure) so the configuration has to map between the two. This makes it slightly verbose, but I'm hoping to add a few tricks to improve this.
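
To illustrate the kind of mapping involved, here is a small Python sketch that pulls values out of a tree (nested dictionaries) and lays them out as an n-by-m grid. The tree contents, the dotted-path notation and the function names are all made up for this example; they are not MonAMI's or gr_monitor's actual configuration syntax.

```python
def lookup(tree, path):
    """Follow a dotted path such as 'torque.queues.short' down a nested dict."""
    node = tree
    for key in path.split("."):
        node = node[key]
    return node


def tree_to_grid(tree, row_paths, columns):
    """Build a grid with one row per branch and one column per metric name."""
    return [[lookup(tree, row + "." + col) for col in columns]
            for row in row_paths]


# Hypothetical datatree: two Torque queues, each with two metrics.
tree = {
    "torque": {
        "queues": {
            "short": {"running": 12, "queued": 3},
            "long": {"running": 4, "queued": 7},
        }
    }
}

grid = tree_to_grid(tree,
                    ["torque.queues.short", "torque.queues.long"],
                    ["running", "queued"])
```

Each row has to be named explicitly, which is where the verbosity comes from.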

Hands-on workshop at Imperial College, London

The recent HEP-SysMan workshop was dedicated to monitoring: what software is available and how to configure it. I was honoured and delighted to be asked to give a presentation on MonAMI.

Well, given the meeting was a "workshop", I wanted to get people working! What better way than a hands-on tutorial: a step-by-step guide that walks you through increasingly complex examples.

Pete and I had previously started something similar as a GridPP wiki page. I wanted to convert this to DocBook so people had a good-looking tutorial to work from. Since I wasn't too sure how long people would take, some extra material was added (e.g. using the MySQL plugin to save monitoring data). It took a surprisingly long time to get the tutorial good, which is one of the reasons things have been so quiet recently.

This also finally forced me to figure out how to produce diagrams of datatrees. Thanks to GraphViz and some XSLT, the tutorial sports some nice diagrams. (Just need to add some to the user-guide now!)

The logistics were fun. Everyone needed their own environment to play with. Some people were able to use spare machines at their home institutes, but the rest used some 20 virtual machines that Ewan MacMahon managed to throw together. Each VM had its own install of Torque, maui and MySQL. Big thanks to Ewan!

Many people helped in getting this tutorial together. Mike Kenyon, Andrew Elwell, Caitriana Nicholson, Graeme Stewart and Tom Doherty (sorry if I've forgotten anyone!) all helped with proofreading, and a big thanks also to Mona Aggarwal for organising the printed versions.

The meeting went well and people were happy with what they were doing.

Tuesday, 18 September 2007

Storing monitoring data in a database? no problem!

Ganglia is a monitoring system that uses RRDTool for its storage and graphs. This provides an excellent solution for monitoring, but suffers from data becoming less detailed ("averaged out") when you look further back in time. This is deliberate, but does make later analysis of the data difficult.

If you want to keep detailed records of monitoring data with MonAMI that don't degrade over time, now you can: I've committed changes to the mysql plugin in CVS. In addition to monitoring a MySQL database, the plugin can now store information. You tell it which table to use and how to map the information into that table, and it does the rest. It'll even create the table if it doesn't exist.
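
The general idea (name a table, map metrics to columns, create the table on demand) can be sketched in a few lines of Python. To keep the example self-contained it uses SQLite from the Python standard library as a stand-in for MySQL; the function, table and metric names are invented for illustration, and none of this is the MonAMI plugin's code or configuration syntax.

```python
import sqlite3


def store_sample(conn, table, mapping, sample):
    """Store one set of monitoring values as a row in `table`.

    `mapping` maps column names to metric paths; `sample` maps metric
    paths to values. Table and column names are assumed to come from
    trusted configuration, since they are interpolated into the SQL.
    """
    columns = list(mapping)
    col_defs = ", ".join('"%s" REAL' % c for c in columns)
    # Create the table on first use, with a timestamp column added.
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "%s" '
        '(collected TIMESTAMP DEFAULT CURRENT_TIMESTAMP, %s)'
        % (table, col_defs))
    col_list = ", ".join('"%s"' % c for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    values = [sample[mapping[c]] for c in columns]
    conn.execute(
        'INSERT INTO "%s" (%s) VALUES (%s)' % (table, col_list, placeholders),
        values)
    conn.commit()


# Hypothetical mapping from table columns to metric paths.
mapping = {"running": "torque.jobs.running", "queued": "torque.jobs.queued"}

conn = sqlite3.connect(":memory:")
store_sample(conn, "torque_jobs", mapping,
             {"torque.jobs.running": 12, "torque.jobs.queued": 3})
store_sample(conn, "torque_jobs", mapping,
             {"torque.jobs.running": 14, "torque.jobs.queued": 1})
```

Every sample becomes its own row, so nothing is averaged away as the data ages.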

Wednesday, 5 September 2007

Greetings from CHEP 2007!



Greetings from Victoria!

For those who thought things have been a bit quiet recently: well, yes, they have been. Recently, all my time has been spent preparing for the CHEP 2007 and All Hands 2007 conferences.

CHEP has now started, with various GridPP people here. Graeme, Greig and I are giving a poster-presentation of MonAMI at CHEP. The poster is deliberately "visual": I'm aiming to use it to talk people through the concepts, rather than providing a poster that has lots of text.

For those interested, the poster was put together using Inkscape: an SVG editor. The whole poster is made up of SVG graphics, with the only exceptions being the GridPP backgrounds and University logos (which are, unfortunately, large bitmaps). Inkscape is a very powerful editor. If you are doing anything involving SVG, I would recommend Inkscape. Be sure to take the tutorials: they're both easy to follow and will greatly increase your productivity.

CHEP itself is an excellent conference. There are lots of people in the HEP computing field, often facing similar computational challenges. I'm looking forward to meeting more people during the poster sessions.