Lies, damn lies and statistics

Okay, so it’s been quite some time since my last blog, but I’ve got a very good excuse – I’m a finalist, and essay deadlines and looming exams have got in the way of doing Nouse things.

Nothing too exciting has happened technically with the website since Roses, with the exception of the EU Elections tab, which was quite a simple job with HTML and then abstracting some of the lightbox code from the YUSU Elections minisite to make it reusable.

One thing I’ve talked about in the past with non-Nouse people is how we monitor things, both the Nouse server, and the website itself.

The website itself has nothing particularly fancy. Google Analytics is the workhorse and the source of our official statistics, but we also have AWStats running, which analyses the log files hourly and makes the results available without the lag that GA has.

One tool I’ve seen that does look particularly interesting is Woopra (Tim Ngwena tweeted about this a while back), which offers live tracking and analysis. This would be particularly useful for watching the statistics on live events such as Roses or Elections, but I’m unsure if putting yet more JavaScript in the site is such a good idea, and Google Analytics is far too good a tool to let go, so at the moment I’m not implementing it.

So far, so normal. However, as Nouse operates from Linode as opposed to a typical shared server, there is a lot more that we can monitor, and it is normally these tools that interest other people.

I’ve been using MRTG for quite some time now on my own servers, and the graphs are quite nifty.

MRTG is designed to map the traffic of a router, but it can easily be extended with a script that provides two numbers (representing “in” and “out” bits per second, but the labels can easily be changed), a hostname and an uptime. There are a number of fairly normal things I track, such as CPU usage, RAM usage, HDD usage, etc., but there are a few things in particular that are more important for Nouse:

The first is the number of Apache requests being handled. This is crucially important given how heavyweight WordPress is in memory use (just under 60 MB per request): the number has to stay below the point at which we start swapping and performance degrades past a useful point.
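As a rough illustration (not the actual script running on the Nouse server), the external-script format described above and the memory budget can be sketched in Python. MRTG expects four lines from such a script: the two numbers, then an uptime string, then a name for the target. The function names, the RAM figures and the host name below are all made up for the example; only the 60 MB per request comes from the text.

```python
def mrtg_lines(first, second, uptime, hostname):
    """Return the four lines MRTG reads from an external script:
    first value, second value, uptime string, target name."""
    return "\n".join([str(int(first)), str(int(second)), uptime, hostname])


def max_concurrent_requests(total_ram_mb, reserved_mb, per_request_mb=60):
    """Estimate how many requests fit in RAM before swapping would start.

    reserved_mb is whatever is set aside for everything that is not
    Apache (MySQL, the OS, and so on); this is an assumed figure.
    """
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return usable_mb // per_request_mb


if __name__ == "__main__":
    # Hypothetical numbers: 1024 MB of RAM with 256 MB reserved,
    # and 7 requests currently being handled.
    ceiling = max_concurrent_requests(1024, 256)
    print(mrtg_lines(7, ceiling, "12 days", "example-host"))
```

Graphing the current request count alongside the estimated ceiling is just one way to use the two slots MRTG offers; the second number could equally be something else entirely.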

The second thing I track is MySQL activity, both read (SELECT) and write (UPDATE, INSERT, DELETE, etc.), which is the best indicator of server load. Looking at the year graph also shows us a few interesting events:

[Graph: db-year]

The blue line represents SELECT statements, and the green line UPDATE, INSERT, etc. The interesting events you can see on this graph are when I configured WP Super Cache, which made a massive difference to server load, and when I deployed the new site, which contained an annoying bug resulting in far too many INSERT statements (the Popular This Week plugin used to count a view whenever a post was displayed, but WordPress thinks a post is displayed whenever its headline is shown, resulting in an INSERT for every headline, rather than one per pageload). Spotting when this bug was fixed is also quite easy on that graph. Another interesting point is the increased number of queries since May, which I hope means that the momentum we gathered during Roses has continued through the rest of the term.
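For anyone wanting to graph something similar, here is a minimal sketch of splitting MySQL's statement counters into a read total and a write total. It assumes the input is the tab-separated text of SHOW GLOBAL STATUS (one "name, tab, value" pair per line), and it is not necessarily how the Nouse graphs are produced. The Com_select, Com_insert, Com_update, Com_delete and Com_replace counters are real MySQL status variables; which ones to count as writes, the function names and the sample figures are assumptions for the example.

```python
# Which counters count as reads and writes is an assumption for this sketch.
READ_COUNTERS = ("Com_select",)
WRITE_COUNTERS = ("Com_insert", "Com_update", "Com_delete", "Com_replace")


def parse_status(text):
    """Parse 'name<TAB>value' lines into a dict, skipping non-numeric rows."""
    status = {}
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[1].strip().isdigit():
            status[parts[0].strip()] = int(parts[1])
    return status


def read_write_totals(status):
    """Return (reads, writes) summed from the counters listed above."""
    reads = sum(status.get(name, 0) for name in READ_COUNTERS)
    writes = sum(status.get(name, 0) for name in WRITE_COUNTERS)
    return reads, writes


# Made-up sample input, including the header row that gets skipped.
SAMPLE = (
    "Variable_name\tValue\n"
    "Com_select\t1500\n"
    "Com_insert\t40\n"
    "Com_update\t25\n"
    "Com_delete\t5\n"
    "Uptime\t86400\n"
)

if __name__ == "__main__":
    reads, writes = read_write_totals(parse_status(SAMPLE))
    print(reads)
    print(writes)
```

These counters are running totals since the server started, so whatever draws the graph needs to turn them into a rate by taking the difference between successive readings.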

The final thing I track was developed for the live blog and shows how many people are currently connected. Here’s the graph from Friday:

[Graph: meteor-day]

The little blip the day before is a result of testing. At first it might seem that the numbers here are quite low, especially as Google Analytics recorded over 2000 views for the live blog on this day alone, but I suspect this reflects user behaviour: for something like Roses, most people just popped in to check and then moved on, coming back later, rather than leaving it running all the time.

Of course, statistics are useless without anything to compare them against. The Yorker no longer publish their web stats on their advertising page (we do, and I try to keep them up-to-date, especially considering the massive growth we’ve seen since the new site launched in February), but if they’ve not grown since they last published them (which seems unlikely, of course), our statistics compare favourably.

(Also, on an unrelated note, I will be filming some of the Apprentice York stuff for YSTV tonight, and one of the remaining teams includes Nouse Comment Editor Charlotte Hogarth-Jones and News Correspondent Holly Hyde, so I feel obliged to recommend that you all pop down to Derwent tonight to support Mr York 2009, where Nouse editor Henry James Foy will be competing against a few people, including regular Nouse commenter Dan Taylor.)