Downtime

As any webmaster will know, downtime is evil. It’s to be avoided at all costs. When it happens, you look bad, you lose visitors, and if you’re a commercial entity, advertising revenue.

The Nouse site was down for a few hours around midday today, and as much as I would like to say “it’s not my fault” and blame external factors, it was. Problems I did not foresee during what should have been a lightweight procedure essentially crippled our MySQL server until I figured out the cause and solved it.

Last night I started an overhaul of our databases. Due to a legacy issue with WordPress, our database stored UTF-8 text but identified it as Latin-1. This caused a huge number of internal problems – backups were double-encoded UTF-8 (which is horrible: every quote is replaced with three characters) and the databases actually held incorrect data, which made doing anything outside of WordPress complex, because MySQL would convert any incoming UTF-8 data to Latin-1 when we actually wanted UTF-8.
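To make the “three characters” effect concrete, here is a minimal Python sketch of what happens when UTF-8 bytes are treated as Latin-1. It is an illustration only, not the fix applied to the database. The function names are made up for this example, and it assumes MySQL's latin1 behaves like Windows-1252 (cp1252) for the characters involved.

```python
def misread_as_latin1(text):
    """Encode text as UTF-8, then wrongly decode those bytes as cp1252.

    Assumption: cp1252 stands in for MySQL's "latin1" here.
    """
    return text.encode("utf-8").decode("cp1252")


def repair(misread_text):
    """Undo the wrong decoding step and recover the original text."""
    return misread_text.encode("cp1252").decode("utf-8")


original = "It\u2019s"  # "It’s" with a curly apostrophe (U+2019)
broken = misread_as_latin1(original)

# The single curly apostrophe has turned into three characters.
print(len(original), len(broken))  # 4 6
print(broken)                      # Itâ€™s

# Reversing the mistake gives the original text back.
assert repair(broken) == original
```

Dumping the broken text as UTF-8 is what produces a double-encoded backup: each of those three stray characters is then encoded again.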

The solution was deceptively simple, and thanks to blogger Alex King, I solved this last night.

The motivation to solve this is that this character encoding issue was blocking me from introducing something I wanted to do, and mentioned a few entries ago – migrating the original Nouse site over to WordPress.

It’s this migration that took the site down today. Our central “wp_posts” table hits 213 MB in size, with the vast majority of that being fulltext search indexes for the search function and the related posts plugin. It’s these indexes that started to kill the site. An INSERT was taking up to 2 minutes, which was fine on the wp_posts table in our testing site because nothing else was trying to read it at the same time. What I did not realise was that when I ran the operation live on the server, it would start blocking normal wp_posts reads, tying up Apache processes and quickly hitting the connection limit.

It took me quite a long time to realise the cause of the long inserts. Once I did, a simple ALTER TABLE wp_posts DISABLE KEYS solved it, with the keys re-enabled at the end to restore fulltext searching. The change was dramatic: instead of running for an hour and only getting a sixth of the way through the conversion, the job had completed by the time I had switched back to the other terminal.
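As a rough sketch of that pattern, the Python below builds the order of statements for a bulk load: disable the keys, run the inserts, re-enable the keys. The helper name and the sample INSERT statements are hypothetical, and it only builds strings rather than talking to a server. As far as I know, DISABLE KEYS only affects non-unique indexes on MyISAM tables, which covers fulltext indexes.

```python
def bulk_load_plan(table, insert_statements):
    """Return the SQL statements for a bulk load, in execution order.

    Index maintenance is deferred until every row has been inserted,
    so the indexes are rebuilt once instead of on every INSERT.
    """
    plan = [f"ALTER TABLE {table} DISABLE KEYS;"]
    plan.extend(insert_statements)
    plan.append(f"ALTER TABLE {table} ENABLE KEYS;")
    return plan


# Placeholder inserts, standing in for the migrated articles.
inserts = [
    "INSERT INTO wp_posts (post_title) VALUES ('Archive article 1');",
    "INSERT INTO wp_posts (post_title) VALUES ('Archive article 2');",
]

for statement in bulk_load_plan("wp_posts", inserts):
    print(statement)
```

A real script would also want to make sure the ENABLE KEYS statement runs even if an insert fails part-way through, otherwise the indexes stay disabled.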

So, I apologise for the downtime earlier – I had a few people e-mail me to complain about the site being down, but think of the silver lining on the cloud – we now have an additional 800 or so articles in our archive, stretching back to 2003.

The articles back then are written in a different style, less “broadsheet”, more of an “us against them” mentality (Nouse wasn’t redesigned into its current form until 2005 – look at the old PDFs for comparison!), but for people like me who like seeing the way things were, and for posterity, they’re fascinating.


Lies, damn lies and statistics

Okay, so it’s been quite some time since my last blog, but I’ve got a very good excuse – I’m a finalist, and essay deadlines and looming exams have got in the way of doing Nouse things.

Nothing too exciting has happened technically with the website since Roses, with the exception of the EU Elections tab, which was quite a simple job with HTML and then abstracting some of the lightbox code from the YUSU Elections minisite to make it reusable.

One thing I’ve talked about in the past with non-Nouse people is how we monitor things, both the Nouse server, and the website itself.

The website itself has nothing particularly fancy. Google Analytics is the workhorse and what we use for our official statistics, but we also run AWStats, which analyses the log files hourly and makes the results available without the lag that GA has.

One tool I’ve seen that does look particularly interesting is Woopra (Tim Ngwena tweeted about this a while back), which offers live tracking and analysis. This would be particularly useful for watching the statistics on live events such as Roses or Elections, but I’m unsure if putting yet more JavaScript in the site is such a good idea, and Google Analytics is far too good a tool to let go, so at the moment I’m not implementing it.

So far, so normal. However, as Nouse runs on a Linode rather than a typical shared server, there is a lot more that we can monitor, and it is normally these tools that interest other people.

I’ve been using MRTG for quite some time now on my own servers, and the graphs are quite nifty.

MRTG is designed to map the traffic of a router, but can easily be extended by a script that provides two numbers (representing “in” and “out” bits per second, but the labels can easily be changed), a hostname and an uptime. There are a number of fairly normal things I track, such as CPU usage, RAM usage, HDD usage, etc, but there are a few things in particular that are more important for Nouse:

The first is the number of Apache requests being handled. This is crucial given how heavyweight WordPress is in memory use (just under 60 MB per request): we need to keep the number of concurrent requests below the point at which we start swapping and performance degrades past a useful point.
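The arithmetic behind that limit is simple enough to sketch in Python. The RAM figures below are made-up assumptions for illustration and not our server's real numbers. Only the roughly 60 MB per request comes from the measurement above, and the function name is hypothetical.

```python
def max_concurrent_requests(total_ram_mb, reserved_mb, per_request_mb=60):
    """Estimate how many requests fit in RAM before the server starts swapping.

    reserved_mb is memory set aside for everything that is not Apache
    (MySQL, the OS, caches). Both RAM figures are assumptions.
    """
    usable_mb = total_ram_mb - reserved_mb
    return max(usable_mb // per_request_mb, 0)


# Hypothetical box: 1024 MB of RAM, 300 MB kept back for MySQL and the OS.
print(max_concurrent_requests(1024, 300))  # 12
```

A number like this is the sort of ceiling you would keep Apache's worker limit (MaxClients, for example) under, and the graph shows how close to it the server actually gets.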

The second thing I track is MySQL activity – both read (SELECT) and write (UPDATE, INSERT, DELETE, etc) – which is the best indicator of server load. Looking at the year graph also shows us a few interesting events:

[Graph: db-year]

The blue line represents SELECT statements, and the green line UPDATE, INSERT, etc. The interesting events you can see on this graph are when I configured WP Super Cache, which made a massive difference to server load, and when I deployed the new site, which contained an annoying bug resulting in far too many INSERT statements (the Popular This Week plugin counted a view whenever a post was displayed, but WordPress thinks a post is displayed whenever its headline is shown, resulting in an INSERT for every headline rather than one per pageload). Spotting when this bug was fixed is also quite easy on that graph. Another interesting point is the increased number of queries since May, which I hope means that the momentum we gathered during Roses has continued through the rest of the term.
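For anyone wanting to feed similar read and write numbers into MRTG, here is a minimal Python sketch. It is not the script running on the Nouse server. It assumes the counters come from MySQL's SHOW GLOBAL STATUS (the Com_select, Com_insert, Com_update and Com_delete variables) and have already been read into a dictionary, and that MRTG expects an external script to print four lines: the first value, the second value, an uptime string and a target name. The function names, sample values and host name are invented.

```python
READ_COUNTERS = ("Com_select",)
WRITE_COUNTERS = ("Com_insert", "Com_update", "Com_delete")


def split_reads_writes(status):
    """Sum the read and write counters from a SHOW GLOBAL STATUS-style dict."""
    reads = sum(int(status.get(name, 0)) for name in READ_COUNTERS)
    writes = sum(int(status.get(name, 0)) for name in WRITE_COUNTERS)
    return reads, writes


def mrtg_output(first, second, uptime, name):
    """Format the four lines an MRTG external script prints."""
    return "\n".join([str(first), str(second), uptime, name])


# Invented sample values; a real script would query the server instead.
sample_status = {
    "Com_select": "120450",
    "Com_insert": "310",
    "Com_update": "95",
    "Com_delete": "12",
}

reads, writes = split_reads_writes(sample_status)
print(mrtg_output(reads, writes, "12 days", "example-host"))
# 120450
# 417
# 12 days
# example-host
```

These counters only ever increase, which suits MRTG, since by default it graphs the rate of change between polls. Other write statements (REPLACE, for example) could be added to WRITE_COUNTERS in the same way.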

The final thing I track was developed for the live blog and shows how many people are currently connected. Here’s the graph from Friday:

[Graph: meteor-day]

The little blip the day before is a result of testing. At first it might seem that the numbers here are quite low, especially as Google Analytics recorded over 2000 views for the live blog on this day alone, but I suspect this reflects how people behave during something like Roses: most just popped in to check, moved on and came back later, rather than leaving the page running all the time.

Of course, statistics are useless without anything to compare them against. The Yorker no longer publish their web stats on their advertising page (we do, and I try to keep them up-to-date, especially considering the massive growth we’ve seen since the new site launched in February), but if they’ve not grown since they last published them (which seems unlikely, of course), our statistics compare favourably.

(Also, on an unrelated note, I will be filming some of the Apprentice York stuff for YSTV tonight, and one of the remaining teams includes Nouse Comment Editor Charlotte Hogarth-Jones and News Correspondent Holly Hyde, so I feel obliged to recommend that you all pop down to Derwent tonight to support Mr York 2009, where Nouse editor Henry James Foy will be competing against a few people, including regular Nouse commenter Dan Taylor.)
