Shared MySQL improvements 23 Dec 09
Over the last few weeks we’ve been working on scaling our shared MySQL facilities. Until recently we’ve been able to run a single (albeit hefty) shared MySQL cluster, but growing demand has meant we’ve needed to scale this up considerably. The main cluster has had some performance problems recently and, while tuning and vertical scaling bought us some time (we more than doubled the resources of the main cluster), the real focus has been on horizontal scaling.
We’ve built a bunch of new master-master replicated pairs, and our backend systems now distribute customers between them on sign-up. We’ve also been contacting some customers and moving them to new clusters to relieve the pressure on the main cluster (we still recommend managed dedicated clusters for customers with heavy requirements – these will soon be available to purchase as additional products). We’re using Puppet to automate a lot of the setup of the new clusters and can deploy a new one, complete with monitoring and backups, very quickly.
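To give a rough idea of what distributing customers on sign-up could look like, here is a minimal sketch. It is not our actual backend code: the `Cluster` class, the `assign_cluster` function, the "fewest accounts wins" rule and the account counts are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    host: str          # address customers connect to
    accounts: int = 0  # hypothetical count of accounts placed on this cluster


def assign_cluster(clusters):
    """Pick the cluster with the fewest accounts and record the new account on it.

    Ties go to whichever cluster appears first in the list.
    """
    target = min(clusters, key=lambda c: c.accounts)
    target.accounts += 1
    return target


# Hypothetical state: the original cluster is busier than the new ones.
clusters = [
    Cluster("db01.mysql.vm.brightbox.net", accounts=3),
    Cluster("db02.mysql.vm.brightbox.net", accounts=1),
    Cluster("db03.mysql.vm.brightbox.net", accounts=1),
]
```

Calling `assign_cluster(clusters)` repeatedly fills the quieter clusters first. A real placement rule would more likely weigh load or disk usage than a plain account count.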
This work has almost quadrupled the shared MySQL resources within the space of a couple of weeks, and provides a simple platform to continue scaling indefinitely. The decentralisation also makes some aspects of administration easier, such as arranging downtime for maintenance.
The visible differences are small: rather than everyone connecting to one address, sqlreadwrite.brightbox.net, each account now needs to use the address shown in the control panel. The old sqlreadwrite.brightbox.net has become db01.mysql.vm.brightbox.net (the old name will of course continue to work indefinitely), and the new clusters are at db02.mysql.vm.brightbox.net, db03.mysql.vm.brightbox.net and so on. Our wiki documentation has been updated to reflect this. Customers on the old cluster don’t have to make any changes; it only really affects new customers and customers we’ve contacted to arrange a move.
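If your application hard-codes the database address, one way to make a future move painless is to read it from configuration instead. The sketch below assumes a hypothetical `MYSQL_HOST` environment variable (that name is ours for illustration, not something the platform sets) and falls back to the legacy name, which is only correct for accounts on the original cluster.

```python
import os

# Legacy address; it keeps working, but only points at the original cluster.
LEGACY_HOST = "sqlreadwrite.brightbox.net"


def mysql_host(env=None):
    """Return the MySQL address to connect to.

    Reads the hypothetical MYSQL_HOST variable, which should hold the
    address shown in the control panel, and falls back to the legacy name.
    """
    if env is None:
        env = os.environ
    return env.get("MYSQL_HOST", LEGACY_HOST)
```

For example, `mysql_host({"MYSQL_HOST": "db02.mysql.vm.brightbox.net"})` returns the db02 address, so moving cluster becomes a configuration change rather than a code change.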
We’ve also been working on improving the slow query logger to provide more useful results. Instead of reporting every slow query ever logged, it produces an intelligent summary of the week’s queries. So when you see a slow query in the control panel, it has shown up repeatedly and very likely needs attention, as opposed to a query that just happened to take longer than usual due to load on the cluster. We’ll be rolling this work out just after Christmas.
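As an illustration of the general idea (not the logger’s actual implementation), the sketch below groups a week’s slow queries by a normalised "fingerprint" and keeps only those that recur. The `fingerprint` and `summarise` functions, the regular expressions and the `min_count` threshold are all assumptions chosen for the example.

```python
import re
from collections import Counter


def fingerprint(query):
    """Reduce a query to its shape by replacing literal values with '?'."""
    q = query.strip().lower()
    q = re.sub(r"'[^']*'", "?", q)  # quoted string literals
    q = re.sub(r"\b\d+\b", "?", q)  # standalone numeric literals
    q = re.sub(r"\s+", " ", q)      # collapse runs of whitespace
    return q


def summarise(slow_queries, min_count=3):
    """Return (fingerprint, count) pairs seen at least min_count times,
    most frequent first."""
    counts = Counter(fingerprint(q) for q in slow_queries)
    return [(fp, n) for fp, n in counts.most_common() if n >= min_count]
```

With this approach a query that was slow once during a busy spell drops out of the summary, while the same query shape appearing again and again with different values is counted as one recurring problem.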
This work represents a big investment in our shared MySQL platform, which we know is invaluable to a lot of our customers, and allows us to keep growing without sacrificing performance.