Tagged: server admin.

Amazon AWS Elastic Load Balancing and MySQL

I’m writing this post in response to a problem I experienced recently while putting MySQL behind an Amazon Elastic Load Balancer (ELB). Basically, the MySQL server stopped accepting connections made through the ELB, and sent the client an error to the effect of:

ERROR 1129 (00000): Host ‘elb-hostname.compute-1.internal'
is blocked because of many connection errors; unblock with
'mysqladmin flush-hosts'

At first, it didn’t go off in my head – but the reason this error is being thrown, is because of the default ELB health-check configuration.

The Problem

By default, when you create a new ELB, it configures the health-checker to open a TCP connection to the instance-port (which you define upon creation of the ELB) for each instance you register on the ELB. In the case of MySQL, this is port 3306. At first thought, this should work perfectly. If MySQL dies, the health-check will fail, the ELB will take that server out of the server pool, and life will go on. Unfortunately, this is not the case.

The ELB does not know how to speak MySQL, nor does it know (or even care) that your instances are running MySQL and that it is forwarding MySQL traffic. When the ELB opens the TCP connection to port 3306, the request is successful (MySQL will always open a connection to the host – you just don’t get access to the server until you provide valid credentials), but because the ELB doesn’t speak MySQL it doesn’t know how to terminate that TCP connection correctly. MySQL recognizes this as a connection error, and after several of them it will stop accepting connections from this host. This in turn kills every application that tries to connect to the MySQL server pool via the ELB.

The Solution

Luckily, the solution to this problem is really easy.

First, I’ll cover the easy-yet-not “ideal”-or-“suggested” way. You can simply configure a new health-check for the server pool that, instead of checking port 3306 (MySQL), checks port 22 (SSH). The command to do this looks similar to:

elb-configure-healthcheck --healthy-threshold 5 \
--interval 15 \
--target TCP:22 \
--timeout 2 \
--unhealthy-threshold 3

This will configure the ELB to check each instance registered to it every 15 seconds by opening a TCP connection to port 22. If the connection is not successful within two seconds, it counts that as a failed connection. After three failed connections, the host is considered “OutOfService,” and is removed from the distribution rotation. Even while the host is not being selected to receive traffic, the ELB continues to check the host every 15 seconds. If it should receive five consecutive successful connections, the host is considered “InService” and is re-added to the distribution rotation.

The reason this is not “ideal” or “suggested” is because, if your MySQL daemon crashes or becomes unavailable for any reason other than your server going down; the ELB will not recognize the service as unavailable – and will continue to try and use the server in it’s pool. This will obviously result in application failures whenever this server gets selected to host the DB connection for a specific request.

A more dependable solution would involve some sort of script which responds via HTTP or TCP connection to the health-checker with the appropriate responses based on the actual status of your MySQL service. There are literally hundreds of ways to implement this, all with their own advantages and disadvantages. One, simple-ish way, would be to install apache + php. You can set up a single host on apache, which loads a php script that tries to connect to MySQL. If the connection fails return a 500 error, and if it’s successful return a 200 error (200 OK means we’re all good – to the ELB, anything else is considered a failure). This isn’t the most elegant solution – but it’s quick, and easy.

I hope this helps someone, I know I spent a few hours googling around trying to understand what was actually happening only to find two posts which were relevant to my search. Those two links are:

09:55 pm, by jimsc  Comments