Monday, January 18, 2016

SSD Tuning for Linux Servers

I got a call from a client that was having issues with really slow database queries. Part of the solution was to upgrade to the latest version of Percona Server 5.6, but while we were in, we took the time to look into the tuning of the machine.

The machine was a beast -- over 100 GB of RAM and a few SSDs behind a RAID controller -- but its reported I/O was dismal in production. UPDATE queries that should have taken only microseconds were taking minutes.

We optimized tables and removed duplicate indexes -- pretty typical stuff. During the table optimization, the SSD array was sustaining over 500 MB/s for writes, so I was reasonably sure the hardware wasn't failing given those numbers.
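For anyone following along, a sketch of the sort of commands involved -- the database and table names here are hypothetical, and pt-duplicate-key-checker comes from Percona Toolkit:

# rebuild a table and its indexes (hypothetical names)
mysql -e "OPTIMIZE TABLE mydb.orders"

# report duplicate and redundant indexes across a schema
pt-duplicate-key-checker --host=localhost --databases=mydb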

We found that it was running ext4 with write barriers enabled. I have found that MySQL performs poorly with ext4's barrier feature enabled, so we disabled it in fstab and rebooted.
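For reference, the change is just a mount option. A sketch of the relevant fstab line, where the device and mount point are placeholders:

# /etc/fstab -- device and mount point are placeholders
/dev/sda1  /var/lib/mysql  ext4  defaults,barrier=0  0 0

The same option can also be applied without a reboot via 'mount -o remount,barrier=0 /var/lib/mysql'.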

After that was all done, I looked into what we could do with respect to kernel tuning for this workload. I found that these settings should, in theory, help for this type of situation:

echo 0 > /sys/block/sda/queue/rotational     # tell the kernel this is non-rotational media
echo 0 > /sys/block/sda/queue/add_random     # stop sampling this device's I/O timings for entropy
echo 2 > /sys/block/sda/queue/rq_affinity    # complete requests on the CPU that issued them
echo noop > /sys/block/sda/queue/scheduler   # no reordering; let the RAID controller decide
echo 2 > /sys/block/sda/queue/nomerges       # disable all request merging
/sbin/blockdev --setra 0 /dev/sda            # disable read-ahead on the logical drive
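One caveat: writes to /sys don't survive a reboot. A udev rule is one way to make them stick -- a minimal sketch, where the KERNEL match is an assumption you would adjust to your own device:

# /etc/udev/rules.d/60-ssd-tuning.rules
ACTION=="add|change", KERNEL=="sda", ATTR{queue/rotational}="0", ATTR{queue/add_random}="0", ATTR{queue/rq_affinity}="2"
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="noop", ATTR{queue/nomerges}="2", ATTR{queue/read_ahead_kb}="0"

Setting read_ahead_kb to 0 covers what the blockdev --setra call does above.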

Given that it is an array of SSDs attached to a RAID controller card, I went with the idea that the kernel should just let MySQL and the RAID controller do as much of the work as possible and stop second-guessing them. This is why the scheduler is set to 'noop' and the read-ahead on this logical drive is set to 0.
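Both are easy to verify after the fact:

cat /sys/block/sda/queue/scheduler   # the active scheduler is shown in brackets, e.g. [noop]
/sbin/blockdev --getra /dev/sda      # should report 0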

I found that the OS was not able to identify the drive as an array of SSDs, so we had to set the 'rotational' flag to 0 ourselves, which is supposed to trigger certain SSD optimizations within the kernel.
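An easy way to see how the kernel classified the device -- lsblk's ROTA column reads this same flag:

lsblk -d -o NAME,ROTA /dev/sda   # ROTA 1 = rotational, 0 = non-rotational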

Next, the 'add_random' setting is disabled so the kernel stops collecting entropy from this device. Since it is an SSD, I would suspect that the entropy it provides would be poor anyway, and sampling it adds a small amount of overhead to each I/O. The node doesn't do much crypto work either, so this was an obvious choice.
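If you want to confirm the box isn't starved for entropy afterwards, the kernel reports its estimate of the pool directly:

cat /proc/sys/kernel/random/entropy_avail   # current entropy estimate, in bits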

Alas, the client was keen on getting this node back into production, so I was not able to do any real benchmarking of these configuration changes. In the end, though, we improved the situation considerably, and the client was happy about that.
