Thursday, December 29, 2016

Slow MySQL Replication with MySQL 5.6

I can't believe that this is not common knowledge... MySQL 5.6 (and even Percona's 5.6 XTRADB Server, can have poor replication performance.

We built a beefy slave machine for the sole purpose of backing up production database.  We had the high hopes of keeping them in sync ~100% of the time.  With the replication performance present in MySQL 5.6, that idea seemed to be impossible at first.

The server itself had 128G of RAM and was using a Samsung EVO 950 Pro drive in the M2 slot for the MySQL data directory. Replication was completely CPU bound with little disk IO.   In fact, the amount of disk IO would not have even choked old IDE ATA drives.

Turns out that a Facebook engineer blogged about it in Dec 2013.  His conclusion was that MySQL 5.6 was unsuitable for Facebook's replication needs and that they could use MySQL 5.1 instead as he showed that MySQL 5.1's replication performance is much better than 5.6.

Switching to MySQL 5.1 was out of the question so  I was left with a bit of a problem...  I didn't like the idea of replication running from a MySQL 5.6 Master to MySQL 5.1 slave.  I gave MySQL 5.7 a go but I found that was unstable in this configuration.  I had two trial upgrades of the master's database go awry and a few times in production/testing with the end user I had weird user account issues crop up.

I ended up filtering some of the tables out  with the 'replicate_ignore_table' option.  I picked tables that received a high number of writes but where of little value for business purposes or reporting.

Further tuning of my.cnf helped.  Below are parts of this setup's MySQL configuration:


lower_case_table_names = 1
skip_name_resolve
innodb_file_per_table=1
binlog_format=row
default_storage_engine = innodb
max_allowed_packet = 64M
binlog_cache_size = 1M
max_heap_table_size = 1024M
sort_buffer_size = 256M
join_buffer_size = 256M
tmp_table_size = 8192M
innodb_doublewrite=0
innodb_support_xa=ON
innodb_checksum_algorithm=none
innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_buffer_pool_instances=8
innodb_buffer_pool_size = 64G
innodb_data_file_path = ibdata1:10M:autoextend
innodb_read_io_threads=64
innodb_write_io_threads=64
innodb_io_capacity=2000
innodb_io_capacity_max=2500
innodb_thread_concurrency = 0
innodb_flush_log_at_trx_commit = 0
innodb_log_buffer_size = 32M
innodb_log_files_in_group = 3
innodb_max_dirty_pages_pct = 90
innodb_flush_method=O_DSYNC
innodb_lock_wait_timeout = 120
innodb_flush_neighbors=0
innodb_log_file_size = 4G
innodb_lru_scan_depth=2500
innodb_open_files=65536
innodb_print_all_deadlocks

sync_binlog=0
sync_master_info=0
sync_relay_log=0
sync_relay_log_info=0
relay_log=relay-bin
relay_log_index=relay-bin.index
master_info_repository=FILE
relay_log_info_repository=FILE
relay_log_recovery=0

Please only use this for reference and read up on each of these options before enabling them on your production servers.  If you have a test server that you can replicate the same work load, even better. Some of the options enabled here have some serious side effects that in the event of a crash, may corrupt the database.

No comments: