Posted on 2013-1-18 22:14:35
Jan 18, 2013 2:43:13 AM
Re: Degraded database performance
The good news is that over the past few days we have seen the back end server daemons catch up and work has been flowing steadily to the users.
The bad news is that a number of our backend processes that read large numbers of records for reports or for large batch updates of information have been performing worse and worse. We have identified the continuing cause of this issue.
I've included some articles below that help explain the issue.
Briefly:
MySQL maintains an 'undo log' that allows multiple concurrent transactions to proceed with different isolation levels. Adding records to this undo log is done as part of ongoing transactions. However, cleaning up the 'undo log' (known as purge) is performed on a delayed basis, when the server has time and is able to perform the cleanup. If the server is under heavy load, then this delay can be very long and the 'history list length' can become very large.
By default, the purge process is performed as part of MySQL's 'master thread'. This means that the purge process competes with other activities for time to perform purge operations.
While the undo log is small and can remain in memory, the purge process can operate quickly. However, if the process starts to fall behind, the log can grow large and eventually it has to be moved to disk. Once this happens, the process takes considerably longer and is likely to fall further behind.
Due to the nature of the undo log, its size directly impacts the performance of different types of queries. As it gets longer, the performance of these queries becomes slower and slower.
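For readers who want to watch this on their own server: the 'history list length' is reported in the TRANSACTIONS section of the `SHOW ENGINE INNODB STATUS` output. The following is a minimal Python sketch that pulls the number out of that text. The function name and the sample excerpt are hypothetical and only illustrate the shape of the output; this is not the tooling the project itself uses.

```python
import re


def history_list_length(status_text):
    """Return the InnoDB history list length found in status text, or None."""
    match = re.search(r"History list length\s+(\d+)", status_text)
    return int(match.group(1)) if match else None


# Hypothetical excerpt, shaped like the TRANSACTIONS section of
# SHOW ENGINE INNODB STATUS. The values are illustrative only.
sample = """\
------------
TRANSACTIONS
------------
Trx id counter 2F3A9C10
Purge done for trx's n:o < 2E00A1B2 undo n:o < 0
History list length 51000000
"""

print(history_list_length(sample))
```

Sampling this value at regular intervals shows whether purge is keeping up: a number that keeps rising means the undo log is growing faster than it is being cleaned.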
At this time, our undo log contains over 51 million entries. It is significantly behind and it is falling further behind daily. The database averages about 3,500 transactions per second. The undo log stores entries only for transactions that modify the database, which are about half of our transactions. This means that the undo log is over 8 hours behind.
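The 8-hour figure can be checked with a little arithmetic. This is a rough Python sketch that assumes, as the estimate above appears to, that each undo log entry corresponds to one modifying transaction; the function name is made up for illustration.

```python
def purge_backlog_hours(history_entries, transactions_per_second, write_fraction):
    """Estimate how many hours of write traffic the undo log backlog represents."""
    # Only transactions that modify data add entries to the undo log.
    writes_per_second = transactions_per_second * write_fraction
    return history_entries / writes_per_second / 3600


# Figures from the post: 51 million entries, about 3,500 transactions
# per second, about half of which modify the database.
hours = purge_backlog_hours(51_000_000, 3_500, 0.5)
print(f"{hours:.1f} hours")  # prints "8.1 hours"
```

That is about 1,750 modifying transactions per second, so 51 million entries represent roughly 29,000 seconds of work, or just over 8 hours.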
Yesterday and this morning we made a number of changes to the database in order to improve the performance of the purge process (including using the new MySQL 5.5 option to run the purge process in its own thread). Unfortunately, with the current size of the undo log and it not residing entirely within memory, these additional options are not able to allow the process to catch up during normal operations.
As a result, we are going to have to take unusual action in order to clear out the undo log. Specifically, we are going to have to stop all access to the database, take a complete backup, delete the existing database, and then restore the database from backup. This is the process that we used when we migrated from MySQL 5.1 to MySQL 5.5. Unfortunately, we estimate that with the current database size, this outage could take up to 24 hours. I will be posting details about that outage shortly.
This will restore the database to its normal behavior. We have high confidence that the changes that we have put in place this week and last week will allow the purge process to keep up with the database transactions. However, as we plan to continually recruit additional volunteers to help us grow bigger, we are examining more substantive changes to accommodate this growth. These changes include repacking workunits with various sizes in order to reduce the total number of results per day, migrating to MySQL 5.6 when it is released later this year, migrating to Percona Server, and adding a replica database for read-only transactions (reports, backups, the result status page on the website, etc.). Any one of these options will ensure that we do not face this issue again in the future. We need to investigate further against our long-term growth plans to make the proper changes.
We appreciate the patience you have shown while we investigate and we look forward to returning to normal operations soon.
http://www.pythian.com/news/3257 ... mysql-history-list/
https://mysqlquicksand.wordpress ... ade-blues-part-one/
http://www.mysqlperformanceblog. ... -innodb-tablespace/
http://dev.mysql.com/doc/refman/5.5/en/glossary.html#glos_purge
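Summary:
The bad news: database operations are so frequent that the database often locks up.
We plan to rebuild the database and see whether that improves things. The outage will last about 24 hours.
In the future, once MySQL 5.6 is released at the end of the year, we will upgrade as soon as possible and migrate the database to Percona. Read-only operations will also be moved to a mirror database to reduce deadlocks during table writes.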