遇到经典的MDL等待场景时怎么分析和处理。 本文节选自叶金荣有赞《乱弹MySQL》专栏。
MySQL从5.5版本开始,新增元数据锁,也就是metadata lock,简称MDL锁。
MDL锁的主要目是保护表元数据并行请求过程中不被修改,例如会话1在执行SELECT查询数据,会话2要新增一列,这时第二个会话会被阻塞,以免造成会话1前后看到的表结构不一致。
当然了,MDL后来被扩展了,并不只是针对表(TABLE)对象,也包含库(SCHEMA)、表空间(TABLESPACE)、存储程序(过程、函数、触发器、定时器)等对象,也受到MDL的保护。此外,MDL锁也会阻塞全局 COMMIT 行为,比如加了FTWRL(FLUSH TABLE WITH READ LOCK)后,就会持有一个全局的 COMMIT 共享锁,此时会阻止其他会话的事务提交。
我们从 performance_schema.metadata_lock 就可以看到MDL的详细情况,从MySQL 5.7版本开始,还可以从 sys.schema_table_lock_waits 查看MDL阻塞等待的情况。要特别注意的是,MDL锁等待超时阈值由选项 lock_wait_timeout 控制,该选项默认值是 31536000秒,也就是 一年、一年、一年(重要的话重复三遍),建议调低,比如改成5-10分钟,建议最长不超过1小时(想想,这种MDL等待超过1小时还不报警的话,DBA也该下岗了吧)。
另外,想要在PFS(performance_schema)和 sys schema中能看到MDL详情的话,需要先启用相关的设置:(横屏观看)
[root@yejr.me]> use performance_schema;[root@yejr.me]> UPDATE setup_consumers SET ENABLED = 'YES' WHERE NAME ='global_instrumentation';[root@yejr.me]> UPDATE setup_instruments SET ENABLED = 'YES' WHERE NAME =‘wait/lock/metadata/sql/mdl';
然后就能查看当前是否有MDL锁了:(横屏观看)
# session1执行一个未结束的只读事务[root@yejr.me]> begin; select * from test.t1 limit 1;# session2 查看MDL详情[root@yejr.me]> SELECT * FROM metadata_locks\G*************************** 1. row *************************** OBJECT_TYPE: TABLE OBJECT_SCHEMA: test OBJECT_NAME: t1 COLUMN_NAME: NULLOBJECT_INSTANCE_BEGIN: 140334142005184 LOCK_TYPE: SHARED_READ -- 该MDL请求类型为共享读 LOCK_DURATION: TRANSACTION -- 该MDL请求处于事务级别 LOCK_STATUS: GRANTED -- 该MDL请求已获得 SOURCE: sql_parse.cc:5929 OWNER_THREAD_ID: 1892 -- 线程ID OWNER_EVENT_ID: 13
已有各路大神对MDL做了详细解析,所以本文不再赘述,我更喜欢写点接地气的。所以我们共同分析一个非常经典的MDL等待案例,借此分析遇到这种问题时,该怎么分析和判断,以及如何应对处理。(横屏观看)
session1 | session2 | session3 |
---|---|---|
begin; select * from t1 limit 1; | ||
alter table t1 add cx int; # 此时该请求被阻塞 | ||
select * from t1 limit 1; # 此时该请求被阻塞 |
这时执行 SHOW PROCESSLIST
的话,能看到下面这样的现场(删除了部分无用输出列)(横屏观看)
+------+------+---------------------------------+--------------------------+| Id | Time | State | Info |+------+------+---------------------------------+--------------------------+| 1853 | 0 | starting | show processlist || 1854 | 134 | Waiting for table metadata lock | alter table t1 add cx int|| 1855 | 83 | Waiting for table metadata lock | select * from t1 limit 1 |+------+------+---------------------------------+--------------------------+
如果只看现场,我们是没办法分析出到底哪个线程导致的MDL阻塞,这正是MySQL 5.7之前版本的痛苦之处,遇到MDL等待,只能靠DBA的经验、感觉去分析,还不一定完全可靠。
但是,5.7版本之后,我们就可以用PFS和sys schema进行跟踪分析了:(横屏观看)
# 1、查看当前的MDL请求状态(删除部分信息)[root@yejr.me]> select * from performance_schema.metadata_locks;+-------------+-----------+---------------------+---------------+-------------+------+| OBJECT_TYPE | OBJECT_NAM| LOCK_TYPE | LOCK_DURATION | LOCK_STATUS | TID |+-------------+-----------+---------------------+---------------+-------------+------+| TABLE | t1 | SHARED_READ | TRANSACTION | GRANTED | 1892 || GLOBAL | NULL | INTENTION_EXCLUSIVE | STATEMENT | GRANTED | 1893 || SCHEMA | NULL | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | 1893 || TABLE | t1 | SHARED_UPGRADABLE | TRANSACTION | GRANTED | 1893 || BACKUP LOCK | NULL | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | 1893 || TABLESPACE | test/t1 | INTENTION_EXCLUSIVE | TRANSACTION | GRANTED | 1893 || TABLE | #sql-e953_| EXCLUSIVE | STATEMENT | GRANTED | 1893 || TABLE | t1 | EXCLUSIVE | TRANSACTION | PENDING | 1893 || TABLE | t1 | SHARED_READ | TRANSACTION | PENDING | 1894 |+-------------+-----------+---------------------+---------------+-------------+------+# 2、查看当前的MDL等待状态[root@yejr.me]> select * from sys.schema_table_lock_waits\G*************************** 1. row *************************** object_schema: test object_name: t1 waiting_thread_id: 1893 -- 等待的线程ID waiting_pid: 1854 -- 等待的连接PID waiting_account: root@localhost waiting_lock_type: EXCLUSIVE -- 等待的锁类型 waiting_lock_duration: TRANSACTION waiting_query: alter table t1 drop cx waiting_query_secs: 134 -- 锁等待时长 waiting_query_rows_affected: 0 waiting_query_rows_examined: 0 blocking_thread_id: 1892 -- 这里的线程ID值对应上面的TID blocking_pid: 1853 -- 连接PID,对应下方的trx_mysql_thread_id blocking_account: root@localhost blocking_lock_type: SHARED_READ -- 阻塞的锁类型 blocking_lock_duration: TRANSACTION sql_kill_blocking_query: KILL QUERY 1853sql_kill_blocking_connection: KILL 1853*************************** 2. row *************************** object_schema: test object_name: t1 waiting_thread_id: 1894 waiting_pid: 1855 waiting_account: root@localhost waiting_lock_type: SHARED_READ waiting_lock_duration: TRANSACTION waiting_query: select * from t1 limit 1 waiting_query_secs: 83 waiting_query_rows_affected: 0 waiting_query_rows_examined: 0 blocking_thread_id: 1892 blocking_pid: 1853 blocking_account: root@localhost blocking_lock_type: SHARED_READ blocking_lock_duration: TRANSACTION sql_kill_blocking_query: KILL QUERY 1853sql_kill_blocking_connection: KILL 1853*************************** 3. row *************************** object_schema: test object_name: t1 waiting_thread_id: 1893 waiting_pid: 1854 waiting_account: root@localhost waiting_lock_type: EXCLUSIVE waiting_lock_duration: TRANSACTION waiting_query: alter table t1 drop cx waiting_query_secs: 83 waiting_query_rows_affected: 0 waiting_query_rows_examined: 0 blocking_thread_id: 1893 blocking_pid: 1854 blocking_account: root@localhost blocking_lock_type: SHARED_UPGRADABLE blocking_lock_duration: TRANSACTION sql_kill_blocking_query: KILL QUERY 1854sql_kill_blocking_connection: KILL 1854*************************** 4. row *************************** object_schema: test object_name: t1 waiting_thread_id: 1894 waiting_pid: 1855 waiting_account: root@localhost waiting_lock_type: SHARED_READ waiting_lock_duration: TRANSACTION waiting_query: select * from t1 limit 1 waiting_query_secs: 83 waiting_query_rows_affected: 0 waiting_query_rows_examined: 0 blocking_thread_id: 1893 blocking_pid: 1854 blocking_account: root@localhost blocking_lock_type: SHARED_UPGRADABLE blocking_lock_duration: TRANSACTION sql_kill_blocking_query: KILL QUERY 1854sql_kill_blocking_connection: KILL 1854# 3、顺便查看当前是否有活跃未提交的事务,通常持续时间最久的事务也是引起大面积MDL锁阻塞的根源[root@yejr.me]> select trx_id,trx_state,time_to_sec(timediff(now(), trx_started)) as trx_active_secs, trx_mysql_thread_id as pid, trx_query from information_schema.innodb_trx order by trx_active_secs desc \G*************************** 1. row *************************** trx_id: 281479749621520 trx_state: RUNNINGtrx_active_secs: 226 pid: 1853 trx_query: select trx_id,trx_state...
上述测试的环境是:
[root@yejr.me]>...Server version: 8.0.16 MySQL Community Server - GPL...
从上面的结果可分析得到以下结论:
想要让session2和3都不被阻塞,只需要让session1上持有的锁释放即可
KILL QUERY 1853
,但实际上是不管用的,因为PID=1853中导致MDL等待的SQL已经执行结束,只是事务还没提交而已,因此正确的方法是执行 KILL 1853
彻底杀掉该连接,回滚相应的事务,释放MDL锁MySQL Reference Manual, 8.11.4 Metadata Locking,http://t.cn/AiY94bO6
MySQL · 源码分析 · 常用SQL语句的MDL加锁源码分析,http://t.cn/AiY94BBp
最后,欢迎扫码订阅《乱弹MySQL》专栏,快人一步获取
信息加载中,请等待
微信客服(速回)
微信客服(慢回)