原文丨https://www.modb.pro/db/7440 (复制链接至浏览器,即可查看)
作者丨杨柳
本文为原创文章,如有转载,请标明出处
导读:数据库表空间扩容是我们在运维过程中经常需要做的事情,常用的方法有增加表空间数据文件或resize数据文件等,本文分享一起使用resize datafile扩容表空间时偶遇enq: TX - index contention引起系统异常的特殊案例,供大家参考。
问题描述
某客户生产系统核心库在2019-08-21 23:50:00左右出现了业务告警,应用无法连接的情况。
查看故障期间数据库的告警日志,发现在23:49:56,数据库进行了数据库文件扩容操作。
Wed Aug 21 23:49:56 2019
ALTER DATABASE DATAFILE xxxxxxx RESIZE 100G
Wed Aug 21 23:51:38 2019
Completed: ALTER DATABASE DATAFILE xxxxxxx RESIZE 100G
Wed Aug 21 23:52:31 2019
ALTER DATABASE DATAFILE xxxxxxx RESIZE 950G
Thu Aug 22 00:05:23 2019
Completed: ALTER DATABASE DATAFILE xxxxxxx RESIZE 950G
Thu Aug 22 00:15:18 2019
查看故障期间数据库的监听日志,发现从23:58:44到00:05:23这段时间内,数据库没有新的JDBC连接进来。
专家解答
查看故障期间的ash信息,发现在应用异常的时间段内,SQL_ID为g64z3fw1nffvj的insert操作产生了大量的enq: TX - index contention等待,并且造成这个等待事件的原因就是alter database操作和它本身的insert操作,在resize的执行过程中enq: TX - index contention等待事件一直升高,resize操作成功后(即2019-08-22 00:05:00左右)等待事件恢复正常。
查看相关的trace文件信息:
*** 2019-08-21 23:57:14.907 =============================================================================== HANG ANALYSIS: instances (db_name.oracle_sid): xxxxx =============================================================================== Chains most likely to have caused the hang<<<<<<<<<<<<<<<<<<<<<<查看HANG住三个可能因素 [a] Chain 1 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention' Chain 1 Signature Hash: 0xbeca5ef7 [b] Chain 2 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention' Chain 2 Signature Hash: 0xbeca5ef7 [c] Chain 3 Signature: <not in a wait><='row cache lock' Chain 3 Signature Hash: 0xccefbfc0 =============================================================================== Non-intersecting chains: ------------------------------------------------------------------------------- Chain 1: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 2 (xxxxx) os id: 15343 process id: 67 session id: 2882 session serial #: 39285 } is waiting for 'enq: TX - index contention' with wait info: { p1: 'name|mode'=0x54580004 p2: 'usn<<16 | slot'=0x470000 p3: 'sequence'=0xa32a2d …… and is blocked by => Oracle session identified by: { instance: 2 (xxxxxxx) os id: 105251 process id: 79 session id: 3400 <<<<<<<<<<<<<<<<<<<<<<chain 1被session_id=3400会话阻塞了 session serial #: 22853 } which is not in a wait: { last wait: 36.074155 sec ago blocking: 4 sessions wait history: 1. event: 'ASM file metadata operation' time waited: 0.000018 sec (last interval) time waited: 0.001953 sec (total) wait id: 94 p1: 'msgop'=0x21 p2: 'locn'=0x0 * time between wait #1 and #2: 0.000000 sec 2. event: 'KSV master wait' time waited: 0.001856 sec wait id: 96 * time between wait #2 and #3: 0.000000 sec 3. event: 'ASM file metadata operation' time waited: 0.000003 sec (last interval) time waited: 0.000079 sec (total) wait id: 94 p1: 'msgop'=0x21 p2: 'locn'=0x0 } Chain 1 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention' Chain 1 Signature Hash: 0xbeca5ef7 ------------------------------------------------------------------------------- =============================================================================== Intersecting chains: ------------------------------------------------------------------------------- Chain 2: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 2 (xxxxxxx) os id: 70061 process id: 69 session id: 2969 session serial #: 46127 } ……. and is blocked by 'instance: 2, os id: 83344, session id: 2840', which is a member of 'Chain 1'. <<<<<<<<<<<<<<<<<<<<<<被chain1阻塞了 Chain 2 Signature: <not in a wait><='buffer busy waits'<='enq: TX - index contention' Chain 2 Signature Hash: 0xbeca5ef7 ------------------------------------------------------------------------------- ------------------------------------------------------------------------------- Chain 3: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 2 (xxxxxx) os id: 102165 process id: 75 session id: 3227 session serial #: 3457 } is waiting for 'row cache lock' with wait info: { p1: 'cache id'=0x6 p2: 'mode'=0x0 p3: 'request'=0x3 time in wait: 2.020260 sec heur. time in wait: 1 min 59 sec timeout after: 0.979740 sec wait id: 360 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.000015 sec 1. event: 'row cache lock' time waited: 3.001099 sec wait id: 359 p1: 'cache id'=0x6 p2: 'mode'=0x0 p3: 'request'=0x3 * time between wait #1 and #2: 0.000015 sec 2. event: 'row cache lock' time waited: 3.000983 sec wait id: 358 p1: 'cache id'=0x6 p2: 'mode'=0x0 p3: 'request'=0x3 * time between wait #2 and #3: 0.000013 sec 3. event: 'row cache lock' time waited: 3.000978 sec wait id: 357 p1: 'cache id'=0x6 p2: 'mode'=0x0 p3: 'request'=0x3 } and is blocked by 'instance: 2, os id: 105251, session id: 3400'。 which is a member of 'Chain 1'. <<<<<<<<<<<<<<<<<<<<<<被session_id 3400阻塞了 Chain 3 Signature: <not in a wait><='row cache lock' Chain 3 Signature Hash: 0xccefbfc0 ------------------------------------------------------------------------------- =============================================================================== Extra information that will be dumped at higher levels: [level 4] : 1 node dumps -- [LEAF] [LEAF_NW] [level 5] : 4 node dumps -- [NO_WAIT] [INVOL_WT] [SINGLE_NODE] [NLEAF] [SINGLE_NODE_NW] State of ALL nodes [2839]/2/2840/17657/0x13d1a33000/83344/NLEAF/[3399] [2881]/2/2882/39285/0x13c1a48b38/15343/NLEAF/[2839] [2968]/2/2969/46127/0x13b1a69c08/70061/NLEAF/[2839] [3226]/2/3227/3457/0x13c1ade618/102165/NLEAF/[3399] [3399]/2/3400/22853/0x1391b4cad8/105251/LEAF_NW/ *** 2019-08-21 23:57:14.908 =============================================================================== END OF HANG ANALYSIS =============================================================================== *** 2019-08-21 23:58:53.765 Suspected Hangs in the System Root Chain Total Hang Hang Hang Inst Root #hung #hung Hang Hang Resolution ID Type Status Num Sess Sess Sess Conf Span Action ----- ---- -------- ---- ----- ----- ----- ------ ------ ------------------- 1 HANG VALID 2 3400 3 286 HIGH LOCAL Terminate Process <<<<<<<<<<<<<<<<<<< 系统阻塞的可能原因是session id为3400的本地进程引起的 inst# SessId Ser# OSPID Event ----- ------ ----- --------- ----- 2 2 19927 111623 enq: TX - index contention 2 2840 17657 83344 buffer busy waits 2 3400 22853 105251 not in wait *** 2019-08-21 23:58:53.766 HM: Short Stack of immediate waiter session ID 2840, OSPID 83344 of hang ID 1 Short stack dump: ksedsts() 465<-ksdxfstk() 32<-ksdxcb() 1927<-sspuser() 112<-__sighandler()<-semtimedop() 10<-skgpwwait() 178<-ksliwat() 2022<-kslwaitctx() 163<-kcbzwb() 1548<-kcbgcur() 8885<-ktfbhget() 254<-ktfbfsearch() 300<-ktfbtgex() 1240<-ktsxs_add() 2094<-ktspnr_next() 1115<-ktrsexec() 529<-ktspbmphwm() 1236<-ktspmvhwm() 53<-ktsp_bump_hwm() 212<-ktspgsp_main() 1941<-kdisnew() 279<-kdisnewle() 115<-kdisle() 4332<-kdiins0() 34743<-kdiinsp() 91<-kauxsin() 1784<-qesltcLoadIndexList() 922<-qesltcLoadIndexes() 55<-qerltcNoKdtBufferedInsRowCBK() 374<-qerltcSingleRowLoad() 279<-qerltcFetch() 379<-insexe() 691<-opiexe() 5632<-kpoal8() 2380<-opiodr() 917<-ttcpip() 2183<-opitsk() 1710<-opiino() 969<-opiodr() 917<-opidrv() 570<-sou2o() 103<-opimai_real() 133<-ssthrdmain() 265<-main() 201<-__libc_start_main() 253 HM: Short Stack of root session ID 3400, OSPID 105251 of hang ID 1 Short stack dump: ksedsts() 465<-ksdxfstk() 32<-ksdxcb() 1927<-sspuser() 112<-__sighandler()<-pwrite(
从上述trace文件可以看出系统HANG住的可能原因是session_id为 3400的会话引起的。根据查看dba_hist_active_sess_history视图可以看出session_id为 3400的会话就是alter database datafile resize的操作。
根据该故障现象,查询mos发现:Alter Tablespace Resize on Bigfile Tablespace Causes Sessions to Hang with ‘Enq: TX - contention’ and ‘Buffer busy waits’ (Doc ID 2089689.1) ,说明ORACLE 11.2.0.4及后续版本中存在该BUG,调整表空间大小时将阻塞表空间中的其他DDL操作。
问题解决
- 建议在数据库负载较低的时候进行数据文件扩容
- 建议每次表空间扩容的大小不要过大
读完本文,觉得不够?2019 数据技术嘉年华来啦!现场大咖云集,与你共畅数据的魅力。现在加入,尽享超低票价优惠,扫描下方二维码直接报名!
,