首先我們來看下朋友傳我的alert log:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43Sat May 24 18:29:27 2014 SMON: enabling tx recovery Sat May 24 18:29:27 2014 Database Characterset is ZHS16GBK Sat May 24 18:29:28 2014 FAST_START_MTTR_TARGET 300 is out of the valid MTTR range, use 540 instead. Sat May 24 18:31:36 2014 Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc: ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [51720], [6401], [], [], [], [] ......省略部分內容 Sat May 24 18:55:36 2014 Errors in file e:\oracle\admin\oracle\udump\oracle_ora_404.trc: ORA-00600: 內部錯誤代碼,參數: [12700], [18], [4246724], [2], [], [], [], [] ORACLE Instance oracle (pid = 6) - Error 600 encountered while recovering transaction (10, 38) on object 36. Sat May 24 18:58:13 2014 Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc: ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [] Sat May 24 19:09:55 2014 Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc: ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [51720], [6401], [], [], [], [] ORACLE Instance oracle (pid = 6) - Error 607 encountered while recovering transaction (10, 37) on object 3. Sat May 24 19:17:50 2014 Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc: ORA-00607: Internal error occurred while making a change to a data block ORA-00600: internal error code, arguments: [kcoapl_blkchk], [1], [51720], [6401], [], [], [], [] Recovery of Online Redo Log: Thread 1 Group 3 Seq 4 Reading mem 0 Mem# 0 errs 0: E:\ORACLE\ORADATA\ORACLE\REDO03.LOG Sat May 24 19:18:59 2014 Errors in file e:\oracle\admin\oracle\udump\oracle_ora_4072.trc: ORA-00600: 內部錯誤代碼,參數: [12700], [18], [4246724], [2], [], [], [], [] Sat May 24 19:29:35 2014 Errors in file e:\oracle\admin\oracle\bdump\oracle_smon_4044.trc: ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], [] Sat May 24 19:38:19 2014 Errors in file e:\oracle\admin\oracle\udump\oracle_ora_2296.trc: ORA-00600: 內部錯誤代碼,參數: [12700], [18], [4246724], [2], [], [], [], []
可以看到雖然該數據庫能正常open,但是不停的報ora-00600錯誤,這裡主要涉及到3個internal錯誤,分別為如下:
ORA-00600: internal error code, arguments: [kcoapl_blkchk]
ORA-00600: internal error code, arguments: [6006]
ORA-00600: 內部錯誤代碼,參數: [12700]
我們先來看第一個錯誤:[kcoapl_blkchk],從該錯誤來看,可以判斷是block存在問題,而又異常的block為:file 1 block 51720
我們可以看到open後smon進程在進行事務rollback的時候,回滾(10, 37) 事務失敗了,主要是操作對象object 3.
這裡可以簡單的解釋一下:(10,37) 中,標示回滾段編號,37標示slot編號。
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.8.0 - Production PL/SQL Release 9.2.0.8.0 - Production CORE 9.2.0.8.0 Production TNS for Linux: Version 9.2.0.8.0 - Production NLSRTL Version 9.2.0.8.0 - Production SQL> select owner,object_name,object_type from dba_objects where object_id=3; OWNER OBJECT_NAME OBJECT_TYPE ------------------------------ ------------------------- ------------------ SYS I_OBJ# INDEX
我們可以確認,在Oracle 9i的環境中,object id=3 是i_obj#這個index。
我們從trace中搜索下,看下這個51720 的block有問題是什麼問題 ?
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44buffer tsn: 0 rdba: 0x0040ca08 (1/51720) scn: 0x0000.0a2c991d seq: 0x01 flg: 0x06 tail: 0x991d0601 frmt: 0x02 chkval: 0x221e type: 0x06=trans data Hex dump of corrupt header 3 = CHKVAL 。。。。。 6BF8DFF0 00000000 00000000 00000000 991D0601 [................] Block Checking: DBA = 4246024, Block Type = KTB-managed data block **** actual rows locked by itl 2 = 1 != # in trans. header = 0 ---- end index block validation rechecking block failed with error code 6401 Disk Block image: buffer tsn: 0 rdba: 0x0040ca08 (1/51720) scn: 0x0000.0a2c991d seq: 0x01 flg: 0x06 tail: 0x991d0601 frmt: 0x02 chkval: 0x221e type: 0x06=trans data Block header dump: 0x0040ca08 Object id on Block? Y seg/obj: 0x3 csc: 0x00.a2c991b itc: 3 flg: O typ: 2 - INDEX fsl: 0 fnx: 0x0 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0002.002.00020b74 0x0080080c.0f56.01 CB-- 0 scn 0x0000.09cdb7c4 0x02 0x0009.007.00021f49 0x00800093.10c7.21 --U- 1 fsc 0x0000.0a2c991d 0x03 0x0009.023.00021f3d 0x00800093.10c7.1f --U- 1 fsc 0x0012.0a2c991c Leaf block dump =============== header address 104080500=0x6342474 kdxcolev 0 KDXCOLEV Flags = - - - kdxcolok 0 kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y kdxconco 1 kdxcosdc 0 kdxconro 160 kdxcofbo 356=0x164 kdxcofeo 3406=0xd4e kdxcoavs 5096 kdxlespl 0 kdxlende 1 kdxlenxt 0=0x0 kdxleprv 4246023=0x40ca07 kdxledsz 8 kdxlebksz 8012
從上面的錯誤可以看出,實際上這個Index block中有2個需要操作的ITL,其LCK都為1. 而Oracle認為這個block
這裡需要操作的ITL應該只有1個。 這應該就是掉電導致沒寫入更改的原因。
換句話,要處理這個壞塊,我們只需要將第3個ITL的信息改掉就行了。 通過bbed可以很容易的處理,修改狀態,LCK等信息即可。
下面我繼續看第2個ora-00600 [6006]錯誤。 我以前講過多次,對於Oracle ora-00600錯誤,後面第一個錯誤號的范圍
是4000~8000,其都跟Oracle事務有關系。 這裡也不例外。 從這裡的錯誤來看,Oracle smon進程在恢復事務(10, 38) on object 36
時出現異常,進而拋出該錯誤。
1* select owner,object_name,object_type from dba_objects where object_id=36 SQL> / OWNER OBJECT_NAME OBJECT_TYPE ------------------------------ ------------------------- ------------------ SYS I_OBJ1 INDEX SQL>
可以看到,9i中,object id=36 也是一個Index。 從這裡的alert log來看,無法得到導致ora-00600 6006錯誤是什麼block導致。
我們這裡搜索smon trace即可,怎麼搜索呢,很簡單,搜索 seg/obj: 0×24即可,我們可以搜到如下內容:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39----- end of leaf block dump ----- [1] Error during leaf key undo operation: 3 count=3 KTB Redo op: 0x04 ver: 0x01 op: L itl: xid: 0x000a.02b.00021f77 uba: 0x008000c9.1036.4d flg: C--- lkc: 0 scn: 0x0000.0a2c96d4 Dump kdilk : itl=2, kdxlkflg=0x1 sdc=77869776 indexid=0x4000d9 block=0x0040cf0f purge leaf row (6): 05 c4 02 0d 15 16 dump block being looked at now Block header dump: 0x0040cf0f Object id on Block? Y seg/obj: 0x24 csc: 0x00.a2ca278 itc: 3 flg: O typ: 2 - INDEX fsl: 0 fnx: 0x40cf10 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x000a.00c.0001fe24 0x00800272.0eee.01 CB-- 0 scn 0x0000.098b9ed9 0x02 0x000a.00d.00021f8b 0x008000ce.1036.37 C--- 0 scn 0x0000.0a2c9bfa 0x03 0x0004.01a.00021f16 0x00800964.104c.3c C--- 0 scn 0x0000.0a2c9c22 Leaf block dump =============== header address 1804271732=0x6b8b0074 kdxcolev 0 KDXCOLEV Flags = - - - kdxcolok 0 kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y kdxconco 1 kdxcosdc 0 kdxconro 223 kdxcofbo 482=0x1e2 kdxcofeo 3017=0xbc9 kdxcoavs 4411 kdxlespl 0 kdxlende 0 kdxlenxt 0=0x0 kdxleprv 4247310=0x40cf0e kdxledsz 6 kdxlebksz 8012
根據這裡的uba信息,我們還能搜索到回滾段的信息,如下;
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80index state cflags wrap# uel scn dba parent-xid nub stmt_num ------------------------------------------------------------------------------------------------ 0x00 9 0x00 0x21f89 0x0007 0x0000.0a2c9502 0x0080009c 0x0000.000.00000000 0x00000003 0x00000000 0x01 9 0x00 0x21f84 0x001e 0x0000.0a2c9542 0x0080009c 0x0000.000.00000000 0x00000001 0x00000000 。。。。。 0x24 9 0x00 0x21f82 0x0004 0x0000.0a2c949e 0x0080009a 0x0000.000.00000000 0x00000001 0x00000000 0x25 10 0x10 0x21f84 0x0001 0x0000.0a2c9771 0x008000ca 0x0000.000.00000000 0x00000001 0x00000000 0x26 10 0x90 0x21f84 0x0001 0x0000.0a2c976c 0x008000cc 0x0000.000.00000000 0x00000001 0x00000000 0x27 9 0x00 0x21f80 0xffff 0x0000.0a2dc56f 0x00000000 0x0000.000.00000000 0x00000000 0x00000000 0x28 9 0x00 0x21f87 0x0019 0x0000.0a2c976d 0x008000ca 0x0000.000.00000000 0x00000001 0x00000000 0x29 9 0x00 0x21f7d 0x000b 0x0000.0a2c976f 0x008000ca 0x0000.000.00000000 0x00000001 0x00000000 0x2a 9 0x00 0x21f7f 0x000f 0x0000.0a2c95c5 0x0080009c 0x0000.000.00000000 0x00000001 0x00000000 0x2b 9 0x00 0x21f77 0x0028 0x0000.0a2c96d4 0x008000cb 0x0000.000.00000000 0x00000002 0x00000000 0x2c 9 0x00 0x21f87 0x0018 0x0000.0a2c96a0 0x0080009e 0x0000.000.00000000 0x00000001 0x00000000 0x2d 9 0x00 0x21f74 0x0016 0x0000.0a2c9652 0x0080009e 0x0000.000.00000000 0x00000001 0x00000000 0x2e 9 0x00 0x21f7a 0x002a 0x0000.0a2c95b6 0x0080009c 0x0000.000.00000000 0x00000001 0x00000000 0x2f 9 0x00 0x21f85 0x001b 0x0000.0a2c9574 0x0080009c 0x0000.000.00000000 0x00000001 0x00000000 The buffer with tsn: 1 rdba: 0x00800099 (2/153) has already been dumped The buffer with tsn: 1 rdba: 0x00800099 (2/153) has already been dumped BH (0x6BBED3B8) file#: 2 rdba: 0x008000cc (2/204) class 36 ba: 0x6B8E4000 set: 3 dbwrid: 0 obj: -1 objn: 0 hash: [6c3ee6d0,6bbef05c] lru: [6b7ec6c8,6bbed578] ckptq: [NULL] fileq: [NULL] st: XCURRENT md: NULL rsop: 0x00000000 tch: 1 flags: gotten_in_current_mode LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0] buffer tsn: 1 rdba: 0x008000cc (2/204) scn: 0x0000.0a2c9856 seq: 0x01 flg: 0x04 tail: 0x98560201 frmt: 0x02 chkval: 0xb53d type: 0x02=KTU UNDO BLOCK ******************************************************************************** UNDO BLK: xid: 0x000a.026.00021f84 seq: 0x1036 cnt: 0x14 irb: 0x13 icl: 0x0 flg: 0x0000 Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x1f74 0x02 0x1ef8 0x03 0x1ea4 0x04 0x1e30 0x05 0x1db4 0x06 0x1d60 0x07 0x1cec 0x08 0x1c70 0x09 0x1c1c 0x0a 0x1ba8 0x0b 0x1b2c 0x0c 0x1ad8 0x0d 0x1a64 0x0e 0x19e8 0x0f 0x1994 0x10 0x1938 0x11 0x18ec 0x12 0x1884 0x13 0x1828 0x14 0x17bc 。。。。。。 。。。。。。 *----------------------------- * Rec #0x12 slt: 0x26 objn: 18(0x00000012) objd: 18 tblspc: 0(0x00000000) * Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- uba: 0x008000cc.1036.0d ctl max scn: 0x0000.0a2c930b prv tx scn: 0x0000.0a2c931e KDO undo record: KTB Redo op: 0x04 ver: 0x01 op: L itl: xid: 0x000a.02b.00021f77 uba: 0x008000c9.1036.4c flg: C--- lkc: 0 scn: 0x0000.0a2c96d4 KDO Op code: DRP row dependencies Disabled xtype: XA bdba: 0x0040ccc4 hdba: 0x00400079 itli: 2 ispac: 0 maxfr: 4863 tabn: 0 slot: 41(0x29) *----------------------------- * Rec #0x13 slt: 0x26 objn: 36(0x00000024) objd: 36 tblspc: 0(0x00000000) * Layer: 10 (Index) opc: 22 rci 0x12 Undo type: Regular undo Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000 *----------------------------- index undo for leaf key operations KTB Redo op: 0x04 ver: 0x01 op: L itl: xid: 0x000a.02b.00021f77 uba: 0x008000c9.1036.4d flg: C--- lkc: 0 scn: 0x0000.0a2c96d4 Dump kdilk : itl=2, kdxlkflg=0x1 sdc=77869776 indexid=0x4000d9 block=0x0040cf0f purge leaf row key <img class="wp-smiley" alt=":(" src=http://www.bkjia.com/uploads/allimg/140527/001H05C2-0.gif"> 6): 05 c4 02 0d 15 16 。。。。
從上面的信息回滾段頭的dump內容可以看出,該回滾段涉及2個活動事務,事務槽編號為0×25,0×26 即:37,38。 這和前面的錯誤是符合的.
我們可以清楚的看到,這裡的0×26的這個事務涉及的blockdump來看,LCK都是0,看上去沒啥異常,為什麼這個事務會有異常呢 ?
在Oracle中,smon 進行回滾操作,是以事務為單位進行的。對於undo而言,涉及到一個undo chain的結構。
關於Oracle undo chain,在我的Oracle特殊恢復課程裡面講過。
我們可以看出,0×26這個事務應該從0×13 這個record開始回滾,到0×12這個record這裡就結束。從信息來看似乎也沒有什麼不對的地方?
那為什麼這個事務會rollback失敗呢 ?
大家注意看undo record 0×13,0×12的XID 信息:xid: 0x000a.02b.00021f77
關於XID的結構,在我的Oracle特殊恢復課程裡面也有講解。第2部分其實表是ktuxe結構中的index編號。第3部分標示ktuxe中的wrap#。
從這裡看來,是Oracle沒來得及更改block中的信息。因此這裡我懷疑是undo有點問題,正常情況下,undo block中的xid的信息這裡
應該會更改為0x21f84,同時ktuxe中的事務狀態信息會更改,cflags的值也會更改為0×00. 這樣才標示一個事務commit結束。
最後我們來看下ORA-00600: 內部錯誤代碼,參數: [12700]這個錯誤。管喲12700錯誤,Oracle mos有一篇文檔:
ERROR: ORA-600 [12700] [a] [b] [c] VERSIONS: versions 6.0 to 9.2 DESCRIPTION: Oracle is trying to access a row using its ROWID, which has been obtained from an index. in Oracle 8.x and 9.x, it is ORA-600 [12700][a][b][c] , where Arg [a] dataobj# from sys.obj$ Arg [b] relative dba of the data block Arg [c] slot number of the row in the data block
根據上面的解釋來看下這個錯誤:
ORA-00600: 內部錯誤代碼,參數: [12700], [18], [4246724], [2], [], [], [], []
我們可以看出,問題出在obj#=18 這個對象上的block(dba地址4246724)上的第2個ITL。
可以通過dbms包可以將該dba地址進行轉換,我們來看下是什麼block :
? 1 2 3 4 5 6SQL> select dbms_utility.data_block_address_file(4246724) file_id, 2 dbms_utility.data_block_address_block(4246724) block_id from dual; FILE_ID BLOCK_ID ---------- ---------- 1 52420
我們搜索下trace,來看下該block的dump信息:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37BH (0x6B7EAD88) file#: 1 rdba: 0x0040ccc4 (1/52420) class 1 ba: 0x6B47C000 set: 3 dbwrid: 0 obj: 18 objn: 18 hash: [67e8aa14,6bfee6d0] lru: [6b7edcd0,6b7ee36c] LRU flags: hot_buffer ckptq: [NULL] fileq: [NULL] use: [67e3d1d0,67e3d1d0] wait: [NULL] st: XCURRENT md: SHR rsop: 0x00000000 tch: 6 LRBA: [0x0.0.0] HSCN: [0xffff.ffffffff] HSUB: [255] RRBA: [0x0.0.0] buffer tsn: 0 rdba: 0x0040ccc4 (1/52420) scn: 0x0000.0a2ca3e0 seq: 0x01 flg: 0x04 tail: 0xa3e00601 frmt: 0x02 chkval: 0x2d3b type: 0x06=trans data Block header dump: 0x0040ccc4 Object id on Block? Y seg/obj: 0x12 csc: 0x00.a2c9c1d itc: 2 flg: O typ: 1 - DATA fsl: 0 fnx: 0x40ccc2 ver: 0x01 Itl Xid Uba Flag Lck Scn/Fsc 0x01 0x0002.011.00021f3d 0x00800ace.1020.0a C--- 0 scn 0x0000.0a2c9c13 0x02 0x0004.01a.00021f16 0x00800964.104c.3b --U- 2 fsc 0x0096.0a2c9c22 data_block_dump,data header at 0x6b47c05c =============== tsiz: 0x1fa0 hsiz: 0x96 pbl: 0x6b47c05c bdba: 0x0040ccc4 76543210 flag=-------- ntab=1 nrow=66 frre=0 fsbo=0x96 fseo=0x6a2 avsp=0x1455 tosp=0x14ef 0xe:pti[0] nrow=66 offs=0 。。。。。
從dump來看,該block的第2個ITL 存在事務操作,鎖定了2行記錄. 這跟alert log的拋出的錯誤是符合的。
針對該錯誤,是Oracle讀取時發現index和table的數據不一致導致的。針對表obj$,我們可以來看下報錯的sql的執行計劃:
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24Plan Table -------- ------------------------------------------------------------------------------------------------------------------------- | Operation | Name | Rows | Bytes | Cost | TQ |IN-OUT| PQ Distrib |Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------- | SELECT STATEMENT | | 0 | 0 | 0 | | | | | | | SORT ORDER BY | | 0 | 0 | 0 | | | | | | | FILTER | | 0 | 0 | 0 | | | | | | | NESTED LOOPS | | 0 | 0 | 0 | | | | | | | NESTED LOOPS OUTER | | 0 | 0 | 0 | | | | | | | NESTED LOOPS | | 0 | 0 | 0 | | | | | | | NESTED LOOPS | | 0 | 0 | 0 | | | | | | | TABLE ACCESS FULL | USER$ | 0 | 0 | 0 | | | | | | | TABLE ACCESS BY INDEX R | OBJ$ | 0 | 0 | 0 | | | | | | | INDEX RANGE SCAN | I_OBJ2 | 0 | 0 | 0 | | | | | | | TABLE ACCESS BY INDEX RO | SYN$ | 0 | 0 | 0 | | | | | | | INDEX UNIQUE SCAN | I_SYN1 | 0 | 0 | 0 | | | | | | | TABLE ACCESS BY INDEX ROW | USER$ | 0 | 0 | 0 | | | | | | | INDEX UNIQUE SCAN | I_USER1 | 0 | 0 | 0 | | | | | | | TABLE ACCESS CLUSTER | USER$ | 0 | 0 | 0 | | | | | | | INDEX UNIQUE SCAN | I_USER# | 0 | 0 | 0 | | | | | | | TABLE ACCESS BY INDEX ROWID | OBJ$ | 0 | 0 | 0 | | | | | | | INDEX RANGE SCAN | I_OBJ2 | 0 | 0 | 0 | | | | | | -------------------------------------------------------------------------------------------------------------------------
很明顯可以看出,這裡對於obj$表的訪問使用了2個Index,i_obj2. 看來問題就出在該對象之上。
這裡其實可以用過如下命令來判斷具體是什麼行的信息不匹配導致:
? 1analyze table obj$ validate structure cascade ;
定位到問題hang之後,由於這是bootstrap$的對象,因此無法通過在數據庫open的時候進行rebuild 來進行解決。
可以通過如下2種方式來解決該問題:
1) bbed modify index block
2) 通過bbed 將i_obj2這個index drop掉。