一.系統環境:
Oracle:9IR2
OS:Windows 2003 SERVER
二.問題描述:
現場人員報怨報表數據不准確,查明為韓國方面數據回傳中斷,聯系韓國相關人員後,給我發了個他那邊回傳程式報的一個錯:ORA-02049:timeout:distributed transaction waiting for lock.
三.問題分析:
找了下有關ORA-02049錯誤的簡短說明如下:
ORA-02049:timeout:distributed transaction waiting for lock
cause:exceeded INIT.ORA distributed_lock_timeout seconds waiting for lock.
action:treat as a deadlock.
以為系統哪做了改變,造成了死鎖了,導致系統內部將回傳程式所作的修改回滾了.
查看了下alert.log文件,沒有發現死鎖的信息.
再查看了下當前系統的鎖情況:
SELECT
lpad(' ', decode(l.xidusn, 0, 3, 0)) || l.Oracle_username User_name,
o.owner,
o.object_name,
o.object_type,
s.sid,
s.serial#
FROM v$locked_object l, dba_objects o, v$session s
WHERE l.object_id = o.object_id
AND l.session_id = s.sid
ORDER BY o.object_id, xidusn DESC;
OWNER OBJECT_NAME OBJECT_TYPE SID SERIAL# SEC CONSUMABLESPEC TABLE 50 3558
SEC PROCESSGROUP TABLE 41 796
SEC PROCESSGROUPHISTORY TABLE 16 2535
SEC NCDEFECTHISTORY TABLE 34 46603
對比了一下報表,上面的涉及到的4張表正好是回傳中斷的表.
查看了下50,41,16,34這4個session的信息,狀態均為ACTIVE,LOGON_TIME卻是昨天中午11點多,這與4張表裡記錄的最大時間相差無幾.難道這4個session執行了將近半天了.
查看下v$session_wait是否有異常的等待事件,卻也沒有發現可疑的等待事件.
再次聯系韓國方面,要求回傳一下數據,以便做個跟蹤.這時再查看v$session_wait,出現了eqeue等待事件.
詳細查看一下系統的鎖等待情況:
SELECT DECODE(request,0,'Holder: ','Waiter: ')|| sid sess, id1, id2, lmode,
request, type
FROM V$LOCK
WHERE (id1, id2, type) IN (SELECT id1, id2, type FROM V$LOCK WHERE request>0)
ORDER BY id1, request;
這時就可以看到回傳程式在等待以上4個session持有的鎖.
問題的原因總算有個大概了,可能是因為某些原因,昨天中午11點的那次回傳程式意外中斷了.但是他們所持有的表鎖並沒有正常釋放,造成後面的回傳數據發生鎖等待.等待時間超出
參數distributed_lock_timeout定義的大小,ORA-02049錯誤也由此產生.順便查看一下此參數在系統中的定義.
SQL> show parameter distribut
NAME TYPE VALUE distributed_lock_timeout integer 60
呵呵...60秒.
四.問題解決:
簡單有效的辦法當然就是KILL這4個session了.讓它們釋放出所持有的表鎖.
SQL> alter system kill session '50,3558';
alter system kill session '50,3558'
ORA-00031: session marked for kill
SQL> alter system kill session '41,796';
alter system kill session '41,796'
ORA-00031: session marked for kill
SQL> alter system kill session '16,2535';
alter system kill session '16,2535'
ORA-00031: session marked for kill
SQL> alter system kill session '34,46603';
alter system kill session '34,46603'
ORA-00031: session marked for kill
執行的結果並不是system altered.只是標記為KILL狀態.重新查看系統的表鎖情況,並沒有釋放.
Windows平台下那就用ORAKILL吧....
先查看一下50,41,16,34這4個session在Windows下對應的線程號:
select s.sid, p.spid from v$session s, v$process p where s.sid in ('50','41','16','34') and s.paddr=p.addr;
SID SPID 50 4496
41 1436
16 6140
34 4360
利用ORAKILL工具將其KILL:
E:\>orakill samsung 4496
Kill of thread id 4496 in instance wiptest successfully signalled.
E:\>orakill samsung 1436
Kill of thread id 1436 in instance wiptest successfully signalled.
E:\>orakill samsung 6140
Kill of thread id 6140 in instance wiptest successfully signalled.
E:\>orakill samsung 4360
Kill of thread id 4360 in instance wiptest successfully signalled.
此時再查看系統的表鎖信息,可以發現沒有session持有這4張表的鎖了.至此,問題解決.