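A colleague was running a SQL statement in Toad when the wireless network suddenly dropped, and asked me to check what had happened. The details are shown below. (Some values are replaced with xxx. The work involved historical archive data and ran under a dedicated user, so the following SQL is enough to find the corresponding session.)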
SQL> SELECT B.USERNAME,
            B.SID,
            B.SERIAL#,
            LOGON_TIME,
            A.OBJECT_ID
     FROM   V$LOCKED_OBJECT A, V$SESSION B
     WHERE  A.SESSION_ID = B.SID
     AND    B.USERNAME = &USERNAME
     ORDER  BY B.LOGON_TIME;

USERNAME                              SID    SERIAL# LOGON_TIM  OBJECT_ID
------------------------------ ---------- ---------- --------- ----------
xxxxxx                                523      41890 06-MAY-16     825891
xxxxxx                                523      41890 06-MAY-16     825892
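After running the kill session statement, I checked and found that the session still existed; only its SERIAL# had changed. Trying to kill the session again raised ORA-00030, as shown below: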
SQL> alter system kill session '523, 41890' immediate;
System altered.
SQL> SELECT A.ORACLE_USERNAME,
            A.OS_USER_NAME,
            B.OWNER,
            B.OBJECT_NAME,
            A.SESSION_ID,
            A.PROCESS,
            A.LOCKED_MODE
     FROM   V$LOCKED_OBJECT A, DBA_OBJECTS B
     WHERE  B.OBJECT_ID = A.OBJECT_ID
     AND    B.OWNER = &OWNER
     ORDER  BY A.ORACLE_USERNAME,
            A.OS_USER_NAME;

ORACLE_USERNAME OS_USER_NAME OWNER        OBJECT_NAME    SESSION_ID PROCESS   LOCKED_MODE
--------------- ------------ ------------ -------------- ---------- --------- -----------
xxxxxxxxxxxxxxx ZhanxxxnL    xxxxxxxxxxxx INV_xxxx_HD           523 6208:7548           3
xxxxxxxxxxxxxxx ZhanxxxxL    xxxxxxxxxxxx INV_xxxx_LINES        523 6208:7548           3
SQL> SELECT B.USERNAME,
            B.SID,
            B.SERIAL#,
            LOGON_TIME,
            A.OBJECT_ID
     FROM   V$LOCKED_OBJECT A, V$SESSION B
     WHERE  A.SESSION_ID = B.SID
     AND    B.USERNAME = &USERNAME
     ORDER  BY B.LOGON_TIME;

USERNAME                              SID    SERIAL# LOGON_TIM  OBJECT_ID
------------------------------ ---------- ---------- --------- ----------
xxxxxxxxxxxxxx                        523      41891 06-MAY-16     825892
xxxxxxxxxxxxxx                        523      41891 06-MAY-16     825891
SQL> alter system kill session '523, 41891' immediate;
alter system kill session '523, 41891' immediate
*
ERROR at line 1:
ORA-00030: User session ID does not exist.
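Before retrying the kill, it can help to confirm what state the session is actually in. The query below is a minimal sketch, not part of the original transcript; it hard-codes SID 523 from the output above and reads only standard V$SESSION columns.

```sql
-- Sketch: inspect the session that refuses to go away.
-- SID 523 is taken from the transcript above; substitute your own.
SELECT sid,
       serial#,
       status,       -- 'KILLED' means the session is already marked for cleanup
       username,
       logon_time
FROM   v$session
WHERE  sid = 523;
```

If STATUS shows KILLED, a second ALTER SYSTEM KILL SESSION is unlikely to achieve anything, which is consistent with the ORA-00030 seen here.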
On Metalink I looked up the description, cause, and solution for ORA-00030, as shown below:
SQL> ho oerr ora 30
00030, 00000, "User session ID does not exist."
// *Cause: The user session ID no longer exists, probably because the
// session was logged out.
// *Action: Use a valid session ID.
The command may have been issued for one or more of the following reasons:
1. The process no longer exists at the os level, but does show up as active in v$session.
2. The user reboots the client machine without logging off, leaving a shadow process.
3. That session is holding onto a lock that needs to be released.
CAUSE
This error occurs because PMON is already trying to kill the session.
This is indicated by the fact that the serial number keeps changing.
When PMON attempts to cleanup a dead session, it will increase the serial number.
PMON may take a long time to clean up the process. If the process was doing a very large transaction at the time it aborted, then PMON has to rollback the large transaction.
When PMON makes progress, i.e. if it manages to free at least some of the process's resource, it will repeatedly keep trying to delete the process. When it finally gets to the point where it can't free up any of the process's resource (i.e. there are no more free buffers), it will print a message to the trace file and try to delete that process a second time.
The problem is encountered when PMON lacks the resources needed to remove the process. If there are not enough buffers, then the removal of the process is delayed. This is a free buffer problem in the data cache.
SOLUTION
Encountering an ORA-30 when attempting to manually kill a process is not necessarily a bug but a result of trying to kill a process already marked as killed.
PMON can take anywhere from 5 minutes to over 24 hours to clean up a job. The impact is that often the process being cleaned up is holding locks that prevents others from performing certain operations.
The solution is to wait for PMON to clean up the process.
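While waiting, one way to judge whether the rollback described in the note is making progress is to watch the undo still held by the transaction. This is a hedged sketch rather than anything taken from the note: it assumes the transaction is still attached to the session row for SID 523, and it joins V$TRANSACTION to V$SESSION through SES_ADDR.

```sql
-- Sketch: run this a few times, some seconds apart.
-- If USED_UBLK / USED_UREC keep shrinking, the rollback is progressing
-- and the session should eventually disappear on its own.
SELECT s.sid,
       s.serial#,
       s.status,
       t.used_ublk,   -- undo blocks still held by the transaction
       t.used_urec    -- undo records still held by the transaction
FROM   v$session s
       JOIN v$transaction t
         ON t.ses_addr = s.saddr
WHERE  s.sid = 523;
```

If the query returns no rows, the transaction is no longer linked to this session, and the view cannot tell you anything further about the cleanup.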
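In practice, all you can do is wait for the PMON process to clean up this process. After waiting ten minutes or so, the session still had not been cleaned up, so I looked at the session's details. I also found material online saying that the session can be killed at the operating system level.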
SQL> select event from v$session_wait where sid=523;
EVENT
----------------------------------------------------------------
db file sequential read
SQL> select sql_text from v$session a,v$sqltext_with_newlines b
2 where decode(a.sql_hash_value, 0, prev_hash_value, sql_hash_value)=b.hash_value
3 and a.sid=&sid order by piece;
Enter value for sid: 523
old 3: and a.sid=&sid order by piece
new 3: and a.sid=523 order by piece
SQL_TEXT
----------------------------------------------------------------
DELETE from inv_xxx_lines WHERE (xxx) IN ( SELECT tr
ans_line_id FROM xxxx GROUP BY trans_line_id HAVING C
OUNT(xxxxx) > 1) AND ROWID NOT IN (SELECT MIN(ROWID) FRO
M xxxx GROUP BY xxx HAVING COUNT(*) > 1)
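So I tried killing the corresponding operating system process. Afterwards I verified that the session was gone. I cannot tell whether PMON happened to finish cleaning up the session at that moment or whether killing the OS process really did the job, because I have no way to reproduce the scenario. If I run into it again, I can test and verify. Recording it here for future reference.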
SQL> ! kill -9 4884
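The transcript does not show how the OS process ID passed to kill was obtained. A common way to look it up is to map the session to its server process in V$PROCESS. The following is a sketch under assumptions: SID 523 comes from the earlier output, a dedicated server process on a Unix-like host is assumed, and the second query relies on the CREATOR_ADDR column, which I believe is available in 11g and later.

```sql
-- Sketch 1: normal case, the session still points at its server process.
SELECT s.sid,
       s.serial#,
       s.status,
       p.spid          -- OS process ID of the server process
FROM   v$session s
       JOIN v$process p
         ON p.addr = s.paddr
WHERE  s.sid = 523;

-- Sketch 2: once a session is marked KILLED, PADDR may no longer point at
-- the original process. Assumption: CREATOR_ADDR still holds the address
-- of the process that created the session.
SELECT p.spid
FROM   v$process p
WHERE  p.addr = (SELECT s.creator_addr
                 FROM   v$session s
                 WHERE  s.sid = 523);
```

Before killing anything at the OS level, it is worth checking with ps that the SPID belongs to a client server process and not to a background process such as PMON.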
References:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=533785808734847&id=1011386.6&_afrWindowMode=0&_adf.ctrl-state=13ipo04jjr_4
http://www.linuxidc.com/Linux/2011-09/43730.htm