一數據庫突然連接不上,在自己電腦上使用SQL Developer也連接不上。立即使用SecureCRT連接上了這台服務器,從下面幾個方面檢查。
1:檢查了數據庫的狀態是否正常
$ sqlplus / as sysdba
SQL*Plus: Release 10.2.0.5.0 - Production on Mon May 16 11:07:55 2016
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.
Connected to:
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
SQL> select status from v$instance;
STATUS
------------
OPEN
SQL>
2:檢查監聽日志是否出現錯誤和異常
$ more listener.log | grep TNS
TNS-12502: TNS:listener received no CONNECT_DATA from client
TNS-12502: TNS:listener received no CONNECT_DATA from client
TNS-12502: TNS:listener received no CONNECT_DATA from client
TNS-12502: TNS:listener received no CONNECT_DATA from client
TNS-12502: TNS:listener received no CONNECT_DATA from client
TNS-12502: TNS:listener received no CONNECT_DATA from client
如上所示,監聽日志沒有很特別的錯誤出現,使用tail -60f listener.log 也能看到監聽進程一直處理用戶請求,但是新的連接無法登錄到數據庫。事後檢查(這種情況持續的實際較短,我還在檢查時,數據庫已經正常了,數據庫又能正常連接),發現在12:59出現了TNS-12535錯誤,跟這個沒有關系。
TNS-12525: TNS:listener has not received client's request in time allowed
TNS-12535: TNS:operation timed out
TNS-12606: TNS: Application timeout occurred
TNS-12502: TNS:listener received no CONNECT_DATA from client
TNS-12502: TNS:listener received no CONNECT_DATA from client
3:檢查告警日志和跟蹤日志
檢查告警日志,沒有發現任何錯誤信息,但是在這個時間點出現了幾個trc文件
*** 2016-05-13 12:39:57.979
NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS: Application timeout occurred
kmduicxd: 0x7f88cafc4620, kmduiflg: 1, circuit: 0x8507edda0
(circuit) dispatcher process id = (0x85c694738, 1)
parent process id = (18, 1)
serial # = 2
connection context = 0x7f88cafc4620
user session = ((nil)), flag = (100c0), queue = (9)
current buffer = (0), status = (4, 0)
Client Address = (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.xxx.xxx)(PORT=60049))
*** 2016-05-13 12:40:25.202
NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS: Application timeout occurred
kmduicxd: 0x7f88cafc4dd0, kmduiflg: 1, circuit: 0x8507e8210
(circuit) dispatcher process id = (0x85c694738, 1)
parent process id = (18, 1)
serial # = 2
connection context = 0x7f88cafc4dd0
user session = ((nil)), flag = (100c0), queue = (9)
current buffer = (0), status = (4, 0)
Client Address = (ADDRESS=(PROTOCOL=tcp)(HOST=192.168.xxx.xxx)(PORT=58630))
*** 2016-05-13 12:40:45.147
NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS: Application timeout occurred
kmduicxd: 0x7f88cafc52f0, kmduiflg: 1, circuit: 0x8507e5448
(circuit) dispatcher process id = (0x85c694738, 1)
parent process id = (18, 1)
serial # = 2
connection context = 0x7f88cafc52f0
user session = ((nil)), flag = (100c0), queue = (9)
current buffer = (0), status = (4, 0)
查了一下metalink文檔上關於TNS-12535的錯誤,如下所示
Error TNS-12535 when connecting to Oracle database via dispatchers (shared servers).
The dispatcher trace shows following error message -
NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS: Application timeout occurred
kmduicxd: 0EEE01DC, kmduiflg: 1, circuit: 4D08C930
(circuit) dispatcher process id = (85917028, 1)
parent process id = (64, 1)
serial # = 44
connection context = 0EEE01DC
user session = (00000000), flag = (100c0), queue = (9)
current buffer = (0), status = (4, 0)
Client Address = (ADDRESS=(PROTOCOL=tcp)(HOST=<IP address>)(PORT=<port>))
The SHARED_SERVER parameter is set to an inadequate value and should be raised. A connection via shared server will fail if there are no free shared servers available to honor the request.
The 12535 error in the dispatcher trace indicates that the connection was timed out in the dispatchers queue.
Increase value for following database parameters -
max_shared_servers
shared_servers
These parameters are modifiable using ALTER SYSTEM. For example -
SQL> alter system set shared_server=30;
其中有這麼一段描述,在共享服務器連接模式當中,如果SHARED_SERVER參數設置過小或不足,如果沒有空閒的shared server進程可用,那麼通過共享服務器模式連接到數據庫的連接請求就會失敗, TNS-12535錯誤就會出現dispatcher trace文件中,表明dispatcher隊列中的請求超時。
The SHARED_SERVER parameter is set to an inadequate value and should be raised. A connection via shared server will fail if there are no free shared servers available to honor the request.
The 12535 error in the dispatcher trace indicates that the connection was timed out in the dispatchers queue.
我們的shared_server和max_shared_server參數一直以來都較合理,突然出現這種情況是為什麼額? 後面再DPA的監控裡面看到,當時有個用戶使用Toad更新了某個表,但是他後面意識到自己忘記在UPDATE語句裡面添加WHERE條件了,就回滾了該SQL語句,這個時間段導致該表被鎖,很多會話被阻塞。這就能解釋為什麼出現這種情況了。
接下來,我們通過下面例子來重現這個案例情況吧,我們先准備測試環境,如下所示,我們設置shared_servers、max_shared_servers、兩個參數的值。由於是測試環境,我們盡量將這些設置小一點,方便我們測試。
SQL> alter system set shared_servers=4 scope=both;
System altered.
SQL> show parameter shared_server
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
max_shared_servers integer 30
shared_server_sessions integer
shared_servers integer 4
SQL> alter system set max_shared_servers=4 scope=both;
System altered.
SQL> show parameter shared_server
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
max_shared_servers integer 4
shared_server_sessions integer
shared_servers integer 4
SQL>
創建test用戶,然後以test用戶登錄數據庫,執行下面操作後,退出會話。
SQL> create table test(id number(10), name varchar(12));
Table created.
SQL> insert into test
2 select 1, 'kerry' from dual union all
3 select 2, 'jimmy' from dual union all
4 select 3, 'jerry' from dual;
3 rows created.
SQL> commit;
Commit complete.
SQL> exit;
然後開啟四個cmd命令窗口分別執行這個sql語句,當我們開啟第五個cmd命令窗口時,依然能連接執行SQL(因為第一個會話變成INACTIVE狀態,SERVER變成NONE狀態了)
C:\Users>sqlplus test/test@mytest
SQL*Plus: Release 11.2.0.1.0 Production on 星期一 5月 16 23:46:13 2016
Copyright (c) 1982, 2010, Oracle. All rights reserved.
連接到:
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
SQL> update test set name='kkk' where id=1;
已更新 1 行。
SQL>
會話窗口1
會話窗口2、3、4、5都如下所示
開啟第六個窗口執行SQL命令時,報下面錯誤
使用sqlplus / as sysdba 登錄數據庫,執行下面命令,你會看到第一個會話的STATUS已經變成INACTIVE狀態了, SERVER為NONE狀態,所以開啟第五個會話窗口時才會依然能連接數據庫,當你開啟第6個會話窗口才報上面錯誤。可以看到第一個會話961,阻塞了其它四個會話。
檢查dispatcher的跟蹤文件,你就能看到下面這樣的錯誤信息。跟我們上面案例的幾乎是一樣的
當然,有時候你可能被ORA-00104這樣的錯誤給干擾測試。
此時你需要設置參數的inbound_connect_timeout,不要給的過大。否則測試結果很容易被上面錯誤干擾。
其實V$SESSION中,狀態為NONE的當前共享服務器進程沒有任務處理,若果進程正在執行某些任務則會為SHARED狀態。參考我這篇文章v$session中server為none與shared值解析
In MTS/Shared Server configuration when you see value 'NONE' (as shown below), it means there is no task being processed by shared server for that session. The server column will infact show status of 'SHARED' if there is some task being processed at that particular time by the shared server process for that session.
在共享服務器模式中,當並發工作的ACTIVE SESSION數大於max_shared_servers參數時,就不能在增加新的shared server processes,新的session請求就會得不到空閒的shared server processes的響應,dispatcher進程無法為會話分配一個服務器端進程(shared server perocesss)也就會出現數據庫連接不上的現象,最後出現超時現象。如果這個有點難以理解,那麼你想象一下,在一個餐廳裡面,假設有40位客人在就餐,但是只有4個服務器員,如果同時四個服務員正在服務4個客戶,那麼第五個客戶,就不能得到服務員的服務了。除非他等待其中一個客戶已經服務完了,如果時間太長,他就會放棄這個服務了。
小結:
保證事務是短小的--這是用於MTS(共享服務器模式)的第一條規則。它們可以是頻繁的,但它們應該是短小的(如OLTP 系統所表現的特點)。如果它們不是短小的,那麼由於共享資源被一些進程獨占,將導致系統速度整體下降。在極端的情況下,如果全部的共享服務都是忙的情況下,系統將被掛起。
所以對於事務具有時間短、頻率高特點的OLTP 系統,MTS 是最合適的。在OLTP 系統中,事務是在幾毫秒內執行完的。另一方面,MTS 非常不合適於數據倉庫。
--------------------------------------------------------------------------------------------------------------------------------
上面這段小結來自於《ORACLE專家高級編程》,用在此處非常恰當、合適!!!
參考資料:
https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=170439072788756&id=823274.1&_afrWindowMode=0&_adf.ctrl-state=222ndlb4e_34