在SQL Server數據庫或OACLE數據庫當中,通常一個會話持有某個資源的鎖,而另一個會話在請求這個資源,就會出現阻塞(blocking)。這是DBA經常會遇到的情況。當出現SQL語句的阻塞時,很多人想查看阻塞的源頭(哪個SQL語句阻塞了哪個SQL),這樣方便直觀、簡潔明了的定位問題。但是很多時候,很多場景,我們通過SQL語句並不能或者說不容易定位到阻塞者(Blocker)的SQL語句,當然我們可以很容易找到被阻塞的SQL語句,以及它在等待的鎖資源。下面我們先分析一下SQL Server數據庫的這類場景,然後分析一下ORACLE數據庫的這類場景。如有不足的地方,敬請指出。
在SQL Server當中,我們先准備下面測試環境(測試用的表和數據)。
USE Test;
GO
CREATE TABLE Test
(
ID INT ,
NAME VARCHAR(12)
);
INSERT INTO Test
VALUES (1000, 'Kerry');
INSERT INTO Test
VALUES(1001, 'Jimmy');
場景1:我們構造這樣一個簡單的場景,例如如下:
在會話81中執行下面SQL語句
BEGIN TRAN
UPDATE Test SET NAME='Tina' WHERE ID=1000;
在會話72中執行下面SQL語句
SELECT * FROM TEST;
在另外一個會話窗口執行下面語句,查看阻塞(blocker)者和被阻塞者的SQL語句(這裡能夠定位到阻塞者(blocker)的SQL語句)。如下所示
SELECT wt.blocking_session_id AS BlockingSessesionId
,sp.program_name AS Blocking_ProgramName
,COALESCE(sp.LOGINAME, sp.nt_username) AS Blocking_HostName
,ec1.client_net_address AS ClientIpAddress
,db.name AS DatabaseName
,wt.wait_type AS WaitType
,ec1.connect_time AS BlockingStartTime
,wt.WAIT_DURATION_MS/1000 AS WaitDuration
,ec1.session_id AS BlockedSessionId
,h1.TEXT AS BlockedSQLText
,h2.TEXT AS BlockingSQLText
FROM sys.dm_tran_locks AS tl WITH(NOLOCK)
INNER JOIN sys.databases AS db WITH(NOLOCK)
ON db.database_id = tl.resource_database_id
INNER JOIN sys.dm_os_waiting_tasks AS wt WITH(NOLOCK)
ON tl.lock_owner_address = wt.resource_address
INNER JOIN sys.dm_exec_connections ec1 WITH(NOLOCK)
ON ec1.session_id = tl.request_session_id
INNER JOIN sys.dm_exec_connections ec2 WITH(NOLOCK)
ON ec2.session_id = wt.blocking_session_id
LEFT OUTER JOIN master.dbo.sysprocesses AS sp WITH(NOLOCK)
ON SP.spid = wt.blocking_session_id
CROSS APPLY sys.dm_exec_sql_text(ec1.most_recent_sql_handle) AS h1
CROSS APPLY sys.dm_exec_sql_text(ec2.most_recent_sql_handle) AS h2
但是這個場景是一個非常理想化的場景,實際場景中,可能會話81接下來會去執行其它SQL語句,它並不會一直停留在這個SQL語句上,例如,我們在會話81中執行SELECT GETDATE();這個SQL語句
BEGIN TRAN
UPDATE Test SET NAME='Tina' WHERE ID=1000;
SELECT GETDATE();
如上所示,此時查到的Blocker者的SQL語句為"SELECT GETDATE();", 而這個SQL其實和被阻塞的SQL沒有半毛關系。即使使用sp_WhoIsActive這樣專業的SQL亦是如此。
當然我們可以查看其等待的鎖對象信息,這也是我們所能追蹤、捕獲的。如下所示:
<Database name="Test">
<Locks>
<Lock request_mode="S" request_status="GRANT" request_count="1" />
</Locks>
<Objects>
<Object name="Test" schema_name="dbo">
<Locks>
<Lock resource_type="OBJECT" request_mode="IS" request_status="GRANT" request_count="1" />
<Lock resource_type="PAGE" page_type="*" request_mode="IS" request_status="GRANT" request_count="1" />
<Lock resource_type="RID" page_type="*" request_mode="S" request_status="WAIT" request_count="1" />
</Locks>
</Object>
</Objects>
</Database>
這種場景,如果只是某個會話發出的即席查詢,那麼你幾乎已經很難捕獲到阻塞的源頭UPDATE Test SET NAME='Tina' WHERE ID=1000這個SQL語句了。除非你結合其它一些手段,逆向推斷。
場景2:上面查找SQL阻塞的SQL語句,有時候只能定位到某一個存儲過程或一大段即席查詢SQL。
例如,下面一個構造的存儲過程,一個用戶正在一個會話當中執行它,
CREATE PROCEDURE PRC_TEST
AS
BEGIN
BEGIN TRAN TR1
UPDATE Test SET NAME='YourName' WHERE ID=1000;
SELECT * FROM sys.sysprocesses WHERE spid=@@SPID;
WAITFOR DELAY '00:00:20';
COMMIT TRAN TR1;
END
GO
另外一個用戶在另外一個會話執行下面查詢SQL語句
SELECT * FROM TEST;
查看阻塞的歷史記錄
你會看到捕獲的是整個存儲過程,當然這個測試案例很容易知道是那個SQL語句阻塞了,實際的存儲過程可能業務很復雜,SQL語句也非常多,你想從一個存儲過程裡面找到阻塞者(Blocker)的SQL語句其實是非常麻煩的。需要你仔細甄別,當存儲過程的業務邏輯復雜,SQL語句非常多時,這是一個頭痛的事情。
其實遇到這些場景,我們大可不必一定要查看阻塞這(Blocker)的具體SQL,我們只需要查看被阻塞者,等待的鎖對象資源的相關信息即可,你可以大致判斷到底是一個什麼類型的SQL導致了這類阻塞。
那麼我們接下來看看ORACLE數據庫場景吧。我們先准備一個測試環境(測試表和相關數據)
CREATE TABLE "TEST"."TEST"
( "ID" NUMBER,
"NAME" VARCHAR2(12)
);
INSERT INTO TEST
SELECT 1001, 'jimmy' FROM DUAL UNION ALL
SELECT 1002, 'Kerry' FROM DUAL;
COMMIT;
接下來我們在會話窗口一執行下面SQL:
[oracle@DB-Server ~]$ sqlplus test/test
SQL*Plus: Release 11.2.0.1.0 Production on Tue Aug 30 10:16:43 2016
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show user;
USER is "TEST"
SQL> UPDATE TEST SET NAME='KKK' WHERE ID =1001;
1 row updated.
SQL>
在另外一個會話窗口二執行下面SQL
[oracle@DB-Server ~]$ sqlplus test/test
SQL*Plus: Release 11.2.0.1.0 Production on Tue Aug 30 10:17:22 2016
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> show user;
USER is "TEST"
SQL> UPDATE TEST SET NAME='Ken' WHERE ID =1001;
然後我們在第三個窗口執行下面SQL語句,查看阻塞和被阻塞的SQL語句
SELECT dba_objects.object_name,
locks_t.row#,
locks_t.blocked_secs,
locks_t.blocker_text,
locks_t.blocked_text,
locks_t.blocked_sql_text,
locks_t.blocking_sql_text
FROM (SELECT /*+ NO_MERGE */
blocking_lock_session.username||'@'||blocking_lock_session.machine||'(SID='||blocking_lock_session.sid||') ['||
blocking_lock_session.program||'/PID='||blocking_lock_session.process||']' as blocker_text,
blocked_lock_session.username||'@'||blocked_lock_session.machine|| '(SID='||blocked_lock_session.sid||') ['||
blocked_lock_session.program||'/PID='||blocked_lock_session.process||']' as blocked_text,
blocked_lock_session.row_wait_obj#,
blocked_lock_session.row_wait_file#,
blocked_lock_session.row_wait_block#,
blocked_lock_session.row_wait_row#,
DBMS_ROWID.ROWID_CREATE (1,
blocked_lock_session.row_wait_obj#,
blocked_lock_session.row_wait_file#,
blocked_lock_session.row_wait_block#,
blocked_lock_session.row_wait_row#) row#,
blocked_lock_session.seconds_in_wait blocked_secs,
blocked_sql.sql_text blocked_sql_text,
blocking_sql.sql_text blocking_sql_text
FROM v$lock blocking_lock,
v$session blocking_lock_session,
v$lock blocked_lock,
v$session blocked_lock_session,
v$sql blocked_sql,
v$sql blocking_sql
WHERE blocking_lock.block = 1
AND blocking_lock.id1 = blocked_lock.id1
AND blocking_lock.id2 = blocked_lock.id2
AND blocked_lock.request > 0
AND blocking_lock.sid = blocking_lock_session.sid
AND blocked_lock.sid = blocked_lock_session.sid
AND blocked_lock_session.sql_id = blocked_sql.sql_id
AND blocked_lock_session.sql_child_number = blocked_sql.child_number
AND blocking_lock_session.PREV_SQL_ADDR(+) =blocking_sql.ADDRESS
) locks_t,
dba_objects
WHERE locks_t.row_wait_obj# = dba_objects.object_id
ORDER BY locks_t.blocked_secs;
如果我們在會話窗口1,再執行一個語句,如下所示
SQL> show user;
USER is "TEST"
SQL> UPDATE TEST SET NAME='KKK' WHERE ID =1001;
1 row updated.
SQL> select * from dual;
D
-
X
此時捕獲到的是select * from dual; 這個SQL跟被阻塞的SQL沒有任何關系,當然如果你繼續在會話窗口執行其它SQL語句,捕獲的都是不相關的SQL語句,已經沒有任何意義
出現這個問題,是因為當一個會話正在執行某個SQL語句,那麼v$session視圖中的SQL_ID記錄的是正在執行SQL的SQL_ID,當會話空閒或執行其它SQL語句後,SQL_ID就會變化,PRE_SQL_ID記錄上一個執行完的SQL的SQL_ID值,PREV_SQL_ADDR也是如此。如下英文所述 :
According to the Reference Manual entry for V$SESSION the SQL_ID column represents the current SQL statement being executed by a session. If the session is idle there is no current SQL statement. Also if a session performs an update then performs a query the SQL_ID would reflect the query and not the update which is the statement that is blocking. There is in fact no query that is guaranteed to find the blocking SQL. Unless the blocking statement is the current statement all you can find for sure I the blocking session
如果你不用SQL*Plus,使用PL/SQL Developer這個工具,你會看到BLOCKING_SQL_TEST永遠都是begin sys.dbms_output.get_line(line => :line, status => :status); end; 這個是因為PL/SQL Developer在執行完SQL後,會調用其它SQL語句,當然SQL Developer不會有這樣的問題。
所以綜上述,想要找到阻塞的源頭SQL語句,只用SQL查詢,其實在很多場景是不太現實的,所以很多SQL語句都只給出阻塞者的會話信息或鎖定對象信息。如下所示
會話ID為8的會話執行下面SQL
UPDATE TEST SET NAME='TEST' WHERE ID=1001;
會話ID為137的會話執行下面SQL
UPDATE TEST SET NAME='TES1' WHERE ID=1001;
然後我們使用get_locked_objects_rpt.sql查看被阻塞的SQL,以及鎖定相關對象的信息(get_locked_objects_rpt.sql請參考get_locked_objects_rpt.sql)
SQL> @get_locked_objects_rpt.sql
Enter value for 1: 6
old 42: AND locks_t.blocked_secs > &1
new 42: AND locks_t.blocked_secs > 6
========= $Revision: 1.4 $ ($Date: 2013/09/16 13:15:22 $) ===========
Locked object : TEST
Locked row# : AAASEkAAEAAAADVAAA
Blocked for : 19 seconds
Blocker info. : TEST@GFG1\GET253194(SID=8) [plsqldev.exe/PID=17988:14616]
Blocked info. : TEST@get253194(SID=137) [SQL Developer/PID=17780]
Blocked SQL : UPDATE TEST SET NAME='TES1' WHERE ID=1001
Found 1 blocked session(s).
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
[oracle@DB-Server ~]$
然後我通過上面的Locked Object知道被鎖定的對象為Test表的ROWID為AAASRCAAEAAAADVAAA的記錄,如下所示