在整理了DB2 LOG相關內容的基礎之上,這章整理lock的內容,相比較與log,lock的內容更多的與應用相關,涉及內容方方面面更加復雜。DBMS在本質上不同於文件系統的地方在於DBMS系統所支持的事務。 如果log的引入是為了保證持久化,那麼LOCK的引入就是保證事務串行化,解決事務的並發帶來了資源的RACE CONDITION。不同的事務場景,定義了不同的並發需求,為此定義了4種事務的隔離級別,不同的DBMS引入了不同的鎖機制來實現。DB2的串行化機制主要是通過LOCK,LATCH,CLAIM/DRAIM 來實現,具體的LOCK的相關屬性以及LOCK屬性對應用程序的影響比如OBJECT,SIZE,MODE,DURATION,PARTICIPANTS,PARAMETER LOCATIONS都進行了介紹。當引入了DBMS CLUSTER(DB2中為DATA SHARING GROUP,而ORACLE中為ORACEL RAC) 以後,為了處理不同不同MEMBER之間數據的一致性,DB2引入了PHYSICAL LOCK以及CF LOCK STRTURE 來實現全局LOCK 沖突檢測。單個SUBSYSTEM中,主要的沖突有TIMEOUT,DEADLOCK,在DATASHARING GROUP,引入了新的CONTENTION,XES CONTENTION,FALSE CONTENTION,GLOBAL LOCK CONTENTION。引入LOCK的同時不可避免的帶來了新的OVERHEAD,DB2主要通過IRLM,XES,CF等處理鎖資源的請求,除了地址空間除了鎖請求之外,由於應用本身設計不合理或是某些特定的場景帶來了很多問題,如TEMEOUT,DEADLOCK,LOCK ESCALATION等,如何有效的避免這些問題,需要系統運維人員以及開發人員共同努力。
本文的行文脈絡基於個人對DBMS LOCK的認知層次,行文的邏輯性,合理性,整理內容的知識面的廣度和深度都有待進一步的思考。
這篇博客從落筆到完成大體的框架,持持續續時間已經接近2周,應該是自己耗時最長的一篇博客,縱使如此,每一次查看,發下仍有新的東西需要自己補充,等後續會繼續補充自己的理解。
DBMS區別與文件系統的最本質區別:DBMS支持事務
DBMS :Allowing multiple users to access a database simultaneously without compromising data integrity.
A transaction (or unit of work)is a recoverable sequence of one or more SQL operations that are grouped together as a single unit, usually within an application process.
One of the mechanisms DB2 uses to keep data consistent is the transaction. A transaction or(otherwise known as a unit of work) is a recoverable sequence of one or more SQL operations that are grouped together as a single unit, usually within an application process. The initiation and termination of a single transaction defines points of data consistency within a database; either the effects of all SQL operations performed within a transaction are applied to the database and made permanent (committed), or the effects of all SQL operations performed are completely "undone" and thrown away (rolled back).
事務並發帶來的問題,為了解決這些問題,DB2定義了4種事務隔離級別。
Occurs when two transactions read the same data, both attempt to update the data read, and one of the updates is lost
Transaction 1 and Transaction 2 read the same row of data and both calculate new values for that row based upon the original values read. If Transaction 1 updates the row with its new value and Transaction 2 then updates the same row, the update operation performed by Transaction 1 is lost
Occurs when a transaction reads data that has not yet been committed
Transaction 1 changes a row of data and Transaction 2 reads the changed row before Transaction 1 commits the change. If Transaction 1 rolls back the change, Transaction 2 will have read data that theoretically, never existed.
Occurs when a transaction executes the same query multiple times and gets different results with each execution
Transaction 1 reads a row of data, then Transaction 2 modifies or deletes that row and commits the change. When Transaction 1 attempts to reread the row, it will retrieve different data values
Occurs when a row of data that matches some search criteria is not seen initially
Transaction 1 retrieves a set of rows that satisfy some search criteria, then Transaction 2 inserts a new row that contains matching search criteria for Transaction 1’s query. If Transaction 1 re-executes the query that produced the original set of rows, a different set of rows will be retrieved – the new row added by Transaction 2 will now be included in the set of rows returned
這裡的object是指鎖所施加的對象,不同的鎖所能適用的對象是不同的。如PAGE LOCK的對象肯定是PAGE.
OWNER 表示誰持有鎖,有的是TRAN,有的是DB2 MEMBER
LOCK PARTICIPANTS
Locking is a complex interaction of many parts.
Page/Row locks are not compatible with tablespace locks The solution is "Intent Locking" Regardless of the locksize, DB2 will always start with a tablespace lockINTENT LOCKS:IS/IX USED whenever page or row locks are being used
持有數據的時間:commit,across commit(with hold)
FROM:START OF FIRST USE
TO :COMMIT OR MOMENTIALIY
When modifying data (Insert, Update and Delete),DB2 determines all locking actions
每一種事務隔離級別鎖所能解決的問題
DECLARE upd_cur CURSOR FOR SELECT data1, data2 FROM table FOR UPDATE OF colxRead-Only
DECLARE ro_cur CURSOR FOR SELECT DEPTNO, AVG(SALARY) FROM EMP GROUP BY DEPTNOAmbiguous
DECLARE amb_cur CURSOR FOR SELECT data1, data2 FROM table=> Use FOR READ ONLY or FOR FETCH ONLY
下面分別介紹了系統參數,應用參數如何影響LOCK行為
OBJECT DEFINITION,CREATE TABLESPACE:
NOFOR Option:NO FOR update clause mandatory for positioned updates
STDSQL(YES): Implies NOFOR
BETWEEN SQL AND SQL:DB2 USE LOCKS
BETWEEN UTILITY AND SQL:DB2 USE CLAIM AND DRAIN
BASE:
FOR EACH OBJECT(TABLESPACE,PARTITION,TABLE,IDNEXSPACE),THERE IS A CLAIM-COUNT
WHICH IS INCREASED 1 AT START AND DECREASE BY 1 AT COMMIT
A utility starts with a Drain
No new Claims are allowed (wait, maybe timeout)
If the Claim-count >0 the utility waits (maybe timeout)
When the Claim-count reaches zero, the utility can continue
Two important situations:
A claim is a notification to DB2 that a particular object is currently being accessed. Claims usually do not continue to exist beyond the commit point one exception being a Cursor with Hold. In order to access the DB2 object within the next unit of work, an application needs to make a new claim.Claims notify DB2 that there is current interest in or activity on a DB2 object.Even if Uncommitted Read doesn’t take any locks, a claim is taken. As long as there are any claims on a DB2 object, no drains may be taken on the object until those claims are released.
A drain is the action of obtaining access to a DB2 object, by:Preventing any new claims against the object.Waiting for all existing claims on the object to be released.
A drain on a DB2 object causes DB2 to quiesce all applications currently claiming that resource, by allowing them to reach a commit point but preventing them (or any other application process) from making a new claim.A drain lock also prevents conflicting processes from trying to drain the same object at the same time.
Utilities detect claimers are present and wait
Drain Write waits for all write claims to be released
Drain All waits for claims on all classes to be released
SHRLEVEL(CHANGE) Utilities are CLAIMers
Latches – managed by DB2
? BM page latching for index and data pages
? DB2 internal latching (many latches grouped into 32 latch classes)
? Latches – managed by IRLM
? Internal IRLM serialization
雖然標題是advanced topics,其實涉及的內容並沒有本質的區別,只不過我們在看待問題,分析問題時需要從單個DB2 SUBSYSTEM的視角上升到DBMS CLUSTER的高度,即DATA SHARING GROUP的高度。為了處理不同的member之間共享數據的COHERENCY問題,DB2引入P-LOCK,區別主要在於單個DB2 SUBSYSTEM中的LOCK owner均為tran,而P-LOCK的owner為DB2 SUBSYSTEM.因此這裡的內容按照這種分類進行介紹L-LOCK,P-LOCK,RETAINED LOCK,這幾種鎖之間的區別需要了解。
特點:
THE KIND YOU HAVE IN BOTH DATA SHARING AND NON DATA SHARING SUBSYTEM THEN CONTROL DATA CONCRRRENCY OF ACCESS TO OBJECTS THEN CAN BE LOCAL OR GLOBAL THEY ASSOTIATED WITH PROGRAMS
WHY IS HIERACHICAL
SO THE MEMBER DB2 SUBSYSTEM WILL NOT PROPAGATE L-LOCKS TO THE CF UNNECESSARILY
特點:
EXCLUSIVE TO DATA SHARING
ARE ONLY GLOBAL
USED TO MAINTAIN DATA COHERENCY IN DATA SHARING
ALSO USED TO MAINTAIN EDMPOOL CONSISTENCY AMONG MEMBERS
ASSOTIATED BY(OWNED BY) DB2 MEMBERS
P-LOCK CAN BE NEGOTIATTED BETWEEN DB2 MEMBERS
NO TIMEOUT OR DEADLOCK DETECTION
分類
The ONLY way to clear retained locks is by estarting the failed DB2 subsystem.
A LOCK THAT A DB2 SHARING DB2 MEMBERHAS TOMAKE KNOWN TO OTHERS MEMBERS OF THIS DATA SHARING GROUP
THE IRLM VIA XES,PROPAGATE TO THES LOCKS TO THE STRUCTURE IN THE CF
XCF,XES的內容在XES contention部分介紹。
HOW ARE GLOBAL LOCK REQUESTS MADE KNOWN TO OTHER DB2 MEMBERS?
THEY ARE PROPAGATED TO DB2 LOCK STRUCTUREPage Set P-Lock Negotiation
與APPLICATION LOCK不同的是,PHYSCIAL LOCK 可以做Page Set P-Lock Negotiation。
上圖展示了INERDB2 READ/WRITE INTEREST場景下的PAGE SET P-LOCK NEGOTIATION。所謂的negotiation就是鎖請求方以及鎖申請方將鎖請求均DOWNGRADE,從而達到compatible的目的。需要注意的是,PAGE SET P-LOCK STATE狀態的變化需要涉及IRLM以及XCF通訊機制。PAGE SET P-LOCK對GLOBAL BUFFER POOL起一種標志作用,
對於READ的DB2 SUBSYSTEM, must register its
interest in read pages in the GBP directory
對於UPDATER的DB2 SUBSYSTEM,must write updated pages to the group buffer pool and register those updated pages in the GBP directory
DEVIDE INTO TWO PARTS OF CF LOCK STRUCTURE RATIO OF LAST TWO COMPONETS DEFAULT ROUGHLY 50%CAN BE ALTERED BY IRLM PARAMETER
OVERVIEW OF MRL STRUCTURE
ACTIVEstatus of active means that the DB2 system is running.
Retained, on the other hand, means that a DB2 and/or its associated IRLM has failed
CONTAINS ENTRIES FOR ALL CRURENTLY HELD MODIFY LOCKS
IN THE EVENT OF DB2 FAILTURE ENSURE THAT OTHER MEMBER OF DB2 KNOW WHICH LOCKS MUST BERETIANED
It is essentially a list structure that lists all the update or modify locks in the data sharing group
DISPLAY GROUP ; Display “LIST ENTRIES IN USE”
NUMBER LIST ENTRIES : 7353, LIST ENTRIES IN USE: 1834
WARNING: When the LIST ENTRIES IN USE catches up to the NUMBER LIST ENTRIES, new modify lock requests will not be granted, and transactions will begin to fail with -904s.
A 'POOL' OF HASH POINTERS
CONTAINS ENTRIES FOR ALL READ AND MODIFY LOCKS
USED FOR GLOBAL LOCK CONTENTION
It is very difficult to keep track of how full a randomly populated table is, so the only way to track its utilization is indirectly. In DB2 data sharing, that tracking mechanism is false contention
SIZE OF EACH INDIVIDUAL ENTRY SET BY MAXUSRS PARAMETER IN IRLMPROC OF FIRST IRLM JOIN THE GROUP
NUMBER OF MEMBER IN THE GROUP
其實主要的目的就是提供INTER-SUBSYTEM之間通訊的API,主要的功能有三個:
產生的根源為XES本身設計時僅支持兩種鎖類型,
?XES Contention = XES-level resource contention as XES
only understands S or X
eg member 1 asking for IX and member 2 for IS
Big relief in V8
產生的根源主要是由於DB2 lock table定義的太小或是hash算法離散型不夠好,導致不同的DB2資源hash到同一lock table hash class上而產生的沖突,可以通過放大CF lock table來解決,因此FALSE CONTENTION標志著 CF LOCK STRUCTURE的大小是否合適。
GLOBAL CONTENTIONGLOBAL CONTENTION 本身包含XES CONTENTION,FALSE CONTENTION,REAL CONTENTION,其中REAL CONTENTION 確實為資源沖突,如資源熱點,其它的兩種沖突是由於引入DATA SHARING 的OVERHEAD
根據上面的介紹,下面給出A DB2 Performance Tuning Roadmap-2data sharing BLOCK的一個展示
L-LOCK(SHARING)部分表明了數據共享的程度
P/L XES CONTENTION表明P-LOCK,L-LOCK中需要上送CF的比例。其中P-LOCK需要上送的大部分為PAGE P-LOCK.
WHEN YOU Running Out of Space in the Lock Structure:
DXR170IDJP5005 THE LOCK STRUCTURE DSNDB0G_LOCK1 IS 50%(60%,70%) IN USE
DXR142IDJP5005 THE LOCK STRUCTURE DSNDB0G_LOCK1 IS zz% IN USE
ACTION:
DYNAMICALLY:SETXCF START,ALTER,STRNAME=DSNDB0G_LOCK1,SIZE=newsize Only Increases Lock List Table (MRL)REBUILDING – Two Methods:Letting IRLM Split Lock Structure (1:1) Change the INITSIZE, SIZE in CFRM Policy START New Policy Rebuild Lock Structure Using LTE= or F xxxx,SET,LTE=yy
Concurrency warning signs
(Periodic) monitoring for concurrency problems
DISPLAY DATABASE USE/RESTRICT/CLAIMERS/LOCKSDoes not provide info about all locks (eg. page/row locks are not shown)DISPLAY THD TYPE(* / ACTIVE / INACTIVE / SYSTEM / INDOUBT / POSTPONED / PROC) Note that ‘*’ displays only active, indoubt, postponed, and system threads Use LIMIT(*) to avoid output truncation
EXAMPLE:
>
HOW TO INTERPRETING -DISPLAY LOCKS OUTPUT:
DSN397I -DB8A
NAME TYPE PART STATUS CONNID CORRID LOCKINFO
TSNAME TS RW TSO XXXX H-IX,S,C
AGENT TOKEN(9999)
TABLE OF LOCKS
SOMETHING YOU NEED TO NOTICE:
PLAN_TABLE ‐ TSLOCKMODE column
Info depends on whether or not DB2 can determine the isolation level at bind/prepare time
Only applies to gross L‐ locks (TS/table lock)
關於trace的內容,在新年開篇第一篇博客裡面已經進行了介紹,這裡主要介紹一些highlight:
subsystem services/drda remote locations
non-data sharing locks
datasharing lock section
system addres cpu time
latch contention section
DML/DDL/DCL
GBP SECTION(稍後單獨拿出一章進行整理)
如果你想更深入的一步進行測試,可以通過臨時開啟對應的Zooming in using detailed trace data進行,最好在vendor的建議下開啟。
NEXT WHAT TO CONSIDER ABOUT TRANSACTION LOCKING: