GC: The CMS Collection Process and Log Analysis in Detail

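2016-08-23 This article skips GC algorithms and the different kinds of garbage collectors; plenty of material on those topics is available online.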

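Introduction

Let's first take a quick look at the combinations of young-generation and old-generation garbage collectors for the whole heap (the combinations below are fully supported on Java 8; other versions may differ slightly). The combination underlined in red is the one this article focuses on:

ParNew and CMS

"Concurrent Mark and Sweep" is the full name of CMS; the official name is "Mostly Concurrent Mark and Sweep Garbage Collector".

Young generation: uses a stop-the-world mark-copy algorithm;

Old generation: uses a mostly concurrent mark-sweep algorithm;

Design goal: avoid long pauses while collecting the old generation;

Two things make this goal achievable:

1 It does not spend time compacting the old generation. Instead it maintains data structures called free-lists, which track the reclaimed memory that is available for reuse;

2 Mark-sweep is split into several phases, and in most of them the GC works concurrently with the application threads (so the GC threads do compete with the application threads for CPU time). By default, the number of GC worker threads is about 1/4 of the number of physical CPU cores on the server;

Note: if your server has multiple cores and your goal is low latency, this collector combination is the natural choice.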
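
To confirm which collector pair a running JVM actually uses, the standard java.lang.management API can be queried. The sketch below is only an illustration: the class name CollectorNames is made up, and the bean names "ParNew" and "ConcurrentMarkSweep" are what a Java 8 HotSpot JVM started with -XX:+UseConcMarkSweepGC typically reports, which is an assumption about your JVM rather than a guarantee.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorNames {
    // Names the running JVM reports for its garbage collectors.
    static List<String> activeCollectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    // Assumption: Java 8 HotSpot reports the ParNew + CMS pair under these two names.
    static boolean isParNewWithCms(List<String> names) {
        return names.contains("ParNew") && names.contains("ConcurrentMarkSweep");
    }

    public static void main(String[] args) {
        List<String> names = activeCollectors();
        System.out.println("collectors: " + names);
        System.out.println("ParNew + CMS in use: " + isParNewWithCms(names));
    }
}
```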

Logs

A first look at the GC log

First, get a rough picture of the whole GC log

2016-08-23T02:23:07.219-0200: 64.322: [GC (Allocation Failure) 64.322: [ParNew: 613404K->68068K(613440K), 0.1020465 secs] 10885349K->10880154K(12514816K), 0.1021309 secs] [Times: user=0.78 sys=0.01, real=0.11 secs]

2016-08-23T02:23:07.321-0200: 64.425: [GC (CMS Initial Mark) [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2016-08-23T02:23:07.321-0200: 64.425: [CMS-concurrent-mark-start]
2016-08-23T02:23:07.357-0200: 64.460: [CMS-concurrent-mark: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
2016-08-23T02:23:07.357-0200: 64.460: [CMS-concurrent-preclean-start]
2016-08-23T02:23:07.373-0200: 64.476: [CMS-concurrent-preclean: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2016-08-23T02:23:07.373-0200: 64.476: [CMS-concurrent-abortable-preclean-start]
2016-08-23T02:23:08.446-0200: 65.550: [CMS-concurrent-abortable-preclean: 0.167/1.074 secs] [Times: user=0.20 sys=0.00, real=1.07 secs]
2016-08-23T02:23:08.447-0200: 65.550: [GC (CMS Final Remark) [YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs]65.559: [weak refs processing, 0.0000243 secs]65.559: [class unloading, 0.0013120 secs]65.560: [scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]
2016-08-23T02:23:08.458-0200: 65.561: [CMS-concurrent-sweep-start]
2016-08-23T02:23:08.485-0200: 65.588: [CMS-concurrent-sweep: 0.027/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2016-08-23T02:23:08.485-0200: 65.589: [CMS-concurrent-reset-start]
2016-08-23T02:23:08.497-0200: 65.601: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

Minor GC

2016-08-23T02:23:07.219-0200¹: 64.322²:[GC³(Allocation Failure⁴) 64.322: [ParNew⁵: 613404K->68068K⁶(613440K)⁷, 0.1020465 secs⁸] 10885349K->10880154K⁹(12514816K)¹⁰, 0.1021309 secs¹¹][Times: user=0.78 sys=0.01, real=0.11 secs]¹²

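2016-08-23T02:23:07.219-0200 – the time at which the GC started;
64.322 – the start of the GC relative to JVM startup, in seconds;
GC – a flag that distinguishes a Minor GC from a Full GC; here it indicates a Minor GC;
Allocation Failure – the cause of the Minor GC; in this case the young generation could not satisfy an allocation request, which triggered the Minor GC;
ParNew – the name of the collector; it indicates that the young generation uses a parallel, stop-the-world mark-copy collector;
613404K->68068K – young-generation usage before and after the collection;
(613440K) – total capacity of the young generation;
0.1020465 secs – in the original wording: Duration for the collection w/o final cleanup.
10885349K->10880154K – usage of the whole heap before and after the collection;
(12514816K) – total capacity of the heap;
0.1021309 secs – the time ParNew took to mark and copy the live objects in the young generation (including the overhead of communicating with the old generation, the time to promote objects to the old generation, and the final cleanup at the end of the collection cycle);
[Times: user=0.78 sys=0.01, real=0.11 secs] – the duration of the GC event measured in different categories, which are best explained in English: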
- user – Total CPU time that was consumed by Garbage Collector threads during this collection
- sys – Time spent in OS calls or waiting for system event
- real – Clock time for which your application was stopped. With Parallel GC this number should be close to (user time + system time) divided by the number of threads used by the Garbage Collector. In this particular case 8 threads were used. Note that due to some activities not being parallelizable, it always exceeds the ratio by a certain amount.
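
The fields described above can be pulled out of the log line mechanically. The following is a minimal sketch, assuming the exact ParNew line layout shown in this article; the class name ParNewLine and its field names are made up for the illustration, and other JVM versions or logging flags may produce a different layout.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParNewLine {
    // "[ParNew: <before>K-><after>K(<capacity>K), <secs> secs] <before>K-><after>K(<capacity>K), <secs> secs]"
    private static final Pattern PATTERN = Pattern.compile(
            "\\[ParNew: (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\] "
            + "(\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final long youngBeforeKb;
    final long youngAfterKb;
    final long youngCapacityKb;
    final double youngSecs;
    final long heapBeforeKb;
    final long heapAfterKb;
    final long heapCapacityKb;
    final double totalSecs;

    private ParNewLine(Matcher m) {
        youngBeforeKb = Long.parseLong(m.group(1));
        youngAfterKb = Long.parseLong(m.group(2));
        youngCapacityKb = Long.parseLong(m.group(3));
        youngSecs = Double.parseDouble(m.group(4));
        heapBeforeKb = Long.parseLong(m.group(5));
        heapAfterKb = Long.parseLong(m.group(6));
        heapCapacityKb = Long.parseLong(m.group(7));
        totalSecs = Double.parseDouble(m.group(8));
    }

    // Parses one ParNew minor-GC line; rejects lines that do not have the expected layout.
    static ParNewLine parse(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a ParNew line: " + line);
        }
        return new ParNewLine(m);
    }

    public static void main(String[] args) {
        ParNewLine gc = parse("2016-08-23T02:23:07.219-0200: 64.322: [GC (Allocation Failure) 64.322: "
                + "[ParNew: 613404K->68068K(613440K), 0.1020465 secs] "
                + "10885349K->10880154K(12514816K), 0.1021309 secs] "
                + "[Times: user=0.78 sys=0.01, real=0.11 secs]");
        System.out.println("young: " + gc.youngBeforeKb + "K -> " + gc.youngAfterKb + "K of " + gc.youngCapacityKb + "K");
        System.out.println("heap:  " + gc.heapBeforeKb + "K -> " + gc.heapAfterKb + "K of " + gc.heapCapacityKb + "K");
    }
}
```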

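Let's analyze object promotion (the calculation in the original article is flawed):

Before the collection: heap usage is 10885349K and young-generation usage is 613404K, so old-generation usage is 10885349-613404=10271945K,

After the collection: heap usage is 10880154K and young-generation usage is 68068K, so old-generation usage is 10880154-68068=10812086K,

Old-generation usage grew by 10812086-10271945=540141K, which means 540141K of data was promoted from the young generation to the old generation;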
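
The subtraction can be checked with a few lines of Java. This is just the arithmetic from the log line restated as code; the class and method names are made up for the illustration.

```java
final class PromotionEstimate {
    // Old-generation usage = whole-heap usage minus young-generation usage (all values in KB).
    static long oldGenUsedKb(long heapUsedKb, long youngUsedKb) {
        return heapUsedKb - youngUsedKb;
    }

    // Growth of old-generation usage across one minor GC, i.e. the promoted volume in KB.
    static long promotedKb(long youngBeforeKb, long youngAfterKb, long heapBeforeKb, long heapAfterKb) {
        return oldGenUsedKb(heapAfterKb, youngAfterKb) - oldGenUsedKb(heapBeforeKb, youngBeforeKb);
    }

    public static void main(String[] args) {
        // Figures from the ParNew line: 613404K->68068K (young), 10885349K->10880154K (heap).
        long promoted = promotedKb(613404L, 68068L, 10885349L, 10880154L);
        System.out.println("promoted KB = " + promoted); // prints "promoted KB = 540141"
    }
}
```

The same figure follows from the other direction: the young generation shrank by 613404-68068=545336K while the whole heap shrank by only 10885349-10880154=5195K, and the difference, 540141K, is what moved into the old generation.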

[Figure: graphical view of the promotion analysis]

Full/Major GC

2016-08-23T11:23:07.321-0200: 64.425: [GC (CMS Initial Mark)¹ [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2016-08-23T11:23:07.321-0200: 64.425: [CMS-concurrent-mark-start]
2016-08-23T11:23:07.357-0200: 64.460: [CMS-concurrent-mark²: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]
2016-08-23T11:23:07.357-0200: 64.460: [CMS-concurrent-preclean-start]
2016-08-23T11:23:07.373-0200: 64.476: [CMS-concurrent-preclean³: 0.016/0.016 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
2016-08-23T11:23:07.373-0200: 64.476: [CMS-concurrent-abortable-preclean-start]
2016-08-23T11:23:08.446-0200: 65.550: [CMS-concurrent-abortable-preclean⁴: 0.167/1.074 secs] [Times: user=0.20 sys=0.00, real=1.07 secs]
2016-08-23T11:23:08.447-0200: 65.550: [GC (CMS Final Remark⁵)
[YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs]65.559: 
[weak refs processing, 0.0000243 secs]65.559: [class unloading, 0.0013120 secs]65.560: 
[scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] 
[Times: user=0.06 sys=0.00, real=0.01 secs]
2016-08-23T11:23:08.458-0200: 65.561: [CMS-concurrent-sweep-start]
2016-08-23T11:23:08.485-0200: 65.588: [CMS-concurrent-sweep⁶: 0.027/0.027 secs] [Times: user=0.03 sys=0.00, real=0.03 secs]
2016-08-23T11:23:08.485-0200: 65.589: [CMS-concurrent-reset-start]
2016-08-23T11:23:08.497-0200: 65.601: [CMS-concurrent-reset⁷: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]

Phase 1: Initial Mark

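This is one of the two stop-the-world events in CMS. It has two goals: first, to mark all objects in the old generation that are GC roots; second, to mark the objects that are referenced by live objects in the young generation.

[Figure: result of the initial mark]

Analysis: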

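2016-08-23T11:23:07.321-0200: 64.425¹: [GC (CMS Initial Mark²) [1 CMS-initial-mark: 10812086K³(11901376K)⁴] 10887844K⁵(12514816K)⁶, 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]⁷

2016-08-23T11:23:07.321-0200: 64.425 – the start of the GC event, given as wall-clock time and as time relative to JVM startup; these timestamps have the same meaning in all the phases below;
CMS Initial Mark – the phase of the collection; it starts by collecting all GC roots and the objects they reference directly;
10812086K – current old-generation usage;
(11901376K) – capacity of the old generation;
10887844K – current usage of the whole heap;
(12514816K) – capacity of the whole heap;
0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] – the duration of the phase, with the same user/sys/real breakdown as above;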
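
The usage and capacity figures on the initial-mark line show how full the old generation was when the CMS cycle started. The sketch below only illustrates how to read those four numbers, assuming the line layout shown above; the class name InitialMarkOccupancy is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InitialMarkOccupancy {
    // "[1 CMS-initial-mark: <oldUsed>K(<oldCapacity>K)] <heapUsed>K(<heapCapacity>K)"
    private static final Pattern PATTERN = Pattern.compile(
            "\\[1 CMS-initial-mark: (\\d+)K\\((\\d+)K\\)\\] (\\d+)K\\((\\d+)K\\)");

    // Returns {old-generation fill ratio, whole-heap fill ratio}, each between 0 and 1.
    static double[] fillRatios(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not an initial-mark line: " + line);
        }
        double oldRatio = (double) Long.parseLong(m.group(1)) / Long.parseLong(m.group(2));
        double heapRatio = (double) Long.parseLong(m.group(3)) / Long.parseLong(m.group(4));
        return new double[] {oldRatio, heapRatio};
    }

    public static void main(String[] args) {
        double[] r = fillRatios("2016-08-23T11:23:07.321-0200: 64.425: [GC (CMS Initial Mark) "
                + "[1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] "
                + "[Times: user=0.00 sys=0.00, real=0.00 secs]");
        System.out.println("old generation fill ratio: " + r[0]);
        System.out.println("whole heap fill ratio:     " + r[1]);
    }
}
```

For the line in this article the old generation is roughly 91% full and the whole heap roughly 87% full when the cycle begins.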

Phase 2: Concurrent Mark

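This phase traverses the old generation and marks all live objects, starting from the GC roots found in the Initial Mark phase. As its name says, concurrent marking runs at the same time as the application threads. Not every live object in the old generation is necessarily marked, because the application keeps changing references while marking is in progress.

[Figure: result of the concurrent mark]

In the figure, one reference arrow has been moved away from the current object (current obj)

Analysis: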

2016-08-23T11:23:07.321-0200: 64.425: [CMS-concurrent-mark-start]
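2016-08-23T11:23:07.357-0200: 64.460: [CMS-concurrent-mark¹: 0.035/0.035 secs²] [Times: user=0.07 sys=0.00, real=0.03 secs]³

CMS-concurrent-mark – the concurrent marking phase, which traverses the old generation and marks the live objects;
0.035/0.035 secs – the duration of the phase: CPU time used, followed by the elapsed wall-clock time;
[Times: user=0.07 sys=0.00, real=0.03 secs] – as above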

Phase 3: Concurrent Preclean

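This is another concurrent phase; it runs alongside the application threads without stopping them. While the previous phase was running concurrently with the application, some references changed. Whenever that happens, the JVM marks the region of the heap (called a "card") that contains the mutated object as "dirty"; this is known as Card Marking.

[Figure: heap with dirty cards]

In the pre-cleaning phase, the objects reachable from the objects in dirty cards are marked as well. Once this is done, the dirty-card marks are cleared.

[Figure: heap after precleaning]

In addition, some necessary housekeeping is performed, along with preparation for the Final Remark phase;

Analysis: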

2016-08-23T11:23:07.357-0200: 64.460: [CMS-concurrent-preclean-start]
2016-08-23T11:23:07.373-0200: 64.476: [CMS-concurrent-preclean¹: 0.016/0.016 secs²] [Times: user=0.02 sys=0.00, real=0.02 secs]³

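CMS-concurrent-preclean – this phase accounts for the references that changed during the preceding concurrent marking phase;
0.016/0.016 secs – the duration of the phase: CPU time used, followed by the elapsed wall-clock time;
[Times: user=0.02 sys=0.00, real=0.02 secs] – as above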

Phase 4: Concurrent Abortable Preclean

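Yet another concurrent phase that does not stop the application threads. It tries to take as much work as possible off the stop-the-world Final Remark. How long it lasts depends on many factors, because it keeps repeating the same work until one of the abort conditions is met (for example the number of iterations, the amount of useful work done, or the elapsed wall-clock time).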

2016-08-23T11:23:07.373-0200: 64.476: [CMS-concurrent-abortable-preclean-start]
2016-08-23T11:23:08.446-0200: 65.550: [CMS-concurrent-abortable-preclean¹: 0.167/1.074 secs²] [Times: user=0.20 sys=0.00, real=1.07 secs]³

CMS-concurrent-abortable-preclean – the abortable concurrent preclean phase;
0.167/1.074 secs – the duration of the phase: CPU time used, followed by the elapsed wall-clock time (It is interesting to note that the user time reported is a lot smaller than clock time. Usually we have seen that real time is less than user time, meaning that some work was done in parallel and so elapsed clock time is less than used CPU time. Here we have a little amount of work – for 0.167 seconds of CPU time, and garbage collector threads were doing a lot of waiting. Essentially, they were trying to stave off for as long as possible before having to do an STW pause. By default, this phase may last for up to 5 seconds);
[Times: user=0.20 sys=0.00, real=1.07 secs] – as above

This phase can significantly affect the length of the upcoming Final Remark pause, and it has quite a few important configuration options and failure modes;
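
The two-number duration field of the concurrent phases (for example 0.167/1.074 secs) can be split programmatically to see how much of a phase was spent waiting. This is a rough sketch, assuming the line layout shown in this article and reading the first number as CPU time and the second as wall-clock time, as the quoted explanation does; the class name ConcurrentPhase is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ConcurrentPhase {
    // "[CMS-concurrent-<name>: <cpu>/<wall> secs]"; the "-start" lines have no colon and do not match.
    private static final Pattern PATTERN = Pattern.compile(
            "\\[CMS-concurrent-([a-z-]+): ([\\d.]+)/([\\d.]+) secs\\]");

    final String name;
    final double cpuSecs;   // first number: time the phase spent working
    final double wallSecs;  // second number: elapsed wall-clock time

    private ConcurrentPhase(String name, double cpuSecs, double wallSecs) {
        this.name = name;
        this.cpuSecs = cpuSecs;
        this.wallSecs = wallSecs;
    }

    // Returns the parsed phase, or null when the line is not a concurrent-phase end line.
    static ConcurrentPhase parse(String line) {
        Matcher m = PATTERN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new ConcurrentPhase(m.group(1),
                Double.parseDouble(m.group(2)), Double.parseDouble(m.group(3)));
    }

    // Wall-clock time during which the phase was not doing work.
    double waitingSecs() {
        return wallSecs - cpuSecs;
    }

    public static void main(String[] args) {
        ConcurrentPhase p = parse("2016-08-23T11:23:08.446-0200: 65.550: "
                + "[CMS-concurrent-abortable-preclean: 0.167/1.074 secs] "
                + "[Times: user=0.20 sys=0.00, real=1.07 secs]");
        System.out.println(p.name + " waited about " + p.waitingSecs() + " s");
    }
}
```

For the abortable preclean line above, roughly 0.9 of the 1.074 seconds was waiting, which matches the explanation that the collector threads were stalling before the stop-the-world pause.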

Phase 5: Final Remark

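This is the second and last stop-the-world phase in CMS. Its task is to finish marking all live objects in the old generation. Because the preceding preclean phases ran concurrently, they may not have kept up with the rate at which the application changes references, so a stop-the-world pause is needed to complete the marking.

CMS usually tries to run the Final Remark when the young generation is as empty as possible, in order to avoid several stop-the-world phases happening back to back.

Analysis: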

2016-08-23T11:23:08.447-0200: 65.550¹: [GC (CMS Final Remark²) [YG occupancy: 387920 K (613440 K)³]65.550: [Rescan (parallel) , 0.0085125 secs]⁴65.559: [weak refs processing, 0.0000243 secs]65.559⁵: [class unloading, 0.0013120 secs]65.560⁶: [scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs⁷][1 CMS-remark: 10812086K(11901376K)⁸] 11200006K(12514816K) ⁹, 0.0110730 secs¹⁰] [Times: user=0.06 sys=0.00, real=0.01 secs]¹¹

2016-08-23T11:23:08.447-0200: 65.550 – as above;
CMS Final Remark – the phase of the collection; it marks all live objects in the old generation, including references that were changed or newly created during the concurrent marking phases;
YG occupancy: 387920 K (613440 K) – current usage and capacity of the young generation;
[Rescan (parallel) , 0.0085125 secs] – the rescan completes the marking of live objects while the application is stopped;
weak refs processing, 0.0000243 secs]65.559 – the first sub-phase, which processes weak references;
class unloading, 0.0013120 secs]65.560 – the second sub-phase (that is unloading the unused classes, with the duration and timestamp of the phase);
scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs – the final sub-phases (that is cleaning up symbol and string tables which hold class-level metadata and internalized string respectively)
10812086K(11901376K) – old-generation usage and capacity after this phase;
11200006K(12514816K) – usage and capacity of the whole heap after this phase;
0.0110730 secs – the duration of this phase;
[Times: user=0.06 sys=0.00, real=0.01 secs] – as above;
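
To see where the remark pause goes, the bracketed sub-phases can be extracted from the line and added up. The sketch below assumes the layout of the unannotated Final Remark line from the first log listing in this article; the class name RemarkSubPhases is made up, and the regular expression is only tuned to this sample.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RemarkSubPhases {
    // Matches "[<sub-phase name>, <secs> secs]"; the name may contain letters, spaces and parentheses.
    private static final Pattern SUB_PHASE = Pattern.compile(
            "\\[([A-Za-z() ]+?) ?, ([\\d.]+) secs\\]");

    // Returns sub-phase name -> duration in seconds, in the order they appear on the line.
    static Map<String, Double> parse(String remarkLine) {
        Map<String, Double> phases = new LinkedHashMap<>();
        Matcher m = SUB_PHASE.matcher(remarkLine);
        while (m.find()) {
            phases.put(m.group(1), Double.parseDouble(m.group(2)));
        }
        return phases;
    }

    // Sum of all sub-phase durations in seconds.
    static double totalSeconds(Map<String, Double> phases) {
        double total = 0.0;
        for (double secs : phases.values()) {
            total += secs;
        }
        return total;
    }

    public static void main(String[] args) {
        String line = "2016-08-23T02:23:08.447-0200: 65.550: [GC (CMS Final Remark) "
                + "[YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs]"
                + "65.559: [weak refs processing, 0.0000243 secs]65.559: [class unloading, 0.0013120 secs]"
                + "65.560: [scrub symbol table, 0.0008345 secs]65.561: [scrub string table, 0.0001759 secs]"
                + "[1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] "
                + "[Times: user=0.06 sys=0.00, real=0.01 secs]";
        Map<String, Double> phases = parse(line);
        for (Map.Entry<String, Double> e : phases.entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue() + " s");
        }
        System.out.println("sum of sub-phases = " + totalSeconds(phases) + " s");
    }
}
```

On this line the five sub-phases add up to about 0.0109 seconds of the 0.0110730-second pause, and the parallel rescan accounts for most of it.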

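After the five marking phases above, all live objects in the old generation have been marked, and the garbage collector now reclaims the unused objects by sweeping the old generation.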

Phase 6: Concurrent Sweep

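Runs concurrently with the application threads; no stop-the-world pause is needed. The purpose of this phase is to remove the unused objects and reclaim the space they occupy for future use.

[Figure: old generation after the sweep]

Analysis: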

2016-08-23T11:23:08.458-0200: 65.561: [CMS-concurrent-sweep-start]
2016-08-23T11:23:08.485-0200: 65.588: [CMS-concurrent-sweep¹: 0.027/0.027 secs²] [Times: user=0.03 sys=0.00, real=0.03 secs]³

CMS-concurrent-sweep – this phase sweeps the objects that were not marked and reclaims their space;
0.027/0.027 secs – the duration of the phase: CPU time used, followed by the elapsed wall-clock time;
[Times: user=0.03 sys=0.00, real=0.03 secs] – as above

Phase 7: Concurrent Reset

This phase runs concurrently; it resets the internal data structures of the CMS algorithm to prepare them for the next CMS cycle.

2016-08-23T11:23:08.485-0200: 65.589: [CMS-concurrent-reset-start]
2016-08-23T11:23:08.497-0200: 65.601: [CMS-concurrent-reset¹: 0.012/0.012 secs²] [Times: user=0.01 sys=0.00, real=0.01 secs]³
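
Putting the seven phases together, it can be useful to compare the stop-the-world time of one CMS cycle with the length of the whole cycle. The sketch below is only an approximation under stated assumptions: it relies on the log layout shown in this article, treats only the Initial Mark and Final Remark lines as pauses, and estimates the cycle length from the JVM-uptime stamps of the first and last line. The class name CmsCycleSummary is made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CmsCycleSummary {
    // "<date stamp>: <uptime>: " at the start of every log line.
    private static final Pattern UPTIME = Pattern.compile("^\\S+: ([\\d.]+): ");
    // Last ", <secs> secs]" before "[Times" on an Initial Mark or Final Remark line.
    private static final Pattern PAUSE = Pattern.compile(
            "\\(CMS (?:Initial Mark|Final Remark)\\).*, ([\\d.]+) secs\\] \\[Times");

    // Sum of the two stop-the-world pauses found in the given lines, in seconds.
    static double stwSeconds(List<String> lines) {
        double total = 0.0;
        for (String line : lines) {
            Matcher m = PAUSE.matcher(line);
            if (m.find()) {
                total += Double.parseDouble(m.group(1));
            }
        }
        return total;
    }

    // JVM uptime (seconds) stamped on a log line.
    static double uptime(String line) {
        Matcher m = UPTIME.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no uptime stamp: " + line);
        }
        return Double.parseDouble(m.group(1));
    }

    // Approximate cycle length: uptime of the last line minus uptime of the first line.
    static double cycleSeconds(List<String> lines) {
        return uptime(lines.get(lines.size() - 1)) - uptime(lines.get(0));
    }

    public static void main(String[] args) {
        List<String> cycle = Arrays.asList(
                "2016-08-23T11:23:07.321-0200: 64.425: [GC (CMS Initial Mark) [1 CMS-initial-mark: 10812086K(11901376K)] 10887844K(12514816K), 0.0001997 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]",
                "2016-08-23T11:23:07.357-0200: 64.460: [CMS-concurrent-mark: 0.035/0.035 secs] [Times: user=0.07 sys=0.00, real=0.03 secs]",
                "2016-08-23T11:23:08.447-0200: 65.550: [GC (CMS Final Remark) [YG occupancy: 387920 K (613440 K)]65.550: [Rescan (parallel) , 0.0085125 secs][1 CMS-remark: 10812086K(11901376K)] 11200006K(12514816K), 0.0110730 secs] [Times: user=0.06 sys=0.00, real=0.01 secs]",
                "2016-08-23T11:23:08.497-0200: 65.601: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]");
        System.out.println("stop-the-world seconds: " + stwSeconds(cycle));
        System.out.println("approx. cycle seconds:  " + cycleSeconds(cycle));
    }
}
```

For the cycle in this article the two pauses add up to roughly 0.011 seconds out of a cycle of about 1.18 seconds, which illustrates the design goal stated at the beginning: most of the old-generation collection work happens while the application keeps running.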