A survey of garbage collection and the changes CLR 4.0 brings - part of the series on what is new in CLR 4.0
Introduction
Garbage collection is an important component of .NET. This post covers what has been improved in CLR 4.0. To make those improvements easier to understand, I will first survey the history of garbage collection, so that we have the big picture. That big picture helps us master the .NET architecture as a whole.
About garbage collection
In the days of C++, we had to allocate and release memory ourselves, and do it carefully. That is why C++ has the new and delete keywords, along with functions such as malloc/free. A C++ program has to manage its memory well, otherwise it leaks memory. In .NET, Microsoft gives developers a powerful mechanism: garbage collection. The garbage collector is part of the CLR. We do not need to worry about when to release memory, so we can spend more time on the business logic of our applications. The CLR's garbage collector uses algorithms to decide which memory the program no longer needs, then reclaims that memory so the program can use it again.
The functions of garbage collection
Manage the allocation and release of the memory occupied by managed and unmanaged resources.
Find the objects that are no longer needed, release the memory they occupy, and release the memory occupied by unmanaged resources.
Releasing memory leaves the heap fragmented. The garbage collector moves objects to obtain contiguous free memory, and then updates every object reference to point to the object's new location.
Let's see how the CLR takes care of managed resources.
Managed heap and managed stack
When the CLR runs our program, it sets aside two regions of memory for different purposes: the managed stack and the managed heap. The managed stack stores local variables and tracks method calls and returns. The managed heap stores reference types; a reference type is always stored on the managed heap. Value types are usually put on the managed stack. If a value type is part of a reference type, it is stored on the managed heap along with that reference type. What are value types? They are the types that derive from System.ValueType:
bool byte char decimal double enum float int long sbyte short struct uint ulong ushort
What are reference types? Any type declared with class, interface, delegate, object, or string is a reference type.
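The practical difference between the two kinds of type shows up on assignment. The following is a minimal sketch, not taken from the original post: PointValue, PointRef and ValueVsReferenceDemo are hypothetical names made up for illustration.

```csharp
using System;

// Hypothetical types for illustration only.
struct PointValue { public int X; }   // value type
class PointRef { public int X; }      // reference type

static class ValueVsReferenceDemo
{
    // Copies a struct, changes the copy, and returns the original's X.
    public static int MutateStructCopy()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // the whole value is copied
        b.X = 42;
        return a.X;         // the original is unchanged
    }

    // Copies a reference, changes the object through it, and returns the original's X.
    public static int MutateClassAlias()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // only the reference is copied
        b.X = 42;
        return a.X;         // both variables refer to the same heap object
    }

    static void Main()
    {
        Console.WriteLine(MutateStructCopy());   // 1
        Console.WriteLine(MutateClassAlias());   // 42
    }
}
```

Assigning a struct copies its contents, while assigning a class instance copies only the reference, so the change made through b is visible through a in the second method.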
Suppose we declare a local variable whose type is a reference type and assign a value to it, as in the following example:
private void MyMethod()
{
    // myType lives on the managed stack; the MyType object lives on the managed heap
    MyType myType = new MyType();
    myType.DoSomething();
}
In this sample, myType is a local variable. The object instantiated by new is stored on the managed heap, while the myType variable itself is stored on the managed stack and holds a reference to that object. When the CLR executes the method, it moves the managed stack pointer to allocate space for the local variable myType. When the CLR executes new, it first checks whether the managed heap has enough space. If it does, the CLR simply advances the managed heap pointer to allocate space for the MyType object. If it does not, the garbage collector is triggered. The CLR knows the metadata of every type before it allocates, so it knows the size of each type and therefore how much space an object needs.
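The "advance the pointer, collect when full" behaviour can be pictured with a toy model. This is only a sketch under simplifying assumptions, not how the CLR is implemented: ToyHeap and its members are hypothetical, sizes are plain integers, and the simulated collection just pretends that every object is garbage.

```csharp
using System;

// Toy model of a bump-pointer heap; not how the CLR is actually implemented.
class ToyHeap
{
    private readonly int capacity;
    private int next;   // offset of the first free byte

    public ToyHeap(int capacity) { this.capacity = capacity; }

    public int CollectionsTriggered { get; private set; }

    // Returns the offset of the new object, or -1 if the request
    // still does not fit after the simulated collection.
    public int Allocate(int size)
    {
        if (next + size > capacity)
        {
            Collect();   // not enough room: trigger a collection first
            if (next + size > capacity) return -1;
        }
        int address = next;
        next += size;    // allocation is just a pointer bump
        return address;
    }

    // Stand-in for a real collection: this toy assumes every object
    // is garbage and simply resets the pointer.
    private void Collect()
    {
        CollectionsTriggered++;
        next = 0;
    }
}

static class ToyHeapDemo
{
    static void Main()
    {
        var heap = new ToyHeap(100);
        Console.WriteLine(heap.Allocate(40));           // 0
        Console.WriteLine(heap.Allocate(40));           // 40
        Console.WriteLine(heap.Allocate(40));           // 0, after the simulated collection
        Console.WriteLine(heap.CollectionsTriggered);   // 1
    }
}
```

The point of the sketch is that the common case costs one comparison and one addition; the expensive work happens only when the free space runs out.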
When the CLR finishes executing MyMethod, the local variable myType on the managed stack is removed immediately, but the MyType object on the managed heap is not necessarily removed right away. That depends on when the garbage collector is triggered. I will talk about the trigger conditions later.
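One way to observe this is with System.WeakReference, which tracks an object without keeping it alive. The sketch below is illustrative only: MyType is an empty stand-in class and LifetimeDemo is a hypothetical name. Whether an unreferenced object is actually gone after GC.Collect depends on the runtime and build configuration, so only the "still referenced" case is stated as a definite result.

```csharp
using System;
using System.Runtime.CompilerServices;

class MyType { }   // hypothetical stand-in for the type used in MyMethod

static class LifetimeDemo
{
    // NoInlining keeps the local variable in this method's own stack frame,
    // so the frame (and the local) is gone once the method returns.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference CreateAndDrop()
    {
        MyType myType = new MyType();
        return new WeakReference(myType);   // a weak reference is not a root
    }

    // While a local variable still refers to the object, a collection
    // must not reclaim it.
    public static bool AliveWhileReferenced()
    {
        MyType myType = new MyType();
        WeakReference weak = new WeakReference(myType);
        GC.Collect();
        bool alive = weak.IsAlive;
        GC.KeepAlive(myType);   // keeps the local alive up to this point
        return alive;
    }

    static void Main()
    {
        Console.WriteLine(AliveWhileReferenced());   // True

        WeakReference weak = CreateAndDrop();
        // The stack variable is gone, but no collection has been forced yet,
        // so the heap object may well still be there.
        Console.WriteLine(weak.IsAlive);

        GC.Collect();
        GC.WaitForPendingFinalizers();
        // After a forced collection the object is typically reclaimed,
        // but the runtime does not guarantee it.
        Console.WriteLine(weak.IsAlive);
    }
}
```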
In the previous paragraphs we saw how the CLR manages managed resources. Next we will see how the garbage collector finds managed objects that are no longer needed and releases their memory.
How the garbage collector finds objects that are no longer needed and releases their memory
Earlier we saw how the CLR manages objects on the managed stack. The managed stack is easy to manage because it follows the “first in, last out” rule. Managing the managed heap is much more complicated. Everything that follows is about the managed heap.
The roots
The garbage collector's criterion for deciding that an object is no longer needed is that no reference leads to it any more; such an object can be released. In more complicated cases, one object refers to a second object, the second refers to a third, and so on, like a linked list. The question then is: where does the garbage collector start looking for objects that are no longer needed? In the linked list example, it obviously should start from the head of the list. So the next question is: what sits at the head of the list?
The answer: local variables, global variables, static variables, and CPU registers that point into the managed heap. In the CLR these are called “the roots”.
With the roots in hand, what does the garbage collector do next?
It builds a graph that describes the reference relationships among objects.
The garbage collector first assumes that every object on the managed heap is unreachable (that is, unreferenced and no longer needed). It then starts from the roots. For each root, it finds the object on the managed heap that the root refers to and adds it to the graph. It then follows that object to see whether it refers to another object. If it does, that object is added to the graph as well and the search continues; if not, the end of the chain has been reached. The collector then moves on to the next root, and stops only once every root has been processed. It is worth mentioning that the collector makes a small optimization during the search: because the reference relationships among objects can be complicated, it may reach an object that is already in the graph. In that case it stops following that chain and moves on to another one, which improves the performance of the collection.
Once the graph is built, the objects that are not in it are the ones that are no longer needed, and the garbage collector can reclaim the memory they occupy.
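The search described above can be modelled in a few lines. This is a simplified sketch, not the CLR's actual implementation: MarkSketch is a hypothetical name, the heap is a dictionary from object name to the names it references, and the roots are given as a plain list of names.

```csharp
using System;
using System.Collections.Generic;

static class MarkSketch
{
    // heap: object name -> names of the objects it references.
    // Returns the set of objects reachable from the roots (the "graph").
    public static HashSet<string> Mark(
        Dictionary<string, string[]> heap, IEnumerable<string> roots)
    {
        var reachable = new HashSet<string>();
        var pending = new Stack<string>(roots);
        while (pending.Count > 0)
        {
            string current = pending.Pop();
            // Add returns false when the object is already in the graph,
            // so a chain (or a cycle) is never walked twice.
            if (!reachable.Add(current)) continue;
            foreach (string child in heap[current]) pending.Push(child);
        }
        return reachable;
    }

    static void Main()
    {
        var heap = new Dictionary<string, string[]>
        {
            { "A", new[] { "B" } },
            { "B", new[] { "C" } },
            { "C", new[] { "A" } },   // cycle back to A
            { "D", new[] { "E" } },   // D and E reference each other,
            { "E", new[] { "D" } },   // but no root reaches them
        };
        HashSet<string> live = Mark(heap, new[] { "A" });
        foreach (string name in heap.Keys)
            if (!live.Contains(name)) Console.WriteLine(name + " is garbage");
    }
}
```

With A as the only root, A, B and C end up in the graph. D and E refer to each other but are not reachable from any root, so they are treated as garbage; the check on reachable.Add is the toy equivalent of the optimization mentioned above.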