Contents
Python reference mechanism
Python reference counting
How the reference counter works
Getting the reference count: getrefcount()
Increasing the reference count
Decreasing the reference count
Memory leaks and out-of-memory errors
Mark and sweep (mainly used to resolve circular references)
Advantages of the reference counting mechanism
Disadvantages of the reference counting mechanism
Garbage collection
How collection works
The gc mechanism
The efficiency problem
Three conditions that trigger garbage collection
Generational collection: deciding which objects to examine when a collection starts
Python buffer pool (memory pool)
Why memory pools were introduced (why)
How the memory pool works (how)
Integer object buffer pool
String cache
Note
The string intern mechanism
Memory management in Python: reference counting is the primary mechanism, supplemented by mark-and-sweep and generational garbage collection, together with a memory pool mechanism that caches small integers and interns simple strings.
Python dynamic typing
• An object is an entity stored in memory. Each object in memory (a PyObject) carries a reference count, a data type, and a value.
• The object name we write in a program is just a reference to that object.
• Separating references from objects is the core of dynamic typing.
• A reference can be pointed at a new object at any time (the memory address will then be different).
answer = 42
Here answer is the identifier (the reference) and 42 is the value (the object).
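The separation of names and objects can be seen in a few lines. This is only a minimal sketch; the name alias is introduced here for illustration and is not part of the original example.

```python
answer = 42            # the name "answer" refers to an int object
alias = answer         # a second reference to the same object
same_object = alias is answer

answer = "forty-two"   # rebinding: "answer" now refers to a new str object
rebound = answer is not alias

print(same_object, rebound)
print(type(alias).__name__, type(answer).__name__)
```

The int object is untouched by the rebinding: alias still refers to it, while answer now refers to a different object of a different type.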
In Python, every object keeps the total number of references pointing to it, known as its reference count.
Python tracks objects in memory by reference counting, that is, it records how many times each object is currently referenced.
Internally, Python maintains a counter for each object, called the reference counter, that records how many references point to it. When an object's reference count drops to 0, the object becomes garbage and can be reclaimed.
>>> a=[1,2]
>>> import sys
>>> sys.getrefcount(a) ## get the reference count of the object a refers to
2
>>> b=a
>>> sys.getrefcount(a)
3
>>> del b ## delete the reference b
>>> sys.getrefcount(a)
2
>>> c=list()
>>> c.append(a) ## add the object to a container
>>> sys.getrefcount(a)
3
>>> del c ## delete the container, reference count -1
>>> sys.getrefcount(a)
2
>>> b=a
>>> sys.getrefcount(a)
3
>>> a=[3,4] ## rebind a to a new object
>>> sys.getrefcount(a)
2
Note: passing a to getrefcount() as an argument creates a temporary reference, so the result is 1 higher than the actual count.
Every Python object maintains an ob_refcnt field that records how many times the object is currently referenced.
Whenever a new reference points to the object, its ob_refcnt is incremented by 1 (the count starts from 0).
Whenever a reference to the object goes away, ob_refcnt is decremented by 1. Once the count reaches 0, the object can be reclaimed and the memory it occupies is freed.
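To see the moment at which the count reaches 0, one option is to attach a callback with weakref.finalize. The sketch below assumes CPython, where an object is released as soon as its last reference disappears; the Resource class and the events list are hypothetical names used only for this illustration.

```python
import weakref

class Resource:
    """Hypothetical stand-in for any object we want to watch."""

events = []

obj = Resource()
# The callback runs when the Resource instance is reclaimed.
weakref.finalize(obj, events.append, "freed")

alias = obj                      # two references now point at the instance
del obj                          # one reference left, nothing is freed yet
freed_after_first_del = list(events)

del alias                        # last reference gone, the count reaches 0
freed_after_second_del = list(events)

print(freed_after_first_del, freed_after_second_del)
```

Other Python implementations may delay the callback, because immediate release is a property of reference counting and not of the language itself.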
When a reference is passed to getrefcount() as an argument, the parameter itself creates a temporary reference. The result of getrefcount() is therefore 1 more than expected.
>>> from sys import getrefcount  ## import getrefcount from the sys module so it can be called directly afterwards
If you import the sys module itself instead, you have to call it as sys.getrefcount().
>>> a=[1,2,3]
>>> print(getrefcount(a))
2
>>> b=a
>>> print(getrefcount(b))
3
One disadvantage is that maintaining reference counts takes extra space, but this is a secondary problem.
The main problem is that reference counting cannot free the memory held by objects in a circular reference.
When an object A is referenced by another object B, A's reference count increases.
When a reference is removed with del or rebound to another object, the reference count decreases (del only deletes the reference, not the object itself).
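Because the absolute value returned by getrefcount() includes the temporary reference and can vary between interpreter versions, it is often clearer to look at how the count changes. A small sketch, with data, alias, and holder as illustrative names:

```python
import sys

data = [1, 2, 3]
base = sys.getrefcount(data)             # baseline, includes the temporary reference

alias = data                             # a new name: +1
after_alias = sys.getrefcount(data)

holder = {"key": data}                   # stored in a container: +1
after_container = sys.getrefcount(data)

del alias, holder                        # both extra references removed
after_del = sys.getrefcount(data)

print(after_alias - base, after_container - base, after_del - base)
```

The differences come out as 1, 2, and 0, regardless of what the baseline happens to be.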
>>> x=[1]
>>> y=[2]
>>> x.append(y)
>>> y.append(x)
>>> getrefcount(x)
3
>>> getrefcount(y)
3
>>> del x
>>> del y
By the rules of reference counting, once a circular reference exists the counts never reach 0, so reference counting alone cannot release the memory.
This can cause a memory leak.
Memory leak: memory that stays occupied and cannot be released, yet the process can no longer access it.
Memory leaks can lead to running out of memory (OOM, out of memory): the program needs more memory than the system has free.
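The leak can be observed directly by switching the cycle detector off for a moment. The following is a sketch that assumes CPython; the Node class is hypothetical and is used in place of lists because lists do not support weak references.

```python
import gc
import weakref

class Node:
    """Hypothetical container that can reference another Node."""
    def __init__(self):
        self.other = None

gc_was_enabled = gc.isenabled()
gc.disable()                     # rely on reference counting only

x, y = Node(), Node()
x.other, y.other = y, x          # x -> y and y -> x form a cycle
probe = weakref.ref(x)           # a weak reference does not keep x alive

del x, y                         # both names are gone
leaked = probe() is not None     # the cycle keeps both counts above 0

gc.collect()                     # run the cycle detector by hand
reclaimed = probe() is None

if gc_was_enabled:
    gc.enable()

print(leaked, reclaimed)
```

With the detector disabled, the two objects stay in memory even though no name refers to them any more; a manual gc.collect() is what finally releases them.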
Mark and sweep does what its name suggests: first mark objects (garbage detection), then sweep away the garbage (garbage collection).
1. Mark: classify objects as active (still referenced) or inactive (can be deleted)
2. Sweep: clear all inactive objects
>>> a=[1,2]
>>> b=[3,4]
>>> sys.getrefcount(a)
2
>>> sys.getrefcount(b)
2
>>> a.append(b)
>>> sys.getrefcount(b)
3
>>> b.append(a)
>>> sys.getrefcount(a)
3
>>> del a
>>> del b
a references b and b references a, so at this point each object is referenced 2 times (not counting the temporary reference made by getrefcount()).
After del runs, the reference counts of a and b each drop by 1 to 1, and the two objects are stuck in a circular reference.
Mark: start from one end, a. Because a holds a reference to b, subtract 1 from b's reference count.
Mark: then follow the reference to b. b holds a reference to a, so subtract 1 from a's reference count. Both counts are now 0, so a and b are marked as unreachable.
Sweep: the objects marked as unreachable are the ones to be released.
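The marking steps above can be modelled in plain Python. This is a simplified sketch of the idea, not CPython's actual implementation: the function find_unreachable, the refcounts table, and the edges table are all hypothetical, and the subtraction is done on a copy so the real counts stay untouched.

```python
def find_unreachable(refcounts, edges):
    """Return the names of objects that are only kept alive by each other.

    refcounts: name -> current reference count
    edges:     name -> list of names that object references
    """
    working = dict(refcounts)            # work on a copy of the counts
    for targets in edges.values():
        for target in targets:
            working[target] -= 1         # cancel out references from tracked objects

    # Anything still above 0 is referenced from outside, so it is a root.
    reachable = set()
    stack = [name for name, count in working.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(edges.get(name, ()))  # whatever a root references is alive too

    return set(refcounts) - reachable

# a <-> b were deleted by name; c is still held by a variable and references d.
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

Here a and b end up with a working count of 0 and nothing alive points to them, so they are reported as unreachable. c and d also form a cycle, but c keeps a count of 1 from the outside variable, so both survive.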
The garbage collection phase described above pauses the entire application, which resumes only after mark and sweep finishes. To shorten these pauses, Python uses generational collection, trading space for time to make garbage collection more efficient.
Advantages of reference counting:
• Simple
• Real-time: an object is reclaimed as soon as its count reaches 0
Disadvantages of reference counting:
• Maintaining the reference counts consumes resources
• Objects in a circular reference cannot be reclaimed
Python's garbage collection relies mainly on reference counting, supplemented by mark and sweep and by generational collection. Mark and sweep handles the circular references that reference counting cannot free, and generational collection makes garbage collection more efficient.
When the reference count of a Python object drops to 0, the object can be garbage collected.
• GC, the automatic memory management mechanism of modern programming languages, focuses on two things:
• Finding garbage in memory that is no longer used
• Cleaning up that garbage and handing the memory over to other objects
GC frees programmers from the burden of resource management and gives them more time to focus on business logic. That does not mean programmers can ignore GC, though: knowing more about it still helps us write more robust code.
While garbage collection runs, Python cannot do anything else, so frequent garbage collection greatly reduces Python's efficiency.
At runtime, Python keeps track of the number of object allocations and object deallocations. When the difference between the two exceeds a certain threshold, garbage collection starts.
import gc
print(gc.get_threshold())
(700, 10, 10)  # 700 is the generation 0 threshold; each 10 is how many collections of one generation trigger a collection of the next
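The current counters can be read next to the thresholds with gc.get_count(). A short sketch follows; the exact numbers depend on the interpreter version and on what the program has done so far, so no output is shown.

```python
import gc

threshold0, threshold1, threshold2 = gc.get_threshold()
count0, count1, count2 = gc.get_count()

# count0 grows with allocations minus deallocations of tracked objects.
print(f"generation 0: count {count0}, collection triggers above {threshold0}")
print(f"generation 1: count {count1}, threshold {threshold1}")
print(f"generation 2: count {count2}, threshold {threshold2}")
```

If the defaults do not suit a workload, gc.set_threshold() accepts new values for the same three positions.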
• An explicit call to gc.collect()
>>> gc.collect()
2
• The GC threshold is reached
• The program exits
This strategy rests on one basic assumption: the longer an object has lived, the less likely it is to become garbage later in the program.
• Python divides all objects into three generations: 0, 1, and 2.
• All newly created objects are generation 0 objects.
• An object that survives a garbage collection of its generation is moved to the next generation.
• When garbage collection starts, it always scans all generation 0 objects.
• Once generation 0 has gone through a certain number of collections, a scan and cleanup of generations 0 and 1 is started.
• Once generation 1 has also gone through a certain number of collections, a scan of generations 0, 1, and 2 is started, that is, of all objects.
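The promotion rules in the list above can be played through with a toy model. This is only a sketch under simplifying assumptions: the ToyGenerations class is hypothetical, every object is treated as alive unless an is_alive function says otherwise, and the real collector in CPython is considerably more involved.

```python
class ToyGenerations:
    """Toy model of three generations; not CPython's real collector."""

    def __init__(self, thresholds=(700, 10, 10)):
        self.thresholds = thresholds
        self.generations = [[], [], []]  # objects in generation 0, 1, 2
        self.counts = [0, 0, 0]          # allocations, then collections of the younger generation
        self.scans = [0, 0, 0]           # how often each generation was the oldest one scanned

    def allocate(self, obj):
        self.generations[0].append(obj)  # new objects always start in generation 0
        self.counts[0] += 1
        if self.counts[0] > self.thresholds[0]:
            self.collect(self._oldest_due())

    def _oldest_due(self):
        oldest = 0
        for gen in (1, 2):
            if self.counts[gen] > self.thresholds[gen]:
                oldest = gen
        return oldest

    def collect(self, oldest, is_alive=lambda obj: True):
        scanned = []
        for gen in range(oldest + 1):    # scan this generation and all younger ones
            scanned.extend(self.generations[gen])
            self.generations[gen] = []
            self.counts[gen] = 0
        survivors = [obj for obj in scanned if is_alive(obj)]
        self.generations[min(oldest + 1, 2)].extend(survivors)  # promote survivors
        if oldest < 2:
            self.counts[oldest + 1] += 1
        self.scans[oldest] += 1

heap = ToyGenerations(thresholds=(3, 2, 2))  # small thresholds keep the trace short
for number in range(16):
    heap.allocate(number)
print(heap.scans, [len(gen) for gen in heap.generations])
```

With thresholds of (3, 2, 2), sixteen allocations lead to three scans of generation 0 alone and one scan of generations 0 and 1 together, after which all sixteen objects sit in generation 2.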
Creating large numbers of small objects means calling new/malloc very often, which fragments memory and lowers efficiency. A memory pool requests a certain amount of memory in advance and keeps blocks of equal size in reserve. When new memory is needed, it is served from the pool first, and new memory is requested only if the pool is not enough. The most significant benefit is less memory fragmentation and higher efficiency.
Python's object management sits mainly in layers Level+1 to Level+3.
Level+3: each Python built-in type (such as int and dict) has its own private memory pool. These pools are not shared between types, so memory freed by int is not handed to float.
Level+2: when the requested size is below the small-object threshold (256 bytes, raised to 512 bytes in Python 3.3), the allocation is handled mainly by Python's object allocator.
Level+1: when the requested size is above that threshold, the allocation is handled by Python's raw memory allocator, which essentially calls C standard library functions such as malloc/realloc.
As for freeing memory: when an object's reference count drops to 0, Python calls its destructor. Calling the destructor does not necessarily mean free is eventually called to release the memory. If it did, frequent allocation and release would greatly reduce Python's execution efficiency. So the memory pool is also used on destruction: memory obtained from the pool is returned to the pool, avoiding frequent allocate and release operations.
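The pool idea described above (reserve equal-sized blocks up front, serve requests from the pool, and take returned blocks back instead of freeing them) can be sketched in a few lines. The BlockPool class below is a hypothetical illustration written in Python; CPython's real allocator is implemented in C and organises memory quite differently in detail.

```python
class BlockPool:
    """Toy pool of equal-sized blocks; an illustration, not pymalloc."""

    def __init__(self, block_size, block_count):
        self.block_size = block_size
        # Reserve all blocks in advance.
        self._free = [bytearray(block_size) for _ in range(block_count)]
        self.fallback_allocations = 0

    def acquire(self):
        if self._free:
            return self._free.pop()       # served from the pool
        self.fallback_allocations += 1    # pool exhausted: request new memory
        return bytearray(self.block_size)

    def release(self, block):
        self._free.append(block)          # returned to the pool, not freed

pool = BlockPool(block_size=64, block_count=2)
first = pool.acquire()
second = pool.acquire()
third = pool.acquire()                    # the pool is empty, so this is a fresh block
pool.release(first)
reused = pool.acquire()                   # the block released a moment ago comes back
print(pool.fallback_allocations, reused is first)
```

Only the third request had to go beyond the pool, and the released block was handed out again without any new allocation.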
Small integers in the range [-5, 256] are initialized by the system in advance and can be used directly. For other, larger integers, the system requests a piece of memory ahead of time and creates large integer objects in it when needed.
>>> a=3
>>> print(getrefcount(a))
49
To verify that two references point to the same object, we can use the is keyword, which tests whether two references refer to the same object.
When the caching mechanism is triggered, only a new reference is created, not a new object.
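The effect of the small integer cache can be checked with is. The sketch below assumes CPython; the values are built at run time with int() so that the compiler cannot share a single constant between the two names.

```python
small_a = int("256")          # inside the cached range [-5, 256]
small_b = int("256")
same_small = small_a is small_b

large_a = int("257")          # just outside the cached range
large_b = int("257")
same_large = large_a is large_b

print(same_small, same_large)
```

On CPython the first comparison is True, because both names refer to the cached object, and the second is False, because each call creates a separate object. Code should still compare numbers with ==, since this caching is an implementation detail.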
A single-character string is stored in the string intern area once it has been created.
A multi-character string is stored in the string intern area if it contains no special characters.
This, too, is an optimization for frequently used numbers and strings.
In short, only strings made up of letters and digits trigger the caching mechanism automatically in Python. Other strings are not cached.
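Strings that are not cached automatically can still be interned by hand with sys.intern(). A brief sketch follows, with the strings built at run time so that they start out as separate objects; the variable names are illustrative.

```python
import sys

part = "memory"
first = "".join([part, " pool!"])     # contains a space and "!", built at run time
second = "".join([part, " pool!"])
equal_values = first == second

interned_first = sys.intern(first)
interned_second = sys.intern(second)
shared = interned_first is interned_second

print(equal_values, shared)
```

Both values come out True: the two strings are equal from the start, and after interning the two names refer to one shared object, which is useful when the same string is compared or used as a dictionary key very often.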