In the previous article, "Common cache architectures: the penetration cache and the bypass cache", we introduced two common cache architectures: the penetration cache and the bypass cache.
The main difference between the penetration cache and the bypass cache architectures lies in how they handle a request for data that is not in the cache. However, cache capacity is limited. Over time, whether the architecture uses a penetration cache or a bypass cache, the cache will eventually fill up. At that point, in order to keep caching new data, we have to evict some old, rarely accessed data to free up space so the cache can continue to be used. How do we choose which data to evict? That is the job of the cache eviction algorithm.
The most common cache eviction algorithms are the following:
FIFO is short for First In, First Out, and it is arguably the simplest algorithm to implement. The cache maintains a queue: new data is always inserted at the head, and when the cache is full, the data at the tail of the queue, which is the oldest entry, is deleted.
The advantage of this algorithm is its simple implementation, but the disadvantage is just as obvious: eviction depends only on insertion order, so frequently accessed hot data can be evicted simply because it entered the cache early.
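To make this concrete, here is a minimal FIFO cache sketch. It is only an illustration, not code from any library; the class name FIFOCache and its maxsize parameter are chosen just for this example, and collections.OrderedDict is used simply because it preserves insertion order.

from collections import OrderedDict

class FIFOCache:
    """A minimal FIFO cache sketch: evict whichever entry was inserted first."""
    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()  # preserves insertion order

    def get(self, key, default=None):
        # A hit does not change the entry's position: FIFO ignores access patterns entirely
        return self._data.get(key, default)

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the oldest inserted entry
        self._data[key] = value

Note that get never reorders anything, which is exactly why hot entries can still be evicted.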
LFU is short for Least Frequently Used. The idea of the algorithm is to evict the data that has been accessed the fewest times. The implementation is also straightforward: maintain a single queue that stores each entry together with its access count. Every time an entry is accessed, its count is incremented and its node is moved forward in the linked list, so the whole queue stays ordered by access count, decreasing from head to tail. When eviction needs to run, only the entries at the tail of the queue have to be removed.
Compared with FIFO, this algorithm takes the number of accesses into account, which prevents hot data from being evicted before cold data. However, it still has a problem: once an entry has been accessed heavily within a short period, it keeps escaping eviction on the strength of its huge access count, even if it is never accessed again.
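As a rough illustration of the idea (not the linked-list variant described above), here is a minimal LFU sketch that keeps a separate access counter per key and evicts the key with the smallest count using a simple O(n) scan; the class name and structure are assumptions made for this example only.

class LFUCache:
    """A minimal LFU cache sketch: evict the entry with the fewest accesses."""
    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = {}    # key -> value
        self._counts = {}  # key -> access count

    def get(self, key, default=None):
        if key in self._data:
            self._counts[key] += 1  # every hit bumps the access count
            return self._data[key]
        return default

    def put(self, key, value):
        if key not in self._data and len(self._data) >= self.maxsize:
            victim = min(self._counts, key=self._counts.get)  # least frequently used key
            del self._data[victim]
            del self._counts[victim]
        self._data[key] = value
        self._counts.setdefault(key, 0)

The drawback mentioned above is easy to see here: a burst of get calls permanently inflates an entry's count, so it keeps beating entries that are accessed more recently but less often.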
LRU is short for Least Recently Used. This is the most widely used cache eviction algorithm in practice. Its idea is very similar to LFU, but it drops the access count and keeps only the "promote on access" policy in the queue, which avoids the inflated-count problem described above. Because the algorithm is so widely used, we will take Python's common tool for caching a function's parameters and results, functools.lru_cache, and use it to walk through the algorithm in detail.
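Before walking through the standard library's implementation, here is a minimal LRU sketch built on collections.OrderedDict, which supports moving an entry to the end and popping from either end; the class name is illustrative, and functools.lru_cache itself uses a dictionary plus a circular doubly linked list instead, as shown later.

from collections import OrderedDict

class LRUCache:
    """A minimal LRU cache sketch: evict the entry that was used least recently."""
    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # a hit makes this entry the most recently used
            return self._data[key]
        return default

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        elif len(self._data) >= self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used entry
        self._data[key] = value

Changing popitem(last=False) to popitem(last=True) would evict the most recently used entry instead, which is the MRU policy described next.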
In contrast to LRU, MRU is short for Most Recently Used. It works the opposite way: MRU preferentially evicts the most recently used data, and it suits workloads where data that has not been used for a long time is the most likely to be accessed next.
The functools.lru_cache decorator in Python's standard library implements an LRU cache: it stores the mapping from a function's arguments to its return value, improving performance when the function is repeatedly called with the same arguments. For background on Python closures and decorators, see the earlier articles: "The closure property of Python" and "The Python decorator and its principle".
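Typical usage is simply to decorate the function whose results you want to cache; the slow_square function below is a made-up example. A decorated function also gains cache_info() and cache_clear() helpers.

import functools

@functools.lru_cache(maxsize=128)
def slow_square(n):
    # stand-in for an expensive computation
    return n * n

slow_square(3)   # computed and stored in the cache
slow_square(3)   # returned directly from the cache
print(slow_square.cache_info())  # e.g. CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)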
The following is a simplified version of the functools.lru_cache source code from the Python standard library:
from _thread import RLock

kwd_mark = (object(),)  # sentinel tuple separating positional from keyword arguments in the key

def make_key(args, kwds):
    """
    Build the cache key from the decorated function's arguments.
    :param args: positional arguments
    :param kwds: keyword arguments
    :return: a hash of the combined arguments, or the single argument itself if it is of a simple type
    """
    key = args
    if kwds:
        key += kwd_mark
        for item in kwds.items():
            key += item
    elif len(key) == 1 and type(key[0]) in {int, str, frozenset, type(None)}:
        return key[0]
    return hash(key)
def lru_cache(maxsize=128):
    def decorating_function(user_function):
        PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # field indices of each linked-list node
        cache = {}                  # maps key -> linked-list node holding the cached result
        hits = misses = 0           # cache hit / miss counters
        full = False                # whether the cache has reached maxsize
        cache_len = cache.__len__   # bound method returning the current cache size
        lock = RLock()              # one shared re-entrant lock, created once per decorated function
        root = []                   # sentinel node of the circular doubly linked list
        root[:] = [root, root, None, None]
        def lru_cache_wrapper(*args, **kwds):
            nonlocal root, hits, misses, full
            key = make_key(args, kwds)
            with lock:  # thread-safety lock
                link = cache.get(key)
                # Cache hit: move the hit node and return the cached result directly
                if link is not None:
                    # Unlink the hit node from the list
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    # Re-insert it just before root, i.e. at the most recently used position
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
""" Cache miss , Call the method to get the return , Create nodes , Elimination algorithm """
result = user_function(*args, **kwds)
with RLock():
if key in cache:
pass
elif full:
""" Buffer full , Eliminate the first element of the team , Delete the corresponding buffer element """
oldroot = root
oldroot[KEY] = key
oldroot[RESULT] = result
root = oldroot[NEXT]
oldkey = root[KEY]
root[KEY] = root[RESULT] = None
del cache[oldkey]
cache[key] = oldroot
else:
""" The buffer is not full , Create nodes directly , insert data """
last = root[PREV]
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link
full = (cache_len() >= maxsize)
misses += 1
return result
return lru_cache_wrapper
return decorating_function
The logic of the code above is clear and the implementation is quite clever. It uses a dictionary as the cache and, alongside it, a circular doubly linked list. Each node in the linked list is itself a list whose four elements are the reference to the previous node, the reference to the next node, the key, and the function's return value. The LRU algorithm is realized by keeping the dictionary and the circular doubly linked list in sync.
The following figure shows how the algorithm executes in two scenarios: a cache hit and a cache eviction.
As we mentioned earlier, Python has no tail-call optimization, so recursive algorithms can be very inefficient. In a previous article, "The classic dynamic programming problem: the frog climbing the steps, and recursion optimization in Python", we implemented a simple tail-recursion optimization ourselves for exactly this situation.
Now let's bring in the clock decorator from that article and look at a program that computes the Fibonacci sequence recursively.
import functools
import time

def clock(func):
    """A decorator that prints each call's elapsed time, arguments, and result."""
    @functools.wraps(func)
    def clocked(*args, **kwargs):
        t0 = time.time()
        result = func(*args, **kwargs)
        elapsed = time.time() - t0
        name = func.__name__
        arg_lst = []
        if args:
            arg_lst.append(', '.join(repr(arg) for arg in args))
        if kwargs:
            pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]
            arg_lst.append(', '.join(pairs))
        arg_str = ', '.join(arg_lst)
        print('[%0.8fs] %s(%s) -> %r' % (elapsed, name, arg_str, result))
        return result
    return clocked
@clock
def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)

if __name__ == '__main__':
    print(fibonacci(6))
Running the program prints:
[0.00000000s] fibonacci(0) -> 0
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(2) -> 1
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(0) -> 0
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(2) -> 1
[0.00000000s] fibonacci(3) -> 2
[0.00050116s] fibonacci(4) -> 3
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(0) -> 0
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(2) -> 1
[0.00000000s] fibonacci(3) -> 2
[0.00000000s] fibonacci(0) -> 0
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(2) -> 1
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(0) -> 0
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(2) -> 1
[0.00000000s] fibonacci(3) -> 2
[0.00000000s] fibonacci(4) -> 3
[0.00000000s] fibonacci(5) -> 5
[0.00050116s] fibonacci(6) -> 8
8
As you can see, there are many calls with repeated arguments, which wastes performance, and as n grows the number of repeated calls multiplies. An effective optimization is to cache the results of these repeated calls so they can be returned directly on subsequent invocations. That is exactly what lru_cache is for. The complete program, with the simplified lru_cache applied, is shown below.
import functools
import time
from _thread import RLock

kwd_mark = (object(),)  # sentinel tuple separating positional from keyword arguments in the key

def make_key(args, kwds):
    """
    Build the cache key from the decorated function's arguments.
    :param args: positional arguments
    :param kwds: keyword arguments
    :return: a hash of the combined arguments, or the single argument itself if it is of a simple type
    """
    key = args
    if kwds:
        key += kwd_mark
        for item in kwds.items():
            key += item
    elif len(key) == 1 and type(key[0]) in {int, str, frozenset, type(None)}:
        return key[0]
    return hash(key)
def lru_cache(maxsize=128):
    def decorating_function(user_function):
        PREV, NEXT, KEY, RESULT = 0, 1, 2, 3   # field indices of each linked-list node
        cache = {}                  # maps key -> linked-list node holding the cached result
        hits = misses = 0           # cache hit / miss counters
        full = False                # whether the cache has reached maxsize
        cache_len = cache.__len__   # bound method returning the current cache size
        lock = RLock()              # one shared re-entrant lock, created once per decorated function
        root = []                   # sentinel node of the circular doubly linked list
        root[:] = [root, root, None, None]
        def lru_cache_wrapper(*args, **kwds):
            nonlocal root, hits, misses, full
            key = make_key(args, kwds)
            with lock:  # thread-safety lock
                link = cache.get(key)
                # Cache hit: move the hit node and return the cached result directly
                if link is not None:
                    # Unlink the hit node from the list
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    # Re-insert it just before root, i.e. at the most recently used position
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
""" Cache miss , Call the method to get the return , Create nodes , Elimination algorithm """
result = user_function(*args, **kwds)
with RLock():
if key in cache:
pass
elif full:
""" Buffer full , Eliminate the first element of the team , Delete the corresponding buffer element """
oldroot = root
oldroot[KEY] = key
oldroot[RESULT] = result
root = oldroot[NEXT]
oldkey = root[KEY]
root[KEY] = root[RESULT] = None
del cache[oldkey]
cache[key] = oldroot
else:
""" The buffer is not full , Create nodes directly , insert data """
last = root[PREV]
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link
full = (cache_len() >= maxsize)
misses += 1
return result
return lru_cache_wrapper
return decorating_function
def clock(func):
    """A decorator that prints each call's elapsed time, arguments, and result."""
@functools.wraps(func)
def clocked(*args, **kwargs):
t0 = time.time()
result = func(*args, **kwargs)
elapsed = time.time() - t0
name = func.__name__
arg_lst = []
if args:
arg_lst.append(', '.join(repr(arg) for arg in args))
if kwargs:
pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]
arg_lst.append(', '.join(pairs))
arg_str = ', '.join(arg_lst)
print('[%0.8fs] %s(%s) -> %r ' % (elapsed, name, arg_str, result))
return result
return clocked
@lru_cache()
@clock
def fibonacci(n):
if n < 2:
return n
return fibonacci(n - 2) + fibonacci(n - 1)
if __name__ == '__main__':
print(fibonacci(6))
Running the program now prints:
[0.00000000s] fibonacci(0) -> 0
[0.00000000s] fibonacci(1) -> 1
[0.00000000s] fibonacci(2) -> 1
[0.00000000s] fibonacci(3) -> 2
[0.00000000s] fibonacci(4) -> 3
[0.00000000s] fibonacci(5) -> 5
[0.00000000s] fibonacci(6) -> 8
8
We can see that the function went from the original 25 recursive calls down to only 7, and the improvement in execution time is just as obvious.