This is a “Python craftsman ” Series No 4 An article
Containers ” These two words are rarely used Python Technical articles mention . At the sight of “ Containers ”, Most people think of the little blue whale :Docker, But this article has nothing to do with it . Containers in this article , yes Python An abstract concept in , It is a general term for the data types specially used to hold other objects .
stay Python in , There are four most common types of built-in containers : list (list)、 Tuples (tuple)、 Dictionaries (dict)、 aggregate (set). By using them individually or in combination , You can do a lot of things efficiently .
Python The internal implementation details of the language itself are also closely related to these container types . such as Python Class instance properties of 、 Global variables globals() And so on are stored by dictionary type .
In this article , I will start with the definition of container types , Try to summarize some of the best practices for everyday coding . Then the special functions provided around each container type , Share some programming tips .
1) Bottom view container
- Avoid frequently expanding the list / Create a new list
- Use in scenarios with multiple operations at the top of the list deque modular
- Use set / Dictionary to determine whether members exist
- Write faster code
2) High level viewing container
- Container oriented interface programming
- Write more extensible code
3) Common skills
- Use tuples to improve branching code
- Use dynamic unpacking in more places
- Better not “ Get permission ”, No need “ Ask for forgiveness ”
- Use next() function
- Use an ordered dictionary to eliminate duplication
4) Common misconceptions about
- Watch out for the exhausted iterators
- Don't modify the iterated object in the loop body
5) summary
6) Other articles in the series
7) annotation
I gave it to “ Containers ” A simple definition : The container is used to hold other objects . But this definition is too broad , It doesn't have any guiding value for our daily programming . To really master Python The container in , We need to start from two levels :
below , Let's stand on these two different levels , Re understand the container .
Python It's a high-level programming language , The type of built-in container it provides , It's all the result of a high degree of encapsulation and abstraction . and “ Linked list ”、“ Red and black trees ”、“ Hashtable ” Compared to these names , all Python The name of the built-in type , They only describe the functional features of this type , Other people can't understand even the slightest detail of these names .
This is a Python One of the advantages of programming languages . comparison C Programming languages that are closer to the bottom of the computer ,Python Redesigned and implemented a more programmer friendly built-in container type , Shielding out memory management and other extra work . Provides us with a better development experience .
But if this is Python The advantages of language , Why should we bother to understand the implementation details of container types ? The answer is : Attention to detail can help us write faster code .
1. Avoid frequently expanding the list / Create a new list
All built-in container types do not limit capacity . If you will , You can keep pushing increasing numbers into an empty list , Finally, the memory of the whole machine will burst .
stay Python Language implementation details , The memory of the list is allocated on demand notes 1, When a list currently has insufficient memory , The memory expansion logic will be triggered . Allocating memory is an expensive operation . Although in most cases , It will not have any serious impact on the performance of your program . But when you deal with a very large amount of data , It is easy for memory allocation to drag down the performance of the entire program .
not so bad ,Python I have long been aware of this problem , It also provides official guidelines for problem solving , That's it :“ Become lazy ”.
How to explain “ Become lazy ”?range()
The evolution of functions is a very good example .
stay Python 2 in , If you call range(100000000)
, It takes several seconds to get the result , Because it needs to return a huge list , Spent a lot of time on memory allocation and calculation . But in Python 3 in , The same call will get the result right away . Because the function no longer returns a list , Instead, a type is range
The lazy object of , Only when you iterate it 、 Or slice it , It will return the real number to you .
So , To improve performance , Built-in functions range “ Become lazy ” 了
. In order to avoid too frequent memory allocation , In everyday coding , Our functions also need to be lazy , This includes :
2. Use in scenarios with multiple operations at the top of the list deque modular
Lists are based on array structures (Array) Realized , When you insert a new member at the head of the list (list.insert(0, item))
when , All other members behind it need to be moved , The time complexity of the operation is O(n)
. This results in inserting members at the head of the list rather than appending them at the end (list.append(item)
The time complexity is O(1))
slower .
If your code needs to do this many times , Please consider using collections.deque Type to replace the list . because deque It is based on double ended queue , Whether you append elements to the head or tail , The time complexity is zero O(1).
3. Use set / Dictionary to determine whether members exist
When you need to determine whether a member exists in a container , Sets are more appropriate than lists . because item in [...]
The time complexity of the operation is O(n)
, and item in {...}
The time complexity of is O(1)
. This is because dictionaries and collections are based on hash tables (Hash Table) Data structure .
# This example is not particularly appropriate , Because when the target set is especially small , Using sets or lists has little impact on efficiency
# But that's not the point :)
VALID_NAMES = ["piglei", "raymond", "bojack", "caroline"]
# Conversion to collection type is used for member judgment
VALID_NAMES_SET = set(VALID_NAMES)
def validate_name(name):
if name not in VALID_NAMES_SET:
# Here we use Python 3.6 Added f-strings characteristic
raise ValueError(f"{name} is not a valid name!")
Hint: It is highly recommended to read TimeComplexity - Python Wiki, Learn more about the time complexity of common container types .
If you are interested in the implementation details of the dictionary , It is also highly recommended to watch Raymond Hettinger Speech Modern Dictionaries(YouTube)
Python It's a door “ The duck type ” Language :“ When you see a bird walk like a duck 、 Swimming like a duck 、 It also sounds like a duck , Then this bird can be called a duck .” therefore , When we say what type an object is , It basically means : This object satisfies the specific interface specification for that type , Can be used as this type . For all built-in container types , The same is true of .
Open up collections Under the module of abc(“ abstract class Abstract Base Classes” An acronym for ) Sub module , All container related interfaces can be found ( abstract class )[ notes 2] Definition . Let's take a look at the interfaces that the built-in container types meet :
Each built-in container type , In fact, it is a composite entity that satisfies multiple interface definitions . For example, all container types meet “ Can be iterated ”(Iterable) This interface , This means that they are “ Can be iterated ” Of . But the other way around , Not all “ Can be iterated ” All objects are containers . Just like the string can be iterated , But we don't usually think of it as “ Containers ” To look at .
Knowing this fact , We will be in Python One of the most important principles of object-oriented programming : Programming for interfaces rather than concrete implementations .
Let's take an example , See how to understand Python Inside “ Interface oriented programming ”.
One day , We received a demand : There is a list , It contains a lot of user comments , For normal display on the page , All comments longer than a certain length need to be replaced by ellipsis .
This demand is easy to do , Soon we wrote the first version of the code :
# notes : To enhance the illustrative nature of the sample code , Some code snippets in this article use Python 3.5
# Version added Type Hinting characteristic
def add_ellipsis(comments: typing.List[str], max_length: int = 12):
""" If the list of comments exceeds max_length, The remaining characters are replaced by ellipsis
"""
index = 0
for comment in comments:
comment = comment.strip()
if len(comment) > max_length:
comments[index] = comment[:max_length] + '...'
index += 1
return comments
comments = [
"Implementation note",
"Changed",
"ABC for generator",
]
print("\n".join(add_ellipsis(comments)))
# OUTPUT:
# Implementati...
# Changed
# ABC for gene...
In the code above ,add_ellipsis
The function takes a list as an argument , And then traverse it , Replace the members that need to be modified . All this seems reasonable , Because the original demand we received was :“ There is one list , Inside ...”. But if one day , The comments we get are no longer kept on the list , But in immutable tuples ?
In that case , The existing function design will force us to write add_ellipsis(list(comments))
This kind of slow and ugly code .
We need to improve the function to avoid this problem . because add_ellipsis
Function strongly depends on the list type , So when the parameter type becomes tuple , The current function is no longer applicable ( reason : to comments[index]
The assignment will throw TypeError
abnormal ). How to improve the design of this part ? The secret is : Let functional dependency “ Iteratable object ” This abstract concept , Instead of entity list types .
Use generator properties , The function can be changed to :
def add_ellipsis_gen(comments: typing.Iterable[str], max_length: int = 12):
""" If the content in the iteratable comment exceeds max_length, The remaining characters are replaced by ellipsis
"""
for comment in comments:
comment = comment.strip()
if len(comment) > max_length:
yield comment[:max_length] + '...'
else:
yield comment
print("\n".join(add_ellipsis_gen(comments)))
In the new function , We changed the dependent parameter type from list to iterative abstract class . There are many advantages to doing so , One of the most obvious is : Whether the comment is from a list 、 Tuples or a file , New functions can be easily satisfied :
# Handle comments placed in tuples
comments = ("Implementation note", "Changed", "ABC for generator")
print("\n".join(add_ellipsis_gen(comments)))
# Deal with comments placed in documents
with open("comments") as fp:
for comment in add_ellipsis_gen(fp):
print(comment)
After changing the dependency from a specific container type to an abstract interface , Functions have become more widely applicable . besides , The new function also has more advantages in terms of execution efficiency . Now let's go back to the previous question . From the top , What defines a container ?
The answer is : The interface protocol implemented by each container type defines the container . Different container types in our eyes , Should be Can we iterate
、 Is it possible to modify
、 Is there any length
And so on . We need to write the relevant code , Pay more attention to the abstract properties of the container , Not the container type itself , This will help us write more elegant 、 More extensible code .
Hint: stay itertools In the built-in module, you can find more treasures about dealing with iteratable objects .
Sometimes , There will be more than three branches in our code if/else
. It looks like this :
import time
def from_now(ts):
""" Receive a past timestamp , Returns a text description of the relative time from the current time
"""
now = time.time()
seconds_delta = int(now - ts)
if seconds_delta < 1:
return "less than 1 second ago"
elif seconds_delta < 60:
return "{} seconds ago".format(seconds_delta)
elif seconds_delta < 3600:
return "{} minutes ago".format(seconds_delta // 60)
elif seconds_delta < 3600 * 24:
return "{} hours ago".format(seconds_delta // 3600)
else:
return "{} days ago".format(seconds_delta // (3600 * 24))
now = time.time()
print(from_now(now))
print(from_now(now - 24))
print(from_now(now - 600))
print(from_now(now - 7500))
print(from_now(now - 87500))
# OUTPUT:
# less than 1 second ago
# 24 seconds ago
# 10 minutes ago
# 2 hours ago
# 1 days ago
The above function can't find too many problems , Many, many people will write similar code . however , If you look at it carefully , You can find some obvious in the branch code section “ The border ”. such as , When the function determines whether a certain time should be used “ Number of seconds ” Display time , Yes 60
. And to judge whether it should take minutes , Yes 3600
.
Extracting rules from boundaries is the key to optimizing this code . If we put all these boundaries in an ordered tuple , Then cooperate with the binary search module bisect. The control flow of the entire function can be greatly simplified :
import bisect
# BREAKPOINTS It must be sequenced , Otherwise, binary search cannot be performed
BREAKPOINTS = (1, 60, 3600, 3600 * 24)
TMPLS = (
# unit, template
(1, "less than 1 second ago"),
(1, "{units} seconds ago"),
(60, "{units} minutes ago"),
(3600, "{units} hours ago"),
(3600 * 24, "{units} days ago"),
)
def from_now(ts):
""" Receive a past timestamp , Returns a text description of the relative time from the current time
"""
seconds_delta = int(time.time() - ts)
unit, tmpl = TMPLS[bisect.bisect(BREAKPOINTS, seconds_delta)]
return tmpl.format(units=seconds_delta // unit)
In addition to using tuples, you can optimize too many if/else
Outside the branch , In some cases dictionaries can be used to do the same thing . The key is to find repetitive logic and laws from existing code , And try more .
Dynamic unpacking refers to the use of *
or **
Operator will iterate over the object “ Untie ” act , stay Python 2 Time , This operation can only be used in the function parameters section , And there are very strict requirements for the order and quantity of occurrence , The usage scenario is very simple .
def calc(a, b, multiplier=1):
return (a + b) * multiplier
# Python2 Only dynamic unpacking in function parameters is supported in
print calc(*[1, 2], **{"multiplier": 10})
# OUTPUT: 30
however ,Python 3 In especial 3.5 After version ,*
and **
The usage scenario of is greatly expanded . for instance , stay Python 2 in , If we need to merge two dictionaries , It needs to be done :
def merge_dict(d1, d2):
# Because dictionaries are modifiable objects , To avoid modifying the original object , You need to copy one here d1 The shallow copy
result = d1.copy()
result.update(d2)
return result
user = merge_dict({"name": "piglei"}, {"movies": ["Fight Club"]})
But in Python 3.5 Later versions , You can use it directly ** Operator to quickly complete the dictionary merging operation :
user = {**{"name": "piglei"}, **{"movies": ["Fight Club"]}}
besides , You can also use in normal assignment statements *
Operator to dynamically unpack iteratible objects . If you want to know more about it , You can read the following recommendations PEP.
Hint: Two methods to promote dynamic unpacking scenario expansion PEP:PEP 3132 -- Extended Iterable Unpacking | Python.orgPEP 448 -- Additional Unpacking Generalizations | Python.org
This subtitle may be a little confusing , Let me explain briefly :“ Get permission ” And “ Ask for forgiveness ” There are two different programming styles . If you use a classic requirement :“ Count the number of occurrences of each element in the list ” As an example , Two different styles of code would look like this :
# AF: Ask for Forgiveness
# To do do , If an exception is thrown , Then handle the exception
def counter_af(l):
result = {}
for key in l:
try:
result[key] += 1
except KeyError:
result[key] = 1
return result
# AP: Ask for Permission
# Before you do it , First ask if you can do it , You can do it again
def counter_ap(l):
result = {}
for key in l:
if key in result:
result[key] += 1
else:
result[key] = 1
return result
Whole Python Community to the first Ask for Forgiveness
There is a clear preference for the exception - trapping programming style of . There are many reasons for this , First , stay Python Throwing an exception in is a very lightweight operation . secondly , The first method is also better than the second in performance , Because it doesn't have to do an extra member check every time the loop runs .
however , The two pieces of code in the example are very rare in the real world . Why? ? Because if you want to count the times , Direct use collections.defaultdict
That's all right. :
from collections import defaultdict
def counter_by_collections(l):
result = defaultdict(int)
for key in l:
result[key] += 1
return result
Such code does not use either “ Get permission ”, No need “ Ask for forgiveness ”. The control flow of the whole code is more clear and natural . therefore , If possible , Please try your best to omit those Non core Exception capture logic for . Some tips :
collections.defaultdict
type dict[key] = dict.setdefault(key, 0) + 1
Built-in functions dict.pop(key, None)
dict.get(key, default_value)
IndexError
abnormal :["foo"][100:200]
next()
Is a very practical built-in function , It takes an iterator as a parameter , Then return the next element of the iterator . Use it in conjunction with generator expressions , Can efficiently achieve “ Find the first member in the list that meets the condition ” Such needs .
numbers = [3, 7, 8, 2, 21]
# Get and ** Return immediately ** The first even number in the list
print(next(i for i in numbers if i % 2 == 0))
# OUTPUT: 8
The structural characteristics of dictionaries and collections ensure that their members do not repeat , So they are often used to remove weight . however , The order of the original list will be lost after using both of them . This is made up of the underlying data structure “ Hashtable (Hash Table)” Determined by the characteristics of .
>>> l = [10, 2, 3, 21, 10, 3]
# De duplication but lost order
>>> set(l)
{3, 10, 2, 21}
What if the weight needs to be removed and the order must be preserved ? We can use collections.OrderedDict
modular :
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys(l).keys())
[10, 2, 3, 21]
Hint: stay Python 3.6 in , The default dictionary type modifies the implementation , It has become orderly . And in Python 3.7 in , This function has been changed from Details of language implementation It became Dependable formal language features . But I think it makes the whole Python It will take some time for the community to get used to this , Now, after all, “ The dictionary is out of order ” It is still printed in countless books Python Book . therefore , I still recommend that you use it wherever you need an organized dictionary OrderedDict.
In front of the article , We mentioned the use of “ lazy ” The benefits of generators . however , All things have two sides . One of the biggest drawbacks of generators is : It will dry up . When you have traversed them completely , After repeated traversal, you can't get any new content
numbers = [1, 2, 3]
numbers = (i * 2 for i in numbers)
# The first cycle will output 2, 4, 6
for number in numbers:
print(number)
# This loop will not output anything , Because iterators have dried up
for number in numbers:
print(number)
And it's not just generator expressions ,Python 3 Inside map、filter Built in functions have the same characteristics . Ignoring this feature can easily lead to some imperceptible problems in the code Bug.
Instagram Just before the project starts Python 2 To Python 3 Encountered this problem during the migration of . They are PyCon 2017 Share the story of how to deal with this problem . Visit article Instagram stay PyCon 2017 Summary of your speech , Search for “ iterator ” Can view details .
This is a lot Python Mistakes beginners make . such as , We need a function to delete all even numbers from the list :
def remove_even(numbers):
""" Remove all even numbers from the list
"""
for i, number in enumerate(numbers):
if number % 2 == 0:
# Code in question
del numbers[i]
numbers = [1, 2, 7, 4, 8, 11]
remove_even(numbers)
print(numbers)
# OUTPUT: [1, 7, 8, 11]
Notice the extra one in the result “8” Did you? ? When you go through a list and modify it , This will happen . Because the object being iterated numbers
Modified during the loop . The subscript of traversal is increasing , The length of the list itself is also shrinking . This will cause some members in the list to not be traversed at all .
So for this kind of operation , Please use a new empty list to save the results , Or make use of yield
Return to a generator . Instead of modifying the iterated list or the dictionary object itself .
In this article , We start with “ Container type ” Starting from the definition of , The container types are discussed at the bottom and top levels . Then follow the tradition of the series , Provides some tips for writing container related code .
Let's finally summarize the main points :
After reading the article, you , What do you want to make complaints about? ? Please leave a message or at project Github Issues Tell me .
1) Python This language has nothing but CPython Outside , There are many other versions that implement . If there is no special instruction , This article and “Python craftsman ” All that appears in the series Python All in particular Python Of C Language implementation CPython
2) Python There is no such thing as in other programming languages “Interface Interface ” type , Only similar “ abstract class ” Concept . For the convenience of expression , The following contents are used uniformly “ Interface ” To replace “ abstract class ”.
3) Have you only realized Mapping But it's not MutableMapping The type of ? try MappingProxyType({})
4) Have you only realized Set But it's not MutableSet The type of ? try frozenset()
Python craftsman : Make good use of variables to improve code quality
Python craftsman : Skills in writing conditional branch code
Python craftsman : Tips for using strings and numbers
This article is edited and released by Tencent blue whale Zhiyun , Tencent blue whale Zhiyun ( Short for blue whale ) The software system is a set of systems based on PaaS Technology solutions for , Committed to building an industry-leading one-stop automatic operation and maintenance platform . At present, the community version has been launched 、 Enterprise Edition , Welcome to experience .