
Exploring the Python Standard Library: Eight Useful Built-in Modules

Editor: Python

Preface

The loveliest thing about Python is how rich its standard library and third-party libraries are: many tasks in day-to-day development can be solved directly with them. This article introduces some commonly used modules in the Python standard library; later articles will cover the uses of common third-party libraries.

Contents

Preface
- base64 - Base64 Codec module
- collections - Container data type module
- hashlib - Hash function module
- heapq - Heap queue (priority queue) module
- itertools - Iteration tool module
- random - Random number and random sampling module
- os.path - Path manipulation module
- uuid - UUID generation module
Summary

base64 - Base64 Codec module

Base64 is a way of representing binary data with 64 printable characters. Because $\log_2 64 = 6$, Base64 works in units of 6 bits (binary digits, each 0 or 1), and each unit maps to one printable character. Three bytes (24 bits) of binary data therefore correspond to 4 Base64 units, i.e. every 3 bytes are represented by 4 printable characters.
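The 3-bytes-to-4-characters arithmetic can be checked by hand. The sketch below (the helper name encode_group is hypothetical) splits the 24 bits of b'Man' into four 6-bit units, looks each unit up in the standard Base64 alphabet, and compares the result with base64.b64encode. It only handles one complete 3-byte group, so no padding is involved.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, then '+' and '/' (64 characters)
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + '+/'


def encode_group(chunk: bytes) -> str:
    """Encode exactly 3 bytes (24 bits) as 4 Base64 characters."""
    if len(chunk) != 3:
        raise ValueError('encode_group expects exactly 3 bytes')
    bits = int.from_bytes(chunk, 'big')  # one 24-bit integer
    # Extract four 6-bit units, most significant first
    units = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return ''.join(ALPHABET[unit] for unit in units)


print(encode_group(b'Man'))               # TWFu
print(base64.b64encode(b'Man').decode())  # TWFu
```

'M', 'a', 'n' are 0x4D, 0x61, 0x6E; their 24 bits split into the units 19, 22, 5, 46, which are T, W, F and u in the alphabet.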
Base64 can serve as a transfer encoding for e-mail and can be used in any other scenario where binary data must be converted into text, which makes it possible to carry binary content inside text formats such as XML, JSON, and YAML. The printable characters used by Base64 are A-Z, a-z, and 0-9, 62 characters in total; the other two are usually + and /, and = is used for padding at the end of the encoded output.
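The padding rule and the point about text formats can be seen directly with the base64 module. In this short sketch, inputs whose length is not a multiple of 3 get = padding, arbitrary bytes survive a round trip through a JSON string, and base64.urlsafe_b64encode (a standard-library variant not mentioned in the original, useful in URLs and file names) swaps + and / for - and _.

```python
import base64
import json

# '=' padding: 1 leftover byte -> 2 chars + '==', 2 leftover bytes -> 3 chars + '='
print(base64.b64encode(b'M'))    # b'TQ=='
print(base64.b64encode(b'Ma'))   # b'TWE='
print(base64.b64encode(b'Man'))  # b'TWFu'

# Raw bytes cannot go into JSON directly, but their Base64 text can
payload = json.dumps({'data': base64.b64encode(b'\x00\xff').decode('ascii')})
print(payload)                   # {"data": "AP8="}
restored = base64.b64decode(json.loads(payload)['data'])
print(restored == b'\x00\xff')   # True

# The URL-safe alphabet replaces '+' and '/' with '-' and '_'
print(base64.b64encode(b'\xfb\xff'))          # b'+/8='
print(base64.urlsafe_b64encode(b'\xfb\xff'))  # b'-_8='
```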

For details of Base64 encoding, see the article "Base64 Notes". The base64 module in Python's standard library provides two functions, b64encode and b64decode, dedicated to Base64 encoding and decoding. The following shows the effect of running them in the Python interactive environment.

>>> import base64
>>>
>>> content = 'Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.'
>>> base64.b64encode(content.encode())
b'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4='
>>> content = b'TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4='
>>> base64.b64decode(content).decode()
'Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.'

collections - Container data type module

The collections module provides many very useful data structures, mainly:

namedtuple: named tuple. It is a factory function that creates a tuple subclass from a type name and a list of field names.
deque: double-ended queue, an alternative to list. Python's list is built on a dynamic array, while deque is built on a doubly linked list (of fixed-size blocks, in CPython), so when elements must be added or removed at both ends, deque performs better: those operations take $O(1)$ time, whereas inserting or removing at the front of a list takes $O(n)$.
Counter: a dict subclass whose keys are elements and whose values are the elements' counts; its most_common() method returns the most frequent elements. The inheritance relationship between Counter and dict is debatable: under the CARP principle (Composite/Aggregate Reuse Principle), having Counter hold a dict (has-a) rather than be a dict (is-a) would arguably be the more reasonable design.
OrderedDict: a dict subclass that records the order in which key-value pairs were inserted, so it behaves both like a dictionary and like a linked list (for example, move_to_end() can reorder keys). Since Python 3.7 the built-in dict also preserves insertion order.
defaultdict: similar to dict, but the value for a missing key is produced by a default factory function; this is more efficient than calling dict's setdefault() method on every access.
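deque, defaultdict and OrderedDict do not get their own examples later in the article, so here is a brief sketch of each; the sample data are made up for illustration.

```python
from collections import OrderedDict, defaultdict, deque

# deque: O(1) appends and pops at both ends
queue = deque([2, 3, 4])
queue.appendleft(1)       # deque([1, 2, 3, 4])
queue.append(5)           # deque([1, 2, 3, 4, 5])
first = queue.popleft()   # 1
last = queue.pop()        # 5
print(queue)              # deque([2, 3, 4])

# defaultdict: a missing key is created by calling the factory (list here)
fruits = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']
groups = defaultdict(list)
for word in fruits:
    groups[word[0]].append(word)
print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# The same grouping with a plain dict calls setdefault() on every access
plain = {}
for word in fruits:
    plain.setdefault(word[0], []).append(word)

# OrderedDict: remembers insertion order and can move keys around
od = OrderedDict.fromkeys('abc')
od.move_to_end('a')
print(list(od))           # ['b', 'c', 'a']
```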

The following example uses namedtuple in the Python interactive environment to create a playing-card class.

>>> from collections import namedtuple
>>>
>>> Card = namedtuple('Card', ('suite', 'face'))
>>> card1 = Card('Hearts', 5)
>>> card2 = Card('Clubs', 9)
>>> card1
Card(suite='Hearts', face=5)
>>> card2
Card(suite='Clubs', face=9)
>>> print(f'{card1.suite} {card1.face}')
Hearts 5
>>> print(f'{card2.suite} {card2.face}')
Clubs 9
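Because a namedtuple class is still a tuple, its instances can be unpacked and indexed, and they are immutable. The sketch below rebuilds the same Card class and shows two of the helper methods namedtuple provides, _replace and _asdict.

```python
from collections import namedtuple

Card = namedtuple('Card', ('suite', 'face'))
card = Card('Hearts', 5)

suite, face = card                  # unpacks like an ordinary tuple
print(suite, face, card[1])         # Hearts 5 5
upgraded = card._replace(face=10)   # returns a new Card; card is unchanged
print(upgraded)                     # Card(suite='Hearts', face=10)
print(card._asdict())               # {'suite': 'Hearts', 'face': 5} on Python 3.8+

try:
    card.face = 7                   # fields cannot be reassigned
except AttributeError:
    print('Card instances are immutable')
```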

The following uses the Counter class to find the three most frequent elements in a list.

from collections import Counter

words = [
    'look', 'into', 'my', 'eyes', 'look', 'into', 'my', 'eyes',
    'the', 'eyes', 'the', 'eyes', 'the', 'eyes', 'not', 'around',
    'the', 'eyes', "don't", 'look', 'around', 'the', 'eyes',
    'look', 'into', 'my', 'eyes', "you're", 'under'
]
counter = Counter(words)
# Print the 3 most frequent elements in words and their counts
for elem, count in counter.most_common(3):
    print(elem, count)
# Output:
# eyes 8
# the 5
# look 4
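Besides most_common(), Counter behaves like a multiset: missing elements count as zero, and counters can be updated, added and subtracted. A small sketch with made-up inventory data:

```python
from collections import Counter

letters = Counter('abracadabra')
print(letters.most_common(3))   # [('a', 5), ('b', 2), ('r', 2)]
print(letters['z'])             # 0: a missing element counts as zero, no KeyError

stock = Counter(apple=3, pear=1)
sold = Counter(apple=1, pear=2)
combined = stock + sold         # Counter({'apple': 4, 'pear': 3})
remaining = stock - sold        # Counter({'apple': 2}); non-positive counts are dropped
print(combined, remaining)

stock.update(['pear', 'pear'])  # adds counts instead of replacing them
print(stock['pear'])            # 3
```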

hashlib - Hash function module

A hash function, also called a hash algorithm or digest algorithm, is a way of creating a "digital fingerprint" (hash digest) for existing data. It compresses the data into a fixed-length digest: the same input always produces the same digest, and the process is irreversible (the input cannot be computed back from the digest). A good hash function produces different digests for different inputs, so the probability of a hash collision (different inputs yielding the same digest) is very low. MD5 and the SHA family are well-known hash functions of this kind, although practical collision attacks are now known for MD5 and SHA-1.
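The three properties just described (deterministic, fixed-length, and very different digests for slightly different inputs) are easy to observe with hashlib. A minimal sketch using SHA-256:

```python
import hashlib

# Same input -> same digest, on every run and every machine
print(hashlib.sha256(b'').hexdigest())
# e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

# The digest length is fixed regardless of input size (SHA-256: 32 bytes)
small = hashlib.sha256(b'a').digest()
big = hashlib.sha256(b'a' * 1_000_000).digest()
print(len(small), len(big))   # 32 32

# Changing one character of the input gives a completely different digest
d1 = hashlib.sha256(b'hello').hexdigest()
d2 = hashlib.sha256(b'hellp').hexdigest()
print(d1 == d2)               # False
print(sum(a != b for a, b in zip(d1, d2)), 'of 64 hex digits differ')
```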

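Note: In 2011, RFC 6151 updated the security considerations for MD5, advising against MD5 wherever collision resistance is required and recommending that new protocol designs not use HMAC-MD5 (the keyed-hash message authentication code built on MD5). This issue is beyond the scope of our discussion.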

The hashlib module in Python's standard library wraps hash functions: with md5, sha1, sha256 and so on, we can easily generate a "digital fingerprint". As a simple example, when a user registers we need to save the password in the database. Obviously we should not store the password itself, since that could expose users' private data. Instead we usually store the password's "fingerprint"; when the user logs in, we compute the fingerprint of the entered password with the same hash function and compare the two to decide whether the login succeeds.

import hashlib

# Compute the MD5 digest of the string "123456"
print(hashlib.md5('123456'.encode()).hexdigest())  # e10adc3949ba59abbe56e057f20f883e

# Compute the MD5 digest of the file "Python-3.7.1.tar.xz", reading 512 bytes at a time
hasher = hashlib.md5()
with open('Python-3.7.1.tar.xz', 'rb') as file:
    data = file.read(512)
    while data:
        hasher.update(data)
        data = file.read(512)
print(hasher.hexdigest())
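Storing a single fast, unsalted digest of each password (such as the MD5 above) is vulnerable to brute force and to lookups in precomputed tables. As a more robust variant of the registration/login scheme described earlier, here is a minimal sketch using hashlib.pbkdf2_hmac with a random per-user salt. The helper names hash_password and verify_password, the 16-byte salt, the iteration count, and the salt-then-key storage layout are illustrative assumptions, not part of the original article.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune it for your hardware
SALT_SIZE = 16        # assumed salt length in bytes


def hash_password(password: str) -> bytes:
    """Return salt + derived key, to be stored in the database."""
    salt = os.urandom(SALT_SIZE)  # a fresh random salt for every user
    key = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, ITERATIONS)
    return salt + key


def verify_password(password: str, stored: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    salt, key = stored[:SALT_SIZE], stored[SALT_SIZE:]
    candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


record = hash_password('123456')
print(verify_password('123456', record))  # True
print(verify_password('654321', record))  # False
```

Because each user gets a different salt, two users with the same password still end up with different stored records.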

Note: Many websites publish a hash digest next to the download link. After downloading the file, we can compute its digest and check whether it matches the one published on the site (fingerprint comparison). If the two do not match, the download was probably corrupted or the file was tampered with in transit, and the file should not be used.
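The fingerprint comparison described in this note can be automated. A minimal sketch, assuming the site publishes a SHA-256 digest as a hex string; the helper names file_sha256 and matches_published are hypothetical. Like the MD5 example above, it reads the file in chunks, so large downloads are never loaded into memory at once.

```python
import hashlib


def file_sha256(path: str, chunk_size: int = 64 * 1024) -> str:
    """Hash a file chunk by chunk and return the hex digest."""
    hasher = hashlib.sha256()
    with open(path, 'rb') as file:
        # iter() keeps calling file.read until it returns b'' at end of file
        for chunk in iter(lambda: file.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """Compare the computed digest with the one shown on the download page."""
    return file_sha256(path) == published_hex.strip().lower()
```

Usage would look like matches_published('downloaded-file.tar.xz', '<digest copied from the site>'), where both arguments are placeholders; normalizing whitespace and case avoids false mismatches from copy-pasting.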

heapq - Heap queue (priority queue) module

The heapq module implements the heap queue algorithm (also known as the priority queue algorithm), on which heap sort can be built. To solve Top-K problems in particular (finding the K largest or smallest elements of a sequence), you can use this module directly, as shown below.

import heapq

list1 = [34, 25, 12, 99, 87, 63, 58, 78, 88, 92]
# Find the three largest elements in the list
print(heapq.nlargest(3, list1))   # [99, 92, 88]
# Find the three smallest elements in the list
print(heapq.nsmallest(3, list1))  # [12, 25, 34]
list2 = [
    {'name': 'IBM', 'shares': 100, 'price': 91.1},
    {'name': 'AAPL', 'shares': 50, 'price': 543.22},
    {'name': 'FB', 'shares': 200, 'price': 21.09},
    {'name': 'HPQ', 'shares': 35, 'price': 31.75},
    {'name': 'YHOO', 'shares': 45, 'price': 16.35},
    {'name': 'ACME', 'shares': 75, 'price': 115.65}
]
# Find the three most expensive stocks (AAPL, ACME, IBM)
print(heapq.nlargest(3, list2, key=lambda x: x['price']))
# Find the three stocks with the most shares (FB, IBM, ACME)
print(heapq.nlargest(3, list2, key=lambda x: x['shares']))
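nlargest and nsmallest sit on top of heapq's lower-level heap operations. As a sketch of how those operations give heap sort and a priority queue (the function name heap_sort and the task data are hypothetical), heapify turns a list into a min-heap in linear time and each heappop removes the current smallest element:

```python
import heapq


def heap_sort(items):
    """Return a new ascending list using a binary min-heap."""
    heap = list(items)    # copy so the caller's list is untouched
    heapq.heapify(heap)   # rearrange into a heap in O(n)
    # n pops, each O(log n); range(len(heap)) is evaluated once, before popping
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(heap_sort([34, 25, 12, 99, 87, 63, 58, 78, 88, 92]))
# [12, 25, 34, 58, 63, 78, 87, 88, 92, 99]

# Priority queue: tuples compare by their first element, the priority
tasks = []
heapq.heappush(tasks, (3, 'write report'))
heapq.heappush(tasks, (1, 'fix bug'))
heapq.heappush(tasks, (2, 'review PR'))
first_task = heapq.heappop(tasks)
print(first_task)  # (1, 'fix bug')
```

Because heappop always returns the smallest remaining item, popping n times yields the items in ascending order, for $O(n \log n)$ work overall.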

itertools - Iteration tool module

itertools helps us create various iterators; take a look at the following examples.

import itertools

# All permutations of ABCD (4! = 24 tuples)
for value in itertools.permutations('ABCD'):
    print(value)
# All 3-element combinations of ABCDE (C(5, 3) = 10 tuples)
for value in itertools.combinations('ABCDE', 3):
    print(value)
# Cartesian product of ABCD and 123 (4 x 3 = 12 tuples)
for value in itertools.product('ABCD', '123'):
    print(value)
# Infinite repeating sequence A, B, C, A, B, C, ...
it = itertools.cycle(('A', 'B', 'C'))
print(next(it))  # A
print(next(it))  # B
print(next(it))  # C
print(next(it))  # A
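cycle() never ends, so in practice it is usually paired with something that stops it. The sketch below shows islice for taking a finite slice of an infinite iterator, plus two other itertools functions, accumulate and groupby, that often come up alongside the ones above; the word list is made up.

```python
import itertools

# islice takes a finite slice of an infinite iterator such as cycle()
first_seven = list(itertools.islice(itertools.cycle('ABC'), 7))
print(first_seven)   # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

# accumulate yields running totals
totals = list(itertools.accumulate([1, 2, 3, 4]))
print(totals)        # [1, 3, 6, 10]

# groupby only groups *consecutive* equal keys, so sort by the same key first
words = sorted(['apple', 'bear', 'avocado', 'bison', 'cat'], key=lambda w: w[0])
grouped = {letter: list(group)
           for letter, group in itertools.groupby(words, key=lambda w: w[0])}
print(grouped)
# {'a': ['apple', 'avocado'], 'b': ['bear', 'bison'], 'c': ['cat']}
```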

random - Random number and random sampling module

We have used this module many times before to generate random numbers, shuffle sequences randomly, and draw random samples. Here is a list of commonly used functions.

getrandbits(k): returns a non-negative integer with k random bits.
randrange(start, stop[, step]): returns a randomly selected element of range(start, stop, step) without actually building a range object.
randint(a, b): returns a random integer N such that a <= N <= b; equivalent to randrange(a, b+1).
choice(seq): returns a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
choices(population, weights=None, *, cum_weights=None, k=1): returns a list of k elements chosen from population with replacement. If population is empty, raises IndexError.
shuffle(x): shuffles the sequence x in place. (Older versions also accepted an optional random argument; it was deprecated in Python 3.9 and removed in 3.11.)
sample(population, k): returns a list of k unique elements chosen from the population sequence, for random sampling without replacement. (Older versions also accepted a set; that support was deprecated in Python 3.9 and removed in 3.11.)
random(): returns the next random floating-point number in the range [0.0, 1.0).
expovariate(lambd): exponential distribution.
gammavariate(alpha, beta): gamma distribution.
gauss(mu, sigma) / normalvariate(mu, sigma): normal distribution.
paretovariate(alpha): Pareto distribution.
weibullvariate(alpha, beta): Weibull distribution.
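A quick tour of several of these functions. This sketch calls them on a random.Random instance with an arbitrary fixed seed (2024 is an assumed value), which exposes the same methods as the module-level functions; the results are random, so only their ranges are described in the comments.

```python
import random

rng = random.Random(2024)   # private generator with a fixed seed
deck = list(range(1, 11))

die = rng.randint(1, 6)                                       # 1 <= die <= 6
card = rng.choice(deck)                                       # one element of deck
draws = rng.choices(['red', 'black'], weights=[18, 18], k=5)  # with replacement
hand = rng.sample(deck, 3)                                    # 3 distinct elements
rng.shuffle(deck)                                             # shuffles deck in place
noise = rng.gauss(0, 1)                                       # float drawn from N(0, 1)
print(die, card, draws, hand, deck, noise)

# The same seed reproduces the same sequence of results
a = random.Random(42).sample(range(100), 5)
b = random.Random(42).sample(range(100), 5)
print(a == b)   # True
```

Using a separate seeded generator keeps tests and simulations reproducible without affecting the module's global state.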

os.path - Path manipulation module

The os.path module provides utility functions for working with paths. If a program needs to join or split file paths, or check whether a file exists and query its other properties, this module is very helpful. Here are some commonly used functions.

dirname(path): returns the directory name of path.
exists(path): returns True if path refers to an existing path or an open file descriptor.
getatime(path) / getmtime(path) / getctime(path): return the last access time / last modification time / ctime of path, in seconds since the epoch. ctime is the creation time on Windows but the time of the last metadata change on Unix.
getsize(path): returns the size of path in bytes. Raises OSError if the file does not exist or is inaccessible.
isfile(path): returns True if path is an existing regular file.
isdir(path): returns True if path is an existing directory (folder).
join(path, *paths): joins one or more path components intelligently. The return value is the concatenation of path and all members of paths, with exactly one directory separator (os.sep) following each non-empty part except the last. This means that if the last part is empty, the result ends with a separator. If a component is an absolute path, all previous components are discarded and joining continues from the absolute path component.
splitext(path): splits path into a pair (root, ext) such that root + ext == path, where ext is empty or begins with a period and contains at most one period.
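Several of these functions working together, as a sketch that builds a throwaway file inside a temporary directory (the names reports and summary.txt are made up). The join rules for absolute and empty components are shown with posixpath, the POSIX implementation of os.path, so that the output is the same on every platform.

```python
import os
import posixpath
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, 'reports', 'summary.txt')
    os.makedirs(os.path.dirname(path))   # create the 'reports' folder
    with open(path, 'w') as file:
        file.write('hello')

    print(os.path.exists(path))                                        # True
    print(os.path.isfile(path), os.path.isdir(os.path.dirname(path)))  # True True
    size = os.path.getsize(path)
    print(size)                                                        # 5 (bytes)
    print(os.path.splitext(path)[1])                                   # .txt

# Only the last period counts as the extension
print(os.path.splitext('archive.tar.gz'))               # ('archive.tar', '.gz')
# An absolute component discards everything before it
print(posixpath.join('usr', 'local', '/etc', 'hosts'))  # /etc/hosts
# An empty last component leaves a trailing separator
print(posixpath.join('logs', ''))                       # logs/
```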

uuid - UUID generation module

The uuid module helps us generate universally unique identifiers (UUIDs, Universally Unique IDentifiers). It provides four functions for generating UUIDs:

uuid1(): generated from the host's MAC address, the current timestamp, and a random sequence number, which is designed to guarantee global uniqueness.
uuid3(namespace, name): computed from the MD5 hash digest ("fingerprint") of a namespace identifier and a name. Different names in the same namespace, and names in different namespaces, produce different UUIDs, but the same name in the same namespace always produces the same UUID.
uuid4(): generated from random numbers; there is a certain probability of duplication, and that probability can be calculated.
uuid5(namespace, name): the same algorithm as uuid3, except that the hash function is SHA-1 instead of MD5.
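The practical difference between these functions shows up when they are called repeatedly. A minimal sketch using the predefined uuid.NAMESPACE_DNS namespace and the example name 'python.org': uuid3 and uuid5 are deterministic, while uuid4 returns fresh random values each time.

```python
import uuid

# uuid3/uuid5 are deterministic: same namespace + name -> same UUID
a = uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
b = uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
c3 = uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
print(a == b)               # True
print(a == c3)              # False: MD5 vs SHA-1 gives a different UUID
print(a.version, c3.version)  # 5 3

# uuid4 draws fresh random bits on every call
r1, r2 = uuid.uuid4(), uuid.uuid4()
print(r1.version)           # 4
print(r1 == r2)             # False (with overwhelming probability)

# A UUID round-trips through its canonical string form
print(uuid.UUID(str(a)) == a)  # True
```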

Because uuid4 has a small but nonzero probability of producing duplicates, it is better not to use it where a truly globally unique identifier is required. In a distributed environment uuid1 is a good choice, because it guarantees the global uniqueness of the generated IDs. The following uses uuid1 in the Python interactive environment to generate globally unique identifiers.

>>> import uuid
>>> uuid.uuid1().hex
'622a8334baab11eaaa9c60f81da8d840'
>>> uuid.uuid1().hex
'62b066debaab11eaaa9c60f81da8d840'
>>> uuid.uuid1().hex
'642c0db0baab11eaaa9c60f81da8d840'

Summary

Python's standard library contains a large number of modules, and many common tasks in day-to-day development can be handled with the functions and classes it already provides. This is also the loveliest part of the Python language.