
Description of Python encode and decode functions

Editor: Python

Common string encodings include utf-8, gb2312, cp936, and gbk.
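To make the difference between these encodings concrete, here is a minimal Python 3 sketch. The sample text '中文' is chosen only for illustration; the same characters produce different byte sequences depending on the encoding.

```python
# Python 3 sketch: one piece of text, several encodings.
text = '中文'  # sample text, an assumption for illustration

# Encode the same text with each of the encodings named above.
encoded = {name: text.encode(name) for name in ('utf-8', 'gb2312', 'gbk', 'cp936')}

for name, data in encoded.items():
    # Show the encoding name, the number of bytes, and the raw bytes.
    print(name, len(data), data)
```

For this particular sample, the three GB-family encodings give identical bytes, while utf-8 uses a longer sequence.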

In Python, we use decode() and encode() to decode and encode strings.

In Python 2, the unicode type serves as the base type for encoding conversions, namely:

      decode             encode
str ---------> unicode ---------> str

# -*- coding: utf-8 -*-
# Python 2 example
u = u'中文'  # create a unicode object u
str = u.encode('gb2312')  # encode the unicode object as gb2312
str1 = u.encode('gbk')  # encode the unicode object as gbk
str2 = u.encode('utf-8')  # encode the unicode object as utf-8
u1 = str.decode('gb2312')  # decode str with gb2312 to get the unicode object back
u2 = str.decode('utf-8')  # decoding gb2312 bytes as utf-8 cannot restore the original unicode object; here it raises UnicodeDecodeError

As the code above shows, str, str1, and str2 are all of the string type (str) even though each holds bytes in a different encoding, which makes string handling more complicated.

The good news is Python 3. The new version removes the separate unicode type; in its place, the string type (str) holds Unicode text and becomes the base type, as shown below. Encoding a str produces a byte type (bytes), but the two functions are used in the same way:

        decode                  encode
bytes ---------> str(unicode) ---------> bytes

u = '中文'  # create a string (str) object u
str = u.encode('gb2312')  # encode u as gb2312 to get the bytes object str
u1 = str.decode('gb2312')  # decode the bytes object str with gb2312 to get the string u1
u2 = str.decode('utf-8')  # decoding gb2312 bytes as utf-8 cannot restore the original string; here it raises UnicodeDecodeError
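The mismatch case in the last line is worth seeing in isolation. The following is a minimal Python 3 sketch, with variable names chosen for illustration, of a correct round trip, a failed decode with the wrong codec, and a lossy decode using errors='replace'.

```python
# Python 3 sketch: round trip, wrong-codec failure, and lossy decoding.
text = '中文'                      # sample text (illustrative)
data = text.encode('gb2312')       # str -> bytes

restored = data.decode('gb2312')   # bytes -> str with the matching codec

# Decoding with the wrong codec raises UnicodeDecodeError for these bytes.
try:
    wrong = data.decode('utf-8')
except UnicodeDecodeError as exc:
    wrong = None
    print('utf-8 decode failed:', exc)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lossy = data.decode('utf-8', errors='replace')
print(repr(lossy))
```

Using errors='replace' avoids the exception, but the original text is still lost, so it is only suitable when an approximate result is acceptable.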

Inevitably, this brings us to the problem of reading files:

Suppose we read a file. The encoding used when the file was saved determines the encoding of the content we read from it. For example, create a new text file test.txt in Notepad and edit its content; when saving, the encoding can be chosen, so suppose we choose gb2312. The file can then be read in Python as follows:

f = open('test.txt', 'rb')  # binary mode reads raw bytes; Python 3 text mode ('r') would decode with a system-dependent default encoding and can fail on a mismatch
s = f.read()  # read the file content as bytes
f.close()
# Assume the file was saved with gb2312 encoding
u = s.decode('gb2312')  # decode using the encoding the file was saved in to get Unicode text
# Now the content can be converted to other encodings
str = u.encode('utf-8')  # convert to the utf-8 encoded string str
str1 = u.encode('gbk')  # convert to the gbk encoded string str1
str2 = u.encode('utf-16')  # convert to the utf-16 encoded string str2
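In Python 3, the built-in open() accepts an encoding argument, so decoding can happen while reading. The sketch below assumes a gb2312 file; it creates a temporary stand-in for test.txt so that it can run on its own, and the path and variable names are illustrative.

```python
import os
import tempfile

# Create a small gb2312-encoded file standing in for the Notepad-saved test.txt.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as f:
    f.write('中文'.encode('gb2312'))

# Text mode with an explicit encoding decodes the bytes while reading.
with open(path, 'r', encoding='gb2312') as f:
    u = f.read()

# The resulting str can then be encoded into any other encoding.
utf8_bytes = u.encode('utf-8')
gbk_bytes = u.encode('gbk')
utf16_bytes = u.encode('utf-16')  # includes a byte order mark at the start
```

Passing the encoding explicitly avoids depending on the platform default, which differs between systems.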

Python also provides the codecs module for reading files; the open() function in this module lets you specify the encoding:

import codecs
f = codecs.open('text.text', 'r+', encoding='utf-8')  # the file's encoding must be known in advance; here the file is utf-8
content = f.read()  # if the encoding passed to open() does not match the file's actual encoding, this usually raises a decoding error
f.write('The information you want to write')  # after read(), the text is written at the end of the file
f.close()
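The effect of an encoding mismatch can be reproduced with a small Python 3 sketch. It uses the built-in open(), which takes the same encoding argument, rather than codecs.open(); the temporary file name demo.txt and the variable names are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical demo file, written with gb2312 encoding.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', encoding='gb2312') as f:
    f.write('中文')

# Reading with the matching encoding returns the original text.
with open(path, 'r', encoding='gb2312') as f:
    content = f.read()

# Reading with a mismatched encoding fails for these bytes.
try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc
    print('mismatched encoding:', exc)
```

Not every mismatch raises an error; some byte sequences happen to be valid in the wrong encoding and silently produce garbled text instead.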

encode() and decode()

decode means to decode, and encode means to encode.
Python represents strings internally as Unicode. Therefore, encoding conversion usually uses Unicode as the intermediate form: first decode the string from its current encoding into Unicode, then encode the Unicode into the other encoding.
decode converts a string in some other encoding to Unicode. For example, str1.decode('gb2312') converts the gb2312-encoded string str1 to Unicode.
encode converts Unicode to a string in some other encoding. For example, str2.encode('gb2312') converts the Unicode string str2 to gb2312.
In short: to convert another encoding to utf-8, you must first decode to Unicode and then re-encode as utf-8, with Unicode acting as the intermediary. For example, in Python 2 the literal s='中文' is utf8-encoded if it appears in a utf8 file and gb2312-encoded if it appears in a gb2312 file. In that case, to convert the encoding, first use the decode method to convert it to Unicode, then use the encode method to convert it to the other encoding. When no encoding is explicitly specified, source files are usually created in the system default encoding.
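The decode-then-encode pattern described above can be wrapped in a small helper. This is a minimal Python 3 sketch; the function name transcode and its parameters are hypothetical, not part of any library.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes, using str (Unicode) as the intermediate form."""
    text = data.decode(source_encoding)   # bytes -> str
    return text.encode(target_encoding)   # str -> bytes


# Usage: convert gb2312 bytes to utf-8 bytes and back again.
gb_bytes = '中文'.encode('gb2312')
utf8_bytes = transcode(gb_bytes, 'gb2312', 'utf-8')
round_trip = transcode(utf8_bytes, 'utf-8', 'gb2312')
```

The helper only works when source_encoding really matches the input bytes; otherwise the decode step raises UnicodeDecodeError or yields the wrong text.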