This article introduces Python's struct module and how to use its format characters. Many people run into difficulties with it in real-world cases, so the sections below walk through the common situations step by step.
There are two ways to store the contents of a file: as binary or as text. Either way, reading the data back means converting what is stored into Python data types. The stored data is usually structured, and because the bottom layer of Python is written in C, we describe that structure here as a C struct.
Lib/struct.py is the module responsible for this kind of conversion between Python values and C structs.
Let's take a look at what struct exports:
__all__ = [
    # Functions
    'calcsize', 'pack', 'pack_into', 'unpack', 'unpack_from',
    'iter_unpack',

    # Classes
    'Struct',

    # Exceptions
    'error'
]
It contains 6 functions, 1 class, and 1 exception.
Let's focus on how the 6 functions are used.
They mainly pack and unpack data. Their most important parameter is format, also known as the format string, which specifies the layout in which each value is packed.
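Besides pack and unpack, the remaining functions work on existing buffers and on repeated records. The following is a minimal sketch of how they might be combined; the format, buffer size, and values are made up for illustration:

```python
from struct import calcsize, iter_unpack, pack, pack_into, unpack_from

fmt = '<hh'              # two little-endian 16-bit integers per record
size = calcsize(fmt)     # 4 bytes per record

# pack_into writes into an existing writable buffer at a given offset.
buf = bytearray(2 * size)
pack_into(fmt, buf, 0, 1, 2)
pack_into(fmt, buf, size, 3, 4)

# unpack_from reads one record starting at an offset.
second = unpack_from(fmt, buf, size)

# iter_unpack yields every record in a buffer whose length is a multiple of size.
records = list(iter_unpack(fmt, buf))

print(second)    # (3, 4)
print(records)   # [(1, 2), (3, 4)]
```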
Format strings are the mechanism used to specify the data layout when packing and unpacking data. They are built from format characters, which specify the type of data being packed or unpacked. In addition, there are special characters for controlling the byte order, size, and alignment.
By default, C types are represented in the machine's native format and byte order, and are properly aligned by adding pad bytes where necessary (according to the rules used by the C compiler).
We can also specify the byte order, size, and alignment manually with the first character of the format string: '@' means native byte order, size, and alignment (the default), '=' means native byte order with standard sizes and no alignment, '<' means little-endian, '>' means big-endian, and '!' means network order (big-endian).
Big-endian and little-endian are two ways of storing multi-byte data.
Big-endian stores the high-order byte at the starting address.
Little-endian stores the low-order byte at the starting address.
Big-endian is closer to how people read and write numbers, while little-endian is closer to how machines read and write them.
There are currently two mainstream CPU camps: the PowerPC series stores data in big-endian order, and the x86 series stores data in little-endian order.
If CPUs of different architectures exchange data directly, the different byte orders can cause problems.
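The difference is easy to see by packing the same value with an explicit byte order. A minimal sketch, where the value 0x12345678 is chosen only because its bytes are easy to tell apart:

```python
import sys
from struct import pack

value = 0x12345678

big = pack('>I', value)      # most significant byte first
little = pack('<I', value)   # least significant byte first
network = pack('!I', value)  # network order is big-endian

print(big.hex())      # 12345678
print(little.hex())   # 78563412
print(sys.byteorder)  # 'little' or 'big', depending on the machine
```

When data has to travel between machines, choosing an explicit prefix such as '!' on both sides avoids depending on either machine's native order.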
Padding is only added automatically between consecutive structure members . Padding is not added to the beginning and end of the encoded structure .
When non-native size and alignment are used, that is, with '<', '>', '=', or '!', no padding is added at all.
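The effect of the prefix on padding can be checked with calcsize. This is a small sketch; the native size shown in the comment is what common platforms report and may differ elsewhere:

```python
from struct import calcsize, pack

native = calcsize('@ci')    # typically 8: pad bytes are inserted after the char
standard = calcsize('<ci')  # always 5: 1-byte char + 4-byte int, no padding

print(native, standard)
print(pack('<ci', b'*', 1))  # b'*\x01\x00\x00\x00'
```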
Let's look at the format characters:
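As a quick reference, the sizes of the most common format characters can be printed with calcsize. This is a minimal sketch: the C type names in the dictionary are labels added here for readability, and the '=' prefix is used so that the sizes are the standard ones rather than platform-dependent:

```python
from struct import calcsize

# Format character -> the C type it corresponds to.
names = {
    'c': 'char', 'b': 'signed char', 'B': 'unsigned char', '?': 'bool',
    'h': 'short', 'H': 'unsigned short', 'i': 'int', 'I': 'unsigned int',
    'l': 'long', 'L': 'unsigned long', 'q': 'long long',
    'Q': 'unsigned long long', 'f': 'float', 'd': 'double',
}

# With '=', sizes are the standard ones and do not depend on the platform.
sizes = {code: calcsize('=' + code) for code in names}

for code, c_type in names.items():
    print(f'{code}  {c_type:<18}  {sizes[code]} byte(s)')
```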
For example, to pack an int object, we can write:
In [101]: from struct import *
In [102]: pack('i',10)
Out[102]: b'\n\x00\x00\x00'
In [103]: unpack('i',b'\n\x00\x00\x00')
Out[103]: (10,)
In [105]: calcsize('i')
Out[105]: 4
In the example above, we packed the int object 10 and then unpacked it again. We also calculated that the i format is 4 bytes long.
The output is b'\n\x00\x00\x00'. The leading b marks a bytes literal, and what follows is the content of those bytes: \n is how Python displays the byte with value 10, and the three \x00 bytes are the rest of the 4-byte int, stored low byte first on this little-endian machine.
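To make the byte values visible, the packed result can be converted to a list of integers. A small sketch, using the '<' prefix (an addition to the example above) so that the result is the same on every machine:

```python
from struct import pack

packed = pack('<i', 10)   # '<' forces little-endian, standard 4-byte int

print(list(packed))                    # [10, 0, 0, 0]
print(packed == bytes([10, 0, 0, 0]))  # True
print(b'\n' == bytes([10]))            # True: \n is just how byte 10 is displayed
```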
A format character can be preceded by an integer repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.
Let's see how to pack 4 values of type short:
In [106]: pack('4h',2,3,4,5)
Out[106]: b'\x02\x00\x03\x00\x04\x00\x05\x00'
In [107]: unpack('4h',b'\x02\x00\x03\x00\x04\x00\x05\x00')
Out[107]: (2, 3, 4, 5)
Whitespace between format characters is ignored, but a repeat count and its format character must not be separated by whitespace.
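Both halves of that rule can be checked directly. A minimal sketch; the variable names same and rejected exist only for this example:

```python
from struct import error, pack

# Whitespace between format characters is ignored.
same = pack('h h', 1, 2) == pack('hh', 1, 2)
print(same)  # True

# Whitespace between a repeat count and its format character is rejected.
try:
    pack('2 h', 1, 2)
    rejected = False
except error as exc:
    rejected = True
    print('struct.error:', exc)
```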
When a value x is packed with one of the integer formats ('b', 'B', 'h', 'H', 'i', 'I', 'l', 'L', 'q', 'Q') and x is outside the valid range of that format, struct.error is raised.
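For instance, 'b' is a signed char and only holds values from -128 to 127. The sketch below packs the largest valid value and then one past it; the out_of_range flag is just a name used for this example:

```python
from struct import error, pack

print(pack('b', 127))  # b'\x7f', the largest value a signed char can hold

try:
    pack('b', 128)     # one past the valid range -128..127
    out_of_range = False
except error as exc:
    out_of_range = True
    print('struct.error:', exc)
```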
Besides numbers, the most commonly used formats are characters and strings.
Let's first see how to use the character format. Because a character is 1 byte long, we write:
In [109]: pack('4c',b'a',b'b',b'c',b'd')
Out[109]: b'abcd'
In [110]: unpack('4c',b'abcd')
Out[110]: (b'a', b'b', b'c', b'd')
In [111]: calcsize('4c')
Out[111]: 4
The b prefix makes each argument a bytes object of length 1, which is what the 'c' format requires. Without it the argument is a str, which 'c' does not accept.
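The requirement can be verified with a small helper. Note that accepts_char is a hypothetical function written for this example, not part of the struct module:

```python
from struct import error, pack

print(pack('c', b'a'))  # b'a'

# Hypothetical helper for this example only: report whether 'c' accepts a value.
def accepts_char(value):
    try:
        pack('c', value)
        return True
    except error:
        return False

print(accepts_char('a'))    # False: a str is not a bytes object
print(accepts_char(b'ab'))  # False: more than one byte
```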
Now let's look at the string format:
In [114]: pack('4s',b'abcd')
Out[114]: b'abcd'
In [115]: unpack('4s',b'abcd')
Out[115]: (b'abcd',)
In [116]: calcsize('4s')
Out[116]: 4
In [117]: calcsize('s')
Out[117]: 1
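You can see that for strings the number is a length in bytes, not a repeat count: '4s' is a single 4-byte string, so calcsize returns 4, and 's' on its own means a 1-byte string.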
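Because the length is fixed by the format, a value that is too short or too long is adjusted to fit when it is packed. A short sketch with made-up values:

```python
from struct import pack, unpack

print(pack('4s', b'ab'))            # b'ab\x00\x00'  (padded with null bytes)
print(pack('2s', b'abcd'))          # b'ab'          (truncated to fit)
print(unpack('4s', b'ab\x00\x00'))  # (b'ab\x00\x00',)  always exactly 4 bytes
```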
The order of format characters can affect the size, because the padding required to satisfy the alignment requirements differs. The output below is from a big-endian machine; on a little-endian machine the four bytes of the int appear in reverse order, but the sizes are the same:
>>> pack('ci', b'*', 0x12131415)
b'*\x00\x00\x00\x12\x13\x14\x15'
>>> pack('ic', 0x12131415, b'*')
b'\x12\x13\x14\x15*'
>>> calcsize('ci')
8
>>> calcsize('ic')
5
The following example shows how to control the padding manually:
In [120]: pack('llh',1, 2, 3)
Out[120]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00'
In the example above, we pack the three numbers 1, 2, and 3 with three format characters: long, long, and short.
On the platform used here a long is 8 bytes and a short is 2 bytes, so the packed result is 18 bytes long and does not end on a long boundary.
To align the end of the structure, we can append 0l, which means zero longs. It adds no value but pads the result to the alignment of a long:
In [118]: pack('llh0l', 1, 2, 3)
Out[118]: b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
In [122]: unpack('llh0l',b'\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00')
Out[122]: (1, 2, 3)
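The same effect can be seen from the sizes alone. This is a sketch that assumes, as on common platforms, that a long is aligned to its own size; the concrete numbers in the last comment are those of the platform used above:

```python
from struct import calcsize

long_size = calcsize('l')   # 8 on the platform used above, 4 on some others
plain = calcsize('llh')     # ends right after the short
padded = calcsize('llh0l')  # rounded up to the alignment of a long

print(long_size, plain, padded)  # 8 18 24 on the platform used above
```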
Finally, let's look at a slightly more complex application, which reads the unpacked data directly into a named tuple. The name field is 10 bytes long, so the name in the record is padded with spaces:
>>> record = b'raymond   \x32\x12\x08\x01\x08'
>>> name, serialnum, school, gradelevel = unpack('<10sHHb', record)
>>> from collections import namedtuple
>>> Student = namedtuple('Student', 'name serialnum school gradelevel')
>>> Student._make(unpack('<10sHHb', record))
Student(name=b'raymond   ', serialnum=4658, school=264, gradelevel=8)
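When the same format is used for many records, the Struct class can compile it once and reuse it. The following is a minimal sketch building on the example above; the second record ("alice") and the variable names are made up for illustration:

```python
from collections import namedtuple
from struct import Struct

Student = namedtuple('Student', 'name serialnum school gradelevel')
student_struct = Struct('<10sHHb')   # compiled once, reused for every record

# Two made-up records written back to back.
data = (student_struct.pack(b'raymond   ', 4658, 264, 8)
        + student_struct.pack(b'alice     ', 17, 264, 9))

students = [Student._make(fields)
            for fields in student_struct.iter_unpack(data)]

print(student_struct.size)         # 15
print(students[1].name.rstrip())   # b'alice'
```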