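Python provides an interface for processing (parsing and creating) XML files: the xml.etree.ElementTree module (hereafter ET).
> Note: since Python 3.3, the xml.etree.cElementTree module has been deprecated.
XML is a hierarchical data format and is usually represented as a tree. ET has two classes for this purpose: ElementTree, which represents the whole XML document as a tree, and Element, which represents a single node in that tree.
The examples below parse the file country_data.xml: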
<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
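(1) Method 1: from a file
import xml.etree.ElementTree as ET
# Method 1: from a file
tree = ET.parse('XXXX.xml')  # path to the file; parses the whole XML document
root = tree.getroot()  # get the root element of the XML tree
(2) Method 2: from the file's contents (a string)
# Method 2: from a string
root = ET.fromstring('the entire contents of XXXX.xml as a string')
Note: ET.fromstring() parses the XML content (given as a string) directly into an Element object (a node). That Element is the root node of the parsed XML tree.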
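The difference in return types between the two methods can be checked with a minimal sketch like the one below. The one-line document in xml_text is a hypothetical stand-in for a real file, and io.StringIO is used here only so that the example runs without touching the disk.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical one-line document, used only for this illustration.
xml_text = '<data><country name="Panama"/></data>'

# ET.parse() accepts a filename or a file object and returns an ElementTree.
tree = ET.parse(io.StringIO(xml_text))
root_from_file = tree.getroot()

# ET.fromstring() returns the root Element directly, with no tree wrapper.
root_from_string = ET.fromstring(xml_text)

print(type(tree).__name__)              # ElementTree
print(type(root_from_string).__name__)  # Element
print(root_from_file.tag, root_from_string.tag)
```

In both cases the root Element is what the rest of the examples work with; the ElementTree wrapper mainly matters when reading from or writing to files.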
import xml.etree.ElementTree as ET
filePath = r'C:\codes\data\country_data.xml'  # raw string, so the backslashes are kept literally
# Method 1: read from a file
tree = ET.parse(filePath)
root = tree.getroot()
print(root.tag)
# Method 2: parse from a string
root2 = ET.fromstring('''<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>''')
print(root2.tag)
Output:
data
data
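1. Element.tag: the element's tag name, a string.
   Input: root.tag
   Output: data
2. Element.attrib: the element's attribute names and values, as a dictionary.
   Input: root[0].attrib
   Output: {'name': 'Liechtenstein'}
3. Element.text: the text between the element's start tag and its first child or end tag, or None. Usually a string.
   Input: root[0][0].text
   Output: 1
4. Element.tail: the text between the element's end tag and the next tag, or None. Usually a string.
   Input: root[0][0].tail
   Output: '\n' plus any indentation (the whitespace between </rank> and <year>); tail is None only when no text follows the end tag.
5. Element.keys(): returns the element's attribute names as a list.
   Input: root[0].keys()
   Output: ['name']
6. Element.items(): returns the element's attributes as a list of (name, value) pairs.
   Input: root[0][3].items()
   Output: [('name', 'Austria'), ('direction', 'E')]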
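The attributes and methods listed above can be tried on a small document. The snippet below is a sketch only: the XML string, including the code attribute and the stray text after </rank>, is made up for illustration so that a non-empty tail is easy to see.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet with mixed content so that .tail is visible.
root = ET.fromstring(
    '<country name="Panama" code="PA">'
    '<rank>68</rank> after rank'
    '<year>2011</year>'
    '</country>'
)
rank, year = root[0], root[1]

print(root.tag)            # country
print(root.attrib)         # {'name': 'Panama', 'code': 'PA'}
print(rank.text)           # 68
print(repr(rank.tail))     # ' after rank'
print(year.tail)           # None, nothing follows </year>

# list() is used so the result is a plain list regardless of implementation.
attr_names = list(root.keys())
attr_pairs = list(root.items())
print(attr_names)
print(attr_pairs)
```

Note that text and tail are plain strings (or None), so numeric values such as the rank need an explicit int() conversion before doing arithmetic on them.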
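Searching with an iterator: Element.iter('tagname')
# Get every element with the tag "neighbor" at any depth below the current element
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)
Output:
{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}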
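To see that iter() descends through every level, it can be compared with a plain loop over the direct children. This is a sketch on a made-up document: the top-level neighbor element named "Nowhere" is hypothetical and exists only to make the two results differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: one neighbor nested in a country, one at the top level.
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><neighbor name="Malaysia"/></country>'
    '<neighbor name="Nowhere"/>'
    '</data>'
)

# iter() walks the whole subtree in document order.
deep = [n.get('name') for n in root.iter('neighbor')]

# Iterating over an element visits only its direct children.
shallow = [c.get('name') for c in root if c.tag == 'neighbor']

print(deep)     # ['Malaysia', 'Nowhere']
print(shallow)  # ['Nowhere']
```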
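# Search the direct children of the current element
print('Using element.findall:')
ele1 = root.findall('country')
for every in ele1:
    print(every.attrib)
print('Using element.iterfind:')
for every in root.iterfind('country'):
    print(every.attrib)
print('Using element.itertext:')
for every in root.itertext():
    # skip the whitespace-only strings between tags
    if not every.startswith('\n'):
        print(every)
# Find the first matching direct child of the current element
print('Using element.find:')
ele = root.find('country')
print(ele.attrib)
print('Using element.findtext:')
ranktext = ele.findtext('rank')
print(ranktext)
Output:
Using element.findall:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Using element.iterfind:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Using element.itertext:
1
2008
141100
4
2011
59900
68
2011
13600
Using element.find:
{'name': 'Liechtenstein'}
Using element.findtext:
1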
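One point worth checking with the find family is what happens when nothing matches. The sketch below uses a shortened, hypothetical version of the country data; the 'n/a' fallback value and the 'country/rank' path are choices made for this illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical version of the country data.
root = ET.fromstring(
    '<data>'
    '<country name="Singapore"><rank>4</rank></country>'
    '<country name="Panama"><rank>68</rank></country>'
    '</data>'
)

# findall() returns a list of matching direct children.
names = [c.get('name') for c in root.findall('country')]

# find() returns the first match, or None when there is none.
first = root.find('country')
missing = root.find('rank')  # rank is a grandchild of root, so no match

# findtext() returns the text of the first match, or the default.
first_rank = first.findtext('rank')
fallback = first.findtext('gdppc', default='n/a')

# A path can reach deeper levels explicitly.
all_ranks = [r.text for r in root.iterfind('country/rank')]

print(names, first_rank, fallback, all_ranks, missing)
```

Because find() can return None, it is usually safer to test the result before reading attributes such as .attrib or .text from it.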