
Python learning notes: parsing XML (the ElementTree XML API)

Editor: Python

Python provides an interface for processing (parsing and creating) XML files: the xml.etree.ElementTree module (hereinafter ET).

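> Note: The xml.etree.cElementTree module has been deprecated since Python 3.3, because xml.etree.ElementTree now uses the C accelerator automatically when it is available. It was removed in Python 3.9.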

1. XML format

XML is a hierarchical data format and is most naturally represented as a tree. ET provides two classes for representing XML:

ElementTree: represents the whole XML document as a tree (class ET.ElementTree);
Element: represents a single node in that tree (class ET.Element).
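The following minimal sketch shows how the two classes relate. The inline XML string is a made-up stand-in for a real file. Wrapping a root Element in ET.ElementTree gives a tree object, and getroot() hands back that same Element.

```python
import xml.etree.ElementTree as ET

# A tiny inline document (hypothetical), so the sketch needs no file on disk.
xml_text = "<data><country name='Singapore'><rank>4</rank></country></data>"

root = ET.fromstring(xml_text)  # an Element: the root node
tree = ET.ElementTree(root)     # an ElementTree: the whole document

print(isinstance(tree, ET.ElementTree))  # True
print(isinstance(root, ET.Element))      # True
print(tree.getroot() is root)            # True: the tree returns its root node
print(root[0].tag, root[0][0].tag)       # country rank: child nodes are Elements too
```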

2. Parsing XML

The examples below parse the file country_data.xml:

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

2.1 Reading an XML file

2.1.1 Reading methods

(1) Method 1: from a file

import xml.etree.ElementTree as ET
# Method 1: from a file
tree = ET.parse('XXXX.xml')  # path to the file; returns an ElementTree for the whole document
root = tree.getroot()        # the root Element of the document
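ET.parse() also accepts an open file object instead of a path. In this sketch, io.StringIO stands in for a real file so that it runs without country_data.xml on disk. The short XML content is a hypothetical excerpt.

```python
import io
import xml.etree.ElementTree as ET

# io.StringIO plays the role of an open file here (assumption: no file on disk).
fake_file = io.StringIO("""<?xml version="1.0"?>
<data>
  <country name="Liechtenstein"><rank>1</rank></country>
</data>""")

tree = ET.parse(fake_file)  # parse() takes a path or a file object
root = tree.getroot()       # the <data> element
print(root.tag)             # data
print(root[0].get("name"))  # Liechtenstein
```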

(2) Method 2: from the file's content (a string)

# Method 2: from a string
root = ET.fromstring('full text of XXXX.xml')  # placeholder: pass the XML content itself, not a path

Explanation: ET.fromstring() parses XML content given as a string directly into an Element object (a node). This Element is the root node of the parsed XML tree. Unlike ET.parse(), it does not return an ElementTree.
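Here is a small sketch of that difference, using a made-up one-country document. fromstring() hands back the root Element itself. You wrap it in ET.ElementTree only when you need tree-level methods such as getroot() or write(). ET.tostring() turns the Element back into text.

```python
import xml.etree.ElementTree as ET

xml_text = '<data><country name="Panama"><rank>68</rank></country></data>'

root = ET.fromstring(xml_text)   # returns the root Element directly
print(root.tag)                  # data
print(hasattr(root, "getroot"))  # False: an Element has no getroot()

# Wrap the Element when ElementTree-level methods (e.g. write()) are needed.
tree = ET.ElementTree(root)
print(tree.getroot().find("country").get("name"))  # Panama

# Serialize back to a str; no XML declaration is added with encoding="unicode".
print(ET.tostring(root, encoding="unicode"))
```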

2.1.2 Code

import xml.etree.ElementTree as ET
# Raw string, so the backslashes in the Windows path are not read as escape sequences
filePath = r'C:\codes\data\country_data.xml'
## Method 1: read from a file
tree = ET.parse(filePath)
root = tree.getroot()
print(root.tag)
## Method 2: parse from a string
root2 = ET.fromstring('''<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>''')
print(root2.tag)

Output:

data
data

2.2 Getting an Element's properties

Each entry below gives the property or method, what it represents, its type, and an example.

1. Element.tag: the element's name (its tag). Type: string.

Input: root.tag

Output: data

2. Element.attrib: the element's attributes, as name/value pairs. Type: dictionary.

Input: root[0].attrib

Output: {'name': 'Liechtenstein'}

3. Element.text: the text between the element's start tag and its first child or end tag, or None. Type: usually a string.

Input: root[0][0].text

Output: '1' (a string, not an integer)

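4. Element.tail: the text between the element's end tag and the next tag, or None. Type: usually a string.

Input: root[0][0].tail

Output: the whitespace after </rank>, i.e. '\n' plus any indentation before <year>. The result is None only when no text at all sits between </rank> and the next tag.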

5. Element.keys(): the names of the element's attributes. Type: list.

Input: root[0].keys()

Output: ['name']

6. Element.items(): the element's attributes as (name, value) pairs. Type: list of tuples.

Input: root[0][3].items() (the first <neighbor> of Liechtenstein)

Output: [('name', 'Austria'), ('direction', 'E')]
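The difference between text and tail is easiest to see on mixed content. This sketch uses a hypothetical HTML-like snippet rather than country_data.xml. It also shows Element.get(), which is a real ElementTree method not listed in the table above. get() reads one attribute and accepts a default.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet chosen to make text and tail visible.
snippet = '<p class="note" id="p1">Hello <b>world</b> again</p>'
p = ET.fromstring(snippet)
b = p[0]  # the <b> child

print(repr(p.text))  # 'Hello '  : from <p> up to its first child <b>
print(repr(b.text))  # 'world'   : inside <b>
print(repr(b.tail))  # ' again'  : after </b>, up to </p>
print(repr(p.tail))  # None      : nothing follows the root element

print(p.attrib)             # {'class': 'note', 'id': 'p1'}
print(list(p.keys()))       # ['class', 'id']
print(list(p.items()))      # [('class', 'note'), ('id', 'p1')]
print(p.get("id"))          # p1
print(p.get("lang", "en"))  # en (default used because the attribute is missing)
```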

2.3 Functions for finding sub-elements

2.3.1 Searching the current Element and all levels below it

Iterator lookup: Element.iter('tagname')

Returns an iterator over the current element and every element below it whose tag is tagname, in document (depth-first) order.
If tagname is None or '*', every element is returned, starting with the current element itself.

# Find every <neighbor> element at any depth below root
for neighbor in root.iter('neighbor'):
    print(neighbor.attrib)

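Output:

{'name': 'Austria', 'direction': 'E'}
{'name': 'Switzerland', 'direction': 'W'}
{'name': 'Malaysia', 'direction': 'N'}
{'name': 'Costa Rica', 'direction': 'W'}
{'name': 'Colombia', 'direction': 'E'}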
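The sketch below illustrates the None/'*' case and the depth-first order on a small made-up document. Calling iter() with no argument (or with '*') yields the element itself first and then all of its descendants.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='A'><neighbor name='B'/></country>"
    "<country name='C'><neighbor name='D'/><neighbor name='E'/></country>"
    "</data>"
)

# No tag (or '*'): every element, depth-first, starting with root itself.
print([e.tag for e in root.iter()])
# ['data', 'country', 'neighbor', 'country', 'neighbor', 'neighbor']

# With a tag: only matching elements, at any depth.
print([n.get("name") for n in root.iter("neighbor")])  # ['B', 'D', 'E']
print(len(list(root.iter("*"))))                         # 6
```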

2.3.2 Searching the next level of the current Element (its direct children)

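Element.findall(match): returns a list of the matching elements. When match is a plain tag name, only direct children of the current Element are searched. match can also be a limited XPath expression such as 'country/neighbor' or './/neighbor', which reaches deeper.
Element.iterfind(match): like findall(), but returns an iterator instead of a list.
Element.find(match): returns the first matching element, or None if there is none.
Element.findtext(match, default=None): returns the text of the first matching element, or default if nothing matches (an empty string if the element exists but has no text). For an element that contains only child elements, such as country, this text is just whitespace like '\n    '.
Element.itertext(): iterates over all text inside the current element at every level, including tails. In an indented file many of these strings are only a newline plus spaces, which is why the example below filters them out.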
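To illustrate the "next level only" rule and the XPath escape hatch, here is a sketch on a trimmed, hypothetical version of the data. Because <neighbor> is a grandchild of <data>, a bare tag name finds nothing from root, while a path expression does.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data>"
    "<country name='Singapore'><rank>4</rank><neighbor name='Malaysia'/></country>"
    "<country name='Panama'><rank>68</rank><neighbor name='Colombia'/></country>"
    "</data>"
)

# A bare tag name matches direct children only.
print(root.findall("neighbor"))  # []

# Limited XPath reaches further down.
print([n.get("name") for n in root.findall("country/neighbor")])  # ['Malaysia', 'Colombia']
print([n.get("name") for n in root.findall(".//neighbor")])       # ['Malaysia', 'Colombia']

# Attribute predicate, then findtext on the match.
panama = root.find("country[@name='Panama']")
print(panama.findtext("rank"))                    # 68
print(root.find("continent"))                     # None when nothing matches
print(root.findtext("continent", default="n/a"))  # n/a
```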

# Search the direct children of root
print('Using element.findall:')
ele1 = root.findall('country')
for every in ele1:
    print(every.attrib)
print('Using element.iterfind:')
for every in root.iterfind('country'):
    print(every.attrib)
# itertext() walks the text at every level, not just the direct children
print('Using element.itertext:')
for every in root.itertext():
    if every.strip():  # skip whitespace-only strings (newlines and indentation)
        print(every)
# Get only the first matching direct child
print('Using element.find:')
ele = root.find('country')
print(ele.attrib)
print('Using element.findtext:')
ranktext = ele.findtext('rank')
print(ranktext)

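Output:

Using element.findall:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Using element.iterfind:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
Using element.itertext:
1
2008
141100
4
2011
59900
68
2011
13600
Using element.find:
{'name': 'Liechtenstein'}
Using element.findtext:
1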
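Putting iterfind(), get(), findtext() and findall() together, here is one possible way to turn the country data into plain Python structures. The helper name country_table is hypothetical. The sketch assumes every <country> has <rank> and <year> children; if one is missing, findtext() returns None and int() would raise a TypeError.

```python
import xml.etree.ElementTree as ET

def country_table(root):
    """Hypothetical helper: one dict entry per <country> child of root."""
    table = {}
    for country in root.iterfind("country"):
        table[country.get("name")] = {
            "rank": int(country.findtext("rank")),  # assumes <rank> exists
            "year": int(country.findtext("year")),  # assumes <year> exists
            "neighbors": [n.get("name") for n in country.findall("neighbor")],
        }
    return table

root = ET.fromstring("""<data>
  <country name="Singapore">
    <rank>4</rank>
    <year>2011</year>
    <neighbor name="Malaysia" direction="N"/>
  </country>
</data>""")

print(country_table(root))
# {'Singapore': {'rank': 4, 'year': 2011, 'neighbors': ['Malaysia']}}
```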