XML (eXtensible Markup Language) is a subset of the Standard Generalized Markup Language (SGML). It is a markup language used to mark up electronic documents so that they have structure. You can learn more in this site's XML tutorial.
XML was designed to transport and store data.
XML is a set of rules for defining semantic tags. These tags divide a document into parts and identify each part.
It is also a meta-markup language: it provides the syntax for defining other domain-specific, semantic, structured markup languages.
The two common XML programming interfaces are DOM and SAX. They process XML files in different ways, so each suits different situations.
Python offers three ways to parse XML: SAX, DOM, and ElementTree.
SAX (Simple API for XML): the Python standard library includes a SAX parser. SAX uses an event-driven model. While parsing the XML file it fires events one by one and calls user-defined callback functions to handle them.
DOM (Document Object Model): parses the XML data into a tree in memory, and the XML is processed by operating on that tree.
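ElementTree, the third option, is only named here. As a minimal sketch, assuming the standard-library module xml.etree.ElementTree and a small inline document shaped like this chapter's example file (so the sketch runs without any file), ElementTree loads the whole document as a tree of Element objects that can be searched directly. The function name list_movies is hypothetical.

```python
import xml.etree.ElementTree as ET

# Small inline document (assumption: stands in for the chapter's example file)
MOVIES_XML = """<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <format>DVD</format>
    <year>2003</year>
  </movie>
  <movie title="Ishtar">
    <format>VHS</format>
  </movie>
</collection>"""


def list_movies(xml_text):
    """Hypothetical helper: return (title, format, year) tuples."""
    root = ET.fromstring(xml_text)  # parse the whole document into an Element tree
    result = []
    for movie in root.iter("movie"):
        # findtext() returns None when the child element does not exist
        result.append((movie.get("title"), movie.findtext("format"), movie.findtext("year")))
    return result


if __name__ == "__main__":
    print("Shelf:", ET.fromstring(MOVIES_XML).get("shelf"))
    for title, fmt, year in list_movies(MOVIES_XML):
        print(title, fmt, year)
```

Unlike SAX, nothing is processed until the whole tree exists; unlike minidom, missing children are handled with a default value instead of an index error.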
The examples in this chapter use the XML file movies.xml, whose contents are as follows:
<collection shelf="New Arrivals">
  <movie title="Enemy Behind">
    <type>War, Thriller</type>
    <format>DVD</format>
    <year>2003</year>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Talk about a US-Japan war</description>
  </movie>
  <movie title="Transformers">
    <type>Anime, Science Fiction</type>
    <format>DVD</format>
    <year>1989</year>
    <rating>R</rating>
    <stars>8</stars>
    <description>A schientific fiction</description>
  </movie>
  <movie title="Trigun">
    <type>Anime, Action</type>
    <format>DVD</format>
    <episodes>4</episodes>
    <rating>PG</rating>
    <stars>10</stars>
    <description>Vash the Stampede!</description>
  </movie>
  <movie title="Ishtar">
    <type>Comedy</type>
    <format>VHS</format>
    <rating>PG</rating>
    <stars>2</stars>
    <description>Viewable boredom</description>
  </movie>
</collection>
SAX is an event-driven API.
Parsing an XML document with SAX involves two parts: the parser and the event handler.
The parser reads the XML document and sends events, such as element-start and element-end events, to the event handler.
The event handler responds to these events and processes the XML data passed to it.
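To make the parser/handler split concrete, here is a small sketch that records each event the parser sends instead of processing any data. The class EventLogger and the function log_events are hypothetical names; only xml.sax.ContentHandler and xml.sax.parseString come from the standard library.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that only records the events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)

    def endDocument(self):
        self.events.append("endDocument")


def log_events(xml_bytes):
    handler = EventLogger()
    # The parser reads the input and calls the handler's methods as it goes
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events(b"<movie><year>2003</year></movie>"):
        print(event)
```

The handler never reads the input itself; it only reacts to calls made by the parser, in document order.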
To process XML with SAX in Python, use the parse function from xml.sax together with the ContentHandler class from xml.sax.handler. The ContentHandler methods are described below.
characters(content) method
Called in these cases:
From the start of a line up to the first tag, if there are characters, content holds those characters.
From one tag up to the next tag, if there are characters, content holds those characters.
From a tag up to the end of the line, if there are characters, content holds those characters.
A tag can be either a start tag or an end tag. A parser may also deliver one run of text through several characters() calls.
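Because characters() fires for whitespace between tags and may split one run of text into several calls, a common pattern is to buffer the pieces and read them when the element ends. A sketch under those assumptions, suited to leaf elements like the ones in movies.xml; TextCollector and collect are hypothetical names.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler mapping each leaf element name to its text."""

    def __init__(self):
        super().__init__()
        self.buffer = []  # text pieces received since the last tag
        self.texts = {}

    def startElement(self, name, attrs):
        self.buffer = []  # drop whitespace that came before this tag

    def characters(self, content):
        # May be called several times for one run of text, and for whitespace
        self.buffer.append(content)

    def endElement(self, name):
        text = "".join(self.buffer).strip()
        if text:  # whitespace-only runs between tags are ignored
            self.texts[name] = text
        self.buffer = []


def collect(xml_bytes):
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.texts


if __name__ == "__main__":
    print(collect(b"<movie>\n  <type>War, Thriller</type>\n  <stars>10</stars>\n</movie>"))
    print(collect(b"<d>Tom &amp; Jerry</d>"))
```

Joining the buffered pieces gives the same result whether the parser delivered the text in one call or several.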
startDocument() method
Called when the document starts.
endDocument() method
Called when the parser reaches the end of the document.
startElement(name, attrs) method
Called when an XML start tag is encountered. name is the tag name and attrs is a dictionary-like object holding the tag's attribute values.
endElement(name) method
Called when an XML end tag is encountered.
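These callbacks can be combined in one handler. As a hedged sketch (TitleCounter and count_titles are hypothetical names), startDocument resets state, startElement reads the title attribute through attrs, and endDocument builds a summary once parsing is complete:

```python
import xml.sax


class TitleCounter(xml.sax.ContentHandler):
    """Hypothetical handler that collects movie titles and a summary."""

    def startDocument(self):
        # Called once, before any element events
        self.titles = []
        self.summary = ""

    def startElement(self, name, attrs):
        # attrs behaves like a read-only mapping of attribute name -> value
        if name == "movie":
            self.titles.append(attrs.get("title", "(untitled)"))

    def endDocument(self):
        # Called once, after the last element has been closed
        self.summary = "%d movies" % len(self.titles)


def count_titles(xml_bytes):
    handler = TitleCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.titles, handler.summary


if __name__ == "__main__":
    print(count_titles(b'<collection><movie title="Trigun"/><movie title="Ishtar"/><movie/></collection>'))
```

Using attrs.get() with a default avoids the KeyError that attrs["title"] would raise for a movie without a title attribute.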
The following function creates and returns a new parser object.
xml.sax.make_parser([parser_list])
Parameter description:
parser_list: optional; a list of names of parser modules to try before the default ones.
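The object returned by make_parser can also be driven by hand. Assuming the default Expat-based reader, which supports incremental parsing through feed() and close(), the sketch below feeds a document in arbitrary pieces, even split inside a tag. TitleHandler and titles_from_chunks are hypothetical names.

```python
import xml.sax


class TitleHandler(xml.sax.ContentHandler):
    """Hypothetical handler that collects the title attribute of each <movie>."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, name, attrs):
        if name == "movie":
            self.titles.append(attrs["title"])


def titles_from_chunks(chunks):
    parser = xml.sax.make_parser()  # default XMLReader (Expat-based)
    handler = TitleHandler()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # events fire as soon as enough data has arrived
    parser.close()  # marks the end of input and triggers endDocument
    return handler.titles


if __name__ == "__main__":
    pieces = [b'<collection><movie ti', b'tle="Trigun"/><movie title="Ishtar"/>', b"</collection>"]
    print(titles_from_chunks(pieces))
```

Feeding in pieces is useful when data arrives from a network stream and the whole document should never be held in memory.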
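The following function creates a SAX parser and parses an XML file:
xml.sax.parse(xmlfile, contenthandler[, errorhandler])
Parameter description:
xmlfile: the XML file name, or an open file object.
contenthandler: a ContentHandler object that receives the parsing events.
errorhandler: optional; a SAX ErrorHandler object.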
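A short sketch of xml.sax.parse on a real file. To run anywhere, it first writes the sample data to a temporary file (an assumption for illustration); StarsHandler, total_stars and total_stars_from_bytes are hypothetical names. The handler sums the text of every stars element.

```python
import os
import tempfile
import xml.sax


class StarsHandler(xml.sax.ContentHandler):
    """Hypothetical handler that sums the text of all <stars> elements."""

    def __init__(self):
        super().__init__()
        self.in_stars = False
        self.text = []
        self.total = 0

    def startElement(self, name, attrs):
        if name == "stars":
            self.in_stars = True
            self.text = []

    def characters(self, content):
        if self.in_stars:
            self.text.append(content)

    def endElement(self, name):
        if name == "stars":
            self.total += int("".join(self.text))
            self.in_stars = False


def total_stars(path):
    handler = StarsHandler()
    xml.sax.parse(path, handler)  # path is a file name here
    return handler.total


def total_stars_from_bytes(data):
    # Write the data to a temporary file so xml.sax.parse has a real path to read
    fd, path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return total_stars(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(total_stars_from_bytes(b"<collection><movie><stars>10</stars></movie><movie><stars>8</stars></movie></collection>"))
```

With the chapter's file in place, total_stars("movies.xml") would be called directly.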
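parseString creates an XML parser and parses an XML string:
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
Parameter description:
xmlstring: the XML string to parse.
contenthandler: a ContentHandler object that receives the parsing events.
errorhandler: optional; a SAX ErrorHandler object.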
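parseString suits XML that is already in memory. The sketch below collects ratings and shows that, with the default error handler, malformed input raises xml.sax.SAXParseException. It encodes the text to bytes because bytes are accepted by every Python 3 version (str is accepted from Python 3.7). RatingHandler, ratings_of and is_well_formed are hypothetical names.

```python
import xml.sax


class RatingHandler(xml.sax.ContentHandler):
    """Hypothetical handler that records the text of each <rating> element."""

    def __init__(self):
        super().__init__()
        self.current = None  # list of text pieces while inside <rating>
        self.ratings = []

    def startElement(self, name, attrs):
        if name == "rating":
            self.current = []

    def characters(self, content):
        if self.current is not None:
            self.current.append(content)

    def endElement(self, name):
        if name == "rating":
            self.ratings.append("".join(self.current))
            self.current = None


def ratings_of(xml_text):
    """Return the ratings; raises xml.sax.SAXParseException on malformed XML."""
    handler = RatingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.ratings


def is_well_formed(xml_text):
    # The default error handler raises on fatal errors such as mismatched tags
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


if __name__ == "__main__":
    print(ratings_of("<c><movie><rating>PG</rating></movie><movie><rating>R</rating></movie></c>"))
    try:
        ratings_of("<c><rating>PG</c>")  # mismatched end tag
    except xml.sax.SAXParseException as err:
        print("Line", err.getLineNumber(), err.getMessage())
```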
#!/usr/bin/python3
import xml.sax
class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""
    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)
            # Clear the previous movie's values before collecting new text
            self.type = self.format = self.year = ""
            self.rating = self.stars = self.description = ""
    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""
    # Called when character data is read; one run of text may arrive
    # in several calls, so append instead of overwriting
    def characters(self, content):
        if self.CurrentData == "type":
            self.type += content
        elif self.CurrentData == "format":
            self.format += content
        elif self.CurrentData == "year":
            self.year += content
        elif self.CurrentData == "rating":
            self.rating += content
        elif self.CurrentData == "stars":
            self.stars += content
        elif self.CurrentData == "description":
            self.description += content
if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespace processing
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)
    # Install our ContentHandler subclass
    Handler = MovieHandler()
    parser.setContentHandler(Handler)
    parser.parse("movies.xml")
The code above produces the following output:
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
For the complete SAX API, see the Python SAX APIs documentation.
The Document Object Model (DOM) is a standard programming interface recommended by the W3C for processing extensible markup languages.
A DOM parser reads the entire XML document at once and keeps all of its elements in a tree structure in memory. You can then use the functions DOM provides to read or modify the document's content and structure, and write the modified content back to an XML file.
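Because the DOM keeps the whole document in memory, it can be edited and serialized again. A minimal sketch with xml.dom.minidom, assuming an inline fragment instead of movies.xml; add_year is a hypothetical helper and the year passed to it is an illustrative value.

```python
from xml.dom.minidom import parseString


def add_year(xml_text, title, year):
    """Hypothetical helper: add a <year> child to the movie with the given title."""
    dom = parseString(xml_text)  # whole document -> tree in memory
    for movie in dom.getElementsByTagName("movie"):
        if movie.getAttribute("title") == title:
            year_el = dom.createElement("year")
            year_el.appendChild(dom.createTextNode(str(year)))
            movie.appendChild(year_el)
    # toxml() serializes the modified tree; writexml() could write it to a file instead
    return dom.documentElement.toxml()


if __name__ == "__main__":
    src = '<collection shelf="New Arrivals"><movie title="Ishtar"><format>VHS</format></movie></collection>'
    print(add_year(src, "Ishtar", 1987))
```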
Python uses xml.dom.minidom to parse XML files. An example follows:
#!/usr/bin/python3
from xml.dom.minidom import parse
# Open the XML document with the minidom parser
DOMTree = parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))
# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")
# Print the details of each movie
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))
    # Names end in _node to avoid shadowing the built-ins type() and format()
    type_node = movie.getElementsByTagName("type")[0]
    print("Type: %s" % type_node.childNodes[0].data)
    format_node = movie.getElementsByTagName("format")[0]
    print("Format: %s" % format_node.childNodes[0].data)
    rating_node = movie.getElementsByTagName("rating")[0]
    print("Rating: %s" % rating_node.childNodes[0].data)
    description_node = movie.getElementsByTagName("description")[0]
    print("Description: %s" % description_node.childNodes[0].data)
The program above produces the following output:
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
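The DOM example does not print the year, and movie.getElementsByTagName("year")[0] would raise IndexError for Trigun and Ishtar, which have no year element; childNodes[0] fails the same way on an empty element. A hedged helper (get_text is a hypothetical name) that returns None for a missing element and joins all text children:

```python
from xml.dom.minidom import parseString


def get_text(parent, tag):
    """Hypothetical helper: text of the first <tag> under parent, or None if absent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return None
    # Join every text child, so an empty element yields "" instead of raising IndexError
    return "".join(
        child.data for child in nodes[0].childNodes if child.nodeType == child.TEXT_NODE
    )


if __name__ == "__main__":
    dom = parseString(
        '<collection><movie title="Enemy Behind"><year>2003</year></movie>'
        '<movie title="Trigun"><episodes>4</episodes></movie></collection>'
    )
    for movie in dom.getElementsByTagName("movie"):
        print(movie.getAttribute("title"), get_text(movie, "year"))
```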