A Brief Look at a Problem Encountered with the RapidXml Library and How It Was Analyzed

Submitted by: jingxian

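There are many open-source libraries for parsing XML in C++, so I will not list them all here. Today's topic is RapidXml. I have not used this library extensively, so please point out any mistakes. Thanks.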

Links:

Official site: http://rapidxml.sourceforge.net/

Official manual: http://rapidxml.sourceforge.net/manual.html

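I ran into a pitfall the last time I used this library, but I was short on time and did not track it down. Now that I am using it again, I do not want to fall into the same pitfall twice, so I decided to get to the bottom of it.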

First, two examples.

Creating the XML:

#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"

void CreateXml()
{
  rapidxml::xml_document<> doc;

  // XML declaration: <?xml version="1.0" encoding="UTF-8"?>
  auto nodeDecl = doc.allocate_node(rapidxml::node_declaration);
  nodeDecl->append_attribute(doc.allocate_attribute("version", "1.0"));
  nodeDecl->append_attribute(doc.allocate_attribute("encoding", "UTF-8"));
  doc.append_node(nodeDecl);

  auto nodeRoot = doc.allocate_node(rapidxml::node_element, "Root"); // create the Root node
  // Add a comment node to Root; a comment has no name, so the name argument is null
  nodeRoot->append_node(doc.allocate_node(rapidxml::node_comment, nullptr, "編程語言"));
  auto nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C language"); // create a language node
  nodeLanguage->append_attribute(doc.allocate_attribute("name", "C")); // add a name attribute to language
  nodeRoot->append_node(nodeLanguage); // add language to Root
  nodeLanguage = doc.allocate_node(rapidxml::node_element, "language", "This is C++ language"); // create a second language node
  nodeLanguage->append_attribute(doc.allocate_attribute("name", "C++")); // add a name attribute to language
  nodeRoot->append_node(nodeLanguage); // add language to Root

  doc.append_node(nodeRoot); // add Root to the document
  std::string buffer;
  rapidxml::print(std::back_inserter(buffer), doc, 0);
  std::ofstream outFile("language.xml");
  outFile << buffer;
  outFile.close();
}

Result:

 <?xml version="1.0" encoding="UTF-8"?>
 <Root>
   <!--編程語言-->
   <language name="C">This is C language</language>
   <language name="C++">This is C++ language</language>
 </Root>

Modifying the XML:

#include <fstream>
#include <iterator>
#include <string>
#include "rapidxml.hpp"
#include "rapidxml_print.hpp"
#include "rapidxml_utils.hpp"

void MotifyXml()
{
  rapidxml::file<> requestFile("language.xml"); // load the XML from a file
  rapidxml::xml_document<> doc;
  doc.parse<0>(requestFile.data()); // parse the XML

  auto nodeRoot = doc.first_node(); // get the first node, i.e. the Root node
  auto nodeLanguage = nodeRoot->first_node("language"); // get the first language node under Root
  nodeLanguage->first_attribute("name")->value("Motify C"); // change the name attribute of language to "Motify C"
  std::string buffer;
  rapidxml::print(std::back_inserter(buffer), doc, 0);
  std::ofstream outFile("MotifyLanguage.xml");
  outFile << buffer;
  outFile.close();
}

Result:

 <Root>
   <language name="Motify C">This is C language</language>
   <language name="C++">This is C++ language</language>
 </Root>

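From the second result:

The name attribute of the first language element was changed to the expected value, but the XML declaration and the comment have both disappeared. Why? This puzzled me for a while. Since the library is open source, we can step through it and see what it does. The code has two suspicious places, print and parse, and both take a flags argument. What do these flags do? The official tutorial passes 0 for both. Since print is what runs last, let's start debugging from print.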

Here is where print is called:

template<class OutIt, class Ch> 
   inline OutIt print(OutIt out, const xml_node<Ch> &node, int flags = 0)
   {
     return internal::print_node(out, &node, flags, 0);
   }

Stepping further:

// Print node
    template<class OutIt, class Ch>
    inline OutIt print_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
    {
      // Print proper node type
      switch (node->type())
      {

      // Document
      case node_document:
        out = print_children(out, node, flags, indent);
        break;

      // Element
      case node_element:
        out = print_element_node(out, node, flags, indent);
        break;
      
      // Data
      case node_data:
        out = print_data_node(out, node, flags, indent);
        break;
      
      // CDATA
      case node_cdata:
        out = print_cdata_node(out, node, flags, indent);
        break;

      // Declaration
      case node_declaration:
        out = print_declaration_node(out, node, flags, indent);
        break;

      // Comment
      case node_comment:
        out = print_comment_node(out, node, flags, indent);
        break;
      
      // Doctype
      case node_doctype:
        out = print_doctype_node(out, node, flags, indent);
        break;

      // Pi
      case node_pi:
        out = print_pi_node(out, node, flags, indent);
        break;

        // Unknown
      default:
        assert(0);
        break;
      }
      
      // If indenting not disabled, add line break after node
      if (!(flags & print_no_indenting))
        *out = Ch('\n'), ++out;

      // Return modified iterator
      return out;
    }

Stepping into print_children shows that this is actually a recursion. Let's keep going:

// Print element node
template<class OutIt, class Ch>
inline OutIt print_element_node(OutIt out, const xml_node<Ch> *node, int flags, int indent)
{
  assert(node->type() == node_element);

  // Print element name and attributes, if any
  if (!(flags & print_no_indenting))
  ... // part of the code omitted
  
  return out;
}

Line 8 of this snippet, the if statement, is a bitwise & test against print_no_indenting, which is defined as:

// Printing flags
const int print_no_indenting = 0x1;  //!< Printer flag instructing the printer to suppress indenting of XML. See print() function.

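That is enough to reason about the problem: the flags are bit masks, and if the library is consistent in style, parse should have flag definitions of the same kind.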

(The walkthrough of the parse flow is omitted.)

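I also checked the official documentation, and it matched my expectation. Below are the descriptions of these flags from the header file; see the official documentation for details.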

// Parsing flags

  //! Parse flag instructing the parser to not create data nodes. 
  //! Text of first data node will still be placed in value of parent element, unless rapidxml::parse_no_element_values flag is also specified.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_data_nodes = 0x1;      

  //! Parse flag instructing the parser to not use text of first data node as a value of parent element.
  //! Can be combined with other flags by use of | operator.
  //! Note that child data nodes of element node take precendence over its value when printing. 
  //! That is, if element has one or more child data nodes <em>and</em> a value, the value will be ignored.
  //! Use rapidxml::parse_no_data_nodes flag to prevent creation of data nodes if you want to manipulate data using values of elements.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_element_values = 0x2;
  
  //! Parse flag instructing the parser to not place zero terminators after strings in the source text.
  //! By default zero terminators are placed, modifying source text.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_string_terminators = 0x4;
  
  //! Parse flag instructing the parser to not translate entities in the source text.
  //! By default entities are translated, modifying source text.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_entity_translation = 0x8;
  
  //! Parse flag instructing the parser to disable UTF-8 handling and assume plain 8 bit characters.
  //! By default, UTF-8 handling is enabled.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_no_utf8 = 0x10;
  
  //! Parse flag instructing the parser to create XML declaration node.
  //! By default, declaration node is not created.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_declaration_node = 0x20;
  
  //! Parse flag instructing the parser to create comments nodes.
  //! By default, comment nodes are not created.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_comment_nodes = 0x40;
  
  //! Parse flag instructing the parser to create DOCTYPE node.
  //! By default, doctype node is not created.
  //! Although W3C specification allows at most one DOCTYPE node, RapidXml will silently accept documents with more than one.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_doctype_node = 0x80;
  
  //! Parse flag instructing the parser to create PI nodes.
  //! By default, PI nodes are not created.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_pi_nodes = 0x100;
  
  //! Parse flag instructing the parser to validate closing tag names. 
  //! If not set, name inside closing tag is irrelevant to the parser.
  //! By default, closing tags are not validated.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_validate_closing_tags = 0x200;
  
  //! Parse flag instructing the parser to trim all leading and trailing whitespace of data nodes.
  //! By default, whitespace is not trimmed. 
  //! This flag does not cause the parser to modify source text.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_trim_whitespace = 0x400;

  //! Parse flag instructing the parser to condense all whitespace runs of data nodes to a single space character.
  //! Trimming of leading and trailing whitespace of data is controlled by rapidxml::parse_trim_whitespace flag.
  //! By default, whitespace is not normalized. 
  //! If this flag is specified, source text will be modified.
  //! Can be combined with other flags by use of | operator.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_normalize_whitespace = 0x800;

  // Compound flags
  
  //! Parse flags which represent default behaviour of the parser. 
  //! This is always equal to 0, so that all other flags can be simply ored together.
  //! Normally there is no need to inconveniently disable flags by anding with their negated (~) values.
  //! This also means that meaning of each flag is a <i>negation</i> of the default setting. 
  //! For example, if flag name is rapidxml::parse_no_utf8, it means that utf-8 is <i>enabled</i> by default,
  //! and using the flag will disable it.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_default = 0;
  
  //! A combination of parse flags that forbids any modifications of the source text. 
  //! This also results in faster parsing. However, note that the following will occur:
  //! <ul>
  //! <li>names and values of nodes will not be zero terminated, you have to use xml_base::name_size() and xml_base::value_size() functions to determine where name and value ends</li>
  //! <li>entities will not be translated</li>
  //! <li>whitespace will not be normalized</li>
  //! </ul>
  //! See xml_document::parse() function.
  const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;
  
  //! A combination of parse flags resulting in fastest possible parsing, without sacrificing important data.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_fastest = parse_non_destructive | parse_no_data_nodes;
  
  //! A combination of parse flags resulting in largest amount of data being extracted. 
  //! This usually results in slowest parsing.
  //! <br><br>
  //! See xml_document::parse() function.
  const int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node | parse_pi_nodes | parse_validate_closing_tags;
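The flag list above explains the missing declaration and comment: both node types are only created when their bits are set, and the flag value 0 sets neither. The following is a minimal sketch of that check. To keep it compilable without RapidXml, the constants are copied from the header excerpt into a hypothetical namespace `flags_sketch`, and `keeps_declaration_and_comments` is a helper invented for illustration, not part of the library.

```cpp
#include <cassert>

// Values mirrored from the header excerpt above so this sketch compiles
// without RapidXml; real code should use the rapidxml:: constants instead.
namespace flags_sketch {

const int parse_no_string_terminators = 0x4;
const int parse_no_entity_translation = 0x8;
const int parse_declaration_node      = 0x20;
const int parse_comment_nodes         = 0x40;
const int parse_doctype_node          = 0x80;
const int parse_pi_nodes              = 0x100;
const int parse_validate_closing_tags = 0x200;

const int parse_default         = 0;
const int parse_non_destructive = parse_no_string_terminators | parse_no_entity_translation;
const int parse_full = parse_declaration_node | parse_comment_nodes | parse_doctype_node
                     | parse_pi_nodes | parse_validate_closing_tags;

// Hypothetical helper: true when the flag set asks the parser to create
// both the declaration node and the comment nodes.
inline bool keeps_declaration_and_comments(int flags)
{
    const int wanted = parse_declaration_node | parse_comment_nodes;
    return (flags & wanted) == wanted;
}

} // namespace flags_sketch
```

Under these assumptions, `keeps_declaration_and_comments(parse_default)` is false, while it is true for `parse_full` and for any combination that ORs in both `parse_declaration_node` and `parse_comment_nodes`.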

With this information, we can fix the earlier code.

Change

 doc.parse<0>(requestFile.data()); // parse the XML
 auto nodeRoot = doc.first_node(); // get the first node, i.e. the Root node

to

 doc.parse<rapidxml::parse_declaration_node | rapidxml::parse_comment_nodes | rapidxml::parse_non_destructive>(requestFile.data()); // parse the XML
 auto nodeRoot = doc.first_node("Root"); // look up the Root node by name

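To explain: parse now receives three flags. They tell the parser to create the declaration node, to create comment nodes, and not to modify the data passed in. As for the second line, once the XML declaration is parsed into a node, the default first_node() no longer returns the Root node we want (it returns the declaration node), so we look the node up by name instead.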
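The lookup rule in the explanation above can be illustrated with a toy model. This is not RapidXml code: `ToyNode`, `toy_children` and `toy_first_node` are hypothetical names, and the sketch only assumes that an unnamed lookup returns the first child of any type while a named lookup returns the first child whose name matches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the top-level children of a parsed document.
// Hypothetical types for illustration only; not part of RapidXml.
enum class ToyType { declaration, element };

struct ToyNode {
    ToyType type;
    std::string name;   // empty for the declaration node
};

// Builds the top-level children as they would look with or without
// the declaration node being created by the parser.
inline std::vector<ToyNode> toy_children(bool with_declaration)
{
    std::vector<ToyNode> children;
    if (with_declaration)
        children.push_back({ToyType::declaration, ""});
    children.push_back({ToyType::element, "Root"});
    return children;
}

// With no name, return the first child of any type;
// with a name, return the first child whose name matches.
inline const ToyNode* toy_first_node(const std::vector<ToyNode>& children,
                                     const char* name = nullptr)
{
    for (const ToyNode& child : children) {
        if (name == nullptr || child.name == name)
            return &child;
    }
    return nullptr;
}
```

In this model, `toy_first_node(toy_children(true))` yields the declaration node, whereas passing `"Root"` yields the element in both cases, which is why the named lookup is the safer choice once declaration nodes are enabled.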

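Notes:

1. When appending, this library does not check whether the item being added (node, attribute, etc.) already exists.

2. Modifying items (nodes, attributes, etc.) while looping over them invalidates the iteration.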
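For the second note, the usual remedy is to obtain the next position before (or as part of) removing the current item. The sketch below shows that pattern with `std::list` as a stand-in for a node's sibling chain; this is an assumption made for illustration, since RapidXml links nodes through sibling pointers rather than a standard container, and `remove_all_named` is a hypothetical helper.

```cpp
#include <cassert>
#include <list>
#include <string>

// Removes every entry equal to `name` in a single pass.
// erase() returns the iterator following the removed element, so the loop
// never advances through an iterator that has just been invalidated.
inline void remove_all_named(std::list<std::string>& siblings, const std::string& name)
{
    for (auto it = siblings.begin(); it != siblings.end(); ) {
        if (*it == name)
            it = siblings.erase(it);   // continue from the element after the erased one
        else
            ++it;                      // nothing removed, advance normally
    }
}
```

With RapidXml the analogous step would presumably be to read the next sibling of a node before removing that node from its parent, then continue the loop from the saved pointer.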

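Summary: a library written by someone else will always have a few surprises. These are the only problems I have run into so far; if you know of others, feel free to add them. Also, a "pitfall" does not necessarily mean the open-source library is at fault. More often it means we have not yet learned to use the tool well.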

Thanks to the author of RapidXml for providing such an efficient and convenient tool.