
A Small C# Program for Scraping Search Results from Baidu

Editor: C# Introductory Knowledge (C#入門知識)

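Baidu's pages are not XHTML, so the XML facilities built into .NET are of little use here.

(Besides, who actually enjoys working with the DOM? It's such a chore!)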
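To see why the XML route fails, here is a minimal sketch. `MarkupCheck` and `IsWellFormedXml` are hypothetical names, not part of the original program. The code simply hands a fragment to `XDocument.Parse` and reports whether the XML parser accepts it. Ordinary HTML habits, such as an unclosed `<br>` or the `&nbsp;` entity, are enough to make the parse fail.

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper (not from the original program): checks whether a
// markup fragment is well-formed enough for .NET's XML APIs to load it.
public static class MarkupCheck
{
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            // XDocument.Parse throws XmlException on markup that is not well-formed XML.
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            // Typical HTML breaks here: unclosed tags such as <br>,
            // or HTML-only entities such as &nbsp;.
            return false;
        }
    }
}
```

With this check, `<p>a<br/>b</p>` passes, while `<p>a<br>b</p>` and `<p>a&nbsp;b</p>` are both rejected. A real HTML results page is full of constructs like these.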

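On top of that, Baidu's pages are quite irregular, so I had no choice but to hard-code a great deal.

As a result, the program makes quite a few assumptions about Baidu's page design and will not cope well if that page structure changes in the future.

Fortunately, a small program like this is easy to write, so adjusting it now and then is no big deal.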

The program also makes heavy use of regular expressions, which may make it too inefficient for aggregating results from several search engines.
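As an illustration of this hard-coded, regex-based style, here is a minimal sketch that fills `BaiduEntry` values from a results page. These values have the same shape as the struct in the program below. The markup the sketch matches is an assumed, simplified layout used for illustration only, and is not necessarily what Baidu actually serves: `<h3 class="t"><a href="...">...</a></h3>` followed by `<div class="c-abstract">...</div>`. `ResultScraper` and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Same shape as the article's BaiduEntry struct.
public struct BaiduEntry
{
    public string title, brief, link;
}

// Hypothetical scraper for an ASSUMED, simplified result layout:
//   <h3 class="t"><a href="LINK" ...>TITLE</a></h3> ... <div class="c-abstract">BRIEF</div>
// The layout is hard-coded into the pattern; any markup change breaks it.
public static class ResultScraper
{
    static readonly Regex EntryPattern = new Regex(
        @"<h3 class=""t""><a href=""(?<link>[^""]*)""[^>]*>(?<title>.*?)</a></h3>.*?<div class=""c-abstract"">(?<brief>.*?)</div>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Matches any tag, e.g. the <em> used to highlight the keyword.
    static readonly Regex TagPattern = new Regex("<[^>]+>", RegexOptions.Compiled);

    // Removes inner tags, then decodes entities such as &amp;.
    static string StripTags(string html) =>
        WebUtility.HtmlDecode(TagPattern.Replace(html, string.Empty)).Trim();

    public static List<BaiduEntry> Parse(string html)
    {
        var entries = new List<BaiduEntry>();
        foreach (Match m in EntryPattern.Matches(html))
        {
            entries.Add(new BaiduEntry
            {
                title = StripTags(m.Groups["title"].Value),
                brief = StripTags(m.Groups["brief"].Value),
                link = WebUtility.HtmlDecode(m.Groups["link"].Value)
            });
        }
        return entries;
    }
}
```

Every assumption about the page is baked into the pattern, including class names, tag order and attribute quoting. If the layout changes, `Parse` silently returns fewer entries or none at all. The lazy `.*?` between a title and its brief can also pair a title with the next result's brief when one result has no abstract. This is the kind of fragility described above.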

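If you need to show results from several search engines on a single page, I suggest using iframe tags. Alternatively, have the back end send the pages to the front end via Ajax and generate the page with JavaScript in the browser.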

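Note in particular: the program uses the FCL's handy URL-encoding function (`HttpUtility.UrlEncode`), so you must add a reference to the System.Web assembly.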
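Here is a minimal sketch of just the query-building step. It assumes .NET Core 3.0 or later, where `HttpUtility` and `CodePagesEncodingProvider` ship with the runtime. On .NET Framework you would add the System.Web reference as described above; code page 936 is available there without registration, and the registration line would need to be dropped. `BaiduQuery`, `EncodeKeyword` and `BuildSearchUrl` are hypothetical names. The `http://www.baidu.com/s?wd=` URL shape is taken from the program below.

```csharp
using System.Text;
using System.Web;

public static class BaiduQuery
{
    // Code page 936 (GBK). Assumes .NET Core 3.0+ / .NET 5+, where the
    // code-pages provider must be registered before GetEncoding(936) works.
    static readonly Encoding Gbk = CreateGbk();

    static Encoding CreateGbk()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(936);
    }

    // Percent-encodes the keyword's GBK bytes; spaces become '+'.
    public static string EncodeKeyword(string keyword) =>
        HttpUtility.UrlEncode(keyword, Gbk);

    // Same URL shape as the article's GetHtml: base URL + "s?wd=" + keyword.
    public static string BuildSearchUrl(string keyword) =>
        "http://www.baidu.com/s?wd=" + EncodeKeyword(keyword);
}
```

Here `BuildSearchUrl("中文")` puts the GBK bytes D6 D0 CE C4 into the `wd` parameter. Encoding the same keyword with `Encoding.UTF8` would instead produce the bytes E4 B8 AD E6 96 87. This difference is what the comments in `GetHtml` are getting at.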

Code: Baidu Robot

using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;
using System.Net;
using System.IO;
namespace baiduRobotStrim
{
    struct BaiduEntry
    {
        public string title, brief, link;
    }
    class Program
    {
        static string GetHtml(string keyword)
        {
            string url = @"http://www.baidu.com/";
            string encodedKeyword = HttpUtility.UrlEncode(keyword, Encoding.GetEncoding(936));
            // Baidu expects the query string in code page 936 (GBK). It really is focused on Chinese search...
            // Needless to say, it also seems very fond of Microsoft.
            // Google correctly recognizes both UTF-8 and the code page, although its own pages declare UTF-8 in the HTTP header.
            // Presumably Google doesn't dislike Microsoft (or Microsoft's proprietary specifications) either.
            string query = "s?wd=" + encodedKeyword;

            HttpWebRequest req;
            HttpWebResponse response;
            Stream stream;
            req = (HttpWebRequest)WebRequest.Create(url + query);
            response = (HttpWebResponse)req.GetResponse();
            stream = response.GetResponseStream();
            int count = 0