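Thanks! I'm new to this and don't know how to scrape the page. This comes from my teacher's paper, which says 11 regular expressions were used for the extraction.
Any help would be appreciated QAQ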
Try jsoup:
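import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class CopyrightScraper {
    public static void main(String[] args) throws IOException {
        // Fetch the page and parse it into a DOM.
        Document doc = Jsoup.connect("http://www.ccopyright.com.cn/cpcc/index.jsp").get();
        // Select the elements whose own text contains "版權所有" (the copyright notice).
        Elements es = doc.getElementsContainingOwnText("版權所有");
        // Print the matched elements' inner HTML with any remaining tags stripped.
        System.out.println(es.html().replaceAll("<([^>]*)>", ""));
    }
}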
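
Since the paper mentions regular expressions, here is a rough sketch of the same idea using only java.util.regex, in case jsoup is not an option. The class name CopyrightRegexSketch, the helpers findTextContaining and stripTags, and the sample HTML are all made up for illustration. The paper's 11 patterns are not given in this thread, so this is not a reconstruction of them, only one plausible shape such a pattern could take.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the paper.
class CopyrightRegexSketch {
    // Same tag-stripping idea as the replaceAll call in the jsoup snippet.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns every run of characters that contains the keyword and does not
    // cross a tag boundary or a line break. Caveat: this is plain text
    // matching, so a keyword inside an attribute value would also match.
    static List<String> findTextContaining(String html, String keyword) {
        Pattern p = Pattern.compile(
                "[^<>\\r\\n]*" + Pattern.quote(keyword) + "[^<>\\r\\n]*");
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(html);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    // Removes anything that looks like an HTML tag.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The phrase searched for in the jsoup snippet, as Unicode escapes.
        String keyword = "\u7248\u6B0A\u6240\u6709";
        // Made-up sample markup standing in for the downloaded page.
        String html = "<div><p>Home</p><p class=\"foot\">"
                + keyword + " 2015 <b>Example</b></p></div>";
        System.out.println(findTextContaining(html, keyword));
        System.out.println(stripTags(html));
    }
}
```

Regular expressions are fragile on real-world HTML (comments, scripts, attributes), so the jsoup version is probably the safer starting point; the regex version is mainly useful for comparing against what the paper describes.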