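When crawling with Java, multithreading is hard to avoid. We need to crawl many websites at the same time, and working through them one at a time with a single thread is far too slow. A single thread crawling one website also spends most of its time waiting on the network, which leaves the CPU underused. Once you use multiple threads, you also have to control them, or use a thread pool. While building our current crawling framework, I used java.util.concurrent.ExecutorService as the thread pool. The code looked roughly like this: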
The java.util.concurrent.Executors class provides many static factory methods for creating thread pools:
1. Fixed-size thread pool:
    package BackStage;

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class JavaThreadPool {
        public static void main(String[] args) {
            // Create a thread pool that reuses a fixed number (2) of threads
            ExecutorService pool = Executors.newFixedThreadPool(2);
            // Create objects that implement Runnable; Thread itself implements Runnable.
            // The pool only calls their run() method; their own start() is never called.
            Thread t1 = new MyThread();
            Thread t2 = new MyThread();
            Thread t3 = new MyThread();
            Thread t4 = new MyThread();
            Thread t5 = new MyThread();
            // Hand the tasks to the pool for execution
            pool.execute(t1);
            pool.execute(t2);
            pool.execute(t3);
            pool.execute(t4);
            pool.execute(t5);
            // Shut down the pool: no new tasks are accepted, queued tasks still run
            pool.shutdown();
        }
    }

    class MyThread extends Thread {
        @Override
        public void run() {
            // Prints the name of the pool's worker thread (e.g. pool-1-thread-1), not this object's name
            System.out.println(Thread.currentThread().getName() + " is running...");
        }
    }
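Besides newFixedThreadPool, Executors also offers factories such as newSingleThreadExecutor and newCachedThreadPool. The sketch below is only an illustration and not code from the crawling framework: the class PoolFactories and its helper runAndCollectWorkers are hypothetical names. It submits plain Runnable lambdas (no Thread subclass needed), waits with awaitTermination after shutdown, and records which worker threads actually ran the tasks. With a fixed pool of 2, all five tasks are spread over just two reused threads.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolFactories {
    // Hypothetical helper: submits `tasks` small tasks to `pool`, waits for them,
    // and returns the distinct names of the worker threads that ran them.
    static Set<String> runAndCollectWorkers(ExecutorService pool, int tasks) {
        Set<String> workers = Collections.synchronizedSet(new TreeSet<>());
        for (int i = 0; i < tasks; i++) {
            // A plain Runnable is enough; no Thread objects are created by us
            pool.execute(() -> workers.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // stop accepting tasks; already submitted ones still run
        try {
            // Block until every submitted task has finished (or the timeout expires)
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return workers;
    }

    public static void main(String[] args) {
        // At most 2 worker threads, reused for all 5 tasks
        System.out.println(runAndCollectWorkers(Executors.newFixedThreadPool(2), 5));
        // Exactly one worker thread; tasks run one after another
        System.out.println(runAndCollectWorkers(Executors.newSingleThreadExecutor(), 5));
        // Creates threads on demand and reuses idle ones
        System.out.println(runAndCollectWorkers(Executors.newCachedThreadPool(), 5));
    }
}
```

Waiting with awaitTermination matters because shutdown() alone returns immediately; without it, the caller cannot tell when the submitted work is done.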
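Later I found that ExecutorService was not as capable as I had expected: it is little more than a container for threads. So I switched to java.lang.ThreadGroup. ThreadGroup has several advantages, the most important being that you can iterate over its threads and find out which have finished and which are still running. Here is how ThreadGroup is used: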
    class MyThread extends Thread {
        boolean stopped;

        MyThread(ThreadGroup tg, String name) {
            super(tg, name); // register this thread in the given group
            stopped = false;
        }

        @Override
        public void run() {
            System.out.println(Thread.currentThread().getName() + " starting.");
            try {
                for (int i = 1; i < 1000; i++) {
                    System.out.print(".");
                    Thread.sleep(250);
                    synchronized (this) {
                        if (stopped) break; // stop cooperatively when myStop() was called
                    }
                }
            } catch (InterruptedException exc) {
                // tg.interrupt() wakes the thread out of sleep()
                System.out.println(Thread.currentThread().getName() + " interrupted.");
            }
            System.out.println(Thread.currentThread().getName() + " exiting.");
        }

        synchronized void myStop() {
            stopped = true;
        }
    }

    public class Main {
        public static void main(String[] args) throws Exception {
            ThreadGroup tg = new ThreadGroup("My Group");
            MyThread thrd = new MyThread(tg, "MyThread #1");
            MyThread thrd2 = new MyThread(tg, "MyThread #2");
            MyThread thrd3 = new MyThread(tg, "MyThread #3");
            thrd.start();
            thrd2.start();
            thrd3.start();
            Thread.sleep(1000);
            System.out.println(tg.activeCount() + " threads in thread group.");
            // activeCount() is only an estimate, so read just the n entries enumerate() filled
            Thread[] thrds = new Thread[tg.activeCount()];
            int n = tg.enumerate(thrds);
            for (int i = 0; i < n; i++)
                System.out.println(thrds[i].getName());
            thrd.myStop(); // ask thread #1 to stop
            Thread.sleep(1000);
            System.out.println(tg.activeCount() + " threads in tg.");
            tg.interrupt(); // interrupt every remaining thread in the group
        }
    }
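One detail worth knowing: enumerate() copies only threads that are still alive, so a finished thread simply disappears from the group. A minimal sketch of telling finished threads from running ones, assuming you keep your own list of the threads you started (the class GroupStatus, its methods, and the "quick"/"slow" threads are hypothetical):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CountDownLatch;

class GroupStatus {
    // Live threads of the group. activeCount() is only an estimate, so the
    // array gets some slack and only the n entries enumerate() filled are read.
    static Set<Thread> liveThreads(ThreadGroup tg) {
        Thread[] buf = new Thread[tg.activeCount() + 16];
        int n = tg.enumerate(buf);
        return new HashSet<>(Arrays.asList(buf).subList(0, n));
    }

    // Names of the started threads that are still live (wantLive = true)
    // or that have already finished (wantLive = false), in start order.
    static List<String> names(List<Thread> started, ThreadGroup tg, boolean wantLive) {
        Set<Thread> live = liveThreads(tg);
        List<String> out = new ArrayList<>();
        for (Thread t : started) {
            if (live.contains(t) == wantLive) {
                out.add(t.getName());
            }
        }
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadGroup tg = new ThreadGroup("crawlers");
        CountDownLatch release = new CountDownLatch(1);
        Thread quick = new Thread(tg, () -> { }, "quick"); // returns immediately
        Thread slow = new Thread(tg, () -> {               // blocks until released
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "slow");
        quick.start();
        slow.start();
        quick.join();
        List<Thread> started = Arrays.asList(quick, slow);
        System.out.println("running:  " + names(started, tg, true));  // [slow]
        System.out.println("finished: " + names(started, tg, false)); // [quick]
        release.countDown();
        slow.join();
        System.out.println("finished: " + names(started, tg, false)); // [quick, slow]
    }
}
```

Comparing Thread objects rather than names keeps the result correct even if two threads happen to share a name.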
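As the code above shows, ThreadGroup has the following advantages over ExecutorService:
1. ThreadGroup can enumerate its threads, so you can see which ones are still running and, by comparison with the threads you started, which have finished.
2. ThreadGroup.activeCount() gives an estimate of how many threads are still running, so you can control how many more threads to add.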
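Point 2 can be sketched as a throttle that starts a new crawler thread only while the group has fewer than `limit` live threads. Everything here is an assumption for illustration: crawl() just sleeps for 50 ms instead of fetching a page, the URLs are placeholders, and polling activeCount() every 10 ms is a deliberate simplification (java.util.concurrent.Semaphore would avoid the polling). The running/peak counters exist only to check the limit.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledCrawler {
    // Instrumentation only: crawl() calls in progress, and the maximum seen
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Hypothetical stand-in for crawling one site: it just sleeps for 50 ms
    static void crawl(String url) {
        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            running.decrementAndGet();
        }
    }

    // Starts one thread per URL in a ThreadGroup, waiting while the group already
    // has `limit` live threads. Returns the peak number of concurrent crawls.
    static int crawlAll(String[] urls, int limit) {
        ThreadGroup tg = new ThreadGroup("crawlers");
        peak.set(0);
        try {
            for (String url : urls) {
                while (tg.activeCount() >= limit) {
                    Thread.sleep(10); // poll until a slot frees up
                }
                new Thread(tg, () -> crawl(url), "crawl-" + url).start();
            }
            while (tg.activeCount() > 0) {
                Thread.sleep(10); // wait for the last threads to finish
            }
        } catch (InterruptedException e) {
            tg.interrupt(); // pass the interruption on to the crawler threads
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        String[] urls = {"site1", "site2", "site3", "site4", "site5", "site6", "site7"};
        System.out.println("peak concurrent crawls: " + crawlAll(urls, 3));
    }
}
```

Since activeCount() is documented as an estimate, treat the limit as a soft bound in general; the peak counter is a cheap way to confirm it holds in practice.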
Source: cnblogs (博客園), http://www.cnblogs.com/yy2011/archive/2011/05/05/2037564.html