
Regular Expressions and the Java Programming Language


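Applications frequently need text processing features such as word searching, email validation, or XML document integration. These tasks usually involve pattern matching. Languages such as Perl, sed, and awk improve pattern matching through regular expressions: strings of characters that define a pattern used to search for matching text. Previously, pattern matching in the Java programming language required the StringTokenizer class together with many charAt and substring calls to read through characters or tokens and process the text. This often led to complex or messy code.

Not anymore.

The Java 2 Platform, Standard Edition (J2SE), version 1.4 includes a new package named java.util.regex that makes it possible to use regular expressions. Its functionality includes metacharacters, which give regular expressions great flexibility.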

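This article gives an overview of regular expressions and explains in detail how to use them with the java.util.regex package, using the following common scenarios as examples:

Simple word replacement

Email validation

Removing control characters from a file

File searching

To compile the code in these examples and to use regular expressions in your applications, you need to install J2SE version 1.4 or later.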

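Constructing Regular Expressions

A regular expression is a pattern of characters that describes a set of strings. You can use the java.util.regex package to find, display, or modify some or all of the occurrences of a pattern in an input sequence.

The simplest form of a regular expression is a literal string, such as "Java" or "programming". Regular expression matching also lets you test whether a string fits a specific syntactic form, such as that of an email address.

To write regular expressions, you use both normal characters and special characters. The special characters include:

\ $ ^ . * + ? [ ]

Any other character that appears in a regular expression is a normal character, unless it is preceded by a \. A backslash in front of a special character makes that character literal, so \. matches a period.

Special characters serve a special purpose. For example, . matches any character except a line terminator such as a newline (unless the DOTALL flag is set). A regular expression such as s.n matches any three-character string that begins with s and ends with n, including sun and son.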
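
The behavior of the dot can be checked with a few lines of code. The following is a minimal sketch, not part of the original article: the class name DotDemo and its helper methods are hypothetical, and it assumes a Java version with the enhanced for loop (Java 5 or later). Pattern.matches tests the entire input against the expression.

```java
import java.util.regex.Pattern;

class DotDemo {
    // True when the ENTIRE input matches the regular expression s.n
    static boolean matchesSDotN(String input) {
        return Pattern.matches("s.n", input);
    }

    // True when the entire input is the literal text "s.n";
    // the backslash turns the dot into an ordinary character
    static boolean matchesLiteralDot(String input) {
        return Pattern.matches("s\\.n", input);
    }

    public static void main(String[] args) {
        String[] inputs = {"sun", "son", "s.n", "soon"};
        for (String input : inputs) {
            System.out.println(input + ": s.n=" + matchesSDotN(input)
                    + ", s\\.n=" + matchesLiteralDot(input));
        }
    }
}
```

Note that the backslash has to be doubled inside a Java string literal, because the compiler consumes one level of escaping before the regular expression engine sees the text.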

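Regular expressions have many special characters. They let you find words at the beginning of a line, find words with or without regard to case, and specify a range: for example, a-e inside brackets means any letter from a to e.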
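
As a rough illustration of these three features, the sketch below uses the ^ anchor with the MULTILINE flag, the CASE_INSENSITIVE flag, and the range [a-e]. The class name AnchorsAndRanges, its constants, and the sample inputs are assumptions made for this example only.

```java
import java.util.regex.Pattern;

class AnchorsAndRanges {
    // ^ anchors the match at the start of a line; MULTILINE makes
    // "line" mean every line of the input, not only the first one
    static final Pattern LINE_START =
            Pattern.compile("^Java", Pattern.MULTILINE);

    // Matches java, Java, JAVA, and so on
    static final Pattern ANY_CASE =
            Pattern.compile("java", Pattern.CASE_INSENSITIVE);

    // Matches any single lowercase letter from a to e
    static final Pattern A_TO_E = Pattern.compile("[a-e]");

    // True when the pattern matches somewhere in the input
    static boolean found(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(found(LINE_START, "I like\nJava"));
        System.out.println(found(LINE_START, "I like Java"));
        System.out.println(found(ANY_CASE, "JAVA rocks"));
        System.out.println(found(A_TO_E, "xyz"));
    }
}
```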

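Regular expression usage with this new package is Perl-like, so if you are familiar with regular expressions in Perl, you can use the same expression syntax in the Java programming language. If you are not familiar with regular expressions, here are a few constructs to get you started: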

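Construct          Matches

Characters
x                  The character x
\\                 The backslash character
\0n                The character with octal value 0n (0 <= n <= 7)
\0nn               The character with octal value 0nn (0 <= n <= 7)
\0mnn              The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh               The character with hexadecimal value 0xhh
\uhhhh             The character with hexadecimal value 0xhhhh
\t                 The tab character ('\u0009')
\n                 The newline (line feed) character ('\u000A')
\r                 The carriage-return character ('\u000D')
\f                 The form-feed character ('\u000C')
\a                 The alert (bell) character ('\u0007')
\e                 The escape character ('\u001B')
\cx                The control character corresponding to x

Character classes
[abc]              a, b, or c (simple class)
[^abc]             Any character except a, b, or c (negation)
[a-zA-Z]           a through z or A through Z, inclusive (range)
[a-z&&[^bc]]       a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]      a through z, except for m through p: [a-lq-z] (subtraction)
[a-z&&[def]]       d, e, or f (intersection)

Predefined character classes
.                  Any character (may or may not match line terminators)
\d                 A digit: [0-9]
\D                 A non-digit: [^0-9]
\s                 A whitespace character: [ \t\n\x0B\f\r]
\S                 A non-whitespace character: [^\s]
\w                 A word character: [a-zA-Z_0-9]
\W                 A non-word character: [^\w]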
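
To see how a few of these constructs behave, the sketch below collects every character of an input that a given single-character class matches. It is only an illustration: the class name CharClassDemo and the helper keep are hypothetical, StringBuilder assumes Java 5 or later, and the subtraction and intersection examples use the && operator accepted by java.util.regex.Pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // Collects, in order, every character of input that is matched
    // by the single-character pattern charClass
    static String keep(String charClass, String input) {
        Matcher m = Pattern.compile(charClass).matcher(input);
        StringBuilder kept = new StringBuilder();
        while (m.find()) {
            kept.append(m.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(keep("[^abc]", "cabbage"));        // negation
        System.out.println(keep("[a-z&&[^bc]]", "abcdef"));   // subtraction
        System.out.println(keep("[a-z&&[def]]", "abcdefg"));  // intersection
        System.out.println(keep("\\d", "a1b22"));             // digits only
    }
}
```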

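For further details and examples, see the documentation for the Pattern class.

Classes and Methods

The following classes match character sequences against patterns specified by regular expressions.

Pattern Class

An instance of the Pattern class represents a regular expression that is specified in string form, in a syntax similar to that used by Perl.

A regular expression specified as a string must first be compiled into an instance of the Pattern class. The resulting pattern is used to create a Matcher object that matches arbitrary character sequences against the regular expression. Many matchers can share the same pattern, because a compiled pattern is immutable and holds no matching state.

The compile method compiles the given regular expression into a pattern, and the matcher method then creates a matcher that will match the given input against that pattern. The pattern method returns the regular expression from which the pattern was compiled.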
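
The three methods fit together roughly as follows. This is a minimal sketch under assumed names (CompileDemo, WORD, countWords): the pattern is compiled once and stored in a constant, and each call creates its own matcher from it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompileDemo {
    // Compiled once; safe to reuse because the pattern itself
    // keeps no state about any particular match
    static final Pattern WORD = Pattern.compile("\\w+");

    // Counts the runs of word characters in the input
    static int countWords(CharSequence input) {
        Matcher m = WORD.matcher(input);  // a new matcher per input
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // pattern() returns the source text of the expression
        System.out.println("Pattern: " + WORD.pattern());
        System.out.println(countWords("one two  three"));
        System.out.println(countWords("regular expressions in Java"));
    }
}
```

Compiling once and reusing the Pattern avoids repeating the compilation cost for every input, which is the usual reason for keeping patterns in constants.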

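The split method is a convenience method that splits the given input sequence around matches of this pattern. The following example demonstrates its use: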

/*
 * Uses split to break up an input string whose parts are
 * separated by commas and/or whitespace.
 */
import java.util.regex.*;

public class Splitter {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match breaks
        Pattern p = Pattern.compile("[,\\s]+");
        // Split input with the pattern
        String[] result = p.split("one,two, three four , five");
        for (int i = 0; i < result.length; i++)
            System.out.println(result[i]);
    }
}

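Matcher Class

Instances of the Matcher class match character sequences against a given pattern. Input is provided to matchers through the CharSequence interface, which supports matching against characters from a wide variety of input sources.

A matcher is created from a pattern by invoking the pattern's matcher method. Once created, a matcher can be used to perform three different kinds of match operations:

The matches method attempts to match the entire input sequence against the pattern.

The lookingAt method attempts to match the input sequence against the pattern, starting at the beginning.

The find method scans the input sequence looking for the next subsequence that matches the pattern.

Each of these methods returns a boolean indicating success or failure. More information about a successful match can be obtained by querying the state of the matcher.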
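
The difference between the three operations, and the state that can be queried afterwards, can be sketched as follows. The class name MatchKinds and the method describe are hypothetical. The matcher is reset before calling find, because find otherwise resumes after the previous successful match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchKinds {
    // Runs all three match operations and reports their results,
    // plus the position and text of the match located by find()
    static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();     // entire input must match
        boolean prefix = m.lookingAt();  // match must start at index 0
        // Reset so that find() scans from the start of the input
        m.reset();
        boolean anywhere = m.find();     // match may start anywhere
        String result = "matches=" + whole
                + " lookingAt=" + prefix
                + " find=" + anywhere;
        if (anywhere) {
            // Query the matcher's state for details of the match
            result += " start=" + m.start()
                    + " end=" + m.end()
                    + " group=" + m.group();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("\\d+", "123"));
        System.out.println(describe("\\d+", "123abc"));
        System.out.println(describe("\\d+", "abc123"));
    }
}
```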

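This class also defines methods for replacing matched subsequences with new strings whose contents can, if desired, be computed from the match result.

The appendReplacement method first appends all characters of the input between the current position and the next match, and then appends the replacement value. The appendTail method appends the portion of the input that follows the last match, through the end of the input.

For instance, in the string blahcatblahcatblah, with cat as the pattern and dog as the replacement, the first appendReplacement appends blahdog. The second appendReplacement appends blahdog, and appendTail then appends blah, resulting in blahdogblahdogblah. See the Simple Word Replacement example.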
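
A replacement whose contents are computed from the match result might look like the sketch below, which doubles every number in a string. The class name ComputedReplacement and the method doubleNumbers are hypothetical, and the code assumes each number fits in an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComputedReplacement {
    // Replaces each run of digits with twice its numeric value.
    // Assumption: every number in the input fits in an int.
    static String doubleNumbers(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Compute the replacement from the text just matched
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text before the match, then the replacement
            m.appendReplacement(sb, String.valueOf(doubled));
        }
        // Appends whatever follows the last match
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleNumbers("2 cats and 10 dogs"));
    }
}
```

The replacement string here contains only digits. If it could contain $ or \, those characters would be interpreted as group references or escapes by appendReplacement, so they would need to be escaped first.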

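CharSequence Interface

The CharSequence interface provides uniform, read-only access to many different kinds of character sequences. You supply the data to be searched from different sources. String, StringBuffer, and CharBuffer implement CharSequence, so they are easy sources of data to search through. If none of the available sources suits you, you can write your own input source by implementing the CharSequence interface.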
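
Writing your own input source can be fairly short. The sketch below wraps a char array without copying it; the names CharSequenceDemo, CharArraySequence, and firstMatch are hypothetical and chosen only for this illustration. A correct toString matters, because the matcher converts matched subsequences to strings when group is called.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharSequenceDemo {
    // A read-only view over part of a char array (no copying)
    static final class CharArraySequence implements CharSequence {
        private final char[] data;
        private final int offset;
        private final int length;

        CharArraySequence(char[] data) {
            this(data, 0, data.length);
        }

        private CharArraySequence(char[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        public int length() {
            return length;
        }

        public char charAt(int index) {
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            return data[offset + index];
        }

        public CharSequence subSequence(int start, int end) {
            if (start < 0 || end > length || start > end) {
                throw new IndexOutOfBoundsException(start + ".." + end);
            }
            return new CharArraySequence(data, offset + start, end - start);
        }

        public String toString() {
            return new String(data, offset, length);
        }
    }

    // Returns the first match of regex in input, or null if none
    static String firstMatch(String regex, CharSequence input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        char[] raw = "order 66 shipped".toCharArray();
        System.out.println(firstMatch("\\d+", new CharArraySequence(raw)));
    }
}
```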

Example Regex Scenarios

The following code samples demonstrate the use of the java.util.regex package in various common scenarios:

Simple Word Replacement

/*
 * This code writes "one dog, two dogs in the yard"
 * to the standard-output stream:
 */
import java.util.regex.*;

public class Replacement {
    public static void main(String[] args)
            throws Exception {
        // Create a pattern to match cat
        Pattern p = Pattern.compile("cat");
        // Create a matcher with an input string
        Matcher m = p.matcher("one cat," +
                " two cats in the yard");
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        // Loop through and create a new String
        // with the replacements
        while (result) {
            m.appendReplacement(sb, "dog");
            result = m.find();
        }
        // Add the last segment of input to
        // the new String
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}

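Email Validation

The following code is an example of how you might check whether some characters form an email address. It is not a complete email validation program that covers every possible case, but it can be extended as needed.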

/*
 * Checks for invalid characters
 * in email addresses
 */
import java.util.regex.*;

public class EmailValidation {
    public static void main(String[] args)
            throws Exception {
        String input = "@sun.com";
        // Checks for email addresses starting with
        // inappropriate symbols like dots or @ signs.
        Pattern p = Pattern.compile("^\\.|^\\@");
        Matcher m = p.matcher(input);
        if (m.find())
            System.err.println("Email addresses don't start" +
                    " with dots or @ signs.");
        // Checks for email addresses that start with
        // www. and prints a message if it does.
        p = Pattern.compile("^www\\.");
        m = p.matcher(input);
        if (m.find()) {
            System.out.println("Email addresses don't start" +
                    " with \"www.\", only web pages do.");
        }
        // Matches any run of characters that are not allowed
        p = Pattern.compile("[^A-Za-z0-9\\.\\@_\\-~#]+");
        m = p.matcher(input);
        StringBuffer sb = new StringBuffer();
        boolean result = m.find();
        boolean deletedIllegalChars = false;
        while (result) {
            deletedIllegalChars = true;
            m.appendReplacement(sb, "");
            result = m.find();
        }
        // Add the last segment of input to the new String
        m.appendTail(sb);
        input = sb.toString();
        if (deletedIllegalChars) {
            System.out.println("It contained incorrect characters," +
                    " such as spaces or commas.");
        }
    }
}

Removing Control Characters from a File

/*
 * This class removes control characters from a named
 * file.
 */
import java.util.regex.*;
import java.io.*;

public class Control {
    public static void main(String[] args)
            throws Exception {
        // Create file objects for the input and output
        // files (placeholder names):
        File fin = new File("fileName1");
        File fout = new File("fileName2");
        // Open an input and an output stream
        FileInputStream fis =
                new FileInputStream(fin);
        FileOutputStream fos =
                new FileOutputStream(fout);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(fis));
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(fos));
        // The pattern matches control characters
        Pattern p = Pattern.compile("\\p{Cntrl}");
        Matcher m = p.matcher("");
        String aLine = null;
        while ((aLine = in.readLine()) != null) {
            m.reset(aLine);
            // Replaces control characters with an empty
            // string.
            String result = m.replaceAll("");
            out.write(result);
            out.newLine();
        }
        in.close();
        out.close();
    }
}

File Searching

/*
 * Prints out the comments found in a .java file.
 */
import java.util.regex.*;
import java.io.*;
import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;

public class CharBufferExample {
    public static void main(String[] args) throws Exception {
        // Create a pattern to match single-line comments
        Pattern p = Pattern.compile("//.*$", Pattern.MULTILINE);

        // Get a Channel for the source file
        File f = new File("Replacement.java");
        FileInputStream fis = new FileInputStream(f);
        FileChannel fc = fis.getChannel();

        // Get a CharBuffer from the source file
        ByteBuffer bb =
                fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        Charset cs = Charset.forName("ISO-8859-1");
        CharsetDecoder cd = cs.newDecoder();
        CharBuffer cb = cd.decode(bb);

        // Run some matches
        Matcher m = p.matcher(cb);
        while (m.find())
            System.out.println("Found comment: " + m.group());

        fis.close();
    }
}