Regular Expressions: An Introduction and Cheat Sheet

Editor: C#入門知識

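Overview

A regular expression uses symbols to describe a class of text (a pattern). The regular expression engine is responsible for finding text that fits this pattern in a given string.

This article lists the commonly used regular expression symbols and groups them by category. It is only a quick tour of the metacharacters, written as a cheat sheet to refer back to when reading more complex expressions. Further notes on regular expressions will be added to this article over time. The examples are written in C#.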

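Overview

Literal characters

Character classes

Shorthand character classes

Quantifiers

Position anchors

Alternation

Matching special characters

Groups, backreferences, and non-capturing groups

Greedy and lazy matching

Backtracking and non-backtracking

Lookahead and lookbehind

Final notes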

1 Literal characters

The simplest way to describe text is to give the content to match directly. For example, to find pattern in "Generic specialization, the decorator pattern, chains of responsibilities, and extensible software.", the regular expression is simply "pattern".

    using System;
    using System.Text.RegularExpressions;

    string input = "Generic specialization, the decorator pattern, chains of responsibilities, and extensible software.";
    Regex reg = new Regex("pattern", RegexOptions.IgnoreCase);
    Console.WriteLine(reg.Matches(input).Count); // output 1

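2 Character classes

Characters placed inside square brackets form a character class. A character class tells the regex engine to match exactly one character, chosen from the characters in the class.

    Character   Matches                                      Example
    [...]       any one character inside the brackets        [abc] matches a single a, b, or c, but no other character
    [^...]      any one character not inside the brackets    [^abc] matches any one character other than a, b, and c, such as d, e, or f

Take the word for the color gray (American spelling) or grey (British spelling). To match either spelling in a piece of text, the regex gr[ae]y is enough. Likewise, to find both me and my in a piece of text, the regex is m[ey].

Inside a character class, a hyphen (-) denotes a range. For example, [0-9] matches one digit from 0 to 9, [a-zA-Z] matches one English letter, and [0-9a-zA-Z] matches one digit or English letter. Note that a space written inside the brackets counts as a member of the class, so [0-9 a-zA-Z] would also match a space.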

    string input = "The color of shirt is gray and color of shoes is grey too.";
    Regex reg = new Regex("gr[ae]y", RegexOptions.IgnoreCase);
    Console.WriteLine(reg.Matches(input).Count); // output 2

    var matchs = reg.Matches(input);
    foreach (Match match in matchs)
    {
        Console.WriteLine(match.Value); // output gray grey
    }
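The range and negation syntax described above can be tried out with a short sketch like the following. The class name CharClassDemo, the helper FindAll, and the sample string are made up for illustration; only Regex.Matches comes from the .NET library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: returns every match of pattern in input, in order.
    public static string[] FindAll(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "Room 42, Block B7";

        // [0-9] matches one digit at a time.
        Console.WriteLine(string.Join(" ", FindAll("[0-9]", input)));         // 4 2 7

        // [A-Z][0-9] matches an uppercase letter immediately followed by a digit.
        Console.WriteLine(string.Join(" ", FindAll("[A-Z][0-9]", input)));    // B7

        // [^a-zA-Z0-9 ] matches any character that is not a letter, digit, or space.
        Console.WriteLine(string.Join(" ", FindAll("[^a-zA-Z0-9 ]", input))); // ,
    }
}
```

Each class still consumes exactly one character per match, which is why [0-9] reports 4 and 2 separately rather than 42.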

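3 Shorthand character classes

We often need to match a digit, a letter, or a whitespace character. Ordinary character classes can express these, but they are not convenient, so regular expressions provide shorthand for the most common classes.

    Character   Matches                                                  Example
    \d          any one digit from 0 to 9                                \d\d matches 72, but not me or 7a
    \D          any non-digit character                                  \D\D matches me, but not 7a or 72
    \w          any word character: A-Z, a-z, 0-9, and the underscore    \w\w\w\w matches aB_2, but not ab_@
    \W          any non-word character                                   \W matches @, but not a
    \s          any whitespace character, including tab, newline,        matches all the traditional whitespace characters
                carriage return, form feed, and vertical tab
    \S          any non-whitespace character                             \S matches any non-whitespace character, such as ~ ! @ # &
    .           any character except a newline                           matches any character other than a newline

    string input = "1024 hello world&%";
    Regex reg1 = new Regex(@"\d\d\d\d");
    if (reg1.IsMatch(input))
    {
        Console.WriteLine(reg1.Match(input).Value); // output 1024
    }

    Regex reg2 = new Regex(@"\W\W");
    if (reg2.IsMatch(input))
    {
        Console.WriteLine(reg2.Match(input).Value); // output &%
    }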

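4 Quantifiers

A quantifier specifies how many times the preceding element must repeat. It describes a repeat count, not content. For example, suppose we want to find the 11-digit mobile numbers that start with 158 in a list of phone numbers. Without quantifiers the regex is 158\d\d\d\d\d\d\d\d; with them it becomes 158\d{8}.

    Character   Matches                                                   Example
    {n}         the preceding element exactly n times                     x{2} matches xx, but not xxx
    {n,}        the preceding element n or more times                     x{2,} matches two or more x, such as xx, xxx, xxxx, xxxxxx
    {n,m}       the preceding element at least n and at most m times      x{2,4} matches xx, xxx, xxxx, but not x or xxxxx
    ?           the preceding element 0 or 1 times, same as {0,1}         x? matches x or nothing
    +           the preceding element 1 or more times, same as {1,}       x+ matches x, xx, or xxx
    *           the preceding element 0 or more times, same as {0,}       x* matches zero or more x

Note on {n,m}: the original table says n can be left out when it is 0. That holds in some regex flavors, but in .NET the lower bound should be written explicitly ({0,m}), because {,m} is treated as literal text.

    string input = "my phone number is 15861327445, please call me sometime.";
    Regex reg1 = new Regex(@"158\d{8}"); // an 11-digit mobile number starting with 158
    if (reg1.IsMatch(input))
    {
        Console.WriteLine(reg1.Match(input).Value); // output 15861327445
    }

    string input2 = "November is the 11 month of the year, you can use Nov for short.";
    Regex reg2 = new Regex(@"Nov(ember)?"); // matches Nov or November
    var matchs = reg2.Matches(input2);
    foreach (Match match in matchs)
    {
        Console.WriteLine(match.Value); // output November Nov
    }

    string input3 = "1000, 100, 2003, 9999,10000";
    Regex reg3 = new Regex(@"\b[1-9]\d{3}\b"); // matches the numbers 1000 to 9999
    var matchs3 = reg3.Matches(input3);
    foreach (Match match in matchs3)
    {
        Console.WriteLine(match.Value); // output 1000 2003 9999 (\b matches a word boundary)
    }

    string input4 = "1000, 100, 2003, 9999,10000,99,10,1, 99999";
    Regex reg4 = new Regex(@"\b[1-9][0-9]{2,4}\b"); // matches the numbers 100 to 99999
    var matchs4 = reg4.Matches(input4);
    foreach (Match match in matchs4)
    {
        Console.WriteLine(match.Value); // output 1000 100 2003 9999 10000 99999 (\b matches a word boundary)
    }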

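5 Position anchors

By now we can match most text with character classes and their shorthand forms. But what about the following cases?

The first word of the text must be google

The text must end with bye

The first word on every line of the text must be a number

A word must start with hel

Needing to match a position without matching any content, as in the cases above, is very common. Regular expressions provide special characters that match a position rather than content: ^ matches the start of the text, $ matches the end of the text, and \b matches a word boundary.

    Character   Matches
    ^           The pattern that follows must be at the start of the string. For multi-line text (text containing
                line breaks), set the Multiline option to let it match at the start of any line.
    $           The preceding pattern must be at the end of the string. With the Multiline option set, it may be
                at the end of any line.
    \b          A word boundary.
    \B          A non-word boundary, that is, a position that is not at the start or end of a word.
    \A          The pattern that follows must be at the start of the string; the Multiline option is ignored.
    \z          The preceding pattern must be at the very end of the string; the Multiline option is ignored.
    \Z          The preceding pattern must be at the end of the string or just before a final newline.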

    string input1 = "the color of shirt is gray and the color of shoes is grey too.";
    Regex reg1 = new Regex(@"^the"); // matches "the" only at the start of the string
    var matchs1 = reg1.Matches(input1);
    foreach (Match match in matchs1)
    {
        Console.WriteLine(string.Format("the value is:{0}; and the index is:{1}", match.Value, match.Index));
        // output the value is:the; and the index is:0
    }

    string input2 = "the color of shirt shirts is gray and the color of shoes is grey too.";
    Regex reg2 = new Regex(@"\b\w*irt\b"); // matches the word shirt, but not shirts
    var matchs2 = reg2.Matches(input2);
    foreach (Match match in matchs2)
    {
        Console.WriteLine(string.Format("the value is:{0}; and the index is:{1}", match.Value, match.Index));
        // output the value is:shirt; and the index is:13
    }
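The effect of the Multiline option on ^, and the difference between ^ and \A, may be easier to see in a small sketch. The class name AnchorDemo, the helper Count, and the sample text are assumptions made for this illustration; the line breaks are plain \n characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // Hypothetical helper: counts the matches of pattern in input under the given options.
    public static int Count(string pattern, string input, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    public static void Main()
    {
        string input = "10 apples\n20 pears\nbye";

        // Without Multiline, ^ matches only at the very start of the string.
        Console.WriteLine(Count(@"^\d+", input, RegexOptions.None));       // 1

        // With Multiline, ^ also matches right after each newline.
        Console.WriteLine(Count(@"^\d+", input, RegexOptions.Multiline));  // 2

        // \A ignores Multiline and still matches only at the start of the string.
        Console.WriteLine(Count(@"\A\d+", input, RegexOptions.Multiline)); // 1

        // $ requires the preceding pattern to sit at the end of the text.
        Console.WriteLine(Regex.IsMatch(input, @"bye$"));                  // True

        // \z is the very end of the string, so "pears" does not qualify.
        Console.WriteLine(Regex.IsMatch(input, @"pears\z"));               // False
    }
}
```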

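6 Alternation

With a character class, square brackets let us match any one of the listed characters: the pattern lists several possible characters, and the text matches as long as it fits one of them. Is there a similar mechanism for whole patterns, so that one regex holds several patterns and the text matches if it satisfies any one of them? There is: alternation. Combined with grouping, it lets a complex regex be split into several simpler sub-expressions. It works like a logical OR.

    Character   Matches                                                          Example
    |           alternation: matches either the pattern before it or after it    cat|mouse matches cat or mouse

    string input1 = "color: blue, grey, gray, white, black";
    Regex reg1 = new Regex(@"(grey)|(gray)"); // matches grey and gray
    var matchs1 = reg1.Matches(input1);
    foreach (Match match in matchs1)
    {
        Console.WriteLine(match.Value); // output grey, gray
    }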
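As a sketch of how alternation combines with grouping, the alternatives can be placed inside parentheses so that the rest of the pattern applies to whichever branch matched. The class name AlternationDemo, the helper FindAll, and the sample sentence are invented for this example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Hypothetical helper: returns every match of pattern in input, in order.
    public static string[] FindAll(string pattern, string input)
    {
        return Regex.Matches(input, pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string input = "cats chase a mouse, not mice";

        // (cat|mouse) picks one branch; s? and the \b anchors apply to either branch.
        string[] words = FindAll(@"\b(cat|mouse)s?\b", input);
        Console.WriteLine(string.Join(" ", words)); // cats mouse
    }
}
```

Without the parentheses, | would split the whole expression into \bcat and mouses?\b, because alternation binds more loosely than everything around it.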

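7 Matching special characters

So far we have covered character classes, shorthand character classes, position anchors, quantifiers, and alternation. The symbols used for these carry special meanings in a regular expression. So what do we do when we want to match those characters themselves? Put a \ in front of the special character. The following table lists common escapes.

    Character   Matches
    \\          the character \
    \.          the character .
    \*          the character *
    \+          the character +
    \?          the character ?
    \|          the character |
    \(          the character (
    \)          the character )
    \{          the character {
    \}          the character }
    \^          the character ^
    \$          the character $
    \n          a newline
    \r          a carriage return
    \t          a tab
    \f          a form feed
    \v          a vertical tab
    \nnn        the ASCII character given by a three-digit octal number; \103 matches an uppercase C
    \xnn        the ASCII character given by a two-digit hexadecimal number; \x43 matches C
    \unnnn      the Unicode character given by a four-digit hexadecimal number
    \cV         a control character; for example, \cV matches Ctrl+V

    string input1 = "2.5+1.5=4";
    // . and + are special characters in a regex; to match the characters themselves, put a \ in front of them
    Regex reg1 = new Regex(@"2\.5\+1\.5=4");
    var matchs1 = reg1.Matches(input1);
    foreach (Match match in matchs1)
    {
        Console.WriteLine(match.Value); // output 2.5+1.5=4
    }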
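To see why the escaping above matters, the following sketch compares the unescaped text used as a pattern with an escaped version. It uses Regex.Escape, a .NET helper the original article does not mention, to add the backslashes automatically; the class name EscapeDemo and the helper ToLiteralPattern are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EscapeDemo
{
    // Hypothetical helper: builds a pattern that matches the given text literally.
    public static string ToLiteralPattern(string text)
    {
        return Regex.Escape(text);
    }

    public static void Main()
    {
        string text = "2.5+1.5=4";

        // Unescaped, "." means "any character" and "+" means "one or more",
        // so the raw text used as a pattern does not even match itself.
        Console.WriteLine(Regex.IsMatch(text, text));        // False
        Console.WriteLine(Regex.IsMatch("2x551y5=4", text)); // True

        // Escaped, every special character stands for itself.
        string pattern = ToLiteralPattern(text);
        Console.WriteLine(pattern);                          // 2\.5\+1\.5=4
        Console.WriteLine(Regex.IsMatch(text, pattern));     // True

        // \x43 and \u0043 both denote the character C.
        Console.WriteLine(Regex.IsMatch("C", @"^\x43$"));    // True
        Console.WriteLine(Regex.IsMatch("C", @"^\u0043$"));  // True
    }
}
```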

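8 Groups, backreferences, and non-capturing groups

A group is a part of a regular expression enclosed in parentheses so that it can be used as a unit; the regex between the parentheses is called a group. Quantifiers and alternation can be applied to a group.

1 Example: public void Set, public void SetValue

Regex: Set(Value)? Here (Value) is a group, and the quantifier ? applies to the whole group (Value), so the regex matches Set or SetValue.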
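A minimal sketch of this first example follows. The class name OptionalGroupDemo and the helper Describe are hypothetical; the output also reports whether group 1 took part in each match, which is an addition to what the article shows.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Hypothetical helper: lists each match together with whether group 1 participated.
    public static string[] Describe(string input)
    {
        return Regex.Matches(input, @"Set(Value)?")
                    .Cast<Match>()
                    .Select(m => m.Value + " (group 1 matched: " + m.Groups[1].Success + ")")
                    .ToArray();
    }

    public static void Main()
    {
        foreach (string line in Describe("public void Set, public void SetValue"))
        {
            Console.WriteLine(line);
        }
        // Set (group 1 matched: False)
        // SetValue (group 1 matched: True)
    }
}
```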

2 Example: out of sight, out of mind

Regex: "(out of) sight, \1 mind"

The regex engine stores the text matched inside "()" as a group, which can then be referenced by index. The "\1" in the expression is a backreference to the first group in the expression. In C#, the captured content is also available through the Groups collection. Note that Groups[0] is the entire matched string; the captured groups start at index 1.

    string input1 = "out of sight, out of mind";
    Regex reg1 = new Regex(@"(out of) sight, \1 mind");
    if (reg1.IsMatch(input1))
    {
        Console.WriteLine(reg1.Match(input1).Value); // output out of sight, out of mind
    }
    Console.WriteLine(reg1.Match(input1).Groups[1].Value); // output out of

3 Groups can also be referenced by name. Use the following syntax to give a group a name: (?<groupname>…)

Regex: "(?<Group1>out of) sight, \1 mind"

    string input2 = "out of sight, out of mind";
    Regex reg2 = new Regex(@"(?<Group1>out of) sight, \1 mind");
    if (reg2.IsMatch(input2))
    {
        Console.WriteLine(reg2.Match(input2).Value); // output out of sight, out of mind
    }
    Console.WriteLine(reg2.Match(input2).Groups["Group1"].Value); // output out of

4 Referencing a group outside the expression, for example in a replacement string: use $ followed by the group index (such as $1), or ${name} for a named group.

Example: out of sight, out out of mind mind

Regex: "(?<Group1>[a-z]+) \1"

    string input3 = "out of sight, out out of mind mind";
    Regex reg3 = new Regex(@"(?<Group1>[a-z]+) \1"); // matches a repeated word
    if (reg3.IsMatch(input3))
    {
        Console.WriteLine(reg3.Replace(input3, "$1"));        // output out of sight, out of mind
        Console.WriteLine(reg3.Replace(input3, "${Group1}")); // output out of sight, out of mind
    }

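5 Non-capturing groups: put ?: at the start of the group. Some groups only express an alternation; when we do not want to waste storage on them, the group can be made non-capturing.

"(?:out of) sight"

    string input4 = "out of sight, out of mind";
    // with the non-capturing ?:, the group can no longer be referenced as \1 inside the expression
    Regex reg4 = new Regex(@"(?:out of) sight");
    if (reg4.IsMatch(input4))
    {
        Console.WriteLine(reg4.Match(input4).Groups[1].Value); // output is empty
    }

    Character           Matches
    (?<groupname>exp)   matches exp and captures the text into the group named groupname
    (?:exp)             matches exp without capturing the matched text and without assigning a group number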
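The note that a non-capturing group is not assigned a group number can be checked with a sketch like this one. The class name GroupNumberingDemo, the helper CapturedName, and the sample text are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupNumberingDemo
{
    // Hypothetical helper: returns the value of the group at the given index for the first match.
    public static string CapturedName(string pattern, string input, int groupIndex)
    {
        return Regex.Match(input, pattern).Groups[groupIndex].Value;
    }

    public static void Main()
    {
        string input = "Ms. Lee";

        // Capturing title group: the title is group 1 and the name is group 2.
        Console.WriteLine(Regex.Match(input, @"(Mr|Ms)\. (\w+)").Groups.Count);   // 3
        Console.WriteLine(CapturedName(@"(Mr|Ms)\. (\w+)", input, 2));            // Lee

        // Non-capturing title group: it gets no number, so the name becomes group 1.
        Console.WriteLine(Regex.Match(input, @"(?:Mr|Ms)\. (\w+)").Groups.Count); // 2
        Console.WriteLine(CapturedName(@"(?:Mr|Ms)\. (\w+)", input, 1));          // Lee
    }
}
```

Groups.Count includes group 0 (the whole match), which is why the counts are 3 and 2 rather than 2 and 1.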

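9 Greedy and lazy matching

The regex engine is greedy by default: as long as the pattern allows it, the engine matches as many characters as possible. Adding "?" after a quantifier (*, + and so on) switches that quantifier to lazy (non-greedy) matching. Greedy and lazy matching are closely tied to the quantifiers described earlier.

    Character   Matches
    ?           when it follows a quantifier, the quantifier matches lazily (as few characters as possible)

Example: out of sight, out of mind

Greedy regex: .* of    Output: out of sight, out of

Lazy regex: .*? of    Output: out of

Another example

Input: The title of cnblog is <h1>code changes the world</h1>

Goal: match the HTML tags

Regex 1: <.+>

Output of regex 1: <h1>code changes the world</h1>

Regex 2: <.+?>

Output of regex 2: <h1> </h1>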

    string input1 = "out of sight, out of mind";
    // Greedy by default: match as much text as possible. The engine does not stop at the first " of";
    // .* first consumes as much as it can, then the engine backtracks until the last " of" lets the match succeed.
    Regex reg1 = new Regex(@".* of");
    if (reg1.IsMatch(input1))
    {
        Console.WriteLine(reg1.Match(input1).Value); // output out of sight, out of
    }

    string input2 = "out of sight, out of mind";
    // A ? after the quantifier makes it lazy: the match ends at the first text that satisfies the pattern.
    Regex reg2 = new Regex(@".*? of");
    if (reg2.IsMatch(input2))
    {
        Console.WriteLine(reg2.Match(input2).Value); // output out of
    }

    string input3 = "The title of cnblog is <h1>code changes the world</h1>";
    Regex reg3 = new Regex(@"<.+>"); // greedy: match HTML tags
    var matchs3 = reg3.Matches(input3);
    foreach (Match match in matchs3)
    {
        Console.WriteLine(match.Value); // output <h1>code changes the world</h1>
    }

    string input4 = "The title of cnblog is <h1>code changes the world</h1>";
    Regex reg4 = new Regex(@"<.+?>"); // lazy: match HTML tags
    var matchs4 = reg4.Matches(input4);
    foreach (Match match in matchs4)
    {
        Console.WriteLine(match.Value); // output <h1> </h1>
    }

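10 Backtracking and non-backtracking

Use "(?>…)" to declare a non-backtracking group. Because the regex engine is greedy, in some situations it has to backtrack in order to find a match. Consider the following example:

Example: Live for nothing, die for something

Regex (default, backtracking): ".*thing," Output: Live for nothing, Because ".*" is greedy, it first matches all the way to the end of the string. The engine then backtracks to find "thing" (at the end of "something"), but matching "," fails there, so it keeps backtracking and finally succeeds at "thing," in "nothing,".

Regex (non-backtracking): "(?>.*)thing," matches nothing. Because backtracking into the group is forbidden, ".*" keeps the whole string and the overall expression fails to match.

    string input1 = "Live for nothing, die for something";
    Regex reg1 = new Regex(@"(.*)thing,");
    var matchs1 = reg1.Matches(input1);
    foreach (Match match in matchs1)
    {
        Console.WriteLine(match.Value); // output Live for nothing,
    }

    Regex reg2 = new Regex(@"(?>.*)thing,");
    var matchs2 = reg2.Matches(input1);
    foreach (Match match in matchs2)
    {
        Console.WriteLine(match.Value); // no output: there is no match, so the loop body never runs
    }

    Character   Matches
    (?>...)     matches the expression inside the group without backtracking into it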

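11 Lookahead and lookbehind (lookaround)

Lookaround matches a pattern while asserting what must come before or after it. The idea is similar to matching a position.

    Character   Matches
    (?=exp)     the pattern on its left must be immediately followed by exp; the assertion itself is not part of the match
    (?!exp)     the pattern on its left must not be immediately followed by exp; the assertion itself is not part of the match
    (?<=exp)    the pattern on its right must be immediately preceded by exp; the assertion itself is not part of the match
    (?<!exp)    the pattern on its right must not be immediately preceded by exp; the assertion itself is not part of the match

    string input1 = "hello 1024 world 8080 bye";

    Regex reg1 = new Regex(@"\d{4}(?= world)");
    if (reg1.IsMatch(input1))
    {
        Console.WriteLine(reg1.Match(input1).Value); // output 1024
    }

    Regex reg2 = new Regex(@"\d{4}(?! world)");
    if (reg2.IsMatch(input1))
    {
        Console.WriteLine(reg2.Match(input1).Value); // output 8080
    }

    Regex reg3 = new Regex(@"(?<=world )\d{4}");
    if (reg3.IsMatch(input1))
    {
        Console.WriteLine(reg3.Match(input1).Value); // output 8080
    }

    Regex reg4 = new Regex(@"(?<!world )\d{4}");
    if (reg4.IsMatch(input1))
    {
        Console.WriteLine(reg4.Match(input1).Value); // output 1024
    }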