Learning Python Modules: re (Regular Expressions)

re.match
re.match tries to match a pattern starting at the beginning of a string.

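import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"

# Capture the first word, which must be followed by a whitespace character
m = re.match(r'(\w+)\s', text)

if m:
    print(m.group(0))  # the whole match
    print(m.groups())  # tuple of all captured groups
else:
    print("No match")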
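#output#
My
('My',)
# The first line is group(0), the whole match 'My ' (it includes the whitespace matched by \s)
# On success a match object is returned; otherwise None is returned
# An r prefix marks a Python raw string, in which backslashes are not treated as escapes; it is commonly used for regular expressions
# To match a literal backslash, write r'\\' (without the r prefix this would be '\\\\')
# group() returns the string captured by one or more groups; the default is group(0), the whole match
# groups() returns the strings captured by all groups as a tuple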
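
The notes on group() and groups() may be easier to follow with a pattern that has more than one group. The following is a minimal sketch; the shortened sample text, the second capturing group and the `C:\temp` string are assumptions added for illustration and are not part of the original example.

```python
import re

text = "My name is Oliver Queen."

# Two capturing groups: the first word and the second word
m = re.match(r'(\w+)\s(\w+)', text)

whole = m.group()        # same as m.group(0): the entire match
first = m.group(1)       # text captured by group 1
pair = m.group(1, 2)     # several group numbers return a tuple
all_groups = m.groups()  # every captured group as a tuple

print(whole)       # My name
print(first)       # My
print(pair)        # ('My', 'name')
print(all_groups)  # ('My', 'name')

# A literal backslash in a pattern is written r'\\'
backslash = re.search(r'\\', r'C:\temp')
print(backslash.group())  # prints a single backslash
```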

re.search
re.search scans the string for the pattern and returns as soon as it finds the first match. If nothing in the string matches, it returns None.

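import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"

# Find 'str...ded' followed by whitespace, capturing the letters in between
m = re.search(r'str(\w+)ded\s', text)

if m:
    print(m.group(0))  # the whole match
    print(m.groups())  # tuple of all captured groups
else:
    print("No match")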

#output#
stranded
('an',)
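# If match were used here instead, the match would fail
# Difference between re.match and re.search: re.match only matches at the beginning of the string; if the beginning does not fit the regular expression, the match fails and the function returns None
# re.search scans the whole string until it finds a match.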
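
The difference between the two functions can be checked side by side. This is a small sketch that reuses the text and pattern from the example above; the variable names `at_start` and `anywhere` are only illustrative.

```python
import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"
pattern = r'str(\w+)ded\s'

at_start = re.match(pattern, text)   # the pattern must match at index 0
anywhere = re.search(pattern, text)  # the pattern may match at any position

print(at_start)           # None, because the text starts with "My"
print(anywhere.group(1))  # an
print(anywhere.start())   # index at which "stranded" begins
```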

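re.sub
re.sub(pattern, repl, string, count)
Returns the string obtained by replacing every substring of string that matches pattern with repl.
When repl is a string, groups can be referenced with \id, \g<id> or \g<name>. \0 cannot be used as a group reference; the whole match is written \g<0>.
When repl is a function, it must accept a single argument (the Match object) and return the replacement string (group references in the returned string are not expanded).
count sets the maximum number of replacements; if it is omitted, all matches are replaced.
e.g. replace every whitespace character in the string with '-':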

import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"

# Replace every whitespace character with '-'
m = re.sub(r'\s', '-', text)
print(m)
#output#
My-name-is-Oliver-Queen.For-five-years-I-was-stranded-on-an-island-with-only-one-goal,--survive
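
The other forms of repl described for re.sub (group references, a function, and count) might look like this in practice. It is a minimal sketch under assumptions: the short sample string and the helper `shout` are invented for illustration.

```python
import re

text = "Oliver Queen"

# String repl: \2 and \1 refer to the second and first groups
swapped = re.sub(r'(\w+) (\w+)', r'\2 \1', text)

# \g<0> refers to the whole match
bracketed = re.sub(r'\w+', r'[\g<0>]', text)

# Function repl: called with each Match object, returns the replacement
def shout(match):
    return match.group(0).upper()

upper = re.sub(r'\w+', shout, text)

# count=1 stops after the first replacement
first_only = re.sub(r'e', 'E', text, count=1)

print(swapped)     # Queen Oliver
print(bracketed)   # [Oliver] [Queen]
print(upper)       # OLIVER QUEEN
print(first_only)  # OlivEr Queen
```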

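re.split
split(string[, maxsplit]) | re.split(pattern, string[, maxsplit]):
Splits string at every substring that matches the pattern and returns the pieces as a list.
maxsplit sets the maximum number of splits; if it is omitted, the string is split at every match.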

import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"

# Split the text at every whitespace character
m = re.split(r'\s', text)
print(m)
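
The effect of maxsplit is easiest to see on a short string. The sketch below uses a shortened sample text, chosen here only to keep the output readable.

```python
import re

text = "My name is Oliver Queen"

all_parts = re.split(r'\s', text)               # split at every whitespace character
two_splits = re.split(r'\s', text, maxsplit=2)  # stop after two splits

print(all_parts)   # ['My', 'name', 'is', 'Oliver', 'Queen']
print(two_splits)  # ['My', 'name', 'is Oliver Queen']
```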

re.findall
findall(string[, pos[, endpos]]) | re.findall(pattern, string[, flags]):
Searches string and returns every matching substring as a list.

import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"

# Find every word that contains 'an'
m = re.findall(r'(\w*an\w*)', text)
print(m)
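
What findall returns depends on how many capturing groups the pattern has, and the compiled-pattern form also accepts a start position. The following sketch illustrates both points; the patterns with one and two groups, and the choice of starting position, are assumptions added for illustration.

```python
import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"

# No groups: a list of the matched substrings
words = re.findall(r'\w*an\w*', text)

# One group: a list of what the group captured
inner = re.findall(r'str(\w+)ded', text)

# Several groups: a list of tuples, one tuple per match
pairs = re.findall(r'(\w+)an(\w+)', text)

# Pattern.findall accepts pos, the index at which the search starts
regex = re.compile(r'\w*an\w*')
after_on = regex.findall(text, text.index(' on '))

print(words)     # ['stranded', 'an', 'island']
print(inner)     # ['an']
print(pairs)     # [('str', 'ded'), ('isl', 'd')]
print(after_on)  # ['an', 'island']
```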

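re.compile
re.compile(strPattern[, flag]):
Compiles a regular expression into a regular expression object. Compiling expressions that are used often can improve efficiency somewhat.
It is commonly used together with sub to replace or filter out special characters.
Possible values of flag:
re.I (full name: IGNORECASE): ignore case (the full name is given in parentheses, likewise below)
re.M (full name: MULTILINE): multi-line mode; changes the behaviour of '^' and '$' so that they also match at the start and end of each line
re.S (full name: DOTALL): dot-matches-all mode; changes the behaviour of '.' so that it also matches a newline
re.L (full name: LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale
re.U (full name: UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D depend on the character properties defined by Unicode
re.X (full name: VERBOSE): verbose mode. In this mode a regular expression can span several lines, whitespace is ignored, and comments can be added.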
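
The most frequently used flags in the list above can be sketched as follows. The two-line sample text, the decimal-number pattern and all variable names are assumptions made up for this illustration.

```python
import re

text = "First line\nsecond LINE"

# re.I: case-insensitive matching
ignore_case = re.findall(r'line', text, re.I)

# re.M: '^' also matches at the start of each line
default_caret = re.findall(r'^\w+', text)
multiline_caret = re.findall(r'^\w+', text, re.M)

# re.S: '.' also matches a newline
default_dot = re.search(r'First.*LINE', text)
dotall = re.search(r'First.*LINE', text, re.S)

# re.X: whitespace in the pattern is ignored and comments are allowed
number = re.compile(r"""
    (\d+)   # integer part
    \.      # literal dot
    (\d+)   # fractional part
""", re.X)
parts = number.search("pi is 3.14").groups()

# Flags are combined with |
combined = re.findall(r'^\w+ line$', text, re.I | re.M)

print(ignore_case)         # ['line', 'LINE']
print(default_caret)       # ['First']
print(multiline_caret)     # ['First', 'second']
print(default_dot)         # None
print(dotall is not None)  # True
print(parts)               # ('3', '14')
print(combined)            # ['First line', 'second LINE']
```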

import re

text = "My name is Oliver Queen.For five years I was stranded on an island with only one goal,--survive"
# Compile the regular expression into a Pattern object; the leading r marks a raw string
regex = re.compile(r'\w*an\w*')
print(regex.findall(text))  # find every word containing 'an'
print(regex.sub(lambda m: '[' + m.group(0) + ']', text))  # wrap every word containing 'an' in []
# print(regex.sub(r'\n', text))  # replace every word containing 'an' with a newline
#output#
['stranded', 'an', 'island']
My name is Oliver Queen.For five years I was [stranded] on [an] [island] with only one goal,--survive