
Highly recommended! Regular expression matching with Python's re module, a real treasure house


Python's re module (re is short for Regular Expression) provides a variety of regular expression matching operations.

It is a very useful tool for text analysis, complex string processing, and information extraction. The following is a summary of the commonly used features of the re module.

One. Predefined characters

\d Matches any decimal digit, 0-9
\D Matches any non-digit character (this includes the underscore)
\s Matches any whitespace character (space, tab, newline, etc.)
\S Matches any non-whitespace character (this includes the underscore)
\w Matches any word character: letters, Chinese characters, digits, and the underscore (a-z, A-Z, 0-9, _)
\W Matches any non-word character, that is, anything other than a letter, Chinese character, digit, or underscore
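
The predefined classes can be tried out on a small string. This is a minimal sketch; the sample text is made up for illustration and is not from the original article.

```python
import re

# Hypothetical sample text mixing ASCII, an underscore, Chinese characters and digits
sample = "user_01 张三 3.14"

digits = re.findall(r"\d", sample)      # every decimal digit
word_runs = re.findall(r"\w+", sample)  # runs of letters, digits, underscores, Chinese characters
non_word = re.findall(r"\W", sample)    # everything else (here: spaces and the dot)

print(digits)
print(word_runs)
print(non_word)
```

Note that the underscore ends up in the \w results, not in the \W results.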

Two. Special characters

$: Matches the end of the string, or the end of each line in multi-line mode (placed at the end of the regular expression)
^: Matches the beginning of the string, or the beginning of each line in multi-line mode (placed at the start of the regular expression)
*: The preceding character may appear 0 or more times (0 to unlimited; greedy matching)
+: The preceding character may appear 1 or more times (1 to unlimited; greedy matching)
?: The preceding character may appear 0 or 1 time. Placed after another quantifier (*, +, ?, {n,m}), it switches that quantifier from "greedy mode" to "reluctant mode" (non-greedy matching)
Note: .* is greedy, .*? is non-greedy
.: Matches any single character except the line break "\n"
|: Alternation; matches either the expression on its left or the one on its right
[ ]: Denotes a character set. There are three cases:
[abc]: Matches any single character listed in the set
[a-z0-9]: Matches any single character in the given ranges; the set can be negated by putting ^ first, as in [^a-z0-9]
[2-9][1-3]: Sets can be combined in sequence, each one matching a single character
{ }: Specifies how many times the preceding character repeats. There are the following cases:
{n,m}: The preceding character appears at least n times and at most m times
{n,}: The preceding character appears at least n times, with no upper limit
{,m}: The preceding character appears at most m times, and possibly not at all
{n}: The preceding character must appear exactly n times
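
The difference between greedy and non-greedy matching, and the behavior of sets, counts, and alternation, are easiest to see side by side. The sketch below uses made-up input strings.

```python
import re

html = "<b>one</b><b>two</b>"  # hypothetical input

greedy = re.search(r"<b>.*</b>", html).group()  # .* runs on to the last </b>
lazy = re.search(r"<b>.*?</b>", html).group()   # .*? stops at the first </b>

# {n,m} bounds the repetition count; \b keeps longer digit runs from matching partially
two_or_three_digits = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")

# [^...] negates a character set
not_lowercase = re.findall(r"[^a-z]", "aB3c")

# | chooses between alternatives
pets = re.findall(r"cat|dog", "cat, dog, bird")

print(greedy)
print(lazy)
print(two_or_three_digits, not_lowercase, pets)
```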

Three. Backslashes

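If the target string contains a backslash, the backslash in the pattern must itself be escaped (\\). Writing patterns as raw strings (r'...') keeps Python from applying its own string escaping first.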
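
As a small illustration of backslash escaping, the sketch below splits a Windows-style path on its backslashes. The path is a hypothetical example.

```python
import re

path = r"C:\temp\new"  # raw string, so it contains two real backslashes

# The regex for one literal backslash is \\ .
# In a normal string literal each of those must be doubled again: "\\\\".
plain_pattern = "\\\\"
raw_pattern = r"\\"    # the same regex, written as a raw string

print(plain_pattern == raw_pattern)
parts = re.split(raw_pattern, path)
print(parts)
```

The raw-string form is usually easier to read, which is why the examples in this article write patterns as r'...'.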

Four. Grouping

(): Grouping. Parentheses group part of the pattern so that the content matched inside them can be retrieved quickly. Each pair of parentheses defines one group, and you can extract just the content matched inside the "()".
group: Returns the content matched by the specified group (with no argument, the whole match)
groups: Returns a tuple containing the content matched by all groups
groupdict: Returns a dictionary of the named groups as key-value pairs; the groups must be named with (?P<name>...)
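
The three accessors can be compared on one match object. This is a minimal sketch; the input string and the group names year and month are chosen only for illustration.

```python
import re

# Hypothetical input; (?P<name>...) gives a group a name
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-07")

whole = m.group()         # group() or group(0): the entire match
year = m.group(1)         # a group by index
month = m.group("month")  # a named group by name
all_groups = m.groups()   # tuple of all groups
named = m.groupdict()     # dict of the named groups

print(whole, year, month)
print(all_groups)
print(named)
```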

Five. Common methods

match: Matches only at the beginning of the target text
search: Scans the entire target text and returns the first match
findall: Scans the entire target text and returns a list of all substrings that match the pattern; if there is no match, it returns an empty list
split: Has two forms, a module-level function and a method of a compiled pattern
re.split(pattern, string[, maxsplit=0, flags=0])
pattern.split(string[, maxsplit=0])
Purpose: Splits the string at every part that matches the regular expression and returns the pieces as a list
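
The methods above differ mainly in where they look and what they return. The following sketch, using a made-up sentence, shows them next to each other, including both forms of split.

```python
import re

text = "order 66 shipped, order 77 pending"  # hypothetical input

at_start = re.match(r"\d+", text)       # None: the text does not start with a digit
anywhere = re.search(r"\d+", text)      # first match anywhere in the text
all_numbers = re.findall(r"\d+", text)  # every match, as a list of strings
no_hits = re.findall(r"\d+", "no digits here")

# Module-level function and compiled-pattern method give the same result
by_module = re.split(r",\s*", text, maxsplit=1)
by_pattern = re.compile(r",\s*").split(text, maxsplit=1)

print(at_start, anywhere.group(), all_numbers, no_hits)
print(by_module)
```

Because match and search return None when nothing matches, it is safer to test the result before calling group() on it.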

Six. The flags parameter of the regular expression functions

The available flags include:

re.I: Ignore case
re.L: Makes the special character classes \w, \W, \b, \B, \s, \S depend on the current locale
re.M: Multi-line mode; ^ and $ also match at the start and end of each line
re.S: Makes '.' match any character, including line breaks (note: by default '.' does not match line breaks)
re.U: Makes the special character classes \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database

Before using regular expressions in Python, first import the re module with the following command:

import re
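
With the module imported, the effect of the most common flags described above can be checked on a two-line string. This is a minimal sketch with made-up text; flags are combined with the | operator.

```python
import re

text = "First line\nsecond LINE"  # hypothetical two-line input

ignore_case = re.findall(r"line", text, re.I)    # matches both "line" and "LINE"
line_starts = re.findall(r"^\w+", text, re.M)    # ^ also matches after the newline
without_s = re.search(r"First.*LINE", text)      # None: '.' stops at "\n"
with_s = re.search(r"First.*LINE", text, re.S)   # '.' now crosses the newline
combined = re.findall(r"^second line$", text, re.I | re.M)

print(ignore_case, line_starts)
print(without_s, with_s is not None, combined)
```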

Example 1: Detailed walkthrough

For example:
'(\d)(a)\1' means: the first group matches a digit, the second group matches the character a, and the third item, \1, must repeat exactly what the first group matched. In other words, the first group is referenced once.
So "9a9" matches, but "9a8" does not, because \1 can only be 9.
'(\d)(a)\2' means: the first group matches a digit, the second group matches a, and the third item, \2, must repeat exactly what the second group matched.
So "8aa" matches, but "8ab" and "7a7" do not: the third character must be a copy of the content matched by the second group.
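
The backreference rules just described can be verified directly. The sketch below runs both patterns against the five strings mentioned above; fullmatch is used here so that the whole string has to fit the pattern.

```python
import re

first_group = re.compile(r"(\d)(a)\1")   # \1 repeats the digit
second_group = re.compile(r"(\d)(a)\2")  # \2 repeats the letter a

candidates = ["9a9", "9a8", "8aa", "8ab", "7a7"]
results = {
    c: (bool(first_group.fullmatch(c)), bool(second_group.fullmatch(c)))
    for c in candidates
}

for candidate, (with_1, with_2) in results.items():
    print(candidate, with_1, with_2)
```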

print(re.match(r'(\w{3}).', "abceeeabc456abc789").group())    # abce
print(re.match(r'(\w{3}).*', "abceeeabc456abc789").group())   # greedy *: abceeeabc456abc789
print(re.match(r'(\w{3}).*?', "abceeeabc456abc789").group())  # non-greedy *?: abc
print(re.search(r'(\d{3})', "abceeeabc456abc789").group())    # 456
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").groups())   # ('abc', '456', 'abc')
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(1))   # abc
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(2))   # 456
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(3))   # abc
print(re.search(r'(\w{3})(\d+)(\2)', "abceeeabcs456456abc456789abc").groups())  # ('bcs', '456', '456')
print(re.search(r'(\w{3})(\d+)(\2)', "abceeeabcs456456abc456789abc").group(1))  # bcs
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789abc").group(1))     # abc
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789abc").group(2))     # abc
# Here \2 refers to the non-greedy group (.*?), which matches the empty string, so the whole match is just abc
print(re.search(r'(\w{3})(.*?)(\2)', "abceeeabc456abc789").group())       # abc
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789").group(1, 2))     # ('abc', 'abc')
print(re.findall(r'\d+', 'one11two22three33four44'))  # ['11', '22', '33', '44']
print(re.split(r'\W+', '192.168.1.1'))       # \W matches non-word characters (not letters, Chinese characters, digits, or underscores), so the dots are the separators: ['192', '168', '1', '1']
print(re.split(r'(\W+)', '192.168.1.1'))     # with a capturing group, the separators are returned too: ['192', '.', '168', '.', '1', '.', '1']
print(re.split(r'(\W+)', '192.168.1.1', 1))  # maxsplit=1 allows at most one split: ['192', '.', '168.1.1']
# A multi-line string: each part sits on its own line
str1 = '''goodjobisgood:
testisgood
welldone
'''
res1 = re.findall(r'good(.*?)done', str1)

Without the re.S flag, '.' does not match "\n", so matching happens only within each line: if a line contains no complete match, matching starts over on the next line and never crosses lines.
With the re.S flag, '.' also matches "\n", so the regular expression treats the string as a whole, handles "\n" as an ordinary character, and matches across the entire text.

res2 = re.findall(r'good(.*?)done', str1, re.S)
print(res1)  # []
print(res2)  # ['jobisgood:\ntestisgood\nwell']

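Example 2: Matching web page information

str1 = '<p>this is a herf<a href="www.baidu.com">goodjob</a></p>'
# Unnamed groups: the parts can only be retrieved by index
find = re.search(r'<a href="(.+)">(\w+)</a>', str1)
# Named groups: the same pattern, with names that groupdict() can return
find = re.search(r'<a href="(?P<url>.+)">(?P<name>\w+)</a>', str1)
print(find.groups())     # ('www.baidu.com', 'goodjob')
print(find.group(1))     # www.baidu.com
print(find.group(2))     # goodjob
print(find.groupdict())  # {'url': 'www.baidu.com', 'name': 'goodjob'}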
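
The pattern above works because the text contains a single link. When a page contains several links, a greedy (.+) can run past the first closing quote, so a non-greedy group with findall may be a better fit. The sketch below assumes a made-up snippet with two links (www.example.com is a placeholder); for real HTML, a dedicated parser is usually more robust than regular expressions.

```python
import re

# Hypothetical snippet with two links
page = '<a href="www.baidu.com">goodjob</a> and <a href="www.example.com">docs</a>'

# Greedy: (.+) extends to the last '">' in the text
greedy_url = re.search(r'<a href="(.+)">', page).group(1)

# Non-greedy: each (.+?) stops at the nearest '">'
links = re.findall(r'<a href="(.+?)">(\w+)</a>', page)

print(greedy_url)
print(links)
```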

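Example 3: Matching dates

date1 = input("Please enter the date: ")
result1 = re.match(r'^(\d{4}-\d{1,2}-\d{1,2})$', date1)
# re.match returns None when the input does not match, so check before calling group()
print(result1.group() if result1 else "Invalid date format")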

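Example 4: Matching email addresses

# Local part: 1 to 20 letters, digits, or underscores; the domain must be one of the listed providers
re_email = r'^[a-zA-Z0-9_]{1,20}@(163|162|Gmail|yahoo)\.com$'
email_address = input('Please enter email address: ')
res = re.search(re_email, email_address)
print(res)
print(email_address)
print(type(res))
# res is None when the address does not match, so check before calling group()
if res:
    print(res.group())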

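Example 5: Matching mobile phone numbers

phone = input("Please enter your mobile number: ")
# 11 digits: a leading 1, then one of 3, 5, 6, 7, 8, then 9 more digits; $ rejects trailing characters
result2 = re.match(r'1[35678]\d{9}$', phone)
print(result2.group() if result2 else "Invalid mobile number")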
