Python's re module (regular expressions) provides a wide range of regular-expression matching operations.
It is a very useful tool for text analysis, parsing complex strings and extracting information. The following is a summary of the re module's most common features.
\d: Matches any decimal digit 0-9 (for str patterns, other Unicode digits as well)
\D: Matches any character that is not a digit, including the underscore
\s: Matches any whitespace character (space, tab, newline, etc.)
\S: Matches any non-whitespace character, including the underscore
\w: Matches any "word" character: letters (including Chinese characters), digits 0-9 and the underscore
\W: Matches any character that \w does not match; the underscore is therefore NOT matched
$: Matches the end of the string (with re.M, the end of each line); normally placed at the end of the pattern
^: Matches the start of the string (with re.M, the start of each line); normally placed at the start of the pattern
*: The preceding character may appear 0 or more times (0 to unlimited) (greedy matching)
+: The preceding character may appear 1 or more times (1 to unlimited) (greedy matching)
?: The preceding character may appear 0 or 1 times; placed after another quantifier (*?, +?, ??, {n,m}?), it switches that quantifier from "greedy" to "non-greedy" (lazy) mode
Note: .* is greedy (matches as much as possible); .*? is non-greedy (matches as little as possible)
.: Matches any single character except the newline "\n"
|: Matches either the expression on its left or the expression on its right
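The anchors, quantifiers, dot and alternation above can be tried side by side in a short sketch. The sample strings are made up; re.fullmatch (which requires the whole string to match) is used here only to make the "how many times" checks unambiguous.

```python
import re

s = "aaa-bbb\nccc"
first_word = re.findall(r"^\w+", s)   # ^ = start of the whole string
last_word = re.findall(r"\w+$", s)    # $ = end of the whole string

star_ok = re.fullmatch(r"ab*", "a") is not None        # * allows zero b's
plus_ok = re.fullmatch(r"ab+", "a") is not None        # + needs at least one b
optional = re.fullmatch(r"colou?r", "color") is not None  # ? makes the u optional

html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)   # .* runs to the LAST </b>
lazy = re.findall(r"<b>.*?</b>", html)    # .*? stops at the FIRST </b>

dot_newline = re.findall(r"b.c", "b\nc")         # . does not match "\n"
either = re.findall(r"cat|dog", "cat and dog")   # | tries both alternatives

print(first_word, last_word)       # ['aaa'] ['ccc']
print(star_ok, plus_ok, optional)  # True False True
print(greedy)                      # ['<b>one</b><b>two</b>']
print(lazy)                        # ['<b>one</b>', '<b>two</b>']
print(dot_newline, either)         # [] ['cat', 'dog']
```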
[ ]: Defines a character set; there are three common cases:
[abc]: Matches a single character that is a, b or c
[a-z0-9]: Matches a single character within the given ranges; put ^ first to negate the set, e.g. [^a-z0-9]
[2-9][1-3]: Sets can be combined to match consecutive characters, here a digit 2-9 followed by a digit 1-3
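The three kinds of character set can be checked with re.findall; the input strings below are invented for the demonstration.

```python
import re

single = re.findall(r"[abc]", "a1b2d3")             # one character from the set
ranged = re.findall(r"[a-z0-9]+", "Ab-12_cd")       # ranges; uppercase A is outside a-z
negated = re.findall(r"[^a-z0-9]+", "Ab-12_cd")     # ^ first: everything NOT in the ranges
combined = re.findall(r"[2-9][1-3]", "21 93 15 42")  # two sets in a row = two characters

print(single)    # ['a', 'b']
print(ranged)    # ['b', '12', 'cd']
print(negated)   # ['A', '-', '_']
print(combined)  # ['21', '93', '42']
```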
{ }: Specifies how many times the preceding character may repeat:
{n,m}: The preceding character appears at least n times and at most m times
{n,}: The preceding character appears at least n times, with no upper limit
{,m}: The preceding character appears at most m times (and at least 0 times)
{n}: The preceding character must appear exactly n times
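A sketch of the four brace forms: the helper `accepted` is hypothetical and simply reports which sample strings a pattern matches in full (via re.fullmatch).

```python
import re

samples = ["", "a", "aa", "aaa", "aaaa"]

def accepted(pattern):
    """Hypothetical helper: return the samples the whole pattern matches."""
    return [s for s in samples if re.fullmatch(pattern, s)]

between = accepted(r"a{2,3}")   # 2 to 3 times
at_least = accepted(r"a{2,}")   # 2 or more times
at_most = accepted(r"a{,2}")    # 0 to 2 times
exactly = accepted(r"a{2}")     # exactly 2 times

print(between)   # ['aa', 'aaa']
print(at_least)  # ['aa', 'aaa', 'aaaa']
print(at_most)   # ['', 'a', 'aa']
print(exactly)   # ['aa']
```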
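If the pattern must match a literal backslash, the backslash has to be escaped as \\ in the regular expression; writing the pattern as a raw string (r'...') avoids having to escape it a second time for Python.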
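A minimal sketch of the escaping rule, using a made-up Windows-style path: in an ordinary string literal the regex \\ has to be typed as "\\\\", while a raw string needs only r"\\".

```python
import re

path = "C:\\temp\\new"   # the actual text is C:\temp\new (two single backslashes)

plain = re.split("\\\\", path)   # ordinary literal: "\\\\" -> regex \\ -> one literal backslash
raw = re.split(r"\\", path)      # raw literal: the same regex with half the typing

print(len(path))  # 11
print(plain)      # ['C:', 'temp', 'new']
print(raw)        # ['C:', 'temp', 'new']
```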
(): Groups part of the pattern so that the matched content can be extracted quickly. In a regular expression "()" denotes a group: each pair of parentheses is one group, and the group captures only the text matched inside its "()".
group: Returns the text matched by the specified group (group() or group(0) returns the whole match)
groups: Returns a tuple containing the text matched by all groups
groupdict: Returns a dictionary of name/value pairs; the groups must be named with (?P<name>...)
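match: Tries to match only at the beginning of the target string
search: Scans the whole target string and returns the first match
findall: Scans the whole target string and returns a list of all non-overlapping substrings that match; if nothing matches, it returns an empty list (if the pattern contains groups, the list holds the group contents instead)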
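The difference between the three functions shows up clearly on a string that does not start with the text being searched for. The sample string is invented; the last findall call illustrates that a pattern with groups makes findall return tuples of groups.

```python
import re

s = "tel: 010-1234, 021-5678"

at_start = re.match(r"\d+", s)          # None: s does not begin with a digit
first = re.search(r"\d+", s).group()    # first match anywhere in s
every = re.findall(r"\d+", s)           # all non-overlapping matches
pairs = re.findall(r"(\d+)-(\d+)", s)   # with groups, findall returns group tuples
nothing = re.findall(r"x", s)           # no match -> empty list

print(at_start)  # None
print(first)     # 010
print(every)     # ['010', '1234', '021', '5678']
print(pairs)     # [('010', '1234'), ('021', '5678')]
print(nothing)   # []
```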
split:
re.split(pattern, string, maxsplit=0, flags=0)
pattern.split(string, maxsplit=0)  (method of a compiled pattern)
Effect: Splits the string at every part that matches the regular expression and returns the pieces as a list; maxsplit limits the number of splits
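The second form, pattern.split, is called on a pattern compiled with re.compile. A sketch with a made-up separator pattern (comma or semicolon followed by optional spaces), shown with and without maxsplit:

```python
import re

sep = re.compile(r"[,;]\s*")   # a comma or semicolon plus any spaces after it

parts = sep.split("a, b;c,  d")
limited = sep.split("a, b;c,  d", maxsplit=2)   # stop after two splits

print(parts)    # ['a', 'b', 'c', 'd']
print(limited)  # ['a', 'b', 'c,  d']
```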
The available flags include:
re.I: Ignore case
re.L: Make \w, \W, \b, \B and case-insensitive matching depend on the current locale (in Python 3 this flag can only be used with bytes patterns)
re.M: Multi-line mode: ^ and $ also match at the start and end of every line
re.S: Make '.' match any character, including the newline (by default '.' does not match a newline)
re.U: Make \w, \W, \b, \B, \d, \D, \s, \S depend on the Unicode character property database (already the default for str patterns in Python 3)
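A sketch of re.I, re.M and re.S on a made-up three-line string; flags can be combined with the | operator, as in the fifth call.

```python
import re

text = "Apple\nbanana\nAPPLE pie"

no_flag = re.findall(r"apple", text)                  # case-sensitive: nothing
ignore_case = re.findall(r"apple", text, re.I)
one_start = re.findall(r"^\w+", text)                 # ^ only at the string start
line_starts = re.findall(r"^\w+", text, re.M)         # ^ at every line start
combined = re.findall(r"^a\w*", text, re.I | re.M)    # flags combined with |
no_dotall = re.findall(r"Apple.banana", text)         # . stops at "\n"
dotall = re.findall(r"Apple.banana", text, re.S)      # . also matches "\n"

print(no_flag)      # []
print(ignore_case)  # ['Apple', 'APPLE']
print(one_start)    # ['Apple']
print(line_starts)  # ['Apple', 'banana', 'APPLE']
print(combined)     # ['Apple', 'APPLE']
print(no_dotall)    # []
print(dotall)       # ['Apple\nbanana']
```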
Before using regular expressions in Python, first import the re module:
import re
For example (backreferences):
'(\d)(a)\1' means: the first character is a digit, the second is the letter a, and the third, \1, must repeat exactly the digit matched by the first group, i.e. the first group is referenced once more.
For example, "9a9" matches, but "9a8" does not, because \1 can only be 9 here.
'(\d)(a)\2' means: the first character is a digit, the second is a, and the third, \2, must be the same as what the second group () matched.
For example, "8aa" matches, but "8ab" and "7a7" do not: the third character must be a copy of the second group's match, i.e. it refers to the content matched by the second group.
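The four examples above can be verified directly. This sketch uses re.fullmatch rather than re.match so that the whole three-character string must fit the pattern; with re.match, a longer input such as "9a9x" would also be accepted.

```python
import re

same_digit = {s: bool(re.fullmatch(r"(\d)(a)\1", s)) for s in ["9a9", "9a8"]}
same_letter = {s: bool(re.fullmatch(r"(\d)(a)\2", s)) for s in ["8aa", "8ab", "7a7"]}

print(same_digit)   # {'9a9': True, '9a8': False}
print(same_letter)  # {'8aa': True, '8ab': False, '7a7': False}
```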
print(re.match(r'(\w{3}).', "abceeeabc456abc789").group())    # abce: three word characters plus one more character
print(re.match(r'(\w{3}).*', "abceeeabc456abc789").group())   # greedy .*: abceeeabc456abc789
print(re.match(r'(\w{3}).*?', "abceeeabc456abc789").group())  # non-greedy .*?: abc
print(re.search(r'(\d{3})', "abceeeabc456abc789").group())    # 456
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").groups())  # ('abc', '456', 'abc'); \1 repeats group 1
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(1))  # abc
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(2))  # 456
print(re.search(r'(\w{3})(\d+)(\1)', "abceeeabc456abc789abc").group(3))  # abc
print(re.search(r'(\w{3})(\d+)(\2)', "abceeeabcs456456abc456789abc").groups())  # ('bcs', '456', '456'); \2 repeats group 2
print(re.search(r'(\w{3})(\d+)(\2)', "abceeeabcs456456abc456789abc").group(1))  # bcs
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789abc").group(1))  # abc
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789abc").group(2))  # abc
print(re.search(r'(\w{3})(.*?)(\2)', "abceeeabc456abc789").group())    # abc: group 2 may be empty, so \2 matches the empty string
print(re.search(r'(\w{3}).*?(\1)', "abceeeabc456abc789").group(1, 2))  # ('abc', 'abc')
print(re.findall(r'\d+', 'one11two22three33four44'))  # ['11', '22', '33', '44']
print(re.split(r'\W+', '192.168.1.1'))  # \W+ matches runs of characters other than letters, digits and underscore (here the dots): ['192', '168', '1', '1']
print(re.split(r'(\W+)', '192.168.1.1'))  # with a capturing group the separators are kept too: ['192', '.', '168', '.', '1', '.', '1']
print(re.split(r'(\W+)', '192.168.1.1', maxsplit=1))  # maxsplit=1: split at most once: ['192', '.', '168.1.1']
str1 = '''goodjobisgood:
testisgood
welldone'''
res1 = re.findall(r'good(.*?)done', str1)
Without the re.S flag, '.' does not match "\n", so a match cannot cross a line break: each line is effectively matched on its own, and matching starts over on the next line.
With re.S, '.' also matches "\n", so the regular expression treats the string as one whole and a match may span several lines.
res2 = re.findall(r'good(.*?)done', str1, re.S)
print(res1)  # []
print(res2)  # ['jobisgood:\ntestisgood\nwell']
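str1 = '<p>this is a herf<a href="www.baidu.com">goodjob</a></p>'
find = re.search(r'<a href="(.+)">(\w+)</a>', str1)  # unnamed groups
find = re.search(r'<a href="(?P<url>.+)">(?P<name>\w+)</a>', str1)  # named groups: (?P<name>...)
print(find.groups())     # ('www.baidu.com', 'goodjob')
print(find.group(1))     # www.baidu.com
print(find.group(2))     # goodjob
print(find.groupdict())  # {'url': 'www.baidu.com', 'name': 'goodjob'}; only named groups appear here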
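date1 = input("Please enter a date (e.g. 2023-5-17): ")
result1 = re.match(r'^(\d{4}-\d{1,2}-\d{1,2})$', date1)
if result1:  # match() returns None when the input does not fit the pattern
    print(result1.group())
else:
    print("Invalid date format")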
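re_email = r'^[a-zA-Z0-9_]{0,20}@(163|162|Gmail|yahoo)\.com'
email_address = input('Please enter an email address: ')
res = re.search(re_email, email_address)
print(res)            # a Match object, or None if nothing matched
print(email_address)
print(type(res))      # re.Match, or NoneType when there is no match
if res:               # calling .group() on None would raise AttributeError
    print(res.group())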
phone = input("Please enter your mobile number: ")
result2 = re.match(r'1[35678]\d{9}$', phone)  # $ rejects input longer than 11 digits
if result2:
    print(result2.group())
else:
    print("Invalid mobile number")
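Because input() makes the examples above hard to test, here is a sketch of the same date and mobile-number checks as plain functions. The names DATE_RE, PHONE_RE, is_date and is_phone are hypothetical, and re.fullmatch is swapped in as an alternative to writing ^...$ by hand: re.match alone only anchors the start, so without $ it would accept a 12-digit number.

```python
import re

DATE_RE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")   # same pattern as the date example
PHONE_RE = re.compile(r"1[35678]\d{9}")          # same pattern as the phone example

def is_date(text):
    """Format check only: impossible dates such as 2023-13-45 still pass."""
    return DATE_RE.fullmatch(text) is not None

def is_phone(text):
    """True only if the whole text is an 11-digit number of the given form."""
    return PHONE_RE.fullmatch(text) is not None

print(is_date("2023-5-17"), is_date("2023/5/17"))          # True False
print(is_phone("13812345678"), is_phone("138123456789"))   # True False
print(bool(re.match(r"1[35678]\d{9}", "138123456789")))    # True: match() ignores the extra digit
```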
Welcome to follow the WeChat official account: The way of immeasurable testing