Contents
The role of regular expressions
Basic usage of the re module
1. match and search: find the first match
Basic usage of the re module: raw strings
Basic usage of the re module: match objects
Basic usage of the re module: findall
Substitution with re.sub
Basic usage of the re module: compile
Basic regular expression syntax
1. Character class []: ranges follow character encoding order
2. Negated character class
3. Alternation (or)
4. "." placeholder: matches any character except \n
5. Matching start and end: ^ and $
Shorthand character classes
Repetition
1. ? matches the previous item 0 or 1 times
2. * matches the previous item any number of times (0 to n)
3. + matches the previous item at least once
4. {n}: n is a non-negative integer; matches exactly n times
5. {n,}: n is a non-negative integer; matches at least n times
6. {n,m}: matches the previous item at least n and at most m times
Greedy and non-greedy modes
Grouping
1. Capturing groups
2. Referencing groups (backreferences)
3. Non-capturing groups (?:regex)
Example
4. Named groups
Commonly used flags
Inline flags
Assertions
1. Zero-width positive lookahead assertion
2. Zero-width negative lookahead assertion
3. Zero-width positive lookbehind assertion
4. Zero-width negative lookbehind assertion
1. Filtering text (data mining)
Specify a matching rule and check whether it occurs anywhere in a larger body of text.
2. Validation
Use a regular expression to confirm that the data obtained has the expected form.
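Both uses can be sketched in a few lines. The sample text, the six-digit rule and the helper name is_six_digits are assumptions chosen for illustration:

```python
import re

# Hypothetical sample text for illustration only.
text = "error 404 at 10:31, error 500 at 10:45, ok 200 at 11:02"

# 1. Filtering: pull every status code that follows the word "error".
error_codes = re.findall(r"error (\d{3})", text)
print(error_codes)  # ['404', '500']

# 2. Validation: check that a value is exactly six digits.
def is_six_digits(value):
    """Return True if value consists of exactly six digits."""
    return re.fullmatch(r"\d{6}", value) is not None

print(is_six_digits("410000"))  # True
print(is_six_digits("41000a"))  # False
```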
Advantages and disadvantages of regular expressions
• Advantage: improves productivity and saves code
• Disadvantage: complex and hard to read
re.search
• Finds a match anywhere in the string
• Takes a regular expression and a string, and returns the first match found
• If no match is found at all, re.search returns None
>>> import re
>>> rest=re.search(r'sanle','hello sanle')
>>> print(rest)
<_sre.SRE_Match object; span=(6, 11), match='sanle'>
>>> type(rest)
<class '_sre.SRE_Match'>
re.match
• Finds a match at the start of the string
• Takes a regular expression and a string, matches from the first character of the string, and returns the match found there
• If the string does not start with text that matches the regular expression, the match fails and re.match returns None
>>> rest=re.match(r'sanle','hello sanle')
>>> print(rest)
None
>>> type(rest)
<class 'NoneType'>
>>> rest=re.match(r'sanle','sanle sanle hello sanle')
>>> print(rest)
<_sre.SRE_Match object; span=(0, 5), match='sanle'>
>>> type(rest)
<class '_sre.SRE_Match'>
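Because both functions return None on failure, calling .group() directly on the result can raise AttributeError. A minimal sketch of guarding against that; the helper names first_match and leading_match are hypothetical:

```python
import re

def first_match(pattern, text):
    """Return the text matched by re.search, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group() if m else None

def leading_match(pattern, text):
    """Return the text matched by re.match (anchored at index 0), or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

print(first_match(r"sanle", "hello sanle"))    # sanle
print(leading_match(r"sanle", "hello sanle"))  # None
print(leading_match(r"sanle", "sanle hello"))  # sanle
```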
The r in r'sanle' stands for raw (raw string)
• Unlike a normal string, a raw string does not treat the \ character as an escape character
• Writing regular expressions as raw strings is common and useful
>>> rest=re.search('\\tsanle','hello\\tsanle')
>>> print(rest)
None
>>> rest=re.search(r'\\tsanle','hello\\tsanle')
>>> print(rest)
<_sre.SRE_Match object; span=(5, 12), match='\\tsanle'>
>>> re.search('\\\\tsanle','hello\\\\tsanle')
<_sre.SRE_Match object; span=(6, 13), match='\\tsanle'>
>>> re.search(r'\\\\tsanle','hello\\\\tsanle')
<_sre.SRE_Match object; span=(5, 13), match='\\\\tsanle'>
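The escaping levels can be checked directly. This is a small sketch; the sample path string is an assumption for illustration:

```python
import re

# A normal string interprets \t as one tab character; a raw string keeps
# the backslash and the letter t as two separate characters.
normal_len = len("\t")
raw_len = len(r"\t")
print(normal_len, raw_len)  # 1 2

# To match one literal backslash, the regex engine needs \\ in the pattern.
# With a raw string that is written as-is; without it, every backslash
# has to be doubled again for Python.
text = r"C:\temp"  # contains one literal backslash
print(re.search(r"\\temp", text).group())   # \temp
print(re.search("\\\\temp", text).group())  # \temp
```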
match.group(default=0): returns the matched string.
• group exists because a regular expression can be divided into subgroups, each of which matches only part of the whole match.
• 0 is the default argument and stands for the entire match; n stands for the n-th group
match.start()
• start returns the index in the original string where the match begins
match.end()
• end returns the index in the original string just past the end of the match
match.groups()
• groups returns a tuple containing the strings of all groups, from group 1 up to the last group
>>> msg="It's rainning cats and dogs"
>>> match=re.search(r'cats',msg)
>>> print(match)
<_sre.SRE_Match object; span=(14, 18), match='cats'>
>>> print(match.group())
cats
>>> print(match.start())
14
>>> print(match.end())
18
>>> print(match.groups())
()
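groups() is empty above because the pattern has no parentheses. A short sketch with two groups added to the same message (the pattern is an assumption for illustration):

```python
import re

msg = "It's rainning cats and dogs"
m = re.search(r"(cats) and (dogs)", msg)

print(m.group())   # cats and dogs
print(m.group(1))  # cats
print(m.groups())  # ('cats', 'dogs')
print(m.span())    # (14, 27)
# start() and end() are the slice bounds of the match in the original string.
print(msg[m.start():m.end()] == m.group())  # True
```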
findall and finditer: find all matches
re.findall
• Finds all non-overlapping matches and returns them as a list of strings
re.finditer
• Finds all non-overlapping matches and returns an iterator of match objects
>>> rest=re.findall(r'sanle','hello sanle sanlee sanlee')
>>> print(rest)
['sanle', 'sanle', 'sanle']
>>> msg="It's rainning cats and dogs"
>>> re.findall('a',msg)
['a', 'a', 'a']
>>> re.finditer('a',msg)
<callable_iterator object at 0x7f06f13bc5f8>
# Iterating over the result of finditer yields one match object per match
msg="aaaaaa"
result=re.finditer("a",msg)
for i in result:
    print(i)
    print(i.group())
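Since finditer yields match objects rather than strings, positions are available as well. A minimal sketch on the same message:

```python
import re

msg = "It's rainning cats and dogs"

# Collect (start index, matched text) for every 'a' in the message.
positions = [(m.start(), m.group()) for m in re.finditer(r"a", msg)]
print(positions)  # [(6, 'a'), (15, 'a'), (19, 'a')]
```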
re.sub(pattern, replacement, string)
• Replaces every part of string that matches pattern with replacement
print(re.sub("python","Python","I am learning python3"))
print(re.sub("python","Python","I am learning python3 python"))
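re.sub also accepts a count argument, group references in the replacement, and a function as the replacement. A brief sketch; the sample strings are assumptions for illustration:

```python
import re

text = "I am learning python3 python"

# count limits how many matches are replaced (0, the default, means all).
once = re.sub("python", "Python", text, count=1)
print(once)  # I am learning Python3 python

# The replacement may refer to captured groups with \1, \2, ...
swapped = re.sub(r"(\d{3})-(\d{4})", r"\2-\1", "173-7572")
print(swapped)  # 7572-173

# The replacement may also be a function that receives each match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 cats, 10 dogs")
print(doubled)  # 6 cats, 20 dogs
```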
Features of compiled regular expressions:
• A complex regular expression can be reused.
• A compiled pattern is more convenient to use, because the pattern argument is omitted from each call.
• The re module caches the patterns it compiles on the fly, so in most cases using compile brings little performance advantage
msg1="hello world"
msg2="i am learning python"
msg3="sanle"
print(re.findall("python",msg1))
print(re.findall("python",msg2))
print(re.findall("python",msg3))
reg = re.compile("python") # Compile regular expressions into objects
print(reg.findall(msg1))
print(reg.findall(msg2))
print(reg.findall(msg3))
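A compiled pattern also carries its flags with it, so they do not need to be repeated on every call. A small sketch, with sample lines assumed for illustration:

```python
import re

# Compile once, with the case-insensitive flag attached to the pattern.
word = re.compile(r"python", re.I)

lines = ["hello world", "I am learning Python", "python and PYTHON"]
results = [word.findall(line) for line in lines]
print(results)  # [[], ['Python'], ['python', 'PYTHON']]
```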
ret1=re.findall("python","Python on python")
print(ret1)
ret2=re.findall("[Pp]ython","Python on python")
print(ret2)
ret3=re.findall("[A-Za-z0-9-]","abc123ABCD--")
print(ret3)
ret4=re.findall("[a-zA-Z0-9-]","abc123ABCD--")
print(ret4)
# A-z follows encoding order, so it also covers [ \ ] ^ _ ` (the characters between Z and a)
ret5=re.findall(r"[A-z0-9\-]","abc123ABCD--\\")
print(ret5)
The output is as follows
['python']
['Python', 'python']
['a', 'b', 'c', '1', '2', '3', 'A', 'B', 'C', 'D', '-', '-']
['a', 'b', 'c', '1', '2', '3', 'A', 'B', 'C', 'D', '-', '-']
['a', 'b', 'c', '1', '2', '3', 'A', 'B', 'C', 'D', '-', '-', '\\']
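The last result includes the backslash because a range follows code point order, and [A-z] spans more than letters. A quick check:

```python
import re

# [A-z] spans code points 65 ('A') through 122 ('z'), which also covers
# the six characters between 'Z' and 'a'.
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

loose = re.findall(r"[A-z]", "a_Z^9")
strict = re.findall(r"[A-Za-z]", "a_Z^9")
print(loose)   # ['a', '_', 'Z', '^']
print(strict)  # ['a', 'Z']
```

Writing [A-Za-z] avoids the accidental matches.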
ret6=re.findall("[^A-Z]c","Ac111crc#c")
print(ret6)
ret7=re.findall("[^A-Z][0-9]","Ac121crc#c")
print(ret7)
The output is as follows
['1c', 'rc', '#c']
['c1', '21']
msg="welcome to changsha,welcome to hunan"
rest=re.findall("changsha|hunan",msg)
print(rest)
The output is as follows
['changsha', 'hunan']
rest2=re.findall("p.thon","Pythonpthon p thon p-thon p\nthon")
print(rest2)
The output is as follows
['p thon', 'p-thon']
rest3=re.findall("^python","python hello pyth3on1")
print(rest3)
rest4=re.findall("python$","pyth3on hello python")
print(rest4)
The output is as follows
['python']
['python']
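A minimal sketch of the shorthand character classes \d (digit), \w (word character), \s (whitespace) and the negation \D, all standard re syntax. The sample strings are assumptions for illustration:

```python
import re

msg = "tel: 173-7572-2991, id_01"

digits = re.findall(r"\d+", msg)
words = re.findall(r"\w+", msg)
spaces = re.findall(r"\s", msg)
non_digits = re.findall(r"\D+", "a1b22c")

print(digits)      # ['173', '7572', '2991', '01']
print(words)       # ['tel', '173', '7572', '2991', 'id_01']
print(spaces)      # [' ', ' ']
print(non_digits)  # ['a', 'b', 'c']
```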
ret=re.findall("py?","python p pyy ps")
print(ret)
The output is as follows
['py', 'p', 'py', 'p']
ret=re.findall("py*","python p pyy ps")
print(ret)
The output is as follows
['py', 'p', 'pyy', 'p']
ret=re.findall("py+","python p pyy ps")
print(ret)
The output is as follows
['py', 'pyy']
ret=re.findall("py{2}","python p pyy ps pyyyy")
print(ret)
The output is as follows
['pyy', 'pyy']
ret=re.findall("py{2,}","python p pyy ps pyyyy")
print(ret)
The output is as follows
['pyy', 'pyyyy']
ret=re.findall("py{2,4}","python p pyy ps pyyyy")
print(ret)
The output is as follows
['pyy', 'pyyyy']
Greedy mode: the quantifiers * + ? and {n,m} are greedy; they match as long a string as possible
Non-greedy mode: adding ? after a quantifier makes it match as short a string as possible (+? *? ?? {2,4}?)
msg="helloooooo,I am sanchuang,123"
print(re.findall("lo{3,}",msg))
print(re.findall("lo{3,}?",msg))
print(re.findall("lo*?",msg))
print(re.findall("lo?",msg))
print(re.findall("lo??",msg))
msg="cats and dogs , cats1 and dog1"
print(re.findall("cats.*s",msg))
print(re.findall("cats.*?s",msg))
The output is as follows
['loooooo']
['looo']
['l', 'l']
['l', 'lo']
['l', 'l']
['cats and dogs , cats']
['cats and dogs']
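The difference matters most when a delimiter appears more than once. A common illustration, using a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last '>' in the string.
greedy = re.findall(r"<.*>", html)
# Non-greedy: .*? stops at the first '>' it can.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```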
With grouping, besides the whole match you can also retrieve each individual group. Groups are written with ().
The match object's group function takes 0 as its default argument, which returns the entire matched string. An argument n (n>0) returns the content matched by group n.
msg="tel:173-7572-2991"
ret=re.search(r"(\d{3})-(\d{4})-(\d{4})",msg)
# ret1=re.search(r"\d{3}-\d{4}-\d{4}",msg)
print(ret.groups())
print(ret.group())
print(ret.group(1))
print(ret.group(2))
print(ret.group(3))
The output is as follows
('173', '7572', '2991')
173-7572-2991
173
7572
2991
Capturing groups: after grouping, the matched text is kept in memory and given an index starting from 1, so capturing groups can be referenced later in the pattern with \1, \2
ret = re.search(r"(\d{3})-(\d{4})-\2","173-7572-7572")
print(ret.group())
ret = re.search(r"(\d{3})-(\d{4})-\1","173-7572-173")
print(ret.group())
The output is as follows
173-7572-7572
173-7572-173
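A typical use of a backreference is spotting an accidentally repeated word. A minimal sketch with an assumed sample sentence:

```python
import re

text = "this is is a test of the the pattern"

# \b(\w+) captures a whole word; \s+\1\b requires the same word to follow.
repeated = re.findall(r"\b(\w+)\s+\1\b", text)
print(repeated)  # ['is', 'the']
```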
A non-capturing group only groups: its matched content is not stored and it receives no number, so it cannot be the target of a backreference. In the example below, \1 therefore refers to (\d{4}), the first capturing group.
ret = re.search(r"(?:\d{3})-(\d{4})-\1","173-7572-7572")
print(ret.group(1))
The output is as follows
7572
If the pattern contains capturing groups, findall returns only the content of the captured groups
ret = re.findall(r"(?:\d{3})-(\d{4})-\1","173-7572-7572")
print(ret)
The output is as follows
['7572']
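How findall shapes its result depends on the number of capturing groups. A short sketch; the second phone number is made up for illustration:

```python
import re

text = "173-7572-7572 and 186-1234-1234"

# No groups: findall returns whole matches.
whole = re.findall(r"\d{3}-\d{4}-\d{4}", text)
print(whole)  # ['173-7572-7572', '186-1234-1234']

# One capturing group: findall returns only that group.
one_group = re.findall(r"(?:\d{3})-(\d{4})-\1", text)
print(one_group)  # ['7572', '1234']

# Several capturing groups: findall returns a list of tuples.
two_groups = re.findall(r"(\d{3})-(\d{4})-\2", text)
print(two_groups)  # [('173', '7572'), ('186', '1234')]
```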
Find the email addresses at 126.com, qq.com and 163.com in the string msg below.
Code implementation
msg="[email protected]@[email protected]@163.com"
print(re.findall(r"(?:\.com)?(\[email protected](?:126|qq|163)\.com)",msg))
The output is as follows
['[email protected]', '[email protected]', '[email protected]']
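The same idea in a self-contained form, using made-up addresses. The sample string, the names in it and the simplified pattern are assumptions for illustration, not the data used above:

```python
import re

# Hypothetical addresses; only three of the four domains should be kept.
msg = "alice@126.com bob@qq.com carol@sina.com dave@163.com"

# The domain alternatives sit in a non-capturing group, so findall
# returns the whole address instead of just the domain part.
found = re.findall(r"\w+@(?:126|qq|163)\.com", msg)
print(found)  # ['alice@126.com', 'bob@qq.com', 'dave@163.com']
```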
import re
ret=re.search(r'(?P<first>\d{3})-\d{3}-(?P<last>\d{3})',"321-123-231")
print(ret.group())
print(ret.groups())
print(ret.groupdict())
ret=re.findall(r'(?P<first>\d{3})-\d{3}-(?P<last>\d{3})',"321-123-231")
print(ret)
The output is as follows
321-123-231
('321', '231')
{'first': '321', 'last': '231'}
[('321', '231')]
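Named groups can also be read by name, referenced inside the pattern with (?P=name), and used in a replacement with \g<name>. A minimal sketch; the group names area, number and word are chosen for illustration:

```python
import re

pattern = re.compile(r"(?P<area>\d{3})-(?P<number>\d{4})")
m = pattern.search("call 173-7572 today")

print(m.group("area"))  # 173
print(m["number"])      # 7572 (indexing a match object needs Python 3.6+)

# (?P=name) is a named backreference inside the pattern.
dup = re.search(r"(?P<word>\w+) (?P=word)", "hello hello world").group()
print(dup)  # hello hello

# \g<name> refers to a named group in a replacement string.
swapped = pattern.sub(r"\g<number>-\g<area>", "call 173-7572 today")
print(swapped)  # call 7572-173 today
```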
re.I (re.IGNORECASE): makes matching case-insensitive
re.M (re.MULTILINE): multi-line matching; affects ^ and $
re.S (re.DOTALL): makes . match every character, including line breaks
import re
ret=re.findall("^python$","Python",re.I)
print(ret)
ret=re.findall("^python$","Python\npython",re.I)
print(ret)
ret=re.findall("^python$","Python\npython",re.I|re.M)
print(ret)
The output is as follows
['Python']
[]
['Python', 'python']
# Case-insensitive and multi-line matching
msg="""
python
python
Python
"""
print(re.findall("^python$",msg,re.M|re.I))
print(re.findall(".+",msg,re.S))
The output is as follows
['python', 'python', 'Python']
['\npython\npython\nPython\n']
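The effect of each flag is easiest to see by running the same pattern with and without it. A small sketch on an assumed two-line string:

```python
import re

msg = "first line\nsecond line"

dot_default = re.findall(r".+", msg)        # . stops at the newline
dot_all = re.findall(r".+", msg, re.S)      # . also matches the newline
start_default = re.findall(r"^\w+", msg)    # ^ matches only at the string start
start_multi = re.findall(r"^\w+", msg, re.M)  # ^ matches at every line start

print(dot_default)    # ['first line', 'second line']
print(dot_all)        # ['first line\nsecond line']
print(start_default)  # ['first']
print(start_multi)    # ['first', 'second']
```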
(?imx) Turns on one or more of the optional flags i, m and x for the whole regular expression; it should be placed at the start of the pattern.
(?imx:re) Turns on the flags i, m or x only for the sub-expression re inside the parentheses.
import re
ret=re.findall("(?i)^python$","Python")
print(ret)
ret=re.findall("(?i)^python$","Python\npython")
print(ret)
ret=re.findall("(?im)^python$","Python\npython")
print(ret)
The output is as follows
['Python']
[]
['Python', 'python']
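An inline flag can also apply to just one part of the pattern: (?i:...) affects only the expression inside the parentheses. In the examples below, hello is matched case-insensitively, while the rest of the pattern (including the literal space) stays case-sensitive.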
ret=re.findall("(?i:hello) Python","Hello python")
print(ret)
ret=re.findall("(?i:hello) python","Hello python")
print(ret)
The output is as follows
[]
['Hello python']
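The x flag can be used inline in the same way. A minimal sketch of (?x) verbose mode, reusing the phone number from the grouping example; the variable name phone is an assumption:

```python
import re

# (?x) turns on verbose mode: whitespace and # comments inside the
# pattern are ignored, which helps document longer expressions.
phone = re.compile(r"""(?x)
    (\d{3})  # first block
    -
    (\d{4})  # second block
    -
    (\d{4})  # third block
""")

parts = phone.search("tel:173-7572-2991").groups()
print(parts)  # ('173', '7572', '2991')
```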
Regular expression assertions are divided into lookahead assertions and lookbehind assertions. Together they come in 4 forms:
• (?=pattern) zero-width positive lookahead assertion
• (?!pattern) zero-width negative lookahead assertion
• (?<=pattern) zero-width positive lookbehind assertion
• (?<!pattern) zero-width negative lookbehind assertion
import re
s='a reguler expression'
print(re.findall(r're(?=guler)',s))
s='a reguller expression'
print(re.findall(r're(?=guler)',s))
The output is as follows
['re']
[]
import re
s='a reguler expression'
print(re.findall(r're(?!guler)',s))
s='a reguller expression'
print(re.findall(r're(?!guler)',s))
The output is as follows
['re']
['re', 're']
import re
s='a reguler expression'
print(re.findall(r'(?<=re)guler',s))
s='a reguller expression'
print(re.findall(r'(?<=re)guler',s))
The output is as follows
['guler']
[]
import re
s='a reguler expression'
print(re.findall(r'(?<!re)guler',s))
s='a reguller expression'
print(re.findall(r'(?<!re)expression',s))
The output is as follows
[]
['expression']
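Because assertions consume no characters, they are handy for extracting a value based on its surroundings without including them in the result. A short sketch with a made-up price list:

```python
import re

text = "apple $30, pear 45, plum $7"

# Positive lookbehind: digits that are preceded by a dollar sign.
priced = re.findall(r"(?<=\$)\d+", text)
# Negative lookbehind: digits not preceded by '$' or by another digit.
unpriced = re.findall(r"(?<![\$\d])\d+", text)
# Positive lookahead: words directly followed by " $".
items = re.findall(r"\w+(?= \$)", text)

print(priced)    # ['30', '7']
print(unpriced)  # ['45']
print(items)     # ['apple', 'plum']
```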