Regular expression matching comes up often in day-to-day development, for example when collecting information from devices, filtering configuration files, or picking out elements from web pages. Python's re module has a lot worth summarizing and sorting out. This article covers the functions that are used most often and the pitfalls encountered while using them.
Regular expressions
Tips from experience with regular expression matching:
Large amounts of data can be preprocessed to remove redundant symbols such as line breaks; where runs of multiple spaces appear, replace them with a single space.
When fetching data for processing, fetch only the data you need where possible. This reduces interference from other data and also improves transfer efficiency;
You can set start and end characters in the pattern to pick out multiple child elements.
Regular expression syntax is not repeated here; please refer to the online documentation for it.
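The preprocessing tip and the start/end tip above can be sketched roughly as follows. The sample strings (raw and html) are made up for illustration and are not from any real device or page.

```python
import re

# Hypothetical raw output containing line breaks and repeated spaces.
raw = "name:   eth0\r\nstate:  up\n\n  speed:   1000  "

# Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
cleaned = re.sub(r"\s+", " ", raw).strip()
print(cleaned)  # name: eth0 state: up speed: 1000

# Use a start marker (<li>) and an end marker (</li>) to pull out child elements.
html = "<ul><li>one</li><li>two</li><li>three</li></ul>"
items = re.findall(r"<li>(.*?)</li>", html)
print(items)  # ['one', 'two', 'three']
```

The non-greedy (.*?) stops at the nearest end marker, which is what keeps each child element separate.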
The re module
The re module is Python's module for regular expression matching operations.
Common re module functions:
re.search(pattern, string, flags=0)
import re
text = "Hello, World!"
re.search("[A-Z]", text)
Remarks:
pattern is the regular expression, string is the string to be searched, and flags holds the regular expression flags;
search scans the entire string for the first position where the regular expression matches and returns the corresponding match object; if nothing matches, it returns None;
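As a small sketch of the behaviour described above, the following shows what search returns on a hit and on a miss, plus one use of flags. The choice of re.IGNORECASE is only an example flag.

```python
import re

text = "Hello, World!"

m = re.search("[A-Z]", text)
if m is not None:
    # The first uppercase letter is "H" at index 0.
    print(m.group(), m.span())  # H (0, 1)

# There is no digit in the text, so search returns None.
no_digit = re.search("[0-9]", text)
print(no_digit)  # None

# flags changes matching behaviour; re.IGNORECASE ignores letter case.
m2 = re.search("world", text, flags=re.IGNORECASE)
print(m2.group())  # World
```

Because a miss returns None, it is safer to check the result before calling group() on it.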
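re.match(pattern, string, flags=0)
re.fullmatch(pattern, string, flags=0)
import re
text = "Hello,World"
re.match("[a-z]", text)  # None: the first character "H" is not lowercase
re.fullmatch(r"\S+", text)  # raw string avoids the invalid-escape warning for \S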
Remarks:
match tries the regular expression at the beginning of the string. If zero or more characters at the beginning match the pattern, it returns a match object; otherwise it returns None;
Be careful to distinguish match from search: match only checks the beginning of the string, while search checks every position in the string;
fullmatch returns a match object only if the whole string matches the regular expression; otherwise it returns None;
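The difference between the three functions is easiest to see side by side. This is a minimal comparison on the same sample text; the variable names are only for illustration.

```python
import re

text = "Hello,World"

# match only tries at position 0; "H" is not a lowercase letter.
m_match = re.match("[a-z]", text)
print(m_match)  # None

# search scans forward and finds "e" at index 1.
m_search = re.search("[a-z]", text)
print(m_search.group())  # e

# fullmatch succeeds only if the pattern consumes the whole string.
m_full = re.fullmatch(r"\S+", text)
print(m_full.group())  # Hello,World

# With a space in the string, \S+ can no longer cover everything.
m_full_space = re.fullmatch(r"\S+", "Hello, World")
print(m_full_space)  # None
```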
re.split(pattern, string, maxsplit=0, flags=0)
import re
text = "aJ33Sjd3231ssfj22323SSdjdSSSDddss"
re.split("([0-9]+)", text)
re.split("[0-9]+", text)
Remarks:
split splits string at each occurrence of the regular expression. If the pattern contains capturing parentheses, the text matched by the groups is also kept in the resulting list;
maxsplit is the maximum number of splits; once it is reached, the rest of the string is returned as the last element of the list;
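For reference, here is what the split calls above return on the sample text, along with a maxsplit example. The maxsplit=1 call is an addition for illustration.

```python
import re

text = "aJ33Sjd3231ssfj22323SSdjdSSSDddss"

# Without a group, the digit runs are dropped.
plain = re.split("[0-9]+", text)
print(plain)  # ['aJ', 'Sjd', 'ssfj', 'SSdjdSSSDddss']

# With a capturing group, the digit runs are kept as separate elements.
kept = re.split("([0-9]+)", text)
print(kept)  # ['aJ', '33', 'Sjd', '3231', 'ssfj', '22323', 'SSdjdSSSDddss']

# maxsplit=1: split once, the remainder becomes the last element.
once = re.split("[0-9]+", text, maxsplit=1)
print(once)  # ['aJ', 'Sjd3231ssfj22323SSdjdSSSDddss']
```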
re.findall(pattern, string, flags=0)
re.finditer(pattern, string, flags=0)
import re
text = "aJ33Sjd3231ssfj22323SSdjdSSSDddss"
a = re.findall("[0-9]+", text)
print(a)
b = re.finditer("[0-9]+", text)
for i in b:
    print(i.group())
Remarks:
findall scans string from left to right and returns all matches of the regular expression, in order, as a list;
finditer scans string from left to right and returns the results, in order, as an iterator that yields match objects;
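One pitfall worth a sketch: when the pattern contains groups, findall returns the groups rather than the full match. The two-group pattern below is made up for illustration. finditer is handy when you also need the position of each match.

```python
import re

text = "aJ33Sjd3231ssfj22323SSdjdSSSDddss"

# No group: findall returns the full matches.
numbers = re.findall("[0-9]+", text)
print(numbers)  # ['33', '3231', '22323']

# Two groups: findall returns a list of tuples, one entry per group.
pairs = re.findall("([a-z]+)([0-9]+)", text)
print(pairs)  # [('jd', '3231'), ('ssfj', '22323')]

# finditer yields match objects, so span() is available for each hit.
spans = [(m.group(), m.span()) for m in re.finditer("[0-9]+", text)]
print(spans)  # [('33', (2, 4)), ('3231', (7, 11)), ('22323', (15, 20))]
```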
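re.sub(pattern, repl, string, count=0, flags=0)
re.subn(pattern, repl, string, count=0, flags=0)
import re
text = "aJ33Sjd3231ssfj22323SSdjdSSSDddss"
a = re.sub("[0-9]", "*", text)
b = re.subn("[0-9]", "*", text)
print(a)
print(b)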
Remarks:
sub replaces each match in string with repl and returns the resulting string. The count parameter is the maximum number of replacements; by default all matches are replaced;
subn behaves the same as sub, but returns a tuple (new string, number of replacements).
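The results of the calls above, plus a count example, look like this. The count=2 call is added for illustration.

```python
import re

text = "aJ33Sjd3231ssfj22323SSdjdSSSDddss"

# Replace every digit with "*".
all_replaced = re.sub("[0-9]", "*", text)
print(all_replaced)  # aJ**Sjd****ssfj*****SSdjdSSSDddss

# subn also reports how many replacements were made (11 digits in total).
replaced_with_count = re.subn("[0-9]", "*", text)
print(replaced_with_count)  # ('aJ**Sjd****ssfj*****SSdjdSSSDddss', 11)

# count=2 stops after the first two replacements.
first_two = re.sub("[0-9]", "*", text, count=2)
print(first_two)  # aJ**Sjd3231ssfj22323SSdjdSSSDddss
```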
Regular expression objects
re.compile(pattern, flags=0)
import re
prog = re.compile(r'<div[\s\S]*?class="([\s\S]*?)"[\s\S]*?>')
text = '<div class="tab" >'
prog.search(text)
prog.findall(text)
Remarks:
compile turns a regular expression into a regular expression object, and matching is then done through the methods that object provides;
When the same regular expression is needed many times, calling re.compile() once and saving the resulting object for reuse can make the program more efficient;
The methods of a regular expression object mirror the common re functions described above;
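A rough sketch of the reuse pattern: compile once, then call the object's methods in a loop. The list of tags is hypothetical sample data.

```python
import re

# Compile once; the raw string keeps \s and \S as regex escapes.
prog = re.compile(r'<div[\s\S]*?class="([\s\S]*?)"[\s\S]*?>')

# Hypothetical tags to scan with the same compiled pattern.
tags = ['<div class="tab" >', '<div id="x" class="nav bar">', '<span class="tab">']

classes = []
for tag in tags:
    m = prog.search(tag)
    # group(1) is the class attribute value; None marks a tag with no match.
    classes.append(m.group(1) if m else None)

print(classes)  # ['tab', 'nav bar', None]
print(prog.findall('<div class="tab" >'))  # ['tab']
```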
Match objects
The object returned when one of the common functions or a regular expression object method finds a match is called a match object (displayed as _sre.SRE_Match in older Python versions and as re.Match in newer ones).
import re
a = "Hello, World, root"
b = re.search(r"(\w+), (\w+), (?P<name>\w+)", a)
print(b.group(0))
print(b.group(1))
print(b.groups())
print(b.groupdict())
Remarks:
group returns one or more subgroups of the match, that is, the parts captured by parentheses (); with no argument it returns the entire match;
groups returns a tuple containing all subgroups of the match;
groupdict returns a dictionary containing all named subgroups;
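For reference, these are the values the three methods return on the sample string. The calls with several indexes and with a group name are added for illustration.

```python
import re

a = "Hello, World, root"
b = re.search(r"(\w+), (\w+), (?P<name>\w+)", a)

print(b.group(0))       # Hello, World, root  (the entire match)
print(b.group(1))       # Hello
print(b.group(1, 2))    # ('Hello', 'World')  (several groups at once)
print(b.group("name"))  # root                (named group lookup)
print(b.groups())       # ('Hello', 'World', 'root')
print(b.groupdict())    # {'name': 'root'}
```

Note that the named group still counts as group 3, so it shows up in groups() as well as in groupdict().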
Regular expressions have a wide range of applications. I hope this article helps you in learning Python!