Python | Regular expressions in one article

Editor: Python

Contents

The role of regular expressions
Basic usage of the re module
1. match and search: find the first match
Basic usage of the re module: raw strings
Basic usage of the re module: match objects
Basic usage of the re module: findall
Regex substitution
Basic usage of the re module: compile
Basic regex syntax
1. Character classes []: ranges follow character-code order
2. Negated character classes
3. Alternation (or)
4. The "." placeholder: any character except \n
5. Matching the start and end: ^ and $
Shorthand character classes
Repetition
1. ? matches the previous item 0 or 1 times
2. * matches the previous item any number of times (0 to n)
3. + matches the previous item at least once
4. {n} matches exactly n times, where n is a non-negative integer
5. {n,} matches at least n times, where n is a non-negative integer
6. {n,m} matches the previous item at least n and at most m times
Greedy and non-greedy modes
Grouping
1. Capturing groups
2. Backreferences to groups
3. Non-capturing groups (?:regex)
Example
4. Named groups
Commonly used flags
Inline flags
Assertions
1. Zero-width positive lookahead assertion
2. Zero-width negative lookahead assertion
3. Zero-width positive lookbehind assertion
4. Zero-width negative lookbehind assertion
The role of regular expressions

1. Filtering text (data mining)
Specify a matching rule and check whether text that fits the rule appears in a larger string.
2. Validation
Use a regular expression to confirm that the data you obtained has the expected form.

Advantages and disadvantages of regular expressions
• Advantage: they improve productivity and save code
• Disadvantage: they are complex and hard to read
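
The two uses listed above can be sketched in a few lines. This is only an illustration: the log text, the six-digit rule, and the helper name is_six_digit_code are assumptions made up for the example, not part of the original article.

```python
import re

log_text = "ok 200\nerror 500\nok 200\nerror 404"

# 1. Filtering: pull every three-digit code that follows the word "error".
error_codes = re.findall(r"error (\d{3})", log_text)
print(error_codes)  # ['500', '404']

# 2. Validation: accept the value only if the whole string is six digits.
def is_six_digit_code(value):
    return re.fullmatch(r"\d{6}", value) is not None

print(is_six_digit_code("430000"))  # True
print(is_six_digit_code("43000a"))  # False
```

re.fullmatch is used for the validation case because it succeeds only when the entire string fits the pattern.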

Basic usage of the re module

1. match and search: find the first match

re.search
• Finds a match anywhere in the string
• Takes a regular expression and a string, and returns the first match found.
• If no match is found at all, re.search returns None

>>> import re
>>> rest=re.search(r'sanle','hello sanle')
>>> print(rest)
<_sre.SRE_Match object; span=(6, 11), match='sanle'>
>>> type(rest)
<class '_sre.SRE_Match'>

re.match
• Finds a match at the start of the string
• Takes a regular expression and a string, starts matching at the first character of the string, and returns the match found there.
• If the string does not begin with text that matches the regular expression, the match fails and re.match returns None

>>> rest=re.match(r'sanle','hello sanle')
>>> print(rest)
None
>>> type(rest)
<class 'NoneType'>
>>> rest=re.match(r'sanle','sanle sanle hello sanle')
>>> print(rest)
<_sre.SRE_Match object; span=(0, 5), match='sanle'>
>>> type(rest)
<class '_sre.SRE_Match'>
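
Because both functions return None on failure, calling a method on the result without checking it first raises an AttributeError. A minimal sketch of a safe wrapper follows; the helper name first_match is hypothetical.

```python
import re

def first_match(pattern, text):
    """Return the matched text, or None when re.search finds nothing."""
    m = re.search(pattern, text)
    return m.group() if m else None

print(first_match(r'sanle', 'hello sanle'))  # sanle
print(first_match(r'sanle', 'hello world'))  # None

# re.match only succeeds at index 0; re.search scans the whole string.
print(re.match(r'sanle', 'hello sanle') is None)   # True
print(re.search(r'sanle', 'hello sanle').start())  # 6
```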

Basic usage of the re module: raw strings

The r in r'sanle' stands for raw (raw string)

• The difference between a raw string and a normal string is that a raw string does not treat the \ character as an escape character

• Writing regular expressions as raw strings is common and useful

>>> rest=re.search('\\tsanle','hello\\tsanle')
>>> print(rest)
None
>>> rest=re.search(r'\\tsanle','hello\\tsanle')
>>> print(rest)
<_sre.SRE_Match object; span=(5, 12), match='\\tsanle'>
>>> re.search('\\\\tsanle','hello\\\\tsanle')
<_sre.SRE_Match object; span=(6, 13), match='\\tsanle'>
>>> re.search(r'\\\\tsanle','hello\\\\tsanle')
<_sre.SRE_Match object; span=(5, 13), match='\\\\tsanle'>
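
The backslash counting in the session above is easier to follow with a shorter case. The sketch below uses a made-up Windows-style path as the text to search.

```python
import re

# In a normal string '\t' is one tab character; in a raw string it is two characters.
print(len('\t'), len(r'\t'))  # 1 2

# The regex engine must see \\ to match one literal backslash,
# so the normal-string spelling has to double every backslash again.
path = 'C:\\temp'  # the text C:\temp
normal = re.search('\\\\temp', path).group()
raw = re.search(r'\\temp', path).group()
print(normal)  # \temp
print(raw)     # \temp
```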

Basic usage of the re module: match objects

match.group(default=0): returns the matched string.

• It is called group because a regular expression can be divided into several subgroups, each of which picks out a subset of the match.

• 0 is the default argument and stands for the whole matched string; n stands for the nth group

match.start()

• The start method gives the index in the original string where the match starts

match.end()

• The end method gives the index in the original string where the match ends (one past the last matched character)

match.groups()

• groups returns a tuple containing the strings of all groups, from group 1 up to the last group

>>> msg="It's rainning cats and dogs"
>>> match=re.search(r'cats',msg)
>>> print(match)
<_sre.SRE_Match object; span=(14, 18), match='cats'>
>>> print(match.group())
cats
>>> print(match.start())
14
>>> print(match.end())
18
>>> print(match.groups())
()
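
groups() came back empty above because the pattern has no parentheses. A small variation on the same message, with two groups added for illustration, shows the other methods returning something more interesting.

```python
import re

msg = "It's rainning cats and dogs"
m = re.search(r'(cats) and (dogs)', msg)

print(m.group())    # cats and dogs
print(m.group(2))   # dogs
print(m.groups())   # ('cats', 'dogs')
print(m.span())     # (14, 27), the same as (m.start(), m.end())

# Slicing the original string with start() and end() gives back the match.
print(msg[m.start():m.end()])  # cats and dogs
```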

Basic usage of the re module: findall

findall and finditer: find multiple matches

re.findall

• Finds all matching strings and returns them as a list

re.finditer

• Finds all matching strings and returns an iterator of match objects

>>> rest=re.findall(r'sanle','hello sanle sanlee sanlee')
>>> print(rest)
['sanle', 'sanle', 'sanle']
>>> msg="It's rainning cats and dogs"
>>> re.findall('a',msg)
['a', 'a', 'a']
>>> re.finditer('a',msg)
<callable_iterator object at 0x7f06f13bc5f8>
# msg="aaaaaa"
# result=re.finditer("a",msg)
# for i in result:
#     print(i)
#     print(i.group())
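
As a runnable version of the idea in the commented lines, the sketch below loops over finditer on the earlier message. It shows what finditer adds over findall: each item is a match object, so the position is available as well as the text.

```python
import re

msg = "It's rainning cats and dogs"

# findall keeps only the text; finditer keeps the position as well.
for m in re.finditer('a', msg):
    print(m.start(), m.group())

positions = [m.start() for m in re.finditer('a', msg)]
print(positions)  # [6, 15, 19]
```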

Regex substitution

re.sub(pattern, replacement, string)
• Replaces the parts of string that match the pattern with the new content

print(re.sub("python","Python","I am learning python3"))
print(re.sub("python","Python","I am learning python3 python"))
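
The two calls above print the replaced strings. The sketch below repeats the second one with its output, and adds three standard re features that often go with sub: the count argument, a group reference in the replacement, and re.subn. The date string is a made-up example.

```python
import re

text = "I am learning python3 python"

replaced_all = re.sub("python", "Python", text)
replaced_one = re.sub("python", "Python", text, count=1)
print(replaced_all)  # I am learning Python3 Python
print(replaced_one)  # I am learning Python3 python

# \1 and \2 in the replacement refer to the capture groups of the pattern.
swapped = re.sub(r"(\d{4})-(\d{2})", r"\2/\1", "2024-05")
print(swapped)  # 05/2024

# re.subn also reports how many replacements were made.
print(re.subn("python", "Python", text))  # ('I am learning Python3 Python', 2)
```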

Basic usage of the re module: compile

Features of compiled regular expressions:

• A complex regular expression can be reused.

• A compiled regular expression is more convenient to use, because the pattern argument is omitted.

• The re module caches the regular expressions it compiles on the fly, so in most cases compile does not bring a large performance advantage

msg1="hello world"
msg2="i am learning python"
msg3="sanle"
print(re.findall("python",msg1))
print(re.findall("python",msg2))
print(re.findall("python",msg3))
reg = re.compile("python") # Compile regular expressions into objects
print(reg.findall(msg1))
print(reg.findall(msg2))
print(reg.findall(msg3))
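
Reuse matters more once the pattern is longer than a single word. In the sketch below, the date pattern, the name DATE, and the sample lines are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern for illustration: a date written as YYYY-MM-DD.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

lines = ["released 2021-03-04", "no date here", "patched 2022-11-30"]
found = []
for line in lines:
    m = DATE.search(line)  # same result as re.search(DATE.pattern, line)
    found.append(m.groups() if m else None)
print(found)

print(DATE.pattern)                  # (\d{4})-(\d{2})-(\d{2})
print(DATE.sub("<date>", lines[0]))  # released <date>
```

The compiled object offers the same methods as the module (search, match, findall, finditer, sub), minus the pattern argument.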

Basic regex syntax

1. Character classes []: ranges follow character-code order

ret1=re.findall("python","Python on python")
print(ret1)
ret2=re.findall("[Pp]ython","Python on python")
print(ret2)
ret3=re.findall("[A-Za-z0-9-]","abc123ABCD--")
print(ret3)
ret4=re.findall("[a-zA-Z0-9-]","abc123ABCD--")
print(ret4)
ret5=re.findall(r"[A-z0-9\-]","abc123ABCD--\\")  # raw string: "\-" in a normal string is an invalid escape sequence
print(ret5)

The output is as follows

['python']
['Python', 'python']
['a', 'b', 'c', '1', '2', '3', 'A', 'B', 'C', 'D', '-', '-']
['a', 'b', 'c', '1', '2', '3', 'A', 'B', 'C', 'D', '-', '-']
['a', 'b', 'c', '1', '2', '3', 'A', 'B', 'C', 'D', '-', '-', '\\']
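
The last result contains the backslash because a range follows character-code order, and A-z spans more than letters. A short sketch that lists the extra characters:

```python
import re

# A-z runs from code 65 ('A') to 122 ('z'), so it also covers
# the six characters between 'Z' (90) and 'a' (97).
extras = [chr(c) for c in range(ord('Z') + 1, ord('a'))]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

print(re.findall(r"[A-z]", "a_Z^"))     # ['a', '_', 'Z', '^']
print(re.findall(r"[A-Za-z]", "a_Z^"))  # ['a', 'Z']
```

When only letters are wanted, [A-Za-z] is the safer spelling.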

2. Negated character classes

ret6=re.findall("[^A-Z]c","Ac111crc#c")
print(ret6)
ret7=re.findall("[^A-Z][0-9]","Ac121crc#c")
print(ret7)

The output is as follows

['1c', 'rc', '#c']
['c1', '21']

3. Alternation (or)

msg="welcome to changsha,welcome to hunan"
rest=re.findall("changsha|hunan",msg)
print(rest)

The output is as follows

['changsha', 'hunan']

4. The "." placeholder: any character except \n

rest2=re.findall("p.thon","Pythonpthon p thon p-thon p\nthon")
print(rest2)

The output is as follows

['p thon', 'p-thon']

5. Matching the start and end: ^ and $

rest3=re.findall("^python","python hello pyth3on1")
print(rest3)
rest4=re.findall("python$","pyth3on hello python")
print(rest4)

The output is as follows

['python']
['python']

Shorthand character classes

\d  Matches a digit, i.e. 0-9
\D  Matches a non-digit
\s  Matches whitespace, i.e. space, tab and similar characters
\S  Matches a non-whitespace character
\w  Matches a word character, i.e. a-z, A-Z, 0-9, _
\W  Matches a non-word character
\A  Matches the start of the string
\b  Word boundary: matches the empty string, but only at the beginning or end of a word
\B  Non-word boundary: matches the empty string, but not at the beginning or end of a word
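
A quick way to get a feel for these shorthands is to run them against one string. The sample text below is made up for the purpose.

```python
import re

text = "order_7 shipped\t2024 cat concat"

print(re.findall(r"\d+", text))      # ['7', '2024']
print(re.findall(r"\w+", text))      # ['order_7', 'shipped', '2024', 'cat', 'concat']
print(re.split(r"\s+", text))        # ['order_7', 'shipped', '2024', 'cat', 'concat']
print(re.findall(r"\bcat\b", text))  # ['cat'], only the standalone word
print(re.findall(r"\Bcat", text))    # ['cat'], only the one inside 'concat'
```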

Repetition

1. ? matches the previous item 0 or 1 times

ret=re.findall("py?","python p pyy ps")
print(ret)

The output is as follows

['py', 'p', 'py', 'p']

2. * matches the previous item any number of times (0 to n)

ret=re.findall("py*","python p pyy ps")
print(ret)

The output is as follows

['py', 'p', 'pyy', 'p']

3. + matches the previous item at least once

ret=re.findall("py+","python p pyy ps")
print(ret)

The output is as follows

['py', 'pyy']

4. {n} matches exactly n times, where n is a non-negative integer

ret=re.findall("py{2}","python p pyy ps pyyyy")
print(ret)

The output is as follows

['pyy', 'pyy']

5. {n,} matches at least n times, where n is a non-negative integer

ret=re.findall("py{2,}","python p pyy ps pyyyy")
print(ret)

The output is as follows

['pyy', 'pyyyy']

6. {n,m} matches the previous item at least n and at most m times

ret=re.findall("py{2,4}","python p pyy ps pyyyy")
print(ret)

The output is as follows

['pyy', 'pyyyy']

Greedy and non-greedy modes

Greedy mode: *, +, ? and {n,m} are greedy; they match as long a string as possible
Non-greedy mode: stop as soon as a match is possible, matching as short a string as possible (+? *? ?? {2,4}?)


msg="helloooooo,I am sanchuang,123"
print(re.findall("lo{3,}",msg))
print(re.findall("lo{3,}?",msg))
print(re.findall("lo*?",msg))
print(re.findall("lo?",msg))
print(re.findall("lo??",msg))
msg="cats and dogs , cats1 and dog1"
print(re.findall("cats.*s",msg))
print(re.findall("cats.*?s",msg))

The output is as follows

['loooooo']
['looo']
['l', 'l']
['l', 'lo']
['l', 'l']
['cats and dogs , cats']
['cats and dogs']
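
The difference shows up most often when pulling delimited pieces out of text. A minimal sketch with a made-up HTML fragment:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# A negated character class gives the same result here without relying on laziness.
print(re.findall(r"<[^>]*>", html))  # ['<b>', '</b>', '<i>', '</i>']
```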

Grouping

With grouping you can get not only the whole match but also each individual group; use () to create a group

1. Capturing groups

The group method of a match object takes 0 as its default argument, which returns the whole matched string
An argument n (n>0) returns the content matched by group n

msg="tel:173-7572-2991"
ret=re.search(r"(\d{3})-(\d{4})-(\d{4})",msg)
# ret1=re.search(r"\d{3}-\d{4}-\d{4}",msg)
print(ret.groups())
print(ret.group())
print(ret.group(1))
print(ret.group(2))
print(ret.group(3))

The output is as follows

('173', '7572', '2991')
173-7572-2991
173
7572
2991

2. Backreferences to groups

Capturing groups: after grouping, the matched text is kept temporarily in memory and given an index starting from 1
Capturing groups can therefore be referenced later in the pattern with \1, \2

ret = re.search(r"(\d{3})-(\d{4})-\2","173-7572-7572")
print(ret.group())
ret = re.search(r"(\d{3})-(\d{4})-\1","173-7572-173")
print(ret.group())

The output is as follows

173-7572-7572
173-7572-173

3. Non-capturing groups (?:regex)

These only group and do not capture: the matched content is not kept in memory, does not receive a group number, and cannot be referenced with a backreference

ret = re.search(r"(?:\d{3})-(\d{4})-\1","173-7572-7572")
print(ret.group(1))

The output is as follows

7572

If the pattern contains capturing groups, findall returns only the content of those groups

ret = re.findall(r"(?:\d{3})-(\d{4})-\1","173-7572-7572")
print(ret)

The output is as follows

['7572']
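
What findall returns depends on how many capturing groups the pattern has. The sketch below runs four variants of the phone pattern over the same string to make the rule visible.

```python
import re

s = "173-7572-2991"

no_groups = re.findall(r"\d{3}-\d{4}-\d{4}", s)
one_group = re.findall(r"\d{3}-(\d{4})-\d{4}", s)
three_groups = re.findall(r"(\d{3})-(\d{4})-(\d{4})", s)
non_capturing = re.findall(r"(?:\d{3})-(?:\d{4})-\d{4}", s)

print(no_groups)      # ['173-7572-2991']         whole match
print(one_group)      # ['7572']                  the single group
print(three_groups)   # [('173', '7572', '2991')] one tuple per match
print(non_capturing)  # ['173-7572-2991']         (?:...) does not count as a group
```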

Example

msg="[email protected]@[email protected]@163.com"
Find the email addresses at 126.com, qq.com and 163.com

Code implementation

msg="[email protected]@[email protected]@163.com"
print(re.findall(r"(?:\.com)?(\[email protected](?:126|qq|163)\.com)",msg))

The output is as follows

['[email protected]', '[email protected]', '[email protected]']
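
The addresses in the example above were obscured when the page was published, so the following sketch substitutes hypothetical ones. It assumes the addresses are written back to back with no separator, and that the local part is matched with \w+; both are guesses about the original, not something the text confirms.

```python
import re

# Hypothetical addresses, written back to back without separators.
msg = "alice@126.combob@qq.comcarol@sina.comdave@163.com"

pattern = r"(?:\.com)?(\w+@(?:126|qq|163)\.com)"
with_prefix = re.findall(pattern, msg)
print(with_prefix)
# ['alice@126.com', 'bob@qq.com', 'dave@163.com']

# Without the optional (?:\.com) prefix, the ".com" left over from the
# skipped sina address is glued onto the next local part.
without_prefix = re.findall(r"(\w+@(?:126|qq|163)\.com)", msg)
print(without_prefix)
# ['alice@126.com', 'bob@qq.com', 'comdave@163.com']
```

Both groups that only serve to structure the pattern are non-capturing, so findall returns just the address.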

4. Named groups

import re
ret=re.search(r'(?P<first>\d{3})-\d{3}-(?P<last>\d{3})',"321-123-231")
print(ret.group())
print(ret.groups())
print(ret.groupdict())
ret=re.findall(r'(?P<first>\d{3})-\d{3}-(?P<last>\d{3})',"321-123-231")
print(ret)

The output is as follows

321-123-231
('321', '231')
{'first': '321', 'last': '231'}
[('321', '231')]
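
Named groups can also be read by name, referenced inside the pattern, and used in a replacement string. A short sketch of the three forms, reusing the pattern above:

```python
import re

m = re.search(r'(?P<first>\d{3})-\d{3}-(?P<last>\d{3})', "321-123-231")
print(m.group('first'), m['last'])  # 321 231   (m[...] needs Python 3.6+)

# (?P=name) is a backreference to a named group inside the pattern.
repeated = re.search(r'(?P<code>\d{3})-(?P=code)', "321-321").group()
print(repeated)  # 321-321

# \g<name> refers to a named group inside a replacement string.
swapped = re.sub(r'(?P<first>\d{3})-(?P<last>\d{3})', r'\g<last>-\g<first>', "321-231")
print(swapped)  # 231-321
```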

Commonly used flags

re.I re.IGNORECASE, makes matching case-insensitive
re.M re.MULTILINE, multi-line matching; affects ^ and $
re.S re.DOTALL, makes . match every character including line breaks

import re
ret=re.findall("^python$","Python",re.I)
print(ret)
ret=re.findall("^python$","Python\npython",re.I)
print(ret)
ret=re.findall("^python$","Python\npython",re.I|re.M)
print(ret)

The output is as follows

['Python']
[]
['Python', 'python']

# Case-insensitive and multi-line matching

msg="""
python
python
Python
"""
print(re.findall("^python$",msg,re.M|re.I))
print(re.findall(".+",msg,re.S))

The output is as follows

['python', 'python', 'Python']
['\npython\npython\nPython\n']

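Inline flags

(?imx) The regular expression may contain the optional flags i, m, or x. In Python, flags written this way apply to the whole pattern and belong at the very start of the expression.
(?imx: re) Applies the optional flags i, m, or x only to the part of the pattern inside the parentheses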

import re
ret=re.findall("(?i)^python$","Python")
print(ret)
ret=re.findall("(?i)^python$","Python\npython")
print(ret)
ret=re.findall("(?im)^python$","Python\npython")
print(ret)

The output is as follows

['Python']
[]
['Python', 'python']

An inline flag can apply to just one part of the pattern. The space after (?i:hello) in the patterns here is an ordinary literal space that matches the space in the text; it is not part of the flag syntax

ret=re.findall("(?i:hello) Python","Hello python")
print(ret)
ret=re.findall("(?i:hello) python","Hello python")
print(ret)

The output is as follows

[]
['Hello python']
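
The x flag listed above has no example in this section, so here is a minimal sketch of it next to a scoped i flag. The phone pattern is reused from the grouping section and the name phone is just for illustration.

```python
import re

# Scoped flag: only "python" ignores case; the digit after it must match exactly.
scoped = re.findall(r"(?i:python)3", "Python3 PYTHON3 python2")
print(scoped)  # ['Python3', 'PYTHON3']

# x (re.VERBOSE) lets a pattern carry whitespace and comments.
phone = re.compile(r"""(?x)
    (\d{3})  -   # first block
    (\d{4})  -   # second block
    (\d{4})      # third block
""")
print(phone.search("tel:173-7572-2991").groups())  # ('173', '7572', '2991')
```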

Assertions

Regular expression assertions are divided into lookahead assertions and lookbehind assertions
Together they come in 4 forms:
• (?=pattern) zero-width positive lookahead assertion
• (?!pattern) zero-width negative lookahead assertion
• (?<=pattern) zero-width positive lookbehind assertion
• (?<!pattern) zero-width negative lookbehind assertion

1. Zero-width positive lookahead assertion

import re
s='a reguler expression'
print(re.findall(r're(?=guler)',s))
s='a reguller expression'
print(re.findall(r're(?=guler)',s))

The output is as follows

['re']
[]

2. Zero-width negative lookahead assertion

import re
s='a reguler expression'
print(re.findall(r're(?!guler)',s))
s='a reguller expression'
print(re.findall(r're(?!guler)',s))

The output is as follows

['re']
['re', 're']

3. Zero-width positive lookbehind assertion

import re
s='a reguler expression'
print(re.findall(r'(?<=re)guler',s))
s='a reguller expression'
print(re.findall(r'(?<=re)guler',s))

The output is as follows

['guler']
[]

4. Zero-width negative lookbehind assertion

import re
s='a reguler expression'
print(re.findall(r'(?<!re)guler',s))
s='a reguller expression'
print(re.findall(r'(?<!re)expression',s))

The output is as follows

[]
['expression']
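
Assertions are handy when the surrounding text decides whether something should match but should not be part of the result. The price list below is a made-up example that uses three of the four forms.

```python
import re

prices = "apple $30, pear 45 yuan, plum $7"

# Positive lookbehind: digits that come right after a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", prices)
# Positive lookahead: digits that are followed by " yuan".
yuan = re.findall(r"\d+(?= yuan)", prices)
# Negative lookbehind: whole numbers that are not preceded by a dollar sign.
not_dollars = re.findall(r"(?<!\$)\b\d+", prices)

print(dollars)      # ['30', '7']
print(yuan)         # ['45']
print(not_dollars)  # ['45']
```

In each case the asserted text ($ or " yuan") is checked but not consumed, so it never appears in the result.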
