
Getting started with Python regular expressions

Editor: Python


Today we are going to study regular expressions in Python. Why learn this part first? Because regular expressions make processing text data very convenient, and they lay a foundation for getting into NLP later.

First, a recommendation: use an online regular expression tester to validate your patterns as you write them.

Contents

    • 1. Basics
    • 2. Greedy and non-greedy modes
    • 3. Using backslashes
    • 4. Using square brackets
    • 5. Matching start and end positions
    • 6. Using parentheses: group selection
    • 7. Splitting strings with regular expressions
    • Summary

1. Basics

  • Ordinary characters: an ordinary character in a pattern simply matches itself.
  • Special characters (metacharacters): when they appear in a regular expression they do not match themselves directly, but carry a special meaning.

. matches any single character except the newline character

For example, the pattern 'is .' matches the text 'is ' followed by any one character:

# Shows how to use regular expressions in Python
import re  # regular expression library

content = '''
The apple is red
Bananas are yellow
The leaves are green
The sky is blue
'''
# Compile the expression into a pattern object, which provides methods such as findall
p = re.compile(r'is .')
for i in p.findall(content):
    print(i)

The result is as follows:

is r
is b

* matches the preceding subexpression any number of times, including 0 times

For example, ',.*' matches a comma and all the characters that follow it on the same line.

Of course, * can also follow an ordinary character: 'o*' matches 'o', 'oo', 'ooo', and so on, as well as the empty string.
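As a minimal sketch of the ',.*' pattern described above, the snippet below runs it over a small sample. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample data: two lines contain a comma, one does not
content = "apple,red\nbanana,yellow\nsky"

# ',.*' matches a comma and everything after it up to the end of the line,
# because '.' never matches the newline character by default
after_comma = re.findall(r",.*", content)
print(after_comma)  # [',red', ',yellow']
```

The line without a comma produces no match at all, since the comma in the pattern is mandatory.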

+ matches the preceding subexpression one or more times, excluding 0 times

The only difference from * is that 0 times is not allowed:

+ (cannot match 0 times)

* (can match 0 times)
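The difference between the two quantifiers is easiest to see side by side. The following is an illustrative sketch with a made-up test string, not an example from the original article.

```python
import re

text = "g go goo"  # hypothetical test string

# 'o*' lets the 'o' repeat zero or more times, so a bare 'g' also matches
star_matches = re.findall(r"go*", text)
# 'o+' requires at least one 'o', so the bare 'g' is skipped
plus_matches = re.findall(r"go+", text)

print(star_matches)  # ['g', 'go', 'goo']
print(plus_matches)  # ['go', 'goo']
```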

{ } matches the preceding character the specified number of times

For example, the expression 'o{2,4}' matches the character 'o' repeated at least 2 and at most 4 times.
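A short sketch of the counted quantifier, assuming the 'o{2,4}' form and a made-up test string with runs of different lengths:

```python
import re

text = "o oo ooo ooooo"  # hypothetical runs of 1, 2, 3, and 5 letters

# 'o{2,4}' needs at least 2 and takes at most 4 consecutive 'o' characters
matches = re.findall(r"o{2,4}", text)
print(matches)  # ['oo', 'ooo', 'oooo']
```

The single 'o' is too short to match, and the run of five yields one match of four characters; the leftover 'o' is again too short.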

2. Greedy and non-greedy modes

Suppose we want to match the tags in

<head><title>

one at a time, using a pattern such as '<.*>'. Instead, the pattern matches the whole string at once. This is because it finds the first '<' and the last '>' and treats everything in between as arbitrary characters. This is greedy mode: the pattern matches as many characters as possible.

To switch to non-greedy mode, add a '?' after the '+' or '*'. The pattern then matches as few characters as possible, so each tag is matched separately.
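The two modes can be compared directly on the tag string from this section. The pattern '<.*>' is an assumption about what the original screenshot showed; the snippet is only a sketch.

```python
import re

source = "<head><title>"

# Greedy: '.*' grabs as much as it can, so the match runs from the
# first '<' to the last '>'
greedy = re.findall(r"<.*>", source)
# Non-greedy: '.*?' stops at the first '>' it can reach
non_greedy = re.findall(r"<.*?>", source)

print(greedy)      # ['<head><title>']
print(non_greedy)  # ['<head>', '<title>']
```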

3. Using backslashes

The backslash has several uses in regular expressions, such as escaping.

For example, to find everything up to and including a '.', use '.*\.'. The backslash tells the program that the next character is an ordinary character rather than a metacharacter.

A backslash can also be combined with certain characters to represent special classes of characters, for example '\d' for a digit.
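Both uses of the backslash can be sketched as follows. The file names are invented sample data, and '\d', '\w', and '\s' are the standard character classes of Python's re module, chosen here as likely examples of the combinations the text refers to.

```python
import re

text = "report.txt\nnotes_2022.md"  # hypothetical sample data

# Escaping: '\.' is a literal dot, so '.*\.' matches everything up to
# and including the last dot on each line
before_dot = re.findall(r".*\.", text)
print(before_dot)  # ['report.', 'notes_2022.']

# Character classes built with a backslash
digits = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)   # runs of letters, digits, underscores
spaces = re.findall(r"\s", text)   # whitespace, including the newline
print(digits)  # ['2022']
print(words)   # ['report', 'txt', 'notes_2022', 'md']
print(spaces)  # ['\n']
```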

4. Using square brackets

Square brackets express an "or" condition over single characters: [0123] or [0-3] means the character can be 0, 1, 2, or 3.
They can also hold other characters or ranges, such as [rgby] or [a-z].

For example, the pattern 'is [rgby]' matches 'is ' followed by one of the letters r, g, b, or y.
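A brief sketch of character sets, reusing the color sentences from section 1. The pattern 'is [rgby]' is an English adaptation chosen for illustration.

```python
import re

content = """The apple is red
Bananas are yellow
The leaves are green
The sky is blue"""

# [0-3] is shorthand for [0123]
low_digits = re.findall(r"[0-3]", "0123456")
print(low_digits)  # ['0', '1', '2', '3']

# 'is ' followed by exactly one character from the set r, g, b, y
color_initials = re.findall(r"is [rgby]", content)
print(color_initials)  # ['is r', 'is b']
```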

Note that some metacharacters lose their special meaning inside [] and become ordinary characters.
For example, . + * do not need to be escaped there.

For example, to find a literal '.' followed by 'is', use '[.]is'.

If ^ is used as the first character inside [], it expresses negation.

For example, '[^0-9]' matches any non-digit character.
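The two bracket rules above can be checked with a small sketch. Both test strings are made up for illustration.

```python
import re

# Inside [], the characters . + * are ordinary and need no backslash
literals = re.findall(r"[.+*]", "a.b+c*d")
print(literals)  # ['.', '+', '*']

# A leading ^ inside [] negates the set: runs of non-digit characters
non_digits = re.findall(r"[^0-9]+", "tel 123 ext 45")
print(non_digits)  # ['tel ', ' ext ']
```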

5. Matching start and end positions

^ matches the start position of the text, but its effect differs between modes.
Regular expressions have two main modes here: single-line mode and multi-line mode.
Single-line mode: the whole text is treated as one piece of data, and ^ matches only at the very beginning of the text.
Multi-line mode: each line is treated as a separate piece of data, and ^ matches at the beginning of every line.

For example, given text whose lines begin with 001, 002, and 003, single-line mode matches only the 001 on the first line.

In multi-line mode, 001, 002, and 003 are all matched.

So how do you choose between single-line mode and multi-line mode in Python?
Single-line behavior is the default. To switch to multi-line mode, pass the flag re.M (or its long form re.MULTILINE) to compile.
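A sketch of the 001/002/003 example in both modes. The text after each number is invented, since the original data was shown only in a screenshot.

```python
import re

# Hypothetical data: each line starts with a three-digit code
content = "001-apple\n002-banana\n003-sky"

# Default mode: ^ matches only at the very start of the text
single = re.compile(r"^\d+")
# Multi-line mode: ^ matches at the start of every line
multi = re.compile(r"^\d+", re.M)

print(single.findall(content))  # ['001']
print(multi.findall(content))   # ['001', '002', '003']
```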

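$ matches the end position of the text. Its usage parallels ^, and it also behaves differently in the two modes.

Single-line mode: $ matches only at the end of the whole text (or just before a final newline).

Multi-line mode: $ matches at the end of every line.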
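The same comparison for $, as a sketch with invented data in which every line ends in a number:

```python
import re

# Hypothetical data: each line ends with a two-digit number
content = "apple-60\nbanana-70\nsky-80"

# Default mode: $ matches only at the end of the whole text
last_only = re.findall(r"\d+$", content)
# Multi-line mode: $ matches at the end of every line
every_line = re.findall(r"\d+$", content, re.M)

print(last_only)   # ['80']
print(every_line)  # ['60', '70', '80']
```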

6. Using parentheses: group selection

Group selection means picking out just the characters we need from what the regular expression matches. For example, to get the characters before a comma we might write '.*,', but the matched text then includes the comma, which we do not want. This is where group selection comes in.

In Python, wrap the part you want in parentheses, for example '(.*),'. findall then returns only the text captured by the group.

If the pattern contains several groups, each result becomes a tuple, and you can extract the corresponding text by tuple index.
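A minimal sketch of one group versus several groups with findall. The sample lines are made up for illustration.

```python
import re

content = "apple, is red\nbanana, is yellow"  # hypothetical sample data

# One group: findall returns only the captured text, without the comma
names = re.findall(r"(.*),", content)
print(names)  # ['apple', 'banana']

# Two groups: each result is a tuple with one entry per group
pairs = re.findall(r"(.*), is (.*)", content)
print(pairs)  # [('apple', 'red'), ('banana', 'yellow')]
print(pairs[0][1])  # red
```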

Let's do a little practice. Here is a set of data; extract each name and telephone number.

Apple, telephone 123131
Banana, telephone 234241
Leaf, telephone 245363
Sky, telephone 124234

The Python implementation is as follows:
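The original solution did not survive in this copy, so the following is one possible sketch rather than the author's code. It assumes every line has the form "name, telephone digits" and uses two groups with multi-line mode.

```python
import re

content = """Apple, telephone 123131
Banana, telephone 234241
Leaf, telephone 245363
Sky, telephone 124234"""

# Group 1 captures the name, group 2 captures the digits;
# re.M lets ^ and $ anchor to each line
pattern = re.compile(r"^(.+), telephone (\d+)$", re.M)
records = pattern.findall(content)

for name, phone in records:
    print(name, phone)
```

Each element of records is a (name, phone) tuple, so unpacking in the for loop gives direct access to both fields.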

7. Splitting strings with regular expressions

The split() method of string objects is only suitable for very simple splitting. When you need more flexibility, use regular expressions.
For example:

# We have a set of data here
names = 'Guan Yu; Zhang Fei, d, The professor, Li Yuanfang Judge Dee'

How should this be split?
We can use re.split, which lets a regular expression specify the separators.
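A sketch of re.split with mixed separators. The names below are hypothetical single-word stand-ins (only the first two echo the article's data), so that whitespace can safely act as a separator alongside ';' and ','.

```python
import re

# Hypothetical data: separators are ';', ',', or plain spaces
names = "GuanYu; ZhangFei, ZhaoYun,MaChao, HuangZhong  LiKui"

# One separator character (semicolon, comma, or whitespace),
# followed by any extra whitespace
name_list = re.split(r"[;,\s]\s*", names)
print(name_list)
# ['GuanYu', 'ZhangFei', 'ZhaoYun', 'MaChao', 'HuangZhong', 'LiKui']
```

If the names themselves contain spaces, whitespace cannot be a separator; splitting on r"[;,]\s*" alone would be the safer choice in that case.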

Summary

This chapter covered the basics of regular expressions and works well as a short introductory tutorial. I will keep adding to it as I run into more complex usage.

