您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

How does python regular match Chinese characters

編輯：Python

Regular expressions match Chinese characters, which are very common in practical applications.
For example: crawler web page text extraction, validation of user input standards, etc.
Take the following text string as an example, match all Chinese characters in the string asstr.

import reastr = '''aaaaaWhen when, see Nanxue snow, I am with plum blossomTwo white heads'''

Two methods are described below (the environment of this article is python3)
1. Use Unicode encoding to match Chinese
Common Chinese Unicode encoding range: \u4e00-\u9fa5
Implement matching code: re.findall('[\u4e00-\u9fa5]', astr)

import reastr = '''aaaaaWhen when, see Nanxue snow, I am with plum blossomTwo white heads'''res = re.findall('[\u4e00-\u9fa5]', astr)print(res)

Matches:

Second,Directly use Chinese characters to achieve Chinese matching
If you haven’t used it, you may not know it. Chinese matching can also be done in this way
The matching code is: re.findall('[一-羥]', asstr)

import reastr = '''aaaaaWhen when, see Nanxue snow, I am with plum blossomTwo white heads'''res = re.findall('[一-羥]', astr)print(res)

Match result:

Note: Actually here"The Unicode code corresponding to "1" is "\u4e00", and the Unicode code corresponding to "徥" (yù) is "\u9fa5".

Common non-English characters Unicode encoding range:
u4e00-u9fa5 (Chinese)
u0800-u4e00 (Japanese)
uac00-ud7ff (Korean)

上一篇文章： 5 common deduplication methods for python lists
下一篇文章： python placeholder %s%d%f

Python

Python基礎語法（四）之列表

Python基礎語法（四）之列表目錄3.0 概述1list.

Python code question, confused about the knowledge points inside

Not familiar with string relat

python基礎之函數嵌套定義

博主簡介：原互聯網大廠tencent員工，網安巨頭Venus

【邂逅Django】——（二）數據庫配置

邂逅Django - 目錄 Part 1：【邂逅

10 application examples of Python office automation processing

Catalog introduction 1、Python

Python 拼接C＃字典格式對象

依據一個Excel中的2列創建一個字典格式的數據。例如：將這