您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

python正則如何匹配中文漢字

編輯：Python

正則表達式匹配中文漢字，在實際應用中十分常見。
比如：爬蟲網頁文本提取、驗證用戶輸入標准等。
以下面文本字符串為例，匹配出astr這個字符串中的所有漢字。

import re
astr = '''aaaaa何時when 杖爾看see南雪snow，我me與梅花plum blossom兩白頭'''

下面介紹兩種方法（本文環境為python3）
一、使用Unicode編碼來匹配中文
常見的中文Unicode編碼范圍：\u4e00-\u9fa5
實現匹配代碼：re.findall(’[\u4e00-\u9fa5]’, astr)

import re
astr = '''aaaaa何時when 杖爾看see南雪snow，我me與梅花plum blossom兩白頭'''
res = re.findall('[\u4e00-\u9fa5]', astr)
print(res)

匹配結果：

二、直接使用中文漢字實現中文匹配
沒使用過可能還真不知道，中文匹配還可以這樣
實現匹配代碼：re.findall(’[一-龥]’, astr)

import re
astr = '''aaaaa何時when 杖爾看see南雪snow，我me與梅花plum blossom兩白頭'''
res = re.findall('[一-龥]', astr)
print(res)

匹配結果：

注：其實這裡“一”對應的Unicode編碼就是“\u4e00”,“龥”（yù）對應的Unicode編碼就是“\u9fa5”。

常見非英文字符Unicode編碼范圍：
u4e00-u9fa5 (中文)
u0800-u4e00 (日文)
uac00-ud7ff（韓文）

Python

Resource download address ：htt

Reasonable use of exceptions i

Python3.7.2環境安裝1. 前言本章節我們將向大家介

一，進入Python官網下載安裝包如果嫌棄在官網下載麻煩的

p{margin:10px 0}.markdown-body

目錄環境搭建安裝Python檢查安裝情況添加環境變量安裝Py