Jieba is an excellent third-party Chinese word-segmentation library; Chinese text must be segmented before individual words can be extracted from it.
How jieba segments text: it uses a Chinese dictionary to determine the association probability between adjacent Chinese characters, and character sequences with high probability are grouped into words to form the segmentation result. Beyond the built-in dictionary, users can also add custom phrases.
jieba is installed with pip; at the console, enter:
pip install jieba
The jieba library supports three segmentation modes.
Precise mode: cuts the text into exact words with no redundancy; joining the resulting words in order restores the original text exactly.
jieba.lcut(s)  # precise mode
Full mode: scans out every possible word in the text. Because a passage can often be cut in more than one way, or the same characters can be grouped into different words from different angles, full mode digs out all the combinations. The result is therefore redundant, and joining it no longer yields the original text.
jieba.lcut(s, cut_all=True)  # full mode
Search-engine mode: on the basis of precise mode, long words are segmented again, which makes the result suitable for a search engine's indexing and searching of short terms. The result also contains redundancy.
jieba.lcut_for_search(s)  # search-engine mode
Common jieba functions: focus on what type each takes as input (a string? a list?) and what type it returns (a string? a list?).
Adding a user dictionary: register words, confirmed by the user, that should not be split apart.
jieba.load_userdict('user.txt')
Adding a stop-word list: remove words the user does not want included in the statistics.
def stopwordslist():  # build the stop-word list
    with open('stop_words.txt', encoding='UTF-8') as f:
        stopwords = [line.strip() for line in f]
    return stopwords  # return the stop-word list

stopwords = stopwordslist()  # load the stop-word list