
Common string operations in Python


Introduction

When processing text data, we usually need to perform a number of different operations on it, such as appending a new string to the text, splitting the text into multiple strings, or changing the capitalization of letters. Beyond that, we sometimes need more advanced text parsing or other techniques, but splitting text into sentences or words and deleting or replacing certain words are the most common operations.

String operations

Next, we introduce the common basic string operations through some examples. First we define a piece of text, split it, make some typical edits, and finally join the edited strings back together.

Common string operations

After defining the input text, split it into individual words. By default, split() uses whitespace such as spaces and newlines as the delimiter, so calling it without arguments breaks the text into individual words, and the resulting words contain no spaces, newlines, or other delimiter characters:

>>> input_text = 'Never regret falling in love with you. The longer you go, the more you cherish it. If time can flow back to the past, I must make a love song with you again, because you are the only one in my life.'
>>> words = input_text.split()
>>> words
['Never', 'regret', 'falling', 'in', 'love', 'with', 'you.', 'The', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.', 'If', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'I', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.']
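As a small illustrative sketch (the sample strings below are made up for this example and do not come from the text above), the difference between the default whitespace splitting and an explicit delimiter looks roughly like this:

```python
# Sample strings are hypothetical, chosen only to illustrate split().
messy = 'one  two\nthree\tfour'

# No argument: split on any run of whitespace; no empty strings are produced.
default_split = messy.split()
print(default_split)      # ['one', 'two', 'three', 'four']

# Explicit delimiter: every occurrence splits, so empty strings can appear.
csv_row = 'a,b,,c'
comma_split = csv_row.split(',')
print(comma_split)        # ['a', 'b', '', 'c']

# maxsplit limits the number of splits, counted from the left.
first_two = messy.split(maxsplit=1)
print(first_two)          # ['one', 'two\nthree\tfour']
```

With an explicit delimiter, consecutive delimiters are not merged, which is why the empty string appears in the second result.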

Replace the capital letters that appear in the sentences with the character 'x'. Iterate over each character of each word and, if the character is an uppercase letter, return an 'x' in its place. This is done with two nested comprehensions: an outer list comprehension that runs over the list of words, and an inner generator expression that runs over the characters of each word and uses a conditional expression to replace a character only if it is uppercase: 'x' if w.isupper() else w for w in word. The resulting characters are then joined back together with the join() method:

>>> replaced = [''.join('x' if w.isupper() else w for w in word) for word in words]
>>> replaced
['xever', 'regret', 'falling', 'in', 'love', 'with', 'you.', 'xhe', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.', 'xf', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'x', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.']
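The same replacement can also be wrapped in a small helper, which may be easier to reuse and test. This is only a sketch: the function name mask_uppercase and its mask parameter are hypothetical and introduced here for illustration.

```python
def mask_uppercase(text, mask='x'):
    """Return text with every uppercase character replaced by mask.

    Hypothetical helper, not part of the standard library.
    """
    return ''.join(mask if ch.isupper() else ch for ch in text)


# Small made-up word list to show the behavior.
sample_words = ['Never', 'regret', 'I', 'NASA']
masked = [mask_uppercase(word) for word in sample_words]
print(masked)  # ['xever', 'regret', 'x', 'xxxx']
```

Note that isupper() is True only for cased uppercase characters, so digits and punctuation pass through unchanged.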

Encode the text, converting it to plain ASCII. This matters in practice: if text is not encoded properly, unexpected errors can occur when it is displayed. Each word is encoded to an ASCII byte sequence and then decoded back into a Python string, with the errors parameter set to 'replace' so that any character that cannot be represented in ASCII is substituted. Because the sample text is already pure ASCII, the result here is identical to the input list:

>>> ascii_text = [word.encode('ascii',errors='replace').decode('ascii') for word in replaced]
>>> ascii_text
['xever', 'regret', 'falling', 'in', 'love', 'with', 'you.', 'xhe', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.', 'xf', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'x', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.']
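Since the sample text contains no non-ASCII characters, the effect of the errors parameter is not visible above. The following minimal sketch uses a made-up word with one accented character to show how 'replace', 'ignore', and the default 'strict' handler differ:

```python
# 'caf\u00e9' is 'caf' followed by e-acute, written as an escape sequence.
word = 'caf\u00e9'

# 'replace' substitutes a '?' for each character ASCII cannot represent.
replaced_word = word.encode('ascii', errors='replace').decode('ascii')
print(replaced_word)  # caf?

# 'ignore' silently drops such characters.
ignored_word = word.encode('ascii', errors='ignore').decode('ascii')
print(ignored_word)   # caf

# The default, errors='strict', raises instead of substituting.
try:
    word.encode('ascii')
except UnicodeEncodeError as exc:
    strict_failed = True
    print('strict encoding failed:', exc.reason)
else:
    strict_failed = False
```

Which handler is appropriate depends on whether losing characters silently is acceptable for the data at hand.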

Group the words into lines of at most 80 characters each. First, append a newline to every word that ends with a period, to mark where one group ends. Then create a new line and add the words one by one: if adding a word would make the line longer than 80 characters, end the line and start a new one; likewise, start a new line whenever the current line ends with a newline. A space is added before each word to separate the words:

>>> newlines = [word + '\n' if word.endswith('.') else word for word in ascii_text]
>>> newlines
['xever', 'regret', 'falling', 'in', 'love', 'with', 'you.\n', 'xhe', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.\n', 'xf', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'x', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.\n']
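>>> line_size = 80
>>> lines = []
>>> line = ''
>>> for word in newlines:
...     # Start a new line after a sentence end, or when the word would not fit
...     if line.endswith('\n') or len(line) + len(word) + 1 > line_size:
...         lines.append(line)
...         line = ''
...     line = line + ' ' + word
...
>>> lines.append(line)  # the loop never stores the last line, so add it here

Finally, strip the surrounding whitespace from each line, format it as a title (capitalizing the first letter of every word), and join the lines with newlines into a single piece of text:

>>> lines = [line.strip().title() for line in lines]
>>> result = '\n'.join(lines)
>>> print(result)
Xever Regret Falling In Love With You.
Xhe Longer You Go, The More You Cherish It.
Xf Time Can Flow Back To The Past, X Must Make A Love Song With You Again,
Because You Are The Only One In My Life.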
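For plain width-limited wrapping, the standard library's textwrap module is an alternative to the hand-written loop. This is a different technique from the one shown above, not a drop-in replacement: it does not start a new line after every sentence. A minimal sketch, reusing the original sample sentence:

```python
import textwrap

text = ('Never regret falling in love with you. The longer you go, the more '
        'you cherish it. If time can flow back to the past, I must make a '
        'love song with you again, because you are the only one in my life.')

# wrap() breaks on whitespace and returns lines no longer than width.
wrapped = textwrap.wrap(text, width=80)
for row in wrapped:
    print(row)
```

To keep one sentence per line as in the loop above, each sentence could be passed through textwrap.wrap() separately.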

Other string operations

Besides the operations above, there are other useful operations you can perform on strings. For example, strings can be sliced like any other sequence: 'love'[0:3] returns 'lov'. Similar to the title() method, the upper() and lower() methods return uppercase and lowercase versions of the string, respectively:

>>> print('unicode'[0:3])
uni
>>> print('unicode'.upper())
UNICODE
>>> print('UNicode'.lower())
unicode
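A few more slicing forms, plus one caveat of title(), are sketched below. The phrase used for the caveat is made up for this example; string.capwords() is shown as one possible alternative when apostrophes matter.

```python
import string

s = 'unicode'
print(s[-4:])    # code     (negative index counts from the end)
print(s[::2])    # uioe     (every second character)
print(s[::-1])   # edocinu  (reversed copy)

# title() treats an apostrophe as a word boundary,
# so the letter after it is capitalized as well.
phrase = "it's a test"
print(phrase.title())           # It'S A Test

# capwords() splits on whitespace only and capitalizes each piece.
print(string.capwords(phrase))  # It's A Test
```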
