
Common Python string operations


Create together, grow together! This is day 4 of my participation in the Juejin "Daily New Plan · August Update Challenge".

Preface

When processing text data, we usually need to perform a number of different operations on it, for example appending a new string to the text, splitting the text into multiple strings, or changing the capitalization of letters. More advanced text parsing is sometimes needed as well, but dividing text into sentences or words and deleting or replacing certain words are the most common operations.

String operations

Next, we introduce the common basic string operations through some examples. We first define a piece of text, split it, make some typical edits, and finally join the edited strings back together.

Common string operations

After defining the input text, split it into individual words with the split() method. By default, split() treats whitespace such as spaces and newlines as the delimiter, so the resulting words contain no spaces, newlines, or other delimiter characters:

>>> input_text = 'Never regret falling in love with you. The longer you go, the more you cherish it. If time can flow back to the past, I must make a love song with you again, because you are the only one in my life.'
>>> words = input_text.split()
>>> words
['Never', 'regret', 'falling', 'in', 'love', 'with', 'you.', 'The', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.', 'If', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'I', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.']
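
To make the default-delimiter behaviour concrete, the following small sketch contrasts split() with an explicit separator. The sample string and variable names are illustrative assumptions and are not part of the example above.

```python
# Illustrative sample (not the article's input): two spaces, then a newline.
sample = 'one  two\nthree'

# No argument: any run of whitespace counts as a single delimiter.
default_split = sample.split()

# Explicit ' ': every single space is a delimiter, so consecutive spaces
# produce an empty string, and the newline stays inside the word.
space_split = sample.split(' ')

# maxsplit limits the number of splits, counted from the left.
first_only = sample.split(maxsplit=1)

print(default_split)  # ['one', 'two', 'three']
print(space_split)    # ['one', '', 'two\nthree']
print(first_only)     # ['one', 'two\nthree']
```

For free-form text, calling split() without arguments is usually the safer choice, because it never yields empty strings.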

Replace the capital letters that appear in the sentences with the character "x". To do this, iterate over each character of each word and return "x" whenever the character is a capital letter. The work is done by a list comprehension over the list of words with a nested generator expression over the characters of each word. A conditional expression replaces a character only if it is uppercase ('x' if w.isupper() else w for w in word), and the characters of each word are then joined back together with the join() method:

>>> replaced = [''.join('x' if w.isupper() else w for w in word) for word in words]
>>> replaced
['xever', 'regret', 'falling', 'in', 'love', 'with', 'you.', 'xhe', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.', 'xf', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'x', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.']
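
The nested expression can be easier to read when the per-word step is pulled out into a function. The sketch below does that; mask_uppercase and the sample list are hypothetical names introduced only for illustration.

```python
def mask_uppercase(word, mask='x'):
    """Return word with every uppercase character replaced by mask.

    mask_uppercase is a hypothetical helper name used for illustration.
    """
    return ''.join(mask if ch.isupper() else ch for ch in word)


sample_words = ['Never', 'regret', 'I', 'NASA']  # illustrative input
masked = [mask_uppercase(w) for w in sample_words]
print(masked)  # ['xever', 'regret', 'x', 'xxxx']
```

Because str.isupper() is Unicode-aware, accented capitals such as 'É' are replaced as well.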

Encode the text, converting it to plain ASCII. This matters in practice: text that is not properly encoded can cause unexpected errors when it is displayed. Each word is encoded to an ASCII byte sequence and then decoded back to a Python string, with the errors parameter set to 'replace' to force substitution of characters that ASCII cannot represent. The sample text is already pure ASCII, so the output here is unchanged:

>>> ascii_text = [word.encode('ascii',errors='replace').decode('ascii') for word in replaced]
>>> ascii_text
['xever', 'regret', 'falling', 'in', 'love', 'with', 'you.', 'xhe', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.', 'xf', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'x', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.']
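
Since the sample text contains no non-ASCII characters, the effect of the errors parameter is not visible above. The following sketch, using an illustrative word that is not part of the article's input, shows how the common error handlers differ.

```python
text = 'caf\u00e9'  # 'café': illustrative word with one non-ASCII character

# errors='replace' substitutes '?' for each character ASCII cannot represent.
replaced_text = text.encode('ascii', errors='replace').decode('ascii')

# errors='ignore' silently drops such characters instead.
ignored_text = text.encode('ascii', errors='ignore').decode('ascii')

# The default, errors='strict', raises UnicodeEncodeError.
try:
    text.encode('ascii')
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(replaced_text)  # caf?
print(ignored_text)   # caf
print(strict_failed)  # True
```

Which handler is appropriate depends on whether losing characters silently is acceptable for the data at hand.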

Group the words into lines of at most 80 characters each. First, append a newline to every word that ends with a period, to mark the end of a group. Then build the lines word by word: if the current line already ends with a newline, or adding the next word would push it past 80 characters, close the current line and start a new one. A space is added before each word to separate it from the previous one. After the loop, the line that is still being built must be appended as well, otherwise the final words are lost:

>>> newlines = [word + '\n' if word.endswith('.') else word for word in ascii_text]
>>> newlines
['xever', 'regret', 'falling', 'in', 'love', 'with', 'you.\n', 'xhe', 'longer', 'you', 'go,', 'the', 'more', 'you', 'cherish', 'it.\n', 'xf', 'time', 'can', 'flow', 'back', 'to', 'the', 'past,', 'x', 'must', 'make', 'a', 'love', 'song', 'with', 'you', 'again,', 'because', 'you', 'are', 'the', 'only', 'one', 'in', 'my', 'life.\n']
>>> line_size = 80
>>> lines = []
>>> line = ''
>>> for word in newlines:
...     # Close the line at a sentence end or when it would exceed line_size
...     if line.endswith('\n') or len(line) + len(word) + 1 > line_size:
...         lines.append(line)
...         line = ''
...     line = line + ' ' + word
...
>>> lines.append(line)  # keep the last line, which the loop leaves unfinished

Finally, format each line as a title (the first letter of every word capitalized), strip the surrounding whitespace, and join the lines with newlines into a single piece of text:

>>> lines = [line.strip().title() for line in lines]
>>> result = '\n'.join(lines)
>>> print(result)
Xever Regret Falling In Love With You.
Xhe Longer You Go, The More You Cherish It.
Xf Time Can Flow Back To The Past, X Must Make A Love Song With You Again,
Because You Are The Only One In My Life.
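
Python's standard library also provides the textwrap module for this kind of line filling. The sketch below is an alternative to the manual loop, not the approach used above. It wraps each sentence separately so that every sentence starts on a new line, and it assumes, as a simplification, that a sentence ends at a period followed by a space.

```python
import textwrap

text = (
    'Never regret falling in love with you. '
    'The longer you go, the more you cherish it. '
    'If time can flow back to the past, I must make a love song with you '
    'again, because you are the only one in my life.'
)

# Simplifying assumption: a sentence ends at '. ' (period plus space).
# split() drops that delimiter, so restore the period where it is missing.
sentences = [s if s.endswith('.') else s + '.' for s in text.split('. ')]

# Wrap each sentence on its own so every sentence starts a new line.
wrapped = []
for sentence in sentences:
    wrapped.extend(textwrap.wrap(sentence, width=80))

print('\n'.join(wrapped))
```

With this input the result has four lines, none longer than 80 characters, and no leading spaces need to be stripped afterwards.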

Other string operations

Besides the operations above, strings support some other useful operations. For example, strings can be sliced just like lists: 'love'[0:3] returns lov. Similar to the title() method, the upper() and lower() methods return uppercase and lowercase versions of the string, respectively:

>>> print('unicode'[0:3])
uni
>>> print('unicode'.upper())
UNICODE
>>> print('UNicode'.lower())
unicode
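
A few more slicing forms and case methods are sketched below as a supplement. The sample strings are illustrative assumptions. All of these operations return new strings, since Python strings are immutable.

```python
import string

word = 'unicode'

first_three = word[:3]   # start defaults to 0
last_four = word[-4:]    # negative indices count from the end
every_other = word[::2]  # a step of 2 takes every second character
backwards = word[::-1]   # a step of -1 reverses the string

print(first_three, last_four, every_other, backwards)
# uni code uioe edocinu

# title() starts a new "word" after every non-letter, including apostrophes.
phrase = "they're here"
titled = phrase.title()             # "They'Re Here"
capworded = string.capwords(phrase)  # "They're Here"
capitalized = phrase.capitalize()    # "They're here"

print(titled)
print(capworded)
print(capitalized)
```

When the text may contain apostrophes, string.capwords() is often closer to the intended result than title().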
