Table of Contents
Background
Requirements
Approach
Code
Results
Analysis
    1) Reading files
    2) Reading data
    3) Data consolidation
    4-5) Regular expression matching and data deduplication
    6) Data export and save
Python 2 could not read files from a path containing Chinese characters directly, so a separate helper function had to be written. When I tried in 2018, Python 3 could not read such paths directly either.
Using it again now, I find that Python 3 can read Chinese paths directly.
You need to bring your own .txt files or create a few, ideally with some data in them (name, phone number, address).
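If no suitable files are at hand, a minimal sketch like the following could create a few. The helper name `make_sample_files`, the file names, and the records are all invented for illustration; the demo writes into a temporary directory so it does not touch existing data.

```python
import tempfile
from pathlib import Path

# Hypothetical sample records: name, phone number, address (all invented).
SAMPLE_RECORDS = [
    ["Zhang San 13800000001 Beijing", "Li Si 13800000002 Shanghai"],
    ["Wang Wu 13800000003 Guangzhou", "Zhang San 13800000001 Beijing"],
]


def make_sample_files(directory):
    """Write one .txt file per record group and return the created paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, lines in enumerate(SAMPLE_RECORDS, start=1):
        path = directory / f"sample{index}.txt"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Use a temporary directory so the sketch leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        for created in make_sample_files(tmp):
            print(created.name, created.read_text(encoding="utf-8").splitlines())
```

Passing 'Classroom training' instead of the temporary directory would place the files where the script in this post looks for them. One record is repeated on purpose so that the deduplication step has something to remove.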
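Before writing code, it is best to set yourself some requirements and make the goal clear.
1) Read the files
2) Read the data
3) Consolidate the data
4) Match with a regular expression
5) Deduplicate the data
6) Export and save the data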
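import glob
import re
import xlwt

data = []
phone = []

# 1) Locate all .txt files in the target directory
filelocation = glob.glob(r'Classroom training/*.txt')
print(filelocation)

# 2) Read every file line by line
for i in range(len(filelocation)):
    with open(filelocation[i]) as file:
        file_data = file.readlines()
    data.append(file_data)
print(data)

# 3) Merge the per-file lists into one list
combine_data = sum(data, [])
print(combine_data)

# 4) Find the first run of 11 digits in each line
for a in combine_data:
    data1 = re.search(r'[0-9]{11}', a)
    if data1:  # skip lines that contain no phone number
        phone.append(data1[0])

# 5) Remove duplicates (set() does not preserve order)
phone = list(set(phone))
print(phone)
print(len(phone))

# 6) Save to Excel
f = xlwt.Workbook(encoding='utf-8')
sheet1 = f.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(phone)):
    sheet1.write(i, 0, phone[i])
f.save('phonenumber.xls')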
Running the script generates an Excel file, phonenumber.xls.
import glob
import re
import xlwt
glob is used to locate files, re provides regular expressions, and xlwt writes Excel files.
filelocation=glob.glob(r'Classroom training/*.txt')
This matches all .txt files in the specified directory.
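As a small illustration of how the pattern behaves, the sketch below builds a throwaway directory (the file names are made up) and shows that '*.txt' ignores other extensions. glob.glob() returns matches in arbitrary order, so the sketch sorts them to get a repeatable result.

```python
import glob
import os
import tempfile

# Build a throwaway directory so the example does not depend on real files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt", "notes.csv"):
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write("placeholder\n")

    # '*.txt' matches only names ending in .txt; sorting fixes the order.
    pattern = os.path.join(tmp, "*.txt")
    matches = sorted(glob.glob(pattern))
    names = [os.path.basename(path) for path in matches]

print(names)  # ['a.txt', 'b.txt']
```

If the order in which files are read matters for the output, wrapping the glob.glob() call in sorted() is one simple way to make it deterministic.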
for i in range(len(filelocation)):
    with open(filelocation[i]) as file:
        file_data = file.readlines()
    data.append(file_data)
print(data)
The loop goes through the .txt files found under the path, visiting them in order by index. Each iteration opens the corresponding file, reads its contents line by line with readlines(), and uses append() to add the resulting list of lines to the data list. Printing data shows the contents of all the .txt files together in one list, with one inner list of strings per file.
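To make the shape of that nested list concrete, here is a small sketch with two invented files written to a temporary directory. The file names and contents are assumptions for illustration only.

```python
import os
import tempfile

# Hypothetical file contents, keyed by file name.
contents = {
    "one.txt": "alice 13800000001\nbob 13800000002\n",
    "two.txt": "carol 13800000003\n",
}

data = []
with tempfile.TemporaryDirectory() as tmp:
    for name, text in contents.items():
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
        # readlines() returns one list per file and keeps the trailing newlines.
        with open(path, encoding="utf-8") as handle:
            data.append(handle.readlines())

print(data)
# [['alice 13800000001\n', 'bob 13800000002\n'], ['carol 13800000003\n']]
```

Each inner list corresponds to one file, which is why a flattening step is needed before the lines can be processed in a single loop.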
combine_data=sum(data,[])
sum(data, []) concatenates the per-file lists into a single flat list.
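The following sketch shows what sum() does with a list of lists and, as an alternative that is not used in the original script, the same result obtained with itertools.chain.from_iterable. The sample data is invented.

```python
from itertools import chain

data = [["a\n", "b\n"], ["c\n"], []]

# sum() starts from [] and concatenates each inner list in turn.
flat_with_sum = sum(data, [])

# chain.from_iterable walks the inner lists without building intermediate lists.
flat_with_chain = list(chain.from_iterable(data))

print(flat_with_sum)    # ['a\n', 'b\n', 'c\n']
print(flat_with_chain)  # ['a\n', 'b\n', 'c\n']
```

For a handful of small files the two are interchangeable; with many files the chain version avoids repeatedly copying the growing list.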
print(combine_data)
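for a in combine_data:
    data1 = re.search(r'[0-9]{11}', a)
    if data1:  # re.search() returns None when a line has no match
        phone.append(data1[0])
phone = list(set(phone))
print(phone)
print(len(phone))
re.search(r'[0-9]{11}', a) finds the first run of 11 digits in each line. The set() function performs unordered deduplication: it creates an unordered collection of unique elements, so the order of the numbers in the resulting list is not guaranteed.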
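If the original order of the numbers matters, one possible variation is sketched below. It is not the script's method: the function name `extract_phones` is hypothetical, the pattern assumes a phone number is exactly 11 digits and not part of a longer digit run, and findall() collects every match in a line rather than only the first.

```python
import re

# Assumption: a phone number is exactly 11 digits, not part of a longer digit run.
PHONE_PATTERN = re.compile(r"(?<!\d)\d{11}(?!\d)")


def extract_phones(lines):
    """Return unique 11-digit numbers in first-seen order."""
    found = []
    for line in lines:
        # findall() returns [] for lines with no match, so nothing breaks.
        found.extend(PHONE_PATTERN.findall(line))
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(found))


lines = [
    "Zhang San 13800000001 Beijing\n",
    "no number on this line\n",
    "Li Si 13800000002 Shanghai\n",
    "Zhang San 13800000001 Beijing\n",
]
print(extract_phones(lines))  # ['13800000001', '13800000002']
```

The lookarounds keep a 12-digit string from being reported as an 11-digit number, which the plain [0-9]{11} pattern would do.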
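# Save to Excel
f = xlwt.Workbook(encoding='utf-8')
sheet1 = f.add_sheet('sheet1', cell_overwrite_ok=True)
for i in range(len(phone)):
    sheet1.write(i, 0, phone[i])
f.save('phonenumber.xls')
Workbook(encoding='utf-8'): sets the encoding of the workbook. The encoding must be passed as a keyword argument rather than as the string 'encoding=utf-8'.
add_sheet('sheet1', cell_overwrite_ok=True): creates the worksheet; cell_overwrite_ok=True allows a cell to be written more than once.
write(x, y, z): the arguments are the row index, the column index, and the value, with indexes starting at 0.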
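Where xlwt is not installed, a rough alternative is to write the same single column as CSV with the standard library. This swaps the export format and is not what the script above does; the header name "phone" and the sample numbers are assumptions.

```python
import csv
import io

phone = ["13800000001", "13800000002"]

# Write to an in-memory buffer here; pass an open file to save to disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["phone"])      # header row
for number in phone:
    writer.writerow([number])   # one number per row, in a single column

csv_text = buffer.getvalue()
print(csv_text)
```

To save to disk, the same writer can be given a file opened with open('phonenumber.csv', 'w', newline='') in place of the buffer.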