File types that Python can open: txt, xlsx, csv, zip, json, xml, html, images, hdf, pdf, docx, mp3, mp4
1.txt file
Open: f = open(filepath, 'r'), with open(filepath, 'r') as f, codecs.open() or io.open()
Read: f.read() for the whole file, for line in f to iterate over lines, f.readline() for one line, or f.readlines() for a list of lines
Common parameters
1)filepath
2)mode, for example 'r', 'w', 'a', 'wb', 'rb'
3)encoding, commonly utf-8
Writing: f.write() writes a string
f.writelines() takes a list of strings and writes them in order; it does not add line breaks
f.seek(offset, whence) moves the file pointer
(1) whence=0: position the pointer "offset" bytes from the start of the file (the default)
(2) whence=1: move the pointer "offset" bytes relative to its current position
(3) whence=2: position the pointer relative to the end of the file, so "offset" is usually zero or negative
In Python 3, whence=1 and whence=2 with a nonzero offset require a file opened in binary mode
f.flush() writes buffered changes to the file without closing it
f.tell() returns the current pointer position
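The calls above can be tried together in a small sketch. The scratch file path and its contents are made up for illustration, and the file is reopened in binary mode so that seeking relative to the end is allowed.

```python
import os
import tempfile

# Hypothetical scratch file; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# write() takes one string; writelines() takes a list of strings
# and does not add line breaks itself.
# newline="\n" keeps the byte offsets below the same on every platform.
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("first line\n")
    f.writelines(["second line\n", "third line\n"])

# Iterating over the file object yields one line at a time.
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]

# Binary mode allows seeking relative to the end of the file (whence=2).
with open(path, "rb") as f:
    f.seek(6, 0)                # 6 bytes from the start of the file
    word = f.read(4)            # the 4 bytes after "first "
    pos_after_read = f.tell()   # pointer position after the read
    f.seek(-11, 2)              # 11 bytes before the end of the file
    tail = f.read()             # everything from there to the end
```

Using with open(...) closes the file automatically, so no explicit f.close() or f.flush() is needed here.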
2.xlsx file
import pandas as pd
pf = pd.read_excel('train.xlsx', sheet_name='xx')
read_excel Common parameters
1)io, the file path
2)sheet_name (spelled sheetname in old pandas versions), default 0, i.e. the first sheet; None returns all sheets as a dict of DataFrames
3)header, the row used for column names; the first row by default
4)skiprows, the number of rows to skip at the top
5)skipfooter (formerly skip_footer), the number of rows to skip at the end
6)index_col, the column to use as the row index
7)names, the column names to use
to_excel Common parameters
3.csv file
import pandas as pd
pf = pd.read_csv('train.csv')
read_csv Common parameters
1)filepath_or_buffer, a file handle, StringIO object, file path string, or URL
2)sep, the separator; the default is ','
3)header, the number of the row used as column names; header=0 uses the first row, header=None means the file has no header row and integer column labels are added automatically
4)names, a list of column names; if the file also has a header row, pass header=0 so the names replace it
5)dtype, the column data types
6)nrows, how many rows to read
7)chunksize, the number of rows per chunk when reading in chunks
8)index_col, the column to use as the row index; several columns can be given to form a hierarchical index. By default none is used and a numeric index starting at 0 is added.
9)parse_dates=True, tries to parse the index as dates; a list of column names parses those columns instead
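A minimal sketch of how several of these parameters combine. The sample data is made up and read from an in-memory io.StringIO buffer in place of train.csv, so nothing has to exist on disk.

```python
import io

import pandas as pd

# Made-up sample data standing in for train.csv.
csv_text = (
    "date;city;sales\n"
    "2020-01-01;Oslo;10\n"
    "2020-01-02;Rome;20\n"
    "2020-01-03;Lima;30\n"
)

# sep, header, dtype, nrows, index_col and parse_dates used together.
df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    header=0,                    # first row holds the column names
    dtype={"sales": "float64"},  # force the column type
    nrows=2,                     # read only the first two data rows
    index_col="date",            # use the date column as the row index
    parse_dates=True,            # try to parse the index as dates
)

# header=None with names: the first row of the file is then treated as data.
raw = pd.read_csv(io.StringIO(csv_text), sep=";", header=None, names=["a", "b", "c"])

# chunksize returns an iterator of DataFrames instead of a single DataFrame.
chunk_sizes = [
    len(chunk) for chunk in pd.read_csv(io.StringIO(csv_text), sep=";", chunksize=2)
]
```

Reading in chunks is mainly useful when the file is too large to load at once; each chunk is an ordinary DataFrame.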
import pandas as pd
pf.to_csv('train.csv')  # to_csv is a DataFrame method; there is no pd.to_csv
to_csv Common parameters
1)path_or_buf
2)sep
3) columns, the columns to write (optional)
4)encoding
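As an illustration of these parameters, the sketch below writes a made-up DataFrame twice: once to a string (when no path is given, to_csv returns the text) and once to a hypothetical temporary file.

```python
import os
import tempfile

import pandas as pd

# Made-up frame used only for illustration.
pf = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 0.75]})

# With no path, to_csv returns the CSV text instead of writing a file.
# columns selects which columns to write; index=False drops the row index.
text = pf.to_csv(sep="\t", columns=["id", "score"], index=False)

# Writing to a file with an explicit encoding (hypothetical output path).
out_path = os.path.join(tempfile.mkdtemp(), "train_out.csv")
pf.to_csv(out_path, sep=",", encoding="utf-8", index=False)

with open(out_path, encoding="utf-8") as fh:
    first_line = fh.readline().strip()
```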
4.zip file
import zipfile
archive = zipfile.ZipFile('T.zip', 'r')
data = archive.read('train.csv')  # returns the file contents as bytes, not a DataFrame
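A self-contained sketch of the same idea: it first builds a small archive so there is something to read. The names T.zip and train.csv mirror the ones above, while the contents and the temporary directory are assumptions for illustration.

```python
import io
import os
import tempfile
import zipfile

# Build a small archive first; the CSV contents are made up.
zip_path = os.path.join(tempfile.mkdtemp(), "T.zip")
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("train.csv", "id,label\n1,cat\n2,dog\n")

with zipfile.ZipFile(zip_path, "r") as archive:
    names = archive.namelist()        # files stored in the archive
    raw = archive.read("train.csv")   # the member's contents as bytes
    text = raw.decode("utf-8")        # decode to get a string
    # open() gives a binary file-like object; wrap it to read text lines.
    with archive.open("train.csv") as member:
        rows = [line.strip() for line in io.TextIOWrapper(member, encoding="utf-8")]
```

The file-like object returned by archive.open() can also be handed to a reader that accepts file handles, which avoids extracting the member to disk.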
5.json
import pandas as pd
df = pd.read_json('train.json')
Common parameters
1)path_or_buf
2)orient, the layout of the JSON string, for example 'records', 'columns', 'index' or 'split'
https://blog.csdn.net/qq_24499417/article/details/81428594
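To make orient concrete, here is a small sketch that reads the same made-up table from two JSON layouts. The strings are wrapped in io.StringIO in place of a file such as train.json.

```python
import io

import pandas as pd

# The same made-up table in two JSON layouts.
records_json = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
split_json = (
    '{"columns": ["id", "name"], "index": [0, 1], '
    '"data": [[1, "a"], [2, "b"]]}'
)

# orient="records": a list of row objects.
df_records = pd.read_json(io.StringIO(records_json), orient="records")

# orient="split": column names, index and data stored separately.
df_split = pd.read_json(io.StringIO(split_json), orient="split")
```

Both calls should produce the same two-row table; orient only tells pandas how the JSON is arranged.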
6. xml
import xml.etree.ElementTree as ET
tree = ET.parse('/home/sunilray/Desktop/2 sigma/train.xml')
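Once the tree is parsed, the elements still have to be walked to get at the data. The sketch below assumes a made-up document structure (records containing record elements) and parses it from a string; for a file, ET.parse(path).getroot() gives the same kind of root element.

```python
import xml.etree.ElementTree as ET

# Made-up document standing in for train.xml.
xml_text = """
<records>
  <record id="1"><name>Ann</name><score>3.5</score></record>
  <record id="2"><name>Bob</name><score>4.0</score></record>
</records>
""".strip()

root = ET.fromstring(xml_text)

# Collect one dict per <record>: attributes via get(), child text via findtext().
rows = [
    {
        "id": rec.get("id"),
        "name": rec.findtext("name"),
        "score": float(rec.findtext("score")),
    }
    for rec in root.iter("record")
]
```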
7.html
Use the BeautifulSoup library to read an HTML file
import urllib.request  # on Python 2, import urllib2 instead
wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
page = urllib.request.urlopen(wiki)  # on Python 2 use urllib2.urlopen(wiki)
from bs4 import BeautifulSoup
# Parse the HTML in the 'page' variable and store it in Beautiful Soup format
soup = BeautifulSoup(page, 'html.parser')
https://www.analyticsvidhya.com/blog/2015/10/beginner-guide-web-scraping-beautiful-soup-python/
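A short offline sketch of what can be done with the parsed soup. It assumes the bs4 package is installed and uses a made-up HTML string instead of fetching a page, so it runs without network access.

```python
from bs4 import BeautifulSoup

# Made-up page standing in for the downloaded HTML.
html = """
<html><head><title>Capitals</title></head>
<body>
<table>
  <tr><th>State</th><th>Capital</th></tr>
  <tr><td>Goa</td><td>Panaji</td></tr>
  <tr><td>Kerala</td><td>Thiruvananthapuram</td></tr>
</table>
<a href="https://example.com/a">A</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                         # text of the <title> tag
links = [a["href"] for a in soup.find_all("a")]   # every link target

# One list of cell texts per table row, header row included.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in soup.find_all("tr")
]
```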
8.images
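from scipy import misc
f = misc.face()  # in recent SciPy releases this is scipy.datasets.face()
misc.imsave('face.png', f)  # removed from recent SciPy releases; plt.imsave('face.png', f) is one alternative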
import matplotlib.pyplot as plt
plt.imshow(f)
plt.show()
https://www.analyticsvidhya.com/blog/2014/12/image-processing-python-basics/
9.hdf
import pandas as pd
df = pd.read_hdf('train.h5')
10.pdf
Install the pdfminer library (from its source directory):
python setup.py install
pdf2txt.py train.pdf # test reading a pdf
11.docx
Install the docx2txt library:
pip install docx2txt
Read a docx file:
import docx2txt
text = docx2txt.process("file.docx")
12.mp3
http://pymedia.org/tut/index.html
13.mp4
http://zulko.github.io/moviepy/
from moviepy.editor import VideoFileClip
clip = VideoFileClip('<video_file>.mp4')
Citations
https://blog.csdn.net/hellocsz/article/details/79623142
This blog post is very good: it has many examples and is easy to understand
https://blog.csdn.net/sinat_35562946/article/details/81058221
This one is also very good
https://www.cnblogs.com/hackpig/p/8215786.html
Covers txt files
https://blog.csdn.net/u010801439/article/details/80033341
https://www.jianshu.com/p/03e3cfd5519e
https://www.analyticsvidhya.com/blog/2017/03/read-commonly-used-formats-using-python/