
Python Upgrade Path (Lv9): File Operations


Python article series

Chapter 1: Introduction to Python
Chapter 2: Basic Python concepts
Chapter 3: Sequences
Chapter 4: Control statements
Chapter 5: Functions
Chapter 6: Object-oriented fundamentals
Chapter 7: Object-oriented programming in depth
Chapter 8: Exception handling
Chapter 9: File operations

File operations

Python article series
Preface
I. What is file operation
- 1. File classification
- 2. Common encodings
  - ASCII
  - GBK
  - Unicode
  - UTF-8
II. File operations
- 1. Creating a file object
- 2. Writing files
  - Basic file writing
  - Garbled Chinese text
    - When encoding
    - Console output
  - write()/writelines(): writing data
  - Closing the file stream with close()
  - The with statement (context manager)
- 3. Reading files
  - Reading and writing binary files
- 4. Common attributes and methods of file objects
  - Operating at any position in a file
III. Extension modules for file operations
- 1. The pickle serialization module
- 2. Working with csv files
  - Reading csv files
  - Writing csv files
- 3. The os and os.path modules
  - os module: calling operating system commands
  - os module: file and directory operations
  - os.path module
  - walk(): recursively traversing all files and directories
  - Recursively traversing all files in a directory
- 4. The shutil module (copying and compressing)

Preface

This chapter introduces the APIs related to file operations.
First we look at what file operations are, how files are classified, and the character encodings commonly used in I/O.
Then we walk through the file operation workflow: create -> write -> close.
Finally we cover the extension modules: the serialization module pickle, the csv module for CSV files, the os and os.path modules for operating system calls, and the shutil module for copying and compressing files.

I. What is file operation

A complete program generally needs to store and read data. The programs written in the previous chapters do not actually store their data, so it disappears once the Python interpreter finishes executing.
In development we often need to read data from external storage media (hard disks, optical discs, USB drives and so on), or to store the data a program produces in a file so that it is persisted.

1. File classification

According to how the data in a file is organized, files are divided into text files and binary files:

Text files
A text file stores ordinary character text. Python strings are Unicode, and a text file can be opened with a program such as Notepad.
Binary files
A binary file stores its data as bytes. It cannot be read in Notepad; dedicated software is needed to decode it.
Common examples: MP4 video files, MP3 audio files, JPG images, doc documents and so on.

2. Common encodings

When working with text files we often handle Chinese, and this is where garbled text (mojibake) tends to appear.
To solve the problem of garbled Chinese text, we need to understand how the encodings differ.

The common encodings are described below:

ASCII

The full name is American Standard Code for Information Interchange.
It is the earliest and most widely used single-byte encoding system, mainly used to represent modern English and other Western European languages.

Notes:

ASCII is represented with 7 bits, so it defines only 2^7 = 128 characters; 7 bits are enough to encode all of them.
A byte has 8 bits and can hold 256 values, so the highest bit of a one-byte ASCII code is always 0.

GBK

GBK is the Chinese Internal Code Extension Specification (English name: Chinese Internal Code Specification).
The GBK standard is compatible with GB2312. It contains 21003 Chinese characters and 883 symbols, provides 1894 user-defined code points, and combines simplified and traditional characters in one character set.
GBK uses a double-byte representation. The overall code range is 8140-FEFE: the first byte is between 81 and FE, and the second byte is between 40 and FE.

Unicode

Unicode was originally designed as a fixed two-byte encoding in which every character uses 16 bits (2^16 = 65536 values), including English characters that previously needed only 8 bits, so it wastes space for such text. Unicode has since grown beyond 65536 code points.
Unicode was designed from scratch. In this two-byte form its byte representation is not compatible with ISO 8859-1 or with any other encoding.

UTF-8

For English letters, two-byte Unicode also needs two bytes per character, which makes it inconvenient for transmission and storage.
This led to the UTF encodings. The full name of UTF-8 is 8-bit Unicode Transformation Format.

Notes

UTF-8 is byte-compatible with ASCII and can represent the characters of all languages.
UTF-8 is a variable-length encoding: each character takes from 1 to 4 bytes.
English letters take one byte, and common Chinese characters take three bytes.
Most projects use UTF-8.
We tend to prefer UTF-8 because its variable-length encoding saves space and it fully supports Chinese.
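
The byte counts mentioned above can be checked directly in Python. The following is a minimal sketch; the sample characters are our own choice and are not from the original article.

```python
# Compare how many bytes the same character needs in different encodings.
samples = ["A", "中"]  # one English letter, one common Chinese character

sizes = {}
for ch in samples:
    sizes[ch] = {
        "utf-8": len(ch.encode("utf-8")),          # variable length: 1 to 4 bytes
        "gbk": len(ch.encode("gbk")),              # 1 byte for ASCII, 2 for Chinese
        "utf-16-le": len(ch.encode("utf-16-le")),  # 2 bytes for both of these characters
    }
    print(ch, sizes[ch])
```

The letter costs one byte in UTF-8 but two in the 16-bit form, which is the space argument made above for English-heavy text.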

II. File operations

1. Creating a file object

The open() function creates a file object. The basic syntax is: open(file_name[, mode])

Notes:

If only a file name is given, it refers to a file in the current directory. A full path can also be given, for example D:\a\b.txt
A raw string such as r"d:\b.txt" avoids having to escape each backslash, so the call can be written as f = open(r"d:\b.txt", "w")
The open modes passed as the second argument are listed below (used all the time!):
r : read mode (the default)
w : write mode; creates the file if it does not exist and overwrites its contents if it does
a : append mode; creates the file if it does not exist and appends to the end if it does
b : binary mode (can be combined with the other modes)
+ : read and write mode (can be combined with the other modes)
Text file objects and binary file objects
If the mode includes b, a binary file object is created and the basic unit of processing is the byte.
If the mode does not include b, a text file object is created by default and the basic unit of processing is the character.
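
To make the difference between the two kinds of file object concrete, here is a small sketch. It writes to a temporary directory instead of d:\ so that it runs on any system; the file name demo.txt is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical scratch file

    # Text mode: we write and read str objects (characters).
    with open(path, "w", encoding="utf-8") as f:
        f.write("中文abc")
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # Binary mode: we get the raw bytes stored on disk.
    with open(path, "rb") as f:
        raw = f.read()

print(type(text).__name__, len(text))  # length counted in characters
print(type(raw).__name__, len(raw))    # length counted in bytes
```

The same file holds 5 characters but 9 bytes, because each of the two Chinese characters takes three bytes in UTF-8.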

2. Writing files

Writing a text file generally takes three steps:

Create a file object
Write the data
Close the file object

Basic file writing

Practical code

# 1. Using open()
f = open(r"d:\a.txt", "a")
s = "TimePause\nTime is still\n"
f.write(s)
f.close()


Garbled Chinese text

When encoding

The default encoding on a Chinese-locale Windows system is GBK, while the default on Linux is UTF-8.
When open() is called without an encoding argument, the operating system's default encoding is used to open the file, which on such a Windows system is GBK.
Because we usually set all our files to UTF-8, opening the file in a UTF-8 editor shows garbled text.

Solution:
Set the editor's text encoding to GBK when reading the file.

Note:
The problem can also be solved by specifying the encoding explicitly. Since PyCharm's text read/write encoding is set to utf-8,
as long as we specify encoding="utf-8" (instead of the default gbk) when writing, the text is no longer garbled when read. See the code below.

Practical code

# [Example] Avoid garbled Chinese text by specifying the file encoding
f = open(r"d:\bb.txt", "w", encoding="utf-8")
f.write("A small station with warmth\nTime stillness is not a brief history")
f.close()

Console output

Problem description

We usually set all character encodings in PyCharm to utf-8. When we make a network request, the returned text is sometimes garbled.

Problem analysis

All character encodings in PyCharm are set to UTF-8, but the network request returns GBK-encoded text. Decoding that text as UTF-8 produces garbled output.

Solution

Either set the project encoding to GBK, or decode the received data as GBK when reading it.
Alternatively, declare the encoding as UTF-8 explicitly when writing.
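
The analysis above can be reproduced without a network request. In this sketch the GBK bytes are simulated by encoding a string ourselves; a real response would supply them instead.

```python
# Bytes as they might arrive from a GBK-encoded source (simulated here).
text = "时间静止"
raw = text.encode("gbk")

# Decoding with the wrong codec gives garbled text. errors="replace" avoids
# an exception by substituting U+FFFD for bytes that cannot be decoded.
garbled = raw.decode("utf-8", errors="replace")

# Decoding with the matching codec recovers the original text.
decoded = raw.decode("gbk")

print(repr(garbled))
print(decoded)
```

Without errors="replace", the mismatched decode would raise UnicodeDecodeError rather than return garbled text.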

write()/writelines(): writing data

write(a): writes the string a to the file
writelines(b): writes a list of strings to the file; no line breaks are added

Practical code

# [Operation] Write a list of strings to a file
f = open(r"d:\bb.txt", 'w', encoding="utf-8")
s = ["What the hell?\n"] * 3  # \n adds the line breaks manually
f.writelines(s)
f.close()
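
Since writelines() adds no line breaks, it is worth seeing what ends up in the file with and without them. A minimal sketch, using a temporary directory and a hypothetical file name:

```python
import os
import tempfile

lines = ["first", "second", "third"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lines.txt")  # hypothetical file name

    # writelines() writes the strings back to back, with no separators.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    with open(path, encoding="utf-8") as f:
        joined = f.read()

    # Append "\n" to each item to get one item per line.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    with open(path, encoding="utf-8") as f:
        separated = f.read()

print(repr(joined))
print(repr(separated))
```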

Closing the file stream with close()

Files are ultimately managed by the operating system, so a file object we open should be closed explicitly by calling close().
When close() is called, the buffered data is first written to the file (flush() can also be called directly to do this), then the file is closed and the file object is released.

Notes:

close() is generally used together with the finally clause of the exception mechanism.
The with keyword can also be used to make sure an open file object is closed in every case (recommended).

Practical code

# [Operation] Use finally to make sure the file object is closed
# "a" opens the file in append mode
f = None
try:
    f = open(r"d:\c.txt", "a")
    s = "From The Abyss"
    f.write(s)
except OSError as e:
    print(e)
finally:
    if f is not None:  # open() may have failed before f was assigned
        f.close()

The with statement (context manager)

The with keyword (a context manager) manages resources automatically: whatever the reason for leaving the with block, the file is guaranteed to be closed correctly,
and the state that existed on entering the block is restored after the block finishes.

Practical code

# [Operation] Use with to manage a file write
s = ["Zigfei"] * 3
with open(r"d:\cc.txt", "w") as f:
    f.writelines(s)
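
The claim that the file is closed however we leave the block can be checked by raising an exception on purpose. This is a sketch under that assumption; the failure is simulated and the file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write("partial data")
            raise ValueError("simulated failure inside the with block")
    except ValueError as exc:
        print("caught:", exc)

    # The file object is already closed even though the block raised.
    closed_after_error = f.closed
    print("closed:", closed_after_error)
```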

3. Reading files

Steps for reading a file:

Open the text file object
Read the data

The following three methods are generally used to read files:

read([size]): reads size characters from the file and returns them as the result.
If size is omitted, the whole file is read. At the end of the file an empty string is returned.
readline(): reads one line and returns it as the result.
At the end of the file an empty string is returned.
readlines(): for a text file, stores each line as a string in a list and returns the list.
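
The three methods share one file pointer, so each call continues where the previous one stopped. A small sketch with made-up sample content:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("alpha\nbeta\ngamma\n")

    with open(path, "r", encoding="utf-8") as f:
        first_four = f.read(4)        # the first 4 characters
        rest_of_line = f.readline()   # continues from the current position
        remaining = f.readlines()     # the remaining lines as a list
        at_end = f.read()             # empty string at the end of the file

print(repr(first_four), repr(rest_of_line), remaining, repr(at_end))
```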

Code format

with open(r"d:\a.txt", "r"[, encoding="utf-8"]) as f:
    f.read(4)

Notes:

When reading a file, make sure the character encoding matches the one used for writing. If no encoding was specified when writing (default GBK here), none needs to be specified when reading.
If the encoding is specified on only one side, however, an error may occur.
For example, reading a GBK-written file with encoding="utf-8"
makes the console report UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbc in position 13: invalid start byte

Practical code

# [Operation] Read the first 4 characters of a file
with open(r"d:\a.txt", "r") as f:
    print(f.read(4))
# [Operation] For a small file, read the whole content at once
with open(r"d:\aa.txt", "r") as f:
    print(f.read())
# [Operation] Read a file line by line
with open(r"d:\b.txt") as f:
    while True:
        line = f.readline()
        if not line:  # an empty string is falsy, which signals the end of the file
            break
        else:
            print(line, end="")
    print()
# [Operation] Use the file object as an iterator (one line per iteration)
# The encoding must match the one used when writing
with open(r"d:\bb.txt", "r", encoding="utf-8") as f:
    for a in f:
        print(a, end="")
# [Operation] Append a line number to the end of each line of a text file
with open(r"d:\c.txt", "r") as f:
    lines = f.readlines()
    lines2 = [line.rstrip() + " # " + str(index) + "\n" for index, line in enumerate(lines, 1)]
with open(r"d:\c.txt", "w") as ff:
    ff.writelines(lines2)

Reading and writing binary files

Binary files are processed the same way as text files: first create a file object.
Once the binary file object exists, write() and read() are still used to write and read the file.

To create a binary file object, the binary mode must be specified when opening the file. For example:

f = open(r"d:\a.txt", 'wb')  writable binary file object, overwrite mode
f = open(r"d:\a.txt", 'ab')  writable binary file object, append mode
f = open(r"d:\a.txt", 'rb')  readable binary file object

Practical code

# Reading and writing a binary file (this is equivalent to copying it)
# f = open(r"d:\a.txt", 'wb')  # writable binary file object, overwrite mode
# f = open(r"d:\a.txt", 'ab')  # writable binary file object, append mode
# f = open(r"d:\a.txt", 'rb')  # readable binary file object
with open(r"d:\aaa.png", "rb") as srcFile, open(r"d:\bbb.png", "wb") as destFile:
    for chunk in srcFile:
        destFile.write(chunk)
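
Iterating over a binary file splits it at newline bytes, which may yield very uneven pieces for image data. One alternative, sketched here, is to copy in fixed-size blocks with read(size). The helper name copy_binary and the block size are our own assumptions, not part of the original article.

```python
import os
import tempfile

CHUNK_SIZE = 4096  # assumed block size; any positive value works

def copy_binary(src_path, dest_path, chunk_size=CHUNK_SIZE):
    """Copy src_path to dest_path in fixed-size blocks (hypothetical helper)."""
    with open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # b"" means the end of the file
                break
            dest.write(chunk)

with tempfile.TemporaryDirectory() as tmp_dir:
    src_path = os.path.join(tmp_dir, "src.bin")
    dest_path = os.path.join(tmp_dir, "dest.bin")
    payload = bytes(range(256)) * 100  # 25,600 bytes of sample data
    with open(src_path, "wb") as f:
        f.write(payload)

    copy_binary(src_path, dest_path)
    with open(dest_path, "rb") as f:
        copied = f.read()

print(len(copied))
```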

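4. Common attributes and methods of file objects

Attributes of the file object
name: the name of the file
mode: the mode in which the file was opened
closed: True if the file has been closed

Open modes of the file object
r (read), w (write), a (append), b (binary), + (read and write)

Common methods of file objects
read([size]), readline(), readlines(): read data
write(str), writelines(seq): write data
seek(offset[, whence]): move the file pointer; tell(): return its current position
flush(): write the buffer to the file; truncate([size]): cut the file to the given size; close(): close the file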

Operating at any position in a file

seek() moves the file pointer to the given byte position.
In GBK a Chinese character takes two bytes, while an English letter takes only one.

Practical code

print("================= Operating at any position in a file ======================")
# [Example] Moving the file pointer with seek()
# The offsets in the comments assume the file is GBK-encoded (the default on this Windows system)
with open(r"d:\cc.txt", "r") as f:
    print("The file name is {0}".format(f.name))  # The file name is d:\cc.txt
    print(f.tell())  # 0
    print("Read the contents of the file", str(f.readline()))  # prints the whole line
    print(f.tell())  # 18
    f.seek(4, 0)  # a Chinese character takes 2 bytes in GBK, so the offset must be a multiple of 2
    print("What the file reads", str(f.readline()))  # prints the line from the third character on
    print(f.tell())  # 18
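
Because seek() counts bytes, the offsets depend on the encoding. The sketch below makes that explicit by writing the file as GBK and then working in binary mode; the sample text and file name are our own.

```python
import os
import tempfile

text = "中文abc"
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "seek_demo.txt")  # hypothetical file name
    with open(path, "w", encoding="gbk") as f:
        f.write(text)

    with open(path, "rb") as f:
        f.seek(2)                       # skip the first Chinese character (2 bytes in GBK)
        position = f.tell()
        tail = f.read().decode("gbk")   # decode the rest of the file ourselves
        end_position = f.tell()

print(position, tail, end_position)
```

With UTF-8 the same character would occupy three bytes, so seeking to offset 2 would land in the middle of it.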

III. Extension modules for file operations

1. The pickle serialization module

Serialization: converting an object into a serialized form of data that can be stored on disk or sent elsewhere over the network.
Deserialization is the reverse process: converting the serialized data that was read back into an object.
The functions in the pickle module implement serialization and deserialization.

For serialization we use:

pickle.dump(obj, file): obj is the object to serialize, file is the file to store it in
pickle.load(file): reads data from file and deserializes it into an object

Practical code

import pickle
print("================= Serialize with pickle =======================")
# [Operation] Serialize an object into a file
with open("student.info", "wb") as f:
    name = "Time is still"
    age = 18
    score = [90, 80, 70]
    resume = {"name": name, "age": age, "score": score}
    pickle.dump(resume, f)
# [Operation] Deserialize the stored data back into an object
with open("student.info", "rb") as f:
    resume = pickle.load(f)
    print(resume)
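
When no file is needed, for example before sending data over the network, pickle.dumps() and pickle.loads() do the same conversion in memory. A minimal sketch with the same sample dictionary:

```python
import pickle

resume = {"name": "Time is still", "age": 18, "score": [90, 80, 70]}

data = pickle.dumps(resume)    # serialize to a bytes object
restored = pickle.loads(data)  # deserialize back into a new object

print(type(data).__name__)
print(restored == resume, restored is resume)
```

The restored object is equal to the original but is a separate copy. Only unpickle data from a source we trust, since loading a pickle can execute arbitrary code.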

2. Working with csv files

csv (comma-separated values) is a comma-delimited text format commonly used for data exchange and for importing and exporting Excel files and database data.

Unlike an Excel file, in a CSV file:

Values have no type; all values are strings
Styles such as font color cannot be specified
Cell width and height cannot be specified, and cells cannot be merged
There are no multiple worksheets
Images and charts cannot be embedded

The csv module of the Python standard library provides objects for reading and writing csv files

We create a simple table in Excel, save it as csv (comma separated), then open it to look at the contents of the csv file

csv File read

Practical code

import csv
with open(r"d:\workBook.csv", newline="") as a:
    o_csv = csv.reader(a)  # create the csv reader; iterating it yields each row as a list
    headers = next(o_csv)  # get the list holding the header row
    print(headers)
    for row in o_csv:  # loop over the remaining rows
        print(row)


csv File is written to

Practical code

# [Operation] Write a csv file with a csv.writer object
import csv
headers = ['full name', 'Age', 'Work', 'address']
rows = [('JOJO', '18', 'massagist', 'The British'), ('dior', '19', 'Boss', 'Egypt'), ('Joruno chobana', '20', 'Gangster', 'YIDELI')]
with open(r"d:\workBook3.csv", "w", newline="") as b:  # newline="" avoids blank lines between rows on Windows
    b_csv = csv.writer(b)  # create the csv writer
    b_csv.writerow(headers)  # write one row (the header)
    b_csv.writerows(rows)  # write several rows (the data)
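
Writing and reading can be combined into one round trip, which also shows that every value comes back as a string. This sketch uses csv.DictReader, which the article does not cover, and sample data of our own.

```python
import csv
import os
import tempfile

headers = ["name", "age"]            # sample data, not from the article
rows = [("JOJO", 18), ("Dio", 19)]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "people.csv")  # hypothetical file name

    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

    with open(path, newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))  # the header row becomes the dict keys

print(records)
```

The ages were written as integers but are read back as the strings "18" and "19", matching the point that CSV values have no type.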


3. The os and os.path modules

The os module lets us work with the operating system directly.
We can call the operating system's executables and commands, and operate on files, directories and so on directly.
The os module is a very important foundation for system operations and maintenance work.

os module: calling operating system commands

Practical code

import os
# [Example] os.system calls the Windows Notepad program
os.system("notepad.exe")
# [Example] os.system calls the Windows ping command
# If the output is garbled, see File operations -> Writing files -> Garbled Chinese text -> Console output
os.system("ping www.baidu.com")
# [Example] Launch the installed WeChat (os.startfile is available on Windows only)
os.startfile(r"C:\Program Files (x86)\Tencent\WeChat\WeChat.exe")
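
The examples above are Windows-specific. As a portable sketch, the echo command exists in both the Windows and POSIX shells, and os.system() returns a status value that is 0 when the command succeeds.

```python
import os

# "echo" is available in both cmd.exe and POSIX shells.
status = os.system("echo hello from the shell")
print("exit status:", status)
```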

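os module: file and directory operations

File contents can be read and written through the file objects described above.
For other operations on files and directories, use the os and os.path modules.

Common os methods for operating on files
remove(path): delete a file; rename(src, dest): rename a file or directory; stat(path): return the attributes of a file; listdir(path): list the files and directories under path

Common os methods for operating on directories
mkdir(path): create a directory; makedirs(path): create nested directories; rmdir(path): delete an empty directory; removedirs(path): delete nested empty directories; getcwd(): return the current working directory; chdir(path): change the current working directory; walk(): traverse a directory tree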

Practical code

import os
# [Example] os module: creating and deleting directories, getting file information, etc.
print("System name:", os.name)  # windows-->nt linux-->posix
print("Path separator used by the current operating system:", os.sep)  # windows-->\ linux-->/
print("Line separator:", repr(os.linesep))  # windows-->\r\n linux-->\n
print("Current directory:", os.curdir)
a = "3"
print(a)
# repr() returns the canonical string representation of the object
print(repr(a))
# Get information about a file or folder
print(os.stat("MyPy08-FileRead.py"))
# Working directory operations
print(os.getcwd())  # get the current working directory
os.chdir("D:")  # switch the current working directory
os.mkdir("Complete learning materials")  # create a directory
os.rmdir("Complete learning materials")  # delete a directory
# os.makedirs("race/The yellow race/Chinese")  # create nested directories; calling it again after success raises an error
# os.rename("race", "Asian")  # succeeds only once, because "race" no longer exists afterwards
print(os.listdir("Asian"))  # entries directly under the "Asian" directory

Notes
If os.rename() raises PermissionError: [WinError 5] Access denied,
grant the current user permission on the folder being renamed. After that change the rename succeeds.
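
The directory operations can be tried safely inside a temporary directory. This is a sketch with hypothetical folder names; exist_ok=True is one way to avoid the error on a repeated makedirs() call.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, "race", "yellow", "chinese")  # hypothetical names
    os.makedirs(nested)                  # creates every missing level
    os.makedirs(nested, exist_ok=True)   # no error on the second call

    os.rename(os.path.join(tmp_dir, "race"), os.path.join(tmp_dir, "asian"))
    top_level = os.listdir(tmp_dir)
    children = os.listdir(os.path.join(tmp_dir, "asian"))

    # rmdir() only removes empty directories.
    os.rmdir(os.path.join(tmp_dir, "asian", "yellow", "chinese"))
    after_rmdir = os.listdir(os.path.join(tmp_dir, "asian", "yellow"))

print(top_level, children, after_rmdir)
```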

os.path module

The os.path module provides path-related operations (checking paths, splitting paths, joining paths, traversing folders).

Practical code

# [Example] Common os.path functions
print("Is it an absolute path:", os.path.isabs("d:/a.txt"))
print("Is it a directory:", os.path.isdir(r"d:\a.txt"))
print("Does the file exist:", os.path.exists("a.txt"))
print("File size:", os.path.getsize("a.txt"))
print("Absolute path:", os.path.abspath("a.txt"))
print("Directory:", os.path.dirname("d:/a.txt"))
# Get the creation time, access time and last modification time
print("Creation time:", os.path.getctime("a.txt"))
print("Last access time:", os.path.getatime("a.txt"))
print("Last modification time:", os.path.getmtime("a.txt"))
# Splitting and joining paths
path = os.path.abspath("a.txt")  # return the absolute path
print("Tuple of directory and file:", os.path.split(path))
print("Tuple of path and extension:", os.path.splitext(path))
print("Joined path aa\\bb\\cc:", os.path.join("aa", "bb", "cc"))  # \\ keeps "\b" from being read as a backspace
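
The splitting functions work on the path string alone, so they can be tried without any real file. A short sketch with a hypothetical path:

```python
import os.path

path = os.path.join("data", "reports", "summary.tar.gz")  # hypothetical path

directory, filename = os.path.split(path)      # directory part and final component
stem, extension = os.path.splitext(filename)   # only the last extension is split off

print(directory)
print(filename, stem, extension)
```

Note that splitext() separates only the final ".gz", leaving "summary.tar" as the stem.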

List all .py files in the specified directory and print their names

# List all .py files in the specified directory and print their names
import os
path = os.getcwd()
file_list = os.listdir(path)
for filename in file_list:
    if filename.endswith(".py"):
        print(filename, end="\t")
print()

walk(): recursively traversing all files and directories

os.walk() is a simple, easy-to-use file and directory traverser that helps us process files and directories efficiently.

The format is: os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])

top: the directory to traverse.
topdown: optional; if True, the top directory is visited first and then its subdirectories.
It yields triples (root, dirs, files):
root: the folder currently being traversed
dirs: a list of the names of all directories in this folder
files: a list of the names of all files in this folder

Practical code

# [Example] Use walk() to recursively traverse all files and directories
path = os.path.dirname(os.getcwd())  # use the parent directory so that more files are listed
file_list = os.walk(path, topdown=False)
for root, dirs, files in file_list:
    for name in files:
        print(os.path.join(root, name))
    for name in dirs:
        print(os.path.join(root, name))  # join root and name into a full path
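
To see exactly what os.walk() yields, it helps to run it on a tree whose contents we know. The sketch below builds a small hypothetical tree in a temporary directory and collects the relative path of every file.

```python
import os
import tempfile

expected = sorted([
    "a.txt",
    os.path.join("sub", "b.txt"),
    os.path.join("sub", "deep", "c.txt"),
])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build the tree: a.txt, sub/b.txt, sub/deep/c.txt
    os.makedirs(os.path.join(tmp_dir, "sub", "deep"))
    for rel in expected:
        with open(os.path.join(tmp_dir, rel), "w") as f:
            f.write("x")

    found = []
    for root, dirs, files in os.walk(tmp_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, tmp_dir))

found.sort()
print(found)
```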


Recursively traversing all files in a directory

Practical code

# [Example] Use a recursive function to traverse all files in a directory
def my_print_file(path, level):
    child_files = os.listdir(path)
    for file in child_files:
        file_path = os.path.join(path, file)
        print("\t" * level + file)  # indent each name by its depth
        if os.path.isdir(file_path):
            my_print_file(file_path, level + 1)

my_print_file(path, 0)

4. The shutil module (copying and compressing)

The shutil module is part of the Python standard library. It is mainly used to copy, move and delete files and folders;
it can also compress and decompress files and folders. The os module provides the general operations on directories and files.
The shutil module supplements it with move, copy, compress and decompress operations that the os module does not provide.

Practical code: copying

import os
import shutil
# [Example] Copy a file
os.chdir("D:")  # switch the current working directory
shutil.copyfile("a.txt", "a_copy.txt")
# [Example] Copy a folder's contents recursively (using the shutil module)
shutil.copytree("Asian/The yellow race", "race", ignore=shutil.ignore_patterns("*.html", "*.htm"))  # the destination folder "race" must not already exist
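
The effect of ignore_patterns() is easier to see on a folder with known contents. This sketch uses a temporary directory and hypothetical file names.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src_dir = os.path.join(tmp_dir, "src")      # hypothetical folder names
    dest_dir = os.path.join(tmp_dir, "backup")  # must not exist before copytree()
    os.mkdir(src_dir)
    for name in ("notes.txt", "page.html"):
        with open(os.path.join(src_dir, name), "w") as f:
            f.write(name)

    # Copy a single file, then the whole folder minus the HTML files.
    shutil.copyfile(os.path.join(src_dir, "notes.txt"),
                    os.path.join(src_dir, "notes_copy.txt"))
    shutil.copytree(src_dir, dest_dir,
                    ignore=shutil.ignore_patterns("*.html", "*.htm"))
    copied = sorted(os.listdir(dest_dir))

print(copied)
```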

Practical code: compressing and decompressing

import shutil
import zipfile
# [Example] Compress all contents of a folder (using the shutil module)
# Compress everything in "Asian/The yellow race" into race.zip under the "Biological data" folder
shutil.make_archive("Biological data/race", "zip", "Asian/The yellow race")
# Compress: put several specified files into one zip file (using the zipfile module)
z = zipfile.ZipFile("a.zip", "w")
z.write("a.txt")
z.write("b.txt")
z.close()
# [Example] Extract the archive into the specified folder (using the zipfile module)
z2 = zipfile.ZipFile("a.zip", "r")
z2.extractall("d:/Biological data")
z2.close()
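
shutil can also do the extraction itself with unpack_archive(), so a full round trip needs no zipfile code. A minimal sketch in a temporary directory, with hypothetical names:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    src_dir = os.path.join(tmp_dir, "src")        # hypothetical folder names
    out_dir = os.path.join(tmp_dir, "extracted")
    os.mkdir(src_dir)
    with open(os.path.join(src_dir, "a.txt"), "w", encoding="utf-8") as f:
        f.write("hello")

    # make_archive() appends the ".zip" suffix and returns the archive path.
    archive_path = shutil.make_archive(os.path.join(tmp_dir, "race"), "zip", src_dir)
    archive_name = os.path.basename(archive_path)

    # The archive format is inferred from the file extension.
    shutil.unpack_archive(archive_path, out_dir)
    with open(os.path.join(out_dir, "a.txt"), encoding="utf-8") as f:
        restored = f.read()

print(archive_name, restored)
```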