您现在的位置：程式師世界 >> 編程語言 > >> 更多編程語言 >> Python

Python crawler - data processing and storage

編輯：Python

Python Reptiles - Data processing and storage

Data processing

have access to pandas Module to realize data processing ,pandas Is an open source and through BSD Licensed Library . It is mainly Python Language provides high performance 、 Easy to use data structures and data analysis tools , In addition, it also provides a variety of data operation and data processing methods . because pandas It is a third-party module, so you need to install and import this module before using it .

pandas data structure

pandas There are two cores in the data structure of , Namely Series And DataFrame. among Series It's a one-dimensional array , It is associated with Python Basic data structure List Close .Series You can save data of multiple data types , Such as Boolean value 、 character string 、 Number type, etc .DataFrame Is a tabular data structure , Be similar to Excel form , It is a two-dimensional tabular data structure .

1.Series object

§ establish Series object

Creating Series Object time , Just pass the data in the form of array into Series() Constructor . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 7:30 PM
# file ：Series object .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# establish Series object 
series = pd.Series(data)
# Output Series Object content 
print(series)

The program runs as follows ：

0 A
1 B
2 C
dtype: object
Process finished with exit code 0

say bright

In the running results of the above program , The number column on the left is the index column , The right letter column is the element corresponding to the index .Series Object when no index is specified , Generate the default build from 0 Start incrementing the index values in turn .

§ Creating Series Object time , You can specify the index name , For example, specify that the index entry is a、b、c. The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 7:48 PM
# file ： Specify the Series object .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# Create an array of Index Names 
index = ['a', 'b', 'c']
# Creates the specified index Series object 
series = pd.Series(data, index=index )
# Output the... Of the specified index Series Object content 
print(series)

The program runs as follows ：

a A
b B
c C
dtype: object
Process finished with exit code 0

§ Access data

During a visit to Series Object , You can access the index array or the element array separately . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 7:52 PM
# file ： Access data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# Create an array of Index Names 
index = ['a', 'b', 'c']
# Creates the specified index Series object 
series = pd.Series(data, index=index )
# Output index array 
print(' The index array is ：', series.index)
# Print element array 
print(' The element array is ：', series.values)

The result of program running is ：

 The index array is ： Index(['a', 'b', 'c'], dtype='object')
The element array is ： ['A' 'B' 'C']
Process finished with exit code 0

When you need to get the array element of the specified subscript , You can go directly through “Series object [ Subscript ]“ To get array elements , Array index from 0 Start . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 7:57 PM
# file ： Gets the array element of the specified subscript .py
# IDE ：PyCharm
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# Create an array of Index Names 
index = ['a', 'b', 'c']
# Creates the specified index Series object 
series = pd.Series(data, index=index )
# Output the array element with the specified index 
print(' Specifies that the array element of the subscript is ：', series[1])
# Output the array elements of the specified index 
print(' The array element of the specified index is ：', series['a'])

The program runs as follows ：

 Specifies that the array element of the subscript is ： B
The array element of the specified index is ： A
Process finished with exit code 0

When it is necessary to obtain the corresponding Series Object time , The subscript range can be specified , The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 8:03 PM
# file ： Specify the subscript range to get data .py
# IDE ：PyCharm
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# Create an array of Index Names 
index = ['a', 'b', 'c']
# Creates the specified index Series object 
series = pd.Series(data, index=index)
# The output subscript is 0,1,2 Corresponding Series object 
print(' Get the corresponding of multiple subscripts Series object ：')
print(series[0:3])

The program runs as follows ：

 Get the corresponding of multiple subscripts Series object ：
a A
b B
c C
dtype: object
Process finished with exit code

Except by specifying the subscript range Series Beyond the object , It can also be obtained by specifying multiple indexes Series object . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 8:13 PM
# file ： Specify multiple index methods to obtain Series object .py
# IDE ：PyCharm
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# Create an array of Index Names 
index = ['a', 'b', 'c']
# Creates the specified index Series object 
series = pd.Series(data, index=index)
# Output index a、b Corresponding Series object 
print(' Get the corresponding of multiple indexes Series object ：')
print(series[['a', 'b']])

The program runs as follows ：

 Get the corresponding of multiple indexes Series object ：
a A
b B
dtype: object
Process finished with exit code 0

§ Modify element values

modify Series Object , It can also be achieved by specifying a subscript or an index . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 8:19 PM
# file ： modify Series The element value of the object .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create data array 
data = ['A', 'B', 'C']
# Create an array of Index Names 
index = ['a', 'b', 'c']
# Creates the specified index Series object 
series = pd.Series(data, index=index )
# Change the subscript to 0 The element value of 
series[0] = 'D'
print(' Change the subscript to 0 The element value of ：\n')
print(series)
# Change the index to b Element value of 
series['b'] = 'A'
print(' Change the index to b The element value of ：')
# Print after modifying the element value Series object 
print(series)

The running result of the program is ：

 Change the subscript to 0 The element value of ：
a D
b B
c C
dtype: object
Change the index to b The element value of ：
a D
b A
c C
dtype: object
Process finished with exit code 0

2. DataFrame object

Creating DataFrame Object time , It needs to be implemented through a dictionary . The name of each column is the key , Each key corresponds to an array , This array as a value . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 10:09 PM
# file ：DataFrame object .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C
0 1 6 11
1 2 7 12
2 3 8 13
3 4 9 14
4 5 10 15
Process finished with exit code 0

say bright

We can see from the above running results , A separate number on the left （0-4） Index column , When no specific index is specified ,DataFrame The default index of the object will be from 0 Began to increase . On the right side A、B、C The column name is key , The value corresponding to the column name is array .

DataFrame Object can also specify the index name separately , Specify in the same way as Series Objects are similar , The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 10:29 PM
# file ： Specify index name separately .py
# IDE ：PyCharm
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# Custom index 
index = ['a', 'b', 'c', 'd', 'e']
# Create a custom index DataFrame object 
data_frame = pd.DataFrame(data, index = index)
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

A B C
a 1 6 11
b 2 7 12
c 3 8 13
d 4 9 14
e 5 10 15
Process finished with exit code 0

If the data contains unnecessary data columns , You can create DataFrame Object to specify the required data column name . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 10:33 PM
# file ： Specify the required data column name .py
# IDE ：PyCharm
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# Create a for the specified column name DataFrame object 
data_frame = pd.DataFrame(data, columns=['B', 'C'])
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 B C
0 6 11
1 7 12
2 8 13
3 9 14
4 10 15
Process finished with exit code 0

The increase of data 、 Delete 、 Change 、 check

1. Add data

When it comes to DataFrame Object to add a column of data , You can create column names first , Then assign data to it . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 10:39 PM
# file ： Add data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# increase D Column data 
data_frame['D'] = [10, 20, 30, 40, 50]
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C D
0 1 6 11 10
1 2 7 12 20
2 3 8 13 30
3 4 9 14 40
4 5 10 15 50
Process finished with exit code 0

2. Delete data

pandas Module provides drop() function , Used to delete DataFrame A row or column of data in an object , The function provides multiple parameters , The commonly used parameters and their meanings are shown in the following table ：

drop() Function common parameters and their meanings

ginseng Count name contain The righteous labels The name of the row or column to be deleted , receive string or arrayaxis The default is 0, Indicates to delete the row , When axis = 1 To delete a column index Specify the row to delete columns Specify the columns to delete inplace Set to False Indicates that the original data will not be changed , Returns a new... After deletion DataFrame object ; Set to True The original data will be deleted

Implementation delete DataFrame Object the row data of the specified column and index in the original data . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 10:48 PM
# file ： Delete data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Delete the index in the original data 0 That line of data 
data_frame.drop([0], inplace=True)
# Delete the column named in the original data A That column of data 
data_frame.drop(labels='A', axis=1, inplace=True)
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 B C
1 7 12
2 8 13
3 9 14
4 10 15
Process finished with exit code 0

repair charge

Delete in the implementation DataFrame Object to specify the data of the column name , It can also be done through del Keyword to implement , For example, delete the column named in the original data A The data of , It can be used del data_frame[‘A’] Code

drop() Function can delete the specified column or row data , You can also specify the range of the row index , Realize the function of deleting multiple rows of data . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 11:01 PM
# file ： Delete multiline data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Delete the row index from the original data 0 to 2 The first three rows of data 
data_frame.drop(labels=range(0, 3), axis=0, inplace=True)
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C
3 4 9 14
4 5 10 15
Process finished with exit code 0

3. Modifying data

When you need to modify DataFrame Object in a column , You can modify elements by assigning values . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 11:12 PM
# file ： Modifying data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# take A The data in the third row of the column is modified to 10
data_frame['A'][2] = 10
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C
0 1 6 11
1 2 7 12
2 10 8 13
3 4 9 14
4 5 10 15
Process finished with exit code 0

In the revision DataFrame When all the data of a column in the object , You need to know the number of elements contained in the element array corresponding to the current modified column name , Then modify the corresponding elements according to the number of original elements . The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 11:21 PM
# file ： Modify all data of a column .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# take B All data in the column is modified 
data_frame['B'] = [5, 4, 3, 2, 1]
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C
0 1 5 11
1 2 4 12
2 3 3 13
3 4 2 14
4 5 1 15
Process finished with exit code 0

notes It means

When modifying B All data in the column and the number of modified elements does not match the original number of elements , Will be an error .

say bright

When a column is assigned a single element , for example ,data_frame[‘B’] = 1, here B The data corresponding to the column will be modified to 1

data_frame['B'] = 1

give the result as follows ：

 A B C
0 1 1 11
1 2 1 12
2 3 1 13
3 4 1 14
4 5 1 15
Process finished with exit code 0

4. Query data

In obtaining DataFrame Object , You can obtain the data of the specified column by directly specifying the column name or directly calling the column name attribute . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 11:34 PM
# file ： Query data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
print(' The data of the specified column name is ：\n', data_frame['A'])
print(' The data of the specified column name attribute is ：\n', data_frame.B)

The program runs as follows ：

 The data of the specified column name is ：
0 1
1 2
2 3
3 4
4 5
Name: A, dtype: int64
The data of the specified column name attribute is ：
0 6
1 7
2 8
3 9
4 10
Name: B, dtype: int64
Process finished with exit code 0

In obtaining DataFrame Object from the 1 To the first 3 Row range data , Data can be obtained by specifying the index range , Line index from 0 Start , Row index 0 The corresponding is DataFrame Object 1 Row data . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 11:39 PM
# file ： Get the data of the specified row index range .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
print(' Specify the data of the row index range ：\n', data_frame[0:3])

The program runs as follows ：

 Specify the data of the row index range ：
A B C
0 1 6 11
1 2 7 12
2 3 8 13
Process finished with exit code 0

say bright

In the sample code to get the specified row index range ,0 Index the starting row ,3 Is the end row index position , Therefore, the content obtained this time does not contain a row index of 3 The data of .

In obtaining DataFrame Object in a column , You can specify column names in turn 、 Row index is used to obtain data . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/13/22 11:44 PM
# file ： Get an element of a column .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
print(' Get a data in the specified column ：\n', data_frame['B'][2])

The program runs as follows ：

 Get a data in the specified column ：
8
Process finished with exit code 0

NaN Data processing

1. Modify element to NaN

NaN The data is in numpy The module represents the vacancy data , Therefore, in data analysis, it is occasionally necessary to modify an element in the data structure to NaN value , All you need to do is call numpy.NaN, The purpose of modifying an element can be achieved by assigning a value to the element to be modified . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 8:14 AM
# file ： Modify element to NaN.py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd, numpy
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Name the column in the data as A, The row index is 0 The element of is changed to NaN
data_frame['A'][0] = numpy.nan
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C
0 NaN 6 11
1 2.0 7 12
2 3.0 8 13
3 4.0 9 14
4 5.0 10 15
Process finished with exit code 0

2. Statistics NaN data

pandas The module provides two methods to quickly identify the missing values ,isnull() Method is used to determine whether a value is a null value , If it is an empty value, it will return True.notnull() Method is used to identify non - null values , This method will return... When it detects data that is not a null value True. Through these two methods and statistical functions sum() Method to obtain the specific number of empty values and non empty values in the data . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 8:32 AM
# file ： Statistics NaN data .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd, numpy
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Name the column in the data as A, The row index is 0 The element of is changed to NaN
data_frame['A'][0] = numpy.nan
# Print the number of empty values in the data 
print(' The number of blank values in each column is ：\n', data_frame.isnull().sum())
# Print the number of non vacancy values in the data 
print(' The number of non vacancy values in each column is ：\n', data_frame.notnull().sum())

The program runs as follows ：

 The number of blank values in each column is ：
A 1
B 0
C 0
dtype: int64
The number of non vacancy values in each column is ：
A 4
B 5
C 5
dtype: int64
Process finished with exit code 0

3. Screening NaN Elements

In the realization of NaN When filtering elements , have access to dropna() function , for example , Will include NaN Delete the entire row of data where the element is located . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 8:38 AM
# file ： Screening NaN Elements .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd, numpy
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Name the column in the data as A, The row index is 0 The element of is changed to NaN
data_frame['A'][0] = numpy.nan
# Will include NaN The entire row of data of the element is deleted 
data_frame.dropna(axis=0, inplace=True)
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 A B C
1 2.0 7 12
2 3.0 8 13
3 4.0 9 14
4 5.0 10 15
Process finished with exit code 0

say bright

If you need to include... In the data NaN Delete the entire column of data where the element is located , Can be axis Parameter set to 1

dropna() The function provides a how Parameters , If the parameter is set to all, dropna() Function will delete a row or column. All elements are NaN Value . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 8:49 AM
# file ：dropna() function how Parameter application .py
# IDE ：PyCharm
#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 8:14 AM
# file ： Modify element to NaN.py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd, numpy
data = {
'A':[1, 2, 3, 4, 5],
'B':[6, 7, 8, 9, 10],
'C':[11, 12, 13, 14, 15]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Name the column in the data as A, The row index is 0 The element of is changed to NaN
data_frame['A'][0] = numpy.nan
# Name the column in the data as A, The row index is 1 The element of is changed to NaN
data_frame['A'][1] = numpy.nan
# Name the column in the data as A, The row index is 2 The element of is changed to NaN
data_frame['A'][2] = numpy.nan
# Name the column in the data as A, The row index is 3 The element of is changed to NaN
data_frame['A'][3] = numpy.nan
# Name the column in the data as A, The row index is 4 The element of is changed to NaN
data_frame['A'][4] = numpy.nan
# Delete include NaN The entire row of data corresponding to the element 
data_frame.dropna(how='all', axis=1,inplace=True)
# Output DataFrame Object content 
print(data_frame)

The program runs as follows ：

 B C
0 6 11
1 7 12
2 8 13
3 9 14
4 10 15
Process finished with exit code 0

say bright

because axis The default value of 0, So only the row data is deleted , And all elements are NaN Is a column , So we're specifying how Parameter, you also need to specify the deletion target column axis=1

4. NaN Element replacement

When processing... In data NaN Element time , In order to avoid deleting important reference data in the data . have access to fillna() Function to put data in NaN Element with the same element , In this way, during data analysis , It is clear which elements are useless （ That is to say NaN Elements ）. The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 9:53 AM
# file ：NaN Element replacement .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
data = {
'A':[1, None, 3, 4, 5],
'B':[6, 7, 8, None, 10],
'C':[11, 12, None, 14, None]}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Output the data before modification 
print(' Data before modification ：\n', data_frame)
# All... In the data NaN The element is modified to 0
data_frame.fillna(0, inplace=True)
# Output the modified data 
print(' Modify descendant data ：\n', data_frame)

The program runs as follows ：

 Data before modification ：
A B C
0 1.0 6.0 11.0
1 NaN 7.0 12.0
2 3.0 8.0 NaN
3 4.0 NaN 14.0
4 5.0 10.0 NaN
Modify descendant data ：
A B C
0 1.0 6.0 11.0
1 0.0 7.0 12.0
2 3.0 8.0 0.0
3 4.0 0.0 14.0
4 5.0 10.0 0.0
Process finished with exit code 0

Remove duplicate data

pandas Module provided drop_duplicates() Method is used to remove duplicate data in a specified column . The syntax is as follows ：

pandas.dataFrame.drop_duplicates(subset=None, keep='first', inplace=False)

drop_duplicates() The common parameters and meanings of the method are shown in the following table ：

drop_duplicates() Common parameters and meanings in methods

ginseng Count name contain The righteous subset Specifies the column name to be de duplicated , It can also be a list of multiple column names . Default None, Represents all columns keep Indicates which piece of data to save duplicate data ,first Indicates that the first item is reserved 、last Means to keep the last item 、False Indicates that duplicate data is not left . Default firstinplace Indicates whether to operate in the original data , The default is False

When specifying to remove duplicate data in a column , Need to be in subset The parameter position specifies the column name . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 11:00 AM
# file ：drop_duplicate() Method to specify duplicate data in a column .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create data 
data = {
'A':['A1','A1','A3'],
'B':['B1','B2','B1']}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Specify the column name as A
data_frame.drop_duplicates('A', inplace=True)
# Print the removed data 
print(data_frame)

The program runs as follows ：

 A B
0 A1 B1
2 A3 B1
Process finished with exit code 0

notes It means

Removing DataFrame Object , The entire row of data corresponding to duplicate data in the specified column will be deleted .

say bright

drop_duplicates() Method except delete DataFrame Object , You can also be right about DataFrame Object to delete duplicate data . for example , Delete DataFrame In the object A Duplicate data in column , You can use this line of code ：new_data = data_frame[‘A’].drop_duplicates()

drop_duplicates() Methods can not only implement DataFrame Object to repeat a single column , You can also implement multiple column de duplication operations . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 12:00 PM
# file ：drop_duplicates() Method realization DataFrame Object multi column de duplication operation .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create data 
data = {
'A':['A1', 'A1', 'A1', 'A2', 'A2'],
'B':['B1', 'B1', 'B3', 'B4', 'B5'],
'C':['C1', 'C2', 'C3', 'C4', 'C5']}
# establish DataFrame object 
data_frame = pd.DataFrame(data)
# Perform multi column de duplication operation 
data_frame.drop_duplicates(subset=['A', 'B'], inplace=True)
# Output the removed data 
print(data_frame)

The program runs as follows ：

 A B C
0 A1 B1 C1
2 A1 B3 C3
3 A2 B4 C4
4 A2 B5 C5
Process finished with exit code 0

Access to documents

Basic file operations TXT

1. TXT File store

If you want to simply TXT File storage work , Can pass open() Function operation file implementation , That is, you need to create or open the specified file and create the file object first .open() The basic syntax format of the function is as follows ：

file = open(filename[, mode[,buffering]])

Parameter description

§ file: The created file object

§ filename: The name of the file to create or open , It needs to be enclosed in single or double quotation marks . If the file to be opened is in the same directory as the current file , Then write the file name directly , Otherwise, you need to specify the full path . for example , The name under the current path to open is status.txt The file of , have access to "status.txt".

§ mode： Optional parameters , Used to specify the open mode of the file . Its parameters are shown in the following table . The default open mode is read-only （ namely r)

mode Parameter value description of the parameter

value say bright notes It means r Open the file in read-only mode . The pointer to the file will be placed at the beginning of the file The file must exist rb Open the file in binary format , And in read-only mode . The pointer to the file will be placed at the beginning of the file , Generally used for non text files , Such as images 、 Voice etc. The file must exist r+ After opening the file , Can read file contents , You can also write new content to overwrite the original content （ Overwrite from beginning of file ） The file must exist rb+ Open the file in binary format , And use read-write mode , The pointer to the file will be placed at the beginning of the file , Generally used for non text files 、 Such as images 、 Voice etc. The file must exist w Open the file in write only mode File exists , Then cover it , Otherwise, create a new file wb Open the file in binary format , And use write only mode . Generally used for non text files , Such as images 、 Voice etc. File exists , Then cover it , Otherwise, create a new file w+ After opening the file , Empty the original content first , Make it an empty file , Have read and write access to this empty file File exists , Then cover it , Otherwise, create a new file wb+ Open the file in binary format , And use read-write mode . Generally used for non text files , Such as images 、 Voice etc. File exists , Then cover it , Otherwise, create a new file value say bright notes It means a Open a file in append mode . If the file already exists , The file pointer will be placed at the end of the file （ That is, the new content will be written after the existing content ）, otherwise , Create a new file for writing ab Open the file in binary format , And the additional mode is adopted . If the file already exists , The file pointer will be placed at the end of the file （ That is, the new content will be written after the existing content ）, otherwise , Create a new file for writing a+ Open the file in read-write mode . If the file already exists , The file pointer will be placed at the end of the file （ That is, the new content will be written after the existing content ）, otherwise , Create a new file for reading and writing ab+ Open the file in binary format , And the additional mode is adopted . If the file already exists , The file pointer will be placed at the end of the file （ That is, the new content will be written after the existing content ）, otherwise , Create a new file for reading and writing

§ buffering: Optional parameters , Used to specify the buffer mode for reading and writing files , The value is 0 Expressions are not cached ; The value is 1 Represent cache ; Greater than 1, Indicates the size of the buffer . The default is cache mode .

Take the famous inspirational sentences in a web page as an example , First, through requests Send network request , Then receive the response result and pass BeautifulSoup analysis HTML Code , Then extract all the information , Finally, write the information one by one data.txt The file of . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 1:06 PM
# file ：TXT File store .py
# IDE ：PyCharm
# Import network request module 
import requests
# Import etree Sub module 
from lxml import etree
# Define request headers 
headers = {

"user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36"
}
for i in range(2):
# Define the request address 
url = f'http://quotes.toscrape.com/tag/inspirational/page/{
i+1}/'
# Send network request 
resp = requests.get(url, headers=headers)
# If the network request is successful 
if resp.status_code == 200:
# analysis html file 
html = etree.HTML(resp.text)
# Get all the inspirational sentences displayed on the current page span label 
text_all = html.xpath('//span[@class="text"]')
# establish open object 
txt_file = open('data.txt', 'a', encoding='utf-8')
# Loop through crawling content 
for i1, value in enumerate(text_all):
# Write each inspirational sentence you crawl through and wrap it at the end 
txt_file.write(str(i1)+value.text+'\n')
txt_file.close()

After running the above example code , The current directory will automatically generate data.txt file , Open the file as shown in the following figure ：

2. Read TXT file

stay Python Open in TXT After the document , In addition to writing or appending to it , You can also read the contents of the file . Reading the contents of the file is mainly divided into the following situations .

§ Read the specified character

The file object provides read() Method to read a specified number of characters . The syntax is as follows ：

file.read([size])

among ,file For open file objects ;size Is an optional parameter , Used to specify the number of characters to read , If omitted, all contents are read at once . Read... In the above example data.txt The sample code in the file is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 2:27 PM
# file ： Read data.txt Before in the document 46 Characters .py
# IDE ：PyCharm
# Open file 
with open('data.txt', 'r') as f:
# Before reading 50 Characters 
string = f.read(46)
print(string)

The results are as follows ：

0. “There are only two ways to live your life.
Process finished with exit code 0

Use read() Method to read the file , Is read from the beginning of the file . If you want to read part of the content , You can use the... Of the file object first seek() Method to move the pointer of the file to a new location , And then apply read() Method reading .seek() The basic syntax format of the method is as follows ：

file.seek(offset[,where])

Parameter description ：

file： Represents an open file object
offset： Specifies the number of characters to move , Its specific location and whence of
whence： Used to specify where the calculation starts . The value is 0 Indicates that the calculation starts from the file header ;1 Indicates that the calculation starts from the current position ;2 Indicates that the calculation starts from the end of the file , The default is 0

Want to start from the... Of the file 49 Characters start reading 38 Characters , The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 2:56 PM
# file ： Read a specific number of characters from the specified characters of the file .py
# IDE ：PyCharm
# Open file 
with open('data.txt', 'r') as f:
# Move the file pointer to the new location 
f.seek(49)
string = f.read(38)
print(string)

The execution result of the above program is as follows ：

One is as though nothing is a miracle.
Process finished with exit code 0

say bright

Use seek() When the method is used , If the GBK code , that offset The value of is according to a Chinese character （ Including Chinese punctuation marks ） Take two characters to calculate , And use UTF-8 code , Then one Chinese character accounts for 3 Characters , However, no matter what code is used, English and numbers are calculated by one character . This is related to read() The method is different .

§ Read a line

In the use of read() Method to read the file , If the file is large , Reading all the contents to the memory at one time is easy to cause insufficient memory , Therefore, line by line reading is usually used . The file object provides readline() Method is used to read one row of data at a time .readline() The basic syntax format of the method is as follows ：

file.readline()

among ,file For open file objects . Same as read() The method is the same , When opening a file , You also need to specify the opening mode as r( read-only ) perhaps r+( Reading and writing ). Read line by line Python Classic application files , The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 3:27 PM
# file ： Read line by line Python Classic application files .py
# IDE ：PyCharm
print("\n","="*20,"Python Classic application ","="*20,"\n")
with open('message.txt','r') as file: # Open save Python Classic application information files 
number = 0 # Record line number 
while True:
number += 1
line = file.readline()
if line =='':
break # Out of the loop 
print(number,line,end= "\n") # Output a line of content 
print("\n","="*20,"over","="*20,"\n")

The program runs as follows ：

==================== Python Classic application ====================
1 1 Zope ： application server
2 2 Plone ： Content management system
3 3 Django : Encourage rapid development of Web Application framework
4 4 Python Wikipedia Robot Framework：MediaWiki Robot program for
5 5 tornado ： Non blocking servers
6 6 Reddit ： Social sharing sites
7 7 Dropbox ： File sharing service
8 8 Trac ： Use Python Compiling BUG In pipe system
9 9 Blender ： With C And Python Open source for development 3D Drawing software
==================== over ====================
Process finished with exit code 0

§ Read all lines

Reading all rows is the same as calling readd() Method is not specified size similar , Just when reading all the rows , Returns a list of strings , Each element is a line of the file . Read all lines , Using a file object **readlines()** Method , The syntax is as follows ：

file.readlines()

among ,file For open file objects . Same as read() The method is the same , When opening a file , You also need to specify the opening mode as r( read-only ) perhaps r+( Reading and writing ), adopt readlines() Method reading message.txt Everything in the document , And output the reading result , The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 4:13 PM
# file ： adopt readlines() Method to read all the contents .py
# IDE ：PyCharm
print("\n","="*20,"Python Classic application ","="*20,"\n")
with open('message.txt','r') as file: # Open save Python Classic application information files 
message = file.readlines() # Read all the information 
print(message) # Output information 
print("\n","="*25,"over","="*25,"\n")

The program runs as follows ：

==================== Python Classic application ====================
['1 Zope ： application server \n', '2 Plone ： Content management system \n', '3 Django : Encourage rapid development of Web Application framework \n', '4 Python Wikipedia Robot Framework：MediaWiki Robot program for \n', '5 tornado ： Non blocking servers \n', '6 Reddit ： Social sharing sites \n', '7 Dropbox ： File sharing service \n', '8 Trac ： Use Python Compiling BUG In pipe system \n', '9 Blender ： With C And Python Open source for development 3D Drawing software ']
========================= over =========================

It can be seen from the operation results that readlines() Method returns a list of strings . In this string list , Each element records one line . If the file is large , Using this method to output the contents of the read file will be very slow . At this time, the contents of the list can be output line by line . The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 4:26 PM
# file ：deadlines() Method to output the contents line by line .py
# IDE ：PyCharm
print("\n","="*20,"Python Classic application ","="*20,"\n")
with open('message.txt','r') as file: # Open save Python Classic application information files 
messageall = file.readlines() # Read all the information 
for message in messageall:
print(message) # Output information 
print("\n","="*25,"over","="*25,"\n")

The program runs as follows ：

==================== Python Classic application ====================
1 Zope ： application server
2 Plone ： Content management system
3 Django : Encourage rapid development of Web Application framework
4 Python Wikipedia Robot Framework：MediaWiki Robot program for
5 tornado ： Non blocking servers
6 Reddit ： Social sharing sites
7 Dropbox ： File sharing service
8 Trac ： Use Python Compiling BUG In pipe system
9 Blender ： With C And Python Open source for development 3D Drawing software
========================= over =========================
Process finished with exit code 0

access CSV file

CSV File is a kind of text file , The elements of each row of data in the file are separated by commas . In fact, access CSV File can also be used open() function , But there are better ways , That's using pandas Module implementation CSV Access to documents .

1.CSV File storage

Pandas Provides to_csv() The function is used to implement CSV File storage , The common parameters in this function and their meanings are shown in the following table ：

to_csv() Function common parameters and their meanings

ginseng Count name contain The righteous filepath_or_buffer A string representing a file path sepstr type , Represents a separator , Default to comma “,”na_repstr type , Used to replace missing values , The default is “” empty float_formatstr type , Specify the format of floating-point data , for example ,’%.2f’ Means to keep two decimal places columns Indicates the column name that specifies which column of data to write , The default is Noneheader Indicates whether to write the column name in the data , The default is False, Indicates no writing index Indicates whether to write the row index to the file , The default is Truemodestr type , Indicates that the default write mode is ’w’encodigstr type , Indicates the encoding format of the written file

for example , establish A,B,C Three columns of data , Then write the data to CSV In file , The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 5:14 PM
# file ： establish CSV file .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create three columns of data 
data = {
'A':[1, 2, 3], 'B':[4, 5, 6], 'C':[7, 8, 9]}
# establish DataFrame object 
df = pd.DataFrame(data)
# Storage CSV file 
df.to_csv('test.csv')

After running the code , A... Will be generated automatically in the folder directory test.csv file , stay Pycharm Open the file in , The contents shown in the following figure will be displayed ：

say bright

The first column in the figure is the index column generated by default , If the default index column is not required when writing data , Can be in to_csv() Function index Parameter is False that will do .

2. CSV File reading

pandas The module provides read_csv() Function for CSV File reading .read_csv() The common parameters and meanings in the function are shown in the following table ：

read_csv() Common parameters and meanings of functions

ginseng Count name contain The righteous filepath_or_buffer A string representing a file path sepstr type , Represents a separator , Default to comma “,”header Indicates which row of data is used as the column name index_col Specify the position of the column through the column index , The default is Nonenames Set the column name for the read data , The default is Noneskiprowsint type , Line number to skip , From the beginning of the data in the file skipfooterint type , Please skip , From the end of the data in the file na_values Set the specified value to NaNnrowsint type , Set the first... In the data to be read n Row data ginseng Count name contain The righteous encodingstr type , Used to format text encoding . for example ,“utf-8“ Express UTF-8 code squeeze Set to True, Indicates that if the parsed data contains only one column , Returns a Series. The default is Falseengine Represents the engine for data parsing , Can be specified as c or python, The default is c

In the implementation of a simple read CSV When you file , Call directly pandas.read_csv() function , Then specify the file path . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 5:50 PM
# file ： Read csv file .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Read csv file information 
data = pd.read_csv('test.csv')
# Output the read file contents 
print(' Read the CSV The content of the document is ：\n', data)

The program runs as follows ：

 Read the CSV The content of the document is ：
Unnamed: 0 A B C
0 0 1 4 7
1 1 2 5 8
2 2 3 6 9
Process finished with exit code 0

You can also specify columns for the read data , Write to new file . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 5:59 PM
# file ： Will read out csv The data specified column is written to the new file .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Read csv file information 
data = pd.read_csv('test.csv')
# Specify the column in the information to be read , Write to a new file 
data.to_csv('new_test.csv', columns=['B', 'C'], index=False)
# Read newly written CSV file information 
new_data = pd.read_csv('new_test.csv')
# Print new file information 
print(' Read the new csv The content of the document is ：\n', new_data)

The program runs as follows ：

 Read the new csv The content of the document is ：
B C
0 4 7
1 5 8
2 6 9
Process finished with exit code 0

access Excel file

1. Excel File storage

Excel File is a common office form file that we are familiar with , It is a component of Microsoft office software .Excel There are currently two file extensions , One is .xls, The other is .xlsx, Its extension is mainly determined by the office software version .

adopt DataFrame The data object directly calls to_excel() Method can be realized Excel File writing function , The parameter meaning of this method is the same as to_csv() The method is similar to . adopt to_excel() Method direction Excel The code for writing information to the file is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 6:18 PM
# file ：to_excel() Method write excel file .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Create three columns of data 
data = {
'A':[1, 2, 3],'B':[4, 5, 6], 'C':[7, 8, 9]}
# establish DataFrame object 
df = pd.DataFrame(data)
# Storage Excel file 
df.to_excel('test.xlsx')

2. Excel File reading

pandas Modular read_excel() Function for Excel File reading , The common parameters in this function and their meanings are shown in the following table ：

read_excel() Function common parameters and their meanings

ginseng Count name contain The righteous io A string representing a file path sheet_name To designate Excel Location of the sub table in the document , To return multiple tables, you can use sheet_name=[0,1], The default is 0header Indicates which row of data is specified as the column name , The default is 0skiprowsint type , Line number to skip , From the beginning of the data in the file skip footerint type , Please skip , From the end of the data in the file index_col Specify the position of the column through the column index , The default is Nonenames Specifies the name of the column

In no special case , Read Excel File content and reading CSV The contents of the document are the same , Call directly pandas.read_excel() Function . The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 6:31 PM
# file ： Read excel file .py
# IDE ：PyCharm
# Import pandas modular 
import pandas as pd
# Read Excel The contents of the document 
data = pd.read_excel('test.xlsx')
print(' Read the Excel The content of the document is ：\n', data)

The program runs as follows ：

 Read the Excel The content of the document is ：
Unnamed: 0 A B C
0 0 1 4 7
1 1 2 5 8
2 2 3 6 9
Process finished with exit code 0

SQLite database

Unlike many other database management systems ,SQLite Not a client / Server structured database engine , It's a kind of embedded database , Its database is a file .SQLite The entire database , Including definition 、 surface 、 The index and the data itself , As a separate 、 Files that can be used across platforms are stored on the host . because SQLite Itself is C language-written , And it's very small , So it is often integrated into various applications .Python It's built in SQLite3, therefore , stay Python Use in SQLite database , No modules need to be installed , You can use it directly .

Create a database file

because Python Built in SQLite3, So you can use it directly import Statement import SQLite3 modular .Python The general process of operating the database is shown in the figure ：

for example , Create a name mySoft.db Of SQLite Database files , And then execute SQL Statement to create a user( User table ),user The table contains id and name Two fields , The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/14/22 8:27 PM
# file ： establish mySoft.db Of SQLite database .py
# IDE ：PyCharm
import sqlite3
# Connect to SQLite database 
# The database file is mySoft.db, If the file doesn't exist , Will automatically create... In the current directory 
conn = sqlite3.connect('mySoft.db')
# Create a Cursor
cursor = conn.cursor()
# To perform a SQL sentence , establish user surface 
cursor.execute('create table user(id int(10) primary key, name varchar(20))')
# Close cursor 
cursor.close()
# close connection
conn.close()

say bright

The above code can only be run once , When running again , Error messages will be prompted ：sqlite3.OperationalError:table user already exists. This is because user The table already exists

operation SQLite

1. Add user data information

In order to add data to the data table , You can use the following SQL sentence ：

insert into Table name ( Field name 1, Field name 2,...., Field name n) values( field value 1, field value 2,....., field value n)

for example , stay user Table has 2 A field , The field names are id and name. and The field value needs to be assigned according to the data type of the field , for example ,id Is a length of 10 The integer of ,name It's a length of 20 String data . towards user Insert in table 3 User information records ,SQL The statement is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/15/22 6:26 AM
# file ： Add user data information .py
# IDE ：PyCharm
import sqlite3
# Connect to SQLite database 
# The database file is mySoft.db, If the file doesn't exist , Will automatically create... In the current directory 
conn = sqlite3.connect('mySoft.db')
# Create a Cursor
cursor = conn.cursor()
# To perform a SQL sentence , establish user surface 
cursor.execute('create table user (id int(10) primary key, name varchar(20))')
# Insert 3 Data 
cursor.execute('insert into user (id, name) values("1", "Bruce_liu")')
cursor.execute('insert into user (id, name) values("2", "Andy")')
cursor.execute('insert into user (id, name) values("3", " Glorious years ")')
# Submit commit()
cursor.commit()
# Close cursor 
cursor.close()
# close connection
conn.close()

2. View user data information

lookup user The data in the table can be used as follows SQL sentence ：

selec Field name 1, Field name 2, Field name 3,.... from Table name where Query criteria

The code for viewing user information is roughly the same as that for inserting data information , The difference lies in the use of SQL The statements are different . Besides , When querying data, you usually use the following 3 Ways of planting ：

§ fetching()： Get the next record in the query result set

§ fetch many(size)： Get a specified number of records

§ fetch all()： Get all the records of the result set

Let's practice this by example 3 The difference between the two query methods ：

for example , Separate use fetchone()、fetch many(size)、fetchall() this 3 There are three ways to query user information , The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/15/22 6:38 AM
# file ： View user data information .py
# IDE ：PyCharm
# The import module 
import sqlite3
# Connect to SQLite database 
# The database file is mySoft.db, If the file doesn't exist , Will automatically create... In the current directory 
conn = sqlite3.connect('mySoft.db')
# Create a Cursor
cursor = conn.cursor()
# Execute query statement 
cursor.execute('select * from user')
# Get query results 
result1 = cursor.fetchone() # Use fetchone() Get a message 
print(result1)
result2 = cursor.fetchmany(2) # Use fetchmany(size) Method to query a specified amount of data 
print(result2)
result3 = cursor.fetchall() # Use fetchall() Method to query all data 
print(result3)
# Close cursor 
cursor.close()
# Close links 
conn.close()

The program runs as follows ：

(1, 'Bruce_liu')
[(2, ' Wei '), (3, ' as boundless as the sea and sky ')]
[(1, 'Bruce_liu'), (2, ' Wei '), (3, ' as boundless as the sea and sky ')]

Modify the above code as condition query , Examples are as follows ：

cursor.execute('select * from user where id > ?', (1, ))
result3 = cursor.fetchall()
print(result3)

The program runs as follows ：

[(2, ' Wei '), (3, ' as boundless as the sea and sky ')]
Process finished with exit code 0

stay select In the query statement , Use question marks as placeholders for specific values , Then replace the question mark with a tuple （ Be careful , Don't ignore the last comma in a tuple ）. The above conditional query statement is equivalent to ：

cursor.execute('select * from user where id > 1')

say bright

Using placeholders can avoid SQL The risk of Injection , It is recommended to use this way

3. Modify user data information

modify user The data in the table can be used as follows SQL sentence ：

update Table name set Field name = field value where Query criteria

for example , take SQLite In the database user surface id by 1 The data of name Change the field value to ’ Bruce Lee ’, The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/15/22 7:33 AM
# file ： Delete user data information .py
# IDE ：PyCharm
# The import module 
import sqlite3
# Connect to database 
conn = sqlite3.connect('mySoft.db')
# establish cursor
cursor = conn.cursor()
# Execute statement 
cursor.execute('delete from user where id = ?', (1, ))
# Submit 
conn.commit()
# Close cursor 
cursor.close()
# Close the connection 
conn.close()

MySQL database

MySQL Server download and installation

My operating environment is macOS System , stay terminal perform brew install [email protected] Installation , No more details here .

[email protected] ~ % brew install [email protected]
==> Downloading https://ghcr.io/v2/homebrew/core/mysql/5.7/manifests/5.7.36
Already downloaded: /Users/liuxiaowei/Library/Caches/Homebrew/downloads/3bd8201c[email protected]5.7-5.7.36.bottle_manifest.json
==> Downloading https://ghcr.io/v2/homebrew/core/mysql/5.7/blobs/sha256:5289b664
Already downloaded: /Users/liuxiaowei/Library/Caches/Homebrew/downloads/e6da9b8d[email protected]5.7--5.7.36.monterey.bottle.tar.gz
==> Pouring [email protected]
==> Caveats
We've installed your MySQL database without a root password. To secure it run: mysql_secure_installation MySQL is configured to only allow connections from localhost by default To connect run: mysql -uroot [email protected] is keg-only, which means it was not symlinked into /usr/local, because this is an alternate version of another formula. If you need to have [email protected] first in your PATH, run: echo 'export PATH="/usr/local/opt/[email protected]/bin:$PATH"' >> ~/.zshrc
For compilers to find [email protected] you may need to set:
export LDFLAGS="-L/usr/local/opt/[email protected]/lib"
export CPPFLAGS="-I/usr/local/opt/[email protected]/include"
To restart [email protected] after an upgrade:
brew services restart [email protected]
Or, if you don't want/need a background service you can just run:
/usr/local/opt/[email protected]/bin/mysqld_safe --datadir=/usr/local/var/mysql
==> Summary
/usr/local/Cellar/[email protected]/5.7.36: 320 files, 234.3MB
==> Running `brew cleanup [email protected]`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).

say bright

If there is a problem with the installation , I have published a blog about solving problems before , Link address ：

https://blog.csdn.net/weixin_41905135/article/details/122583513?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522164488355516780261997164%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=164488355516780261997164&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_blogfirst_rank_ecpm_v1~rank_v31_ecpm-1-122583513.nonecase&utm_term=%E5%AE%89%E8%A3%85mysql5.7&spm=1018.2226.3001.4450

start-up SQL

Use MySQL Before database , You have to start MySQL. stay terminal Window , Enter the command ：

MacBookAir:~ root# /usr/local/opt/mysql\@5.7/bin/mysql.server start

The startup results are as follows ：

Starting MySQL
. SUCCESS!

Use Navicat Operation management MySQL

§ 1. open Navicat, newly build MySQL Connect , Enter the connection information .( This case ） Connection name ：python-mysql, After entering the host name IP Address “localhost“ or "127.0.0.1", Input password , Here's the picture ：

§ 2. Once I've created it , double-click python-mysql, Access to database . Here's the picture ：

§ 3. Create a file called “myPython“ The database of , The steps are ： Right click python-mysql, choice “ New database “, Fill in the database information , Here's the picture ：

install PyMySQL

because MySQL Server independent process line , And through the network of external services , therefore , Need to support Python Of MySQL The driver is connected to MySQL The server . stay Python Chinese support MySQL There are many database modules , We choose to use PyMySQL modular .

PyMySQL The module installation is relatively simple , stay CMD Run the following commands in the command line window ：

pip install PyMySQL

Connect to database

The first step in using a database is to connect to the database . Next use PyMySQL Module connection database . because PyMySQL The module page follows Python Database API2.0 standard , So operation MySQL The way of database is the same as SQLite be similar . I've created a MySQL Connect “python-mysql“, The user name of the database is set when installing the database "root" And password “12345”. The following is the above information , Use connect() Methods the connection MySQL database , The code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/15/22 12:19 PM
# file ： Use connect() Methods the connection MySQL database .py
# IDE ：PyCharm
# The import module 
import pymysql
# Open database connection , Parameters 1： Host name or IP; Parameters 2： user name ; Parameters 3： password ; Parameters 4： Database name 
db = pymysql.connect(host='localhost', database='myPython', port=3306, user='root', password='12345')
# Use cursor() Method to create a cursor object cursor
cursor = db.cursor()
# Use execute() Method execution SQL Inquire about 
cursor.execute('SELECT VERSION()')
# Use fetchone() Method to get a single piece of data 
data = cursor.fetchone()
print('Database version :%s '% data)
# Close database connection 
db.close()

In the above code , use first connect() Method to connect to the database , And then use cursor() Method to create a cursor , Then use execute() Method execution SQL Statement view MySQL Database version , And then use fetchone() Method to get data , Finally using close() Method to close the database connection , The operation results are as follows ：

Database version :5.7.36
Process finished with exit code 0

Create database tables

After the database connection is successful , Then you can create data tables for the database . To create a database table, you need to use excute() Method , This example uses this method to create a books The book list ,books The table contains id( Primary key )、name( The name of the book )、category( Book classification ）、price( The book price ) and publish_time( Publication date )5 A field . establish books surface SQL The statement is as follows ：

CREATE TABLE books(
id int(8) NOT NULL AUTO_INCREMENT,
name varchar(50) NOT NULL,
category varchar(50) NOT NULL,
price decimal(10,2) DEFAULT NULL,
publish_time date DEFAULT NULL,
PRIMARY KEY(id)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET = utf8;

remarks Before you create the table , Use the following statement to delete the current database table when it exists ：

DROP TABLE IF EXISTS 'books';

The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/15/22 12:44 PM
# file ： Create data table .py
# IDE ：PyCharm
# The import module 
import pymysql
# Open database connection 
db = pymysql.connect(host='localhost', database='myPython', port=3306, user='root', password='12345')
# Use cursor() Method to create a cursor object 
cursor = db.cursor()
# Use preprocessing statements to create tables 
sql = ''' CREATE TABLE books( id int(8) NOT NULL AUTO_INCREMENT, name varchar(50) NOT NULL, category varchar(50) NOT NULL, price decimal(10,2) DEFAULT NULL, publish_time date DEFAULT NULL, PRIMARY KEY(id) ) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET = utf8; '''
# perform SQL sentence 
cursor.execute(sql)
# Close database connection 
db.close()

After running the above code ,myPython A... Will be created under the database books surface . open Navicat（ If it is already on, press the key to refresh ）, Find out myPython There is one more in the database books surface , Right click books, Select design table , As shown in the figure below ：

operation MySQL Data sheet

MySQL Data table operations mainly include data addition 、 Delete 、 Change 、 check , And operation SQLite similar , We use executemany() Method to batch add multiple records to the data table ,executemany() The format of the method is as follows ：

exeutemany(operation, seq_of_params)

§ operation： Operation of the SQL sentence

§ seq_of_params： Parameter sequence

The sample code is as follows ：

#_*_coding:utf-8_*_
# author ：liuxiaowei
# Creation time ：2/15/22 1:04 PM
# file ： Batch add multiple records to the data table .py
# IDE ：PyCharm
# The import module 
import pymysql
# Open database connection 
db = pymysql.connect(host="localhost", user="root", password="12345", database="myPython",charset="utf8")
# Use cursor() Method get operation cursor 
cursor = db.cursor()
# Data list 
data = [(" Zero basis to learn Python",'Python','79.80','2018-5-20'),
("Python From entry to mastery ",'Python','69.80','2018-6-18'),
(" Zero basis to learn PHP",'PHP','69.80','2017-5-21'),
("PHP Introduction to project development ",'PHP','79.80','2016-5-21'),
(" Zero basis to learn Java",'Java','69.80','2017-5-21'),
]
try:
# perform sql sentence , Insert multiple data 
cursor.executemany("insert into books(name, category, price, publish_time) values (%s,%s,%s,%s)", data)
# Submit data 
db.commit()
except:
# Rollback on error 
db.rollback()
# Close database connection 
db.close()

Run the above code , stay Navicat View in books Table data , Here's the picture ：