程序師世界是廣大編程愛好者互助、分享、學習的平台,程序師世界有你更精彩!
首頁
編程語言
C語言|JAVA編程
Python編程
網頁編程
ASP編程|PHP編程
JSP編程
數據庫知識
MYSQL數據庫|SqlServer數據庫
Oracle數據庫|DB2數據庫
您现在的位置: 程式師世界 >> 編程語言 >  >> 更多編程語言 >> Python

After Python batch processes Excel data, import it into SQL Server

編輯:Python

Python Batch processing excel After the data , Import sqlserver

  • 1、 Preface
  • 2、 Start thinking
    • 2.1 Take apart + Clear requirements
    • 2.2 Install third party package
    • 2.3 Read excel data
    • 2.4 Special data processing
    • 2.5 Other needs
    • 2.6 Complete calling code

1、 Preface

Following yesterday's article Windows Download installation configuration SQL Server、SSMS, Use Python Connect to read and write data , We've installed and configured sqlserver, Also successfully tested how to use Python Connect 、 Read and write data to the database .

Today we officially begin to meet the demand : There's a lot of Excel, It needs batch processing , Then store it in different data tables .

2、 Start thinking

2.1 Take apart + Clear requirements

1) excel What data needs to be modified ?

  • There's a column of data DocketDate yes excel Short time value , Need to change to the normal format ; eg. 44567 --> 2022/1/6
  • Part of the data needs to be according to SOID Carry out repeated processing , according to DocketDate Keep recent data ;
  • There is a column of data that needs date format conversion . eg. 06/Jan/2022 12:27 --> 2022-1-6

Mainly involves : Date format processing 、 Data De reprocessing

2) every last Excel Do they all correspond to a different data table ? The name of the table and Excel Whether the attachment names are consistent ?

  • There are some Excel Corresponding to the same table , Some are separate
  • The name of the table and Excel Inconsistent attachment names , But there is a corresponding relationship eg. The attachment test1 and test2 Corresponding table testa, The attachment test3 Corresponding testb

Mainly involves : Data consolidation processing

2.2 Install third party package

pip3 install sqlalchemy pymssql pandas xlrd xlwt
  • sqlalchemy: You can map the table structure of a relational database to objects , Then process the database content by processing objects ;
  • pymssql:python Connect sqlserver Database driver , You can also directly use it to connect to the database and read and write ;
  • pandas: Processing all kinds of data , Built in many data processing methods , Very convenient ;
  • xlrd xlwt: Reading and writing excel file ,pandas Reading and writing excel Will call them .

Import package :

import pandas as pd
from datetime import date, timedelta, datetime
import time
import os
from sqlalchemy import create_engine
import pymssql

2.3 Read excel data

Reading data is relatively simple , Call directly pandas Of read_excel Function , If the file has any special format , Like coding , You can also customize settings .

# Read excel data
def get_excel_data(filepath):
data = pd.read_excel(filepath)
return data

2.4 Special data processing

“1) Date days are shortened ”

This is difficult ,excel It's easy to turn directly in , Directly select the data to be transferred , And then at the beginning - Select a short date in the data format column .

At that time, I didn't know the transformation law at first sight , Searched for a long time , No similar problems or explanations were found , First of all, it's definitely not a timestamp , It always feels a little related , Finally, it was found that the number of days , To calculate the Day calculation start date You can solve other data transformation problems .

First, we need to judge the null value , Set the start date... And then calculate the start date , utilize datetime Modular timedelta The function converts the number of days of time into a time difference , Then directly calculate with the start date to get the date it represents .

# Date days are shortened
def days_to_date(days):
# Handle nan value
if pd.isna(days):
return
# 44567 2022/1/6
# Extrapolate excel Days are shortened It's from 1899.12.30 Start calculating
start = date(1899,12,30)
# take days convert to timedelta type , It can be calculated directly with the date
delta = timedelta(days)
# Start date + Time difference Get the corresponding short date
offset = start + delta
return offset

What's hard to think about here is Day calculation start date , But after you figure it out , Actually, it's good , from excel In, we can directly convert the date days into short dates , The equation already exists , There is only one unknown x, We only need to formulate a univariate equation to solve the unknowns x.

from datetime import date, timedelta
date_days = 44567
# Convert days to date type interval
delta = timedelta(date_days)
# Result date
result = date(2022,1,6)
# Calculate unknown start date
x = result - delta
print(x)
'''
Output :1899-12-30
'''

“2) Turn the English of the date into numbers ”

At first I wanted to use regular matching , Take out the month, year and day , Then turn the English month into a number , Later, it was found that the English month can be directly recognized in the date .

The code is as follows , First, convert the string into date type data in format , The original data is 06/Jan/2022 12:27( Digital day / English month / Digital year Digital hours : Digital minutes ), Format the symbol according to the date and replace the corresponding relationship in the explanation table .

# Convert official date format to common format
def date_to_common(time):
# Handle nan value
if pd.isna(time):
return
# 06/Jan/2022 12:27 2022-1-6
# test print(time,':', type(time))
# Convert string to date
time_format = datetime.strptime(time,'%d/%b/%Y %H:%M')
# Convert to specified date format
common_date = datetime.strftime(time_format, '%Y-%m-%d')
return common_date

Date formatting symbol interpretation table @CSDN- The messenger of rowing

“3) By order number SOID duplicate removal ”

To repeat here, in addition to removing the duplicate according to the specified column , You also need to keep the latest data by date .

My idea is , First call pandas Of sort_values The function sorts all data in ascending order according to the date column , then , call drop_duplicates The function specifies to press SOID Train for de duplication , And designate keep The value is last, Indicates that the last row of data is retained in the duplicate data .

The code is as follows :

# Remove duplicate values SOID repeat Remove the earliest data by date
def delete_repeat(data):
# Start by date Docket Rec.Date & Time Sort Default descending Make sure the date left is the latest
data.sort_values(by=['Docket Rec.Date & Time'], inplace=True)
# Press SOID Delete duplicate lines
data.drop_duplicates(subset=['SOID #'], keep='last', inplace=True)
return data

2.5 Other needs

“ Multiple Excel The data corresponds to a database table ”

You can write a dictionary , To store database tables and corresponding data Excel Data name , Then store them one by one in the corresponding database table ( Or after processing the data in advance , Re merger ).

  • Merge the same type Excel surface
# Merge data with the same table Incoming merge excel list
def merge_excel(elist, files_path):
data_list = [get_excel_data(files_path+i) for i in elist]
data = pd.concat(data_list)
return data

The same type is passed in here Excel List of filenames (elist) And data storage folders / Relative paths (files_path) that will do , Absolutely... Through the file / Relative paths +Excel You can get the file name Excel Absolute of data table file / Relative paths , Call again get_excel_data Function to read and fetch data .

Traversal read Excel Table data uses list derivation , The use of pandas Of concat Function to merge the corresponding data .

  • Data stored in sqlserver
# Initialize the database connection engine
# create_engine(" Database type + Database driven :// Database user name : Database password @IP Address : port / database ", The other parameters )
engine = create_engine("mssql+pymssql://sa:[email protected]/study?charset=GBK")
# Store the data
def data_to_sql(data, table_naem, columns):
# A little more processing of the data , Select the specified column and save it into the database
data1 = data[columns]
# The first parameter : Table name
# The second parameter : Database connection engine
# The third parameter : Whether to store index
# Fourth parameter : If the table exists Just append data
t1 = time.time() # Time stamp Unit second
print(' Data insertion start time :{0}'.format(t1))
data1.to_sql(table_naem, engine, index=False, if_exists='append')
t2 = time.time() # Time stamp Unit second
print(' Data insertion end time :{0}'.format(t2))
print(' Successfully inserted data %d strip ,'%len(data1), ' Time consuming :%.5f second .'%(t2-t1))

sqlalchemy+pymssql Connect sqlserver Pay attention to the pit : To specify the database code ,slqserver The default database created is GBK code , About sqlserver After installation, you can view the article Windows Download installation configuration SQL Server、SSMS, Use Python Connect to read and write data .

2.6 Complete calling code

'''
Batch process all excel data
'''
# Data files are stored in a specified directory , Such as :
files_path = './data/'
bf_path = './process/'
# Get the names of all files in the current directory
# files = os.listdir(files_path)
# files
# Table name : The attachment excel name
data_dict = {
'testa': ['test1.xls', 'test2.xls'],
'testb': ['test3.xls'],
'testc': ['test4.xls']
}
# Select the specified column in the attachment , Store only the data in the specified column
columns_a = ['S/No', 'SOID #', 'Current MileStone', 'Store In Date Time']
columns_b = ['Received Part Serial No', 'Received Product Category', 'Received Part Desc']
columns_c = ['From Loc', 'Orig Dispoition Code']
columns = [columns_a, columns_b, columns_c]
flag = 0 # Column selection mark
# Ergodic dictionary Merger related excel Then process the data , Deposit in sql
for k,v in data_dict.items():
table_name = k
data = merge_excel(v, files_path)
# 1、 Processing data
if 'SOID #' not in data.columns:
# Contains no columns to process , Then simply remove the weight 、 Store in database
data.drop_duplicates(inplace=True)
else:
# Special data processing
data = process_data(data)
# 2、 Store the data
# Be on the safe side Save one locally
data.to_excel(bf_path+table_name+'.xls')
# Store in database
data_to_sql(data, table_name, columns[flag])
flag+=1

  1. 上一篇文章:
  2. 下一篇文章:
Copyright © 程式師世界 All Rights Reserved