Our current BI The technology stack is mainly Python
,MySQL
,Excel&PowerBI
, It's usually ETL After data source and cleaning , It through Power Query
Load into Excel Inside , In this way, the simplest Excel Slicer interaction , It is also convenient for us to automatically refresh and distribute reports .
We take an existing model as an example , This model is called Turn to introduction of numerical control I .xlsx. It uses our regular technology chain
Python do ETL,SQL Write the script ,Excel Loading display
Use Power Query
A good model is usually written and fixed SQL Script , It won't change easily .
In this model , Altogether 14 individual Power Query
Inquire about , Namely :
Currently, the fixed date of the month dimension is this month 1 From the th to the same day , The daily dimension is the same day , So in SQL In other words, it is fixed to
# During the month 1 Japan
date_sub(curdate(),interval day(curdate())-1 day)
# On the day
curdate()
However, the idea of the demand side is Very unstable Of , For example, suddenly want yesterday's , For example, every month 5 I want a copy of last month's …
No problem , There are more ways than difficulties
We have Python
,Python
Yes pywin32
. This module has no documentation , But this module is directly applied VBA.
We just need to write it py Script , Set the date as the parameter , It is good to modify parameters every time
Let's say we expand it one month ahead , From this month 1 It was expanded to last month 1 Number
from win32com.client import Dispatch # Import pywin32
import os # Used to process paths
#1 Parameters
FILENAMES = [
'【 republic 】【 month 】 Transfer to table 1 .xlsx',
'【 republic 】【 month 】 Transfer to table 2 .xlsx',
'【 republic 】【 month 】 Transfer to table 3 .xlsx',
'【 republic 】【 month 】 Introduction trend .xlsx'
]
FOLDER = r'E:\Onedrive\doc\for_share\Python_eco\excel'
#2 Handle
app = Dispatch('excel.application') # open Excel Program
app.Visible = -1 # Display program
for file in FILENAMES: # Circulation table
filepath = os.path.join(FOLDER,file) # Merge paths into full paths
wkb = app.Workbooks.Open(filepath) # open Excel file
for p in wkb.Queries: # Traverse Power Queries
replacePair = [
('date_sub(curdate(),interval day(curdate())-1 day)',
'date_sub(date_sub(curdate(),interval day(curdate())-1 day),interval 1 month)')
] # replace text Replace with text
fml = p.formula
for a in replacePair:
fml = fml.replace(a[0],a[1])
p.formula = fml
print('**REPLACED')
wkb.Save()
wkb.Close(1)
print('**DONE')
pywin32
No, Excel Operation related instruction documents Power Query
Out of the box code wkb.Queries
and p.formula
these two items. Now package this block into a function , according to Python
International practice
def replacePowerQuery(filelist,replaceTuple):
''' take Excel Inside Power Query SQL Script replaces some words '''
from win32com.client import Dispatch
app = Dispatch('excel.application')
app.Visible = -1
for file in filelist:
wkb = app.Workbooks.Open(file)
for p in wkb.Queries:
fml = p.formula
for a in replaceTuple:
fml = fml.replace(a[0],a[1])
p.formula = fml
print('**REPLACED')
wkb.Save()
wkb.Close(1)
app.Quit()
print('**DONE')
# Examples of use
filelist = [
r'E:\Onedrive\doc\for_share\Python_eco\excel\【 republic 】【 month 】 Transfer to table 1 .xlsx',
r'E:\Onedrive\doc\for_share\Python_eco\excel\【 republic 】【 month 】 Transfer to table 2 .xlsx',
r'E:\Onedrive\doc\for_share\Python_eco\excel\【 republic 】【 month 】 Transfer to table 3 .xlsx',
r'E:\Onedrive\doc\for_share\Python_eco\excel\【 republic 】【 month 】 Introduction trend .xlsx'
]
replaceTuple = [
('date_sub(curdate(),interval 1 day)','date_sub(curdate(),interval day(curdate()) day)'),
('date_sub(max(everyday),interval 1 day)','date_sub(max(everyday),interval day(max(everyday)) day)')
]
replacePowerQuery(filelist,replaceTuple)