
This Plug-in Connects Python and Excel and Generates Code Automatically!

Editor: Python

What if loading a single Jupyter plug-in let you analyze data without writing any code, and also generated the corresponding code for you?

That's right. Just load this tool called Mito, and doing data analysis in Python becomes as simple as using Excel:

Introduction

Spreadsheets, with Excel as the best-known example, are one of the most important and most adaptable ways to explore a dataset. They make it easy to change data types, sort data, and create new features from existing ones.

Building on the same idea, Mito is a JupyterLab extension and Python library that makes it very easy to manipulate data in a spreadsheet-style environment with a GUI.

Mito combines the power of Python with the ease of use of Excel.

If you know how to use Excel, you can use Python's data analysis capabilities, and Mito will also write out the code for you.

It addresses several of Excel's shortcomings for data analysis:

Excel does not handle large datasets well
Excel is slow
Excel cannot easily create repeatable processes

At the same time, Mito is simpler and more intuitive than SQL or Python. For complete beginners, these professional tools can take years to get fully up to speed with.

In this article, we will cover:

How to set up Mito
How to debug installation errors
How to use the features Mito provides
How the library generates equivalent Python code for every operation on the dataset

Installing Mito

Mito is a Python library and can be installed with the pip package manager. It requires Python 3.6 or later. Node.js, a JavaScript runtime, must also be installed on the system.
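Before installing, you can confirm both prerequisites with a short standard-library script. This is a sketch of my own, not part of Mito: it only checks that the interpreter is at least 3.6 and that a `node` executable can be found on `PATH`; it does not verify which Node.js version Mito actually needs.

```python
import shutil
import subprocess
import sys


def check_prerequisites(min_python=(3, 6)):
    """Report whether the interpreter and Node.js prerequisites appear to be met."""
    python_ok = sys.version_info[:2] >= min_python
    node_path = shutil.which("node")  # None if Node.js is not on PATH
    node_version = None
    if node_path is not None:
        # `node --version` prints a string such as "v18.17.0"
        result = subprocess.run([node_path, "--version"], capture_output=True, text=True)
        node_version = result.stdout.strip() or None
    return {
        "python_ok": python_ok,
        "python_version": "%d.%d" % sys.version_info[:2],
        "node_path": node_path,
        "node_version": node_version,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If `node_path` comes back as `None`, install Node.js before running the Mito installer.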

It is also a good idea to install the package in a separate (virtual) environment, which avoids some dependency errors. Run the following commands in a terminal to complete the installation.

1. Create an environment

I used Conda to create a new environment. You can also use Python's built-in venv module to create a virtual environment.

conda create -n mitoenv python=3.8
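If you prefer venv over Conda, the environment can also be created from Python itself. The sketch below uses the standard-library `venv.EnvBuilder` and is equivalent in spirit to running `python -m venv mitoenv`; the function name `create_env` is hypothetical. Unlike the Conda command above, it cannot pin Python 3.8: the new environment uses whatever interpreter runs the script.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment at env_dir with the standard-library venv module."""
    builder = venv.EnvBuilder(with_pip=with_pip)  # with_pip bootstraps pip via ensurepip
    builder.create(env_dir)
    return Path(env_dir)


if __name__ == "__main__":
    env = create_env("mitoenv")
    print(f"Created {env.resolve()}")
```

To activate it, run `mitoenv\Scripts\activate` on Windows or `source mitoenv/bin/activate` on macOS/Linux, then continue with step 3.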

2. Activate the environment

conda activate mitoenv

3. Install the Mito installer with pip

pip install mitoinstaller

4. Run the Mito installer

python -m mitoinstaller install

This step takes a while as it installs and sets up Mito.

5. Start JupyterLab

jupyter lab

Troubleshooting

When starting JupyterLab, you may encounter the following error:

...
File "c:\users\lenovo\anaconda3\envs\mitoenv\lib\site-packages\jupyter_core\paths.py", line 387, in win32_restrict_file_to_user
    import win32api
ImportError: DLL load failed while importing win32api: The specified module could not be found.

To fix this error, run the following command:

pip install --upgrade pywin32==225
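After reinstalling pywin32, you can check whether the fix worked without restarting JupyterLab by importing the module directly. This is a small sketch of my own; the function name `win32api_loads` is hypothetical, and since pywin32 only exists on Windows it simply returns `None` elsewhere.

```python
import importlib
import sys


def win32api_loads():
    """Return True if win32api imports, False if it fails, None on non-Windows platforms."""
    if sys.platform != "win32":
        return None  # pywin32 is Windows-only, so this error cannot occur here
    try:
        importlib.import_module("win32api")
    except ImportError as exc:  # covers "DLL load failed while importing win32api"
        print(f"win32api still fails to import: {exc}")
        return False
    return True


if __name__ == "__main__":
    print(win32api_loads())
```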

If you run into any other problems, feel free to leave a comment below; I'd be happy to help.

The Mitosheet interface

In JupyterLab, create a new notebook and initialize a Mitosheet:

import mitosheet
mitosheet.sheet()

The first time you do this, you will be prompted to enter your email address to sign up:

After filling in the basic details, you will be taken to the GUI spreadsheet. Next, let's go through the features of this interface and see how it generates equivalent Python code.

Loading a dataset

To load a dataset into a Mitosheet, click Import. There are two options:

Add files from the current folder: lists all CSV files in the current directory, and you can pick a file from the drop-down menu.
Add a file by its path: adds only that specific file.

This is shown in the figure below:

If you look at the cell below the sheet, you'll see that equivalent Python code for importing the dataset with pandas has been generated, complete with comments!

This is the appeal of Mito: every operation you perform in a Mitosheet is converted into equivalent Python code. Next, let's look at each of Mito's features in detail.
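The "Add files from the current folder" option can be approximated in plain pandas by listing the CSV files in a directory and reading the ones you pick. This is only a sketch of the equivalent workflow, not how Mito implements its dialog; `list_csv_files` and `load_csv_files` are hypothetical names.

```python
from pathlib import Path

import pandas as pd


def list_csv_files(folder="."):
    """List the CSV files in a folder, much like the import dialog's drop-down."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))


def load_csv_files(folder=".", names=None):
    """Read the chosen CSV files (all of them by default) into DataFrames keyed by file name."""
    names = list_csv_files(folder) if names is None else names
    return {name: pd.read_csv(Path(folder) / name) for name in names}


if __name__ == "__main__":
    print(list_csv_files())
```

Note that `glob("*.csv")` is case-sensitive on Linux and macOS, so a file named `DATA.CSV` would not be listed there.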

Adding and deleting columns

Adding a column

Just as in an Excel spreadsheet, you can add a new column, possibly derived from existing columns or features. To do this in Mito, click the "Add Col" button. The column is added next to the currently selected column. Initially, the column is named with a letter and all of its values are zero.

Editing the contents of the new column

Click the new column's name (the assigned letter).
A sidebar menu pops up where you can edit the column's name.
To update the contents of the column, click any cell in it and enter a value. You can enter a constant, or build the values from existing features: type a formula that refers to the existing column names directly, combined with the operator you need (for example, =coursePrice + courseEnrollments).
The data type of the new column changes according to the value assigned.

The GIF below demonstrates all of the above:

Deleting a column

Click any column to select it.
Click "Del Col" and that column will be deleted from the dataset.

Python Code

Equivalent Python code for the operations performed, with appropriate comments, is generated in the next cell:

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-7bf77d26-84f4-48ed-b389-3f7a3b729753') # Let Mito know which analysis is being run
# Imported edxCourses.csv
import pandas as pd
edxCourses_csv = pd.read_csv('edxCourses.csv')
# Added column H to edxCourses_csv
edxCourses_csv.insert(7, 'H', 0)
# Renamed H to newCol in edxCourses_csv
edxCourses_csv.rename(columns={"H": "newCol"}, inplace=True)
# Set newCol in edxCourses_csv to =coursePrice + courseEnrollments
edxCourses_csv['newCol'] = edxCourses_csv['coursePrice'] + edxCourses_csv['courseEnrollments']
# Deleted column newCol from edxCourses_csv
edxCourses_csv.drop('newCol', axis=1, inplace=True)
# MITO CODE END (DO NOT EDIT)
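To see why the new column's data type follows the assigned value, here is the same sequence of pandas operations on a tiny stand-in DataFrame. The column names match the generated code above, but the values are invented, since edxCourses.csv is not included here.

```python
import pandas as pd

# Invented stand-in for edxCourses.csv, using the column names from the generated code.
df = pd.DataFrame({"coursePrice": [49, 99, 0], "courseEnrollments": [1200, 300, 5000]})
dtypes = {}

df.insert(len(df.columns), "newCol", 0)  # new column filled with zeros, like "Add Col"
dtypes["zeros"] = df["newCol"].dtype      # int64

df["newCol"] = df["coursePrice"] + df["courseEnrollments"]  # formula over existing columns
dtypes["sum"] = df["newCol"].dtype        # int64, since int + int stays integer

df["newCol"] = df["coursePrice"] / 2      # true division always produces floats
dtypes["half"] = df["newCol"].dtype       # float64

df["newCol"] = "free"                     # a constant text value
dtypes["text"] = df["newCol"].dtype       # a string-holding dtype (object in pandas 1.x/2.x)

print(dtypes)
```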

Creating a pivot table

The pivot table is an important Excel feature that aggregates numeric variables grouped by another, categorical feature. To create one with Mito:

Click "Pivot" and select the source dataset (the loaded CSV by default).
Select the rows, columns, and values of the pivot table. You can also choose an aggregation function for the value column; sum, mean, median, min, max, count, and standard deviation are all available in the drop-down.
Once all the required fields are selected, you get a separate table containing the pivot table.

The GIF below shows how to create a pivot table using the "mean" aggregation function:

Python Code

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-a35246c0-e0dc-436b-8667-076d4f08e0c1') # Let Mito know which analysis is being run
# Imported edxCourses.csv
import pandas as pd
edxCourses_csv = pd.read_csv('edxCourses.csv')
# Pivoted edxCourses_csv into df2
pivot_table = edxCourses_csv.pivot_table(
    index=['courseOrganization'],
    values=['coursePrice'],
    aggfunc={'coursePrice': 'mean'}
)
# Reset the column name and the indexes
df2 = pivot_table.rename_axis(None, axis=1).reset_index()
# MITO CODE END (DO NOT EDIT)
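The aggregation functions listed above map onto pandas' string aggregation names. The sketch below reuses the dict-style `aggfunc` from the generated code, once per function, on invented stand-in data, and collects the results side by side; it illustrates the pandas calls rather than anything Mito produces.

```python
import pandas as pd

# Invented stand-in data with the column names used in the generated code.
courses = pd.DataFrame({
    "courseOrganization": ["MITx", "MITx", "HarvardX", "HarvardX", "HarvardX"],
    "coursePrice": [100, 300, 50, 150, 250],
})

# pandas names for sum, mean, median, minimum, maximum, count, standard deviation
aggregations = ["sum", "mean", "median", "min", "max", "count", "std"]

results = {}
for func in aggregations:
    table = courses.pivot_table(
        index=["courseOrganization"],
        values=["coursePrice"],
        aggfunc={"coursePrice": func},
    )
    results[func] = table["coursePrice"]  # one Series per aggregation, indexed by organization

summary = pd.DataFrame(results)  # rows: organizations, columns: aggregation functions
print(summary)
```

Note that `std` here is the sample standard deviation (pandas' default `ddof=1`).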

Merging two datasets

Merging datasets is an important part of data science projects. Data is often split across separate tables to make the information more accessible and readable. Merging in Mitosheets is easy.

Click "Merge" and select the data sources.
Specify the key to merge on.
You can also choose which columns from each data source to keep after the merge. By default, all columns are kept in the merged dataset.

Python Code

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-88ac4a92-062f-4ed8-a55d-729394975740') # Let Mito know which analysis is being run
# Imported Airport-Pets.csv, Zipcode-Data.csv
import pandas as pd
Airport_Pets_csv = pd.read_csv('Airport-Pets.csv')
Zipcode_Data_csv = pd.read_csv('Zipcode-Data.csv')
# Merged Airport_Pets_csv and Zipcode_Data_csv
temp_df = Zipcode_Data_csv.drop_duplicates(subset='Zip')
Airport_Pets_csv_tmp = Airport_Pets_csv.drop(['State', 'Division'], axis=1)
Zipcode_Data_csv_tmp = temp_df.drop(['Mean_Income', 'Pop'], axis=1)
df3 = Airport_Pets_csv_tmp.merge(Zipcode_Data_csv_tmp, left_on=['Zip'], right_on=['Zip'], how='left', suffixes=['_Airport_Pets_csv', '_Zipcode_Data_csv'])
# MITO CODE END (DO NOT EDIT)
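The generated code calls `drop_duplicates(subset='Zip')` on the right-hand table before the left merge. My reading is that this keeps the merge from multiplying rows: a key that appears twice on the right would otherwise repeat the matching left row. The invented stand-in tables below show the difference; with `how='left'`, unmatched keys keep their row and get NaN in the new columns.

```python
import pandas as pd

# Invented stand-ins for Airport-Pets.csv and Zipcode-Data.csv.
pets = pd.DataFrame({"Zip": [10001, 10002, 99999], "Pets": ["yes", "no", "yes"]})
zips = pd.DataFrame({"Zip": [10001, 10001, 10002], "City": ["New York"] * 3})

# Zip 10001 appears twice on the right, so its left row is duplicated.
naive = pets.merge(zips, on="Zip", how="left")

# One row per key on the right keeps the left table's row count.
deduped = pets.merge(zips.drop_duplicates(subset="Zip"), on="Zip", how="left")

print(len(naive), len(deduped))
```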

Changing column data types, sorting, and filtering

You can change the data type of an existing column, sort a column in ascending or descending order, or filter it by boundary conditions. All of this is simple in Mito and can be done entirely by selecting options in the GUI.

Click the column you want.
You will see a list of data types. Pick any data type from the drop-down as needed, and it will be applied to the whole column.
Next, you can sort the data in ascending or descending order by selecting the corresponding option.
You can also filter the data with custom filters.

Python Code

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-cc414267-d9aa-4017-8890-ee3b7461c15b') # Let Mito know which analysis is being run
# Imported edxCourses.csv
import pandas as pd
edxCourses_csv = pd.read_csv('edxCourses.csv')
# Changed coursePrice from int64 to float
edxCourses_csv['coursePrice'] = edxCourses_csv['coursePrice'].astype('float')
# Sorted coursePrice in edxCourses_csv in descending order
edxCourses_csv = edxCourses_csv.sort_values(by='coursePrice', ascending=False, na_position='first')
edxCourses_csv = edxCourses_csv.reset_index(drop=True)
# Filtered coursePrice in edxCourses_csv
edxCourses_csv = edxCourses_csv[edxCourses_csv['coursePrice'] >= 500]
edxCourses_csv = edxCourses_csv.reset_index(drop=True)
# MITO CODE END (DO NOT EDIT)
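The generated filter above uses a single condition. A custom filter with several conditions translates to a boolean mask combined with `&` (and) or `|` (or). The sketch below runs on invented data; the `courseTitle` column is a hypothetical addition for readability.

```python
import pandas as pd

# Invented stand-in data; courseTitle is a hypothetical column.
courses = pd.DataFrame({
    "courseTitle": ["A", "B", "C", "D"],
    "coursePrice": [0, 250, 600, 1200],
})
courses["coursePrice"] = courses["coursePrice"].astype("float")

# Each condition needs its own parentheses because & binds tighter than >= and <.
mask = (courses["coursePrice"] >= 500) & (courses["coursePrice"] < 1000)
filtered = courses[mask].reset_index(drop=True)

# Series.between is a shorthand for an inclusive range check on both ends.
in_range = courses[courses["coursePrice"].between(200, 700)].reset_index(drop=True)

print(filtered)
print(in_range)
```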

Generating charts and summary statistics

You can also generate charts directly in this extension without writing any plotting code. By default, all charts are made with Plotly, which means they are interactive and can be adjusted on the fly.

Note that, unlike operations on columns, creating a chart does not generate code in the next cell (the developers may add this in a future update).

Mito offers two ways to generate charts:

1. By clicking the chart button

A sidebar menu appears in which you select the chart type and the corresponding axes.

2. By clicking a column name

When you click a column name in the spreadsheet, you see the filter and sort options. If you switch to "Summary Stats", a line chart or bar chart is displayed, depending on the variable's type, along with a summary of the variable. The summary differs for text and non-text variables.
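For a rough idea of why the summary differs by variable type, pandas' own `describe()` behaves the same way: numeric columns get count, mean, spread, and quantiles, while text columns get count, unique values, the most frequent value, and its frequency. This is the pandas analogue only; I am not claiming Mito computes its panel this way. The data is invented.

```python
import pandas as pd

# Invented stand-in data with one numeric and one text column.
courses = pd.DataFrame({
    "coursePrice": [49.0, 99.0, 99.0, 300.0],
    "courseOrganization": ["MITx", "HarvardX", "MITx", "MITx"],
})

numeric_summary = courses["coursePrice"].describe()        # count, mean, std, min, quartiles, max
text_summary = courses["courseOrganization"].describe()    # count, unique, top, freq
category_counts = courses["courseOrganization"].value_counts()  # what a bar chart would plot

print(numeric_summary)
print(text_summary)
print(category_counts)
```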

Saving and replaying

All transformations applied to a dataset can be saved and reused on other, similar datasets. In Excel this takes the form of macros or VBA; with Mito you can do the same thing through these features.

The saved analysis is written in Python rather than in VBA, which is harder to understand.
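Because the generated code is ordinary pandas, another way to reuse it outside Mito is to wrap the steps in a function and call it on any dataset with the same columns. The sketch below packages the dtype, sort, and filter steps from the earlier generated code; `replay_analysis` is a hypothetical name, and this is not Mito's built-in replay feature.

```python
import pandas as pd


def replay_analysis(df):
    """Re-apply the dtype, sort, and filter steps from the generated code to a similar DataFrame."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out["coursePrice"] = out["coursePrice"].astype("float")
    out = out.sort_values(by="coursePrice", ascending=False, na_position="first")
    out = out.reset_index(drop=True)
    out = out[out["coursePrice"] >= 500].reset_index(drop=True)
    return out


if __name__ == "__main__":
    # Hypothetical second dataset with the same columns as edxCourses.csv.
    other = pd.DataFrame({"coursePrice": [100, 700, 500, 900]})
    print(replay_analysis(other))
```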

Tracing back through all executed steps

Repeating the steps above is easy: Mito has a built-in "replay saved analysis" feature that applies the same analysis to other data with one click. The most interesting part is that you can trace every transformation applied in the Mitosheet, and each step in the list has an appropriate title.

You can also view the sheet as it was at any particular step. For example, if you changed some columns and then deleted them, you can go back to the point before they were deleted.
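One simple way to get this kind of step-by-step history in plain pandas is to keep a titled snapshot of the DataFrame after every step. The `StepHistory` class below is a hypothetical sketch of the idea, not Mito's implementation; copying the whole DataFrame at each step is easy to reason about but costs memory on large datasets.

```python
import pandas as pd


class StepHistory:
    """Store a titled copy of the DataFrame after every step so earlier states can be viewed."""

    def __init__(self, df):
        self.snapshots = [("Imported data", df.copy())]

    def apply(self, title, step):
        """Apply step (a function DataFrame -> DataFrame) to the latest state and record it."""
        latest = self.snapshots[-1][1]
        self.snapshots.append((title, step(latest.copy())))
        return self.snapshots[-1][1]

    def view(self, index):
        """Return a copy of the state after step `index` (0 is the imported data)."""
        return self.snapshots[index][1].copy()

    def titles(self):
        return [title for title, _ in self.snapshots]


history = StepHistory(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))
history.apply("Added column c", lambda d: d.assign(c=d["a"] + d["b"]))
history.apply("Deleted column c", lambda d: d.drop(columns="c"))
before_delete = history.view(1)  # the state before column c was deleted
print(history.titles())
```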

Conclusion

That's it: together with Yunduo, you've learned about a new tool, Mito, which provides spreadsheet-like functionality inside the Python environment and generates the equivalent Python code for every step.