
This plug-in connects Python and Excel and generates code automatically!


What if, after loading a single Jupyter plug-in, you could analyze data without writing code and have the corresponding code generated for you?

That's right. Just load a small toolkit called Mito, and doing data analysis in Python becomes as simple as using Excel.

Introduction

Spreadsheets, with Excel as the best-known example, are among the most important and most adaptable ways to explore a dataset. They help you change data types, sort the data, and create new features from existing ones.

Following the same idea, Mito is a JupyterLab extension and Python library that makes it very easy to manipulate data in a GUI-backed spreadsheet environment.

Mito combines the power of general-purpose Python with the ease of use of Excel.

If you know how to use Excel, you can use Python's data analysis capabilities and have the code you would otherwise write generated for you.

It makes up for several shortcomings of Excel in data analysis:

  • Excel cannot analyze big data (large datasets are not handled well)

  • Excel is slow

  • Excel cannot easily create repeatable processes

At the same time, it is simpler and more intuitive than SQL or Python. After all, a complete beginner may need years to become fully proficient with those professional tools.

In this article, we will learn:

  1. How to set up Mito

  2. How to debug installation errors

  3. How to use the various functions Mito provides

  4. How the library generates equivalent Python code for every operation on the dataset

Install Mito

Mito is a Python library that can be installed with the pip package manager. It requires Python 3.6 or above. In addition, Node.js, a JavaScript runtime environment, must be installed on the system.
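Before installing, it can help to confirm that both prerequisites are in place. The following is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper name and is not part of Mito.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6)):
    """Report whether the interpreter and Node.js look usable."""
    return {
        # Compare the (major, minor) pair of the running interpreter.
        "python_ok": sys.version_info[:2] >= min_python,
        # shutil.which returns the executable path, or None if it is not on PATH.
        "node_path": shutil.which("node"),
    }


status = check_prerequisites()
print(status)
```

If `node_path` is None, Node.js is either not installed or not on the PATH of the environment running Python.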

It is also a good idea to install the package in a separate (virtual) environment, which avoids some dependency errors. Run the following commands in a terminal to complete the installation.

1. Create an environment

I am using Conda to create a new environment. You can also use Python's “venv” to create a virtual environment.

conda create -n mitoenv python=3.8

2. Activate the environment

conda activate mitoenv

3. Install the Mito installer with pip

pip install mitoinstaller

4. Run the Mito installer

python -m mitoinstaller install

Installing and setting up Mito will take some time.

5. Start Jupyter Lab

jupyter lab

Troubleshooting errors

When starting Jupyter Lab, you may encounter the following error:

.
.
. 
File "c:\users\lenovo\anaconda3\envs\mitoenv\lib\site-packages\jupyter_core\paths.py", line 387, in win32_restrict_file_to_user
    import win32api
ImportError: DLL load failed while importing win32api: The specified module could not be found.

To fix this error, run the following command:

pip install --upgrade pywin32==225
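One quick way to confirm the fix is to retry the import that failed. This is a minimal sketch; `win32api_loads` is a hypothetical helper name. On non-Windows systems it will normally return False, because pywin32 is not installed there.

```python
def win32api_loads():
    """Return True if win32api can be imported in this environment."""
    try:
        # ImportError covers both a missing module and a failed DLL load.
        import win32api  # noqa: F401
    except ImportError:
        return False
    return True


print(win32api_loads())
```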

If you run into other difficulties, feel free to comment below. I'd be happy to help.

MitoSheets Interface

In Jupyter Lab, create a new notebook and initialize Mitosheet:

import mitosheet
mitosheet.sheet()

The first time, you will be prompted to enter your email address to register.

After filling in the basic information, you will be redirected to the GUI spreadsheet. Next, let's look at all the features of this interface and see how the equivalent Python code is generated.

Load a dataset

To load a dataset into MitoSheets, just click Import. There are two options:

  1. Add files from the current folder: this lists all CSV files in the current directory, and you can select a file from the drop-down menu.

  2. Add a file by file path: this adds only that specific file.


If you look at the cell below the sheet, you will find that the equivalent Python code, which imports the dataset using pandas, has been generated with appropriate comments!

This is the charm of Mito: every operation you perform in the Mitosheet is converted into equivalent Python code! Next, let's go through Mito's functions in detail.

Add and delete columns

Add columns

As in an Excel spreadsheet, you can add a new column, which may be derived from an existing column or feature. To do this in Mito, just click the “Add Col” button. The column is added next to the currently selected column. Initially, the column name is a letter of the alphabet and all of its values are zero.

Edit the contents of the new column

  1. Click the new column name (the assigned letter).

  2. A sidebar menu pops up, where you can edit the name of the column.

  3. To update the contents of the column, click any cell in it and enter a value. You can enter a constant, or build the values from existing columns. To create values from existing columns, refer to the column names directly, combined with the operator you want to apply.

  4. The data type of the new column changes according to the assigned value .


Delete column

  1. Click a column to select it.

  2. Click “Del Col”, and that column is deleted from the dataset.

Python Code

The equivalent Python code for the operations performed is generated, with appropriate comments, in the next cell:

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-7bf77d26-84f4-48ed-b389-3f7a3b729753') # Let Mito know which analysis is being run
# Imported edxCourses.csv
import pandas as pd
edxCourses_csv = pd.read_csv('edxCourses.csv')
# Added column H to edxCourses_csv
edxCourses_csv.insert(7, 'H', 0)
# Renamed H to newCol in edxCourses_csv
edxCourses_csv.rename(columns={"H": "newCol"}, inplace=True)
# Set newCol in edxCourses_csv to =coursePrice + courseEnrollments
edxCourses_csv['newCol'] = edxCourses_csv['coursePrice'] + edxCourses_csv['courseEnrollments']
# Deleted column newCol from edxCourses_csv
edxCourses_csv.drop('newCol', axis=1, inplace=True)
# MITO CODE END (DO NOT EDIT)
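The generated code above depends on edxCourses.csv. To try the same pandas calls without that file, here is a small sketch on made-up rows; the values and the temporary column letter “B” are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows standing in for edxCourses.csv.
courses = pd.DataFrame({
    "coursePrice": [100, 200],
    "courseEnrollments": [10, 20],
})

# Insert a zero-filled column at position 1, right after the first column.
courses.insert(1, "B", 0)
courses = courses.rename(columns={"B": "newCol"})

# A sheet formula like =coursePrice + courseEnrollments becomes column arithmetic.
courses["newCol"] = courses["coursePrice"] + courses["courseEnrollments"]
print(courses)
```

The first argument of `insert` is the zero-based position of the new column, which is how the code records where the column was added.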

Create a PivotTable

The PivotTable is an important Excel function that aggregates numeric variables by another categorical feature. To create such a table with Mito:

  1. Click “Pivot” and select the source dataset (by default, the loaded CSV).

  2. Select the rows, values, and columns of the PivotTable. You can also select an aggregate function for the value column. The drop-down offers sum, mean, median, minimum, maximum, count, and standard deviation.

  3. After selecting all the necessary fields, you get a separate table containing the PivotTable.

For example, a PivotTable built with the aggregate function “mean” produces the generated code shown below.

Python Code

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-a35246c0-e0dc-436b-8667-076d4f08e0c1') # Let Mito know which analysis is being run
# Imported edxCourses.csv
import pandas as pd
edxCourses_csv = pd.read_csv('edxCourses.csv')
# Pivoted edxCourses_csv into df2
pivot_table = edxCourses_csv.pivot_table(
    index=['courseOrganization'],
    values=['coursePrice'],
    aggfunc={'coursePrice': 'mean'}
)
# Reset the column name and the indexes
df2 = pivot_table.rename_axis(None, axis=1).reset_index()
# MITO CODE END (DO NOT EDIT)
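To see what the pivot step produces without the CSV file, the same calls can be run on a few made-up rows. This is only a sketch; the organization names and prices are invented.

```python
import pandas as pd

# Made-up rows standing in for edxCourses.csv.
courses = pd.DataFrame({
    "courseOrganization": ["OrgA", "OrgA", "OrgB"],
    "coursePrice": [100.0, 200.0, 50.0],
})

pivot_table = courses.pivot_table(
    index=["courseOrganization"],
    values=["coursePrice"],
    aggfunc={"coursePrice": "mean"},
)

# Flatten the result so the grouping key is an ordinary column again.
df2 = pivot_table.rename_axis(None, axis=1).reset_index()
print(df2)
```

Each organization ends up as one row holding the mean of its course prices, which is the table Mito shows as the separate pivot sheet.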

Merge two datasets

Merging datasets is an important part of data science projects. Data is usually split across different tables to make the information more accessible and readable. Merging in Mitosheets is easy.

  1. Click “Merge” and select the data sources.

  2. Specify the key to merge on.

  3. You can also choose which columns from each data source to keep after merging. By default, all columns are kept in the merged dataset.

Python Code

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-88ac4a92-062f-4ed8-a55d-729394975740') # Let Mito know which analysis is being run
# Imported Airport-Pets.csv, Zipcode-Data.csv
import pandas as pd
Airport_Pets_csv = pd.read_csv('Airport-Pets.csv')
Zipcode_Data_csv = pd.read_csv('Zipcode-Data.csv')
# Merged Airport_Pets_csv and Zipcode_Data_csv
temp_df = Zipcode_Data_csv.drop_duplicates(subset='Zip')
Airport_Pets_csv_tmp = Airport_Pets_csv.drop(['State', 'Division'], axis=1)
Zipcode_Data_csv_tmp = temp_df.drop(['Mean_Income', 'Pop'], axis=1)
df3 = Airport_Pets_csv_tmp.merge(Zipcode_Data_csv_tmp, left_on=['Zip'], right_on=['Zip'], how='left', suffixes=['_Airport_Pets_csv', '_Zipcode_Data_csv'])
# MITO CODE END (DO NOT EDIT)
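The generated merge removes duplicate keys from the second table before joining. A small sketch on invented data suggests why: with a left join, repeated keys on the right side multiply the matching rows on the left. The column names `Pet` and `City` and all values are assumptions for illustration.

```python
import pandas as pd

# Made-up stand-ins for Airport-Pets.csv and Zipcode-Data.csv.
pets = pd.DataFrame({"Zip": [10001, 10001, 94105], "Pet": ["cat", "dog", "dog"]})
zips = pd.DataFrame({
    "Zip": [10001, 10001, 60601],
    "City": ["New York", "New York", "Chicago"],
})

# Keep one row per key on the right side so the join cannot multiply rows.
zips_unique = zips.drop_duplicates(subset="Zip")
merged = pets.merge(zips_unique, on="Zip", how="left")

# For comparison: the same join without removing duplicates first.
merged_with_duplicates = pets.merge(zips, on="Zip", how="left")

print(len(pets), len(merged), len(merged_with_duplicates))
```

With the duplicates removed, the merged table keeps exactly one row per row of the first table, and keys with no match get missing values in the added columns.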

Modify column data types, sort, and filter

You can change the data type of an existing column, sort a column in ascending or descending order, or filter it by boundary conditions. In Mito all of this is simple and is done through on-screen options in the GUI itself.

  1. Click the desired column.

  2. You will see a list of data types. Select any data type from the drop-down list as needed, and it will be applied to the entire column.

  3. Next, you can sort the data in ascending or descending order by selecting the options provided.

  4. You can also use custom filters to filter the data.

Python Code

# MITO CODE START (DO NOT EDIT)
from mitosheet import * # Import necessary functions from Mito
register_analysis('UUID-cc414267-d9aa-4017-8890-ee3b7461c15b') # Let Mito know which analysis is being run
# Imported edxCourses.csv
import pandas as pd
edxCourses_csv = pd.read_csv('edxCourses.csv')
# Changed coursePrice from int64 to float
edxCourses_csv['coursePrice'] = edxCourses_csv['coursePrice'].astype('float')
# Sorted coursePrice in edxCourses_csv in descending order
edxCourses_csv = edxCourses_csv.sort_values(by='coursePrice', ascending=False, na_position='first')
edxCourses_csv = edxCourses_csv.reset_index(drop=True)
# Filtered coursePrice in edxCourses_csv
edxCourses_csv = edxCourses_csv[edxCourses_csv['coursePrice'] >= 500]
edxCourses_csv = edxCourses_csv.reset_index(drop=True)
# MITO CODE END (DO NOT EDIT)

Generate charts and statistics

You can also generate charts directly in this extension without writing any plotting logic. By default, all charts generated by this extension are made with Plotly. This means the plots are interactive and can be modified on the fly.

Note that, unlike column operations, charts do not generate code in the next cell (perhaps the developers will add this in a future update).

Mito can generate charts in two ways:

1. By clicking the chart button

You will see a sidebar menu for selecting the chart type and the corresponding axes.

2. By clicking on the column name

When you click a column name in the spreadsheet, you see the filter and sorting options. If you navigate to “Summary Stats”, a line chart or bar chart and a summary of the variable are displayed, depending on the variable's type. The summary differs for text and non-text variables.
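A rough pandas analogue of a per-column summary is `Series.describe()`, which also returns different statistics for numeric and text columns. This sketch uses made-up rows and is not the code behind Mito's “Summary Stats” tab.

```python
import pandas as pd

# Made-up rows; not taken from the article's dataset.
courses = pd.DataFrame({
    "coursePrice": [100.0, 200.0, 50.0],
    "courseOrganization": ["OrgA", "OrgA", "OrgB"],
})

# Numeric column: count, mean, std, min, quartiles, max.
numeric_summary = courses["coursePrice"].describe()

# Text column: count, number of unique values, most frequent value and its frequency.
text_summary = courses["courseOrganization"].describe()

print(numeric_summary)
print(text_summary)
```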

Save and replay

All transformations made to a dataset can be saved and applied to other similar datasets. In Excel this takes the form of a macro or VBA. Mito offers the same capability.

The saved file is written in Python, rather than in the harder-to-understand VBA.

Backtrack through all steps performed

Repeating the steps above is very easy: Mito has a built-in “replay saved analysis” function, so you can analyze other data in the same way with one click. This feature is the most interesting. You can track all transformations applied in the Mitosheet, and each action in the list has an appropriate title.

In addition, you can view any particular step! Suppose you changed some columns and then deleted them. You can step back to the point before they were deleted.
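Because the output is ordinary pandas code, the same idea can also be sketched by hand: wrap the generated steps in a function and call it on each dataset. This is not Mito's replay feature; `replay_analysis` and its `min_price` parameter are hypothetical names, and the steps are the type change, sort, and filter shown earlier.

```python
import pandas as pd


def replay_analysis(df, min_price=500):
    """Apply the type change, sort, and filter steps to any compatible frame."""
    out = df.copy()  # leave the caller's data untouched
    out["coursePrice"] = out["coursePrice"].astype("float")
    out = out.sort_values(by="coursePrice", ascending=False, na_position="first")
    out = out[out["coursePrice"] >= min_price].reset_index(drop=True)
    return out


# Two made-up datasets with the same column layout.
first = pd.DataFrame({"coursePrice": [300, 900, 500]})
second = pd.DataFrame({"coursePrice": [1200, 100]})

print(replay_analysis(first))
print(replay_analysis(second))
```

Copying the input first keeps each replay independent, so the original frames can be reused or compared afterwards.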

In closing

With that, you have learned a new tool, “Mito”, together with Yunduo. It implements spreadsheet-like functionality in a Python environment and generates equivalent Python code for each step.

