
A guide to common DataFrame operations in pandas


Contents

Preface

1. Basic use

2. Selecting, deleting, and updating data

3. Operations

4. Group-by operations

5. Exporting to a CSV file

Summary

Preface

Pandas is an open source data analysis library for Python. The DataFrame data structure it provides greatly simplifies many of the cumbersome operations in data analysis.

1. Basic use

Creating a DataFrame. A DataFrame is a two-dimensional table; you can think of it as an Excel worksheet or a SQL table.

Excel 2007 and later versions support at most 1,048,576 rows and 16,384 columns. When data exceeds these limits, Excel shows a dialog warning that the data cannot fit in a single worksheet.

Pandas can comfortably process tens of millions of rows. As we will see, it is also more expressive than SQL: it can perform many complex operations with less code. Enough about the benefits; to get practical, we have to write some code.

The first task is to create a DataFrame. It can be built from several kinds of input:

A dictionary of lists, Series (pandas.Series), or numpy.ndarray objects

A two-dimensional numpy.ndarray

Another DataFrame

Structured records (structured arrays)

Of these, my favorite is creating a DataFrame from a two-dimensional ndarray, because it takes the least typing:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(3, 4))
df

          0         1         2         3
0  0.236175 -0.394792 -0.171866  0.304012
1  0.651926  0.989046  0.160389  0.482936
2 -1.039824  0.401105 -0.492714 -1.220438
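The other construction routes listed above can be sketched in the same way. The column names and values here are made up for illustration and do not come from the article.

```python
import numpy as np
import pandas as pd

# From a dictionary of lists: the keys become column names.
df_from_dict = pd.DataFrame({"x": [1, 2, 3], "y": [4.0, 5.0, 6.0]})

# From a dictionary of Series: rows are aligned on the Series index.
df_from_series = pd.DataFrame({
    "x": pd.Series([1, 2], index=["a", "b"]),
    "y": pd.Series([3, 4], index=["a", "b"]),
})

# From a NumPy structured array: the field names become column names.
records = np.array([(1, 2.5), (2, 3.5)], dtype=[("id", "i4"), ("score", "f8")])
df_from_records = pd.DataFrame(records)

# From another DataFrame: pass the existing frame to the constructor.
df_from_df = pd.DataFrame(df_from_dict)

print(df_from_dict.shape)
print(list(df_from_records.columns))
```

Which route is most convenient probably depends on where the data already lives: dictionaries suit hand-written columns, while arrays suit numeric data that is already in NumPy.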

Of course, you can also load data into a DataFrame from a MySQL database or a CSV file.
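As a minimal sketch of the CSV route, the example below reads from an in-memory string instead of a real file, so it runs without any external data. The column names and rows are invented for illustration. For a database, pandas provides read_sql, which needs a live connection and is therefore not shown here.

```python
import io

import pandas as pd

# In-memory stand-in for a CSV file; the contents are made up for illustration.
csv_text = "name,score\nalice,90\nbob,85\n"

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
```

With a real file, the same call would take the file's path as its first argument.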

In a DataFrame, index identifies the rows, columns identifies the columns, and shape gives the dimensions.

# Get the row index
df.index
# Get the column index
df.columns
# Get the dimensions of df as (rows, columns)
df.shape
# Get the number of rows
df.shape[0]
# Get the number of columns
df.shape[1]
# Get the values in df as a NumPy array
df.values
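To see what these attributes return, here is a small sketch on a frame filled with zeros rather than random numbers, so the results are reproducible. The name df_demo is only for this example.

```python
import numpy as np
import pandas as pd

# A 3x4 frame of zeros with the default integer labels.
df_demo = pd.DataFrame(np.zeros((3, 4)))

n_rows, n_cols = df_demo.shape   # shape is a (rows, columns) tuple

print(list(df_demo.index))       # row labels
print(list(df_demo.columns))     # column labels
print(n_rows, n_cols)
print(type(df_demo.values))      # the underlying data as a NumPy array
```

Recent pandas versions also offer df_demo.to_numpy() as an alternative to values.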

With the describe method, we can get a general overview of the data in df:

df.describe()

              0         1         2         3
count  3.000000  3.000000  3.000000  3.000000
mean  -0.050574  0.331786 -0.168064 -0.144496
std    0.881574  0.694518  0.326568  0.936077
min   -1.039824 -0.394792 -0.492714 -1.220438
25%   -0.401824  0.003156 -0.332290 -0.458213
50%    0.236175  0.401105 -0.171866  0.304012
75%    0.444051  0.695076 -0.005739  0.393474
max    0.651926  0.989046  0.160389  0.482936
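The result of describe is itself a DataFrame, so individual statistics can be looked up by row label. The sketch below uses a tiny frame with hand-checkable values; the column name v is an assumption made for this example.

```python
import pandas as pd

# Small frame with known values so the statistics are easy to verify by hand.
df_stats = pd.DataFrame({"v": [1.0, 2.0, 3.0]})

stats = df_stats.describe()        # rows are statistic names, columns match df_stats

mean_v = stats.loc["mean", "v"]    # 2.0
median_v = stats.loc["50%", "v"]   # 2.0, the 50% row is the median
count_v = stats.loc["count", "v"]  # 3.0, counts are reported as floats

print(mean_v, median_v, count_v)
```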

2. Selecting, deleting, and updating data

Select by column name:

df[0]

0    0.236175
1    0.651926
2   -1.039824

Select by row position (slicing):

df[:3]  # select the first 3 rows

Select by index label:

df.loc[0]

0    0.236175
1   -0.394792
2   -0.171866
3    0.304012

Select by row and column position. Positions are zero-based and must exist in the frame, so the examples below stay within the 3 rows and 4 columns of df:

df.iloc[2]  # select the row at position 2 (the third row)
df.iloc[1:3]  # select the rows at positions 1 and 2
df.iloc[0, 1]  # select the element at row 0, column 1
df.iloc[:2, :3]  # select the block covering rows 0 to 1 and columns 0 to 2
df.iloc[[0, 2], [1, 3]]  # select the elements in rows 0 and 2, columns 1 and 3
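With the default integer index, labels and positions coincide, which can hide the difference between loc and iloc. The sketch below uses string labels to separate the two; the frame df_lab and its contents are invented for this example.

```python
import pandas as pd

# Illustrative frame with string labels so that labels and positions differ.
df_lab = pd.DataFrame(
    {"a": [10, 20, 30], "b": [1, 2, 3]},
    index=["x", "y", "z"],
)

by_label = df_lab.loc["y"]         # the row whose index label is "y"
by_position = df_lab.iloc[1]       # the row at position 1 (the same row here)
cell = df_lab.iloc[0, 1]           # row position 0, column position 1
block = df_lab.iloc[:2, :1]        # rows 0 to 1, column 0; the end is exclusive
label_slice = df_lab.loc["x":"y"]  # label slices include the end label

print(cell)
print(block)
print(label_slice)
```

The main thing to notice is that position slices exclude the end point, while label slices include it.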

Delete a column:

del df[0]
df

          1         2         3
0 -0.394792 -0.171866  0.304012
1  0.989046  0.160389  0.482936
2  0.401105 -0.492714 -1.220438
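The del statement changes the frame in place. As a hedged alternative, drop with the columns argument returns a new frame and leaves the original untouched. The frame df_cols below is a made-up example, not the article's df.

```python
import pandas as pd

df_cols = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop returns a new frame; df_cols itself keeps all three columns.
without_a = df_cols.drop(columns=["a"])

# del removes the column from df_cols in place.
del df_cols["b"]

print(list(without_a.columns))
print(list(df_cols.columns))
```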

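Delete a row:

df.drop(0)

          1         2         3
1  0.989046  0.160389  0.482936
2  0.401105 -0.492714 -1.220438

Note that drop returns a new DataFrame and leaves df itself unchanged, which is why the later examples still show all three rows.

3. Operations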

Basic arithmetic:

df[4] = df[1] + df[2]
df

          1         2         3         4
0 -0.394792 -0.171866  0.304012 -0.566659
1  0.989046  0.160389  0.482936  1.149435
2  0.401105 -0.492714 -1.220438 -0.091609
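Column arithmetic works element by element, and the result can be assigned to a new column. The sketch below repeats the idea with round numbers so the results can be checked by eye; the column names are assumptions made for this example.

```python
import pandas as pd

df_a = pd.DataFrame({"p": [1.0, 2.0, 4.0], "q": [10.0, 20.0, 40.0]})

# Each operation is applied row by row across the two columns.
df_a["total"] = df_a["p"] + df_a["q"]
df_a["ratio"] = df_a["q"] / df_a["p"]

print(df_a)
```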

The map operation is similar to map in Python:

df[4].map(int)

0    0
1    1
2    0
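The zeros in that output come from int truncating toward zero rather than rounding. A small sketch on a standalone Series, with values chosen to resemble the column above, shows this along with a custom function passed to map.

```python
import pandas as pd

s = pd.Series([-0.57, 1.15, -0.09])

# int() truncates toward zero, so -0.57 becomes 0 rather than -1.
truncated = s.map(int)

# Any one-argument function can be mapped over the values.
signs = s.map(lambda v: "pos" if v > 0 else "neg")

print(truncated.tolist())
print(signs.tolist())
```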

The apply operation:

df.apply(sum)

1    0.995359
2   -0.504192
3   -0.433489
4    0.491167

4. Group-by operations

The group-by operation in pandas is my favorite. There is no need to import the data into Excel or MySQL first: grouping can be done flexibly in place, which simplifies the analysis.

df[0] = ['A', 'A', 'B']
df

          1         2         3         4  0
0 -0.394792 -0.171866  0.304012 -0.566659  A
1  0.989046  0.160389  0.482936  1.149435  A
2  0.401105 -0.492714 -1.220438 -0.091609  B

g = df.groupby([0])
g.size()

A    2
B    1

g.sum()

          1         2         3         4
0
A  0.594254 -0.011478  0.786948  0.582776
B  0.401105 -0.492714 -1.220438 -0.091609
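The same pattern can be sketched with named columns and several aggregates at once. The frame df_g and its column names are illustrative assumptions; agg with a list of function names is one way, among others, to compute multiple statistics per group.

```python
import pandas as pd

df_g = pd.DataFrame({
    "key": ["A", "A", "B"],
    "val": [1.0, 2.0, 5.0],
})

grouped = df_g.groupby("key")

sizes = grouped.size()                         # number of rows per group
totals = grouped["val"].sum()                  # sum of val per group
summary = grouped["val"].agg(["sum", "mean"])  # several aggregates at once

print(sizes)
print(summary)
```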

5. Exporting to a CSV file

A DataFrame can be exported to a CSV file with the to_csv method. If the data contains Chinese characters, encoding is usually set to "utf-8"; otherwise, the export may throw an exception because the string cannot be encoded (on Python 3, to_csv already defaults to utf-8, so this mainly matters on older setups). Setting index to False means that the DataFrame's index is not written to the file.
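As a self-contained illustration of these two parameters, the sketch below writes a small frame containing Chinese text to a temporary file and reads it back. The frame contents and the temporary location are assumptions made for this example; in real use file_path would be your own destination.

```python
import os
import tempfile

import pandas as pd

df_out = pd.DataFrame({"city": ["北京", "上海"], "n": [1, 2]})

# Hypothetical destination: a temporary file standing in for file_path.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "out.csv")
    df_out.to_csv(file_path, encoding="utf-8", index=False)

    # With index=False the header holds only the column names.
    with open(file_path, encoding="utf-8") as f:
        header = f.readline().strip()

    df_back = pd.read_csv(file_path, encoding="utf-8")

print(header)
print(df_back)
```

Reading the file back with the same encoding is a simple way to confirm that the text survived the round trip.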

In the calls below, file_path stands for the destination path:

df.to_csv(file_path, encoding='utf-8', index=False)
df.to_csv(file_path, index=False)

Summary

That concludes this overview of common DataFrame operations in pandas.