
How to set and modify a pandas index in Python

Editor: Python

Contents

Preface

Create index

pd.Index

pd.IntervalIndex

pd.CategoricalIndex

pd.DatetimeIndex

pd.PeriodIndex

pd.TimedeltaIndex

Reading data

set_index

reset_index

set_axis

Operating on the row index

Operating on the column index

rename

Dictionary form

Function form

Use cases

Count the total consumption by day

Mean tip and total consumption by day and gender

The brute-force method

Summary

Preface

This article covers four functions for working with pandas row and column indexes:

set_index

reset_index

set_axis

rename

Create index

A quick review of the common ways to create an index in pandas:

pd.Index

In [1]:

import pandas as pd
import numpy as np

In [2]:

# Specify the dtype and the name
s1 = pd.Index([1, 2, 3, 4, 5, 6, 7], dtype="int", name="Peter")
s1

Out[2]:

Int64Index([1, 2, 3, 4, 5, 6, 7], dtype='int64', name='Peter')

pd.IntervalIndex

An IntervalIndex is usually constructed with the interval_range() function, which builds intervals over a date or numeric range. Basic usage:

In [3]:

s2 = pd.interval_range(start=0, end=6, closed="left")
s2

Out[3]:

IntervalIndex([[0, 1), [1, 2), [2, 3), [3, 4), [4, 5), [5, 6)], closed='left', dtype='interval[int64]')

pd.CategoricalIndex

In [4]:

s3 = pd.CategoricalIndex(
    # Data to be categorized
    ["S", "M", "L", "XS", "M", "L", "S", "M", "L", "XL"],
    # Specify the category order
    categories=["XS", "S", "M", "L", "XL"],
    # Treat the categories as ordered
    ordered=True,
    # Index name
    name="category"
)
s3

Out[4]:

CategoricalIndex(['S', 'M', 'L', 'XS', 'M', 'L', 'S', 'M', 'L', 'XL'], categories=['XS', 'S', 'M', 'L', 'XL'], ordered=True, name='category', dtype='category')

pd.DatetimeIndex

A DatetimeIndex is an index of dates and times, typically generated with the date_range function. For example:

In [5]:

# Dates as the index; "D" stands for daily frequency
s4 = pd.date_range("2022-01-01", periods=6, freq="D")
s4

Out[5]:

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04','2022-01-05', '2022-01-06'], dtype='datetime64[ns]', freq='D')

pd.PeriodIndex

pd.PeriodIndex is an index for period data, which makes it convenient to work with data tied to regular time spans. Usage:

In [6]:

s5 = pd.PeriodIndex(['2022-01-01', '2022-01-02', '2022-01-03', '2022-01-04'], freq='2H')
s5

Out[6]:

PeriodIndex(['2022-01-01 00:00', '2022-01-02 00:00', '2022-01-03 00:00','2022-01-04 00:00'], dtype='period[2H]', freq='2H')

pd.TimedeltaIndex

In [7]:

data = pd.timedelta_range(start='1 day', end='3 days', freq='6H')
data

Out[7]:

TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00', '1 days 18:00:00', '2 days 00:00:00', '2 days 06:00:00', '2 days 12:00:00', '2 days 18:00:00', '3 days 00:00:00'], dtype='timedelta64[ns]', freq='6H')

In [8]:

s6 = pd.TimedeltaIndex(data)
s6

Out[8]:

TimedeltaIndex(['1 days 00:00:00', '1 days 06:00:00', '1 days 12:00:00', '1 days 18:00:00', '2 days 00:00:00', '2 days 06:00:00', '2 days 12:00:00', '2 days 18:00:00', '3 days 00:00:00'], dtype='timedelta64[ns]', freq='6H')

Reading data

Let's use a simple dataset to illustrate the four functions. The data are as follows:
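A minimal stand-in for the dataset, assuming a small DataFrame named df. The name and sex columns match the ones used by the set_index calls in this article; the age and score columns and all of the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in data: only the "name" and "sex" column names come
# from the article; "age", "score" and every value are assumptions.
df = pd.DataFrame({
    "name": ["Tom", "Jane", "Mike", "Lucy"],
    "sex": ["male", "female", "male", "female"],
    "age": [25, 31, 28, 22],
    "score": [88, 92, 79, 95],
})

# A freshly built DataFrame gets a default RangeIndex (0, 1, 2, ...)
print(df)
print(df.index)
```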

set_index

Set single level index

In [10]:

# Set a single-level index
df1 = df.set_index("name")
df1

The index of df1 now holds the values of the name column.

A multi-level index is set by passing a list of columns:

# Set a two-level index
df2 = df.set_index(["sex", "name"])
df2

reset_index

Reset the index:

Reset a multi-level index:

Reset the multi-level index directly in place:
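The three cases listed above can be sketched as follows. This is a minimal illustration on a hypothetical three-row DataFrame (the values and the age column are assumptions), not the article's original data.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Tom", "Jane", "Mike"],
    "sex": ["male", "female", "male"],
    "age": [25, 31, 28],
})

# 1. Reset a single-level index: "name" moves back into the columns
df1 = df.set_index("name")
restored = df1.reset_index()

# 2. Reset one level of a multi-level index: only "name" moves back,
#    "sex" stays as the index
df2 = df.set_index(["sex", "name"])
partial = df2.reset_index("name")

# 3. Reset every level in place: df2 itself is modified
df2.reset_index(inplace=True)

print(restored)
print(partial)
print(df2)
```

Without inplace=True, reset_index returns a new DataFrame and leaves the original untouched.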

set_axis

set_axis assigns new labels to the specified axis, where axis=0 is the row direction and axis=1 is the column direction.

Each axis can be written in two equivalent ways:

axis=0 is equivalent to axis="index"
axis=1 is equivalent to axis="columns"

Operating on the row index

Using axis="index" has the same effect:

The original df2 is unchanged. To make the change take effect, the data can also be modified directly in place:
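A minimal sketch of relabeling the rows with set_axis, using a hypothetical DataFrame and hypothetical labels. To make the change stick, the sketch assigns the result back to the variable; older pandas versions also accepted inplace=True here, but that argument was removed from set_axis in pandas 2.0.

```python
import pandas as pd

# Hypothetical sample data and labels
df = pd.DataFrame({"name": ["Tom", "Jane", "Mike"], "age": [25, 31, 28]})
new_labels = ["a", "b", "c"]

# axis=0 and axis="index" are two spellings of the same thing
by_number = df.set_axis(new_labels, axis=0)
by_name = df.set_axis(new_labels, axis="index")

# set_axis returns a new object, so df still has its default index here
index_before = list(df.index)

# Make the change take effect by assigning the result back
df = df.set_axis(new_labels, axis="index")

print(by_number)
print(index_before)
print(df.index)
```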

Operating on the column index

These operations act along axis=1, or equivalently axis="columns".

1. Pass in the new column names directly:

Using axis="columns" has the same effect:

The columns can also be modified directly in place:
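The column case can be sketched the same way. The DataFrame and the new column names below are hypothetical; the new list must contain exactly one name per existing column.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["Tom", "Jane"],
    "sex": ["male", "female"],
    "age": [25, 31],
})
new_columns = ["Name", "Sex", "Age"]

# axis=1 and axis="columns" give the same result
by_number = df.set_axis(new_columns, axis=1)
by_name = df.set_axis(new_columns, axis="columns")

# Columns of the original are unchanged until the result is assigned back
columns_before = list(df.columns)
df = df.set_axis(new_columns, axis="columns")

print(by_number)
print(columns_before)
print(df.columns)
```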

rename

rename changes row or column labels. Suppose our raw data are as follows:

Dictionary form

1. Pass a dictionary that maps one or more existing labels to new ones:

In [29]:

# Rename a single column; not modified in place
df2.rename(columns={"Sex": "sex"})

Rename several columns at the same time:
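A minimal sketch of renaming several labels with one dictionary. The DataFrame below is a hypothetical stand-in for df2, with assumed column names Name, Sex, and Age; labels that do not appear in the dictionary are left as they are.

```python
import pandas as pd

# Hypothetical stand-in for df2
df2 = pd.DataFrame({
    "Name": ["Tom", "Jane"],
    "Sex": ["male", "female"],
    "Age": [25, 31],
})

# Rename two columns at once; "Name" is not in the mapping and is kept
renamed = df2.rename(columns={"Sex": "sex", "Age": "age"})

# The same dictionary form works for row labels through index=
relabeled = df2.rename(index={0: "first"})

print(renamed.columns.tolist())
print(relabeled.index.tolist())
print(df2.columns.tolist())  # the original is unchanged
```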

Function form

2. Pass a function that is applied to each label:

In [31]:

# Pass in a function
df2.rename(str.upper, axis="columns")

An anonymous lambda function also works:

# Convert every column name to lowercase
df2.rename(lambda x: x.lower(), axis="columns")

Use cases

In [33]:

Here we use the tips dataset that ships with the visualization library plotly_express:

import plotly_express as px
tips = px.data.tips()
tips

Count the total consumption by day

In [34]:

df3 = tips.groupby("day")["total_bill"].sum()
df3

Out[34]:

day
Fri      325.88
Sat     1778.40
Sun     1627.16
Thur    1096.33
Name: total_bill, dtype: float64

In [35]:

df3 turns out to be a Series:

type(df3)  # a Series

Out[35]:

pandas.core.series.Series

In [36]:

So we use reset_index to turn it into a DataFrame:

df4 = df3.reset_index()
df4

Now let's rename the column labels:

In [37]:

# Modify directly in place
df4.rename(columns={"day": "Day", "total_bill": "Amount"}, inplace=True)
df4

Mean tip and total consumption by day and gender

In [38]:

df5 = tips.groupby(["day", "sex"]).agg({"tip": "mean", "total_bill": "sum"})
df5

df5 turns out to be a DataFrame with a multi-level index:

In [39]:

type(df5)

Out[39]:

pandas.core.frame.DataFrame

We can choose to reset just one of the index levels:
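A minimal sketch of resetting a single level. To stay self-contained it uses a tiny hypothetical frame with the same column names as the tips dataset rather than the real data, so the numbers are made up.

```python
import pandas as pd

# Hypothetical miniature of the tips data (values are invented)
tips_like = pd.DataFrame({
    "day": ["Fri", "Fri", "Sat", "Sat"],
    "sex": ["Female", "Male", "Female", "Male"],
    "tip": [2.0, 3.0, 2.5, 3.5],
    "total_bill": [10.0, 20.0, 15.0, 25.0],
})

# Same aggregation as in the article: index levels are ("day", "sex")
df5 = tips_like.groupby(["day", "sex"]).agg({"tip": "mean", "total_bill": "sum"})

# Reset only the "sex" level: it becomes a column, "day" stays as the index
one_level = df5.reset_index("sex")

print(one_level)
```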

While resetting the index, the original level can also be discarded outright. Below, the sex level is dropped:

In [41]:

df5.reset_index(["sex"], drop=True)  # not modified in place

The index can also be reset directly in place:

df5.reset_index(inplace=True)  # modified in place
df5

The brute-force method

Finally, here is a brute-force way to rename the column labels: assign all of the new names, as a list, to the DataFrame's columns attribute.

This is convenient when there are only a few columns; with many columns it is not recommended.
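A minimal sketch of the direct assignment, using a hypothetical two-column frame shaped like df4. The list must contain one name per column, in order; a list of the wrong length raises a ValueError.

```python
import pandas as pd

# Hypothetical stand-in for df4
df4 = pd.DataFrame({"day": ["Fri", "Sat"], "total_bill": [325.88, 1778.40]})

# Overwrite all column names at once; the order must match the columns
df4.columns = ["Day", "Amount"]
print(df4)

# A list of the wrong length is rejected
mismatch_raised = False
scratch = df4.copy()
try:
    scratch.columns = ["only_one_name"]
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

Because every name has to be listed, even the ones that do not change, rename with a dictionary is usually the safer choice for wide tables.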

Summary

This concludes the article on setting and modifying indexes in pandas.