
Filtering data with pandas in Python


Contents

1 Select the specified rows of data

2 Select all records whose column equals a given value

3 Pattern matching

4 Filtering by value range

5 Get the value at a given row and column

6 Get the underlying NumPy two-dimensional array

7 Get the index of rows that match a condition

8 Element location filtering

9 Delete multiple rows / columns

10 Convert strings to dates with to_datetime

11 apply() function

12 map() function

References

Summary

1 Select the specified rows of data

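data = df.iloc[2:5]  # positions 2, 3 and 4, i.e. the 3rd through 5th rows; positions start at 0 and the end position is excluded
# Note: df.loc[2:5] selects by label and includes both endpoints, so with the default integer index it returns the 3rd through 6th rows.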
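The difference between position-based and label-based slicing is easy to check on a small frame. The following is a minimal sketch; the DataFrame and its column name `value` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..6
df = pd.DataFrame({'value': [10, 11, 12, 13, 14, 15, 16]})

by_position = df.iloc[2:5]  # positions 2, 3, 4 (end excluded)
by_label = df.loc[2:5]      # labels 2, 3, 4, 5 (end included)

print(by_position['value'].tolist())
print(by_label['value'].tolist())
```

When the index is not the default 0, 1, 2, ... the two forms can select entirely different rows, so iloc is usually the safer choice for "the n-th row".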

2 Select all records whose column equals a given value

data = df[df['column_name_1'] == 'value_1']
# Match on multiple conditions
data_many = df[(df['column_name_1'] == 'value_1') & (df['column_name_2'] == 'value_2')]
# Match any of several values (the Python "in" operator does not work on a Series; use isin())
data_many = df[df['column_name_1'].isin(['value_1', 'value_2', ...])]
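As a rough illustration of these three kinds of filter, here is a small runnable sketch. The column names `city` and `year` and all values are hypothetical sample data, not from the original article.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'city': ['Paris', 'Rome', 'Paris', 'Oslo'],
    'year': [2020, 2020, 2021, 2021],
})

one_value = df[df['city'] == 'Paris']                       # single condition
both = df[(df['city'] == 'Paris') & (df['year'] == 2021)]   # two conditions combined with &
any_of = df[df['city'].isin(['Rome', 'Oslo'])]              # membership test with isin()

print(one_value.index.tolist())
print(both.index.tolist())
print(any_of.index.tolist())
```

Each condition needs its own parentheses, because & binds more tightly than == in Python.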

3 Pattern matching

# Match values that start with a given string
cond = df['column_name'].str.startswith('value')
# Match values that contain a given string anywhere
cond = df['column_name'].str.contains('value')
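The result of these string methods is a boolean mask that can be passed back to the DataFrame. A minimal sketch, assuming a made-up `name` column that contains one missing value:

```python
import pandas as pd

# Hypothetical sample data, including a missing value
names = pd.DataFrame({'name': ['apple pie', 'pineapple', 'grape', None]})

# na=False treats missing values as non-matches, so the mask is purely boolean
starts = names[names['name'].str.startswith('apple', na=False)]
contains = names[names['name'].str.contains('apple', na=False)]

print(starts['name'].tolist())
print(contains['name'].tolist())
```

Note that str.contains() interprets its pattern as a regular expression by default; pass regex=False to match the text literally.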

4 Filtering by value range

# Select the rows whose value lies strictly between two bounds
cond = df[(df['column_name_1'] > 'value_1') & (df['column_name_1'] < 'value_2')]
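A short sketch of range filtering on numeric data follows. The `score` column and its values are hypothetical; Series.between() is shown as an alternative that, by default, includes both endpoints.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'score': [55, 60, 75, 90, 100]})

# Strict inequalities on both ends
strict = df[(df['score'] > 60) & (df['score'] < 100)]

# Series.between is inclusive on both ends by default
inclusive = df[df['score'].between(60, 100)]

print(strict['score'].tolist())
print(inclusive['score'].tolist())
```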

5 Get the value at a given row and column

print(ridership_df.loc['05-05-11', 'R003'])
# or
print(ridership_df.iloc[4, 0])
# Result: 1608

6 Get the underlying NumPy two-dimensional array

print(df.values)

7 Get the index of rows that match a condition

import pandas as pd
df = pd.DataFrame({'BoolCol': [1, 2, 3, 3, 4], 'attr': [22, 33, 22, 44, 66]}, index=[10, 20, 30, 40, 50])
print(df)
a = df[(df.BoolCol == 3) & (df.attr == 22)].index.tolist()  # matching index labels as a list
b = df[(df.BoolCol == 3) & (df.attr == 22)].index[0]        # first matching index label
c = df[(df.BoolCol == 3) & (df.attr == 22)].index.values    # matching index labels as a NumPy array
print(a)
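The index of the filtered frame holds row labels, which are not the same as zero-based row positions when the index is custom. The sketch below reuses the same sample frame and, as an addition not in the original, uses numpy.flatnonzero to recover the positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'BoolCol': [1, 2, 3, 3, 4], 'attr': [22, 33, 22, 44, 66]},
                  index=[10, 20, 30, 40, 50])

mask = (df.BoolCol == 3) & (df.attr == 22)

labels = df[mask].index.tolist()                       # index labels of matching rows
positions = np.flatnonzero(mask.to_numpy()).tolist()   # zero-based row positions

print(labels)
print(positions)
```

Labels are what df.loc expects, while positions are what df.iloc expects.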

8 Element location filtering

print(date_frame)                   # print the whole DataFrame
print(date_frame.shape)             # tuple of (number of rows, number of columns)
print(date_frame.head(2))           # first 2 rows
print(date_frame.tail(2))           # last 2 rows
print(date_frame.index.tolist())    # list of index labels only
print(date_frame.columns.tolist())  # list of column names only
print(date_frame.values.tolist())   # all values as a 2-D (nested) list
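To see what each of these calls returns, here is a minimal sketch on a tiny frame. The contents of `date_frame` are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data
date_frame = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, index=['x', 'y', 'z'])

shape = date_frame.shape                      # (rows, columns)
first_two = date_frame.head(2).index.tolist()
last_two = date_frame.tail(2).index.tolist()
columns = date_frame.columns.tolist()
values = date_frame.values.tolist()           # nested list, one inner list per row

print(shape, first_two, last_two, columns, values)
```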

9 Delete multiple rows / columns

# This approach assumes that the DataFrame's index and columns are integer labels; it combines drop() with range().
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
# axis=0 deletes rows; axis=1 deletes columns.
# To delete several rows or columns, pass a range. For example, to delete the first 3 rows, use drop(range(0, 3), axis=0); axis defaults to 0, so it can be omitted.
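The drop() and range() combination can be tried on a frame whose row and column labels are plain integers. This is a sketch with made-up data; drop() returns a new DataFrame unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical 5x4 frame; row labels are 0..4 and column labels are 0..3
df = pd.DataFrame([[r * 10 + c for c in range(4)] for r in range(5)])

without_first_rows = df.drop(range(0, 3))          # drops row labels 0, 1, 2
without_first_cols = df.drop(range(0, 2), axis=1)  # drops column labels 0, 1

print(without_first_rows.index.tolist())
print(without_first_cols.columns.tolist())
print(df.shape)  # the original frame is unchanged
```

Because drop() works on labels, this only removes "the first 3 rows" when the labels happen to be 0, 1, 2; with other labels, df.iloc[3:] selects by position instead.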

10 Convert strings to dates with to_datetime

import datetime
import pandas as pd
dictDate = {'date': ['2019-11-01 19:30', '2019-11-30 19:00']}
df = pd.DataFrame(dictDate)
# Parse the strings into datetime64 values
df['datetime'] = pd.to_datetime(df['date'])
# Format each timestamp as YYYYMMDD
df['today'] = df['datetime'].apply(lambda x: x.strftime('%Y%m%d'))
# Add one day, then format with the vectorized .dt accessor
df['tomorrow'] = (df['datetime'] + datetime.timedelta(days=1)).dt.strftime('%Y%m%d')
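Once a column holds real datetime values, it can be filtered like any other column. The sketch below is an illustration with invented dates; the cutoff date and the use of pd.Timestamp and pd.Timedelta are choices made here, not taken from the original.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'date': ['2019-11-01 19:30', '2019-11-15 08:00', '2019-11-30 19:00']})
df['datetime'] = pd.to_datetime(df['date'])

# Keep the rows on or after 10 November 2019
recent = df[df['datetime'] >= pd.Timestamp('2019-11-10')]

# Next-day string using pandas' own Timedelta and the vectorized .dt accessor
df['tomorrow'] = (df['datetime'] + pd.Timedelta(days=1)).dt.strftime('%Y%m%d')

print(recent.index.tolist())
print(df['tomorrow'].tolist())
```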

11 apply() function

# apply() works on a Series or on a whole DataFrame. On a Series it runs the given function on each element; on a DataFrame it runs it on each column (or on each row with axis=1).
def add_extra(nationality, extra):
    if nationality != "Han":
        return extra
    else:
        return 0

# Pass the additional argument positionally through args
df['ExtraScore'] = df.Nationality.apply(add_extra, args=(5,))
# Or pass it as a keyword argument
df['ExtraScore'] = df.Nationality.apply(add_extra, extra=5)
# A lambda works too; note that this one awards the extra points only when the value is 'Han'
df['Extra'] = df.Nationality.apply(lambda n, extra: extra if n == 'Han' else 0, args=(5,))

# Arbitrary keyword arguments can act as a lookup table
def add_extra2(nationality, **kwargs):
    return kwargs[nationality]

df['Extra'] = df.Nationality.apply(add_extra2, Han=0, Hui=10, Zang=5)
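The snippets above assume an existing DataFrame with a Nationality column. A self-contained sketch with invented rows ('Han', 'Hui' and 'Zang' are sample values chosen here) shows the three ways of passing extra arguments to Series.apply():

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({'Nationality': ['Han', 'Hui', 'Zang', 'Han']})

def add_extra(nationality, extra):
    # Award the extra points to every value except 'Han'
    return extra if nationality != 'Han' else 0

def lookup(nationality, **kwargs):
    # Use the keyword arguments as a lookup table
    return kwargs[nationality]

df['ByArgs'] = df['Nationality'].apply(add_extra, args=(5,))
df['ByKeyword'] = df['Nationality'].apply(add_extra, extra=5)
df['ByLookup'] = df['Nationality'].apply(lookup, Han=0, Hui=10, Zang=5)

print(df)
```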

12 map() function

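import datetime
import pandas as pd

def f(x):
    # Keep the first 8 characters (YYYYMMDD) and convert them to YYYY-MM-DD
    x = str(x)[:8]
    if x != 'nan':  # skip missing values, which str() renders as 'nan'
        gf = datetime.datetime.strptime(x, "%Y%m%d")
        x = gf.strftime("%Y-%m-%d")
    return x

def f2(x):
    # Convert YYYY/MM/DD to YYYY-MM-DD, leaving blanks and missing values unchanged
    if str(x) not in [' ', 'nan']:
        dd = datetime.datetime.strptime(str(x), "%Y/%m/%d")
        x = dd.strftime("%Y-%m-%d")
    return x

def test():
    df = pd.DataFrame()
    df1 = pd.read_csv("600694_gf.csv")
    df2 = pd.read_csv("600694.csv")
    # map() applies the function to each element of the column
    df['date1'] = df2['DateTime'].map(f2)
    df['date2'] = df1['date'].map(f)
    df.to_csv('map.csv')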
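Since the CSV files used above are not included with the article, the following sketch shows the same map() idea on an in-memory Series. The function name `to_iso` and the sample dates are hypothetical.

```python
import datetime
import pandas as pd

def to_iso(x):
    # Leave blanks and missing values untouched; reformat everything else
    s = str(x)
    if s in ('', ' ', 'nan'):
        return x
    return datetime.datetime.strptime(s, '%Y/%m/%d').strftime('%Y-%m-%d')

# Hypothetical sample data with one missing value
dates = pd.Series(['2019/11/01', '2019/11/30', float('nan')])
converted = dates.map(to_iso)

print(converted.tolist())
```

For large columns, pd.to_datetime followed by .dt.strftime is usually faster than an element-wise map(), though map() remains handy for irregular inputs.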

References

Pandas DataFrame operations

pandas.DataFrame.drop — pandas 1.4.1 documentation

Usage of the pandas apply() function

pandas.Series.apply — pandas 1.4.1 documentation

Summary

That concludes this article on filtering data with pandas in Python.