1 Select the specified rows
2 Select all records where a column has a given value
3 Pattern matching
4 Range filtering
5 Get a single value by row and column
6 Get the underlying two-dimensional NumPy array
7 Get the position (index label) of rows that match a condition
8 Element position filtering
9 Delete multiple rows or columns
10 to_datetime: convert strings to dates
11 apply() function
12 map() function
References
Summary
1 Select the specified rows
data = df.loc[2:5]
# .loc slices by index label and includes both endpoints. With the default integer index, which starts at 0 for the first row, [2:5] returns the rows labeled 2 to 5, that is, the 3rd to 6th rows. For the 3rd to 5th rows by position, use df.iloc[2:5].
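The difference between label-based and position-based slicing is easy to miss, so here is a minimal sketch on a hypothetical seven-row frame (the column name value and the data are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data with the default 0-based integer index.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50, 60, 70]})

by_label = df.loc[2:5]      # label slice: both endpoints are included
by_position = df.iloc[2:5]  # positional slice: the end is excluded

print(by_label.index.tolist())     # [2, 3, 4, 5]
print(by_position.index.tolist())  # [2, 3, 4]
```

If the index is not the default 0, 1, 2, ... sequence, only iloc counts rows by position.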
2 Select all records where a column has a given value
data = df[df['column_name_1'] == 'value_1']
# Match several conditions at once
data_many = df[(df['column_name_1'] == 'value_1') & (df['column_name_2'] == 'value_2')]
# Match any of several values. The Python "in" operator does not work element-wise on a Series; use isin() instead.
data_many = df[df['column_name_1'].isin(['value_1', 'value_2', ...])]
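As a runnable sketch of the three cases in this section (one value, several conditions, any of several values), assuming a made-up frame with hypothetical city, level, and sales columns:

```python
import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
df = pd.DataFrame({
    "city": ["Beijing", "Shanghai", "Beijing", "Shenzhen"],
    "level": ["A", "B", "B", "A"],
    "sales": [100, 200, 150, 300],
})

# One column equal to one value.
single = df[df["city"] == "Beijing"]

# Several conditions: wrap each in parentheses and combine with &.
both = df[(df["city"] == "Beijing") & (df["level"] == "B")]

# Any of several values: isin() tests membership element by element.
any_of = df[df["city"].isin(["Beijing", "Shenzhen"])]

print(single.index.tolist())  # [0, 2]
print(both.index.tolist())    # [2]
print(any_of.index.tolist())  # [0, 2, 3]
```

The parentheses around each comparison are required because & binds more tightly than == in Python.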
3 Pattern matching
# Match values that start with a given string
cond = df['column_name'].str.startswith('value')
# Match values that contain a given string anywhere
cond = df['column_name'].str.contains('value')
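The boolean Series returned by these string methods is normally used as a row mask. A small sketch, assuming a hypothetical name column that also contains a missing value:

```python
import pandas as pd

# Hypothetical sample data; None stands for a missing value.
df = pd.DataFrame({"name": ["apple pie", "pineapple", None, "grape"]})

# na=False makes missing values count as "no match" so the mask is purely boolean.
starts = df["name"].str.startswith("apple", na=False)

# str.contains treats the pattern as a regular expression by default;
# regex=False asks for a plain substring match instead.
contains = df["name"].str.contains("apple", regex=False, na=False)

print(df[starts]["name"].tolist())    # ['apple pie']
print(df[contains]["name"].tolist())  # ['apple pie', 'pineapple']
```

Without na=False, a missing value produces a missing entry in the mask, and indexing with such a mask can raise an error.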
4 Range filtering
# Select rows whose value lies strictly between two bounds
cond = df[(df['column_name_1'] > 'value_1') & (df['column_name_1'] < 'value_2')]
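A short sketch of range filtering on a hypothetical numeric score column. Series.between() is shown as an alternative to the two comparisons; it is not used in the original text, and its inclusive argument accepts strings such as "neither" in pandas 1.3 and later:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"score": [55, 60, 75, 90, 100]})

# Two explicit comparisons: strictly greater than 60 and strictly less than 100.
open_interval = df[(df["score"] > 60) & (df["score"] < 100)]

# between() includes both bounds by default.
closed_interval = df[df["score"].between(60, 100)]

# inclusive="neither" excludes both bounds, matching the explicit comparisons.
strict = df[df["score"].between(60, 100, inclusive="neither")]

print(open_interval["score"].tolist())    # [75, 90]
print(closed_interval["score"].tolist())  # [60, 75, 90, 100]
```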
5 Get a single value by row and column
print(ridership_df.loc['05-05-11', 'R003'])
# or
print(ridership_df.iloc[4, 0])
# Result: 1608
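The text does not show how ridership_df is built, so the sketch below uses a hypothetical stand-in with made-up numbers, arranged so that the two lookups address the same cell. The at accessor is an addition for illustration; it is a scalar-only variant of loc.

```python
import pandas as pd

# Hypothetical stand-in for ridership_df; the values are illustrative only.
ridership_df = pd.DataFrame(
    {"R003": [0, 1478, 1613, 1560, 1608], "R004": [0, 3877, 4088, 3392, 4802]},
    index=["05-01-11", "05-02-11", "05-03-11", "05-04-11", "05-05-11"],
)

by_label = ridership_df.loc["05-05-11", "R003"]   # row label, column label
by_position = ridership_df.iloc[4, 0]             # 5th row, 1st column
fast_label = ridership_df.at["05-05-11", "R003"]  # scalar-only label lookup

print(by_label, by_position, fast_label)  # 1608 1608 1608
```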
6 Get the underlying two-dimensional NumPy array
print(df.values)
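The pandas documentation recommends DataFrame.to_numpy() over the values attribute for new code. A minimal sketch on a hypothetical two-column frame, showing that both give the same array here:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

arr = df.to_numpy()  # one row of the array per row of the DataFrame

print(arr.shape)     # (2, 2)
print(arr.tolist())  # [[1, 3], [2, 4]]
```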
7 Get the position (index label) of rows that match a condition
import pandas as pd
df = pd.DataFrame({'BoolCol': [1, 2, 3, 3, 4], 'attr': [22, 33, 22, 44, 66]}, index=[10, 20, 30, 40, 50])
print(df)
a = df[(df.BoolCol == 3) & (df.attr == 22)].index.tolist()  # matching index labels as a list
b = df[(df.BoolCol == 3) & (df.attr == 22)].index[0]        # first matching index label
c = df[(df.BoolCol == 3) & (df.attr == 22)].index.values    # matching index labels as a NumPy array
print(a)
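With a custom index such as 10, 20, 30, ..., an index label is not the same thing as an integer row position. The sketch below reuses the sample frame and obtains both; np.flatnonzero is a NumPy helper brought in for illustration and is not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"BoolCol": [1, 2, 3, 3, 4], "attr": [22, 33, 22, 44, 66]},
    index=[10, 20, 30, 40, 50],
)

# Build the boolean mask once and reuse it.
mask = (df.BoolCol == 3) & (df.attr == 22)

labels = df.index[mask].tolist()                      # index labels of matching rows
positions = np.flatnonzero(mask.to_numpy()).tolist()  # 0-based row positions

print(labels)     # [30]
print(positions)  # [2]
```

Labels are what loc expects, while positions are what iloc expects.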
8 Element position filtering
print(date_frame)                   # print the full DataFrame
print(date_frame.shape)             # tuple of (number of rows, number of columns)
print(date_frame.head(2))           # first 2 rows
print(date_frame.tail(2))           # last 2 rows
print(date_frame.index.tolist())    # index labels only, as a list
print(date_frame.columns.tolist())  # column names only, as a list
print(date_frame.values.tolist())   # all values only, as a two-dimensional list
9 Delete multiple rows or columns
# This approach assumes the DataFrame's index and columns are integer labels, so that drop() can be combined with range().
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
# axis=0 deletes rows; axis=1 deletes columns.
# To delete several rows or columns, pass a range. For example, to delete the first 3 rows: drop(range(0, 3), axis=0). axis defaults to 0, so it can be omitted.
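A minimal sketch of both directions on a hypothetical five-row frame with made-up columns a, b, and c. Note that drop() returns a new DataFrame and leaves the original untouched unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical sample data with the default integer index 0..4.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [6, 7, 8, 9, 10],
    "c": [11, 12, 13, 14, 15],
})

# Delete the rows labeled 0, 1, and 2 (axis=0 is the default).
without_rows = df.drop(range(0, 3))

# Delete columns by name; columns=... is equivalent to labels=..., axis=1.
without_cols = df.drop(columns=["b", "c"])

print(without_rows.index.tolist())    # [3, 4]
print(without_cols.columns.tolist())  # ['a']
print(df.shape)                       # (5, 3)
```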
10 to_datetime: convert strings to dates
import datetime
import pandas as pd
dictDate = {'date': ['2019-11-01 19:30', '2019-11-30 19:00']}
df = pd.DataFrame(dictDate)
df['datetime'] = pd.to_datetime(df['date'])
df['today'] = df['datetime'].apply(lambda x: x.strftime('%Y%m%d'))
df['tomorrow'] = (df['datetime'] + datetime.timedelta(days=1)).dt.strftime('%Y%m%d')
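Real data often contains strings that are not valid dates. The sketch below is an illustrative variation on the example, not part of the original: it passes an explicit format and errors="coerce", so that an unparseable entry becomes NaT instead of raising.

```python
import pandas as pd

# Hypothetical input; the last entry is deliberately invalid.
raw = pd.Series(["2019-11-01 19:30", "2019-11-30 19:00", "not a date"])

# An explicit format avoids guessing; errors="coerce" turns bad values into NaT.
parsed = pd.to_datetime(raw, format="%Y-%m-%d %H:%M", errors="coerce")

# Date arithmetic and formatting work element-wise through the .dt accessor.
tomorrow = (parsed + pd.Timedelta(days=1)).dt.strftime("%Y%m%d")

print(parsed.isna().tolist())  # [False, False, True]
print(tomorrow.tolist()[:2])   # ['20191102', '20191201']
```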
11 apply() function
# The pandas apply() function works on a Series or on a whole DataFrame. On a Series it calls the given function once per element; on a DataFrame it calls the function once per column (or once per row with axis=1).
def add_extra(nationality, extra):
    if nationality != "Han":
        return extra
    else:
        return 0
# Extra arguments can be passed positionally through args ...
df['ExtraScore'] = df.Nationality.apply(add_extra, args=(5,))
# ... or as keyword arguments.
df['ExtraScore'] = df.Nationality.apply(add_extra, extra=5)
df['Extra'] = df.Nationality.apply(lambda n, extra: extra if n == 'Han' else 0, args=(5,))
def add_extra2(nationality, **kwargs):
    return kwargs[nationality]
# Each keyword maps a value of the Nationality column to its extra score.
df['Extra'] = df.Nationality.apply(add_extra2, Han=0, Hui=10, Tibetan=5)
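To make the difference between Series.apply and DataFrame.apply concrete, here is a small sketch on a hypothetical score table (the names and numbers are made up):

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"math": [80, 90], "physics": [70, 60]}, index=["Ann", "Bob"])

# Series.apply: the function receives one element at a time.
doubled = df["math"].apply(lambda score: score * 2)

# DataFrame.apply with the default axis=0: the function receives one column (a Series).
col_max = df.apply(lambda col: col.max())

# DataFrame.apply with axis=1: the function receives one row (a Series).
row_total = df.apply(lambda row: row["math"] + row["physics"], axis=1)

print(doubled.tolist())    # [160, 180]
print(col_max.tolist())    # [90, 70]
print(row_total.tolist())  # [150, 150]
```

For simple arithmetic like this, vectorized expressions such as df["math"] * 2 are usually faster; apply is most useful when the logic cannot be expressed that way.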
12 map() function
# Series.map() applies a function to each element of a Series.
import datetime
import pandas as pd

def f(x):
    # Keep the first 8 characters (YYYYMMDD) and reformat, skipping missing values
    x = str(x)[:8]
    if x != 'nan':
        gf = datetime.datetime.strptime(x, "%Y%m%d")
        x = gf.strftime("%Y-%m-%d")
    return x

def f2(x):
    # Reformat YYYY/MM/DD strings, skipping blanks and missing values
    if str(x) not in [' ', 'nan']:
        dd = datetime.datetime.strptime(str(x), "%Y/%m/%d")
        x = dd.strftime("%Y-%m-%d")
    return x

def test():
    df = pd.DataFrame()
    df1 = pd.read_csv("600694_gf.csv")
    df2 = pd.read_csv("600694.csv")
    df['date1'] = df2['DateTime'].map(f2)
    df['date2'] = df1['date'].map(f)
    df.to_csv('map.csv')
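The example above depends on two CSV files that are not included here. A self-contained sketch of the same idea follows, using made-up in-memory data and a hypothetical helper named to_iso. It also shows two features the original does not use: na_action="ignore", which skips missing values so the function needs no manual check, and mapping with a dictionary.

```python
import datetime
import pandas as pd

def to_iso(text):
    """Convert a YYYYMMDD string to YYYY-MM-DD (hypothetical helper)."""
    return datetime.datetime.strptime(text, "%Y%m%d").strftime("%Y-%m-%d")

# Hypothetical sample data; None stands for a missing value.
dates = pd.Series(["20191101", None, "20191130"])

# na_action="ignore" leaves missing values alone instead of passing them to to_iso.
iso = dates.map(to_iso, na_action="ignore")

# map() also accepts a dict; keys that are absent from it become missing values.
grades = pd.Series(["A", "B", "C"]).map({"A": "high", "B": "mid"})

print(iso.tolist()[0])        # 2019-11-01
print(iso.isna().tolist())    # [False, True, False]
print(grades.isna().tolist()) # [False, False, True]
```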
References
Pandas DataFrame operations
pandas.DataFrame.drop — pandas 1.4.1 documentation
Usage of the pandas apply() function
pandas.Series.apply — pandas 1.4.1 documentation
Summary
This concludes the article on implementing filtering with pandas in Python.