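The numpy library provides argsort(), which returns the indices that would sort an array, while the pandas library provides sort_values() for sorting a DataFrame by its values.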
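To make the difference concrete, here is a small sketch that sorts the same numbers both ways. The array contents and the column name 'score' are invented for the illustration.

```python
import numpy as np
import pandas as pd

values = np.array([30, 10, 20])

# argsort() returns the positions that would sort the array, not the sorted values
order = np.argsort(values)
sorted_values = values[order]

# sort_values() returns a new DataFrame whose rows are reordered by the column
df = pd.DataFrame({'score': values})
sorted_df = df.sort_values('score')

print(order.tolist())             # positions into the original array
print(sorted_values.tolist())     # the values in ascending order
print(sorted_df.index.tolist())   # original row labels, in sorted order
```

Note that the row labels of sorted_df carry the same information as the argsort() result: they record where each sorted row came from.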
DataFrame.sort_values(self, by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Apart from self, this signature has six parameters: by, axis, ascending, inplace, kind and na_position.
by : str or list of str
Name or list of names to sort by.
if axis is 0 or ‘index’ then by may contain index levels and/or column labels
if axis is 1 or ‘columns’ then by may contain column levels and/or index labels
Changed in version 0.23.0: Allow specifying index or column level names.
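If axis=0, by takes column labels and the rows are reordered (a vertical sort);
If axis=1, by takes row (index) labels and the columns are reordered (a horizontal sort).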
axis : {0 or ‘index’, 1 or ‘columns’}, default 0
Axis to be sorted.
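Chooses whether the rows or the columns are reordered.
axis defaults to 0, which sorts vertically (the rows are reordered).
axis=1 sorts horizontally (the columns are reordered).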
ascending : bool or list of bool, default True
Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by.
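The default, True, sorts in ascending order; False sorts in descending order. If by is a list, ascending may be a list of the same length that gives the sort direction for each label.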
inplace : bool, default False
If True, perform operation in-place.
inplace defaults to False. If it is True, the sorted data replaces the original data and the call returns None.
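The effect of inplace can be checked with a short sketch like the one below. The single-column frame is made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({'age': [21, 20, 22]})

# Default (inplace=False): a sorted copy is returned and df itself is unchanged
sorted_copy = df.sort_values('age')
unchanged = df['age'].tolist()

# inplace=True: df itself is reordered and the call returns None
result = df.sort_values('age', inplace=True)
changed = df['age'].tolist()

print(unchanged, changed, result)
```

Because the in-place call returns None, writing df = df.sort_values('age', inplace=True) would discard the data; use either the assignment form or inplace=True, not both.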
kind : {‘quicksort’, ‘mergesort’, ‘heapsort’}, default ‘quicksort’
Choice of sorting algorithm. See also ndarray.np.sort for more information. mergesort is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label.
Selects the sorting algorithm; the default is quicksort.
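What "stable" means here can be shown with a small sketch: with kind='mergesort', rows whose keys are equal keep their original relative order. The frame below is invented for the illustration, and it sorts on a single column because, as the quoted documentation says, kind only applies in that case.

```python
import pandas as pd

df = pd.DataFrame({'id': [2, 1, 2, 1], 'name': ['a', 'b', 'c', 'd']})

# A stable sort keeps tied rows in their original order:
# 'b' stays ahead of 'd' (both id 1) and 'a' stays ahead of 'c' (both id 2)
stable = df.sort_values('id', kind='mergesort')
names = stable['name'].tolist()

print(names)
```

With the default quicksort the order of tied rows is not guaranteed, so pass kind='mergesort' when the original order of equal keys matters.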
na_position : {‘first’, ‘last’}, default ‘last’
Puts NaNs at the beginning if first; last puts NaNs at the end.
Controls where missing values are placed. The default, last, puts them at the end; first puts them at the beginning.
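A minimal sketch of na_position, using a made-up column with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [20.0, np.nan, 18.0]})

# Default na_position='last': the NaN row goes to the end
nan_last = df.sort_values('age')

# na_position='first': the NaN row goes to the beginning
nan_first = df.sort_values('age', na_position='first')

# The placement of NaN does not depend on ascending
nan_last_desc = df.sort_values('age', ascending=False)

print(nan_last.index.tolist())
print(nan_first.index.tolist())
print(nan_last_desc.index.tolist())
```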
import pandas as pd
data = pd.DataFrame([[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]], columns=['id', 'name', 'age'])
The data is
   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21
3   1  Wang   20
Sort by id and age, with id ascending and age descending
data = data.sort_values(['id', 'age'], ascending=[True, False])
The result is
   id  name  age
2   1  Wang   21
0   1  Wang   20
3   1  Wang   20
1   2    Li   20
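Sort horizontally: reorder the columns so that the values in the row labelled 0 run from smallest to largest. Only that row is used as the sort key; the other rows simply follow the new column order. Row 0 holds both integers and a string, which Python 3 cannot compare (the sort raises a TypeError), so select the numeric columns first
data = data[['age', 'id']].sort_values(0, axis=1)
The result is
   id  age
2   1   21
0   1   20
3   1   20
1   2   20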