
Pandas deduplication: keeping the first or last record with drop_duplicates

Editor: Python

The subset parameter
The keep parameter
The inplace parameter
Example

The drop_duplicates() function in the pandas library is a handy tool for removing duplicate rows. It also lets you choose whether to keep the first or the last record of each group of duplicates.

DataFrame.drop_duplicates(self, subset=None, keep='first', inplace=False)

This signature has three parameters: subset, keep, and inplace.

The subset parameter

subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by default use all of the columns

The subset parameter selects which columns are compared when deciding whether two rows are duplicates. It takes a column label or a sequence of column labels. If it is not set, all columns are compared.
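As a minimal sketch of the two forms subset can take, the snippet below uses a small illustrative DataFrame; the variable names by_age and by_id_name are made up for this example.

```python
import pandas as pd

data = pd.DataFrame(
    [[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]],
    columns=['id', 'name', 'age'],
)

# A single column label: rows count as duplicates when 'age' matches.
by_age = data.drop_duplicates(subset='age')

# A list of labels: rows count as duplicates when both 'id' and 'name' match.
by_id_name = data.drop_duplicates(subset=['id', 'name'])

print(by_age.index.tolist())      # index labels of the rows that were kept
print(by_id_name.index.tolist())
```

Printing the surviving index labels is a quick way to see which rows were kept for each choice of subset.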

The keep parameter

keep : {'first', 'last', False}, default 'first'
first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.

keep accepts three values, and the default is 'first':
'first' keeps the first occurrence of each duplicated record
'last' keeps the last occurrence of each duplicated record
False deletes every record that has a duplicate, keeping none of them
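One way to see what each keep value does is to look at DataFrame.duplicated(), which accepts the same keep argument and marks with True the rows that drop_duplicates() would remove. The sketch below assumes a small illustrative DataFrame in which rows 0 and 3 are identical; the flags table and its column names are only for display.

```python
import pandas as pd

data = pd.DataFrame(
    [[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]],
    columns=['id', 'name', 'age'],
)

# For each keep value, True marks a row that would be dropped.
flags = pd.DataFrame({
    'first': data.duplicated(keep='first'),  # later copies are flagged
    'last': data.duplicated(keep='last'),    # earlier copies are flagged
    'none': data.duplicated(keep=False),     # every copy is flagged
})
print(flags)
```

Checking the flags before dropping anything can help confirm that the chosen keep value removes the rows you expect.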

The inplace parameter

inplace : boolean, default False
Whether to drop duplicates in place or to return a copy

inplace can be set to True or False, and the default is False:
True removes the duplicates in place, modifying the original DataFrame
False returns a new DataFrame and leaves the original variable unchanged
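The difference between the two settings can be sketched as follows, again with a small illustrative DataFrame. Note that with inplace=True the call returns None, so its result should not be assigned back to the variable.

```python
import pandas as pd

data = pd.DataFrame(
    [[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]],
    columns=['id', 'name', 'age'],
)

# inplace=False (the default): a new DataFrame is returned, data is untouched.
deduped = data.drop_duplicates()
print(len(data), len(deduped))

# inplace=True: data itself is modified and the call returns None.
result = data.drop_duplicates(inplace=True)
print(result, len(data))
```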

Example

import pandas as pd
data = pd.DataFrame([[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]], columns=['id', 'name', 'age'])

The data is:

   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21
3   1  Wang   20

Rows 0 and 3 are clearly duplicate records. Remove the duplicate with the default settings:

print(data.drop_duplicates())

The result is:

   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21

As you can see, row 0 was kept and row 3 was removed. Setting the keep parameter to 'last' keeps the last occurrence instead:

print(data.drop_duplicates(keep='last'))

The result is:

   id  name  age
1   2    Li   20
2   1  Wang   21
3   1  Wang   20

Now take the same dataset again:

   id  name  age
0   1  Wang   20
1   2    Li   20
2   1  Wang   21
3   1  Wang   20

To treat rows with the same id and name as duplicates, regardless of age, use:

print(data.drop_duplicates(['id', 'name']))

This gives:

   id  name  age
0   1  Wang   20
1   2    Li   20

If you want to delete every record that has a duplicate, without keeping any copy, use:

print(data.drop_duplicates(['id', 'name'], keep=False))

This gives:

   id  name  age
1   2    Li   20
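As the results above show, drop_duplicates() keeps the original index labels, so the index can have gaps after deduplication. A minimal sketch of two ways to renumber the rows follows. It assumes pandas 1.0 or later for the ignore_index argument, which is not part of the signature shown earlier; reset_index(drop=True) works on older versions as well.

```python
import pandas as pd

data = pd.DataFrame(
    [[1, 'Wang', 20], [2, 'Li', 20], [1, 'Wang', 21], [1, 'Wang', 20]],
    columns=['id', 'name', 'age'],
)

# The surviving rows keep their original index labels.
deduped = data.drop_duplicates(keep='last')
print(deduped.index.tolist())

# Option 1: renumber afterwards and discard the old labels.
renumbered = deduped.reset_index(drop=True)
print(renumbered.index.tolist())

# Option 2 (assumes pandas >= 1.0): renumber as part of the same call.
same = data.drop_duplicates(keep='last', ignore_index=True)
print(same.index.tolist())
```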
