
Pandas: deleting or keeping rows that share a value in one column, depending on whether their other columns match

Editor: Python

Problem description and background

I have a CSV file that contains many duplicated values.
I need to use one of its columns as the reference column.
If two rows have the same value in that column, I then check whether their other columns match. If they match, the rows are deleted; if they differ, the rows are kept. One of the columns is excluded from this comparison.

What I want to achieve

Raw data

          1  2 3 4 5 6
        A 22 a b c d e
        B 33 c a b d e
        C 22 b b c d e
        D 44 b c d e f
        E 33 c b b d e
        F 44 b c d e g
        G 55 a b c d e
        H 55 a b c d e

Desired result after running

          1  2 3 4 5 6
        A 22 a b c d e
        B 33 c a b d e
        C 22 b b c d e
        E 33 c b b d e

In this example, column 1 is the reference column.
Rows A and C have the same value in column 1, but they differ in column 2, so both rows are kept.
Rows B and E have the same value in column 1, but they differ in column 3, so both rows are kept.
Rows D and F have the same value in column 1 and differ only in column 6. Column 6 is not used as a comparison condition, so both rows are deleted.
Rows G and H have the same value in column 1, and their columns 2, 3, 4 and 5 are also identical, so both rows are deleted.
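
One possible way to express the rule above in pandas is sketched below. It is only a sketch under a few assumptions: the column labels `row` and `c1` to `c6` are hypothetical names chosen here (the question does not give any), the data is typed in by hand rather than read from the CSV, and a row whose column 1 value occurs only once is assumed to be kept, since the example does not cover that case. The idea is to treat column 1 plus the compared columns 2 to 5 as one combined key and drop every row whose combined key occurs more than once, using `DataFrame.duplicated` with `keep=False`.

```python
import pandas as pd

# Sample data copied from the question. The labels "row" and "c1".."c6"
# are hypothetical names introduced for this sketch.
df = pd.DataFrame(
    [
        ["A", 22, "a", "b", "c", "d", "e"],
        ["B", 33, "c", "a", "b", "d", "e"],
        ["C", 22, "b", "b", "c", "d", "e"],
        ["D", 44, "b", "c", "d", "e", "f"],
        ["E", 33, "c", "b", "b", "d", "e"],
        ["F", 44, "b", "c", "d", "e", "g"],
        ["G", 55, "a", "b", "c", "d", "e"],
        ["H", 55, "a", "b", "c", "d", "e"],
    ],
    columns=["row", "c1", "c2", "c3", "c4", "c5", "c6"],
).set_index("row")

key_col = "c1"                           # reference column
compare_cols = ["c2", "c3", "c4", "c5"]  # c6 is deliberately left out

# keep=False marks every member of a duplicated group, not only the repeats,
# so all rows that agree on the reference column and the compared columns
# are flagged together.
dup_mask = df.duplicated(subset=[key_col] + compare_cols, keep=False)

# Keep only the rows that were not flagged.
result = df[~dup_mask]
print(result)
```

On the sample data this leaves rows A, B, C and E, which matches the desired result. With the real file, the frame would presumably be loaded with `pd.read_csv` and `key_col` and `compare_cols` replaced by the actual column names. Note that if three or more rows share a column 1 value, this sketch only removes the rows that have an exact match on the compared columns; whether that is the intended behaviour is not stated in the question.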