Problem description and background
I have a CSV file in which many values are duplicated.
I need to filter its rows based on one of the columns.
If two rows have the same value in that column, I then check whether their other columns match. If they match, the rows are deleted; if they do not match, the rows are kept. One of the columns is excluded from this comparison.
What I want to achieve
Raw data
- 6
A 22 a b c d e
B 33 c a b d e
C 22 b b c d e
D 44 b c d e f
E 33 c b b d e
F 44 b c d e g
G 55 a b c d e
H 55 a b c d e
Expected result after running
- 6
A 22 a b c d e
B 33 c a b d e
C 22 b b c d e
E 33 c b b d e
In this example, column 1 is the reference column. Rows A and C have the same value in column 1, but they differ in column 2, so both rows are kept.
Rows B and E have the same value in column 1, but they differ in column 3, so both rows are kept.
Rows D and F have the same value in column 1. They differ only in column 6, and column 6 is not used as a comparison condition, so both rows are deleted.
Rows G and H have the same value in column 1, and their columns 2, 3, 4, and 5 are identical, so both rows are deleted.
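For reference, the rule described above can be sketched in plain Python with the standard csv module. This is only a minimal sketch under several assumptions: the file is comma-separated, the header line is skipped, each row starts with its label (A, B, ...) followed by columns 1 to 6, rows whose column 1 value appears only once are kept, and a group of more than two rows is kept whole if any of its rows differ. The names RAW, filter_rows, key_idx, and compare_idxs are made up for this illustration.

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question; the comma delimiter is an assumption.
RAW = """\
A,22,a,b,c,d,e
B,33,c,a,b,d,e
C,22,b,b,c,d,e
D,44,b,c,d,e,f
E,33,c,b,b,d,e
F,44,b,c,d,e,g
G,55,a,b,c,d,e
H,55,a,b,c,d,e
"""


def filter_rows(rows, key_idx=1, compare_idxs=(2, 3, 4, 5)):
    """Drop every group of rows that shares a key and agrees on all compared columns.

    Index 0 is the row label, index 1 is column 1 (the reference column),
    indexes 2..5 are columns 2..5, and index 6 (column 6) is ignored.
    """
    # Group the rows by the value in the reference column.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_idx]].append(row)

    kept = []
    for row in rows:  # iterate the original list to preserve row order
        group = groups[row[key_idx]]
        if len(group) == 1:
            # Assumption: a row with a unique key is kept.
            kept.append(row)
            continue
        # Collect the distinct combinations of the compared columns in this group.
        signatures = {tuple(r[i] for i in compare_idxs) for r in group}
        if len(signatures) > 1:
            # At least one compared column differs, so the group is kept.
            kept.append(row)
    return kept


rows = list(csv.reader(io.StringIO(RAW)))
result = filter_rows(rows)
for row in result:
    print(" ".join(row))
```

With the sample data this leaves rows A, B, C, and E. To read a real file, replace io.StringIO(RAW) with an opened file object and adjust key_idx and compare_idxs to the actual column positions.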