I have a huge data set, and before machine learning modeling it is usually suggested that you first remove highly correlated descriptors (columns). How can I calculate the column-wise correlation and remove columns above a threshold, for example remove all columns or descriptors that have a correlation > 0.8 with another column? The headers should also be retained in the reduced data.
Example data set:
GA      PN      PC      MBP    GR     AP
0.033   6.652   6.681   0.194  0.874  3.177
0.034   9.039   6.224   0.194  1.137  3.4
0.035   10.936  10.304  1.015  0.911  4.9
0.022   10.11   9.603   1.374  0.848  4.566
0.035   2.963   17.156  0.599  0.823  9.406
0.033   10.872  10.244  1.015  0.574  4.871
0.035   21.694  22.389  1.015  0.859  9.259
0.035   10.936  10.304  1.015  0.911  4.5
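
One possible way to do this is sketched below. The question does not name a library, so pandas and NumPy are assumptions here, and the helper name `drop_correlated` is hypothetical. The idea is to compute the absolute pairwise (Pearson) correlation matrix, look only at the upper triangle so each pair is considered once, and drop every column whose correlation with an earlier column exceeds the threshold. Because the result is still a DataFrame, the headers of the remaining columns are kept.

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question, read as whitespace-separated text.
SAMPLE = """\
GA PN PC MBP GR AP
0.033 6.652 6.681 0.194 0.874 3.177
0.034 9.039 6.224 0.194 1.137 3.4
0.035 10.936 10.304 1.015 0.911 4.9
0.022 10.11 9.603 1.374 0.848 4.566
0.035 2.963 17.156 0.599 0.823 9.406
0.033 10.872 10.244 1.015 0.574 4.871
0.035 21.694 22.389 1.015 0.859 9.259
0.035 10.936 10.304 1.015 0.911 4.5
"""


def drop_correlated(df, threshold=0.8):
    """Return a copy of df without columns that correlate above
    `threshold` (in absolute value) with a column to their left."""
    # Absolute Pearson correlation between every pair of columns.
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal (always 1.0) is ignored.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if any earlier column correlates with it
    # above the threshold.
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)


df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")
reduced = drop_correlated(df, threshold=0.8)
print(reduced.columns.tolist())  # headers of the columns that were kept
```

Note that this is a simple greedy rule: of each highly correlated pair, the column that appears later is the one removed, so the result depends on column order, and a column can be removed because of a neighbour that is itself removed. If a particular descriptor should be preferred, reordering the columns before calling the function is one way to control that.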