I was curious about this, so I ran a few tests. I trained a model on the diamonds dataset and found that the variable “x” is the most important feature for predicting whether the price of a diamond is above a certain threshold. I then added several columns highly correlated with x, reran the same model, and observed the same importance values. It seems that when the correlation between two columns is 1, xgb