Multicollinearity, also known as collinearity, is a phenomenon in statistics in which one predictor variable in a multiple regression model is highly correlated with one or more of the other predictor variables. This can lead to several issues, such as:
- Erratic coefficient estimates: The coefficient estimates of the multiple regression may change erratically in response to small changes in the data (see the sketch after this list)
- Wider confidence intervals: Multicollinearity inflates the standard errors of the affected coefficients, producing wider confidence intervals and less reliable inferences about the individual predictors
- Incorrect assumptions in technical analysis: In technical analysis, multicollinearity can lead to incorrect assumptions about an investment
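The instability of the coefficient estimates is easiest to see on synthetic data. The sketch below is a minimal illustration, not drawn from the text above: it assumes two nearly identical predictors generated with NumPy, fits ordinary least squares twice (once on a slightly perturbed response), and prints the two sets of slope estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two highly correlated predictors: x2 is x1 plus a tiny amount of noise.
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
y = 3 * x1 + 2 * x2 + rng.normal(scale=1.0, size=n)

def slope_estimates(X, y):
    """Ordinary least squares with an intercept; returns only the slopes."""
    X = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]

X = np.column_stack([x1, x2])
print(slope_estimates(X, y))                                   # slopes for the original response
print(slope_estimates(X, y + rng.normal(scale=0.1, size=n)))   # slopes after a small perturbation of y
# The individual slopes swing wildly between the two fits, but their sum stays
# near 5, which is why predictions are largely unaffected.
```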
There are two types of multicollinearity:
- Structural multicollinearity: This is a mathematical artifact caused by creating new predictors from existing ones, for example by adding the square of a variable to the model (see the sketch after this list)
- Data-based multicollinearity: This type occurs when two or more independent variables are moderately or highly correlated due to the nature of the data under consideration or a poorly designed experiment; it is common with observational data, where it is difficult to manipulate the system on which the data are collected
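Structural multicollinearity can be reproduced, and largely removed, in a few lines. The sketch below is an assumed example rather than part of the original text: it builds a squared term from a positive variable, shows that the two are strongly correlated, and then applies the common remedy of centering the variable before squaring.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=200)      # strictly positive, so x and x**2 rise together

# Structural multicollinearity: the squared term is manufactured from x itself.
print(np.corrcoef(x, x ** 2)[0, 1])   # close to 1, i.e. strongly collinear

# Centering x before squaring removes most of the artifact.
x_centered = x - x.mean()
print(np.corrcoef(x_centered, x_centered ** 2)[0, 1])   # close to 0
```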
To detect multicollinearity, you can use tools such as scatter plots and correlation coefficients. Although multicollinearity can be a problem in some cases, it does not necessarily reduce the predictive power or reliability of the model as a whole; it mainly undermines inferences about the individual predictors. In some situations it may be better to leave the collinear variables in the model, especially if you are using the model for prediction and do not need to understand the specific role of each variable.
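As a closing illustration of detection, the sketch below computes a pairwise correlation matrix and variance inflation factors (VIFs), a standard diagnostic used alongside correlation coefficients even though it is not named above. The synthetic data, the helper function `vif`, and the rule-of-thumb threshold of 10 are all assumptions made for this example; statistics libraries also ship ready-made VIF routines.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns (with an intercept).
    """
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=500)   # highly correlated with x1
x3 = rng.normal(size=500)                    # independent of the others
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False).round(2))  # pairwise correlation matrix
print(vif(X).round(1))                        # VIFs well above 10 flag strong collinearity
```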