

In this article, we're going to discuss correlation, collinearity and multicollinearity in the context of linear regression.

One important assumption of linear regression is that a linear relationship should exist between each predictor Xᵢ and the outcome Y. So, a strong correlation between these variables is considered a good thing. However, when correlation exists between the predictors, we can no longer determine the effect of one while holding the other constant, because the 2 variables change together. The result is that their coefficients become less exact and less interpretable. This strong correlation between 2 independent variables causes a problem when interpreting the linear model, and that problem is referred to as collinearity.

In fact, collinearity is a more general term that also covers cases where 2 or more independent variables are linearly related to each other. So collinearity can exist either because a pair of predictors is correlated or because 3 or more predictors are linearly related to each other. This last case is sometimes referred to as multicollinearity.

Note that because multicollinearity is a special case of collinearity, some textbooks refer to both situations as collinearity, such as Regression Modeling Strategies by Frank Harrell and Clinical Prediction Models by Ewout Steyerberg. Others, such as An Introduction to Statistical Learning by Gareth James et al., distinguish between the two terms and reserve multicollinearity for this special case.
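To make the loss of precision concrete, here is a minimal simulation, not from the article, that assumes numpy and statsmodels are installed and uses made-up numbers. It fits the same model twice, once with independent predictors and once with strongly correlated ones, and prints the coefficient standard errors, which inflate when the predictors are correlated.

```python
# Sketch: how correlated predictors make coefficient estimates less exact.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200

def fit_and_report(rho, label):
    # Draw two predictors with correlation rho
    cov = [[1.0, rho], [rho, 1.0]]
    X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n)
    # True model: Y = 1 + 2*X1 + 3*X2 + noise
    y = 1 + 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=1.0, size=n)
    fit = sm.OLS(y, sm.add_constant(X)).fit()
    print(label)
    print("  coefficients:", np.round(fit.params, 2))
    print("  std errors:  ", np.round(fit.bse, 2))

fit_and_report(rho=0.0, label="uncorrelated predictors (rho = 0.0)")
fit_and_report(rho=0.95, label="strongly correlated predictors (rho = 0.95)")
```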

Here's a table that summarizes the differences between correlation, collinearity and multicollinearity:

Correlation: refers to the linear relationship between 2 variables.
Collinearity: refers to a problem when running a regression model where 2 or more independent variables (a.k.a. predictors) have a strong linear relationship.
Multicollinearity: is a special case of collinearity where a strong linear relationship exists between 3 or more independent variables, even if no pair of variables has a high correlation.

Correlation coefficient, correlation matrix and VIF

The correlation coefficient r can help us quantify the linear relationship between 2 variables. r is a number between -1 and 1 (-1 ≤ r ≤ 1):

A value of r close to 1: means that there is a positive correlation between the variables (when one increases, the other increases as well).
A value of r close to -1: means that there is a negative correlation between the variables (when one increases, the other decreases, and vice versa).
A value of r close to 0: indicates that the 2 variables are not correlated (no linear relationship exists between them).
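As a rough illustration of the first two tools named in the heading above, here is a small pandas sketch, not from the article, with invented data and column names. It computes the correlation coefficient r between 2 variables and the correlation matrix of all predictors at once.

```python
# Sketch: correlation coefficient and correlation matrix with pandas.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.normal(50, 10, 100),
    "weight": rng.normal(80, 15, 100),
})
# "bmi" is built from "weight", so the two are strongly correlated
df["bmi"] = df["weight"] / 2.9 + rng.normal(0, 1, 100)

# Pearson correlation coefficient r between 2 variables
r = df["weight"].corr(df["bmi"])
print(f"r(weight, bmi) = {r:.2f}")   # close to 1 -> strong positive correlation

# Correlation matrix: r for every pair of variables at once
print(df.corr().round(2))
```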

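The third tool in the heading, VIF, is useful precisely for the multicollinearity case where no single pair of predictors is strongly correlated. Below is a minimal sketch, assuming statsmodels is available, that computes the variance inflation factor on made-up data where one predictor is nearly a linear combination of the other two.

```python
# Sketch: detecting multicollinearity with the variance inflation factor (VIF).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 100
df = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
# x3 is almost a linear combination of x1 and x2: no pair is extremely
# correlated, yet the three predictors together are multicollinear
df["x3"] = df["x1"] + df["x2"] + rng.normal(scale=0.1, size=n)

X = sm.add_constant(df)  # VIF is usually computed on a model with an intercept
for i, name in enumerate(X.columns):
    if name == "const":
        continue  # the intercept's VIF is not of interest
    # A common rule of thumb: a VIF above 5-10 signals problematic collinearity
    print(f"VIF({name}) = {variance_inflation_factor(X.values, i):.1f}")
```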