Correlation is a metric that measures the strength of the linear relationship between two variables.
When looking at the correlation between two variables, it is also important to visualize it.
A plot shows what a given correlation value actually means.
The code at the end of this section draws scatter plots for several values of Pearson's correlation coefficient: 0, 0.5, 0.7, and 0.9.
Each plot shows what data with that correlation looks like.
The gap between correlations of 0.5 and 0.7 is not equivalent to the gap between 0.7 and 0.9, even though both differ by 0.2.
In terms of information content, a correlation of 0.7 is closer to 0.5 than to 0.9.
Similarly, a correlation of 0.5 is closer to 0 than to 1.
What does that mean?
A correlation of 0.5 carries little information about the relationship, far less than its position halfway between 0 and 1 suggests.
In the plots, the variables X and Y each follow a standard normal distribution and are jointly Gaussian.
We use mutual information to measure the information content.
The mutual information between two jointly Gaussian random variables X and Y is given by:
$$ I(X;Y) = -\frac{1}{2}\ln(1-\rho^2) $$
where $\rho$ is the correlation coefficient between X and Y. Because the formula uses the natural logarithm, the result is in nats.
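As a short sketch of where this formula comes from, mutual information can be written as a difference of differential entropies. The symbols $h(\cdot)$, $\sigma_X$, and $\sigma_Y$ are notation introduced here for illustration and do not appear elsewhere in the text.

$$ h(X) = \frac{1}{2}\ln(2\pi e\,\sigma_X^2), \qquad h(Y) = \frac{1}{2}\ln(2\pi e\,\sigma_Y^2) $$
$$ h(X,Y) = \frac{1}{2}\ln\left((2\pi e)^2\,\sigma_X^2\,\sigma_Y^2\,(1-\rho^2)\right) $$
$$ I(X;Y) = h(X) + h(Y) - h(X,Y) = -\frac{1}{2}\ln(1-\rho^2) $$

The variances cancel, so the result depends only on $\rho$. It is 0 when $\rho = 0$ and grows without bound as $|\rho|$ approaches 1.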
For a correlation of 0.5, I(X;Y) is 0.14384.
Similarly, for a correlation of 0.7, I(X;Y) is 0.33667.
Also, for a correlation of 0.9, I(X;Y) is 0.83037.
Hence, in terms of information content, a correlation of 0.7 is closer to 0.5 than to 0.9: the gap is about 0.193 between 0.5 and 0.7, but about 0.494 between 0.7 and 0.9.
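The values above can be reproduced with a few lines of R. This is a minimal sketch; the helper name `gaussian_mi` is hypothetical and chosen only for this example.

```r
# Mutual information (in nats) between two jointly Gaussian variables
# with correlation coefficient rho. gaussian_mi is a hypothetical helper name.
gaussian_mi <- function(rho) -0.5 * log(1 - rho^2)

rho <- c(0.5, 0.7, 0.9)
mi <- gaussian_mi(rho)

print(round(mi, 5))  # 0.14384 0.33667 0.83037

# Gaps in information content between neighbouring correlations:
# the step from 0.5 to 0.7 is much smaller than the step from 0.7 to 0.9.
gaps <- diff(mi)
print(round(gaps, 3))
```

Because `log` and `^` are vectorized in R, the same function works on a single correlation or on a whole vector of them.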
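library(MASS)  # provides mvrnorm()
samples <- 500
r <- c(0, 0.5, 0.7, 0.9)  # correlation coefficients to plot
par(mfrow = c(2, 2))  # 2 x 2 grid, one panel per correlation
for (i in seq_along(r)) {
  # Covariance matrix with unit variances, so the off-diagonal entry is the correlation
  Sigma <- matrix(c(1, r[i], r[i], 1), nrow = 2)
  # empirical = TRUE makes the sample mean and covariance match mu and Sigma exactly
  data <- mvrnorm(n = samples, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)
  X <- data[, 1]  # standard normal (mu = 0, sd = 1)
  Y <- data[, 2]  # standard normal (mu = 0, sd = 1)
  plot(X, Y, main = paste("Corr =", r[i]), col = "red")
}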
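The plotting code relies on `empirical=TRUE` so that each panel has exactly the stated correlation. The sketch below illustrates the difference from an ordinary random draw. The seed and the variable names (`exact`, `drawn`, `cor_exact`, `cor_drawn`) are assumptions made for this example.

```r
library(MASS)

set.seed(1)  # assumption: any seed works, fixed here only for reproducibility
Sigma <- matrix(c(1, 0.7, 0.7, 1), nrow = 2)

# empirical = TRUE: the sample covariance equals Sigma exactly
exact <- mvrnorm(n = 500, mu = c(0, 0), Sigma = Sigma, empirical = TRUE)
# empirical = FALSE: Sigma describes the population, so the sample value varies
drawn <- mvrnorm(n = 500, mu = c(0, 0), Sigma = Sigma, empirical = FALSE)

cor_exact <- cor(exact[, 1], exact[, 2])
cor_drawn <- cor(drawn[, 1], drawn[, 2])
print(c(exact = cor_exact, drawn = cor_drawn))
```

With `empirical=TRUE` the sample correlation equals 0.7 up to floating-point error, while the ordinary draw only comes close to it.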