Correlation: Theoretical background
By bambou06 • 21 April 2020
Theoretical background
The concept of correlation originated in biology. It became a statistical concept through the work of Francis Galton. In 1896, Karl Pearson proposed a mathematical formula for correlation and an estimator of this quantity.
Principle of the correlation method
Correlation is a bivariate method that relates two variables X and Y to detect a possible relationship between them. However, the relationship between X and Y is not necessarily causal: a correlation does not necessarily imply causation.
For example, the fact that reading ability is highly correlated with IQ does not mean that reading ability determines an individual's IQ, or vice versa.
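A small simulation can make this point concrete. The sketch below uses synthetic data only: `z` is a hypothetical common cause that drives both `x` and `y`, while neither `x` nor `y` influences the other. Their Pearson coefficient still comes out strongly positive (0.8 in expectation for these noise levels), which is a correlation without any causal link between the two.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Synthetic data: a hidden common cause z drives both x and y.
# x and y never influence each other directly.
random.seed(42)
z = [random.gauss(0, 1) for _ in range(1000)]
x = [zi + random.gauss(0, 0.5) for zi in z]  # x depends on z only
y = [zi + random.gauss(0, 0.5) for zi in z]  # y depends on z only

r_xy = pearson_r(x, y)
print(f"r(x, y) = {r_xy:.2f}")  # strongly positive despite no causal link between x and y
```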
The correlation coefficient between two real random variables X and Y, each with a finite non-zero variance, denoted Corr(X, Y) or r, is defined by:
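$$r = \frac{\operatorname{cov}(X, Y)}{\sigma_X \, \sigma_Y}$$
where cov(X, Y) is the covariance of X and Y, and σ_X and σ_Y are their standard deviations.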
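On a sample of n paired observations (x_i, y_i), the usual estimator replaces the covariance and standard deviations with their sample counterparts, giving r = Σ(x_i − x̄)(y_i − ȳ) / √(Σ(x_i − x̄)² · Σ(y_i − ȳ)²); the 1/(n − 1) factors cancel. A minimal Python sketch of that estimator follows; the helper name `pearson_r` is hypothetical, and in practice a statistics library would normally be used instead.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient r.

    r = sum((x_i - mean_x) * (y_i - mean_y))
        / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2))
    The 1/(n - 1) factors of the sample covariance and variances cancel out.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        # r is undefined when one variable has zero variance
        raise ValueError("both variables must have non-zero variance")
    return s_xy / math.sqrt(s_xx * s_yy)


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfectly linear, increasing
```

The zero-variance check mirrors the condition in the definition that each variable must have a (non-zero) variance.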
Pearson’s Correlation Coefficient (r)
Pearson correlation is a parametric test relating two quantitative variables, discrete or continuous. It characterizes the existence or absence of a linear relationship between two samples of values taken from the same group of subjects.
This correlation is expressed by the coefficient r, which indicates the direction (positive or negative) and the strength of the relationship.
Correlations can be classified in three ways:
- Positive or negative:
- Positive correlation: A positive correlation between two variables indicates that when one variable increases, the other tends to increase as well. It is represented by a positive correlation coefficient.
- Negative correlation: A negative correlation between two variables indicates that as one variable increases, the other tends to decrease, and vice versa. It is represented by a negative correlation coefficient.
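- Linear or non-linear: in a linear correlation the points lie around a straight line; in a non-linear correlation they follow a curve. Pearson’s r measures only the linear component of a relationship.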
- Simple, partial and multiple correlations:
- Simple: When only two variables are studied, the correlation is called simple correlation.
- Partial: When the correlation between two variables is measured while controlling for (holding constant) the effect of one or more other variables, it is called partial correlation.
- Multiple: When one variable is correlated with several other variables taken together, it is called multiple correlation.
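The sketch below illustrates three points from this list on small, made-up data. The sign of r gives the direction; a perfectly dependent but U-shaped relationship can still give r = 0, because Pearson's r only captures linear association; and a partial correlation, computed here with the standard first-order formula r_xy·z = (r_xy − r_xz · r_yz) / √((1 − r_xz²)(1 − r_yz²)), removes the association that a third variable z induces. The helper names and data are hypothetical, and multiple correlation is not shown.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


xs = [-2, -1, 0, 1, 2]
r_pos = pearson_r(xs, [2 * v + 1 for v in xs])  # y rises with x: r = +1
r_neg = pearson_r(xs, [10 - v for v in xs])     # y falls as x rises: r = -1
r_curve = pearson_r(xs, [v ** 2 for v in xs])   # y = x^2: full dependence, yet r = 0

# Simple vs partial correlation: z is a common cause of x and y.
random.seed(1)
z = [random.gauss(0, 1) for _ in range(2000)]
x = [v + random.gauss(0, 0.5) for v in z]
y = [v + random.gauss(0, 0.5) for v in z]
r_simple = pearson_r(x, y)       # high (0.8 in expectation)
r_partial = partial_r(x, y, z)   # close to 0 once z is held constant

print(r_pos, r_neg, r_curve)
print(round(r_simple, 2), round(r_partial, 2))
```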
The degree of correlation (r) is measured on a scale from -1 to 1; the sign gives the direction and the absolute value |r| gives the strength:
- r = 0 means a total absence of linear correlation between the two measures.
- 0.1 < |r| < 0.3 means a weak correlation.
- 0.3 < |r| < 0.5 means a moderate correlation.
- |r| > 0.5 means a strong correlation.
- r = 1 or r = -1 means a perfect correlation: knowing the value of one measure allows us to know the value of the other exactly.
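As a sketch of how such a scale can be applied, the hypothetical helper below labels a coefficient by its absolute value |r|, so the sign only carries the direction. It assumes the commonly used cut-offs of 0.1, 0.3 and 0.5; these are conventions rather than fixed rules, and the label for values below 0.1 is an illustrative choice.

```python
def describe_strength(r):
    """Label the strength of a correlation coefficient.

    Assumed cut-offs on |r| (a convention, not a rule):
    below 0.1 negligible, 0.1 to 0.3 weak, 0.3 to 0.5 moderate,
    above 0.5 strong, exactly 1 perfect.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    strength = abs(r)  # the sign (direction) is ignored; only magnitude matters
    if strength == 1.0:
        return "perfect"
    if strength > 0.5:
        return "strong"
    if strength >= 0.3:
        return "moderate"
    if strength >= 0.1:
        return "weak"
    return "negligible"


print(describe_strength(-0.72))  # strong (negative direction)
print(describe_strength(0.25))   # weak
```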
Assumptions
When Pearson’s correlation is the chosen analysis, part of the process involves checking that the data can actually be analysed with it. The data should meet the following assumptions:
- Each variable should be continuous.
- There should be a linear relationship between the two variables.
- There should be no significant outliers. Outliers are single data points that lie far from the overall pattern of the data.
- Each variable should be approximately normally distributed.
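The outlier assumption matters because r is built from deviations from the mean, so a single extreme point can dominate it. In this made-up example, ten points lying exactly on a line give r = 1, and adding one hypothetical outlier is enough to flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = list(range(1, 11))
y = list(range(1, 11))          # ten points exactly on the line y = x
r_clean = pearson_r(x, y)

x_out = x + [11]
y_out = y + [-30]               # one extreme, hypothetical outlier
r_outlier = pearson_r(x_out, y_out)

print(f"without outlier: r = {r_clean:.2f}")
print(f"with outlier:    r = {r_outlier:.2f}")
```

Plotting the data before computing r is a simple way to spot such points.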
If these assumptions are not met, the dataset can be analysed with Spearman’s rank correlation, a non-parametric alternative to Pearson’s correlation coefficient.
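Spearman's coefficient is Pearson's r computed on the ranks of the data, with tied values receiving the average of the ranks they span. Because only the order of the values matters, it captures any monotonic relationship and is far less sensitive to outliers. A minimal sketch follows; the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]   # monotonic but not linear
print(pearson_r(x, y))     # below 1: the points do not lie on a straight line
print(spearman_rho(x, y))  # 1.0: the ranks agree perfectly
```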
...