How to Calculate Product Moment Correlation Coefficient

The product moment correlation coefficient (PMCC) measures the strength of the linear relationship between two variables (referred to as x and y). For example, suppose you own a restaurant. For every 10th customer, you record the time they stayed in your restaurant (x, in minutes) and the amount they spent (y, in dollars). Is it generally true that the customers who stay longer are also the bigger spenders? This would be a positive correlation. Or is it the other way around, e.g. do the bigger spenders take less time over their lunch? This would be a negative correlation. To shed some light on this question, you can calculate the PMCC.

Steps

Remove incomplete pairs. In the next steps, use only the observations where both x and y are known. However, do not exclude an observation just because one of its values equals zero.
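As a minimal sketch of this step, the snippet below filters a list of observations. The data and the use of None to mark a missing value are assumptions made for illustration; your own records may mark missing values differently.

```python
# Hypothetical observations: (minutes stayed, dollars spent).
# None marks a value that was not recorded (an assumption for this sketch).
observations = [(30, 12.5), (45, None), (0, 4.0), (None, 20.0), (60, 0.0), (50, 22.0)]

# Keep a pair only when both values are known. Compare against None
# explicitly, because zero is a valid value and must not be dropped.
complete_pairs = [(x, y) for x, y in observations if x is not None and y is not None]

print(complete_pairs)
```

Note that the pairs containing a zero are kept, while the two pairs with a missing value are dropped.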

Summarize the data into the values needed for the calculation. These are:
- n - the number of complete (x, y) pairs.
- Σx - the sum of all the x values.
- Σx² - the sum of the squares of the x values.
- Σy - the sum of all the y values.
- Σy² - the sum of the squares of the y values.
- Σxy - the sum of each x value multiplied by its corresponding y value.
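The six summary values can be collected in one pass over the pairs. This is only a sketch: the function name summarize and the sample data are made up for illustration.

```python
def summarize(pairs):
    """Return (n, Σx, Σx², Σy, Σy², Σxy) for a list of (x, y) pairs."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    return n, sum_x, sum_x2, sum_y, sum_y2, sum_xy

# Hypothetical sample data, not taken from a real survey.
pairs = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]
print(summarize(pairs))  # (5, 15, 55, 20, 86, 66)
```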

Calculate S_xy, S_xx and S_yy using these values.
- S_xy=Σxy-(ΣxΣy÷n)
- S_xx=Σx²-(ΣxΣx÷n)
- S_yy=Σy²-(ΣyΣy÷n)
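The three formulas above translate directly into code. In this sketch the function name s_values is hypothetical, and the inputs are the summary values of a small made-up data set: (1, 2), (2, 4), (3, 5), (4, 4), (5, 5).

```python
def s_values(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Return (S_xy, S_xx, S_yy) computed from the six summary values."""
    s_xy = sum_xy - (sum_x * sum_y / n)
    s_xx = sum_x2 - (sum_x * sum_x / n)
    s_yy = sum_y2 - (sum_y * sum_y / n)
    return s_xy, s_xx, s_yy

# Summary values for the made-up data: n=5, Σx=15, Σx²=55, Σy=20, Σy²=86, Σxy=66.
s_xy, s_xx, s_yy = s_values(5, 15, 55, 20, 86, 66)
print(s_xy, s_xx, s_yy)  # 6.0 10.0 6.0
```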

Insert these values into the equation r=S_xy÷√(S_xx×S_yy) to calculate the product moment correlation coefficient (r). The value should lie between -1 and 1.
- A value close to 1 implies strong positive correlation. (The higher the x, the higher the y).
- A value close to 0 implies little or no correlation.
- A value close to -1 implies strong negative correlation. (The higher the x, the lower the y).
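Putting the steps together, a minimal sketch of the whole calculation might look like this. The function name pmcc and the sample data are assumptions for illustration, and the sketch expects the incomplete pairs to have been removed already.

```python
from math import sqrt

def pmcc(xs, ys):
    """Return r = S_xy / sqrt(S_xx * S_yy) for paired lists xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are needed")
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x * sum_x / n
    s_yy = sum(y * y for y in ys) - sum_y * sum_y / n
    if s_xx == 0 or s_yy == 0:
        # All x values (or all y values) are equal, so r is undefined.
        raise ValueError("r is undefined when x or y does not vary")
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical sample data.
r = pmcc([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))  # 0.7746
```

For this made-up data S_xy is 6, S_xx is 10 and S_yy is 6, so r is 6÷√60, a fairly strong positive correlation.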

Tips

Always make a scatter plot. The product moment correlation coefficient only measures how well a straight line predicts y from x, so without a plot you may miss a strong relationship that is curved rather than linear.
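To see why the plot matters, consider a made-up data set in which y depends entirely on x, but along a curve (y = x²). The sketch below uses a compact helper, here called pmcc, to compute r for it.

```python
from math import sqrt

def pmcc(xs, ys):
    """Compact PMCC helper for this illustration (no input checks)."""
    n = len(xs)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)

# Hypothetical data: y is completely determined by x, but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

r = pmcc(xs, ys)
print(r)  # 0.0, even though y can be predicted perfectly from x
```

A scatter plot of these five points would show the curve at once, whereas the coefficient alone suggests there is no relationship.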

Studying correlations is the reason why many questionnaires ask so many similar questions, which makes them tedious to answer. The researchers often know a lot about question x and question y individually, but they do not yet know how the two are related.

Warnings

Before you state that two variables are correlated, make sure the correlation coefficient is statistically significant. That is to say, the calculated correlation coefficient should be unlikely to be a result of pure chance. E.g. if you have only 3 points, they may lie on or very near the same line, giving a coefficient at or close to +1 or -1, yet with so few observations the result would still be inconclusive.
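One common way to check significance, assuming the data are roughly normally distributed, is to convert r into a t statistic, t=r×√((n-2)÷(1-r²)), and compare it with the critical value of Student's t distribution for n-2 degrees of freedom. The sketch below does this at the two-tailed 5% level. The function names are hypothetical, and only a few critical values from standard t tables are included.

```python
from math import sqrt

# Two-tailed 5% critical values of Student's t, keyed by degrees of freedom.
# Only a few entries are listed; consult a full t table for other values.
CRITICAL_T_5_PERCENT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 10: 2.228}

def t_statistic(r, n):
    """Return t for a correlation r from n pairs (requires -1 < r < 1)."""
    return r * sqrt((n - 2) / (1 - r * r))

def is_significant(r, n):
    """True if r differs significantly from 0 at the two-tailed 5% level."""
    df = n - 2
    if df not in CRITICAL_T_5_PERCENT:
        raise ValueError("no critical value stored for this sample size")
    return abs(t_statistic(r, n)) > CRITICAL_T_5_PERCENT[df]

print(is_significant(0.99, 3))  # False: three points are too few
print(is_significant(0.99, 5))  # True: the same r from five points
```

With only three points, even r = 0.99 gives a t of about 7.0, which falls short of the critical value 12.706.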

When the coefficient is not significant there is generally no point in reporting its value.

When the correlation is significant, you have still not proven that one variable "causes" the other. You have only shown that knowing the value of x may help to some degree in predicting the value of y, or the other way around.