# S-Cool Revision Summary

Sometimes in statistics we need to compare 2 sets of data from the same source. We can do this by means of a scatter diagram: to plot a scatter diagram for 2 sets of data, x and y, we plot each pair of corresponding values as a point. Data like this is called **bivariate data**.

#### Regression lines

If our scatter diagram shows a correlation between the 2 sets of data, then we can add a line of regression. These are essentially 'lines of best fit', but if we have a fair degree of scatter we often draw/calculate 2 regression lines. These lines are:

**1. Regression line y on x.**

Used to estimate y, taking x to be accurate. This line is calculated by finding the least sum of the squares of the vertical distances from the points to the line. Let's look at the following diagram to explain this.

The vertical distance from each point to the line is squared, and these squares are added together. The line with the least total is the regression line y on x.

**2. Regression line x on y.**

Used to estimate x, taking y to be accurate. This line is calculated by finding the least sum of the squares of the horizontal distances from the points to the line. The following diagram will explain this further.

The horizontal distance from each point to the line is squared, and these squares are added together. The line with the least total is the regression line x on y.

#### Calculating equations of regression lines

*Regression line y on x*

This equation will have the formula: **y = a + bx** (not surprisingly, this is the equation of a straight line!)

Where **a** and **b** are calculated using the formulae:

b = S_{xy} / S_{xx} and a = ȳ - b x̄

with S_{xy} = Σxy - (Σx)(Σy)/n and S_{xx} = Σx² - (Σx)²/n, where n is the number of data pairs and x̄, ȳ are the means of the x and y values.
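A minimal Python sketch of this calculation, assuming the standard least-squares formulae b = S_{xy}/S_{xx} and a = ȳ - b x̄ (the function name here is hypothetical):

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # S_xy = Σxy - (Σx)(Σy)/n  and  S_xx = Σx² - (Σx)²/n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b = s_xy / s_xx                 # gradient
    a = sum_y / n - b * sum_x / n   # intercept: a = ȳ - b·x̄
    return a, b

# Data lying exactly on y = 1 + 2x is recovered exactly:
a, b = regression_y_on_x([1, 2, 3, 4], [3, 5, 7, 9])
# a → 1.0, b → 2.0
```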

*Regression line x on y*

This equation will have the formula: **x = a' + b'y**

Where **a'** and **b'** are calculated using these formulae:

b' = S_{xy} / S_{yy} and a' = x̄ - b' ȳ

with S_{yy} = Σy² - (Σy)²/n.

Note the similarities between the formulae for S_{xy}, S_{xx} and S_{yy}. The formulae may look tricky but are in fact quite straightforward to use. The following example will demonstrate this.
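As a rough demonstration of how the S-quantities feed both regression lines, here is a Python sketch on a small made-up dataset (not the original worked example):

```python
# Illustrative dataset (made up for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
n = len(xs)

sum_x, sum_y = sum(xs), sum(ys)
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
s_yy = sum(y * y for y in ys) - sum_y ** 2 / n

# Regression line y on x:  y = a + b*x
b = s_xy / s_xx                       # 9 / 10 = 0.9
a = sum_y / n - b * sum_x / n         # 4 - 0.9*3 = 1.3

# Regression line x on y:  x = a' + b'*y
b_dash = s_xy / s_yy                  # 9 / 10 = 0.9
a_dash = sum_x / n - b_dash * sum_y / n   # 3 - 0.9*4 = -0.6
```

Note that only the denominator changes between the two gradients: S_{xx} for y on x, S_{yy} for x on y.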

#### Independent/dependent variables

If our data **x** appears to be controlled (for example, set by the experimenter) while **y** appears to depend on the experiment and on **x**, then we say that **x** is an independent variable and **y** a dependent variable. As **x** is controlled and accurate, we only need to calculate the **regression line y on x**.

#### The product moment correlation coefficient

The product moment correlation coefficient, **r**, is a measure of the degree of scatter.

The value of **r** will lie between -1 and 1. If the correlation is positive and the points lie exactly on a straight line, then both regression lines coincide and **r = 1**. **r** is calculated using the formula:

r = S_{xy} / √(S_{xx} S_{yy})
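This calculation can be sketched in Python using the standard formula r = S_{xy} / √(S_{xx} S_{yy}) (the function name is hypothetical):

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient: r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Points exactly on a rising line give r = 1; on a falling line, r = -1.
# pmcc([1, 2, 3], [2, 4, 6]) → 1.0
# pmcc([1, 2, 3], [6, 4, 2]) → -1.0
```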

#### Spearman's Coefficient of rank correlation

Spearman's Coefficient of rank correlation, **r_{s}**, is another value that measures the spread of our scatter. Like the product moment correlation coefficient, r, the value of r_{s} lies between -1 and 1, and the sort of correlation obtained for various values of r_{s} is the same as for r.

Spearman's Coefficient of rank correlation, r_{s}, is an approximation to the product moment correlation coefficient and is calculated by ranking the data in order of size.

**The formula used to calculate Spearman's Coefficient of rank correlation, r_{s}, is:**

r_{s} = 1 - (6 Σd²) / (n(n² - 1))

**n** = number of items to be ranked

**d** = rank difference

The rank difference (d) is the positive difference between the 2 rank values given for that object.
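The ranking process and the formula above can be sketched in Python; this version assumes no tied ranks, which is what the simple formula requires (function names are hypothetical):

```python
def spearman_rank(xs, ys):
    """Spearman's r_s = 1 - 6*Σd² / (n(n² - 1)), assuming no tied values."""
    def ranks(vals):
        # Rank 1 for the smallest value, rank n for the largest.
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0] * len(vals)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Identical rank orders give r_s = 1; exactly reversed orders give r_s = -1.
# spearman_rank([1, 2, 3, 4], [10, 20, 30, 40]) → 1.0
# spearman_rank([1, 2, 3], [3, 2, 1]) → -1.0
```

Because only the ranks matter, any strictly increasing relationship between x and y (even a curved one) gives r_{s} = 1.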