Sixth Form Statistics

Statistics

There are eight videos on Sixth Form Statistics.

Mean and Variance

Variance is a measure of spread, which makes it excellent for examining the distribution of masses of jam and Marmite.

Standard deviation is the square root of the variance. Also, variance is the square of the standard deviation.

Whoever considered making deviation standard? It's political correctness gone mad. You couldn't make it up.

When you add 3 to every data point, the mean also goes up by 3 but the standard deviation and the variance remain unchanged. However, when you multiply every data point by 3 the mean and standard deviation are 3 times larger but the variance is 9 times larger.
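
The effect of adding and multiplying can be checked numerically. The sketch below is only an illustration: the jar masses are invented, and the data are treated as a whole population (hence `pstdev` and `pvariance`).

```python
from statistics import mean, pstdev, pvariance

# Hypothetical jar masses in grams, invented purely for illustration.
masses = [448, 452, 455, 449, 451]

shifted = [x + 3 for x in masses]  # add 3 to every data point
scaled = [3 * x for x in masses]   # multiply every data point by 3


def summary(data):
    """Return (mean, standard deviation, variance) for a whole population."""
    return mean(data), pstdev(data), pvariance(data)


base_mean, base_sd, base_var = summary(masses)
shift_mean, shift_sd, shift_var = summary(shifted)
scale_mean, scale_sd, scale_var = summary(scaled)

print("original:", base_mean, base_sd, base_var)
print("add 3:   ", shift_mean, shift_sd, shift_var)
print("times 3: ", scale_mean, scale_sd, scale_var)
```

The second row should differ from the first only in the mean, while the third row has the mean and standard deviation tripled and the variance multiplied by 9.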

Correlation and Regression

In the book Oliver Twist, Oliver asks for an extra helping. If only he had learnt that extrapolation was bad, he might not have made such a foolish request.

Context is very important when interpreting a and b in exam questions. In the regression line y = a + bx, a is the value of y when x = 0 and b is the rate of change, i.e. the change in y for each unit increase in x.
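
As a rough sketch of where a and b come from, the code below fits the line y = a + bx using the standard least-squares results b = Sxy / Sxx and a = (mean of y) - b × (mean of x). The data (hours of revision against test score) and the helper `predict` are invented for illustration.

```python
# Hypothetical data: x = hours of revision, y = test score.
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Summary statistics: note S_xy is NOT just the sum of the products.
s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

b = s_xy / s_xx        # gradient: extra marks per extra hour of revision
a = y_bar - b * x_bar  # intercept: predicted score after 0 hours of revision


def predict(x):
    """Predicted y for a given x; only trustworthy inside the data range."""
    return a + b * x


print(a, b)
print(predict(3.5))  # interpolation: 3.5 lies between 1 and 5
```

In this made-up context, a would be read as "the predicted score with no revision" and b as "the predicted gain in marks per hour of revision". Calling `predict(40)` would run without complaint, which is exactly why extrapolation is dangerous.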

The closer the correlation coefficient gets to 1 or -1, the stronger the correlation. The closer it gets to 0 from either direction, the weaker the correlation.

Iron bar is the mean average of all the irons.

Interpolation is good.

Sxy is not the same as Sigma xy.

PMCC is the same as r is the same as the correlation coefficient is the same as the product-moment-correlation-coefficient.

The value of r is unchanged by scaling or repositioning the axes.
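
A small numerical check of these points, under stated assumptions: the data are invented, the helper names `s` and `pmcc` are hypothetical, and the coding uses positive scale factors (a negative scale factor on one variable would flip the sign of r).

```python
from math import sqrt


def s(us, vs):
    """S_uv = sum(uv) - sum(u) * sum(v) / n, so s(xs, xs) gives S_xx."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n


def pmcc(xs, ys):
    """Product-moment correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    return s(xs, ys) / sqrt(s(xs, xs) * s(ys, ys))


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 61, 64, 68]

sigma_xy = sum(x * y for x, y in zip(xs, ys))  # plain sum of products
s_xy = s(xs, ys)                               # a different number entirely

# Code the data: scale and reposition both axes.
coded_xs = [10 * x + 7 for x in xs]
coded_ys = [(y - 50) / 2 for y in ys]

r = pmcc(xs, ys)
r_coded = pmcc(coded_xs, coded_ys)

print(sigma_xy, s_xy)
print(r, r_coded)
```

The two values on the last line should agree to within rounding error, while the two values on the line before should not be anywhere near each other.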

Discrete Random Variables and Discrete Uniform Distribution

Capital F is the cumulative distribution function (cdf), F(x) = P(X ≤ x). Simply add up all the probabilities up to and including the value indicated.
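
The adding-up can be sketched as a running total. The probability table below is invented for illustration, and exact fractions are used to avoid rounding surprises.

```python
from fractions import Fraction
from itertools import accumulate

# Hypothetical probability table: P(X = x) for x = 1, 2, 3, 4.
xs = [1, 2, 3, 4]
ps = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

# Running total of the probabilities gives F(x) = P(X <= x).
cumulative = list(accumulate(ps))
F = dict(zip(xs, cumulative))

for x in xs:
    print(f"F({x}) = {F[x]}")
```

The last value of F should always be 1; if it is not, something has gone wrong with the table.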

Should the unfortunate circumstance arise in which one of the probabilities in a discrete random variable table has been omitted, either accidentally or purposefully, all is not lost. Add up the ones you know and subtract from 1. And if two probabilities are missing, you might still be able to rectify the situation via simultaneous equations should you be told E(X).
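
Both rescue missions can be sketched in a few lines. The function names and the table are hypothetical. The two-unknown case solves the pair of equations "the probabilities sum to 1" and "the sum of x × p equals E(X)".

```python
from fractions import Fraction


def one_missing(known_probs):
    """The single missing probability is 1 minus the ones you know."""
    return 1 - sum(known_probs)


def two_missing(known, x1, x2, expectation):
    """Solve for P(X = x1) and P(X = x2) given the other entries and E(X).

    known maps each x with a known probability to P(X = x).
    Equations: p1 + p2 = remaining and x1*p1 + x2*p2 = moment.
    """
    remaining = 1 - sum(known.values())
    moment = expectation - sum(x * p for x, p in known.items())
    p2 = (moment - x1 * remaining) / (x2 - x1)
    p1 = remaining - p2
    return p1, p2


# Hypothetical table: x = 1, 2, 3, 4 with P = 1/10, p, 3/10, q and E(X) = 3.
known = {1: Fraction(1, 10), 3: Fraction(3, 10)}
p, q = two_missing(known, x1=2, x2=4, expectation=Fraction(3))

# One missing entry: the other three probabilities are 1/10, 2/10 and 3/10.
single = one_missing([Fraction(1, 10), Fraction(2, 10), Fraction(3, 10)])

print(p, q, single)
```

It is worth checking that every recovered probability lies between 0 and 1 before celebrating.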

E(X) = Sigma px. Since sigma is Greek for S, this formula almost reads as ex spex, which in turn sounds a little like x-ray specs. Similarly, Var(X) = Sigma x squared p minus the mean squared, which unfortunately does not sound anything like the type of product you might be able to purchase from a practical joke shop.
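
The two formulae can be sketched as follows, using an invented probability table and exact fractions.

```python
from fractions import Fraction

# Hypothetical probability table: P(X = x) for x = 1, 2, 3, 4.
xs = [1, 2, 3, 4]
ps = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]

# E(X) = sum of p * x
expectation = sum(p * x for x, p in zip(xs, ps))

# Var(X) = sum of x^2 * p, minus the mean squared
e_x_squared = sum(p * x * x for x, p in zip(xs, ps))
variance = e_x_squared - expectation ** 2

print(expectation, variance)
```

A common slip is to forget to subtract the mean squared, in which case the answer is E(X²) rather than Var(X).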

In the unexpected scenario in which every value of x running from 1 through to n has the same chance of occurring (aka the discrete uniform distribution), there are special formulae for the mean and variance.
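
The standard results for a discrete uniform distribution on 1, 2, ..., n are mean (n + 1)/2 and variance (n² - 1)/12. The sketch below compares them with a first-principles calculation; the function names are hypothetical and n = 6 (a fair die) is just an example.

```python
from fractions import Fraction


def uniform_mean(n):
    """Special formula for the mean: (n + 1) / 2."""
    return Fraction(n + 1, 2)


def uniform_variance(n):
    """Special formula for the variance: (n^2 - 1) / 12."""
    return Fraction(n * n - 1, 12)


def from_first_principles(n):
    """Mean and variance computed directly from the probability table."""
    p = Fraction(1, n)
    values = range(1, n + 1)
    mean = sum(p * x for x in values)
    variance = sum(p * x * x for x in values) - mean ** 2
    return mean, variance


n = 6  # e.g. a fair six-sided die
mean, variance = from_first_principles(n)
print(mean, uniform_mean(n))
print(variance, uniform_variance(n))
```

Each printed line should show the same value twice.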

Stem and Leaf / Skewness / percentiles

Deciles derive their name from decimals (i.e. base 10) whereas percentiles derive their name from the Latin phrase 'per centum', meaning 'out of a hundred'.

The nature of skew (positive or negative) is determined by the tail – if the tail drags to the right, this is positive skew (like where the positive x-axis is) and vice versa.

If you put the words mean, median and mode in alphabetical order (like I just did) and then insert inequality symbols appropriate to your data, the inequality symbols will point in the direction of skew.

A back to back stem and leaf diagram is a good way to compare two sets of similar information for different groups.

I live in the leafy suburbs and whenever I wish to represent the village news like missing dustbin lids that stems from the indiscipline of youth these days, I naturally turn to the stem and leaf diagram.

A stem and leaf diagram must always be accompanied by a key, lest the data be misinterpreted size-wise.

Outliers can often be spotted by eye, but you will usually be given a rule that sets a boundary on either side of the quartiles: a stated multiple of the IQR below the lower quartile and above the upper quartile.
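
A minimal sketch of such a rule is given below. The multiplier of 1.5 is an assumption (use whatever the question states), the data are invented, and the quartiles are supplied by hand because different courses use slightly different conventions for finding them.

```python
def outlier_bounds(q1, q3, k=1.5):
    """Return (lower boundary, upper boundary) using k * IQR either side.

    k = 1.5 is an assumed default; the exam question will give the multiple.
    """
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """List the values lying outside the boundaries."""
    low, high = outlier_bounds(q1, q3, k)
    return [x for x in data if x < low or x > high]


# Hypothetical data with quartiles taken as given.
data = [2, 15, 17, 18, 20, 21, 23, 24, 26, 45]
q1, q3 = 17, 24

low, high = outlier_bounds(q1, q3)
outliers = find_outliers(data, q1, q3)
print(low, high, outliers)
```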

A box plot consists of five verticals like squeezin’ trees in and four horizontals to make it aesthetically pleasing.

Probability

A tree diagram is a good way to represent events that happen (or don’t happen) consecutively.

Independent events have no effect on each other but can both occur whereas mutually exclusive events cannot happen simultaneously.
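
The difference can be tested numerically: independent events satisfy P(A and B) = P(A) × P(B), while mutually exclusive events satisfy P(A and B) = 0. The sketch below uses a fair six-sided die, with events chosen purely for illustration and hypothetical helper names.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def independent(a, b):
    """True when P(A and B) = P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)


def mutually_exclusive(a, b):
    """True when A and B cannot both happen."""
    return prob(a & b) == 0


even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
odd = {1, 3, 5}

print(independent(even, at_most_four), mutually_exclusive(even, at_most_four))
print(independent(even, odd), mutually_exclusive(even, odd))
```

Here "even" and "at most four" are independent but not mutually exclusive, whereas "even" and "odd" are mutually exclusive and therefore very much not independent.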

John Venn invented the Venn Diagram (according to Wikipedia).

The legitimacy of the symbols for union and intersection (respectively) was recognised in a UN resolution, these being the initial letters of said organisation.

The initials of Venn Diagram are VD which unfortunately implies that Venn Diagrams carry a health risk which is not actually present.

Be careful to read probability questions with due care and attention, ensuring absolute certainty over the appropriate denominator.

Imagine a generous benefactor has donated to you, in his/her will, a vertical line. Then you would have been 'given that' vertical line.
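
The vertical line changes the denominator: P(A | B) = P(A and B) / P(B), so only the outcomes inside B are counted. A minimal sketch with a fair six-sided die follows; the events and the helper names are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one roll of a fair die


def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def prob_given(a, b):
    """P(A | B) = P(A and B) / P(B); B must have non-zero probability."""
    return prob(a & b) / prob(b)


even = {2, 4, 6}
more_than_three = {4, 5, 6}

print(prob(even))                         # denominator: all six outcomes
print(prob_given(even, more_than_three))  # denominator: only 4, 5 and 6
```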

U for Union, N for Ntersection.

If you have three overlapping circles on a Venn diagram, try to fill in the middle number first.

Histograms

The frequency represented by a bar in a histogram is proportional to its area, not its height.

Impress your classmates by using the appropriate class boundaries when plotting a histogram.

If there are gaps between the class limits you need to consider the class boundaries before calculating the class widths.
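
A sketch of the calculation, under stated assumptions: the grouped data are invented, the class limits are whole numbers with a gap of 1 between classes, and the bar height is taken to be the frequency density (frequency ÷ class width).

```python
# Hypothetical grouped data: (lower class limit, upper class limit, frequency).
classes = [(10, 19, 8), (20, 29, 15), (30, 49, 12)]

GAP = 1  # assumed gap between one class's upper limit and the next lower limit


def class_boundaries(lower_limit, upper_limit, gap=GAP):
    """Close the gaps by extending each limit by half the gap."""
    return lower_limit - gap / 2, upper_limit + gap / 2


rows = []
for lower_limit, upper_limit, frequency in classes:
    lower, upper = class_boundaries(lower_limit, upper_limit)
    width = upper - lower          # class width from boundaries, not limits
    density = frequency / width    # bar height
    rows.append((lower, upper, width, density))

for lower, upper, width, density in rows:
    print(f"{lower} to {upper}: width {width}, frequency density {density}")
```

Using the class limits instead would give widths of 9, 9 and 19, which is the mistake the class boundaries are there to prevent.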

Normal Distribution

The mean, median and mode of a normal distribution are all equal since the normal distribution is symmetric.

The letters X and Z were chosen for use in the normal distribution in order that typewriter keys would not rust.

The normal distribution was so named before political correctness was invented, because these days nobody would be so insensitive as to suggest that very skinny or very fat people are abnormal.

The normal distribution is often called a bell-shaped curve due to its similarity in shape to a bell.
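
The roles of X and Z can be sketched with Python's standard library. The mean of 100 and standard deviation of 15 are invented for illustration. Standardising uses Z = (X - mean) / standard deviation, after which probabilities can be read from the standard normal distribution.

```python
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration.
MU, SIGMA = 100, 15

X = NormalDist(mu=MU, sigma=SIGMA)  # the distribution of X
Z = NormalDist()                    # standard normal: mean 0, sd 1

x = 115
z = (x - MU) / SIGMA                # standardise: z = (x - mean) / sd

p_from_x = X.cdf(x)                 # P(X < 115) directly
p_from_z = Z.cdf(z)                 # P(Z < z) via the standard normal

print(z, p_from_x, p_from_z)
print(X.mean, X.median, X.mode)     # all equal, by symmetry
print(X.cdf(MU))                    # half the area lies below the mean
```

The two probabilities on the first printed line should match, which is the whole point of standardising.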

Modelling

The modelling mantra is ‘Observe, Collect, Compare, Refine’.

The main difference between statistical models and super models is that statistical models don’t strut down catwalks.

A model is the most popular way of representing real life situations in statistical form because the first four letters are 'mode'.

Kraftwerk had a number one hit in the 1980s with a song called The Model which was not in any way related to the process of statistical modelling.
