Tutorial 3

The third lecture of this module introduced the framework of Null Hypothesis Significance Testing and, on an example of arm span in elite climbers vs non-climbers, illustrated how it can be used to make an informed guess about what kind of world it is that we live in. In the lecture, we likened the competing hypotheses (the alternative hypothesis \(H_1\) and the null hypothesis \(H_0\)) to parallel universes or alternative realities. In the universe of \(H_0\), there is no effect, no relationship, no difference of interest, while in the universe \(H_1\), these effects exist.

Let’s stretch this fanciful theme further and imagine a scenario where we can do some real-life NHST…

 

So, you’re in a pub.

A year into the COVID-19 pandemic[1], this is as fantastic a scenario as we can fathom!

So, you’re in a pub with your friends, there’s no need for face masks, people are clinking their glasses, and talking to each other at a high volume and short distance.

What are you drinking?

Just water

You’re sipping on your beverage of choice when, suddenly, you find yourself transfixed at the scene playing out in the corner of the room. Two middle-aged guys, their polo shirts heaving to contain their respectable beer bellies, are playing a game of darts.

Honestly, that’s the kind of pub you go to after a year of confinement?!

In any case, here you are. In an old pub with a sticky carpet, watching dudes throw darts. You have no idea why you find the sight so mesmerising, you’ve never really had any interest in darts; frankly, you don’t even know what the goal of the game is! But there’s just something about it…

After a while, you think to yourself that these chaps must either not be very good or they’ve had a few more than just a few pints, because they seem to be missing the centre rather a lot. But then you think, maybe they’re not trying to hit the centre! Maybe the optimal strategy for the game requires that you aim for something other than the bullseye.

You are intrigued so you decide to investigate further. For a brief moment, you consider walking up to them and asking about the rules of the game but quickly dismiss the idea of actually talking to people. Oh no, you’re in this on your own and you ain’t need no men!

But you are not afraid. You put on your quantitative researcher hat and boldly go where seemingly only middle-aged men have gone before. (Other people have squeezed past on their way to the loo but that’s beside the point! Stay focused!)

Formulate a hypothesis

Conceptual

As a researcher, you know that the first thing to do is to come up with a hypothesis. After pondering your options for a moment, you settle on:

“The optimal strategy for this game – whatever the game is – is not to aim at the centre of the dart board.”

You know this is a good hypothesis: It’s specific, simple, and, in principle, testable. However, it doesn’t really comment on where a player should be aiming. And of course it doesn’t! Without prior knowledge of the game, there’s really no point guessing. It does occur to you to check out the rules of the game but where’s the fun in that?!

So yes, it’s a good hypothesis but it’s not obvious what kind of data/observation would support it best. You quickly realise that a way out of this problem is to find yourself a null hypothesis to test.

Task 1

You consider a few hypotheses but only one can be the right null for you.

The optimal strategy is to aim for the centre

Operational

Though you have your hypotheses figured out, you know that’s not the end of it just yet. In order to be able to gather data to test your hypothesis, you first need to operationalise your variables.

Task 2

You briefly remind yourself what operationalisation means

Definition in terms of measurement

Task 3

You take a few moments to think about how you could operationalise “aiming for the centre” in measurable terms.

Quantitative methodology is about deriving generalisable knowledge. When we apply it, we always have to engage in a level of abstraction and simplification. In our case, there are many different ways in which you could miss the centre: you could overshoot, undershoot, veer to the left or right, or any combination of the above. In order to be able to measure what we need, we need to break things down a little.

A dart board is a 2-dimensional plane: it has a height and a width.

We can describe any point on the board as a combination of horizontal and vertical coordinates. Let’s say that:

  • [0, 0] is centre,
  • [0, 1] is 1 cm above the centre,
  • [3, -2] is 3 cm to the right of and 2 cm below the centre, and so on…

This coordinate system allows us to express any throw of the dart as a set of only two numbers.

While this system is substantially simpler than what we started with, two numbers is still one number too many. There are ways of expressing distance in a 2D space as a single number but, for the sake of simplicity, let’s simply ignore the horizontal dimension and only measure the vertical distance from the centre. For the purpose of this piece of research, we’re simply not going to care if someone missed to the right or to the left; we’re only going to record how far above or below the bullseye they hit.

So, our operational definition of missing is going to be the distance in centimetres of the dart from the bullseye along the vertical dimension.
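If you wanted to scribble this down in R, a minimal sketch might look like the one below. It uses the three example points from the list above rather than real data, the column names are purely illustrative, and the straight-line (Euclidean) distance is shown only as one possible single-number alternative that we are choosing not to use.

```r
# Illustrative throws only: the three example points from the text,
# stored as horizontal (x) and vertical (y) offsets from the bullseye in cm
throws <- data.frame(
  x = c(0, 0, 3),
  y = c(0, 1, -2)
)

# One possible single-number summary: straight-line distance from the centre
throws$euclidean <- sqrt(throws$x^2 + throws$y^2)

# The simplification adopted here: ignore x and keep the signed vertical offset
# (positive = above the bullseye, negative = below)
throws$vertical <- throws$y

throws
```

Keeping the sign matters: it is what lets overshoots and undershoots cancel each other out when the distances are averaged.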

Task 4

Then, you restate your conceptual hypotheses as operational ones.

\(H_1:\) The mean vertical distance of a dart from the centre of the dart board is not zero.

\(H_0:\) The mean vertical distance of a dart from the centre of the dart board is zero.

Statistical

You are close and you can feel it. Only one step till you have hypotheses fit for some stats love!

Task 5

You thought of the mathematical relationship contained in your operationalisation and figured out what your statistical null hypothesis looks like.

(you add the correct arithmetic operator)

\(H_0: \mu_{\text{dist}} = 0\)

Your \(H_0\) is about aiming for the centre and so, on average, you expect the distance from the centre to be equal to zero: People are equally likely to undershoot as they are to overshoot.
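Written out side by side, and assuming the same \(\mu_{\text{dist}}\) notation for the population mean vertical distance, the pair of statistical hypotheses might look something like this (the \(H_1\) line is simply the operational \(H_1\) from Task 4 translated into symbols):

\[\begin{aligned}H_0&: \mu_{\text{dist}} = 0\\H_1&: \mu_{\text{dist}} \neq 0\end{aligned}\]

The \(\neq\) in \(H_1\) does not commit to a direction: a mean above the bullseye and a mean below it would both count against \(H_0\).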

At last, you are ready to start testing your hypothesis!

Before you do, though, you ponder the two parallel worlds of your two hypotheses, focusing on what a dartboard full of darts would look like if either of the hypotheses was true:

\(H_0:\) The optimal strategy is to aim for the centre.

\(H_1:\) The optimal strategy is NOT to aim for the centre.

Inspecting this mental image, you realise that you’ll need statistics to figure out whether or not the guys are aiming for the bullseye. After all, there is always some uncertainty and so even if they are trying to hit the middle, maybe in the short window that you’ll be observing them, they’ll go through an unlucky streak and the average distance from the middle in the sample of dart throws will not be representative of its population!
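To get a feel for this kind of bad luck, here is a rough simulation sketch of the world of \(H_0\). Everything in it is an assumption made for illustration: the normally distributed throws, the spread of 5 cm, and the sample size of 60 throws are made-up values, not anything measured in the pub.

```r
set.seed(1)  # only so the sketch is reproducible

# Assumed for illustration: players aim at the centre (mean = 0)
# and their vertical misses have a made-up spread of 5 cm
n_throws <- 60
assumed_sd <- 5

# Five imaginary evenings of darts, each summarised by its mean vertical distance
sample_means <- replicate(5, mean(rnorm(n_throws, mean = 0, sd = assumed_sd)))

round(sample_means, 2)
```

Even though every simulated player is aiming dead centre, the sample means will not be exactly zero, which is precisely why eyeballing a single sample is not enough.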

Setting an \(\alpha\)

You’ve now firmly resolved to use a statistical test of the null hypothesis and so you know you need to settle on a significance level (\(\alpha\)) by which you are going to judge your results. If you find that the probability of getting a result at least as extreme as yours, assuming that you live in the world of \(H_0\), is smaller than your chosen \(\alpha\) level, you’ll reject the null. If not, you will retain it.
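The decision rule itself is simple enough to write down as a tiny function. The sketch below is only an illustration: `decide()` is a hypothetical helper, not something from a package, and the p-values and \(\alpha\) passed to it are made-up numbers.

```r
# Hypothetical helper: compare a p-value against a pre-chosen alpha
decide <- function(p_value, alpha) {
  if (p_value < alpha) "reject H0" else "retain H0"
}

# Made-up numbers, just to show both outcomes
decide(p_value = 0.20, alpha = 0.01)
decide(p_value = 0.001, alpha = 0.01)
```

Note that `alpha` is an argument you have to supply up front; the function gives you no way of peeking at the p-value first and picking a convenient threshold afterwards.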

Despite what some of your friends with a more cavalier approach to pub science think, you know that it’s imperative that you choose an \(\alpha\) level before you start collecting data!

Task 6

Thinking of all this reminds you of how different levels of significance are pretty much arbitrary and conventional. For some reason the most frequently used one pops into your head. It is, of course…

(you’re not thinking of a percentage, more like a 0.something number…)

0.05


Collecting data

You’re now ready to start taking some sweet, sweet measurements. You find a cosy spot not far from the dart action with a vantage point that lets you see the entire dart board, unobstructed by the robust physiques of the athletes. Devising a way of accurately measuring the distance of each dart from the bullseye takes some ingenuity but, armed with a thick layer of plot armour, you are unhindered by this task.

After each player’s turn, you snap a sneaky pic with your phone and, after 10 rounds, you think you have enough data. Excited, you pack your things and, merely as an afterthought, call out to your friends on your way out: “Gotta run, have data to analyse. Byee!” Everyone accepts this without so much as the slightest hint of bafflement. That’s just what you be like…

When you get home, you download the pictures onto your computer and overlay them one on top of another. This is what you see:

“Good thing I got that optical scanning software the other day,” you think to yourself. And a good thing indeed! Here’s the data for the vertical distance, just as you intended:

darts <- tibble::tibble(
  round = rep(1:10, each = 6),
  player = rep(rep(c("Colin", "Stu"), each = 3), 10),
  dart = rep(1:3, 20),
  distance = c(
    3.5, 4.5, 7.2, 3, 5.5, 8, 1, 0.2, 7.5, 9, 9.2, 9.5, 0.5, 4.8, 5, -0.2,
    -2.2, 8.2, -2, 10, 6.2, 6.5, 3.8, -3.8, 4.5, 5.8, 4, 6.8, -4.2, 3, -1,
    2.5, 7.2, 5.8, 7.2, -0.8, 0, -1.8, -0.8, 0.5, 9.2, 4.5, 9, 6.2, -1.2,
    8.5, 5.2, 0.8, 5.2, 5.2, 3.2, 9.8, 0.5, 6.5, -1.2, 4, 3.2, 3, 6.5, 3.2)
)

You have to admit that, at face value, it seems like you might be onto something here but you know better than to rely on eye-balling. After all, you are the world’s foremost dartologist and accuracy is the name of the game! Well, the name of the game is darts but accuracy is an important component of it, anyway… Dart Vader is your name and precision is your game! Your favourite musketeer is Dartagnan. Since you were small, you’ve always wanted to be a darta scientist.

 

 

Once you’ve stopped simultaneously cackling with self-satisfaction and shuddering with self-loathing at those awful jokes, you are ready to analyse your… observations.

First, you need your test statistic.

Task 7

Out of several more-or-less-good options, you choose…

Mean vertical distance

Of course, what else!?

It must be mean distance because your \(H_0\) talks about mean population distance from the bullseye. You’re moving swiftly now.

Task 8

With surprising ease and grace, you type into the R console the command to calculate the mean of the distance variable.

The result is…

3.915

mean(darts$distance)
[1] 3.915

Hmm… it does look like the punters were aiming above the middle of the dart board after all…

Sampling distribution of the test statistic

However, you know very well that in order to assess the probability of your result or an even more extreme result, you need to know what the sampling distribution that your statistic comes from looks like.

Task 9

You understand the relationship between the hypothesis you are testing and the expected value of the statistic, and so you know that the mean of the sampling distribution of the test statistic is…

0

The mean of the sampling distribution of the sample mean is the same as the true population mean. Under the null hypothesis, we assume that the players are aiming for the middle, and so the true value of the mean in this world must be…

One parameter down; one more to go.

Unfortunately, you know that the standard deviation of the sampling distribution is tricky. You simply don’t know what it is. But you do know its relationship with the population standard deviation of human dart-throwing accuracy.

After a quick internet search, you come across a paper reporting that the variance in people’s accuracy in a game of darts is 210.9375 cm². The study was conducted on a huge sample and so you take this as a trustworthy estimate of the population variance, \(\sigma^2\).

For a moment, you wonder if anyone has ever researched the rules of 501 darts but you quickly snap out of your daydream. There’s no time for escapist fancy!

Task 10

Based on the population variance in dart-throwing accuracy, you can safely assume that the sampling distribution of the test statistic has a standard deviation of…

1.875

Variance in accuracy will be the same no matter where people are aiming because it’s independent of the mean. Also, realise what the standard deviation of the sampling distribution is and how it is calculated.

The standard deviation of the sampling distribution is the standard error and it’s calculated as:

\[\begin{aligned}SE&=\frac{\sigma}{\sqrt{N}}\\&=\sqrt{\frac{\sigma^2}{N}}\\&=\sqrt{\frac{210.9375}{60}}\\&=\sqrt{3.515625}\\&=1.875\end{aligned}\]
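If the formula feels a bit like magic, a quick simulation can serve as a sanity check. The sketch below assumes that individual throws are normally distributed around zero with the population variance quoted above; the normality and the number of simulated samples (10,000) are choices made for illustration only.

```r
set.seed(42)  # only so the sketch is reproducible

sigma <- sqrt(210.9375)  # population SD implied by the quoted variance
n <- 60                  # throws per sample, as in the pub data

# Draw many samples of 60 throws under H0 and keep the mean of each
sim_means <- replicate(10000, mean(rnorm(n, mean = 0, sd = sigma)))

mean(sim_means)  # should land close to 0, the mean under H0
sd(sim_means)    # should land close to sigma / sqrt(n) = 1.875
```

The spread of the simulated means is much narrower than the spread of individual throws, which is the whole point of dividing by \(\sqrt{N}\).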

Get the p-value

Your heart is racing now. You are so tantalisingly close to the Truth™! Only one thing left to do: calculate the probability of obtaining a test statistic of 3.915 or an even more extreme result, if you actually live in the world of \(H_0\)!

Task 11

You frantically type in the code to get the p-value into the R console and, with bated breath, press Enter.

The result – rounded to 3 decimal places – is showing bright and clear…

0.037

You know that the sampling distribution of your test statistic is normal, you know its assumed mean (given \(H_0\)), and you know its standard deviation. You also know the value of your test statistic so calculating the probability of getting this value, or even a more extreme one, is fairly easy.

n <- 60
variance <- 210.9375
se <- sqrt(variance / n)
test_stat <- mean(darts$distance)
# times two because we had a 2-tailed hypothesis
# it could have turned out that people aim BELOW the centre
# abs() keeps the calculation correct even if the mean had been negative
p_value <- pnorm(abs(test_stat), 0, se, lower.tail = FALSE) * 2
round(p_value, 3)
[1] 0.037
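The same probability can also be reached by first converting the test statistic into a z-score, which some people find easier to reason about. This is just an alternative sketch of the same calculation, with the mean (3.915) and standard error (1.875) typed in by hand from the results above so that the snippet runs on its own.

```r
test_stat <- 3.915  # sample mean from Task 8
se <- 1.875         # standard error from Task 10
h0_mean <- 0        # mean of the sampling distribution under H0

# How many standard errors is the sample mean away from the H0 mean?
z <- (test_stat - h0_mean) / se

# Two-tailed p-value: area beyond |z| in both tails of the standard normal
p_from_z <- 2 * pnorm(-abs(z))

round(z, 3)
round(p_from_z, 3)
```

The sample mean sits a little over two standard errors above zero, and the rounded p-value agrees with the 0.037 obtained above.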

Decision

A blissful smile spreads across your face as you relax back into your chair, exhausted but happy. Now you know… You’ve been through it all, formulated your hypotheses, gathered your data, calculated your stats, found your results.

Task 12

So what will it be? Do you reject or do you retain the null hypothesis?

Reject

 

Your hunch was correct, they were not aiming for the bullseye!

Well, it may have been correct. You are well aware that no single piece of research can ever answer a research question definitively. Of course you know that! But now is not the time for raining on your own parade.

Now is the time to watch some darts.

 

Well deserved!

 


  1. I wrote this tutorial in early 2021 after almost a year of lockdown. I decided to keep it unedited as a reminder to myself of the weird places my brain can go after a period of isolation. Sorry everyone, it was a tough time…