Practical 9 worksheet

Instructions

This is a marked worksheet that contains 7 questions. The questions cover topics from last week's lectures and skills lab, and the tutorial you just completed. Before you begin the worksheet, you should first read these instructions and complete the analyses described in "Analysis", below.

You will have 7 randomly selected questions to answer; do not be surprised if you have different questions from others working on the same worksheet!

To access the worksheet, you must attend your practical session. In the session, a passcode will be announced to unlock the worksheet; you must begin the worksheet within 5 minutes of the passcode being released. You will have 30 minutes to complete the worksheet, unless you have reasonable adjustments for extra time (38 minutes for 25% extra time, and 45 minutes for 50% extra time).


Academic Honesty

You are welcome to use module resources - e.g. lecture slides, tutorials, skills lab scripts - to answer these questions. You are also welcome to use RStudio to solve problems, do calculations, or refer to output. However, you should not work with other students while you are completing the worksheet, and tutors will only be able to answer questions about technical problems (e.g. a computer crash).

Setting Up

Task 1

Open your project for this week in RStudio. Then, open a new R Markdown file with HTML output and save it in the r_docs folder. (Give it a sensible name, like worksheet_09 or similar!)

For each of the tasks in the Analysis section, write your code to complete the task in a new code chunk.

Remember, you can add new code chunks by:

  1. Using the RStudio toolbar: Click Code > Insert Chunk
  2. Using a keyboard shortcut: the default is Ctrl + Alt + I (Windows) or ⌘ Command + Option + I (MacOS), but you can change this under Tools > Modify Keyboard Shortcuts…
  3. Typing it out: type ```{r}, press ↵ Enter, then type ``` to close the chunk.
  4. Copying and pasting a code chunk you already have (but be careful of duplicated chunk names!)

I Wanna Be the Very Best

The theme for today’s task is inspired by a weird Internet thing that happened in the spring of 2014, called Twitch Plays Pokémon. A programmer in Australia hooked up the classic Pokémon Red game to the chat room on video game streaming site Twitch. By typing commands in the chat, viewers could control what the character in the game did - but on the scale of thousands or tens of thousands of participants at once. The game turned into a massive social experiment and even spawned a minor cult before the cumulative 1.16 million viewers beat the game in about two and a half weeks.¹

To Catch Them Is My Real Test, To Train Them Is My Cause

If you aren’t really familiar with Pokémon, all you need to know for today is that it’s a series of video games (later TV show, graphic novels, etc. etc.) that take place in an alternate universe with magical animals called Pokémon. Pokémon battling (sort of like magical dogfighting?) is a core part of this universe, with children setting out at a young age to travel the world, capture many different types of Pokémon, and compete in massive championships, with the winner being crowned a Pokémon Master. It’s a bit ethically murky because at least some Pokémon seem to be sentient, and they can all talk, but only to say their own species name. You know, as I’m writing this description, it just keeps sounding weirder…

Figure 1: A Pokémon called Pikachu, a mouselike yellow fantasy creature. You may have heard of him.

For this week’s stats practical, we’re doing a Twitch-Plays-Pokémon-style walkthrough of a dataset of Pokémon characteristics to practice the linear model. You don’t need to know anything about Pokémon to do this practical besides what’s in the box above.

Task 2

Load the tidyverse package and read in the pokemon dataset straight from https://and.netlify.app/datasets/pokemon.csv (we took the data from this fun website).
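One way this could look, as a minimal sketch: it assumes you have an internet connection and that the tidyverse is installed. The object name pokemon is just our choice here; call yours whatever makes sense to you.

```r
library(tidyverse)

# Read the CSV straight from the URL given in the task.
# `pokemon` is an assumed object name, not a requirement.
pokemon <- read_csv("https://and.netlify.app/datasets/pokemon.csv")
```

read_csv() prints a message listing the column types it guessed, which is a handy first sanity check.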

Task 3

You should always have a look at what variables you have in your data. Do it now by asking R either to give you the names of the variables or to show you a rough summary of the data.

As you can see, there are quite a few variables, but most of them will not be relevant to us.
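If you're not sure which functions to reach for, here is a small sketch on a made-up dataset. The toy tibble and its values are invented purely for illustration (the real data has many more variables); the same three calls should work on your own data object.

```r
library(tidyverse)

# Toy stand-in for the real dataset: names and values are invented
toy <- tibble(
  name   = c("A", "B", "C", "D", "E", "F"),
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)

names(toy)    # just the variable names
summary(toy)  # a rough summary of every column
glimpse(toy)  # column types plus the first few values
```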

Making Predictions

First we need to choose a research question to investigate.

As future Pokémon Masters, we want to work out which Pokémon is the strongest. So, we’ll use attack, which quantifies a Pokémon’s offensive capabilities, as our outcome variable.

For the predictor, let’s choose the hp variable. HP, or hit points, is a gaming term for the health of your character or creature: the more HP a thing has, the more damage it can withstand.

Task 4

In your RMarkdown file, write down your prediction about the relationship between the predictor – hp – and the outcome, attack.

Visualising the Relationship

Next up, we should have a look at our data.

Task 5

Create a scatterplot of hp as the predictor and attack as the outcome.

Task 5.1

Label the axes better and clean up the plot with a theme.
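A rough sketch of a labelled scatterplot, again on invented toy data rather than the real dataset. The axis labels and the choice of theme_minimal() are only suggestions; any theme you like will do.

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)

toy_plot <- ggplot(toy, aes(x = hp, y = attack)) +
  geom_point(alpha = 0.6) +                      # one point per Pokémon
  labs(x = "Hit points (HP)", y = "Attack") +    # friendlier axis labels
  theme_minimal()                                # one of several built-in themes

toy_plot
```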

Task 5.2

Stop and have a look at your plot. How would you draw the line of best fit? Is this the direction of relationship you expected?

Task 5.3

Add a line of best fit to your plot. Is this what you expected?

Task 5.4

Optionally, add another line to your plot that represents the null model.

Hint

Have a look at geom_hline, or the code for the plots in the lecture!
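Putting Tasks 5.3 and 5.4 together might look something like the sketch below. It uses invented toy data, and it assumes that the null model is drawn as a flat line at the mean of the outcome (which is what an intercept-only model predicts for everyone).

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)

toy_lines <- ggplot(toy, aes(x = hp, y = attack)) +
  geom_point(alpha = 0.6) +
  # Line of best fit from a linear model, without the confidence ribbon
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  # Null model: a horizontal line at the mean of the outcome
  geom_hline(yintercept = mean(toy$attack), linetype = "dashed") +
  labs(x = "Hit points (HP)", y = "Attack") +
  theme_minimal()

toy_lines
```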

Creating the Model

Now that we have some idea of what to expect, we can create our linear model!

Task 6

Use the lm() function to run your analysis and save the model in a new object called poke_lm.
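As a reminder of the general shape of an lm() call, here is a sketch on invented toy data. The object is called toy_lm here so it doesn't get mixed up with your real poke_lm.

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)

# Formula is outcome ~ predictor; data tells lm() where to find them
toy_lm <- lm(attack ~ hp, data = toy)

toy_lm  # prints the call and the two coefficients
```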

Task 7

Call this poke_lm object in the Console to view it, then write out the linear model from the output.

Task 8

How can you interpret the value of $b_1$ for this model? Write down your thoughts in your RMarkdown.
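If it helps to see the algebra, here is the general form of the model, assuming the usual notation where $b_0$ is the intercept (that symbol is our assumption; only $b_1$ is named above). The second line compares the predictions for two Pokémon whose HP differs by exactly one unit.

```latex
\begin{align*}
\widehat{\text{attack}}_i &= b_0 + b_1 \, \text{hp}_i \\
\bigl(b_0 + b_1 (\text{hp} + 1)\bigr) - \bigl(b_0 + b_1 \, \text{hp}\bigr) &= b_1
\end{align*}
```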

Task 9

Using your equation, what attack value would you predict a Pokémon with 86 HP to have?
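You can do this by hand with your equation, or let R do it and check that the two agree. The sketch below uses invented toy data, so the number it produces is not the answer for the real dataset.

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)
toy_lm <- lm(attack ~ hp, data = toy)

# By hand: intercept + slope * 86
b <- coef(toy_lm)
by_hand <- b[["(Intercept)"]] + b[["hp"]] * 86

# With predict(): newdata needs a column named like the predictor
by_predict <- predict(toy_lm, newdata = tibble(hp = 86))

by_hand
all.equal(by_hand, unname(by_predict))  # the two routes should agree
```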

Evaluating the Model

Now we have the model parameters, but we don’t want to just describe the line - we want to be able to say something about the population, not just our sample. For this, we need some more info!

Significance Testing

Task 10

Use broom::tidy() to get p-values for your $b$s. Is your predictor significant?

Hint

Remember that in so-called scientific notation, the number xe-n, where x and n are numbers, means $x \times 10^{-n}$. For example, 2.3e-4 means $2.3 \times 10^{-4}$, which is the same as 2.3/10000 or 0.00023.
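If you'd rather have R translate scientific notation for you, something like this sketch should work. Both format() and options(scipen = ...) are base R; the value 999 is an arbitrary large number, not a magic setting.

```r
# R reads scientific notation as an ordinary number
sci <- 2.3e-4

# Same value as dividing by 10000 (all.equal allows for rounding error)
all.equal(sci, 2.3 / 10000)

# Print it without the e-notation
plain <- format(sci, scientific = FALSE)
plain

# Or ask R to prefer fixed notation from here on, remembering the old setting
old_opts <- options(scipen = 999)
sci
options(old_opts)  # restore the previous setting
```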

Task 11

Add the conf.int = TRUE argument to broom::tidy() to get confidence intervals. How can you interpret these? Do they “agree” with the p-value?
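Here is a sketch of what the tidy() output looks like, on invented toy data. The two extra columns added at the end (ci_excludes_zero and p_below_05) are hypothetical helper names, included only to show one way of checking whether the 95% interval and the p-value tell the same story.

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)
toy_lm <- lm(attack ~ hp, data = toy)

# One row per b, with estimate, SE, t, p, and (by default 95%) CI bounds
toy_tidy <- broom::tidy(toy_lm, conf.int = TRUE)
toy_tidy

# Does "CI excludes 0" line up with "p < .05" for each term?
toy_check <- toy_tidy %>%
  mutate(
    ci_excludes_zero = conf.low > 0 | conf.high < 0,
    p_below_05       = p.value < .05
  )
toy_check %>% select(term, ci_excludes_zero, p_below_05)
```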

Goodness of Fit

Task 12

Use broom::glance() to get $R^2$ and adjusted $R^2$ for this model. How much of the variance of the outcome is explained by the model for this sample? What would we expect this to be in the population?
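glance() returns a single row with lots of columns, so it can help to pick out just the two you need. A sketch on invented toy data (the values will not match the real dataset):

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)
toy_lm <- lm(attack ~ hp, data = toy)

# One-row summary of the whole model
toy_fit <- broom::glance(toy_lm)

# Keep only R-squared and adjusted R-squared
toy_fit %>% select(r.squared, adj.r.squared)
```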

Reading the Summary

Finally, we can get all this same information - except for CIs - from the summary() function. I (Jennifer) like summary() because you can get a good overview of a lot of information quickly, but it’s very inconvenient for reporting, so it’s good to know how to use the broom functions as well.

Task 13

Get a summary of your model. What do the asterisks (stars) mean?
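For comparison with the broom output, here is a sketch of summary() on invented toy data. Have a look at the line at the bottom of the coefficients table in the printed output, which starts with "Signif. codes".

```r
library(tidyverse)

# Toy stand-in for the real data: these six values are invented
toy <- tibble(
  hp     = c(35, 45, 55, 60, 80, 100),
  attack = c(55, 49, 70, 62, 82, 110)
)
toy_lm <- lm(attack ~ hp, data = toy)

toy_summary <- summary(toy_lm)
toy_summary  # coefficients, stars, R-squared, and F-test in one printout

# The pieces are also available individually
coef(toy_summary)       # matrix of estimates, SEs, t values, and p-values
toy_summary$r.squared   # same value that broom::glance() reports
```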

Reporting

Task 14

Report the results of your model. Specifically, you should clearly report the coefficient of interest ($b_1$) and the fit of the model ($R^2$), including all the important details (see Tutorial 8).

I Know It’s My Destiny

That’s as far as we’ll go today, well done!

The Linear Model is all of our destinies for the next year or so, so it’s important to get comfortable working with it.