Taking the P out of Psychology

Earlier this week the American Psychological Association issued a statement clarifying some basic facts of psychology that statisticians and other scientists keep getting wrong.

No, wait, I might have that backwards.  It was the American Statistical Association that issued the statement, comprising six principles about the use of p values. It was aimed at scientists (social or otherwise) who all bristled with irritation at the ASA's condescension until they got to number two on the list and surreptitiously Googled the difference between "the probability of the data given the hypothesis" and "the probability of the hypothesis given the data".

All those whose scientific livelihood depends on producing p values below .05 are aware of the craziness of the situation. A study supported by a statistic with a p value of .049 is lauded as a breakthrough worthy of publication. But a p value of .051 means you're a terrible scientist churning out junk. Worryingly, this is not a huge exaggeration. Flick through a contemporary psychology journal and try to find a results section that doesn't include the magic p<.05.

When genius eugenicist Sir Ronald Fisher published Statistical Methods for Research Workers in 1925 he would have been astonished to learn that thousands of people nearly a century later would still be slavishly following his rules of thumb.

Ronald Fisher with a faraway look in his eye. Or has he spotted someone with a lesser "innate capacity for intellectual and emotional development"?

And rules of thumb they were. He considered a critical p value of .05 to be sensible, given that it was about two standard deviations from the mean (in a 2-tailed test with a normal distribution). But he was also clear that while this was a "test of significance", what that actually meant was that if replications of the study achieved similar p values, you were probably on to something.

To implement Fisher's method you would calculate a statistic and then consult the tables at the back of his book to see if your value was greater than the one given for a p value of .05. If it was, you could celebrate by writing p<.05 in your lab report, along with the superfluous "and thus we may reject the null hypothesis".

The reason for using the term p<.05 was that you didn't know the exact value of p, because that would have been too difficult to calculate with only a slide rule and ten fingers, unless your name was Ronald Fisher. This was acceptable at the time of Downton Abbey. It was understandable even 50 years ago, when statistical computing machines were operated with punchcards by earnest, pipe-smoking men. For bizarre reasons it was still the practice 25 years ago, when I learned statistics the hard way because psychologists didn't yet trust new-fangled computer programs like SPSS (now approaching its 50th birthday). But in 2016 it's absurd.
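
To see quite how little labour the exact version involves today, here is a minimal sketch assuming SciPy; the t statistic and degrees of freedom are invented for illustration, not taken from any real study:

```python
# A minimal sketch (SciPy assumed) of the table-lookup era versus now.
from scipy import stats

# Fisher's .05 cut-off is "about two standard deviations" on a normal curve:
print(stats.norm.ppf(0.975))        # ~1.96, the two-tailed 5% critical value

t_value, df = 2.10, 30              # a made-up result from a made-up study

# 1925: compare your statistic with the critical value printed in the book...
critical = stats.t.ppf(0.975, df)   # ~2.04
print(t_value > critical)           # True, so you get to write "p < .05"

# 2016: just report the exact probability of data at least this extreme
# under the null hypothesis.
p_exact = 2 * stats.t.sf(abs(t_value), df)
print(round(p_exact, 3))            # ~0.044
```

The lookup and the exact calculation each take a single line; only one of them throws information away.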

Can we all agree to dispense with p<.05?

And once we've thrown out that bathwater, we can throw out the baby of pass/fail hypothesis testing. We still teach students that if p<.05 they can happily reject their null hypothesis. Instead we should be teaching them how to accumulate and evaluate different types of evidence that support a hypothesis. Statistical probabilities are one part of that evidence. But so are the magnitudes of effect sizes. And how we interpret these depends on what's being measured.

We pooh-pooh the idea of even having rules of thumb for the magnitude of correlation coefficients or effect sizes. I have a slide I use in statistics lectures which states that a correlation coefficient between 0.1 and 0.3 is "small", between 0.3 and 0.5 is "medium" and greater than 0.5 is "large". I then wait for the students to note this down before an overwrought PowerPoint animation crosses this out, replacing it with the po-faced statement "There is no objective interpretation of coefficient magnitude; it depends on the context". How they laugh.
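
For the curious, the doomed slide boils down to something like the hypothetical helper below. The labels are the slide's; the behaviour below 0.1 is my own guess at what it would say, and the PowerPoint caveat still applies.

```python
# A hypothetical encoding of the lecture slide's rule of thumb for |r|.
# As the next animation insists: there is no objective interpretation
# of coefficient magnitude; it depends on the context.
def label_correlation(r: float) -> str:
    r = abs(r)
    if r > 0.5:
        return "large"
    if r > 0.3:
        return "medium"
    if r >= 0.1:
        return "small"
    return "below the slide's radar"

print(label_correlation(0.21))  # "small"
```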

The special treatment of p values is evident in statistical software like SPSS, which will happily annotate tables with varying numbers of asterisks to indicate that the p value it has calculated to dozens of decimal places is below an arbitrary threshold.

If you calculate a vast matrix of correlation coefficients you can easily spot those that are "statistically significant", regardless of whether the correlations themselves are large. Prioritising the former over the latter leads to misleading research. For example, a study of the effects of violent video games on teenagers' behaviour showed a statistically significant correlation between video game playing and engaging in physical fights. This correlation was so significant that it had three asterisks next to it.

By contrast, the correlation itself (.21) was so small that less than 5% of the variance in physical fights was accounted for by video game playing. At this point, most people would be thinking: so what accounts for the other 95%? Instead, the authors of the paper were thinking: yay, we got three asterisks.
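
For anyone checking the sums, the proportion of variance explained is just the square of the correlation:

```latex
r = .21 \quad\Rightarrow\quad r^{2} = (.21)^{2} \approx .044 \quad \text{(about 4.4\% of the variance)}
```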

I propose that if we are to continue with the concept of statistical significance in relation to the probability of the data given the null hypothesis, we should start using the concept of statistical magnificence in relation to correlation coefficients (or indeed effect sizes in general).

A correlation greater than about 0.7 should be considered statistically magnificent, because it implies that one variable accounts for most of the variance (>50%) in the other.
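
The 0.7 figure isn't plucked entirely out of the air; it's roughly where the shared variance passes the halfway mark:

```latex
r^{2} > 0.5 \;\Longleftrightarrow\; |r| > \sqrt{0.5} \approx 0.707
```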

And we should do away with asterisks to indicate levels of significance. Instead, we should have different symbols to indicate significance, magnificence and anything else to which our attention should be drawn.

I am confident the next version of SPSS will produce output like this:

SPSS Correlation Matrix
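
Until that version ships, here is a hypothetical mock-up of the idea in Python (pandas and SciPy assumed; the dagger symbol and the 0.7 cut-off are my proposal, not anything SPSS actually offers), run on simulated data:

```python
# A sketch of a correlation matrix annotated for both significance (asterisks)
# and "magnificence" (a dagger for |r| > 0.7). The data are simulated.
import numpy as np
import pandas as pd
from scipy import stats

def annotate(r: float, p: float) -> str:
    stars = "***" if p < .001 else "**" if p < .01 else "*" if p < .05 else ""
    dagger = "†" if abs(r) > 0.7 else ""
    return f"{r:.2f}{stars}{dagger}"

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 3)),
                    columns=["games", "fights", "age"])
data["fights"] += 0.2 * data["games"]        # build in a modest correlation

table = pd.DataFrame(index=data.columns, columns=data.columns, dtype=object)
for a in data.columns:
    for b in data.columns:
        r, p = stats.pearsonr(data[a], data[b])
        table.loc[a, b] = annotate(r, p)

print(table)   # the diagonal is a trivially magnificent 1.00***†
```

A significant-but-small correlation like the .21 above earns its asterisks here and nothing more; only a correlation doing real explanatory work gets the dagger.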

Alternatively, we could read the ASA guidelines and learn how to use probability, effect size and statistical power sensibly.

First Post! Amnesia

The very worst topic for a first blog post is "my first blog post". So instead I'll pretend that this is a considered essay on cognition, with reference to observations concerning human autobiographical memory.

Setting up a blog should be easy. You just log into Wordpress or Blogspot and start writing. But, like Heston Blumenthal frying a chip, I'm quite capable of spending days on tasks that should take a few moments. I like to think of this as a perennial quest for perfection, but in reality it's about spending 10 times as long to do a job only 10% better. Or, more frequently, procrastinating until I forget what I was actually trying to do in the first place.

In this case I became obsessed with getting my own domain name. Really I should have gone with justinobrien123.freeblogsite.com. Instead I became annoyed that the usual top level justinobrien domains were already taken.

This should have concerned me for no longer than it took to type in my name and discover that there are now thousands of possible top level domains. I could be justinobrien.science or justinobrien.vision.

Instead I fixated on the domains I couldn't have. Specifically, justinobrien.com and justinobrien.co.uk were already taken. Now I do understand that there are other people with the same name as me. I know this because I have a Google Alert that emails me whenever a "Justin O'Brien" appears in the news (everybody does this, right?). It transpires that there are a lot of Justin O'Briens and they all play high school football in the American Midwest. Or at least, Google thinks all the newsworthy ones do.

I was already aware that justinobrien.com belongs to an Australian film maker, who wastefully redirects the domain to a different site. What I didn't know is who had my preferred .co.uk.

You used to be able to find out who owned a domain with a whois lookup. But that's hardly worth bothering with because nowadays everyone masks their identity for security reasons. Only an idiot would allow their private contact details to be available on a public website.

If you search for my name on LinkedIn it claims to find 165 people. Not all of these are high school football players. Indeed, those J. O'Briens are clearly too busy performing sporting deeds on the football pitches of the Midwest, deeds so heroic they merit mention in the local press, to upload profiles to LinkedIn. I myself don't even appear on the first page of LinkedIn Justin O'Briens.

So who was the culprit? Who was so narcissistic that he just had to have a domain in his own name, a man so intent on the utterly pointless status of being the UK's #1 Justin O'Brien that he sat on the domain year after year, making sure nobody could steal it when the registration came up for renewal?

I grimly typed "www.justinobrien.co.uk" into my browser and waited for a self-satisfied, beaming face to appear. It took only a moment to download.

And that's why this post is about amnesia.

Because the four-eyed, balding Justin O'Brien I found myself looking at was of course me.

It took a moment to process this. Smugly grinning was not a younger version of myself, with a full mane of lustrous dark hair. It was the Justin O'Brien snapped at work only a few weeks ago. Could I have ... was it possible that ... no, surely I hadn't set up a website and forgotten about it in the space of a fortnight?

TL;DR - I'm not that far gone. I bought the domain 10 years ago and set it to point at my Brunel University personal page. ¯\_(ツ)_/¯

The undisputed #1 Justin O'Brien in the UK, according to Justin O'Brien