Assumptions are maybe the weirdest part of doing statistics in psychological research. The vast majority of us rely on statistical tests that require certain assumptions to be met (i.e., *parametric* statistical tests). But for every 10 conversations I have about statistical analyses that involve the mention of “assumptions”, 9.9 of them end the same way: **“Why are these assumptions needed, anyways? I know they are *technically* important, but what are the practical consequences of failing to meet them?”**

As it turns out, hardly any of us *report* on our assumption checks. And maybe it’s because hardly any of us *understand* how to correctly *check* on our assumptions. Damn assumptions–forever making an ass out of u and me.

**The Big Picture: Why Is It Necessary to Meet Assumptions of a Test?**

Let’s take the case of linear regression as an example (we’ll get to its particular assumptions in a moment), and specifically, the tried-and-true *Ordinary Least Squares* (OLS) method of estimating linear regression models.

So why should you care about meeting the assumptions for OLS regressions? What’s the “big picture” practical problem(s) with not evaluating these assumptions?

When the assumptions for OLS regression are met, it is what Cohen, Cohen, West, and Aiken (2003; the “bible” of regression references) refer to as the *Best Linear Unbiased Estimator (BLUE)*–an estimator with a number of highly desirable long-term properties for calculating slope and intercept terms, including estimates that are:

**Unbiased**–if you ran many, many identical studies, the averages of these estimates should equal their respective population values

**Consistent**–their standard errors will get smaller at larger samples

**Efficient**–the smallest standard errors (read: most powerful tests) of any estimation method
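The first two properties are easy to see in a quick simulation. The block below is a minimal sketch, not anything from Cohen et al.: the population model (intercept 2, slope 0.5), the sample sizes, and the helper name `simulate_slopes` are all made up for illustration, and the simulated data meet every OLS assumption by construction.

```r
# Simulate many "identical studies" from a known population model:
# y = 2 + 0.5 * x + e, with normal, constant-variance errors (assumptions met).
set.seed(123)

# Hypothetical helper: returns the estimated slope from each simulated study
simulate_slopes <- function(n, reps = 1000, intercept = 2, slope = 0.5) {
  replicate(reps, {
    x <- rnorm(n)
    y <- intercept + slope * x + rnorm(n)
    unname(coef(lm(y ~ x))[2])  # keep only the estimated slope
  })
}

slopes_n50  <- simulate_slopes(n = 50)
slopes_n500 <- simulate_slopes(n = 500)

# Unbiased: the average estimate should sit close to the population slope (0.5)
mean(slopes_n50)
mean(slopes_n500)

# Consistent: the spread of the estimates should shrink as n grows
sd(slopes_n50)
sd(slopes_n500)
```

Efficiency is harder to show in a few lines, because it means comparing OLS against competing estimators rather than against itself.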

The practical benefits of achieving BLUE-status might seem less than obvious to you (they did to me), but in spite of the comprehensive nature of their book, Cohen et al. (2003; seriously, buy that book if you don’t have it already) provide a stunningly accessible explanation of why they should matter to psychologists conducting research:

Of importance, violation of an assumption may potentially lead to one of two problems. First and more serious, the estimate of the regression coefficients may be biased…in such cases, the estimates of the regression coefficients, R2, significance tests, and confidence intervals may all be incorrect. Second, only the estimate of the standard error of the regression coefficients may be biased. In such cases, the estimated value of the regression coefficients is correct, but hypothesis tests and confidence intervals may not be correct. (p. 117)

The executive summary: assumption violation = **increased Type I** (false-positive) **and Type II** (false-negative) **error rates** in your study **in the short-term**, and maybe even **meta-analytic biasing in the long-term**. An even more succinct summary: **assumption violations lead to unreplicable findings**. And though we might want to debate where in the priority list of scientific values replicability should fall, I suspect most psychologists these days are interested in good, cheap, and fast ways to assure themselves of the replicability of their research; assumption verification is one approach to doing so.
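To make the false-positive half of that summary concrete, here is a rough simulation sketch. It is only one illustrative scenario, chosen by assumption: the true slope is zero, and in the "violated" condition the error SD grows with the distance of X from its mean. The function name `false_positive_rate` and all settings are hypothetical.

```r
set.seed(42)

# Proportion of simulated studies with p < .05 for the slope
# when the true slope is exactly zero (so every "hit" is a false positive).
false_positive_rate <- function(heteroscedastic, n = 100, reps = 1000) {
  p_values <- replicate(reps, {
    x <- rnorm(n)
    # Error SD is either constant (1) or grows with x's distance from its mean
    error_sd <- if (heteroscedastic) abs(x) else rep(1, n)
    y <- rnorm(n, mean = 0, sd = error_sd)  # y is unrelated to x on average
    summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  })
  mean(p_values < .05)
}

rate_met      <- false_positive_rate(heteroscedastic = FALSE)
rate_violated <- false_positive_rate(heteroscedastic = TRUE)

rate_met       # should land near the nominal .05
rate_violated  # should land well above .05 in this particular setup
```

Other violation patterns can push the error rate in the other direction, so treat this as a demonstration that the nominal rate stops being trustworthy, not as a general estimate of how large the problem is.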

**What Are the Assumptions of OLS Regression and the Consequences for Violating Them?**

So if I’ve convinced you that meeting assumptions is good, and violating assumptions is bad (who doesn’t want fewer Type I/II errors in their research…?), then it’s probably worth refreshing what assumptions are made when fitting an OLS regression model, and how they impact the outcome of analysis when they are violated. Here are “the big three” that are typically described (though there are others).

**The association between X and Y is properly specified (usually linear):** Typically, though not always, psychologists use regression to model linear associations between X and Y, though other patterns of association (e.g., curvilinear) are possible. OLS regression relies on the assumption that you have chosen the correct pattern of association between X and Y in your modelling approach. If you model X and Y as being linearly associated, and they actually are, then you’re in the green. But if you mis-specify their pattern of association, you’re going to run into problems. And in many ways, **this assumption is the worst to violate**, because it not only **biases the standard error of the regression slope** (like the next two assumptions do, when violated), but it is also the only one of the three that **biases the estimate of the regression slope itself**. That means **more frequent false-positives** (if your slope is overestimated, or its standard error underestimated) **or false-negatives** (if your slope is underestimated, or its standard error overestimated).

**The residuals from the regression model are normally distributed:** A common misunderstanding about OLS regression assumptions is that we assume that X and Y are normally distributed. But we actually don’t care about the distributions of these variables. What we *actually* need to concern ourselves with is the distribution of **residuals** from our model in which Y is regressed onto X (i.e., the differences between each observed-Y and each predicted-Y). If the normality of residuals assumption is violated, then the **standard errors of our slopes will be over-estimated** (resulting in **worse statistical power**). Thankfully, the impact of this violation is greatly attenuated at larger sample sizes. Still, it’s worth checking out because a) you might not have a sufficiently large sample size (and we suck at ball-parking our sample size needs), and/or b) non-normality is often a clue about other assumption-related problems (Cohen et al., 2003).

**The residuals from the regression model are homoscedastic (i.e., constantly varied):** Simply stated, when we make predictions, as we do in regression, we are going to err–our predictions for Y can’t all be perfect. The homoscedasticity of residuals assumption simply asks that we err consistently, and not get any better, or worse, at predicting Y in a way that is systematically related to X. If the size of the residuals is associated with X, they are heteroscedastic, and **the consequence of heteroscedasticity is a biased standard error of the slope**; sometimes the standard error will be overestimated (meaning less power), sometimes it will be underestimated (meaning the illusion of greater power, and thus, potentially more false-positives)–it largely depends on the direction of the association between X and residual size. Thankfully, Cohen and colleagues (2003) suggest that the variance in residuals needs to be 10x greater or smaller across the range of X before it starts to seriously bias standard error estimates.

**Grumble Grumble: ….Fine. Okay. Alright! How Do I Check These Assumptions?**

I never thought you’d ask! I started familiarizing myself with how to check regression assumptions in *R* this summer, as I was prepping my (under)graduate stats courses (I’m using Navarro’s open source book [only $40 if you want a print copy] in my grad class, and loving it!). And as it turns out, it is maybe the easiest thing I have ever had to do in R.

Let’s see a reproducible example, using some personality data from the *psych* package:

```r
# Install (only needed once) and load psych and car
install.packages('psych')
install.packages('car')
library(psych)
library(car)

# Run the code below mindlessly to create a data frame with personality-variable data
# and to create the composite scores for agreeableness, conscientiousness,
# extraversion, neuroticism, and openness

# Assumption: dat is the bfi item data that ships with psych (the line defining
# dat is missing from the post); if bfi is not found, try psychTools::bfi
dat = bfi

keys.list = list(agree = c('-A1','A2','A3','A4','A5'),
                 conscientious = c('C1','C2','C3','-C4','-C5'),
                 extraversion = c('-E1','-E2','E3','E4','E5'),
                 neuroticism = c('N1','N2','N3','N4','N5'),
                 openness = c('O1','-O2','O3','O4','-O5'))
scores = scoreItems(keys.list, dat)
dat$agree = scores$scores[,1]
dat$consc = scores$scores[,2]
dat$extra = scores$scores[,3]
dat$neuro = scores$scores[,4]
dat$open = scores$scores[,5]
```

Now let’s pretend you’re interested in testing a regression model where you predict someone’s openness personality dimension from their agreeableness dimension (don’t ask me if/why that might be interesting…I know zilch about personality). Start by fitting your regression model…

“Wait, what? John, that’s silly. Why would you start by fitting the model?!–you haven’t looked at your assum–”

Shhhh! Wait. Magic is about to happen…

```r
# Fit regression model and store output in object 'model1'
model1 = lm(open ~ agree, data = dat)
```

Now, simply run the built-in plot() function on your saved regression model object…

```r
plot(model1)
```

And R automatically creates and spits out 4 diagnostic plots for you to cycle through (by pressing the enter key in the console). The first three plots visualize your linearity, normality, and homoscedasticity assumptions, and R is kind enough to throw you an outlier diagnostic plot–the fourth plot–as a bonus. Hold your cursor over each plot, and you’ll be provided with a quick guide for what to look for to diagnose whether you meet or violate these assumptions. Sure, they aren’t as pretty as some other plots, but it’s not like you’d ever place these in the text of an actual journal article submission (though they might look nice in an open/transparent supplement…)
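If cycling through the plots one at a time gets tedious, base R can lay them out together or pull up just one. The sketch below uses simulated stand-in data (the `toy` data frame and `toy_model` are made up here so the block runs on its own); with the personality data, the same calls should work on `model1`.

```r
# Stand-in data so this block runs by itself; with the post's data,
# skip this setup and use the `model1` object instead of `toy_model`.
set.seed(1)
toy <- data.frame(agree = rnorm(200))
toy$open <- 0.3 * toy$agree + rnorm(200)
toy_model <- lm(open ~ agree, data = toy)

# Arrange the plotting window as a 2 x 2 grid, then draw the default diagnostics:
# Residuals vs Fitted, Normal Q-Q, Scale-Location, Residuals vs Leverage
old_par <- par(mfrow = c(2, 2))
plot(toy_model)
par(old_par)  # restore the previous plotting layout

# Or pull up a single plot by number, e.g. only the Normal Q-Q plot
plot(toy_model, which = 2)
```

The `which` argument of `plot()` for `lm` objects picks individual diagnostics, which is handy if you only want to save one of them for a supplement.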

But perhaps you’re the type who prefers inferential tests of assumptions, to graphical assessments (for what it’s worth, Cohen et al., 2003, promote graphs over tests)? That’s fine, they’re not much more trouble to produce either:

```r
# Evaluate whether linear model is sufficient (null) vs. curvilinear
residualPlots(model1)

# Run a Shapiro-Wilk test of normally distributed residuals (null of normality)
shapiro.test(model1$residuals)

# Run a non-constant residual variance test (null of homoscedasticity)
ncvTest(model1)
```

Interestingly, whereas our visualizations (to me) suggest that, normality aside, our data seem to meet assumptions of OLS regression reasonably well, these inferential techniques suggest that there are significant violations of the linearity, normality, and homoscedasticity assumptions:

Then again, in a sample of 2800 observations, we shouldn’t be *that* surprised that we’re rejecting null-hypotheses at *p* < .05, right? So let’s go with the impression the visualizations give us, over these inferential tests.

……………………………….

……………………………………….

………………………………………………..

I bet that explanation above sounded awfully reasonable, didn’t it? But what you just saw was a very intuitive and compelling (if I may be so bold…) occurrence of researcher-degrees-of-freedom in the wild. If I was seriously invested in this research question, I could very easily have convinced myself (and likely you) that the converse is true (e.g., the inferential tests are more “objective”, “empirical”, “scientific”, etc., vs. plot inspections), if interpreting the inferential tests suited my theory better, or gave me more interesting effects to write about, compared to interpreting the plots.

Surprise! [if you share this post, please don’t “spoil” this element in your summary–I think it’s a nice teachable moment 🙂 ]

The take home messages, therefore, are these:

**Evaluating assumptions is important**: when assumptions are violated badly, analyses have inflated Type I/II error rates, meaning that the study will be less replicable

**Evaluating assumptions is easy**: whether you are Team Plot, or Team Inferential Test, you can evaluate most regression assumptions in less than 100 characters of R code, post the results in an online supplement, and sleep better knowing that you’ve provided evidence that attests to the replicability of your findings

**Evaluating assumptions is highly subjective/flexible**: you can easily trick yourself, post-hoc, into seeing what you want to see in the process of assumption evaluation–and not just for these “big three” assumptions, but others as well. Because of this, you should *really* **consider pre-registering your assumption evaluation criteria**, so that you won’t later have to worry about concerns that you might have played fast-and-loose with criteria for meeting assumptions that were convenient to you after-the-fact, as opposed to establishing them a priori.
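As one example of what a pre-registered criterion could look like, the 10x rule of thumb for residual variance mentioned earlier can be turned into a number you commit to in advance. This is a rough sketch under my own assumptions, not a procedure from Cohen et al.: the function name `residual_variance_ratio`, the choice of five equal-sized slices of X, and the simulated data are all hypothetical.

```r
# Ratio of the largest to the smallest residual variance
# across equal-sized slices of the predictor.
residual_variance_ratio <- function(model, predictor_name, n_slices = 5) {
  x   <- model.frame(model)[[predictor_name]]  # only rows actually used in the fit
  res <- residuals(model)
  # Cut the ranks of x into slices holding roughly equal numbers of observations
  slice <- cut(rank(x, ties.method = "first"), breaks = n_slices, labels = FALSE)
  slice_variances <- tapply(res, slice, var)
  max(slice_variances) / min(slice_variances)
}

# Simulated illustration: one outcome with constant error variance,
# one whose error SD grows steeply with x
set.seed(2024)
n <- 1000
x <- runif(n, 1, 5)
y_homo   <- 0.5 * x + rnorm(n, sd = 1)
y_hetero <- 0.5 * x + rnorm(n, sd = x^2)

ratio_homo   <- residual_variance_ratio(lm(y_homo ~ x), "x")
ratio_hetero <- residual_variance_ratio(lm(y_hetero ~ x), "x")

threshold <- 10  # the 10x rule of thumb, fixed before looking at the data
ratio_homo > threshold
ratio_hetero > threshold
```

The specific number of slices matters less than writing it down, along with the threshold, before the data come in.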

So, go on–the next time you fit a regression model, try evaluating your model assumptions. Do it–doit!

Do you know of any effect size measures for these tests? It seems to me that the reliance on p-values for these tests in a large sample is likely ill-advised. So it’d be cool to be able to say (a priori) “significant, but small violation, therefore, carry on.” But how small is small…would be interested to find out! Thanks!


I don’t. However, the point you’re raising is one of two common objections to these kinds of inferential tests of assumptions: 1) at higher n, they become increasingly likely to reject the nulls of meeting each assumption, even when such departures may be trivial, and 2) researchers may find the logic of these tests to be circular, as some rely on parametric assumptions, which the tests are supposed to be evaluating…

The visualization approach, though subjective, at least allows you to attempt to characterize the magnitude of departure from these assumptions.


Nice post! Two thoughts on this (I spend way more time thinking about assumptions than I should).

1. I can’t remember if it holds for regression, but for categorical predictors (e.g. t-tests and ANOVA) the danger with assumptions isn’t when one is violated, but when two are violated at the same time. For example, t-tests are pretty robust to violations of the assumptions of equal variances, UNLESS you also have unequal sample sizes, then things fall apart.

2. Just a reminder that failing to meet assumptions isn’t the worst thing in the world. In fact, it means something cool is happening in your data (possibly something unexpected, but still, something cool!). Let’s take the example of heteroscedastic residuals. This means that the variance in the residuals changes as a function of one or more other variables. That’s important information! To use your example: people higher on agreeableness are more similar on openness than people who are low on agreeableness. I know nothing about personality either, but that might be super meaningful. In cases like this, switching to an analysis strategy that explicitly models the assumption violations (like a location-scale model for heteroscedasticity) brings in all sorts of fun information that can help inform your theory. Of course, now you’re in the realm of exploratory analyses, but there are worse things.


Really nice post: reminded me of another blog post on this issue that talks about how most of those tests flag everything as non-normal when sample size gets high.

http://disjointedthinking.jeffhughes.ca/2015/12/probably-dont-need-test-data-normality/
