Simulating Correlated Multivariate Data

Simulating bivariate normal data

Ah, simulations. The mere mention conjures images of wily statistical wizards from beyond the beyond, fingers flitting about the keyboard bashing out a frenzied stream of code fashioned to render order from the chaos of numerical nothingness.

[GIF: Kermit the Frog typing frantically at a keyboard]

In reality, it’s not nearly this intense. At least not for you, dear user. Your poor computer, however, will have to do most of the heavy lifting. So be sure to thank it for its service.

Anyways, yes… simulating correlated data. A useful but slightly tricky endeavor if you’re new to this sort of stuff. It’s not beyond your means, however, so stick with me here and we’ll make some progress. Let’s tackle this one in R.

We’ll use the MASS package’s mvrnorm function to quickly and easily generate multiple variables that follow a multivariate normal distribution. Here I’m simulating 100 observations, which I define as the value N below (i.e., I’m going to be drawing values randomly from some normally distributed variable in the population 100 times).

Next I’m defining the two population means for the two variables I’ll simulate for these 100 observations, which I store in a vector called mu (from here on, we’ll just call these variable 1 and variable 2 because, let’s be real, it’s past midnight as I’m writing this, and I’m way too tired to be creative about it right now). Lastly, I’m specifying an underlying 2×2 covariance matrix (that I call sigma), which defines the variance of each variable in the population as well as the covariance between the two variables. I’ll explain the numbers below after the jump. For now, just peep the code:

#Libraries we'll need:
library(MASS)


N <- 100                               #setting my sample size
mu <- c(2, 3)                          #setting the means
sigma <- matrix(c(9, 6, 6, 16), 2, 2)  #setting the covariance matrix values. The "2, 2" part at the tail end defines the number of rows and columns in the matrix

set.seed(04182019)  #setting the seed value so I can reproduce this exact sim later if need be
df1 <- mvrnorm(n = N, mu = mu, Sigma = sigma)  #simulate the data, as specified above
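If you want a quick sanity check on what those sigma values imply: variances of 9 and 16 mean standard deviations of 3 and 4, and with a covariance of 6 the population correlation works out to 6 / (3 × 4) = .50. A few extra lines (my quick check, not required for the sim) will show how close the simulated sample lands to those targets:

colMeans(df1)  #sample means; should land near the population values of 2 and 3
cov(df1)       #sample covariance matrix; should land near sigma
cor(df1)       #sample correlation; the population value is 6/(3*4) = .50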

Continue reading

Intro Topics: A Regression Primer

Welcome to another stupidly long, but hopefully informative, instructional on introductory statistical concepts. Today we tackle regression analysis. Use the menu links below to jump around if you need or want a quick bit of info on any topic (a bare-bones R example follows the menu):


Menu

1) Background: Correlation analysis [conceptually] explained
2) Correlation analysis and OLS linear regression
3) From guesses to predictions: The logic of using linear regression
— a) Building the equation
— b) Interpreting regression
— c) What is OLS?
4) Advanced Applications
5) Conclusion & Further reading
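
As a tiny, bare-bones preview of sections 3a and 3b (just a sketch on made-up data, not the full treatment below), fitting and interpreting a simple OLS regression in R looks something like this:

set.seed(1)                            #made-up data, purely for illustration
x <- rnorm(100, mean = 50, sd = 10)    #a predictor
y <- 2 + 0.5 * x + rnorm(100, sd = 5)  #an outcome built from x plus random noise

fit <- lm(y ~ x)  #ordinary least squares: y = b0 + b1*x + error
summary(fit)      #slope and intercept estimates, standard errors, t-tests, R-squared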


Continue reading

Basics: Highway to the Danger Zone: Why median-splitting your continuous data can ruin your results.

Not too long ago, I wrote an article here about advanced procedures for examining interactions in multiple regression. As I described some of the challenges researchers commonly face when trying to examine differences between people in a data set, I argued that when it comes to data analysis, splitting a continuous variable into a dichotomy (i.e., two categories) is kind of a dumb idea (MacCallum et al., 2002).

Continue reading
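
Before you click through, here’s a quick illustrative simulation of the problem (a much simpler cousin of the demonstrations in MacCallum et al., 2002): two variables that correlate at about .50 in the population lose a noticeable chunk of that relationship once one of them gets median-split.

library(MASS)

set.seed(2002)
d <- mvrnorm(n = 1000, mu = c(0, 0), Sigma = matrix(c(1, .5, .5, 1), 2, 2))
x <- d[, 1]
y <- d[, 2]

x_split <- as.numeric(x > median(x))  #the median split: 0 = below the median, 1 = above it

cor(x, y)        #the continuous relationship -- roughly .50 by construction
cor(x_split, y)  #the same relationship after the split -- noticeably weaker (around .40)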

SPSS Correlation Tabulator (v 1.4)

The Correlation Tabulator is a tool I designed for use with SPSS. It requires you to run a set of Pearson correlations in SPSS and paste the correlation table output into the tabulator (following the instructions, of course). It will then take those results and compile them into an APA-style correlation table (coefficients reported to two decimal places, with asterisks indicating significance levels), which you can copy and paste into Word or a similar program.
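
If you’re curious what that end product looks like, here’s a rough R approximation of the same idea, with coefficients rounded to two decimals and significance asterisks tacked on. To be clear, this is just an illustration of the concept; the Tabulator itself lives in SPSS and handles the formatting for you:

#Rough illustration of the idea only -- this is not the SPSS Correlation Tabulator
set.seed(123)
dat <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50))  #made-up example data

r_txt <- format(round(cor(dat), 2), nsmall = 2)  #Pearson correlations to two decimal places
for (i in 1:ncol(dat)) {
  for (j in 1:ncol(dat)) {
    if (i != j) {
      p <- cor.test(dat[[i]], dat[[j]])$p.value
      stars <- if (p < .001) "***" else if (p < .01) "**" else if (p < .05) "*" else ""
      r_txt[i, j] <- paste0(r_txt[i, j], stars)  #append asterisks indicating significance
    }
  }
}
r_txt  #a crude APA-style correlation matrix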

Continue reading