During my undergraduate years I spent large parts of my working week learning SPSS. Much of it was trial and error (OK, mostly error), and throughout those trials one experience stayed consistent. It's an experience that is familiar to many other students, I'm sure.
The usual routine: going through various motions, running step after step, procedure after procedure, and at the end of the day, saving a few output files, shutting down the computer, locking the lab door, and heading home for a nutritious ramen supper. The next day generally involved trying to replicate some or all of what I'd done the day before (or carrying out slight variations of it), and suffering the tedium of going through all the point-and-click motions again.
Frustrating and inefficient. This was the way of things for years, until I finally plucked up the courage to explore the use of syntax (AKA the command language). After I did, I soon realized what I was missing. It was a total game-changer. My efficiency skyrocketed when using syntax, as opposed to the point-and-click interface. Yours can too, once you take the leap.
Here are the basics, in my experience.
What I’ll generally do in SPSS is the following:
1) Open a data set
2) Open a new syntax window & paste all procedures into it.
3) Annotate the hell out of it with my notes as I go along.
4) Coffee.
I usually start my syntax files with basic identifying info at the top, inside annotations (e.g., today's date, the name of the data set I'm using, the overall purpose of this particular analysis, the date of the most recent update), followed by somewhat more detailed notes on what I'm trying to accomplish. This is especially helpful for big projects (e.g., a thesis), where you might eventually end up with 20 or so syntax files for the same project. Knowing which file was written for which purpose is one way to avoid headaches down the road.
To insert annotations, simply begin your line with an asterisk (*) and end whatever note you write with a period. If done right, the annotation should turn grey.
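Putting those two habits together, the top of a syntax file might look something like the sketch below. The file name, path, and dates are hypothetical placeholders, not from a real project. The GET FILE and DATASET NAME lines are roughly what SPSS pastes if you open the data through File > Open > Data and click Paste instead of Open. Note that every annotation line ends with a period.

```spss
* ==========================================================.
* Date created: YYYY-MM-DD.
* Last updated: YYYY-MM-DD.
* Data set: roommates.sav (hypothetical file name).
* Purpose: GEE models of positive alcohol expectations from personality.
* Notes: AGE and BMI tested as moderators, one model per syntax block.
* ==========================================================.

* Open the data set (the path below is a placeholder).
GET FILE='C:\thesis\data\roommates.sav'.
DATASET NAME roommates WINDOW=FRONT.
```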
As a general rule, I annotate all of my syntax files liberally. Annotations are my way of reminding my future self of what my thought process looked like during the writing phase, and what each and every block of syntax is intended to do.
To get a feel for how SPSS turns point-and-click actions into commands, I initially run any data restructuring, manipulation, or analysis procedure through the good, old-fashioned menus and dialog boxes… BUT, instead of clicking "OK" when I'm done plugging in the variables and setting my parameters, I click "PASTE". (As a rule of thumb, I never click "OK" unless it's something really small and inconsequential that I'm certain I won't bother saving and redoing.) Clicking "PASTE" sends the instructions for the procedure to the syntax window as code. From there you can manipulate the hell out of it if you'd like. (Of course, SPSS will piss and moan if the syntax is incorrect, usually by turning it bright red, the color of shame and erroneous syntax.)
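As an illustration, choosing Analyze > Descriptive Statistics > Frequencies, moving AGE into the variable list, and clicking PASTE produces something like the first command below (the exact subcommands pasted can vary by SPSS version). The second command is a hand-edited copy: it adds BMI, suppresses the frequency table, and asks for summary statistics instead. AGE and BMI are borrowed from the example later in this post.

```spss
* As pasted from the Frequencies dialog.
FREQUENCIES VARIABLES=AGE
  /ORDER=ANALYSIS.

* Hand-edited copy: two variables, no frequency table, summary statistics only.
FREQUENCIES VARIABLES=AGE BMI
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN STDDEV MINIMUM MAXIMUM
  /ORDER=ANALYSIS.
```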
After performing enough procedures, I save my syntax file and carry on doing analyses.
What's nice about the syntax approach is that, as I mentioned, it really speeds things up. Let me explain with an example from a paper I presented two years ago at the Midwestern Psychological Association (MPA) conference.¹
EXAMPLE: For this part I wanted to model people's positive expectations of alcohol use based on their personality traits, and test whether that association was moderated by age and body mass index (among other descriptive factors). Here's a bit of the actual syntax I used with my data set, including my annotations (the moderator, AGE, appears on the GENLIN line and in the /MODEL subcommand):
```spss
*****************AGE***********************.
* Generalized Estimating Equations- Positive expectations from Personality (with AGE).
GENLIN CEOAposSP WITH AGE Open_C Consc_C Extrav_C Agree_C Neuro_C
  /MODEL AGE Open_C Consc_C Extrav_C Agree_C Neuro_C INTERCEPT=YES
    DISTRIBUTION=NORMAL LINK=IDENTITY
  /CRITERIA SCALE=MLE PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD)
    CILEVEL=95 LIKELIHOOD=FULL
  /REPEATED SUBJECT=DYADNUM SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES
    COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION COVB WORKINGCORR
  /SAVE RESID.
```
Now, at this point I could go back into the Generalized Linear Models menu in SPSS, open the Generalized Estimating Equations dialog, set up a new GEE model, drag out the moderator variable I'd used, and drag in BMI to run the same analysis with BMI as the moderator. Instead of all that, I just:
1) Copied and pasted the above syntax
2) Replaced the variable “AGE” with the variable “BMI”
3) Clicked “Run” in the syntax toolbar.
That’s it. Four seconds of work for a whole new analysis. Here’s what I ended up with:
```spss
*****************BMI***********************.
* Generalized Estimating Equations- Positive expectations from Personality (with BMI).
GENLIN CEOAposSP WITH BMI Open_C Consc_C Extrav_C Agree_C Neuro_C
  /MODEL BMI Open_C Consc_C Extrav_C Agree_C Neuro_C INTERCEPT=YES
    DISTRIBUTION=NORMAL LINK=IDENTITY
  /CRITERIA SCALE=MLE PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD)
    CILEVEL=95 LIKELIHOOD=FULL
  /REPEATED SUBJECT=DYADNUM SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES
    COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION COVB WORKINGCORR
  /SAVE RESID.
```
Just like that, a new GEE model was analyzed and its results printed. Super efficient. During that analysis session, I went on to test alternate versions of this model 8 more times (two outcomes with five separate covariates each… because screw Type I errors). It was incredibly easy and quick, particularly when combined with some of my MS Office shortcuts: about 20 seconds of actual work to probe my data extensively. The best part? If I forgot to save my output (silly move, I know) and wanted to do this again the next day, I could simply reopen my syntax file and rerun the entire block to regenerate the output. That beats going back into the point-and-click menus and redoing everything by hand any day of the week.
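Copy, paste, and replace works fine for a handful of models. If the list of outcome and moderator combinations keeps growing, one alternative is SPSS's macro facility (DEFINE … !ENDDEFINE), which turns the outcome and the moderator into arguments. The sketch below assumes the same variable names as the GEE syntax above; the macro name !gee_pers and its argument names dv and mod are made up for illustration.

```spss
* Optional: print each expanded macro call in the output.
SET MPRINT ON.

* Hypothetical macro: one GEE model per call, outcome and moderator as arguments.
DEFINE !gee_pers (dv = !TOKENS(1) / mod = !TOKENS(1))
GENLIN !dv WITH !mod Open_C Consc_C Extrav_C Agree_C Neuro_C
  /MODEL !mod Open_C Consc_C Extrav_C Agree_C Neuro_C INTERCEPT=YES
    DISTRIBUTION=NORMAL LINK=IDENTITY
  /CRITERIA SCALE=MLE PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD)
    CILEVEL=95 LIKELIHOOD=FULL
  /REPEATED SUBJECT=DYADNUM SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES
    COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION COVB WORKINGCORR
  /SAVE RESID.
!ENDDEFINE.

* Each call expands into a full GENLIN command.
!gee_pers dv = CEOAposSP mod = AGE.
!gee_pers dv = CEOAposSP mod = BMI.
```

The remaining models would each get one more call line. The second outcome and the other moderators aren't named in this post, so substitute your own variable names there.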
Another good way to learn is to tinker a bit. Run and save syntax for basic procedures like descriptives, frequencies, and correlations, but create several versions of each analysis and change one little thing each time. This way you can see exactly which portions of the syntax control which parts of the analysis. You can also start to get a feel for which parts of the syntax cause which types of errors when specified incorrectly. Trial and error is a fantastic way to learn, so mess around with your syntax until you feel confident enough to take on more complex procedures. You could also just read a book on SPSS syntax, I suppose, but what fun is that, really? (Sorry, I'm a hands-on learner, so I prefer digging into the stuff directly, but do what suits you, y'know.)
When you're first getting started, syntax can seem intimidating (particularly if you're unfamiliar with programming languages and environments in general). However, give it a little time and effort, and you'll be writing entire sessions of analyses in no time.
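A tinkering session along those lines might look like the sketch below: the same procedure run more than once with a single change between versions, so the output tables can be compared side by side. Variable names are borrowed from the GEE example above; swap in your own.

```spss
* Version 1: DESCRIPTIVES with its default statistics.
DESCRIPTIVES VARIABLES=AGE BMI CEOAposSP.

* Version 2: same variables, explicitly requesting extra statistics.
DESCRIPTIVES VARIABLES=AGE BMI CEOAposSP
  /STATISTICS=MEAN STDDEV MIN MAX SKEWNESS KURTOSIS.

* Version 3: correlations with listwise deletion of missing values.
CORRELATIONS
  /VARIABLES=CEOAposSP Open_C Consc_C Extrav_C Agree_C Neuro_C
  /PRINT=TWOTAIL NOSIG
  /MISSING=LISTWISE.

* Version 4: same correlations with pairwise deletion; compare the Ns in the two tables.
CORRELATIONS
  /VARIABLES=CEOAposSP Open_C Consc_C Extrav_C Agree_C Neuro_C
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
```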
Just be sure to save your files as you proceed!
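Some of that saving can live in the syntax file itself. In the sketch below, SAVE OUTFILE writes the active data set (including any residual variables added by /SAVE RESID) to a new file, and OUTPUT SAVE writes the current Viewer contents to an .spv file; OUTPUT SAVE is only available in newer releases, so check the Command Syntax Reference for your version. Both paths are hypothetical placeholders. The syntax file itself still has to be saved from the syntax window (File > Save).

```spss
* Save the active data set under a new name (path is a placeholder).
SAVE OUTFILE='C:\thesis\data\roommates_with_resid.sav'
  /COMPRESSED.

* Save everything currently in the Viewer window (path is a placeholder).
OUTPUT SAVE OUTFILE='C:\thesis\output\gee_models.spv'.
```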
1 Clavel, F. D., Shrout, P. E., & Bolger, N. (2013, May). Roommate influences on drinking expectations and behavior among college students. Paper presented at the annual meeting of the Midwestern Psychological Association, Chicago, IL.