Restructure Data
The Restructure function is often useful when
data are stored in
Long format but are needed in Wide format,
or vice versa. Long format refers to data in which each observation or
participant has multiple rows. Wide format refers to data in which each
observation or participant has only one row. This tutorial will focus
on using the restructure function to change data from long format to
wide format. Longitudinal research is an example of the type of
research which often creates data files in long format.
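To make the two formats concrete, here is a minimal Python sketch (not SPSS syntax) using a toy data set. The two participants, the column names 'id', 'time' and 'score', and the helper long_to_wide are all hypothetical and exist only for illustration.

```python
# Toy long-format records: two hypothetical participants, two rows each.
long_rows = [
    {"id": 1, "time": 1, "score": 10},
    {"id": 1, "time": 2, "score": 12},
    {"id": 2, "time": 1, "score": 7},
    {"id": 2, "time": 2, "score": 9},
]

def long_to_wide(rows, id_key, index_key, value_key):
    """Collapse rows sharing an id into one row, one column per index value."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[id_key], {id_key: row[id_key]})
        record[f"{value_key}.{row[index_key]}"] = row[value_key]
    return list(wide.values())

wide_rows = long_to_wide(long_rows, "id", "time", "score")
for row in wide_rows:
    print(row)
```

In the long version each participant occupies two rows; in the wide version each participant occupies one row with one score column per time of measure.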
For the duration of this tutorial we will be using
the LongExample.sav file, which contains 10 participants measured on
3 outcome variables under 10 different conditions. This data was
generated using a script in R.
Begin by importing the above data file (LongExample.sav)
and briefly examine the data.
Notice, we have a 'code' variable which simply
assigns a sequential number to each row of data. Then, there is a
'participant.id' variable in which each number represents a participant
(or case, or observation). Next, we have a series of categorical
variables which each identify a condition of our study. The first two
such variables, 'x1.numbers' and 'x1.letters' can be thought of as
identifying the 10 different times of measurement using either numbers
or letters respectively as identifiers. The next two variables
('x2.numbers' & 'x2.letters') can be thought of as representing
5 different conditions of our study (e.g. 5 different therapy types, 5
different treatment drugs, 5 different locations, etc.). The next two
variables ('x3.numbers' & 'x3.letters') can be thought of as
representing 2 different conditions of our study, similar to the x2
variables, but with only two levels. The final three variables are our
interval / ratio outcome variables.
So, currently there are 100 rows of data with 10
participants, each measured 10 different times and at each time of
measure they were exposed to a unique set of x2 and x3 conditions and
measured with three instruments.
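The layout just described can be sketched in Python as a stand-in for readers who do not have the file open. This is only an assumed reconstruction: the outcome scores are random placeholders, and the pairing of each time of measure with one x2/x3 combination is a guess, so the values will not match LongExample.sav.

```python
import itertools
import random
import string

random.seed(0)  # placeholder scores only, not the values in LongExample.sav

# Assumption: each time of measure pairs with one of the 5 x 2 combinations
# of x2 and x3 levels; the real file may pair them differently.
conditions = list(itertools.product(range(1, 6), range(1, 3)))

long_rows = []
for participant in range(1, 11):
    for time, (x2, x3) in enumerate(conditions, start=1):
        long_rows.append({
            "code": len(long_rows) + 1,  # sequential row number
            "participant.id": participant,
            "x1.numbers": time,
            "x1.letters": string.ascii_uppercase[time - 1],
            "x2.numbers": x2,
            "x2.letters": string.ascii_uppercase[x2 - 1],
            "x3.numbers": x3,
            "x3.letters": string.ascii_uppercase[x3 - 1],
            "x4": round(random.gauss(50, 10), 2),
            "x5": round(random.gauss(50, 10), 2),
            "x6": round(random.gauss(50, 10), 2),
        })

print(len(long_rows), "rows and", len(long_rows[0]), "variables")
```

The point of the sketch is the shape: 10 participants times 10 times of measure gives 100 rows, each carrying the same 11 variables.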
Our goal here is to use the Restructure function
to transform the format of the data file from its current Long format
(each participant has 10 rows) to Wide format (where each
participant has one row) while still retaining all
the information contained in the original data file.
Start by clicking on Data in the tool bar. Next,
click on Restructure...
Next, select the "Restructure selected cases into
variables" option, which is emphasized with a red
ellipse here. Then click the Next > button.
Step 2: highlight the participant.id
variable and use the top arrow button to move it to the Identifier
Variable(s): box. Then click the Next > button.
Step 3: we do not need to
change the Sort option; we can allow SPSS to sort the data by the
identifier -- which will list each participant sequentially from 1 to
10 as rows in our new data file. Click the Next > button to
continue.
Step 4: select "Group by index (for example: w1 h1,
w2 h2, w3 h3)", which is emphasized here with a red
rectangle. This option keeps the outcome scores for each unique
combination of x1, x2, x3 conditions together in the new file,
separate from the outcome scores associated with every other unique
combination of x1, x2, x3 conditions. Click the Next > button to
continue.
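The effect of the grouping choice is easiest to see as a column order. The short Python sketch below (an illustration, not what SPSS runs internally) uses the 'w' and 'h' names from the wizard's own example labels and lists the new columns under each choice.

```python
variables = ["w", "h"]      # variable roots from the wizard's example labels
index_values = [1, 2, 3]    # three repeated measurements

# Group by original variable: all w columns first, then all h columns.
by_variable = [f"{v}{i}" for v in variables for i in index_values]

# Group by index: every variable for measurement 1, then measurement 2, ...
by_index = [f"{v}{i}" for i in index_values for v in variables]

print("by variable:", by_variable)
print("by index:   ", by_index)
```

Grouping by index is what places everything recorded at one time of measure side by side, which is the arrangement this tutorial relies on.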
Lastly, select the "Paste the syntax generated by
the wizard into a syntax window" option which is emphasized here with a
red ellipse.
After selecting the paste option, click the Finish button. A warning
box will appear to let you know the sets (data sets) will still be
available after restructuring has taken place.
Click the OK button and a syntax window will open with the generated
syntax in it.
In the syntax window, highlight all the text and
then click on the Run Selection button to run the syntax. Below
right, the word Selection has appeared because the cursor is being held
over the Run Selection button.
Once the function runs, the output window will
open (if not already open) and it will contain some trivial output
showing which variables were generated (really which variables were
transposed) and it will show a Processing Statistics table which
displays the number of cases in and out, number of variables in and
out, and the number of index variables.
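As a rough check on those counts, the arithmetic can be sketched in Python. This assumes, as described in this tutorial, that participant.id is the only variable left as a single column and that every other variable is spread into one column per time of measure; the exact figures SPSS reports may be labeled or counted differently.

```python
participants = 10
times_of_measure = 10

long_variables = [
    "code", "participant.id",
    "x1.numbers", "x1.letters",
    "x2.numbers", "x2.letters",
    "x3.numbers", "x3.letters",
    "x4", "x5", "x6",
]

cases_in = participants * times_of_measure   # one row per participant per time
cases_out = participants                     # one row per participant

variables_in = len(long_variables)
# Assumption: every variable except the identifier becomes 10 columns.
variables_out = 1 + (variables_in - 1) * times_of_measure

print("cases in / out:    ", cases_in, "/", cases_out)
print("variables in / out:", variables_in, "/", variables_out)
```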
The new data file should resemble what is below.
Each row in the new data file corresponds to a
single participant because we used the participant.id variable as our
identifier variable. The participant.id variable is the only column
which was not changed or re-named in the restructuring. The code
variable from the original data has now been broken down into ten
columns and serves as a marker in the new data file, marking each of
the 10 segments or chunks of data. By segment or chunk, we mean each
time of measure. The x1.numbers and x1.letters variables also serve this
function, identifying each of the ten times of measure. Each segment
contains the unique combination of x1, x2, x3 identifiers and a unique
score on x4, x5, x6 for each participant. Scrolling to the
right (in the data window of SPSS) you'll notice each time of measure,
or chunk, identified by the sequential numbers and letters (chunks from
left to right, 1 to 10 & A to J) in the columns associated with
the x1 variables. You will also notice in each chunk a unique
identifier for each of the x2 (1 to 5 & A to E) and x3 (1 to 2
& A to B) variables. Each chunk also contains scores (for each
participant) on the three outcome measures (x4, x5, x6). Having the
data in this format allows us to run repeated measures analysis and/or
compute total scores for each or any unique combination of conditions.
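As an illustration of computing a total score from wide data, here is a small Python sketch. The row below is hypothetical: it shows only three times of measure for brevity, the scores are invented, and the column names of the form 'x4.1' are an assumed naming pattern (variable root, a dot, then the index) rather than something taken from the restructured file.

```python
# Hypothetical wide row for one participant (three times of measure only).
wide_row = {
    "participant.id": 1,
    "x4.1": 3.0, "x5.1": 1.0,
    "x4.2": 4.5, "x5.2": 2.0,
    "x4.3": 2.5, "x5.3": 3.0,
}

def total_score(row, root):
    """Sum every column named '<root>.<index>' in one wide row."""
    prefix = root + "."
    return sum(value for name, value in row.items() if name.startswith(prefix))

x4_total = total_score(wide_row, "x4")
x5_total = total_score(wide_row, "x5")
print("x4 total:", x4_total)
print("x5 total:", x5_total)
```

Because each participant's scores sit in one row, a total is a simple sum across columns; in long format the same total would require grouping rows by participant first.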
As is the case with all of the tutorials on this
web site, this tutorial should not be considered an exhaustive review
of the topic covered: restructuring data from long to wide format.
Restructuring data from wide to long can be done by using similar
steps; simply choose "Restructure selected variables into cases" at the
initial Restructure Data Wizard dialog and follow the steps of the
wizard.
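For readers who want to see the reverse direction spelled out, the following Python sketch turns wide rows back into long rows. It is an illustration of the idea rather than what the wizard generates, and it assumes column names of the form 'root.index'; the toy rows, the 'index' column name, and the helper wide_to_long are hypothetical.

```python
# Toy wide-format records: one row per hypothetical participant.
wide_rows = [
    {"id": 1, "score.1": 10, "score.2": 12},
    {"id": 2, "score.1": 7, "score.2": 9},
]

def wide_to_long(rows, id_key):
    """Split each wide row into one long row per index value."""
    long_rows = []
    for row in rows:
        by_index = {}
        for name, value in row.items():
            if name == id_key:
                continue
            # 'score.2' -> root 'score', index 2
            root, index = name.rsplit(".", 1)
            by_index.setdefault(int(index), {})[root] = value
        for index in sorted(by_index):
            long_rows.append({id_key: row[id_key], "index": index, **by_index[index]})
    return long_rows

long_rows = wide_to_long(wide_rows, "id")
for row in long_rows:
    print(row)
```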