LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2005)Back to main SPSSX-L pageJoin or leave SPSSX-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 7 Jul 2005 17:54:00 -0400
Reply-To:   Richard Ristow <wrristow@mindspring.com>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Richard Ristow <wrristow@mindspring.com>
Subject:   Re: Importing data: Problems with data structure from Excel file
Comments:   To: Rob <mafioso_dk@YAHOO.DK>
Comments:   cc: John Norton <jnorton@lumc.edu>
In-Reply-To:   <200507071931.j67JVhaE015762@listserv.cc.uga.edu>
Content-Type:   text/plain; charset="us-ascii"; format=flowed

At 03:31 PM 7/7/2005, Rob wrote:

>[I'm importing] data from an Excel file. The dataset includes patient >data (weight, blood pressure etc.) measured at 10 different time >points. > >Unfortunately the dataset has not been entered in "SPSS mode", but in >such way that each patient fills out 10 rows (representing the 10 time >points) with each variable accordingly ordered in its own column. >Further, one column indicates the patient's id number (thus repeated >10 times). I would of course like each line to represent one patient >with all (time) variables separated in they own columns. > >Is there anyway in SPSS (or in Excel for that matter) to avoid >manually re-entering all the data ??

John Norton's solution is right on: read the data into SPSS as it is, and there are a number of ways to restructure it. (By the way, I hope to goodness that each row gives the time that row's data was taken. And that you've thought what to do if, say, a patient's third row is missing.)

Don't assume, though, that (the appropriate) "SPSS mode" is "each line [case] representing one patient with all (time) variables separated in their own columns [variables]". The question's been discussed a lot on this list. "Each line representing [all data for] one patient," is now being called 'wide' data organization. The organization in your Excel spreadsheet (or, the same transferred to SPSS) is called 'long' organization.

The developing consensus (I'm one of those arguing for it) is that 'long' is usually better. In brief, - Fewer variables are always easier to keep track of and manage. (More records - 'cases' in SPSS - add very little complexity.) - In particular, data transformations are much easier. In 'long' structure, converting weight from pounds to kg., or calculating BMI (body mass index), are one COMPUTE statement each. In 'wide' structure, they're a LOOP or DO REPEAT at best, and ten new variables to name. - Data oddities, such as missing data at occasional time points, are much easier to handle: you leave out a record as a whole, rather than using logic to decide which variables you skip in creating the 'wide' organization - As John showed, it's very easy to go to 'wide' organization for analysis, as you need to. You rarely need to transform the whole data set to 'wide'; usually, it's a subset of the variables, and a subset of the time periods. - For many analyses you'll use summary statistics: maximum blood pressure, difference of highest and lowest observed weight. Those are generally easier to calculate across SPSS 'cases' (records) than within them: hurray for AGGREGATE.

(Alas, some useful summary measures, like within-patient regression slope for weight, are very difficult with either organization, though I may be missing something.)

Hmmmm. That wasn't all that 'brief'.

It may sound academic, but in database design theory, the 'wide' organization is badly un-normalized. You don't have to make database design theorists happy, but they advocate normalization for a lot of good reasons, including those I've mentioned.


Back to: Top of message | Previous page | Main SPSSX-L page