Date: Mon, 24 Jan 2005 10:39:01 -0800
Reply-To: Elaine Pierce <namaste3@SBCGLOBAL.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Elaine Pierce <namaste3@SBCGLOBAL.NET>
Subject: Basic SAS questions
Hello, I'm an MPH grad student at a university with no SAS support.
While we did have a lab on statistical analysis, none of the preliminary
dataset cleaning, formatting, or merging was covered, and I am unable to
get my programs to work despite referring to books and the online
help. So I hope that someone has the time to assist me with these
novice questions! I'm using SAS version 8.
First question: can someone tell me why my PROC CONTENTS output makes
no sense? The # column is in the correct order, but the "Pos" column
looks wrong. The data come from an Excel file that I imported as a
.csv file. When I look at the .csv file, the column order matches #
but not Pos. Here's the PROC CONTENTS output. Note how variable 1
("id") begins at position 80, variable 5 ("vc") begins at position 0,
etc.
-----Alphabetic List of Variables and Attributes-----
 #    Variable           Type    Len    Pos
-------------------------------------------
 8    cigsperdayunmod    Num       8     16
 4    dob                Char      8    104
19    fcigsperday        Num       8     56
 6    fev1               Num       8      8
16    ffev1              Num       8     48
20    fformersmok        Char      8    168
21    fht                Num       8     64
18    flungca            Char      8    160
 9    formersmok         Char      8    120
23    frace              Char      8    176
14    fsex               Char      8    144
13    ftestdate          Char      8    136
15    fvc                Num       8     40
22    fwt                Num       8     72
10    ht                 Num       8     24
 1    id                 Char      8     80
 7    lungcancer         Char      8    112
12    raceunmod          Char      8    128
 3    sexunmod           Char      8     96
 2    testdate           Char      8     88
 5    vc                 Num       8      0
11    wt                 Num       8     32
17    zipcode            Char      8    152
Second question: I can't figure out how to get the date formatting to
work. Here's my program, and it keeps giving me error messages.
(I actually have 3 date variables in this dataset, but here I applied
the date informat to only one of them. Using it for all 3 just produces
more error messages.)
data herman;
infile 'C:\Documents and Settings\user\Desktop\ExpNoLabels.csv'
delimiter=',' missover;
input id $ testdate MMDDYY10. sexunmod $ dob $ vc fev1 lungcancer $
cigsperdayunmod formersmok $ ht wt raceunmod $ ftestdate $ fsex $
fvc ffev1 zipcode $ flungca $ fcigsperday fformersmok $ fht fwt
frace $;
LOG MESSAGE FOLLOWS:
9    data herman;
10
11   infile 'C:\Documents and Settings\user\Desktop\ExpNoLabels.csv' delimiter=',' missover;
12   input id $ testdate mmddyy10. sexunmod $ dob $ vc fev1 lungcancer $
13   cigsperdayunmod formersmok $ ht wt raceunmod $ ftestdate $ fsex $ fvc ffev1 zipcode $
13 ! flungca $ fcigsperday fformersmok $ fht fwt frace $;
14
NOTE: The infile 'C:\Documents and Settings\user\Desktop\ExpNoLabels.csv' is:
      File Name=C:\Documents and Settings\user\Desktop\ExpNoLabels.csv,
      RECFM=V,LRECL=256
NOTE: Invalid data for testdate in line 2 3-12.
NOTE: Invalid data for fev1 in line 2 34-34.
NOTE: Invalid data for cigsperdayunmod in line 2 38-38.
NOTE: Invalid data for wt in line 2 47-47.
NOTE: Invalid data for fcigsperday in line 2 81-81.
NOTE: Invalid data for fwt in line 2 90-90.
RULE:     ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
2         2,2/7/2002,F,5/20/1933,3150,3400,N,0,Y,63,122,C,1/30/2004,F,2780,1960,92108,N,0,Y,63,12
      88  4,C 90
id=2 testdate=. sexunmod=5/20/193 dob=3150 vc=3400 fev1=. lungcancer=0 cigsperdayunmod=.
formersmok=63 ht=122 wt=. raceunmod=1/30/200 ftestdate=F fsex=2780 fvc=1960 ffev1=92108 zipcode=N
flungca=0 fcigsperday=. fformersmok=63 fht=124 fwt=. frace= _ERROR_=1 _N_=2
NOTE: Invalid data for testdate in line 3 3-12.
3
MUCH MORE OF THE SAME OMITTED HERE...
flungca=N fcigsperday=0 fformersmok=Y fht=63 fwt=138 frace=C _ERROR_=1 _N_=26
NOTE: 696 records were read from the infile 'C:\Documents and Settings\user\Desktop\ExpNoLabels.csv'.
      The minimum record length was 89.
      The maximum record length was 102.
NOTE: The data set WORK.HERMAN has 696 observations and 23 variables.
NOTE: DATA statement used:
real time 0.53 seconds
cpu time 0.24 seconds
15 proc contents;
16 run;
When I omit the mmddyy10. after variable “testdate”, all these error
messages disappear. So what is the correct way to read dates with list
input, where the INPUT statement simply names the variables in order?
(Note that I didn’t want to switch to an INPUT statement where each
variable is listed at its column position, since the column positions
in my PROC CONTENTS output look wrong. See question
1.)
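For reference, one documented option for list input is the colon (:) format modifier, which tells SAS to read a field with the given informat only up to the next delimiter rather than for a fixed 10 columns. The log above is consistent with that being the issue: testdate was read from columns 3-12, which swallowed the comma and the "F" that follow the date. The following is only a minimal sketch, not a tested fix. It assumes that testdate, dob, and ftestdate are the three date variables (the post does not say which ones they are), and the FORMAT statement is an added assumption so that the dates print as dates.

```sas
data herman;
  infile 'C:\Documents and Settings\user\Desktop\ExpNoLabels.csv'
         delimiter=',' missover;
  /* The colon before each informat keeps list-input behavior:    */
  /* the field ends at the next comma instead of after 10 columns */
  input id $ testdate :mmddyy10. sexunmod $ dob :mmddyy10. vc fev1
        lungcancer $ cigsperdayunmod formersmok $ ht wt raceunmod $
        ftestdate :mmddyy10. fsex $ fvc ffev1 zipcode $ flungca $
        fcigsperday fformersmok $ fht fwt frace $;
  /* Assumption: show the stored SAS date values as mm/dd/yyyy */
  format testdate dob ftestdate mmddyy10.;
run;
```

If the file can contain empty fields (two commas in a row), adding the DSD option to the INFILE statement may also be worth considering, since without it consecutive delimiters are treated as one.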
Question 3:
When I use the following import procedure (on an Excel file from which
the variable names had already been removed, so that the actual data
begin on the first row), the variable names in the resulting data set
are taken from the first row's data values, even though I specified
“GETNAMES=NO”. Any idea what to do about this?
PROC IMPORT OUT= WORK.herman
     DATAFILE= "C:\Documents and Settings\user\Desktop\ExpNoLabels.xls"
     DBMS=EXCEL2000 REPLACE;
     GETNAMES=NO;
RUN;
Thanks for the help! I feel like a complete bonehead since I can’t even
get to the analysis part with these problems in the way.
Elaine Pierce