Date: Sun, 6 Mar 2011 10:01:08 -0500
Reply-To: Arthur Tabachneck <art297@ROGERS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Arthur Tabachneck <art297@ROGERS.COM>
Subject: Re: how to treat consecutive delimiters as one when importing txt
files
Content-Type: text/plain; charset=ISO-8859-1
It's too early. I, of course, meant to say:
>dropping that combined with your use of the MISSOVER option on your infile
>statement, may have corrected the problems.
not
>dropping that combined with your use of the missing option on your infile
>statement, may have corrected the problems.
Art
-------
On Sun, 6 Mar 2011 09:52:03 -0500, Arthur Tabachneck <art297@ROGERS.COM>
wrote:
>Nuria,
>
>I'm still waiting to hear from others who are more familiar with how SAS
>treats a mixed combination of spaces and tab characters. However, while I
>only manually looked at a couple of your records, I did misspeak a little.
>
>Of course, you don't only have six variables. You have 4 numeric
variables,
>followed by a date, and then followed by 26 more numeric variables. And,
>yes, every field is separated by a tab, but may also be separated by spaces
>as well.
>
>I did, though, see some extra problems that you will have to account for.
>You don't only have two hyphens representing missing values but, rather,
two
>hyphens followed by two tab characters (at least in the first record).
That
>seemed to how true for every variable except for the last variable. It had
>2 hyphens, followed by 3 spaces, a tab character, and then a carriage
return
>and line feed.
>
>If tabs are treated as spaces when one doesn't specify dlm, then your
>dropping that combined with your use of the missing option on your infile
>statement, may have corrected the problems.
>
>I, though, would definitely hand inspect at least some randomly selected
>records, from at least some of your files, to gain some validation that the
>data is being read as anticipated.
>
>Art
>--------
>On Sun, 6 Mar 2011 03:40:11 -0800, =?iso-8859-1?Q?N=FAria_Chapinal?=
><nchapinal@YAHOO.COM> wrote:
>
>>I hadn't realized that some fields where separated by just spaces and no
>tabs. The file looks good to me as it is now, but I could be glossing over
>some flaws, indeed.... Is there any way to specified the delimiter as
either
>spaces or tabs, so that if in any instance there�are spaces only
separating
>columns, those�2 columns are not merged in the SAS dataset?
>>�
>>thanks!
>>
>>
>>�
>>N�ria Chapinal
>>Postdoctoral fellow
>>Animal Welfare Program, University of British Columbia
>>Department of Population Medicine, University of Guelph
>>
>>
>>
>>--- On Sat, 3/5/11, Arthur Tabachneck <art297@ROGERS.COM> wrote:
>>
>>
>>From: Arthur Tabachneck <art297@ROGERS.COM>
>>Subject: Re: how to treat consecutive delimiters as one when importing txt
>files
>>To: SAS-L@LISTSERV.UGA.EDU
>>Received: Saturday, March 5, 2011, 11:06 PM
>>
>>
>>I have seen a sample of Nuria's raw data, but can't find anything specific
>>in the documentation.
>>
>>Nuria has six fields, four numeric, followed by a date, and then one
>numeric
>>field.� Actual missing data are represented by two hyphens.
>>
>>Each field is delimited by either one or more spaces, one or more spaces
>and
>>a tab character, or just a tab character.
>>
>>I was going to suggest the infile extendtabs option, but then Nuria
>>mentioned that removing the dsd option solved the problem.
>>
>>Her code was:
>>
>>filename FT46F001 'C:\Users\nuria\Desktop\am\fc\*.txt' lrecl=256;
>>data fresh;
>>� length fname $256;
>>� infile FT46F001 /*dsd*/ dlm='09'x eov=eov filename=fname noexpandtabs
>>missover;
>>� input @;
>>� if _n_ eq 1 or eov then do;
>>� � *filename=fname;
>>� � date = input(scan(fname,-2,' '),mmddyy10.);
>>� � retain filename date;
>>� � eov = 0;
>>� � input #5;
>>� � delete;
>>� end;
>>� input (index cow lact group)(:??) cdate :mmddyy10. (dim c1-c26 )(:??);
>>� format date cdate mmddyy10.;
>>run;
>>
>>In the case where she is using noexpandtabs, are tabs considered the same
>as
>>blanks?
>>
>>Art
>>------
>>On Sat, 5 Mar 2011 18:31:55 -0600, Data _null_; <iebupdte@GMAIL.COM>
wrote:
>>
>>>Nat,
>>>
>>>Nuria, is use ?? in input statement to suppress the invalid data
>>>reports.� From what I now understand use DLM without DSD should solve
>>>the problem.
>>>
>>>Nuria,� you might what to investigate just which lines have two tabs
>>>in a row.� Something like this untested code.
>>>
>>>input;
>>>if index(_infile_,'0909'x) then list;
>>>
>>>On Sat, Mar 5, 2011 at 6:19 PM, Nat Wooding <nathani@verizon.net> wrote:
>>>> Nuria
>>>>
>>>> If SAS is seeing -- in a numeric field and turns it into a decimal, you
>>>> should see a note saying that there was invalid data between column ...
>>and
>>>> column ...
>>>>
>>>> Could you supply a few sample lines of input data and show what the
>>result
>>>> should look like?
>>>>
>>>> Nat Wooding
>>>>
>>>> -----Original Message-----
>>>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>>N�ria
>>>> Chapinal
>>>> Sent: Saturday, March 05, 2011 1:49 PM
>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>> Subject: Re: how to treat consecutive delimiters as one when importing
>>txt
>>>> files
>>>>
>>>>
>>>>
>>>> I deleted the DSD option and it looks like it is working fine. I do
>thave
>>>> missing values, but they appear as '--' in the txt files and then as
>>periods
>>>> in the final SAS dataset. So I guess I can use that to differentiate
>them
>>>> from 2 consecutive tabulations ,right? I think it is what the code is
>>doing
>>>> right now.� Is there any specific piece in my code that is specifying
>>>> "change -- into ."? I do not think so. Is it because I am specifying
>>>> numerical format and then anything that is not numerical becomes a .?
>>>> I am still worried that missing values and consecutive tabulations can
>>get
>>>> mixed up. Can we add something else to make use of the -- and avoid
>>>> confusion?
>>>>
>>>>
>>>>
>>>> N�ria Chapinal
>>>> Postdoctoral fellow
>>>> Animal Welfare Program, University of British Columbia
>>>> Department of Population Medicine, University of Guelph
>>>>
>>>>
>>>>
>>>> --- On Sat, 3/5/11, Arthur Tabachneck <art297@ROGERS.COM> wrote:
>>>>
>>>>
>>>> From: Arthur Tabachneck <art297@ROGERS.COM>
>>>> Subject: Re: how to treat consecutive delimiters as one when importing
>>txt
>>>> files
>>>> To: SAS-L@LISTSERV.UGA.EDU
>>>> Received: Saturday, March 5, 2011, 1:37 PM
>>>>
>>>>
>>>> Nuria,
>>>>
>>>> Ignoring multiple delimiters also implies that you don't have any
>missing
>>>> values.� If you don't have any missing values, then you could probably
>do
>>>> what you want by including code to change the tabs into blanks and then
>>use
>>>> the & modifier.
>>>>
>>>> Art
>>>> --------
>>>> On Sat, 5 Mar 2011 13:17:35 -0500, Nuria Chapinal <nchapinal@YAHOO.COM>
>>>> wrote:
>>>>
>>>>>On Tue, 1 Mar 2011 11:54:33 -0500, Nuria Chapinal <nchapinal@YAHOO.COM>
>>>>>wrote:
>>>>>
>>>>>>Thanks a lot for all your help.
>>>>>>
>>>>>>This is one of the codes for one of the batches:
>>>>>>
>>>>>>
>>>>>>filename FT46F001 'C:\Users\nuria\Desktop\am\fc\*.txt' lrecl=256;
>>>>>>data fresh;
>>>>>>���length fname $256;
>>>>>>���infile FT46F001 dsd dlm='09'x eov=eov filename=fname noexpandtabs
>>>>>missover;
>>>>>>���input @;
>>>>>>���if _n_ eq 1 or eov then do;
>>>>>>� � � *filename=fname;
>>>>>>� � � date = input(scan(fname,-2,' '),mmddyy10.);
>>>>>>� � � retain filename date;
>>>>>>� � � eov = 0;
>>>>>>� � � input #5;
>>>>>>� � � delete;
>>>>>>� � � end;
>>>>>>input (index cow lact group)(:??) cdate :mmddyy10. (dim c1-c26 )(:??);
>>>>>>format date cdate mmddyy10.;
>>>>>>���run;
>>>>>
>>>>>Hi,
>>>>>
>>>>>I ran into a small trouble when using the codes above. Some times there
>>>>>are 2 tabulations in a row separating columns. I would need an option
>>that
>>>>>did the same as the option "treat consecutive delimiters as one" in
>>>>>Excel...
>>>>>
>>>>>Thanks in advance!
>>>>
>>
|