LISTSERV at the University of Georgia
Date:         Wed, 13 Dec 2006 19:39:45 +0100
Reply-To:     Rune Runnestø <rune@FASTLANE.NO>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Rune Runnestø <rune@FASTLANE.NO>
Subject:      Re: trying to fix a data file with bad quality
Comments: To: sas-l@uga.edu

Thanks a lot for the code adjustment. Now the first line of SAKSTITTEL and DOKTITTEL is preserved, but the second one is gone. Is it possible to preserve the second line as well? Either on its own data line, or by manipulating the text so that the final first line contains both the first and second original lines (concatenated)?
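
To make the concatenation idea concrete, here is a minimal sketch of what I have in mind. It is not a tested fix for the full program. It assumes that a continuation line always starts with a blank (as the ' ' = 12 entry in the informat suggests) and that a labelled line never does. The names line, held and done are made up for illustration, the data lines are a shortened sample, and the missing-label case from the first record is not handled here.

```sas
data _null_;
  /* eof= is used for end-of-data handling, as in the code quoted below */
  infile datalines truncover eof=done;
  length line held $200;
  retain held ' ';

  input line $char200.;

  /* Assumption: a non-blank line that starts with a blank continues the previous line */
  if line =: ' ' and not missing(line) then
    held = catx(' ', held, line);       /* append it to the line being held */
  else do;
    if not missing(held) then put held; /* write the finished line (to the log here) */
    held = line;                        /* start holding the new line */
  end;
  return;

done:
  if not missing(held) then put held;   /* flush the last held line */
  stop;
datalines;
SAKSTITTEL: This is the first line of the SAKSTITTEL field and
            this is the second line of the SAKSTITTEL field
DOKTITTEL:  This is the first line of the DOKTITTEL field and
            this is the second line of the DOKTITTEL field
SAKSANSV.:  XX/PM/JLH
;
run;
```

With a FILE statement added, the same logic would write the joined lines to a file instead of the log. The $200 length is a guess at what two 80-character lines plus a separator need.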

Regards, Rune

"Terjeson, Mark" <Mterjeson@RUSSELL.COM> wrote in message news:16FD64291482A34F995D2AF14A5C932C015A6EBD@MAIL002.prod.ds.russell.com...
> Hi Rune,
>
> Very clever and handy use of formats to have very
> few tasks (statements) achieving your goal. Cute.
>
> Two answers for you.
>
> 1) IF the missing SEK.KODE: in the third group
> was just an oversight when making the sample data,
> then your code works great once you have SEK.KODE:
> in the third group.
>
> 2) IF the missing SEK.KODE: in the third group
> is real data, then the following adjustments make
> your code work fine for missing SEK.KODE: and
> missing SAKSTITTEL:. If you encounter other labels
> that become missing, you will have to adjust again.
>
> Notably, testing for only the first occurrence of
> j eq 6 and i eq 12 avoids overwriting _r[7] with the second line.
>
> data _null_; * or 'data newfile' if one wants to create a data set;
>   file jn_ny;
>   infile datalines dlm = '~' firstobs = 2 eof = eof pad;
>   array _r[12] $130;
>   _r[7] = ''; * reset ;
>   do j = 1 by 1 until(_r[i] eq: '-');
>     input @1 i idx. @1 _r[i] $char80.;
>     if j eq 6 and i eq 12 and _r[7] eq '' then
>     do;
>       _r[7] = _r[i];
>       substr(_r[7],1,11) = putn(7,'idlbl.');
>     end;
>   end;
>   do i = 1 to dim(_r)-1;
>     if missing(_r[i]) then _r[i] = put(i,idlbl.);
>     put _r[i] $char140.;
>   end;
>   return;
>
> eof:
>   i = 1;
>   put i idlbl.;
>   stop;
>   return;
> datalines;
> ---------------------------
> SAK NR.: 1998/00047
> JOURN.DATO:
> ARKIV: 634.4
> SEK.KODE:
> AVS/MOT:
> This text should have company of a label 'SAKSTITTEL:'
> DOKTITTEL:
> SAKSANSV.: XX/ADM/NS
> GRAD:
> L.NR: 0
> ---------------------------
> SAK NR.: 1998/00009 - 1
> JOURN.DATO: 12.10.1998
> ARKIV: MIDLERTIDIG
> SEK.KODE:
> AVS/MOT:
> SAKSTITTEL: all labels are present in this record
> DOKTITTEL: and they occur on just one line each
> SAKSANSV.: XX/HAF/LKG
> GRAD:
> L.NR: 1998000011
> ---------------------------
> SAK NR.: 1998/00010 - 1
> JOURN.DATO: 12.10.1998
> ARKIV: 501
> AVS/MOT:
> SAKSTITTEL: This is the first line of the SAKSTITEL here and
> this is the second line of the SAKSTITTEL field
> DOKTITTEL: This is the first line of the DOKTITTEL field and
> this is the second line of the DOKTITTEL field
> SAKSANSV.: XX/PM/JLH
> GRAD:
> L.NR: 1998000012
> ---------------------------
> ;
> run;
>
> Hope this is helpful.
>
> Mark Terjeson
> Senior Programmer Analyst, IM&R
> Russell Investment Group
>
> Russell
> Global Leaders in Multi-Manager Investing
>
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Rune Runnestø
> Sent: Tuesday, December 12, 2006 12:46 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: trying to fix a data file with bad quality
>
> Hi,
> I am trying to repair a file with bad quality. The file has 3 records,
> separated by '----------'.
> The file has labels with their associated data values. The quality problem is
> that the label 'SAKSTITTEL:'
> is missing in the first record. The code I have made below fixes this
> problem, but it creates the problem of omitting one of the two lines
> which can occur for the fields SAKSTITTEL and DOKTITTEL.
> If the SAKSTITTEL: label is missing, I have observed that the
> corresponding data value is on just one line.
>
> I have wondered if a regular expression could help me out of this, but I
> haven't figured out how.
>
> Can anyone crack this nut?
>
> Regards, Rune
>
> proc format;
>   invalue idx
>     '-----------' =1
>     "SAK NR.:" =2
>     "JOURN.DATO:" =3
>     "ARKIV:" =4
>     "SEK.KODE:" =5
>     "AVS/MOT:" =6
>     "SAKSTITTEL:" =7
>     "DOKTITTEL:" =8
>     "SAKSANSV.:" =9
>     "GRAD:" =10
>     "L.NR:" =11
>     ' ' =12
>   ;
>   value idlbl
>     1='---------------------------'
>     2="SAK NR.:"
>     3="JOURN.DATO:"
>     4="ARKIV:"
>     5="SEK.KODE:"
>     6="AVS/MOT:"
>     7="SAKSTITTEL:"
>     8="DOKTITTEL:"
>     9="SAKSANSV.:"
>     10="GRAD:"
>     11="L.NR:"
>     12=' '
>   ;
> run;
>
> filename jn_ny "d:\myfile.txt";
> data _null_; * or 'data newfile' if one wants to create a data set;
>   file jn_ny;
>   infile datalines dlm = '~' firstobs = 2 eof = eof pad;
>   array _r[12] $130;
>   do j = 1 by 1 until(_r[i] eq: '-');
>     input @1 i idx. @1 _r[i] $char80.;
>     * put j= i= _r[i]=;
>     if j eq 6 and i eq 12 then do;
>       _r[7] = _r[i];
>       substr(_r[7],1,11) = putn(7,'idlbl.');
>     end;
>   end;
>   do i = 1 to dim(_r)-1;
>     if missing(_r[i]) then _r[i] = put(i,idlbl.);
>     put _r[i] $char140.;
>   end;
>   return;
> eof:
>   i = 1;
>   put i idlbl.;
>   stop;
>   return;
> datalines;
> ---------------------------
> SAK NR.: 1998/00047
> JOURN.DATO:
> ARKIV: 634.4
> SEK.KODE:
> AVS/MOT:
> This text should have company of a label 'SAKSTITTEL:'
> DOKTITTEL:
> SAKSANSV.: XX/ADM/NS
> GRAD:
> L.NR: 0
> ---------------------------
> SAK NR.: 1998/00009 - 1
> JOURN.DATO: 12.10.1998
> ARKIV: MIDLERTIDIG
> SEK.KODE:
> AVS/MOT:
> SAKSTITTEL: all labels are present in this record
> DOKTITTEL: and they occur on just one line each
> SAKSANSV.: XX/HAF/LKG
> GRAD:
> L.NR: 1998000011
> ---------------------------
> SAK NR.: 1998/00010 - 1
> JOURN.DATO: 12.10.1998
> ARKIV: 501
> AVS/MOT:
> SAKSTITTEL: This is the first line of the SAKSTITEL here and
> this is the second line of the SAKSTITTEL field
> DOKTITTEL: This is the first line of the DOKTITTEL field and
> this is the second line of the DOKTITTEL field
> SAKSANSV.: XX/PM/JLH
> GRAD:
> L.NR: 1998000012
> ---------------------------
> ;
> run;
>
> /*
> The code seems to fix the problem with the missing label in the first
> record, but it CREATES a problem in the third record by not keeping
> both lines of the SAKSTITTEL and DOKTITTEL values.
> This is part of the output (for the third record):
>
> ---------------------------
> SAK NR.: 1998/00010 - 1
> JOURN.DATO: 12.10.1998
> ARKIV: 501
> SEK.KODE:
> AVS/MOT:
> SAKSTITTEL: this is the second line of the SAKSTITTEL field
> DOKTITTEL: This is the first line of the DOKTITTEL field and
> SAKSANSV.: XX/PM/JLH
> GRAD:
> L.NR: 1998000012
> ---------------------------
> */

