LISTSERV at the University of Georgia
Date:   Wed, 15 Jun 2005 18:21:18 -0400
Reply-To:   Dwyer Ted <DWYERT@pcsb.org>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Dwyer Ted <DWYERT@pcsb.org>
Subject:   Re: embedded codes in my Data problems
Content-Type:   text/plain; charset="iso-8859-1"

Marta,

Thank you, your solution (Metapad) worked.

The "official counts" and the resulting counts after I opened the file in Metapad and saved it were consistent. I will be scrutinizing the data more closely tomorrow; however, the program seems to have successfully stripped the offending characters.

Ted

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Marta García-Granero
Sent: Wednesday, June 15, 2005 11:03 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: embedded codes in my Data problems

Hi Ted

I had a somewhat related problem a long time ago (exporting Amiga documents to a PC with Windows). A lot of control codes (ASCII values under 32) were embedded in the text documents. We wrote a tiny BASIC program that read the files sequentially and replaced any byte under 32 with a " " (ASCII code 32). Then we were able to read the text files with Word (and add all the lost formatting again). I have lost that program, but I don't think it will be difficult to write. I wasn't an expert then (nor am I now), but it took me less than half an hour (with a GWBASIC manual at hand) to write it. In pseudocode, it went more or less like this:

- Ask for the input filename & the output filename
- OPEN first filename as INPUT and 2nd as OUTPUT
- WHILE not EOF
  - READ a byte from the first file
  - IF the value was under 32, replace it by 32 (a blank)
  - WRITE the byte in 2nd file
- WEND.
- CLOSE both files
- END program.
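For anyone without a BASIC interpreter at hand, the steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not the lost program: the function names and the chunk size are made up for illustration, and it makes one assumption the pseudocode does not, namely that tab, line feed, and carriage return are kept so that the record structure of a data file survives. Pass keep=b"" to blank every byte under 32, as the pseudocode does.

```python
# Assumption: tab, LF and CR are preserved by default so records stay on
# separate lines; the pseudocode replaced every byte under 32.
KEEP = b"\t\n\r"


def build_table(keep=KEEP):
    """Build a 256-entry table mapping control bytes (< 32) to a blank (32)."""
    return bytes(32 if (b < 32 and b not in keep) else b for b in range(256))


def clean_bytes(data, keep=KEEP):
    """Return a copy of data with control bytes replaced by blanks."""
    return data.translate(build_table(keep))


def clean_file(src_path, dst_path, keep=KEEP, chunk_size=1 << 20):
    """Copy src_path to dst_path in chunks, blanking control bytes."""
    table = build_table(keep)
    # Binary mode, so no byte is interpreted as an end-of-file marker.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty read means end of file
                break
            dst.write(chunk.translate(table))
```

A call such as clean_file("raw.dat", "clean.dat") (hypothetical file names) would write the cleaned copy next to the original. Reading in chunks keeps memory use flat, so files with millions of records should not be a problem.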

You can also try METAPAD. It's a sort of Notepad program, but it can read larger files and automatically eliminates codes it can't translate to characters (it issues a warning about non-readable characters and nulls).

It can be downloaded from: http://liquidninja.com/metapad/

HTH,
Marta
mailto:biostatistics@terra.es

DT> I have multiple large data files sometimes with millions of records but
DT> usually with only about 100K+ or so.

DT> Sometimes (and with a recent alarming increase) they have command codes
DT> embedded that SPSS sees as end of file or end of record commands.

DT> When I look at the file with a text editor I can see nothing
DT> When I look with a hex editor there are a variety of different codes.

DT> The only way that I can go through and eliminate the codes is using the
DT> hex editor which is a painstaking process which I would like to avoid.

DT> Does anyone know a method of stripping out embedded codes?

DT> The files are too big for excel. (Excel has the clean command which has
DT> worked in the past but only for the smaller of my datasets.)

DT> Access has trouble with the files as well.

