| Date: | Wed, 15 Jun 2005 18:21:18 -0400 |
| Reply-To: | Dwyer Ted <DWYERT@pcsb.org> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | Dwyer Ted <DWYERT@pcsb.org> |
| Subject: | Re: embedded codes in my Data problems |
| Content-Type: | text/plain; charset="iso-8859-1" |
Marta,
Thank you, your solution (metapad) worked.
The "official counts" and the counts after I opened the file in metapad and saved it were consistent. I will be scrutinizing the data more closely tomorrow; however, the program seems to have successfully stripped the offending characters.
Ted
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Marta García-Granero
Sent: Wednesday, June 15, 2005 11:03 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: embedded codes in my Data problems
Hi Ted
I had a somewhat related problem a long time ago (exporting Amiga
documents to a PC with Windows). A lot of control codes (ASCII values
under 32) were embedded in the text documents. We wrote a tiny BASIC
program that read the files sequentially and replaced any byte under
32 with a " " (ASCII code 32). Then we were able to read the text
files with Word (and add all the lost formatting again). I have lost that
program, but I don't think it will be difficult to write. I wasn't an
expert then (nor am I now), but it took me less than half an hour (with a
GWBASIC manual at hand) to write it. In pseudocode, it went more or
less like this:
- Ask for the input filename & the output filename
- OPEN first filename as INPUT and 2nd as OUTPUT
- WHILE not EOF
  - READ a byte from the first file
  - IF the value was under 32, replace it by 32 (a blank)
  - WRITE the byte in 2nd file
- WEND.
- CLOSE both files
- END program.
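For anyone without a BASIC interpreter at hand, the steps above could be sketched in Python roughly as follows. This is only an illustration, not the lost program. It makes one assumption that the pseudocode does not: tab, line feed and carriage return (codes 9, 10 and 13) are left alone, because blanking them would also remove the record boundaries in a data file. The names `strip_control_bytes`, `clean_file`, `KEEP` and `TABLE` are made up for this sketch.

```python
import io

# Assumption (not in the pseudocode above): keep tab, LF and CR so that
# record boundaries in a data file survive the cleanup.
KEEP = frozenset(b"\t\n\r")

# 256-entry lookup table: every byte under 32 that is not in KEEP becomes
# a blank (32); all other bytes map to themselves.
TABLE = bytes(32 if (b < 32 and b not in KEEP) else b for b in range(256))


def strip_control_bytes(src, dst, chunk_size=1 << 20):
    """Copy binary stream src to dst, blanking unwanted control bytes.

    The data is read in chunks, so the whole file never has to fit in memory.
    """
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty read means end of file
            break
        dst.write(chunk.translate(TABLE))


def clean_file(in_path, out_path):
    """Open both files in binary mode and run the filter over them."""
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        strip_control_bytes(src, dst)


# Small in-memory demonstration: a NUL (0) and a Ctrl-Z (26) are blanked,
# while the tab and the CR/LF pair are kept.
demo_in = io.BytesIO(b"id\tname\x00\x1a\r\n")
demo_out = io.BytesIO()
strip_control_bytes(demo_in, demo_out)
print(demo_out.getvalue())  # b'id\tname  \r\n'
```

On real data it would be called as clean_file("raw.dat", "clean.dat"), where both file names are placeholders. Opening the files in binary mode matters here: in text mode some platforms treat code 26 (Ctrl-Z) as an end-of-file marker, which may be the kind of code that truncates the read in the first place.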
You can also try METAPAD. It's a sort of Notepad program, but it's
able to read larger files, and it automatically eliminates codes it
can't translate to characters (it issues a warning about non-readable
characters and nulls).
It can be downloaded from: http://liquidninja.com/metapad/
HTH,
Marta mailto:biostatistics@terra.es
DT> I have multiple large data files sometimes with millions of records but
DT> usually with only about 100K+ or so.
DT> Sometimes (and with a recent alarming increase) they have command codes
DT> embedded that SPSS sees as end of file or end of record commands.
DT> When I look at the file with a text editor I can see nothing
DT> When I look with a hex editor there are a variety of different codes.
DT> The only way that I can go through and eliminate the codes is using the
DT> hex editor which is a painstaking process which I would like to avoid.
DT> Does anyone know a method of stripping out embedded codes?
DT> The files are too big for excel. (Excel has the clean command which has
DT> worked in the past but only for the smaller of my datasets.)
DT> Access has trouble with the files as well.