Date: Wed, 10 Jan 2007 23:22:53 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Copy of dataset corrupted with OS tools
In-Reply-To: <1168444588.948987.48440@o58g2000hsb.googlegroups.com>
Content-Type: text/plain; format=flowed
rolandberry@HOTMAIL.COM replied:
>
>xav wrote:
> > Hello Ed;
> >
> > If the return code ($?) of my cp command is equal to 0,
> > I assume that the cp command succeeded.
> >
> > $> cp toto.sas7bdat titi.sas7bdat
> > $> echo $?
> >
> > If it's an "important" dataset, I use "proc compare" too.
> >
> > Xavier
> >
> >
> >
> > Ed Notari wrote:
> > > Hello folks,
> > >
> > > We recently had a moderately large dataset that corrupted during a
> > > copy. We run an Alpha system with Tru64 UNIX v5.1A. The dataset was
> > > just created on a RAID 5 volume and was copied ("cp" command) to
> > > another RAID 5 volume on the same SAN.
> > >
> > > The corruption was insidious (19 records out of 80,000,000) and
> > > clustered, as far as I can tell. The byte sizes of the files were
> > > identical, and a simple PROC CONTENTS doesn't show anything odd (no
> > > surprise). None of the logs (binary.errlog, etc) showed any odd
> > > behavior during the time the file was transferred or afterward.
> > >
> > > The upshot of all of this is:
> > >
> > > 1) What are SAS folk using to assure that copying datasets has worked?
> > > 2) Do I need to "touch" each record to verify the starting dataset is
> > > the same as the ending dataset?
> > >
> > >
> > >
> > >
> > > Ed Notari
> > > Transmissible Diseases Department
> > > Jerome H. Holland Laboratory
> > > American Red Cross
>
>This is very worrying for people handling clinical data on a
>Unix/Linux/AIX platform and those who do so should TAKE CAREFUL NOTE of
>this problem. If it is not practical to do a "proc compare" for every
>dataset copied then perhaps, at the very least, the "sum" command could
>be used within the script doing the copying to ensure the checksums of
>the original and copied file are the same.
>
>If you are in this situation, it would be a very good idea to forward
>this email up your line management so that they are aware of the
>problem and can ensure a technical solution is put in place to prevent
>a bad copy from ever going undetected.
>
>Roland
I'd go farther than that.
This should be very worrying for people handling *any* valuable data,
on *any* platform. The bigger the platform, the more tools there
are in place to address this kind of problem.
Of course, for those people keeping their critical data in Excel
spreadsheets, there are bigger things to worry about first.
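
For what it's worth, Roland's checksum suggestion above could be scripted
roughly as follows. This is only a sketch under some assumptions: the
function name verified_copy is made up for illustration, and it uses the
POSIX cksum utility rather than sum, because cksum's output format is
standardized. A matching checksum only shows that both files read back the
same at that moment; it does not prove the source was good to begin with.

```shell
#!/bin/sh
# verified_copy SRC DEST
# Hypothetical helper (not from the thread): copy SRC to DEST with cp,
# then compare the cksum output of both files.
# Returns 0 on a verified copy, 1 if cp or cksum fails, 2 on a mismatch.
verified_copy() {
    src=$1
    dest=$2

    cp "$src" "$dest" || return 1

    # Reading from stdin keeps the file name out of cksum's output, so the
    # two results (CRC and byte count) can be compared as plain strings.
    src_sum=$(cksum < "$src") || return 1
    dest_sum=$(cksum < "$dest") || return 1

    if [ "$src_sum" != "$dest_sum" ]; then
        echo "checksum mismatch: $src -> $dest" >&2
        return 2
    fi
    return 0
}
```

In a copy script this might be called as
verified_copy toto.sas7bdat titi.sas7bdat || exit 1
so that a bad copy stops the job instead of passing silently.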
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330