LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2006, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 9 Nov 2006 11:48:31 -0800
Reply-To:     shiling99@YAHOO.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         shiling99@YAHOO.COM
Organization: http://groups.google.com
Subject:      Re: find and drop empty variables
Comments: To: sas-l@uga.edu
In-Reply-To:  <200611090925.kA93FHrr002750@malibu.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"

The probility for a USUAL variable with all missing values is very small. I think it is a good way to select a small sample from your LARGE data set (gigbytes). You can quickly find out a small list of variables with all missing value in your sample data. Then proc freq, proc mean against whole data set for these variables ONLY. This will save you a lot of time when you work on LARGE data set. Attached below is a sample pgm to illustrate the idea and you may modify for your work.

HTH

data t1; array nv(*) n1-n30; na=.; array cv(*) $ c1-c30; ca=' '; do i = 1 to 1000000; do j=1 to dim(nv); nv(j)=rannor(123); end; do j=1 to dim(cv); cv(j)=compress(put(rannor(123),3.)); if cv(j)='0' then cv(j)=' '; end; output; end; keep n: c:; run;

proc format library=work; invalue _c ' '=. other=1; run;

proc sql ; select '_'||trim(name)||'=input('||trim(name)||',_c.)' into : cvarlist separated by ';' from sashelp.vcolumn where libname='WORK' and memname='T1' and type='char' ; quit;

data t1v/view=t1v; do i=1 to nobs by 1000; set t1 point=i nobs=nobs; &cvarlist ; output; end; stop; ; run;

proc means data=t1v noprint; var _numeric_; output out=varlist(drop=_freq_ _type_) n=; run;

data _null_; set varlist; array allv(*) _numeric_; do i = 1 to dim(allv); vname=vname(allv(i)); if substr(vname,1,1)='_' then vname=substr(vname,2); if allv(i)=0 then put 'var with all missing value in 1/1000 sample --->' vname; end; run;

Alexander Norderwind wrote: > I have the problem that a big database I import to SAS with proc sql and > ODBC has about thousand variables (fields) and lots of them are empty all over. > Thus I would like dropping them to get rid of this ballast. > Therefore I first have to find out, which of them really are empty. > Does anybody have a suggestion how to solve my problem? > Would be nice! > Xander


Back to: Top of message | Previous page | Main SAS-L page