Date: Sat, 14 Jun 2008 23:29:42 GMT
Reply-To: Lou <lpogoda@VERIZON.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Lou <lpogoda@VERIZON.NET>
Subject: Re: proc append option force and variables left behind in the sand
"Aldi Kraja" <aldi@wustl.edu> wrote in message
news:48531C59.9040404@wustl.edu...
> Hi,
>
> I had the impression that if one had a base sas set A, with one variable
> k1 and I append a new set B with 3 variables that overlap the k1
> variable of set A, because of FORCE option I was expecting the new
> appended set A to have 3 variables. The example below shows that it is
> not the case. SAS drops any extra variable(s) compared to the base.
> Question: is there a solution to set data together without loosing
> variables? If I use SET statement is not economical because I have
> thousands of sets to be appended. I was hoping append was very quick,
> but I it surprised me to see so many variables left behind in the sand.
The documentation clearly states that variables not on the base dataset will
be dropped. There are a couple of potential problems with doing as you
suggest. One you've already stumbled over - a new variable won't show up in
the appended dataset. The other is that a given variable name could appear
on multiple datasets but have different attributes - different labels,
different lengths, and/or be a character type on one and numeric on another.
Unless you're certain that identically named variables will have the same
attributes on all datasets, I think you need to do some data analysis before
proceeding.
But let's ignore that aspect for the moment. In order to append everything
together, you need to construct a base dataset that has all the variables
(with the correct attributes) that you need in the result. I'd suggest
using PROC SQL to query the dictionary.columns table to get a list of every
variable in every dataset in your input library (or libraries). The list
needs to winnowed of course, since there will be a lot of duplicates - for
numeric variables, you can just pick any of the entries in the list to serve
as your model, but for character variables, you probably want to use one
with the longest length attribute. Use the list to write ATRRIB statements
for each variable, and create an empty base dataset. From there, you can
append at will.
The only alternative that comes to mind is to use a SET statement in a
datastep. As you say, that's not efficient, but your example log shows only
one observation in the appending dataset. If this is a one-shot deal and
you're talking about only a few thousand observations altogether, maybe you
can live with the inefficiency.
> Thanks, Aldi
>
> Example 1:
> NOTE: Appending P3T3.GP3D102C3 to OUT3.ZZZSTART3P3C3.
> WARNING: Variable ZZZ017972XXX was not found on BASE file. The variable
> will not be added to the BASE file.
> NOTE: FORCE is specified, so dropping/truncating will occur.
> NOTE: There were 1 observations read from the data set P3T3.GP3D102C3.
> NOTE: The data set OUT3.ZZZSTART3P3C3 has 2 observations and 69168
> variables.
>
> Simple Example 2 (see the warning in the log and the output when you run
> this program):
> =====================================
> data A;
> input k1;
> datalines;
> 12
> ; run;
> data B;
> input k1 k2 k3;
> datalines;
> 12 13 14
> ;
> run;
> proc append base=A data=B force;
> run;
> proc print data=A;
> title "A after append procedure missing variables k2 and k3";
> run;
>
>
> --
|