Date: Thu, 28 Nov 2002 19:29:12 -0800
Reply-To: Bruce Bradbury <BruceBrad@INAME.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Bruce Bradbury <BruceBrad@INAME.COM>
Organization: http://groups.google.com/
Subject: Most efficient way to subset a large dataset
Content-Type: text/plain; charset=ISO-8859-1
What is the most efficient way to subset a large dataset using records
from another dataset as keys?
E.g., dataset SMALL contains a variable ID and is sorted by ID (one
record per ID). Dataset LARGE contains ID plus other variables, and is
also sorted by ID. It is very large: millions of observations and 100
variables. It contains multiple records for each ID. I want to extract
all records from LARGE that match a record in SMALL. I know I can
do this by using:
data EXTRACT;
  merge small (in=insmall) large (in=inlarge);
  by ID;
  /* keep only LARGE records whose ID also appears in SMALL */
  if insmall and inlarge;
run;
Is there a more efficient way of doing this? E.g., by using an index
and point operations? Is proc sql likely to be faster?
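For concreteness, the two alternatives asked about might look roughly
like the following. This is only a sketch under assumptions not stated
above: both datasets are in WORK, and LARGE carries a simple index on ID
(created here with PROC DATASETS). It relies on SMALL having one record
per ID, as described. No claim is made about which variant runs faster.

```sas
/* Alternative 1: PROC SQL with a subquery on SMALL. */
proc sql;
  create table extract_sql as
  select l.*
  from large as l
  where l.id in (select id from small);
quit;

/* Alternative 2: keyed lookup through an index on LARGE.        */
/* Assumption: a simple index on ID, built once before the step. */
proc datasets library=work nolist;
  modify large;
  index create id;
quit;

data extract_key;
  set small (keep=id);
  /* Read every LARGE record for the current ID via the index;   */
  /* _IORC_ becomes nonzero once no further match is found.      */
  do until (_iorc_ ne 0);
    set large key=id;
    if _iorc_ = 0 then output;
  end;
  _error_ = 0;  /* clear the error flag set by the final failed lookup */
run;
```

The keyed version reads LARGE only for the IDs present in SMALL, whereas
the merge and (depending on the plan chosen) the SQL version may scan
all of LARGE.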
Any comments welcome.
Bruce Bradbury
[please reply via group]