In article <rcs8.121.009B09BF@psu.edu>, firstname.lastname@example.org (Roger C. Shouse)
>Subject: Re: How to delete a bunch of observations?
>From: email@example.com (Roger C. Shouse)
>Date: Tue, 28 Jan 1997 13:48:27 UNDEFINED
>In article <firstname.lastname@example.org> email@example.com
>(Liming Cai) writes:
>>From: firstname.lastname@example.org (Liming Cai)
>>Subject: How to delete a bunch of observations?
>>Date: 28 Jan 1997 17:36:59 GMT
>>I have a dataset of N countries, each over 30 years, from 1960 to 1989. One
>>of the interesting variables is GDP. However, for some countries, GDP was
>>not available in 1960. What I would like to do is to delete those
>>countries, their entire history of 30 years, if they don't have GDP data
>>in 1960. Can anybody help me with that? Thank you very much.
>Depending on how you've specified your missing data, it may be as simple as:
>data <newdata>;set <olddata>;
>if GDP = . then delete;
>assuming here that missing values are represented by '.'
That will delete only the individual observations where GDP is missing. It
will NOT delete all of the observations for a country that has a missing GDP
in at least one year.
There are a couple of ways to tackle the task. If you want to use
procedural code, you could do something like:
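PROC SORT DATA = COUNTRIES OUT = SORTED;
  BY COUNTRY GDP;  /* missing GDP sorts first within each country */
RUN;
DATA NOGDP (KEEP = COUNTRY);  /* countries with at least one missing GDP */
  SET SORTED;
  BY COUNTRY;  /* needed to create FIRST.COUNTRY */
  IF FIRST.COUNTRY AND GDP = .;
RUN;
DATA GDPONLY;  /* keep countries that never appear in NOGDP */
  MERGE SORTED (IN = A) NOGDP (IN = B);
  BY COUNTRY;
  IF A AND NOT B;
RUN;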
Alternatively, you could take an SQL approach:
PROC SQL;
  CREATE TABLE GDPONLY AS SELECT * FROM COUNTRIES
    WHERE COUNTRY NOT IN (SELECT DISTINCT COUNTRY FROM COUNTRIES
                          WHERE GDP IS MISSING);
QUIT;
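Both approaches drop a country with a missing GDP in any year, not just 1960. If only the 1960 value matters, as in the original question, a variant along these lines should work. It is a sketch that assumes the year is stored in a numeric variable called YEAR and writes to a table called GDP1960 (both hypothetical names; use whatever your data has). It keeps only countries that have a 1960 observation with a non-missing GDP, so a country with no 1960 record at all is dropped as well:

```sas
PROC SQL;
  /* YEAR and GDP1960 are assumed names; adjust to match your data */
  /* Keep every year for countries whose 1960 GDP is present       */
  CREATE TABLE GDP1960 AS SELECT * FROM COUNTRIES
    WHERE COUNTRY IN (SELECT COUNTRY FROM COUNTRIES
                      WHERE YEAR = 1960 AND GDP IS NOT MISSING);
QUIT;
```

Using IN with a positive condition, rather than NOT IN with a missing-GDP condition, is what makes countries without any 1960 row fall out too.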