In article <rcs8.121.009B09BF@psu.edu>, rcs8@psu.edu (Roger C. Shouse)
writes:
>Subject: Re: How to delete a bunch of observations?
>From: rcs8@psu.edu (Roger C. Shouse)
>Date: Tue, 28 Jan 1997 13:48:27 UNDEFINED
>
>In article <5cldfr$2fg@charm.magnus.acs.ohio-state.edu> cai.23@osu.edu
>(Liming Cai) writes:
>>From: cai.23@osu.edu (Liming Cai)
>>Subject: How to delete a bunch of observations?
>>Date: 28 Jan 1997 17:36:59 GMT
>
>>Hi,
>
>
>>I have a dataset of N countries, each over 30 years, from 1960 to 1989. One
>>of the interesting variables is GDP. However, for some countries, GDP was
>>not available in 1960. What I would like to do is to delete those
>>countries, their entire history of 30 years, if they don't have a GDP data
>>in 1960. Anybody can help me on that? Thank you very much.
>
>Depending on how you've specified your missing data, it may be as simple
>as:
>
>data <newdata>;set <olddata>;
>if GDP = . then delete;
>
>assuming here that missing values are represented by '.'
>
>
That will delete observations where the GDP is missing. It will NOT
delete all observations for a country where at least one observation for
that country has a missing GDP.
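To make the difference concrete, here is a minimal sketch on a small made-up data set. The data set COUNTRIES and the variables COUNTRY, YEAR and GDP are assumptions for illustration only. The row-by-row delete removes country B's 1960 row but leaves its 1961 row behind:

```sas
/* Hypothetical sample data; COUNTRY, YEAR and GDP are assumed names */
DATA COUNTRIES;
  INPUT COUNTRY $ YEAR GDP;
  DATALINES;
A 1960 100
A 1961 110
B 1960 .
B 1961 95
;
RUN;

/* Row-by-row deletion: drops only the single row where GDP is missing */
DATA ROWWISE;
  SET COUNTRIES;
  IF GDP = . THEN DELETE;
RUN;
/* ROWWISE still holds A 1960, A 1961 and B 1961, so B is only partly removed */
```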
There are a couple of ways to tackle the task. If you want to use
procedural code, you could do something like:
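PROC SORT DATA = COUNTRIES
          OUT = SORTED;
BY COUNTRY GDP;               /* missing GDP sorts first within a country */
RUN;
DATA NOGDP (KEEP = COUNTRY);  /* one row per country with a missing GDP */
SET SORTED;
BY COUNTRY;
IF FIRST.COUNTRY AND GDP = .;
RUN;
DATA GDPONLY;
MERGE SORTED (IN = A) NOGDP (IN = B);  /* SORTED is in COUNTRY order */
BY COUNTRY;
IF A AND NOT B;               /* keep countries not listed in NOGDP */
RUN;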
Alternatively, you could take an SQL approach:
PROC SQL;
CREATE TABLE GDPONLY AS
SELECT *
FROM COUNTRIES
WHERE COUNTRY NOT IN (SELECT DISTINCT COUNTRY
FROM COUNTRIES
WHERE GDP IS MISSING);
QUIT;
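Both approaches above drop a country if GDP is missing in any year, while the original question only asked about 1960. A possible 1960-only variant is sketched below, assuming the year is held in a numeric variable called YEAR (a hypothetical name, as is the sample data). It keeps a country only if it has a 1960 row with a non-missing GDP, so it also drops countries that have no 1960 row at all:

```sas
/* Hypothetical sample data; COUNTRY, YEAR and GDP are assumed names */
DATA COUNTRIES;
  INPUT COUNTRY $ YEAR GDP;
  DATALINES;
A 1960 100
A 1961 110
B 1960 .
B 1961 95
C 1961 80
;
RUN;

/* Keep every year for countries whose 1960 GDP is present */
PROC SQL;
  CREATE TABLE GDP1960 AS
  SELECT *
  FROM COUNTRIES
  WHERE COUNTRY IN (SELECT COUNTRY
                    FROM COUNTRIES
                    WHERE YEAR = 1960 AND GDP IS NOT MISSING)
  ORDER BY COUNTRY, YEAR;
QUIT;
/* GDP1960 keeps only country A; B (missing 1960 GDP) and C (no 1960 row) go */
```

Testing IN against the countries to keep, rather than NOT IN against the ones to drop, is what catches countries with no 1960 record.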