Date: Fri, 21 Jun 2002 09:35:33 +0200
Reply-To: "Albrecht O. Schweikert" <schweikert@AOS-CONSULT.DE>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Albrecht O. Schweikert" <schweikert@AOS-CONSULT.DE>
Organization: T-Online
Subject: Re: 1. Concatenating entries and deleting dupblicates; 2.
summarising variables
Content-Type: text/plain; charset=ISO-8859-1
Adrian,
You should be able to solve your problem with proc means.
Albrecht
"Adrian Slack" <Adrian.Slack@vuw.ac.nz> schrieb im Newsbeitrag
news:002601c218e6$26d4a510$3e4fc382@rh62201c...
> Hello.
> I'm a new SAS user, so please keep in mind my inexperience when reading
> through my query.
> I've been given a dataset with costs incurred over a period of time
> (described more below) and need to clean it up before I run an analysis.
>
> Each subject has a unique identifier. On a particular day, a subject may
> have incurred several costs which have been grouped under four categories
> (cost diary sections 1 - 4) relating to what activity the cost was
incurred
> because of. Each activity incurring a cost is recorded as a seperate
entry.
> Sections are recorded as seperate files, but can be merged by subject id
(I
> plan to prepare each section then merge them before analysis).
>
> Qn 1.
> There is some duplication of information within a section. For example, a
> person may record having travelled to do a number of activities within a
> diary section (ie activities of the same category). The person may have
> recorded travel costs multiple times. Simply summing the costs for each
> person will double count the travel costs (where two activities have been
> done with one trip).
>
> So I would like a way of ensuring travel costs relating to a particular
trip
> (on which a number of activities may have been performed) are counted only
> once. (Not all people have included travel information multiple times
where
> they did a number of activities on the same trip). I have the following
> variables which I think might be used (SAS variable names are in
brackets).
> The variables arel numeric, except for subject id which is alphanumeric.
> Subject id (id)
> Date (date)
> Transport used (trans)
> Cost of travel (transc)
> Kilometres travelled (transkm)
>
> However, I need to retain the cost data related to each activity (recorded
> as seperate variables; not listed above) as I need to calculate travel
cost,
> activity type (a_type) and activity cost (a_cost) . As an example a couple
> of entries may look as follows.
> id date transc transkm a_type a_cost
> A1 01/12/2000 15 20 1 20
> A1 01/12/2000 15 20 2 5
> A1 02/12/2000 10 10 2 5
> A2 02/12/2000 10 10 2 5
> B1 02/12/2000 10 10 2 5
>
> The output could be as follows:
> id date transc transkm a_type a_cost
> A1 01/12/2000 15 20 1 20
> A1 01/12/2000 . . 2 5
> A1 02/12/2000 10 10 2 5
> A2 02/12/2000 10 10 2 5
> B1 02/12/2000 10 10 2 5
>
> I would prefer not to have to lose activity type, but if necessary could
go
> with a revised dataset that looked like:
> id date transc transkm a_type a_cost
> A1 01/12/2000 15 20 25
> A1 02/12/2000 10 10 2 5
> A2 02/12/2000 10 10 2 5
> B1 02/12/2000 10 10 2 5
>
> Is this feasible? Or are more variables required (there are a couple of
> others, but I'm not so confident that the same trip can be pinned down
using
> these variables for all people).
>
> Qn 2.
> This questions is not related solely to this project (I realise their may
be
> different ways of getting around the problem for this example).
> When I do a proc summary to get total per person costs (by id) I lose a
lot
> of variables. That is, the proc summary reports id, frequency and the
summed
> cost, but drops things such as date or transkm (there are other variables
> such as demographic data). This data is required for subsequent analysis.
> The problem is that when I try and merge the summarised data with another
> dataset (such as that holding the demographic data or other sections of
the
> diary) the summarised variables appear according to the frequency. So for
> the example above, subject A1 would have total transport costs of 15 and
> activity costs of 25, giving an overall cost of 40. The frequency for this
> sum would be 3 (one transport cost observation and two activity cost
> obervations).
> 2a) So, is there a way to keep the unsummarised variables (where this are
> identical across the observations, eg person's ethnicity; it won't
> work/wouldn't be sensible for something like the activity type where the
> values aren't identical).
> 2b) Is there a way to turn the summed variable into a single observation
in
> SAS. The way I've done it for now is to export the file to Excel and
> reimport it so that the frequency is no longer 'attached' to the summed
> variable.
>
> If any of this could do with some clarification, please get back to me.
>
> Thanks in advance.
>
> Regards,
>
> Adrian.
> --------------------------------------------------------------------------
--
> ------
>
> Health Services Research Centre : Te Hikuwai Rangahau Hauora
>
> DDI +64 4 463 6524
> Fax +64 4 463 6568
> E-mail adrian.slack@vuw.ac.nz
> Address Level 6, Rutherford House
> 23 Lambton Quay
> PO Box 600
> Wellington
> NEW ZEALAND
>
> The Health Services Research Centre is a joint venture of the Institute of
> Policy Studies, Victoria University of Wellington and the Wellington
School
> of Medicine, University of Otago. http://www.vuw.ac.nz/hsrc/
|