LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2002, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 21 Jun 2002 09:35:33 +0200
Reply-To:     "Albrecht O. Schweikert" <schweikert@AOS-CONSULT.DE>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Albrecht O. Schweikert" <schweikert@AOS-CONSULT.DE>
Organization: T-Online
Subject:      Re: 1. Concatenating entries and deleting dupblicates; 2.
              summarising              variables
Content-Type: text/plain; charset=ISO-8859-1

Adrian, You should be able to solve your problem with proc means.

Albrecht

"Adrian Slack" <Adrian.Slack@vuw.ac.nz> schrieb im Newsbeitrag news:002601c218e6$26d4a510$3e4fc382@rh62201c... > Hello. > I'm a new SAS user, so please keep in mind my inexperience when reading > through my query. > I've been given a dataset with costs incurred over a period of time > (described more below) and need to clean it up before I run an analysis. > > Each subject has a unique identifier. On a particular day, a subject may > have incurred several costs which have been grouped under four categories > (cost diary sections 1 - 4) relating to what activity the cost was incurred > because of. Each activity incurring a cost is recorded as a seperate entry. > Sections are recorded as seperate files, but can be merged by subject id (I > plan to prepare each section then merge them before analysis). > > Qn 1. > There is some duplication of information within a section. For example, a > person may record having travelled to do a number of activities within a > diary section (ie activities of the same category). The person may have > recorded travel costs multiple times. Simply summing the costs for each > person will double count the travel costs (where two activities have been > done with one trip). > > So I would like a way of ensuring travel costs relating to a particular trip > (on which a number of activities may have been performed) are counted only > once. (Not all people have included travel information multiple times where > they did a number of activities on the same trip). I have the following > variables which I think might be used (SAS variable names are in brackets). > The variables arel numeric, except for subject id which is alphanumeric. > Subject id (id) > Date (date) > Transport used (trans) > Cost of travel (transc) > Kilometres travelled (transkm) > > However, I need to retain the cost data related to each activity (recorded > as seperate variables; not listed above) as I need to calculate travel cost, > activity type (a_type) and activity cost (a_cost) . As an example a couple > of entries may look as follows. > id date transc transkm a_type a_cost > A1 01/12/2000 15 20 1 20 > A1 01/12/2000 15 20 2 5 > A1 02/12/2000 10 10 2 5 > A2 02/12/2000 10 10 2 5 > B1 02/12/2000 10 10 2 5 > > The output could be as follows: > id date transc transkm a_type a_cost > A1 01/12/2000 15 20 1 20 > A1 01/12/2000 . . 2 5 > A1 02/12/2000 10 10 2 5 > A2 02/12/2000 10 10 2 5 > B1 02/12/2000 10 10 2 5 > > I would prefer not to have to lose activity type, but if necessary could go > with a revised dataset that looked like: > id date transc transkm a_type a_cost > A1 01/12/2000 15 20 25 > A1 02/12/2000 10 10 2 5 > A2 02/12/2000 10 10 2 5 > B1 02/12/2000 10 10 2 5 > > Is this feasible? Or are more variables required (there are a couple of > others, but I'm not so confident that the same trip can be pinned down using > these variables for all people). > > Qn 2. > This questions is not related solely to this project (I realise their may be > different ways of getting around the problem for this example). > When I do a proc summary to get total per person costs (by id) I lose a lot > of variables. That is, the proc summary reports id, frequency and the summed > cost, but drops things such as date or transkm (there are other variables > such as demographic data). This data is required for subsequent analysis. > The problem is that when I try and merge the summarised data with another > dataset (such as that holding the demographic data or other sections of the > diary) the summarised variables appear according to the frequency. So for > the example above, subject A1 would have total transport costs of 15 and > activity costs of 25, giving an overall cost of 40. The frequency for this > sum would be 3 (one transport cost observation and two activity cost > obervations). > 2a) So, is there a way to keep the unsummarised variables (where this are > identical across the observations, eg person's ethnicity; it won't > work/wouldn't be sensible for something like the activity type where the > values aren't identical). > 2b) Is there a way to turn the summed variable into a single observation in > SAS. The way I've done it for now is to export the file to Excel and > reimport it so that the frequency is no longer 'attached' to the summed > variable. > > If any of this could do with some clarification, please get back to me. > > Thanks in advance. > > Regards, > > Adrian. > -------------------------------------------------------------------------- -- > ------ > > Health Services Research Centre : Te Hikuwai Rangahau Hauora > > DDI +64 4 463 6524 > Fax +64 4 463 6568 > E-mail adrian.slack@vuw.ac.nz > Address Level 6, Rutherford House > 23 Lambton Quay > PO Box 600 > Wellington > NEW ZEALAND > > The Health Services Research Centre is a joint venture of the Institute of > Policy Studies, Victoria University of Wellington and the Wellington School > of Medicine, University of Otago. http://www.vuw.ac.nz/hsrc/


Back to: Top of message | Previous page | Main SAS-L page