LISTSERV at the University of Georgia
Date:         Fri, 26 May 2006 14:00:28 -0500
Reply-To:     Duck-Hye Yang <dyang@CHAPINHALL.ORG>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Duck-Hye Yang <dyang@CHAPINHALL.ORG>
Subject:      Re: grouping features based on position
Comments: To: HERMANS1@WESTAT.com
Content-Type: text/plain; charset=US-ASCII

Hi, Thanks for your help. Could you let me know how to get started with the SAS coding?

You are right. Many districts do not have any foster-care kids. Most kids are concentrated within the city (Chicago), with fewer in the suburban areas. Chicago consists of 409 areas (elementary catchment areas, but I will treat them as school districts), and the suburbs consist of 118 areas (school districts). When I mapped the kids' locations, the distribution was not that skewed. Drawing boundaries is reasonable, mainly because we have as many as 527 districts, compared to 594 kids.

The issue here is to assign an approximately equal number of abused/neglected (A/N) kids to each judge. All the judges sit in the same building, so minimizing the distance between child and judge location is not the issue, and outliers are not an issue either.

As Richard pointed out earlier, one important criterion is contiguity.

Can I use PROC CLUSTER? My concerns are: How can I ensure an even distribution of A/N kids across the 13 clusters? And how can I ensure that each cluster's boundary is contiguous?

I tried writing some code. FREQ uses the number of A/N kids in each district as the weight. What about districts that have zero kids?

proc cluster data=districts print=15 outtree=ward method=ward pseudo CCC;
  id district_id;
  var x y;
  freq n_kids;
run;

proc tree data=ward out=clusters ncl=13 horizontal spaces=2;
  id district_id;
run;
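
Whether the Ward clusters come out anywhere near even can be checked by totaling the kids per cluster after PROC TREE. What follows is only a minimal sketch, assuming the data set and variable names used in the step above (districts, clusters, district_id, n_kids); district_clusters is a made-up name. With 594 kids and 13 judges, the target is roughly 46 per cluster. SAS procedures generally skip observations whose FREQ value is less than 1, so zero-kid districts may be missing from the clusters data set; that behavior is an assumption worth confirming in the PROC CLUSTER documentation, and the MISSING option below is there to make any unassigned districts visible.

```sas
/* Sketch only: reuses the names from the PROC CLUSTER / PROC TREE step */
proc sort data=districts; by district_id; run;
proc sort data=clusters;  by district_id; run;

/* Attach the cluster number from PROC TREE to every district */
data district_clusters;   /* hypothetical output name */
  merge districts(in=in_d) clusters(keep=district_id cluster);
  by district_id;
  if in_d;                /* keep all districts, assigned or not */
run;

/* Kids and districts per cluster; 594/13 is about 46 kids each */
proc means data=district_clusters sum n maxdec=0;
  class cluster / missing;   /* missing level = districts with no cluster */
  var n_kids;
run;
```

This only reports the imbalance. It does not correct it, and it says nothing about contiguity, since centroids alone carry no adjacency information.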

Thanks, Duckhye

>>> "Sigurd Hermansen" <HERMANS1@WESTAT.com> 5/26/2006 12:47:43 PM >>>
Duck-Hye:
I'd take a close look first at the number of school districts that have zero abused/neglected kids who entered the system for the first time in 2005. A highly skewed distribution could make the locations of school districts irrelevant. In an extreme case, if all of the kids go to school in a single district, all boundary lines would likely go thru that district.

I gather that you are looking for an assignment model that will work well in the future as more cases arise. If you weight school districts by projected numbers of A/N kids, geographic clustering of the 527 districts would give you a starting point. At least you will be able to see how many clusters it takes to minimize distances among weighted school districts. Perhaps you could then ask David C for a step-wise method of increasing or decreasing the number of clusters ;>
Happy Friday,
Sig

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Duck-Hye Yang
Sent: Friday, May 26, 2006 11:11 AM
To: SAS-L@LISTSERV.VT.EDU
Subject: grouping features based on position

Dear SAS-L,

The juvenile court wants each judge to get an equal number of abused/neglected kids (cases) for foster-home placement court hearings. Each judge is currently assigned a group of cases from a designated geographic area. The issue is that some judges have too many cases.

The task is to delineate boundaries of 13 geographical areas, each with an approximately equal share of the cases (594 kids who entered the system for the first time in 2005). The boundaries are supposed to follow school districts. There are 527 school districts (polygons) and 594 kids.

The essence of the solution should be 1) grouping the school districts into 13 groups based on proximity while, at the same time, 2) keeping an approximately equal number of kids within each of the 13 groups.
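
As a crude baseline for the two goals above, the districts can be swept from west to east and the running total of kids cut into 13 equal slices. This is a swapped-in simplification, not a clustering method, and only a sketch: it assumes a data set named districts with centroid coordinates x (longitude) and y (latitude) and a count n_kids, and the names sorted, strips, cum_kids and group are made up here. It balances counts roughly but produces vertical strips, and it cannot guarantee contiguity because it never looks at which polygons actually touch.

```sas
/* Sketch only: DISTRICTS with x, y (centroids) and n_kids is assumed */
proc sort data=districts out=sorted;
  by x y;                        /* west-to-east sweep */
run;

data strips;
  set sorted;
  cum_kids + n_kids;             /* running total; sum statement retains it */
  /* 594 kids over 13 judges: slice the running total into 13 equal parts */
  group = min(13, max(1, ceil(cum_kids * 13 / 594)));
run;

/* Check how many kids land in each strip */
proc means data=strips sum maxdec=0;
  class group;
  var n_kids;
run;
```

Each district goes wholly into one strip, so the totals are only approximately equal; districts with zero kids simply join the strip they fall in.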

I have information on the kids' locations and the centroid points of the school districts (longitude/latitude). Alternatively, I can arrange the data so that each district has its number of foster-care kids.
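
The district-level arrangement could be built along these lines. This is a minimal sketch under stated assumptions: kids (one row per child, already tagged with a district_id) and districts_xy (one row per district with centroid x and y) are hypothetical data set names, and the output name districts is chosen only for illustration. The left join keeps districts with no children and gives them a count of zero.

```sas
/* Sketch only: KIDS and DISTRICTS_XY are hypothetical input names */
proc sql;
  create table districts as
  select d.district_id,
         d.x,                              /* centroid longitude */
         d.y,                              /* centroid latitude  */
         count(k.district_id) as n_kids    /* 0 when no child matches */
  from districts_xy as d
       left join kids as k
       on d.district_id = k.district_id
  group by d.district_id, d.x, d.y;
quit;
```

Counting k.district_id rather than using count(*) is what yields 0 for empty districts, since unmatched rows carry a missing value on the kids side.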

Another twist: because boundaries based on 2005 data may no longer be valid 5 or 10 years from now, the delineation may need to be modeled on predictors (the projected number of kids based on demographic/socioeconomic characteristics of the school districts) so that adjustments can be made each year.

Hope that someone will share his/her experience with me.

Thanks, Duckhye

