Date: Thu, 30 Jul 2009 15:38:49 -0700
Reply-To: "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Nordlund, Dan (DSHS/RDA)" <NordlDJ@DSHS.WA.GOV>
Subject: Re: SAS DateTime Format that sorts correctly is IS8601dt.
In-Reply-To: <200907302144.n6UH5Ko7001255@malibu.cc.uga.edu>
Content-Type: text/plain; charset=windows-1252
> -----Original Message-----
> From: Ya Huang [mailto:ya.huang@AMYLIN.COM]
> Sent: Thursday, July 30, 2009 2:45 PM
> To: SAS-L@LISTSERV.UGA.EDU; Nordlund, Dan (DSHS/RDA)
> Cc: Ya Huang
> Subject: Re: SAS DateTime Format that sorts correctly is IS8601dt.
>
> Dan,
>
> I totally understand the case you presented below.
> I guess my question now is "IF we only have one var with
> value of complete ISO datetime mixed with partial ISO datetime,
> do we always get the order AS IF the partial is filled
> with 0 or 0s?". For example, 2006 ordered first, because
> it is treated as if 2006-01-01T00:00:00.
>
> It seems to be the case for me, but I'm not sure. Maybe ISO
> format is designed to be consistent with the ASCII code
> rank order?
>
> Ya
>
>
> On Thu, 30 Jul 2009 14:24:37 -0700, Nordlund, Dan (DSHS/RDA)
> <NordlDJ@DSHS.WA.GOV> wrote:
>
> >> -----Original Message-----
> >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> >> Huang, Ya
> >> Sent: Thursday, July 30, 2009 1:24 PM
> >> To: SAS-L@LISTSERV.UGA.EDU
> >> Subject: Re: SAS DateTime Format that sorts correctly is IS8601dt.
> >>
> >> Thanks to all who responded.
> >>
> >> My original question was more of a philosophical question. Though it was
> >>
> >> triggered by a question of sorting an ISO 8601 format date. My
> >> understanding
> >> to ISO 8601 format is that a datetime can be presented as a partial
> >> datetime:
> >>
> >> a complete date/time: 2006-11-04T10:29:23
> >> missing second: 2006-11-04T10:29
> >> missing all time: 2006-11-04
> >> missing month: 2006
> >>
> >> If the date is a complete ISO 8601 date, then I'm quite sure
> >> that sorting the string should get the same result as sorting the
> >> underline numeric. But when the partial date is involved, I'm not
> >> sure anymore. To make it even more complicated, imagine the string is
> >> not
> >> left aligned!
> >>
> >> data xx;
> >> length dt $50;
> >> dt='2006-11-04T10:29:00'; output;
> >> dt='2006-11-04T10:29:23'; output;
> >> dt='2006-11-04T10:29'; output;
> >> dt='2006-11-04T10'; output;
> >> dt='2006-11-04'; output;
> >> dt='2006-11'; output;
> >> dt='2006'; output;
> >> dt=' 2006'; output;
> >> dt=' 2006-11'; output;
> >> dt=' 2006-11-04'; output;
> >> dt=' 2006-11-04T10'; output;
> >> run;
> >>
> >> proc sort;
> >> by dt;
> >> run;
> >>
> >> proc print;
> >> run;
> >>
> >> Obs dt
> >>
> >> 1 2006-11
> >> 2 2006-11-04
> >> 3 2006
> >> 4 2006-11-04T10
> >> 5 2006
> >> 6 2006-11
> >> 7 2006-11-04
> >> 8 2006-11-04T10
> >> 9 2006-11-04T10:29
> >> 10 2006-11-04T10:29:00
> >> 11 2006-11-04T10:29:23
> >>
> >> It shows that if the string is left aligned, the sorting seems
> >> to make sense. I'm 99.9% sure the sorting is 'right'. But I can't
> >> be 100% sure it's always the case.
> >>
> >> Thanks
> >>
> >> Ya
> >>
> >>
> ><<<snip>>>
> >
> >If the string representation is only a partial representation of the
> underlying datetime value, there is no way to guarantee that sorting by the
> string representation will sort the same way as sorting on the full
> datetime value. And in fact one could easily end up with the incorrect
> sort order. Consider, the following two colums of data, a full datetime
> value and a partial string datetime representation.
> >
> >2006-12-05T00:00:00 2006
> >2006-11-04T10:29:23 2006-11-04
> >2006-11-04T10:00:00 2006-11-04T10:00:00
> >2006-10-01T00:00:00 2006-10-01T00:00:00
> >2007-11-04T10:29:23 2007
> >
> >If you sort by the partial values, the order will be incorrect for the
> full datetime values.
> >
> >Dan
> >
Well, I am not entirely sure of the situation that you are describing. It is not clear to me whether you are describing field that has been populated by inputting partial dates using a ISO8601 informat, or a character field that is a partial representation of an ISO8601 formatted value. If we are talking something like
data _null_;
dt=input('200604',B8601DT.);
put dt=datetime21.;
run;
dt=01APR2006:00:00:00
then SAS will fill in with the lowest possible values for the remainder of the parts, and the values will sort as you have suggested. If we are talking about char representations, then they will still sort in the same way, but I hesitate to say SAS "fills in with zeros". Character variables sort in character collating order from left to right (ASCII on Windows and Unix/Linux). If '2006' is compared to '20060406', they are equal through the first 4 characters, so the longer string collates (sorts) after the shorter string. But the effect is in fact similar to your description (if they are left justified as you previously noticed).
Hope this is helpful,
Dan
Daniel J. Nordlund
Washington State Department of Social and Health Services
Planning, Performance, and Accountability
Research and Data Analysis Division
Olympia, WA 98504-5204
|