Date: Sun, 14 Sep 2003 00:37:29 -0400
Reply-To: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Richard A. DeVenezia" <radevenz@IX.NETCOM.COM>
Subject: Re: Need code to analyze web logs from apache server
"bikerider7" <bay_bridge_tgv@yahoo.com> wrote in message
news:d35d6005.0309112000.38dcc3ac@posting.google.com...
> Hi SAS-Lers
>
> Does anyone have code to read web hit counters (weblogs) from an
> Apache server? The consultants are running some reports with Unix
> freeware, but I want to do some more in SAS.
>
> I would be very appreciative if anyone had some pre-written code for
> reading in the weblogs. I could not find any on the SAS help web
> page.
>
> thanks.
Here is one approach you can try.
rxchange rewrites the _infile_ buffer so that it can be read with a simple
input statement.
Tweak the lengths and types as needed.
You might also take a look at http://www.devenezia.com/php/http-log/
The SAS regexp is essentially a translation of this Perl regex:
( $l_clientAddress
, $l_rfc1413
, $l_username
, $l_localTime
, $l_httpRequest
, $l_statusCode
, $l_bytesSentToClient
, $l_referer
, $l_clientSoftware
)
= /^(\S+) (\S+) (\S+) \[(.+)\] \"(.+)\" (\S+) (\S+) \"(.*)\" \"(.*)\"/o;
Since SAS regex can't return an array of found substrings, we use rxchange
to rewrite each line as quoted, comma-separated values that a list input
statement can read.
options ls = 72;
data log;
  infile 'c:\temp\access_log' dsd;
  /* Compile the pattern once; retain its id across iterations */
  if _n_ = 1 then do;
    retain rx;
    rx = rxparse
    (
       " $p "
    || " addr=<^w+> "
    || " $w+ rfc=<^w+> "
    || " $w+ user=<^w+> "
    || " $w+ '[' time=<^']'+> ']' "
    || " $w+ '""' req=<~'""'+> '""' "
    || " $w+ code=<^w+> "
    || " $w+ bytes=<^w+> "
    || " $w+ '""' ref=<~'""'+> '""' "
    || " $w+ '""' client=<~'""'+> '""' "
    || " to"
    || " '""'=addr'"", ' "
    || " '""'=rfc '"", ' "
    || " '""'=user'"", ' "
    || " '""'=time'"", ' "
    || " '""'=req '"", ' "
    || " '""'=code '"", ' "
    || " '""'=bytes '"", ' "
    || " '""'=ref '"", ' "
    || " '""'=client '"" ' "
    );
  end;
  /* Load the raw log line into _infile_ and hold it */
  input @;
  /* Rewrite the held line in place as quoted, comma-separated values */
  call rxchange (rx,1,_infile_,_infile_);
  length addr rfc user time req code bytes ref client $50;
  /* Read the rewritten buffer with list input */
  input addr--client;
  keep addr--client;
run;
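
For comparison, here is a minimal sketch of a different technique, not the
rxchange approach above: releases of SAS that provide the Perl-style PRX
functions (SAS 9 and later, assuming the function form of PRXPOSN is
available) can pull the capture groups out directly, so the Perl regex can
be used nearly verbatim. The data set name log_prx, the variable re, the
$200 lengths and the lrecl value are illustrative assumptions; the file
path is the one used above.

```sas
data log_prx;
  infile 'c:\temp\access_log' truncover lrecl=32767;
  length addr rfc user time req code bytes ref client $200;

  /* Compile the Perl-style pattern once; retain its id */
  retain re;
  if _n_ = 1 then
    re = prxparse('/^(\S+) (\S+) (\S+) \[(.+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/');

  /* Load the next log line into _infile_ */
  input;

  /* Skip lines that do not look like combined-format entries */
  if not prxmatch(re, _infile_) then delete;

  /* Copy each capture buffer into its own variable */
  addr   = prxposn(re, 1, _infile_);
  rfc    = prxposn(re, 2, _infile_);
  user   = prxposn(re, 3, _infile_);
  time   = prxposn(re, 4, _infile_);
  req    = prxposn(re, 5, _infile_);
  code   = prxposn(re, 6, _infile_);
  bytes  = prxposn(re, 7, _infile_);
  ref    = prxposn(re, 8, _infile_);
  client = prxposn(re, 9, _infile_);

  drop re;
run;
```

As with the rxchange version, every field comes back as character; convert
code and bytes with input() if you need them numeric.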
--
Richard A. DeVenezia
http://www.devenezia.com