Date: Fri, 11 Nov 2005 16:56:11 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: How to handle very large text strings in SAS ??
Werner:
SAS will read a text file into different data structures. Why not
one character per row, keyed by occasion:
occasion strSeq
1 A
1 C
. .
. .
1 G
2 T
2 A
. .
. .
2 G
. .
. .
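A rough sketch of one way to build that layout, not a tested recipe. The
fileref seqfile, the path occasion1.txt, the data set name seq, and the
extra column pos (position within the string) are all assumptions made
for illustration; only occasion and strSeq come from the layout above.

```sas
/* Hypothetical path: one text file per occasion is assumed. */
filename seqfile 'occasion1.txt';

data seq;
   infile seqfile recfm=n;          /* stream input, no record boundaries   */
   retain occasion 1;               /* assumed: this file is occasion 1     */
   input strSeq $char1.;            /* read exactly one byte per iteration  */
   if strSeq in ('0D'x, '0A'x) then delete;   /* skip line-end characters   */
   pos + 1;                         /* running position within the string   */
run;
```

A 10-million-character string becomes a 10-million-row data set, which
is large but well within what a Data step or SQL will handle.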
SQL grouping clauses handle counts of groups and other summary measures
reasonably well. SAS Data steps may work better than SAS SQL if you are
looking at sequences. In SQL, joins of patterns, e.g.
A
C
T
G
to the occasion/strSeq relation will locate matches to the pattern,
working much the same as string comparison functions.
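To make the SQL idea concrete, here is a minimal sketch. It assumes a
data set named seq holding occasion and strSeq as laid out above, plus a
position column pos that the layout does not show; the pattern table and
its offset column are likewise hypothetical names.

```sas
/* Pattern relation: one row per character, with its offset in the pattern. */
data pattern;
   input offset ch $;
   datalines;
0 A
1 C
2 T
3 G
;

proc sql;
   /* Summary measure: frequency of each character within each occasion. */
   select occasion, strSeq, count(*) as n
      from seq
      group by occasion, strSeq;

   /* Pattern match: a start position is a hit when every pattern row
      finds its character at start + offset.                           */
   create table hits as
   select s.occasion, s.pos - p.offset as start
      from seq as s inner join pattern as p
        on s.strSeq = p.ch
      group by s.occasion, s.pos - p.offset
      having count(*) = (select count(*) from pattern);
quit;
```

Each row of hits marks one occurrence of the pattern, so a count(*) by
occasion on hits gives the substring counts. Overlapping occurrences are
counted separately.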
Reading text strings a character at a time into a very different data
structure, temporary arrays (in a SAS Data step), will give you quick
access to substring sequences. Associative arrays, for example, work
well with many millions of bytes.
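A sketch of the temporary-array approach, under stated assumptions: a
data set named seq with occasion and strSeq as laid out above, sorted by
occasion; a hypothetical upper bound maxlen on the string length; and
the four-character pattern ACTG hard-coded for illustration.

```sas
%let maxlen = 10000000;   /* assumed upper bound on characters per occasion */

data counts(keep=occasion n_hits);
   array c[&maxlen] $1 _temporary_;   /* held in memory, not written out */

   /* Load one occasion's characters into the array. */
   do n = 1 by 1 until (last.occasion);
      set seq;
      by occasion;
      c[n] = strSeq;
   end;

   /* Slide over the array and count occurrences of A C T G. */
   n_hits = 0;
   do i = 1 to n - 3;
      if c[i] = 'A' and c[i+1] = 'C' and c[i+2] = 'T' and c[i+3] = 'G'
         then n_hits + 1;
   end;
run;
```

The step writes one row per occasion. If a string runs longer than
maxlen the array subscript goes out of range and the step stops with an
error, so the bound needs to be set generously.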
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Werner HELM
Sent: Friday, November 11, 2005 4:02 PM
To: SAS-L@LISTSERV.UGA.EDU
Cc: Werner HELM
Subject: How to handle very large text strings in SAS ??
Hi all :
We have to deal with one single text string per occasion, but this could
be 10 million (or more) characters long. It is a txt-file. How would you
bring it to SAS ?? Main purpose is the counting of certain substrings -
maybe with proc sql, with SAS or perl text string functions (in SAS).
I'd appreciate getting some good ideas.
Werner .