LISTSERV at the University of Georgia
Date:         Fri, 11 Nov 2005 16:56:11 -0500
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: How to handle very large text strings in SAS ??
Comments: To: Werner HELM <Werner.E.Helm@GMX.DE>
Content-Type: text/plain; charset="us-ascii"

Werner: SAS will read a text file into different data structures. Why not

occasion   strSeq
1          A
1          C
. . . .
1          G
2          T
2          A
. . . .
2          G
. . . .

SQL grouping clauses handle counts of groups and other summary measures reasonably well. SAS Data steps may work better than SAS SQL if you are looking at sequences. In SQL, joins of patterns, e.g.


to the relation of occasion to strSeq will locate matches to the pattern and work much the same as string comparison functions.
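As a rough illustration of the SQL approach described above, here is a minimal sketch in Python using the standard sqlite3 module rather than SAS PROC SQL. It assumes a position column (pos) is added to the occasion/strSeq layout, since SQL tables carry no row order of their own. The table names seq and pat, the column names pos, ch and k, and the function names are all hypothetical. The pattern is stored as its own small table and joined to the sequence table; a start position counts as a match when every pattern character lines up.

```python
import sqlite3


def load_sequences(conn, sequences):
    """Store each string one character per row: (occasion, pos, ch).

    The pos column is an assumption added here so that order is explicit.
    """
    conn.execute("CREATE TABLE seq (occasion INTEGER, pos INTEGER, ch TEXT)")
    rows = (
        (occasion, pos, ch)
        for occasion, text in sequences.items()
        for pos, ch in enumerate(text, start=1)
    )
    conn.executemany("INSERT INTO seq VALUES (?, ?, ?)", rows)


def load_pattern(conn, pattern):
    """Store the search pattern as (k, ch), where k is the 0-based offset."""
    conn.execute("CREATE TABLE pat (k INTEGER, ch TEXT)")
    conn.executemany("INSERT INTO pat VALUES (?, ?)", enumerate(pattern))


def char_counts(conn):
    """Count each character per occasion with a plain GROUP BY."""
    sql = """
        SELECT occasion, ch, COUNT(*) AS n
        FROM seq
        GROUP BY occasion, ch
        ORDER BY occasion, ch
    """
    return conn.execute(sql).fetchall()


def count_matches(conn):
    """Count (possibly overlapping) pattern occurrences per occasion.

    Joining on equal characters pairs each sequence row with every pattern
    offset that carries the same character. Grouping by (pos - k) collects
    the pairs that belong to the same candidate start position; the start
    is a full match when the group has one pair per pattern character.
    """
    sql = """
        SELECT occasion, COUNT(*) AS n_matches
        FROM (
            SELECT s.occasion, s.pos - p.k AS start_pos
            FROM seq AS s
            JOIN pat AS p ON s.ch = p.ch
            GROUP BY s.occasion, s.pos - p.k
            HAVING COUNT(*) = (SELECT COUNT(*) FROM pat)
        )
        GROUP BY occasion
        ORDER BY occasion
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sequences(conn, {1: "ACGACG", 2: "TAGACGA"})
    load_pattern(conn, "ACG")
    print(char_counts(conn))
    print(count_matches(conn))
```

Occasions with no match do not appear in the result of count_matches. Whether a join like this performs acceptably on strings of 10 million characters has not been tested here and would depend on indexing and the SQL engine.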

Reading text strings a character at a time into a very different data structure, temporary arrays (SAS Data step), will give you quick access to substring sequences. Associative arrays, for example, work well with many millions of bytes.

Sig

-----Original Message-----
From: [] On Behalf Of Werner HELM
Sent: Friday, November 11, 2005 4:02 PM
To: SAS-L@LISTSERV.UGA.EDU
Cc: Werner HELM
Subject: How to handle very large text strings in SAS ??

Hi all :

We have to deal with one single text string per occasion, but this could be 10 million (or more) characters long. It is a txt-file. How would you bring it to SAS ?? Main purpose is the counting of certain substrings - maybe with proc sql, with SAS or perl text string functions (in SAS).

I'd appreciate getting some good ideas.

Werner .
