LISTSERV at the University of Georgia
Date:   Tue, 28 Feb 2006 08:05:37 -0700
Reply-To:   Alan Churchill <SASL001@SAVIAN.NET>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Alan Churchill <SASL001@SAVIAN.NET>
Subject:   Re: Reading large and complex XML
Comments:   To: ben.powell@CLA.CO.UK
In-Reply-To:   <200602281443.k1SE4cYs007401@mailgw.cc.uga.edu>
Content-Type:   text/plain; charset="iso-8859-1"

Ben,

I don't think you can make the statement that a flat file and a SAS infile statement are always 10x faster. It depends on the sending and receiving ends and how well those are coded. Parsers vary a lot in speed as you know.

There are new technologies being discussed, such as binary XML. I'm not sure how it will shake out, but SOAP speed issues won't be around forever; too much is now invested in SOAs (service-oriented architectures), IMO.

On the Windows platform, look at Indigo (officially Windows Communication Foundation), due to be released with Vista. I haven't played with it yet, but it supposedly contains binary XML transports.

Alan

Alan Churchill Savian "Bridging SAS and Microsoft Technologies" www.savian.net

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of ben.powell@CLA.CO.UK
Sent: Tuesday, February 28, 2006 7:43 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Reading large and complex XML

Sorry to drag this up, but it's relevant to what I'm looking at currently. This thread was going somewhere before it was diverted by some out-of-this-world profiteering.

In short: XML data transfer, especially the interoperable messaging protocol SOAP, which once stood for Simple Object Access Protocol (the expansion was dropped as of SOAP 1.2) and is meant for web services.

Problem: increased processing time compared with "binary" formats. Wikipedia (!) discusses this quite usefully:

http://en.wikipedia.org/wiki/SOAP "Because of the lengthy XML format, SOAP is considerably slower than competing middleware technologies such as CORBA. Typically, SOAP is about 10 times slower than binary network protocols such as RMI or IIOP. Of course, this is not an issue when only small messages are sent."

As such, slow performance is not so much a limitation of the SAS XML engine as a limitation of the format itself. Compression can reduce the bandwidth used by large XML messages, at the cost of extra CPU overhead at either end, but it does nothing to address the processing overhead required to parse the message, unless, as Sig suggested, the formatting can be at least partially separated from the data.
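To make the compression point concrete, here is a minimal sketch in Python (standing in for SAS; the table layout and the column names id, name and amount are hypothetical). gzip shrinks a repetitive XML payload considerably, but the receiving end still has to decompress it and walk every tag to get the values back out.

```python
import gzip
import io
import xml.etree.ElementTree as ET

def make_xml(n_rows):
    """Build a small tagged table; the column names are hypothetical."""
    rows = "".join(
        f"<row><id>{i}</id><name>item{i}</name><amount>{i * 1.5}</amount></row>"
        for i in range(n_rows)
    )
    return f"<table>{rows}</table>".encode("utf-8")

def parse_rows(stream):
    """Walk every element and collect one dict per <row>.

    This per-tag work is the parsing cost that compression cannot remove.
    """
    rows = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "row":
            rows.append({child.tag: child.text for child in elem})
            elem.clear()  # drop the row's children once their values are copied
    return rows

raw = make_xml(10_000)
packed = gzip.compress(raw)

# Compression shrinks what crosses the wire...
print("uncompressed bytes:", len(raw), "compressed bytes:", len(packed))

# ...but the receiver still decompresses the stream and parses every tag.
rows = parse_rows(gzip.open(io.BytesIO(packed)))
print(len(rows), rows[0])
```

The byte counts printed depend on the zlib build, so treat them as illustrative; the point is only that the parse step is unchanged by compression.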

IOW, a flat file read with a SAS INFILE statement will always be (10x+) faster. The requirement for repeated large XML loads is arguably a sign of bad system architecture; such loads should instead take the form of updates or queries. Obviously, as time goes on, the capacity of conventional hardware for XML transfers will increase along with processing and I/O speeds.
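One way to see where the flat-file advantage comes from is to serialize the same rows both ways. The Python sketch below (hypothetical data, standard library only) shows that XML wraps every value in an open and a close tag, while the delimited form spends a comma or newline per value. It compares size only. It does not measure speed, and as Alan notes, real timings depend on how well the parser at each end is written.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample rows; any tabular data would do.
COLUMNS = ("id", "name", "amount")
ROWS = [(i, f"item{i}", f"{i * 1.5}") for i in range(1000)]

def to_delimited(rows):
    """Serialize rows as a header line plus comma-delimited records."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()

def to_xml(rows):
    """Serialize the same rows with an open and close tag around every value."""
    root = ET.Element("table")
    for row in rows:
        rec = ET.SubElement(root, "row")
        for col, value in zip(COLUMNS, row):
            ET.SubElement(rec, col).text = str(value)
    return ET.tostring(root, encoding="unicode")

def from_delimited(text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header line
    return [tuple(r) for r in reader]

def from_xml(text):
    root = ET.fromstring(text)
    return [tuple(rec.findtext(col) for col in COLUMNS) for rec in root.iter("row")]

flat = to_delimited(ROWS)
tagged = to_xml(ROWS)
print(f"delimited: {len(flat)} chars, XML: {len(tagged)} chars")

# Both formats carry the same values (as strings); only the markup differs.
expected = [tuple(str(v) for v in row) for row in ROWS]
assert from_delimited(flat) == from_xml(tagged) == expected
```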

HTH.

On Tue, 17 Jan 2006 09:41:35 -0500, Sigurd Hermansen <HERMANS1@WESTAT.COM> wrote:

>Alan:
>Actually the schema comprises the whole of a database's metadata. Good so far ... The catch comes in where XML packages data elements between tags. The schema predetermines the header and attributes of each data table. A table name links these metadata to columns and rows of data values. A table-name tag and end tag can mark the beginning and end of a table of delimited data values.
>
>In a rough sketch,
> .......
><Schema>
> <Meta-Data-Table>
> ......
> <End-Meta-Data-Table>
> <Meta-Data-Table>
> ......
> <End-Meta-Data-Table>
><End-Schema>
><Data-Table>
> ......
><End-Data-Table>
> .
> .
><Data-table>
> .....
><End-Data-Table>
>
>I would argue that transports or replications of very large databases would work better were it possible to append URLs for each Data-Table to a Schema. Individual data tables will likely compress to a small fraction of full size and can be zcat'd through an RDBMS's bulk loader.
>
>An XML extension along these lines would take advantage of the separation of schema and data and the tabular representation of both. Domain and constraint tables in a Schema serve as a basis for validating the contents of Data-Tables and triggering exceptions.
>Sig
>
>________________________________
>
>From: owner-sas-l@listserv.uga.edu on behalf of Alan Churchill
>Sent: Mon 1/16/2006 8:41 PM
>To: 'Sigurd Hermansen'; SAS-L@LISTSERV.UGA.EDU
>Subject: RE: Reading large and complex XML
>
>
>
>Sig,
>
>What about the XML streams already containing a schema embedded at the top? This is good XML practice and should already be there. A good XML parser will be able to read in the schema and then do a forward read of the XML, parsing appropriately and breaking it into tables.
>
>The RDBMSs are starting to accommodate XML in and out of relational structures.
>
>Alan
>
>Alan Churchill
>Savian "Bridging SAS and Microsoft Technologies"
>www.savian.net
>
>
>-----Original Message-----
>From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sigurd Hermansen
>Sent: Monday, January 16, 2006 9:50 AM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Reading large and complex XML
>
>Jorgen:
>A couple of years ago we attempted to load GB-sized XML files into SAS and found the process unacceptably slow. Since then we have opted for forward engineering a data model (basically capturing SQL CREATE statements from data modelling tools) and streaming data into SAS datasets. The same method also works well when using RDBMS bulk load methods to transfer data into an RDBMS.
>
>XML standards accommodate transfers of very complex data structures. Relational database tables have very simple data structures. Repetitive tags and extra parsing really drag down the performance of the SAS XML engine.
>
>Perhaps those exporting data to your database could transfer metadata tables in XML and provide the actual data as compressed 'flat files'. The Unix zcat command in a SAS filename pipe streams data into SAS datasets very quickly and efficiently.
>
>The current XML standards seem almost hostile to the idea of a relational database. To me that seems shortsighted. While encapsulating databases in one text stream sounds like a good idea, why would it not make equal sense to encapsulate metadata in a header stream and support bulk loads into related tables from separate data streams?
>
>Sig
>
>-----Original Message-----
>From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Jørgen Mangor Iversen
>Sent: Monday, January 16, 2006 8:13 AM
>To: sas-l@uga.edu
>Subject: Reading large and complex XML
>
>
>Hi guys
>
>What do you do when you have to get a large (11 GB) XML file into SAS? The example in mind has an XMLMAP prepared with the SAS XML Mapper tool, defining 14 tables with up to 33 million rows. The box is a powerful HP-UX machine, SAS is version 9.1.3, and the XML engine is hopeless! It processes each defined table one at a time, at 7 hours apiece! This leads to a running time of more than 4 days. This job has to be done daily, as you might imagine.
>
>Am I the only one who has ever come across such a problem? Is there a debate somewhere, or another forum, where SAS vs. XML is discussed?
>
>Cheers,
>Jørgen
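Jørgen's engine makes one pass per defined table, while Alan describes a single forward read that breaks the stream into tables as it goes. The Python sketch below illustrates that one-pass idea. It is not how the SAS XML engine or an XMLMAP works. The element names and the TABLE_TAGS set are hypothetical; a real job would take them from the embedded schema or map.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical export holding two "tables" interleaved in one stream.
SAMPLE = b"""<export>
  <customer><id>1</id><name>Ann</name></customer>
  <order><id>10</id><customer_id>1</customer_id><total>9.50</total></order>
  <customer><id>2</id><name>Bo</name></customer>
  <order><id>11</id><customer_id>2</customer_id><total>3.25</total></order>
</export>"""

# Element names treated as table rows (hypothetical; would come from a schema).
TABLE_TAGS = {"customer", "order"}

def split_into_tables(stream, table_tags):
    """Read the XML once, forward only, routing each row element to its table."""
    tables = defaultdict(list)
    context = ET.iterparse(stream, events=("start", "end"))
    _event, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag in table_tags:
            tables[elem.tag].append({child.tag: child.text for child in elem})
            root.clear()  # discard finished rows so the in-memory tree stays small
    return dict(tables)

tables = split_into_tables(io.BytesIO(SAMPLE), TABLE_TAGS)
for name, rows in tables.items():
    print(name, rows)
```

Every table is filled in the same pass, so the cost of reading the file is paid once rather than once per table.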

