Enhancing parsing performance 2003-01-13 - By Brian Madigan
Turn validation off!

    DOMParser parser = new DOMParser();
    parser.setFeature("http://xml.org/sax/features/validation", false);

or something to that effect. If I am not mistaken, that should stop any DTD validation from happening.
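A caveat on the suggestion above: as far as the Xerces feature documentation goes, validation is already off by default, and a non-validating Xerces parser still fetches an external DTD unless the Xerces-specific feature `http://apache.org/xml/features/nonvalidating/load-external-dtd` is also set to false. The sketch below shows that combination through JAXP, on the assumption that the JAXP implementation in use is Xerces-based (the JDK's built-in one is); other implementations may reject the feature id. The class name `NoDtdParse`, the `SAMPLE` document and the `example.invalid` DTD address are made up for illustration and do not come from the thread.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original poster's code.
final class NoDtdParse {
    // Xerces-specific feature id: when false (and validation is off),
    // the external DTD subset is not loaded at all.
    static final String LOAD_EXTERNAL_DTD =
        "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    // Illustrative document whose DOCTYPE points at an unreachable host.
    static final String SAMPLE =
        "<!DOCTYPE html SYSTEM \"http://example.invalid/missing.dtd\">"
        + "<html><body><p>hello</p></body></html>";

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);                  // no DTD validation
        factory.setFeature(LOAD_EXTERNAL_DTD, false);  // no DTD download either
        return factory.newDocumentBuilder();
    }

    static Document parse(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        Document doc = parse(SAMPLE);
        long stop = System.currentTimeMillis();
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
        System.out.println("Parsing took " + (stop - start) + " ms");
    }
}
```

With `org.apache.xerces.parsers.DOMParser` used directly, the same feature id can be passed to `parser.setFeature(...)`. One trade-off to keep in mind: once the DTD is skipped, named entities declared only there (for example `&nbsp;` in XHTML) are no longer expanded, so files relying on them may need a local copy of the DTD served through an `EntityResolver` instead.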
--- Jean Georges PERRIN <jgp@(protected)> wrote:
> Hi,
>
> Thanks for the hope message!
>
> I was timing the whole method, I focused on parser creation and parse
> time now.
>
> I changed my code to:
>   public void load () {
>     DOMParser parser;
>     Logger log = ThinStructureConfiguration.getInstance().getLogger();
>
>     try {
>       long start = System.currentTimeMillis();
>       parser = new DOMParser();
>       long stop = System.currentTimeMillis();
>       log.finest ("Creating parser took " + (stop - start) + " ms");
>     }
>     catch (Exception e) {
>       log.severe ("Error: Unable to instantiate parser");
>       return;
>     }
>
>     try {
>       long start = System.currentTimeMillis();
>       parser.parse(m_file.toURI().toString());
>       long stop = System.currentTimeMillis();
>       log.finest ("Parsing of " + m_file.getName() + " took " + (stop - start) + " ms");
>       m_document = parser.getDocument();
>     }
>     catch (SAXParseException e) {
>       // ignore
>     }
>     catch (Exception e) {
>       String msg;
>       msg = ("Error: Parse error occurred, " + e.getMessage());
>       if (e instanceof SAXException) {
>         e = ((SAXException)e).getException();
>       }
>       msg += '\n' + e.toString();
>       log.severe (msg);
>     }
>   }
>
> Results are:
> Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Creating parser took 251 ms
> Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Parsing of emailpassword.xhtml took 5227 ms
> Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
> INFO: Window definition emailpassword.xhtml added.
> Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Creating parser took 10 ms
> Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Parsing of emailpassword2.xhtml took 3085 ms
> Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> INFO: Window definition emailpassword2.xhtml added.
> Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Creating parser took 0 ms
> Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Parsing of emailpassword3.xhtml took 10 ms
> Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> INFO: Window definition emailpassword3.xhtml added.
> Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Creating parser took 0 ms
> Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
> FINEST: Parsing of emailpassword4.xhtml took 2774 ms
>
> All files are identical, except #3 where I removed all references to the
> external world.
>
> I use Xerces J 2.2.1 (according to build.xml).
>
> Conclusions & questions:
> 1/ Creation of DOMParser() is slow the first time, but ridiculous
> afterwards, so there is no need for enhancing that much.
> 2/ My parser seems to want to check the validity through external
> connection. How can I remove those without modifying all my files?
>
> jgp
>
> > -----Original Message-----
> > From: Simon Kitching [mailto:simon@(protected)]
> > Sent: Monday, January 13, 2003 23:24
> > To: jgp@(protected)
> > Cc: xerces-j-user@(protected)
> > Subject: Re: Enhancing parsing performance
> >
> > Hi Jean Georges,
> >
> > Firstly, does the document you are parsing contain a DTD or schema
> > reference? If it uses http://acme.com/xyz.dtd, then much of your parsing
> > time may actually be in retrieval of the remote dtd. And if the
> > dtd/schema is large then time will be spent processing it. If this is
> > the case, there are optimisations available for both these problems.
> >
> > Secondly, you don't say exactly what you are timing. Is it the complete
> > application time, or the time taken by the method you include below, or
> > just the time for the parse method?
> >
> > Thirdly, you don't mention which version of Xerces you are using...
> >
> > Providing information on the above would allow people to provide better
> > suggestions for you..
> >
> > I certainly see better performance than you do, so there is hope :-)
> >
> > Regards,
> >
> > Simon
> >
> > On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > > Hi,
> > >
> > > Thanks for those who helped me with cloning...
> > >
> > > I am a little surprised with performance. Maybe there are some basic
> > > things I am doing wrong.
> > >
> > > I am parsing a 3 Kb XHTML file and it takes me about 4s, cloning the
> > > tree takes me roughly a ridiculous amount of time (10ms). This on an
> > > Athlon XP 1800+ running XP (sure I could switch to Linux but it is not
> > > planned for now :) ).
> > >
> > > My code for parsing:
> > >   protected void load () {
> > >     DOMParser parser;
> > >
> > >     try {
> > >       parser = new DOMParser();
> > >     }
> > >     catch (Exception e) {
> > >       log.severe ("Error: Unable to instantiate parser");
> > >       return;
> > >     }
> > >
> > >     try {
> > >       parser.parse(m_file.toURI().toString());
> > >       m_document = parser.getDocument();
> > >     }
> > >     catch (SAXParseException e) {
> > >       // ignore
> > >     }
> > >     catch (Exception e) {
> > >       String msg;
> > >       msg = ("Error: Parse error occurred, " + e.getMessage());
> > >       if (e instanceof SAXException) {
> > >         e = ((SAXException)e).getException();
> > >       }
> > >       msg += '\n' + e.toString();
> > >       log.severe (msg);
> > >     }
> > >   }
> > >
> > > Questions:
> > > 1/ is static'ing my parser will enhance the process?
> > > 2/ can I "pre" create some objects I can reuse?
> > > 3/ are there some eventual verification I can turn
=== message truncated ===