Enhancing parsing performance 2003-01-13 - By Jean Georges PERRIN
Hi,
Thanks for the hope message!
I was timing the whole method; now I am timing parser creation and parsing separately.
I changed my code to:

public void load () {
    DOMParser parser;
    Logger log = ThinStructureConfiguration.getInstance().getLogger();

    try {
        long start = System.currentTimeMillis();
        parser = new DOMParser();
        long stop = System.currentTimeMillis();
        log.finest ("Creating parser took " + (stop - start) + " ms");
    }
    catch (Exception e) {
        log.severe ("Error: Unable to instantiate parser");
        return;
    }

    try {
        long start = System.currentTimeMillis();
        parser.parse(m_file.toURI().toString());
        long stop = System.currentTimeMillis();
        log.finest ("Parsing of " + m_file.getName() + " took " + (stop - start) + " ms");
        m_document = parser.getDocument();
    }
    catch (SAXParseException e) {
        // ignore (m_document is left unchanged on a parse error)
    }
    catch (Exception e) {
        String msg;
        msg = ("Error: Parse error occurred, " + e.getMessage());
        // getException() returns null when no exception is wrapped
        if (e instanceof SAXException && ((SAXException)e).getException() != null) {
            e = ((SAXException)e).getException();
        }
        msg += '\n' + e.toString();
        log.severe (msg);
    }
}
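load() creates a new parser on every call. On the idea of creating the parser once and reusing it, here is a minimal sketch that uses the standard JAXP API (javax.xml.parsers) instead of Xerces' DOMParser directly; the class ParserReuse and its methods are hypothetical names. It builds one DocumentBuilder up front and parses several documents with it. A DocumentBuilder is not thread-safe, so a shared ("static") instance must not be used by several threads at once.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: creates one DocumentBuilder and reuses it for every
// parse, so the one-time setup cost is paid only once.
public class ParserReuse {
    private final DocumentBuilder builder;

    public ParserReuse() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        builder = factory.newDocumentBuilder();
    }

    // Not thread-safe: one instance must not be shared between threads.
    public Document parse(String xml) throws SAXException, IOException {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        ParserReuse reuse = new ParserReuse();
        for (String xml : new String[] {"<a/>", "<b><c/></b>"}) {
            long start = System.currentTimeMillis();
            Document doc = reuse.parse(xml);
            long stop = System.currentTimeMillis();
            System.out.println(doc.getDocumentElement().getTagName()
                    + " parsed in " + (stop - start) + " ms");
        }
    }
}
```

With Xerces' own DOMParser the equivalent would be keeping the parser in a field and calling parse() again for each file.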
Results are:

Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 251 ms
Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword.xhtml took 5227 ms
Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
INFO: Window definition emailpassword.xhtml added.
Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 10 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword2.xhtml took 3085 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
INFO: Window definition emailpassword2.xhtml added.
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 0 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword3.xhtml took 10 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
INFO: Window definition emailpassword3.xhtml added.
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 0 ms
Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword4.xhtml took 2774 ms
All files are identical, except #3, from which I removed all references to external resources.
I use Xerces-J 2.2.1 (according to build.xml).
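Conclusions & questions:
1/ Creating the DOMParser is slow the first time (251 ms) but negligible afterwards (10 ms, then 0 ms), so there is not much to gain by optimizing it.
2/ My parser seems to fetch external resources over the network, presumably to check validity: file #3, which has no external references, parsed in 10 ms versus roughly 2.8 to 5.2 s for the others. How can I stop this without modifying all my files?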
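For question 2/, one common approach is to install a SAX EntityResolver that answers requests for external DTDs locally, so the parser never opens a network connection and the files themselves stay unchanged. The sketch below is an illustration, not tested against Xerces-J 2.2.1 specifically: it uses JAXP, and with Xerces' DOMParser the same resolver could be passed to parser.setEntityResolver(...). The class LocalDtdResolver and the URL http://example.invalid/xhtml11.dtd are hypothetical. Returning empty content skips the DTD entirely, so named entities it declares (such as &nbsp;) become undeclared and will cause a parse error; serving a local copy of the DTD from the resolver avoids that.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver: every external entity (e.g. the XHTML 1.1 DTD) is
// replaced by empty content, so the parser never fetches anything remotely.
public class LocalDtdResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // A real application could map publicId/systemId to a local copy of
        // the DTD here instead of returning an empty document.
        return new InputSource(new StringReader(""));
    }

    public static Document parseOffline(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The system identifier points at an unreachable host; parsing still
        // succeeds because the resolver supplies the DTD content itself.
        String xml = "<!DOCTYPE html SYSTEM \"http://example.invalid/xhtml11.dtd\">"
                + "<html><body/></html>";
        Document doc = parseOffline(xml);
        System.out.println(doc.getDocumentElement().getTagName()); // html
    }
}
```

Xerces2 also documents a feature, http://apache.org/xml/features/nonvalidating/load-external-dtd, which when set to false tells a non-validating parser not to load the external DTD at all (with the same entity caveat); whether 2.2.1 supports it should be checked against that release's feature list.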
jgp
> -----Original Message-----
> From: Simon Kitching [mailto:simon@(protected)]
> Sent: Monday, January 13, 2003 23:24
> To: jgp@(protected)
> Cc: xerces-j-user@(protected)
> Subject: Re: Enhancing parsing performance
>
> Hi Jean Georges,
>
> Firstly, does the document you are parsing contain a DTD or schema
> reference? If it uses http://acme.com/xyz.dtd, then much of your parsing
> time may actually be spent retrieving the remote DTD. And if the
> DTD/schema is large, then time will be spent processing it. If this is
> the case, there are optimisations available for both these problems.
>
> Secondly, you don't say exactly what you are timing. Is it the complete
> application time, the time taken by the method you include below, or
> just the time for the parse method?
>
> Thirdly, you don't mention which version of Xerces you are using...
>
> Providing information on the above would allow people to give you better
> suggestions.
>
> I certainly see better performance than you do, so there is hope :-)
>
> Regards,
>
> Simon
>
> On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > Hi,
> >
> > Thanks to those who helped me with cloning...
> >
> > I am a little surprised by the performance. Maybe there are some basic
> > things I am doing wrong.
> >
> > I am parsing a 3 KB XHTML file and it takes about 4 s; cloning the
> > tree takes a negligible amount of time (10 ms). This is on an Athlon
> > XP 1800+ running Windows XP (sure, I could switch to Linux, but that
> > is not planned for now :) ).
> >
> > My code for parsing:
> > protected void load () {
> >     DOMParser parser;
> >
> >     try {
> >         parser = new DOMParser();
> >     }
> >     catch (Exception e) {
> >         log.severe ("Error: Unable to instantiate parser");
> >         return;
> >     }
> >
> >     try {
> >         parser.parse(m_file.toURI().toString());
> >         m_document = parser.getDocument();
> >     }
> >     catch (SAXParseException e) {
> >         // ignore
> >     }
> >     catch (Exception e) {
> >         String msg;
> >         msg = ("Error: Parse error occurred, " + e.getMessage());
> >         // getException() returns null when no exception is wrapped
> >         if (e instanceof SAXException && ((SAXException)e).getException() != null) {
> >             e = ((SAXException)e).getException();
> >         }
> >         msg += '\n' + e.toString();
> >         log.severe (msg);
> >     }
> > }
> >
> > Questions:
> > 1/ Will making my parser static speed up the process?
> > 2/ Can I pre-create some objects and reuse them?
> > 3/ Are there any checks I can turn off?
> >
> > My code for cloning:
> > public Object clone() {
> >     XHTML11Window win = new XHTML11Window(m_file);
> >     win.m_document = new DocumentImpl();
> >     // importNode() returns an unattached copy; it must be appended,
> >     // otherwise win.m_document stays empty.
> >     win.m_document.appendChild(
> >         win.m_document.importNode(m_document.getDocumentElement(), true));
> >
> >     return win;
> > }
> >
> > I haven't checked that they really were cloned, but it looks as if they
> > were...
> >
> > Any tips are more than welcome!
> >
> > Jean Georges PERRIN
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
> > For additional commands, e-mail: xerces-j-user-help@(protected)
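On verifying the clone: Document.importNode() returns a copy that is owned by the target document but has no parent, so it must be appended before the new document contains anything. A minimal sketch with the standard JAXP/DOM API (DomCopy, parse and deepCopy are hypothetical names, not part of Xerces) that deep-copies a document and checks that the copy really is a separate tree:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper showing a deep copy of a DOM tree into a fresh document.
public class DomCopy {
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static Document deepCopy(Document source) throws Exception {
        Document copy = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        // importNode() only creates an unattached copy owned by 'copy'...
        Node root = copy.importNode(source.getDocumentElement(), true);
        // ...so it must be appended, or 'copy' stays empty.
        copy.appendChild(root);
        return copy;
    }

    public static void main(String[] args) throws Exception {
        Document original = parse("<html><body><p>hi</p></body></html>");
        Document copy = deepCopy(original);

        // Same content, separate tree: changing the copy leaves the original untouched.
        copy.getDocumentElement().setAttribute("id", "copy");
        System.out.println(copy.getDocumentElement().getTextContent());       // hi
        System.out.println(original.getDocumentElement().hasAttribute("id")); // false
    }
}
```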