Enhancing parsing performance

2003-01-13       - By Jean Georges PERRIN

Hi,

Thanks for the encouraging message!

I was timing the whole method; I have now narrowed the timing down to parser
creation and parse time.

I changed my code to:
 public void load () {
   DOMParser parser;
   Logger log = ThinStructureConfiguration.getInstance().getLogger();

   try {
     long start = System.currentTimeMillis();
     parser = new DOMParser();
     long stop = System.currentTimeMillis();
     log.finest ("Creating parser took " + (stop - start) + " ms");
   }
   catch (Exception e) {
     log.severe ("Error: Unable to instantiate parser");
     return;
   }

   try {
     long start = System.currentTimeMillis();
     parser.parse(m_file.toURI().toString());
     long stop = System.currentTimeMillis();
     log.finest ("Parsing of " + m_file.getName() + " took "
         + (stop - start) + " ms");
     m_document = parser.getDocument();
   }
   catch (SAXParseException e) {
     // ignore
   }
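   catch (Exception e) {
     String msg = "Error: Parse error occurred, " + e.getMessage();
     if (e instanceof SAXException) {
       // Unwrap the embedded exception; getException() may return null.
       Exception nested = ((SAXException) e).getException();
       if (nested != null) {
         e = nested;
       }
     }
     msg += '\n' + e.toString();
     log.severe (msg);
   }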
 }

Results are:
Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 251 ms
Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword.xhtml took 5227 ms
Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
INFO: Window definition emailpassword.xhtml added.
Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 10 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword2.xhtml took 3085 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
INFO: Window definition emailpassword2.xhtml added.
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 0 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword3.xhtml took 10 ms
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
INFO: Window definition emailpassword3.xhtml added.
Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Creating parser took 0 ms
Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
FINEST: Parsing of emailpassword4.xhtml took 2774 ms

All files are identical, except #3, from which I removed all references to
external resources.

I use Xerces J 2.2.1 (according to build.xml).

Conclusions & questions:
1/ Creating a DOMParser() is slow the first time (251 ms) but negligible
afterwards (0-10 ms), so there is little to gain from optimizing it.
2/ My parser seems to fetch external resources over the network to check
validity. How can I disable those lookups without modifying all my files?
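
One possible way to approach question 2/, sketched here as an illustration rather than as a confirmed fix for this setup: install a SAX EntityResolver that answers every external entity request with an empty stream, so the DTD named in the DOCTYPE is never downloaded. The sketch uses the standard JAXP DocumentBuilder so that it runs without extra jars; the class name OfflineParse, the method parseOffline and the sample document are hypothetical and not part of the original code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class OfflineParse {
    // Answers every external-entity request (DTD, external entity files)
    // with an empty stream, so the parser has nothing to fetch remotely.
    static final EntityResolver SKIP_EXTERNAL =
        (publicId, systemId) -> new InputSource(new StringReader(""));

    // Hypothetical helper: parses an XML string without external lookups.
    static Document parseOffline(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(SKIP_EXTERNAL);
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Sample input (an assumption): a tiny XHTML 1.1 document whose
        // DOCTYPE points at a remote DTD.
        String xhtml =
            "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
          + "\"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd\">"
          + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
          + "<head><title>t</title></head><body><p>hi</p></body></html>";
        long start = System.currentTimeMillis();
        Document doc = parseOffline(xhtml);
        long stop = System.currentTimeMillis();
        System.out.println(doc.getDocumentElement().getLocalName()
            + " parsed in " + (stop - start) + " ms");
    }
}
```

With the Xerces DOMParser used in load(), the same resolver could presumably be installed with parser.setEntityResolver(...) before calling parse(). Xerces also documents a feature, http://apache.org/xml/features/nonvalidating/load-external-dtd, that can be set to false; whether it behaves as expected in 2.2.1 would need checking. One caveat: once the DTD is skipped, named entities it defines (for example &nbsp;) are no longer declared, so files that use them will fail to parse unless a local copy of the DTD is served instead of an empty stream.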

jgp

> -----Original Message-----
> From: Simon Kitching [mailto:simon@(protected)]
> Sent: Monday, January 13, 2003 23:24
> To: jgp@(protected)
> Cc: xerces-j-user@(protected)
> Subject: Re: Enhancing parsing performance
>
> Hi Jean Georges,
>
> Firstly, does the document you are parsing contain a DTD or schema
> reference? If it uses http://acme.com/xyz.dtd, then much of your parsing
> time may actually be in retrieval of the remote dtd. And if the
> dtd/schema is large then time will be spent processing it. If this is
> the case, there are optimisations available for both these problems.
>
> Secondly, you don't say exactly what you are timing. Is it the complete
> application time, or the time taken by the method you include below, or
> just the time for the parse method?
>
> Thirdly, you don't mention which version of Xerces you are using...
>
> Providing information on the above would allow people to provide better
> suggestions for you..
>
> I certainly see better performance than you do, so there is hope :-)
>
> Regards,
>
> Simon
>
> On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > Hi,
> >
> > Thanks for those who helped me with cloning...
> >
> > I am a little surprised with performance. Maybe there are some basic things
> > I am doing wrong.
> >
> > I am parsing a 3 Kb XHTML file and it takes me about 4s, cloning the tree
> > takes me roughly a ridiculous amount of time (10ms). This on an Athlon XP
> > 1800+ running XP (sure I could switch to Linux but it is not planned for now
> > :) ).
> >
> > My code for parsing:
> >   protected void load () {
> >     DOMParser parser;
> >
> >     try {
> >       parser = new DOMParser();
> >     }
> >     catch (Exception e) {
> >       log.severe ("Error: Unable to instantiate parser");
> >       return;
> >     }
> >
> >     try {
> >       parser.parse(m_file.toURI().toString());
> >       m_document = parser.getDocument();
> >     }
> >     catch (SAXParseException e) {
> >       // ignore
> >     }
> >     catch (Exception e) {
> >       String msg;
> >       msg = ("Error: Parse error occurred, " + e.getMessage());
> >       if (e instanceof SAXException) {
> >         e = ((SAXException)e).getException();
> >       }
> >       msg += '\n' + e.toString();
> >       log.severe (msg);
> >     }
> >   }
> >
> > Questions:
> > 1/ would making my parser static speed up the process?
> > 2/ can I "pre" create some objects I can reuse?
> > 3/ are there any verification steps I can turn off?
> >
> > My code for cloning:
> >   public Object clone() {
> >     XHTML11Window win = new XHTML11Window(m_file);
> >     win.m_document = new DocumentImpl();
> >     win.m_document.importNode(m_document.getDocumentElement(), true);
> >
> >     return win;
> >   }
> >
> > I haven't checked that they really were cloned, but it looks as if they
> > were...
> >
> > Any tips are more than welcome!
> >
> > Jean Georges PERRIN
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
> > For additional commands, e-mail: xerces-j-user-help@(protected)
> >
> >

