Enhancing parsing performance

2003-01-14 - By Jean Georges PERRIN

Thanks Simon,

Do you have a code fragment that would illustrate your (b) approach?

jgp

> -----Original Message-----
> From: Simon Kitching [mailto:simon@(protected)]
> Sent: Tuesday, January 14, 2003 02:03
> To: xerces-j-user@(protected)
> Cc: jgp@(protected)
> Subject: RE: Enhancing parsing performance
>
> Hi,
>
> > Turn validation off!
>
> Unfortunately, turning validation off won't speed things up very much.
>
> Essentially, disabling validation only *suppresses* error messages about
> invalid input. The DTD or schema is still processed because it can
> contain things like default attribute values or entity definitions which
> should be applied even when validation is disabled.
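To make the point above concrete, here is a small sketch using the standard JAXP `DocumentBuilderFactory` rather than Xerces's `DOMParser` class directly (Xerces can act as the JAXP implementation, and the JDK bundles its own copy). The class name `DtdStillApplied` is hypothetical. With validation explicitly off, a default attribute and an entity declared in the DTD still end up in the parsed document, which is why the parser has to read the DTD at all.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DtdStillApplied {

    // Parses XML text with DTD validation explicitly switched off.
    public static Document parseNonValidating(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // only suppresses validity errors
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // The internal DTD subset supplies a default attribute and an entity.
        String xml = "<!DOCTYPE root ["
                + "<!ATTLIST root lang CDATA \"en\">"
                + "<!ENTITY who \"world\">"
                + "]>"
                + "<root>hello &who;</root>";
        Document doc = parseNonValidating(xml);
        // Both DTD contributions are present even though validation is off.
        System.out.println(doc.getDocumentElement().getAttribute("lang")); // en
        System.out.println(doc.getDocumentElement().getTextContent());     // hello world
    }
}
```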
>
> If you are really sure that the DTD or schema doesn't contain any data
> that will affect the xml document being parsed, then you can either:
> (a) use a feature like:
> http://xml.apache.org/xerces2-j/features.html#external-parameter-entities
> or
> http://xml.apache.org/xerces2-j/features.html#nonvalidating.load-dtd-grammar
> or
> (b) register a custom EntityResolver object with the parser, which
> returns an empty DTD or schema when asked for the external one
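A minimal sketch of approach (a), assuming the parser honours the Xerces feature `http://apache.org/xml/features/nonvalidating/load-external-dtd` (documented for Xerces2 and recognised by the Xerces copy bundled with the JDK; whether 2.2.1 already supports it has not been checked here). It is set through JAXP's `DocumentBuilderFactory.setFeature`; on a Xerces `DOMParser` the equivalent would be `parser.setFeature(...)`. The class name and the `example.invalid` URL are placeholders; that host can never resolve, so the parse only succeeds if the external DTD is really skipped.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SkipExternalDtd {

    // Xerces feature: false means "ignore the external DTD completely".
    static final String LOAD_EXTERNAL_DTD =
            "http://apache.org/xml/features/nonvalidating/load-external-dtd";

    public static Document parseWithoutExternalDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        // Throws ParserConfigurationException if the underlying parser
        // does not recognise the feature.
        factory.setFeature(LOAD_EXTERNAL_DTD, false);
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL: example.invalid never resolves, so any attempt
        // to fetch the DTD would make the parse fail.
        String xml = "<!DOCTYPE html SYSTEM \"http://example.invalid/xhtml11.dtd\">"
                + "<html><body/></html>";
        Document doc = parseWithoutExternalDtd(xml);
        System.out.println(doc.getDocumentElement().getNodeName()); // html
    }
}
```

This only pays off under the condition Simon states: entities such as `&nbsp;` that are declared only in the XHTML DTD will no longer be expanded once the DTD is ignored.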
>
> I'm not sure which of the features listed in (a) above is the one you
> want. I use approach (b) currently.
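A minimal sketch of approach (b): an `EntityResolver` that answers every external entity request with empty content, so nothing is fetched. It uses JAXP's `DocumentBuilder.setEntityResolver`; Xerces's `DOMParser` also has a `setEntityResolver(EntityResolver)` method that should accept the same object. The class name `EmptyDtdResolver` and the `example.invalid` URL are illustrative only, and Simon's caveat about DTDs that affect the document still applies.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class EmptyDtdResolver implements EntityResolver {

    // Answers every external entity request (including the external DTD)
    // with empty content, so nothing is read from the network.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        InputSource empty = new InputSource(new StringReader(""));
        empty.setPublicId(publicId);
        empty.setSystemId(systemId); // kept for error messages; the reader takes precedence
        return empty;
    }

    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new EmptyDtdResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.1//EN\" "
                + "\"http://example.invalid/xhtml11.dtd\">"
                + "<html><body/></html>";
        System.out.println(parse(xml).getDocumentElement().getNodeName()); // html
    }
}
```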
>
> If you do need to process the DTD, then you can use a custom
> EntityResolver to look up a locally cached version. I don't know if
> there are already implementations for catalog lookup, etc. or if you
> will have to roll your own.
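If the DTD is needed, a resolver can serve a locally cached copy instead of fetching it. A minimal sketch, assuming the cache is an in-memory map from system ID to DTD text (a real application would more likely load the files from disk or the classpath); `CachedDtdResolver` and the `example.invalid` URL are hypothetical. Returning `null` for unknown IDs tells the parser to open the URI itself, as it normally would.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class CachedDtdResolver implements EntityResolver {

    private final Map<String, String> cache; // system ID -> DTD text

    public CachedDtdResolver(Map<String, String> cache) {
        this.cache = cache;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String text = (systemId == null) ? null : cache.get(systemId);
        if (text == null) {
            return null; // not cached: the parser opens the URI itself
        }
        InputSource source = new InputSource(new StringReader(text));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    public static Document parse(String xml, Map<String, String> cache) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new CachedDtdResolver(cache));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical cache entry standing in for a locally stored DTD.
        Map<String, String> cache = Map.of(
                "http://example.invalid/xyz.dtd", "<!ENTITY greeting \"hello\">");
        String xml = "<!DOCTYPE root SYSTEM \"http://example.invalid/xyz.dtd\">"
                + "<root>&greeting;</root>";
        // The entity comes from the cached DTD, so it is expanded as usual.
        System.out.println(parse(xml, cache).getDocumentElement().getTextContent()); // hello
    }
}
```

For catalog-based lookup, the Apache XML Commons Resolver and, from Java 9, the `javax.xml.catalog` API implement OASIS XML Catalogs, which may make a hand-written map unnecessary.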
>
> Hope this helps,
>
> Simon
>
> On Tue, 2003-01-14 at 12:18, Brian Madigan wrote:
> > DOMParser parser = new DOMParser();
> > parser.setFeature("http://xml.org/sax/features/validation", false);
> > or something to that effect. If I am not mistaken,
> > that should stop any DTD validation from happening.
> >
> > --- Jean Georges PERRIN <jgp@(protected)> wrote:
> > > Hi,
> > >
> > > Thanks for the hope message!
> > >
> > > I was timing the whole method before; now I have focused on parser
> > > creation and parse time.
> > >
> > > I changed my code to:
> > >   public void load () {
> > >     DOMParser parser;
> > >     Logger log =
> > >         ThinStructureConfiguration.getInstance().getLogger();
> > >
> > >     try {
> > >       long start = System.currentTimeMillis();
> > >       parser = new DOMParser();
> > >       long stop = System.currentTimeMillis();
> > >       log.finest ("Creating parser took " + (stop - start) + " ms");
> > >     }
> > >     catch (Exception e) {
> > >       log.severe ("Error: Unable to instantiate parser");
> > >       return;
> > >     }
> > >
> > >     try {
> > >       long start = System.currentTimeMillis();
> > >       parser.parse(m_file.toURI().toString());
> > >       long stop = System.currentTimeMillis();
> > >       log.finest ("Parsing of " + m_file.getName() + " took "
> > >           + (stop - start) + " ms");
> > >       m_document = parser.getDocument();
> > >     }
> > >     catch (SAXParseException e) {
> > >       // ignored: m_document is left unset when the parse fails
> > >     }
> > >     catch (Exception e) {
> > >       String msg = "Error: Parse error occurred, " + e.getMessage();
> > >       // unwrap the cause only if the SAXException actually wraps one
> > >       if (e instanceof SAXException
> > >           && ((SAXException) e).getException() != null) {
> > >         e = ((SAXException) e).getException();
> > >       }
> > >       msg += '\n' + e.toString();
> > >       log.severe (msg);
> > >     }
> > >   }
> > >
> > > Results are:
> > > Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 251 ms
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword.xhtml took 5227 ms
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword.xhtml added.
> > > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 10 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword2.xhtml took 3085 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword2.xhtml added.
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 0 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword3.xhtml took 10 ms
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > > INFO: Window definition emailpassword3.xhtml added.
> > > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Creating parser took 0 ms
> > > Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
> > > FINEST: Parsing of emailpassword4.xhtml took 2774 ms
> > >
> > > All files are identical, except #3, where I removed all references
> > > to the external world.
> > >
> > > I use Xerces J 2.2.1 (according to build.xml).
> > >
> > > Conclusions & questions:
> > > 1/ Creating a DOMParser is slow the first time, but takes a negligible
> > > amount of time afterwards, so there is no need to optimize it much.
> > > 2/ My parser seems to check validity through an external
> > > connection. How can I avoid that without modifying all my files?
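On reusing the parser (the original message below also asks whether making it static would help): one way to avoid repeated creation cost is to keep a parser around. A minimal sketch, again with JAXP rather than `DOMParser`; `ReusableParser` is a hypothetical name. Because a `DocumentBuilder` is not guaranteed to be thread-safe, it keeps one instance per thread instead of a single shared static one. The timings above, where the file without external references parsed in 10 ms, suggest most of the time goes to the external DTD rather than parser creation, so reuse alone is unlikely to fix the multi-second parses.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ReusableParser {

    // DocumentBuilder is not thread-safe, so keep one instance per thread.
    private static final ThreadLocal<DocumentBuilder> BUILDER =
            ThreadLocal.withInitial(() -> {
                try {
                    return DocumentBuilderFactory.newInstance().newDocumentBuilder();
                } catch (ParserConfigurationException e) {
                    throw new IllegalStateException(e);
                }
            });

    public static Document parse(String xml) throws Exception {
        DocumentBuilder builder = BUILDER.get();
        builder.reset(); // return the builder to its freshly created state
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Both parses on this thread reuse the same DocumentBuilder.
        for (String xml : new String[] {"<a/>", "<b/>"}) {
            System.out.println(parse(xml).getDocumentElement().getNodeName());
        }
    }
}
```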
> > >
> > > jgp
> > >
> > > > -----Original Message-----
> > > > From: Simon Kitching [mailto:simon@(protected)]
> > > > Sent: Monday, January 13, 2003 23:24
> > > > To: jgp@(protected)
> > > > Cc: xerces-j-user@(protected)
> > > > Subject: Re: Enhancing parsing performance
> > > >
> > > > Hi Jean Georges,
> > > >
> > > > Firstly, does the document you are parsing contain a DTD or schema
> > > > reference? If it uses http://acme.com/xyz.dtd, then much of your
> > > > parsing time may actually be spent retrieving the remote DTD. And if
> > > > the DTD/schema is large, then time will be spent processing it. If
> > > > this is the case, there are optimisations available for both these
> > > > problems.
> > > >
> > > > Secondly, you don't say exactly what you are timing. Is it the
> > > > complete application time, the time taken by the method you include
> > > > below, or just the time for the parse method?
> > > >
> > > > Thirdly, you don't mention which version of Xerces you are using...
> > > >
> > > > Providing information on the above would allow people to provide
> > > > better suggestions for you.
> > > >
> > > > I certainly see better performance than you do, so there is hope :-)
> > > >
> > > > Regards,
> > > >
> > > > Simon
> > > >
> > > > On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > > > > Hi,
> > > > >
> > > > > Thanks to those who helped me with cloning...
> > > > >
> > > > > I am a little surprised by the performance. Maybe there are some
> > > > > basic things I am doing wrong.
> > > > >
> > > > > I am parsing a 3 KB XHTML file and it takes about 4 s, while cloning
> > > > > the tree takes a negligible amount of time (roughly 10 ms). This is
> > > > > on an Athlon XP 1800+ running Windows XP (sure, I could switch to
> > > > > Linux, but that is not planned for now :) ).
> > > > >
> > > > > My code for parsing:
> > > > >   protected void load () {
> > > > >     DOMParser parser;
> > > > >
> > > > >     try {
> > > > >       parser = new DOMParser();
> > > > >     }
> > > > >     catch (Exception e) {
> > > > >       log.severe ("Error: Unable to instantiate parser");
> > > > >       return;
> > > > >     }
> > > > >
> > > > >     try {
> > > > >       parser.parse(m_file.toURI().toString());
> > > > >       m_document = parser.getDocument();
> > > > >     }
> > > > >     catch (SAXParseException e) {
> > > > >       // ignored: m_document is left unset when the parse fails
> > > > >     }
> > > > >     catch (Exception e) {
> > > > >       String msg = "Error: Parse error occurred, " + e.getMessage();
> > > > >       // unwrap the cause only if the SAXException actually wraps one
> > > > >       if (e instanceof SAXException
> > > > >           && ((SAXException) e).getException() != null) {
> > > > >         e = ((SAXException) e).getException();
> > > > >       }
> > > > >       msg += '\n' + e.toString();
> > > > >       log.severe (msg);
> > > > >     }
> > > > >   }
> > > > >
> > > > > Questions:
> > > > > 1/ Will making my parser static speed up the process?
> > > > > 2/ Can I pre-create some objects that I can reuse?
> > > > > 3/ Are there any optional checks I can turn
> > === message truncated ===
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
> > For additional commands, e-mail: xerces-j-user-help@(protected)
> >
> >

