Enhancing parsing performance

2003-01-13 - By Simon Kitching

Hi,

> Turn validation off!

Unfortunately, turning validation off won't speed things up very much.

Essentially, disabling validation only *suppresses* error messages about
invalid input. The DTD or schema is still processed because it can
contain things like default attribute values or entity definitions which
should be applied even when validation is disabled.
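
To illustrate the point about defaults, here is a minimal sketch (not part
of the original exchange). It parses a small document with validation
switched off and reads back an attribute that only exists as a default in
the DTD. It goes through the JAXP DocumentBuilderFactory instead of
DOMParser so that it runs on a stock JDK; the class name, the "page"
element and the "lang" attribute are all made up for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DefaultAttributeDemo {
    // Hypothetical document: the internal DTD subset declares a default
    // value for "lang", but the element itself does not carry the attribute.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE page [\n"
      + "  <!ELEMENT page (#PCDATA)>\n"
      + "  <!ATTLIST page lang CDATA \"en\">\n"
      + "]>\n"
      + "<page>hello</page>";

    // Parses XML with validation disabled and returns the "lang" attribute.
    static String parseAndReadLang() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // no validity errors are reported
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        // The DTD was still read, so the declared default is present.
        return doc.getDocumentElement().getAttribute("lang");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lang = " + parseAndReadLang());
    }
}
```

The attribute comes back populated even though validation is off, which is
why the parser cannot simply skip the DTD.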

If you are really sure that the DTD or schema doesn't contain any data
that will affect the XML document being parsed, then you can either:
(a) use a feature like:
      http://xml.apache.org/xerces2-j/features.html#external-parameter-entities
    or
      http://xml.apache.org/xerces2-j/features.html#nonvalidating.load-dtd-grammar
or
(b) register a custom EntityResolver object with the parser, which
    returns an empty DTD or schema when asked for the external one.

I'm not sure which of the features listed in (a) above is the one you
want. I use approach (b) currently.
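
A rough sketch of both routes follows; treat it as an illustration under
stated assumptions, not as the definitive way to do it. For (a) it uses
the Xerces "load-external-dtd" feature, which is my own pick and not one
of the two features linked above. For (b) the resolver hands back an empty
stream for every external entity it is asked about. The class name, method
names and the example.invalid URL are invented, and the code goes through
JAXP so it runs on a stock JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class SkipExternalDtdDemo {
    // Hypothetical document whose DTD lives on an unreachable host.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
      + "<!DOCTYPE page SYSTEM \"http://example.invalid/page.dtd\">\n"
      + "<page>hello</page>";

    // Approach (b): answer every external entity request with empty content,
    // so the parser never opens a network connection.
    static final EntityResolver EMPTY_RESOLVER = new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    };

    static String parseWithEmptyResolver() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(EMPTY_RESOLVER);
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getNodeName();
    }

    // Approach (a): ask a non-validating parser not to load the external DTD.
    // Assumption: the underlying parser is Xerces or Xerces-derived (as the
    // JDK's built-in one is) and therefore recognises this feature id.
    static String parseWithFeature() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd",
            false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseWithEmptyResolver());
        System.out.println(parseWithFeature());
    }
}
```

Either way, anything the real DTD would have supplied (default attribute
values, entity definitions) is lost, so this only makes sense when the
documents do not rely on it.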

If you do need to process the DTD, then you can use a custom
EntityResolver to look up a locally cached version. I don't know if
there are already implementations for catalog lookup, etc. or if you
will have to roll your own.
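
If you do roll your own, a minimal sketch might look like the following.
It keeps a map from system identifiers to local copies and falls back to
the parser's default behaviour for anything it does not know. The class
name, the register() method and the demo() helper are hypothetical; demo()
writes a throwaway DTD to a temporary file purely so the example is
self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {
    // system identifier -> locally cached copy (hypothetical mapping)
    private final Map<String, Path> localCopies = new HashMap<>();

    public void register(String systemId, Path localCopy) {
        localCopies.put(systemId, localCopy);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws IOException {
        Path local = localCopies.get(systemId);
        if (local == null) {
            return null; // unknown entity: let the parser resolve it itself
        }
        InputSource source = new InputSource(Files.newInputStream(local));
        // Keep the original identifier so relative references inside the
        // DTD are still resolved against it (and pass through this resolver).
        source.setSystemId(systemId);
        return source;
    }

    // Self-contained demonstration; returns the defaulted "lang" attribute.
    static String demo() throws Exception {
        Path dtd = Files.createTempFile("page", ".dtd");
        dtd.toFile().deleteOnExit();
        String dtdText = "<!ELEMENT page (#PCDATA)>\n"
                       + "<!ATTLIST page lang CDATA \"en\">\n";
        Files.write(dtd, dtdText.getBytes(StandardCharsets.UTF_8));

        LocalDtdResolver resolver = new LocalDtdResolver();
        resolver.register("http://example.invalid/page.dtd", dtd);

        String xml = "<!DOCTYPE page SYSTEM \"http://example.invalid/page.dtd\">\n"
                   + "<page>hello</page>";
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(resolver);
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getAttribute("lang");
    }

    public static void main(String[] args) throws Exception {
        System.out.println("lang = " + demo());
    }
}
```

Because the DTD is still processed, its defaults and entities keep working;
only the network fetch is avoided.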

Hope this helps,

Simon

On Tue, 2003-01-14 at 12:18, Brian Madigan wrote:
> DOMParser parser = new DOMParser( );
> parser.setFeature("http://xml.org/sax/features/validation", false);
> or something to that effect. If I am not mistaken,
> that should stop any dtd validation from happening.
>
> --- Jean Georges PERRIN <jgp@(protected)> wrote:
> > Hi,
> >
> > Thanks for the hope message!
> >
> > I was timing the whole method; I have now focused on parser creation
> > and parse time.
> >
> > I changed my code to:
> >   public void load () {
> >     DOMParser parser;
> >     Logger log = ThinStructureConfiguration.getInstance().getLogger();
> >
> >     try {
> >       long start = System.currentTimeMillis();
> >       parser = new DOMParser();
> >       long stop = System.currentTimeMillis();
> >       log.finest ("Creating parser took " + (stop - start) + " ms");
> >     }
> >     catch (Exception e) {
> >       log.severe ("Error: Unable to instantiate parser");
> >       return;
> >     }
> >
> >     try {
> >       long start = System.currentTimeMillis();
> >       parser.parse(m_file.toURI().toString());
> >       long stop = System.currentTimeMillis();
> >       log.finest ("Parsing of " + m_file.getName() + " took "
> >                   + (stop - start) + " ms");
> >       m_document = parser.getDocument();
> >     }
> >     catch (SAXParseException e) {
> >       // ignore
> >     }
> >     catch (Exception e) {
> >       String msg;
> >       msg = ("Error: Parse error occurred, " + e.getMessage());
> >       if (e instanceof SAXException
> >           && ((SAXException)e).getException() != null) {
> >         // unwrap the nested exception only when one is present
> >         e = ((SAXException)e).getException();
> >       }
> >       msg += '\n' + e.toString();
> >       log.severe (msg);
> >     }
> >   }
> >
> > Results are:
> > Jan 13, 2003 11:52:20 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 251 ms
> > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword.xhtml took 5227 ms
> > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.Store add
> > INFO: Window definition emailpassword.xhtml added.
> > Jan 13, 2003 11:52:25 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 10 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword2.xhtml took 3085 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > INFO: Window definition emailpassword2.xhtml added.
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 0 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword3.xhtml took 10 ms
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.Store add
> > INFO: Window definition emailpassword3.xhtml added.
> > Jan 13, 2003 11:52:29 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Creating parser took 0 ms
> > Jan 13, 2003 11:52:31 PM com.awoma.ts.ui.impl.XHTML11Window load
> > FINEST: Parsing of emailpassword4.xhtml took 2774 ms
> >
> > All files are identical, except #3 where I removed all references to
> > the external world.
> >
> > I use Xerces J 2.2.1 (according to build.xml).
> >
> > Conclusions & questions:
> > 1/ Creation of DOMParser() is slow the first time, but negligible
> > afterwards, so there is not much need to improve it.
> > 2/ My parser seems to want to check validity through an external
> > connection. How can I prevent that without modifying all my files?
> >
> > jgp
> >
> > > -----Original Message-----
> > > From: Simon Kitching [mailto:simon@(protected)]
> > > Sent: Monday, January 13, 2003 23:24
> > > To: jgp@(protected)
> > > Cc: xerces-j-user@(protected)
> > > Subject: Re: Enhancing parsing performance
> > >
> > > Hi Jean Georges,
> > >
> > > Firstly, does the document you are parsing contain a DTD or schema
> > > reference? If it uses http://acme.com/xyz.dtd, then much of your
> > > parsing time may actually be in retrieval of the remote dtd. And if
> > > the dtd/schema is large then time will be spent processing it. If
> > > this is the case, there are optimisations available for both these
> > > problems.
> > >
> > > Secondly, you don't say exactly what you are timing. Is it the
> > > complete application time, or the time taken by the method you
> > > include below, or just the time for the parse method?
> > >
> > > Thirdly, you don't mention which version of Xerces you are using...
> > >
> > > Providing information on the above would allow people to provide
> > > better suggestions for you..
> > >
> > > I certainly see better performance than you do, so there is hope :-)
> > >
> > > Regards,
> > >
> > > Simon
> > >
> > > On Tue, 2003-01-14 at 10:56, Jean Georges PERRIN wrote:
> > > > Hi,
> > > >
> > > > Thanks to those who helped me with cloning...
> > > >
> > > > I am a little surprised with performance. Maybe there are some
> > > > basic things I am doing wrong.
> > > >
> > > > I am parsing a 3 KB XHTML file and it takes me about 4 s, whereas
> > > > cloning the tree takes a negligible amount of time (about 10 ms).
> > > > This is on an Athlon XP 1800+ running XP (sure I could switch to
> > > > Linux but it is not planned for now :) ).
> > > >
> > > > My code for parsing:
> > > >   protected void load () {
> > > >     DOMParser parser;
> > > >
> > > >     try {
> > > >       parser = new DOMParser();
> > > >     }
> > > >     catch (Exception e) {
> > > >       log.severe ("Error: Unable to instantiate parser");
> > > >       return;
> > > >     }
> > > >
> > > >     try {
> > > >       parser.parse(m_file.toURI().toString());
> > > >       m_document = parser.getDocument();
> > > >     }
> > > >     catch (SAXParseException e) {
> > > >       // ignore
> > > >     }
> > > >     catch (Exception e) {
> > > >       String msg;
> > > >       msg = ("Error: Parse error occurred, " + e.getMessage());
> > > >       if (e instanceof SAXException
> > > >           && ((SAXException)e).getException() != null) {
> > > >         // unwrap the nested exception only when one is present
> > > >         e = ((SAXException)e).getException();
> > > >       }
> > > >       msg += '\n' + e.toString();
> > > >       log.severe (msg);
> > > >     }
> > > >   }
> > > >
> > > > Questions:
> > > > 1/ will making my parser static improve the process?
> > > > 2/ can I "pre" create some objects I can reuse?
> > > > 3/ are there some eventual verification I can turn
> === message truncated ===
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
> For additional commands, e-mail: xerces-j-user-help@(protected)
>
>

