Unexplained performance feature with SAXParser

2007-05-25       - By Vasilescu, Andrei

Hello,

My name is Andrei Vasilescu and I have been creating a small SAX parser that
is suited for less demanding environments. In the process, I have been
constantly comparing it to Xerces and I uncovered a performance issue in the
SAXParser of Xerces that I can't find a good explanation for.

The XML files that I am testing with have the following structure:
- an opening site tag
- entry tags of the form: <tagName tagAttribute="tagValue"></tagName>
- a closing site tag

Here are two variations of this structure. In the first one everything is
random: every tag name has 11 random characters, every attribute name has 7
characters and every attribute value has 9 characters. An example follows:

---Random.xml---
<?xml version="1.0" standalone="yes"?>

<site>

<npfunmptchh vmrdcf0="lemidtnll"></npfunmptchh>

<bfgtqdpguoq  duuseo0="qriilbnrj"></bfgtqdpguoq>

...

</site>



The second one uses exactly the same lengths for the three pieces of
information, but every tag name is the same: testtestabc. An example follows:



---Constant.xml---

<?xml version="1.0" standalone="yes"?>

<site>

<testtestabc jhejko0="nejqmdudg"></testtestabc>

<testtestabc pfucbp0="ecgrspgjo"></testtestabc>

...

</site>
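
For anyone who wants to reproduce the two files, a minimal generator sketch follows. The class name XmlSampleGenerator, its method names and the fixed seeds are illustrative assumptions, not the program actually used. Attribute names are built as six random letters plus a trailing 0 to mirror the samples above, and the NUMBERED mode produces names made of "testtest" followed by the tag index.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

// Hypothetical helper: writes test files shaped like the samples in this message.
public class XmlSampleGenerator {
    public enum NameMode { RANDOM, CONSTANT, NUMBERED }

    // Returns `length` random lowercase letters.
    public static String randomLetters(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    // Picks the tag name for entry number `index` according to the mode.
    public static String tagName(NameMode mode, Random rnd, int index) {
        switch (mode) {
            case CONSTANT: return "testtestabc";
            case NUMBERED: return "testtest" + index;
            default:       return randomLetters(rnd, 11);
        }
    }

    // Formats one entry: an opening tag with one attribute, closed immediately.
    public static String entry(String name, String attr, String value) {
        return "<" + name + " " + attr + "=\"" + value + "\"></" + name + ">";
    }

    // Builds a whole document with `entries` entry tags inside <site>.
    public static String document(NameMode mode, int entries, long seed) {
        Random rnd = new Random(seed);
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" standalone=\"yes\"?>\n<site>\n");
        for (int i = 0; i < entries; i++) {
            String name = tagName(mode, rnd, i);
            String attr = randomLetters(rnd, 6) + "0"; // assumption: 6 letters + "0"
            String value = randomLetters(rnd, 9);
            sb.append(entry(name, attr, value)).append('\n');
        }
        sb.append("</site>\n");
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        int entries = 70000; // "70k tags" as described in this message
        Files.write(Paths.get("Random.xml"),
                document(NameMode.RANDOM, entries, 1L).getBytes(StandardCharsets.UTF_8));
        Files.write(Paths.get("Constant.xml"),
                document(NameMode.CONSTANT, entries, 2L).getBytes(StandardCharsets.UTF_8));
    }
}
```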





I have experimented with them for quite a while. For files containing 70k
tags (each with an immediate closing tag), the parse time drops by a factor
of almost 4 when all the tags have the same name. The following output is
from a small Java program I am using to test this.



---Random.xml---

parser name: com.sun.org.apache.xerces.internal.parsers.SAXParser

Beginning Document

End of Document Reached

parse time = 16984



---Constant.xml---

parser name: com.sun.org.apache.xerces.internal.parsers.SAXParser

Beginning Document

End of Document Reached

parse time = 4563



The times are consistent between different runs of the parser.
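
A timing harness that prints messages in this shape could look roughly like the sketch below. It is not the original test program: the class name SaxTiming, the handler and the method names are hypothetical, it uses whatever SAX parser JAXP selects by default, and reporting the time in milliseconds is an assumption, since the unit of the figures above is not stated.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness: times one SAX parse per file given on the command line.
public class SaxTiming {

    // Prints the document events and counts start tags.
    public static class CountingHandler extends DefaultHandler {
        public int elements;

        @Override
        public void startDocument() {
            System.out.println("Beginning Document");
        }

        @Override
        public void endDocument() {
            System.out.println("End of Document Reached");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Parses the input with the JAXP default parser and returns elapsed milliseconds.
    public static long timeParse(InputSource input, CountingHandler handler) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            System.out.println("parser name: " + reader.getClass().getName());
            reader.setContentHandler(handler);
            long start = System.nanoTime();
            reader.parse(input);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Convenience for small in-memory documents: returns the number of start tags seen.
    public static int countElements(String xml) {
        CountingHandler handler = new CountingHandler();
        timeParse(new InputSource(new StringReader(xml)), handler);
        return handler.elements;
    }

    public static void main(String[] args) {
        for (String path : args) {
            System.out.println("---" + path + "---");
            InputSource input = new InputSource(new File(path).toURI().toString());
            long millis = timeParse(input, new CountingHandler());
            System.out.println("parse time = " + millis);
        }
    }
}
```

It would be run as `java SaxTiming Random.xml Constant.xml`. A single timed parse in a fresh JVM also includes class loading and JIT warm-up, so parsing each file several times and comparing the later runs gives a fairer comparison.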



I would appreciate any help in understanding the inner workings of Xerces
that make this happen.



Thank you,



Andrei Vasilescu



P.S.: Experimenting further, I tried tag names made of a constant part and a
variable part, such as "testtest" followed by the number of the tag
(0-70000). The performance is close to that of the totally random case, with
a parse time of around 15563.







