Unexplained performance feature with SAXParser 2007-05-25 - By Vasilescu, Andrei
Hello,
My name is Andrei Vasilescu and I have been creating a small SAX parser that is suited for less demanding environments. In the process, I have been constantly comparing it to Xerces and I uncovered a performance issue in the SAXParser of Xerces that I can't find a good explanation for.
The XML files that I am testing with have the following structure:
- site tag
- entry tags of the form: <tagName tagAttribute = "tagValue"></tagName>
- end of site tag
Here are two variations of this structure. The first one is totally random: all tagNames have 11 random characters, all tagAttributes have 7 random characters, and all values are 9 characters long. An example follows:
---Random.xml---
<?xml version="1.0" standalone="yes"?>
<site>
<npfunmptchh vmrdcf0="lemidtnll"></npfunmptchh>
<bfgtqdpguoq duuseo0="qriilbnrj"></bfgtqdpguoq>
...
</site>
The second one uses exactly the same lengths for the three pieces of information, but all tagNames are the same: testtestabc. An example follows:
---Constant.xml---
<?xml version="1.0" standalone="yes"?>
<site>
<testtestabc jhejko0="nejqmdudg"></testtestabc>
<testtestabc pfucbp0="ecgrspgjo"></testtestabc>
...
</site>
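For anyone who wants to reproduce the two files, here is a minimal sketch of a generator. It is not the program used for the measurements above: the class name TestFileGenerator, the seeds, and the lowercase-letters-only alphabet are assumptions, and the attribute name is built as 6 random letters plus a trailing 0 only because that is what the sample lines look like.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

// Hypothetical generator for files shaped like Random.xml and Constant.xml.
public class TestFileGenerator {
    private static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

    // Returns a string of `length` random lowercase letters.
    public static String randomLetters(Random rnd, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(LETTERS.charAt(rnd.nextInt(LETTERS.length())));
        }
        return sb.toString();
    }

    // Builds one entry line. A null fixedTagName means "use a random 11-letter name".
    public static String entry(Random rnd, String fixedTagName) {
        String tag = (fixedTagName != null) ? fixedTagName : randomLetters(rnd, 11);
        String attr = randomLetters(rnd, 6) + "0";   // 7 characters, as in the samples
        String value = randomLetters(rnd, 9);        // 9 characters
        return "<" + tag + " " + attr + "=\"" + value + "\"></" + tag + ">";
    }

    // Writes a complete file: XML declaration, <site>, the entries, </site>.
    public static void write(String fileName, int entries, String fixedTagName, long seed)
            throws IOException {
        Random rnd = new Random(seed);
        try (PrintWriter out = new PrintWriter(fileName, "UTF-8")) {
            out.println("<?xml version=\"1.0\" standalone=\"yes\"?>");
            out.println("<site>");
            for (int i = 0; i < entries; i++) {
                out.println(entry(rnd, fixedTagName));
            }
            out.println("</site>");
        }
    }

    public static void main(String[] args) throws IOException {
        write("Random.xml", 70000, null, 1L);
        write("Constant.xml", 70000, "testtestabc", 2L);
    }
}
```

Running main writes both files into the current directory. Passing a different fixed name per call to entry (for example "testtest" followed by a counter) would give a mixed constant/variable variant.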
I have experimented with them for quite a while. For files containing 70,000 entry tags (each with an immediate closing tag), the parse time drops by a factor of almost 4 when all the tags have the same name, compared with the random tag names. The following messages are from a small Java program I am using to test this.
---Random.xml---
parser name: com.sun.org.apache.xerces.internal.parsers.SAXParser
Beginning Document
End of Document Reached
parse time = 16984
---Constant.xml---
parser name: com.sun.org.apache.xerces.internal.parsers.SAXParser
Beginning Document
End of Document Reached
parse time = 4563
The times are consistent between different runs of the parser.
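The test program itself is not included in this message. A minimal sketch of a harness that prints messages of the same shape might look like the following. The class name ParseTimer, the CountingHandler helper, and the use of System.currentTimeMillis are assumptions, and the parser class name printed depends on the JDK and on how the reader is obtained.

```java
import java.io.File;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical timing harness; not the original test program.
public class ParseTimer {

    // Prints the document start/end messages and counts start tags.
    public static class CountingHandler extends DefaultHandler {
        public int elements = 0;

        @Override
        public void startDocument() {
            System.out.println("Beginning Document");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }

        @Override
        public void endDocument() {
            System.out.println("End of Document Reached");
        }
    }

    // Parses the input once and returns the elapsed wall-clock time in milliseconds.
    public static long timeParse(InputSource input, CountingHandler handler) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println("parser name: " + reader.getClass().getName());
        reader.setContentHandler(handler);
        long start = System.currentTimeMillis();
        reader.parse(input);
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws Exception {
        // Each argument is the path of an XML file to time.
        for (String name : args) {
            System.out.println("---" + name + "---");
            InputSource input = new InputSource(new File(name).toURI().toString());
            long elapsed = timeParse(input, new CountingHandler());
            System.out.println("parse time = " + elapsed);
        }
    }
}
```

It could be run as: java ParseTimer Random.xml Constant.xml. A single timed run includes class loading and JIT warm-up, so repeating each parse several times and comparing the later runs gives a fairer comparison.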
I would appreciate any help in understanding the inner workings of Xerces that make this happen.
Thank you,
Andrei Vasilescu
P.S.: Experimenting further, I tried making up tags with a constant part and a variable part, such as "testtest" followed by the number of the tag (0-70000). The performance is close to that of the totally random file, with a parse time of around 15563.