Subjects
Home
VOTE Move XML Commons to Xerces
Commented: (XERCESJ 589) Bug with pattern restriction on long strings
: Xerces J 2 8 1 Release on Wednesday, September 13th
: Xerces J 2 9 0 Release on Wednesday, November 22nd
Commented: (XERCESJ 1066) Restriction+choice+substitutionGroup error
Commented: (XERCESJ 1178) Error getting prefix for an attribute with no n
Updated: (XERCESJ 1244) XMLSchemaValidator does not contribute element 's
Some consideration about the xerces DOM implementation
Updated: (XERCESJ 1066) Restriction+choice+substitutionGroup error
Commented: (XERCESJ 1227) Poor performance / OutOfMemoryError for sequenc
retain exception stack traces
Updated: (XERCESJ 1193) NPE or hang when parsing using the "continue afte
Future of NekoHTML
Commented: (XERCESJ 1203) NPE in XMLDTDProcessor
DOM Level 3 APIs for Xalan J and a new Xalan release (2 7 1)
: xml commons external 1 3 04 Release on Wednesday, November 22nd
Commented: (XERCESJ 1247) Incorrect location information on SAX when usin
XInclude exceptions how to mirror Xerces J functionality into Xerces C++?
First proposal on SoC project "Add support for the StAX (JSR 173) cursor API
: xml commons resolver 1 2 Release on Wednesday, November 22nd
Typo in RangeToken java Please check
Validator features
java lang ClassCastException when adopting Node
using the org apache xerces impl xs identity package
Updated: (XERCESJ 1257) buffer overflow in UTF8Reader for characters out
Problem with ref attributes and schema validation
Updated: (XERCESJ 122) XMLSchemaValidator does not contribute element 's d
Performance problem under load Xerces with Weblogic 9 x
remove ignored memory allocation
Commented: (XERCESJ 1177) SAXXMLStreamReader doesn 't always report namesp
Commented: (XERCESJ 977) Null pointer exception during DOM parsing
Commented: (XERCESJ 1197) Code cleanup for org apache xml serialize
Commented: (XERCESJ 1201) Initial contribution for StAX Event API
Updated: (XERCESJ 1061) Regex "$ " and "^ " characters treated as special c
Commented: (XERCESJ 1199) SAXXMLStreamReader should attempt to register a
Commented: (XERCESJ 1061) Regex "$ " and "^ " characters treated as special
Updated: (XERCESJ 589) Bug with pattern restriction on long strings
StackOverflow
xerces Range unnecessarily not garbage collectable if not detached
Updated: (XERCESJ 1178) Error getting prefix for an attribute with no nam
Bug in xs:redefine
Commented: (XERCESJ 1204) Can not set XMLEntityResolver for LSParser
Updated: (XERCESJ 1253) Prototype for SoC2007 project "Add support for th
Updated: (XERCESJ 1259) Add SteamFilter Function to SoC2007 project "Add
Assigned: (XERCESJ 444) SAXException thrown by EntityResolver is reported
Google Summer of Code 2007
Xerces J and XInclude relative path issue
Assigned: (XERCESJ 206) Stack overflow when using a schema validation
Commented: (XERCESJ 1215) Restrictions involving two levels of substituti
Closed: (XERCESJ 1203) NPE in XMLDTDProcessor
non overriding equals methoda
Resolved: (XERCESJ 1079) invalid value returned for TOTALDIGITS facet in
Xerces AS3 port
Updated: (XERCESJ 325) Regular Expression; Pattern "| " clause order de
Updated: (XERCESJ 1196) Javadoc generation fails on Java SE 5 0
Closed: (XERCESJ 1202) DTD validation on XIncluded documents when the sch
Created: (XERCESJ 1124) Nonspecific schema error message
a bug in xerces
Updated: (XERCESJ 1201) Initial contribution for StAX Event API
Closed: (XERCESJ 1254) Empty uris in targetNamespace attribute not report
Links
Home
Oracle database error code
 
Search:  
Power your search with and, or, +, -, or "some phrase" operators.
Element.getElementsByTagName Efficiency

Element.getElementsByTagName Efficiency

2003-01-23       - By Hanson, Matthew
Reply:     1     2     3     4  

Hi - Yes, I have already looked at the source, and from what I saw, Element
uses the TreeWalker to iterate across an array.  My concern is that the
access time for a linear search for specific nodes based on the tag vals of
3 or more attributes may be a bad fit as the number of children gets to the
hundreds or thousands.

The other approach that I am considering is keeping the binary tree
implementation until I need to output the data - as xml.  The main reason
that I want to do this is to be able to potentially create many output
formats.  This might be a compromise solution.

Matt Hanson

-----Original Message-----
From: Joseph Kesselman [mailto:keshlam@(protected)]
Sent: Wednesday, January 22, 2003 4:36 PM
To: xerces-j-user@(protected)
Subject: Re: Element.getElementsByTagName Efficiency



> what is the underlying architecture of the Element class and its impact
> on getElementsByTagName

Xerces is open source, and the DOM implementation isn't particularly
complicated code; reading the code might be the best way to get the details
you're looking for.

(I could discuss how it used to work -- basic tree-walk looking for matching
nodes, plus caching for faster re-access, plus an extremely annoying cache
flush to match the DOM's requirement that Nodelist be a "live view" -- but I
haven't checked the implementation to see if it still works that way. I'd
expect it does, though.)

You'll probably also want to look at the DOM Level 2 TreeWalker and
NodeIterator as possible alternatives. With appropriate filtering they can
perform the same kind of search, more efficiently if your filter understands
how to skip irrelevant subtrees, and they avoid most or all of the
cache/flush overhead.

______________________________________
Joe Kesselman  / IBM Research



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">


<META content="MSHTML 5.50.4916.2300" name=GENERATOR></HEAD>
<BODY>
<DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2>Hi -
Yes, I have already looked at the source, and from what I saw, Element&nbsp
;uses
the TreeWalker to iterate across an array.&nbsp; My concern is that the access
time for a linear search for specific nodes based on the tag vals of 3 or more
attributes may be a bad fit as the number of children gets to the hundreds or
thousands.</FONT></SPAN></DIV>
<DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2>The
other approach that I am considering is keeping the binary tree implementation
until I need to output the data - as xml.&nbsp; The main reason that I want to
do this is to be able to potentially create many output formats.&nbsp; This
might be a compromise solution.</FONT></SPAN></DIV>
<DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff
size=2></FONT></SPAN>&nbsp;</DIV>
<DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2>Matt
Hanson</FONT></SPAN></DIV>
<BLOCKQUOTE>
 <DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma
 size=2>-----Original Message-----<BR><B>From:</B> Joseph Kesselman
 [mailto:keshlam@(protected)]<BR><B>Sent:</B> Wednesday, January 22, 2003 4:36
 PM<BR><B>To:</B> xerces-j-user@(protected)<BR><B>Subject:</B> Re:
 Element.getElementsByTagName Efficiency<BR><BR></FONT></DIV><BR><FONT
 face=sans-serif size=2>&gt;</FONT><FONT size=2><TT> what is the underlying
 architecture of the Element class and its impact</TT></FONT> <BR><FONT
 size=2><TT>&gt; on getElementsByTagName</TT></FONT> <BR><BR><FONT
 size=2><TT>Xerces is open source, and the DOM implementation isn't
 particularly complicated code; reading the code might be the best way to get
 the details you're looking for. </TT></FONT><BR><BR><FONT size=2><TT>(I could
 discuss how it used to work -- basic tree-walk looking for matching nodes,
 plus caching for faster re-access, plus an extremely annoying cache flush to
 match the DOM's requirement that Nodelist be a "live view" -- but I haven't
 checked the implementation to see if it still works that way. I'd expect it
 does, though.)</TT></FONT> <BR><BR><FONT size=2><TT>You'll probably also want
 to look at the DOM Level 2 TreeWalker and NodeIterator as possible
 alternatives. With appropriate filtering they can perform the same kind of
 search, more efficiently if your filter understands how to skip irrelevant
 subtrees, and they avoid most or all of the cache/flush overhead.</TT></FONT>
 <BR><FONT face=sans-serif
 size=2><BR>______________________________________<BR>Joe Kesselman &nbsp;/
IBM
 Research<BR></BLOCKQUOTE></FONT></BODY></HTML>