Element.getElementsByTagName Efficiency 2003-01-23 - By Hanson, Matthew
Hi - Yes, I have already looked at the source, and from what I saw, Element uses the TreeWalker to iterate across an array. My concern is that the access time for a linear search for specific nodes based on the tag vals of 3 or more attributes may be a bad fit as the number of children gets to the hundreds or thousands. The other approach that I am considering is keeping the binary tree implementation until I need to output the data - as xml. The main reason that I want to do this is to be able to potentially create many output formats. This might be a compromise solution. Matt Hanson
-----Original Message----- From: Joseph Kesselman [mailto:keshlam@(protected)] Sent: Wednesday, January 22, 2003 4:36 PM To: xerces-j-user@(protected) Subject: Re: Element.getElementsByTagName Efficiency
> what is the underlying architecture of the Element class and its impact > on getElementsByTagName
Xerces is open source, and the DOM implementation isn't particularly complicated code; reading the code might be the best way to get the details you're looking for.
(I could discuss how it used to work -- basic tree-walk looking for matching nodes, plus caching for faster re-access, plus an extremely annoying cache flush to match the DOM's requirement that Nodelist be a "live view" -- but I haven't checked the implementation to see if it still works that way. I'd expect it does, though.)
You'll probably also want to look at the DOM Level 2 TreeWalker and NodeIterator as possible alternatives. With appropriate filtering they can perform the same kind of search, more efficiently if your filter understands how to skip irrelevant subtrees, and they avoid most or all of the cache/flush overhead.
______________________________________ Joe Kesselman / IBM Research
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <HTML><HEAD> <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META content="MSHTML 5.50.4916.2300" name=GENERATOR></HEAD> <BODY> <DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2>Hi - Yes, I have already looked at the source, and from what I saw, Element  ;uses the TreeWalker to iterate across an array. My concern is that the access time for a linear search for specific nodes based on the tag vals of 3 or more attributes may be a bad fit as the number of children gets to the hundreds or thousands.</FONT></SPAN></DIV> <DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2>The other approach that I am considering is keeping the binary tree implementation until I need to output the data - as xml. The main reason that I want to do this is to be able to potentially create many output formats. This might be a compromise solution.</FONT></SPAN></DIV> <DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2></FONT></SPAN> </DIV> <DIV><SPAN class=424220613-23012003><FONT face=Arial color=#0000ff size=2>Matt Hanson</FONT></SPAN></DIV> <BLOCKQUOTE> <DIV class=OutlookMessageHeader dir=ltr align=left><FONT face=Tahoma size=2>-----Original Message-----<BR><B>From:</B> Joseph Kesselman [mailto:keshlam@(protected)]<BR><B>Sent:</B> Wednesday, January 22, 2003 4:36 PM<BR><B>To:</B> xerces-j-user@(protected)<BR><B>Subject:</B> Re: Element.getElementsByTagName Efficiency<BR><BR></FONT></DIV><BR><FONT face=sans-serif size=2>></FONT><FONT size=2><TT> what is the underlying architecture of the Element class and its impact</TT></FONT> <BR><FONT size=2><TT>> on getElementsByTagName</TT></FONT> <BR><BR><FONT size=2><TT>Xerces is open source, and the DOM implementation isn't particularly complicated code; reading the code might be the best way to get the details you're looking for. </TT></FONT><BR><BR><FONT size=2><TT>(I could discuss how it used to work -- basic tree-walk looking for matching nodes, plus caching for faster re-access, plus an extremely annoying cache flush to match the DOM's requirement that Nodelist be a "live view" -- but I haven't checked the implementation to see if it still works that way. I'd expect it does, though.)</TT></FONT> <BR><BR><FONT size=2><TT>You'll probably also want to look at the DOM Level 2 TreeWalker and NodeIterator as possible alternatives. With appropriate filtering they can perform the same kind of search, more efficiently if your filter understands how to skip irrelevant subtrees, and they avoid most or all of the cache/flush overhead.</TT></FONT> <BR><FONT face=sans-serif size=2><BR>______________________________________<BR>Joe Kesselman / IBM Research<BR></BLOCKQUOTE></FONT></BODY></HTML>
|
|