Commented: (XERCESJ-977) Null pointer exception during DOM parsing

2004-10-28       - By Ed Tyrrill (JIRA)

    [ http://issues.apache.org/jira/browse/XERCESJ-977?page=comments#action_54756 ]
   
Ed Tyrrill commented on XERCESJ-977:
------------------------------------

I ran into this same problem using the Xerces parser that is packaged with
Java 5.0.  I did some investigation and discovered the reason for the
problem.  First, you can download an XML document and DTD that will allow
you to reproduce the problem from:

ftp://ftp.avamar.com/pub/files/sun/x6.xml
ftp://ftp.avamar.com/pub/files/sun/event_catalog.dtd
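
For anyone who wants to try the files, a minimal harness along these lines should be enough. It is only a sketch: the class name ParseRepro is made up, it assumes both files sit in the working directory, and it uses the default JAXP factory settings. Whether the NPE actually shows up depends on the parser version on the classpath.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reproduction harness; not part of Xerces or the JDK.
class ParseRepro {

    // Parses the given file with the default JAXP DOM parser.
    static Document parseFile(File xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(xml);
    }

    public static void main(String[] args) throws Exception {
        // Assumption: x6.xml (and the DTD it references) are in the
        // current directory unless a path is given on the command line.
        File xml = new File(args.length > 0 ? args[0] : "x6.xml");
        Document doc = parseFile(xml);
        System.out.println("Parsed root element: "
                + doc.getDocumentElement().getNodeName());
    }
}
```

If the parse completes, the root element name is printed; on an affected parser the call to parse() would be expected to fail with the NullPointerException instead.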

All of my investigation was done using the code that comes with jdk1.5.0.
When I compared this code to xerces 2.6.2, the two appear to be virtually
identical.  The fix is not simple because it really requires a minor
design change.  I'll describe in detail what is happening so that this
problem may be fixed.

In com.sun.org.apache.xerces.internal.dom.DeferredDocumentImpl there
are several two-dimensional arrays that keep track of the values
and structure of the document.  One of these arrays, fNodePrevSib,
keeps track of the previous sibling of each node in the tree.
The problem is that the value -1 is used to indicate that there is
no previous sibling, and the same value -1 is also used to indicate
that an index in the array is unused.

Now a little bit about the two-dimensional arrays.  These arrays
allocate new "chunks" as the parsing proceeds, and dereference a
chunk so it can be garbage collected once it becomes empty.
The NPE occurs because the code treats a chunk in fNodePrevSib as
empty, frees it, and then goes back to read more previous-sibling
information from it.
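
To make the chunk idea concrete, here is a rough sketch of chunked storage with a per-chunk use count.  It is a simplified model, not the Xerces source: the class name ChunkedIntTable, the tiny chunk size, and the separate useCount array are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Simplified, hypothetical model of chunked storage with use counts.
// This is NOT the Xerces source; names and sizes are illustrative.
class ChunkedIntTable {
    static final int CHUNK_SIZE = 4;  // tiny on purpose (assumption)
    static final int UNUSED = -1;     // marks a slot that holds no value

    private final int[][] chunks = new int[16][];
    private final int[] useCount = new int[16];

    // Stores a real value, allocating the chunk on demand.
    void set(int chunk, int index, int value) {
        if (value == UNUSED) {
            throw new IllegalArgumentException("use clear() to empty a slot");
        }
        if (chunks[chunk] == null) {
            chunks[chunk] = new int[CHUNK_SIZE];
            Arrays.fill(chunks[chunk], UNUSED);
        }
        if (chunks[chunk][index] == UNUSED) {
            useCount[chunk]++;        // slot goes from empty to used
        }
        chunks[chunk][index] = value;
    }

    // Empties a slot and drops the chunk once no slot in it is used.
    int clear(int chunk, int index) {
        if (chunks[chunk] == null) {
            return UNUSED;
        }
        int old = chunks[chunk][index];
        if (old != UNUSED) {
            chunks[chunk][index] = UNUSED;
            if (--useCount[chunk] == 0) {
                chunks[chunk] = null; // now eligible for garbage collection
            }
        }
        return old;
    }

    boolean isAllocated(int chunk) {
        return chunks[chunk] != null;
    }

    public static void main(String[] args) {
        ChunkedIntTable t = new ChunkedIntTable();
        t.set(0, 0, 7);
        t.set(0, 1, 8);
        t.clear(0, 0);
        System.out.println("allocated after one clear:  " + t.isAllocated(0));
        t.clear(0, 1);
        System.out.println("allocated after two clears: " + t.isAllocated(0));
    }
}
```

In this model a chunk is released only when every slot has been cleared, which is safe as long as the use count accurately reflects which slots still carry information.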

So why is a chunk becoming empty?  The xml file we are trying to parse
uses a lot of entities.  When DeferredDocumentImpl finds an entity it
places the entity name in one index, and then the replacement string
for the entity is placed in the next index, and then it goes back,
and actually replaces the entity name with the replacement string.
Just by chance the entity is placed in the last index in chunk 11.
Then the replacement string is over 64 characters long so it gets
broken in two, and is placed in the first two indexes of chunk 12.
The first part of the replacement string has no previous sibling
so when it is added to chunk 12 the use count is not incremented on
fNodePrevSib.  When the second part of the string is added the usage
count on fNodePrevSib becomes 1.

The next thing that happens is replacing the entity with its
replacement string.  So the first part of the string is taken out of
chunk 12.  The reference count on chunk 12 of fNodePrevSib is
decremented to 0, and the chunk is dereferenced (set to null).  So
when we go to get the second half of the string we get the NPE trying
to access the null chunk.
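
The sequence above can be replayed in a stripped-down model.  Again this is only a sketch under stated assumptions, not the Xerces code: the class PrevSibClashDemo, the chunk size of 4, and the exact slots touched are made up, and the model assumes chunks are allocated up front when nodes are created.  The one behaviour it borrows from the description is that storing -1 clears the slot instead of counting it.

```java
import java.util.Arrays;

// Hypothetical, stripped-down model of the failure described above.
// NOT the Xerces source: class, methods, and indexes are illustrative.
class PrevSibClashDemo {
    static final int CHUNK_SIZE = 4;
    // -1 means BOTH "no previous sibling" and "slot unused".
    static final int NONE = -1;

    private final int[][] prevSib = new int[16][];
    private final int[] useCount = new int[16];

    // Assumption of this model: chunks exist before values are stored.
    void allocate(int chunk) {
        prevSib[chunk] = new int[CHUNK_SIZE];
        Arrays.fill(prevSib[chunk], NONE);
    }

    // A value of -1 is routed to clear() instead of being stored.
    void set(int chunk, int index, int value) {
        if (value == NONE) {
            clear(chunk, index);
            return;
        }
        if (prevSib[chunk][index] == NONE) {
            useCount[chunk]++;
        }
        prevSib[chunk][index] = value;
    }

    void clear(int chunk, int index) {
        if (prevSib[chunk][index] != NONE) {
            prevSib[chunk][index] = NONE;
            if (--useCount[chunk] == 0) {
                prevSib[chunk] = null; // chunk looks empty, so it is released
            }
        }
    }

    int get(int chunk, int index) {
        return prevSib[chunk][index]; // NPE if the chunk was released
    }

    // Replays the steps and reports whether the NPE occurs.
    static boolean reproduces() {
        PrevSibClashDemo doc = new PrevSibClashDemo();
        doc.allocate(12);
        doc.set(12, 0, NONE); // first half: no previous sibling, not counted
        doc.set(12, 1, 48);   // second half: previous sibling is node 48
                              // (chunk 12, slot 0), use count becomes 1
        doc.set(12, 1, NONE); // entity replacement resets a previous sibling:
                              // use count drops to 0, chunk 12 is released
        try {
            doc.get(12, 0);   // slot 0 still logically holds "no sibling"
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproduces());
    }
}
```

The point of the model is that slot 0 still carries meaningful information ("no previous sibling") when the chunk is released, but the use count cannot see it because that information is encoded as -1.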

So the real problem is that the dual use of the -1 value causes the
usage count on the chunks to get out of sync.  This only matters when
you delete enough entries to stop using an entire chunk.

You might ask where all of this happens in the code.  Let me describe
that now.  In the appendChild() method, on line 673, getChunkIndex() is
called to get the index of the previous child node.  That index is -1
for the first half of the entity replacement string.  That value, olast,
is passed into setChunkIndex() to set the value -1.  If you go down to
setChunkIndex(), you will see on line 1977 that if the "value" parameter
is -1, it calls clearChunkIndex() instead of storing the value.  The
previous sibling info for the second half is then correctly stored in the
next call to appendChild().  Next, when the entity replacement is being
performed, insertBefore() is called.  In insertBefore(), the second call
to setChunkIndex() again has a value of -1.  This causes setChunkIndex()
to call clearChunkIndex() again, and this time the code on line 2038 is
run, causing the chunk to be set to null.  Soon after that another call
is made to insertBefore(), which causes the NPE.

I hope this gives you all the information you will need to resolve this
issue.

Thanks,
Ed Tyrrill

> Null pointer exception during DOM parsing
> -----------------------------------------
>
>          Key: XERCESJ-977
>          URL: http://issues.apache.org/jira/browse/XERCESJ-977
>      Project: Xerces2-J
>         Type: Bug
>   Components: DOM
>     Versions: 2.6.2
>     Reporter: Emily Horton

>
> We are parsing large numbers of xml files with DOM and are very occasionally
> getting a null pointer exception when parsing.  In this case we tracked the
> problem down to a point in the text where there was a quoted attribute inside
> quoted text:
> “[a]nimals should be housed in facilities dedicated to or assigned for
> that purpose...<bibr rid="b2"/>&rdquo;
> Any of the following changes to the document would get rid of the null
> pointer exception and allow parsing:
> 1) Changing the bibr tag to a different tag without any attributes.
> 2) Removing the outside quotes.
> 3) Moving the bibr tag to outside the quotes.
> Here is the stack trace for the error:
> 522316528 [Thread-200] ERROR -> org.apache.xerces.dom.DeferredDocumentImpl.setChunkIndex(Unknown Source)
> 522316529 [Thread-200] ERROR -> org.apache.xerces.dom.DeferredDocumentImpl.insertBefore(Unknown Source)
> 522316529 [Thread-200] ERROR -> org.apache.xerces.parsers.AbstractDOMParser.endGeneralEntity(Unknown Source)
> 522316529 [Thread-200] ERROR -> org.apache.xerces.impl.dtd.XMLDTDValidator.endGeneralEntity(Unknown Source)
> 522316529 [Thread-200] ERROR -> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.endEntity(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLDocumentScannerImpl.endEntity(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLEntityManager.endEntity(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLEntityScanner.scanContent(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanContent(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
> 522316530 [Thread-200] ERROR -> org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
> 522316531 [Thread-200] ERROR -> org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
> 522316531 [Thread-200] ERROR -> org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
> 522316531 [Thread-200] ERROR -> org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
> 522316531 [Thread-200] ERROR -> org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
> 522316531 [Thread-200] ERROR -> org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> 522316531 [Thread-200] ERROR -> javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
