Possible encoding related bug

2003-08-20       - By Sasa Bojanic

Hi,

I think there is an encoding-related bug in Xerces 2.5.
When the DOM parser parses a document containing a character that does not
belong to the character set of the declared document encoding (e.g. the
document contains the character ä but declares its encoding as "us-ascii"),
the parser crashes.

Here is the code snippet:

     // DOMParser is org.apache.xerces.parsers.DOMParser
     try {
        DOMParser parser = new DOMParser();
        parser.parse(toParse);
     } catch (Exception ex) {
        ex.printStackTrace();
     }

* "toParse" is the path to the following document:

<?xml version="1.0" encoding="us-ascii"?>
<Package Id="pkg1">
 <!-- ä -->
   <PackageHeader>
       <XPDLVersion>1.0</XPDLVersion>
       <Vendor>Together</Vendor>
       <Created>2003-08-20 10:00:49</Created>
   </PackageHeader>
</Package>

The parser crashes because of the ä character, and I get the following stack trace:

java.io.IOException: Byte "228" is not a member of the (7-bit) ASCII character set.
       at org.apache.xerces.impl.io.ASCIIReader.read(Unknown Source)
       at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
       at org.apache.xerces.impl.XML11EntityScanner.skipSpaces(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
       at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
       at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
       at XML.main(XML.java:25)
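
To make the "Byte 228" in the message concrete, here is a minimal sketch of the kind of 7-bit check the exception describes. It is not the Xerces ASCIIReader code. The class name AsciiCheck and the method firstNonAsciiIndex are hypothetical names chosen for illustration, and the sketch assumes the file was saved as ISO-8859-1, where ä is the single byte 0xE4 (228).

```java
import java.nio.charset.StandardCharsets;

public class AsciiCheck {
    /**
     * Returns the index of the first byte outside the 7-bit ASCII range
     * (0..127), or -1 if every byte is ASCII. Hypothetical helper, not
     * part of Xerces.
     */
    public static int firstNonAsciiIndex(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            // Java bytes are signed, so mask to get the unsigned value.
            if ((data[i] & 0xFF) > 0x7F) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The comment line from the sample document; \u00E4 is the character ä.
        String comment = "<!-- \u00E4 -->";

        // Assumption: the document on disk is ISO-8859-1 encoded.
        byte[] latin1 = comment.getBytes(StandardCharsets.ISO_8859_1);
        int idx = firstNonAsciiIndex(latin1);
        System.out.println("index=" + idx + " byte=" + (latin1[idx] & 0xFF));
        // prints "index=5 byte=228"
    }
}
```

If the file had been saved as UTF-8 instead, ä would be the two bytes 195 and 164, so the first offending byte would be 195 rather than 228. Either way, the bytes on disk do not match the declared "us-ascii" encoding.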

When I use Xerces 2.4, everything works fine!

Regards,
Sasa.
