UTF-16 encoding problem and UTF8 with BOM

2003-09-30       - By David M Williams

Actually, I tried the original test example with UTF-8 encoding, with the
(optional) 3-byte BOM at the beginning, and received the following error,
but only when the setEncoding method was used:

org.xml.sax.SAXParseException: Content is not allowed in prolog.
       at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1172)
       at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
       at test.SimpleSaxParser.parse(SimpleSaxParser.java:43)
       at test.SimpleSaxParser.main(SimpleSaxParser.java:64)


Perhaps that 'setEncoding' method causes BOM handling to be skipped
altogether?
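
If that guess is right, one possible workaround is to strip the BOM yourself
before handing the stream to the parser. The following is only a minimal
sketch under that assumption: the class BomSkipper and the method
skipUtf8Bom are hypothetical names, not part of Xerces or SAX, and it handles
only the UTF-8 BOM (EF BB BF).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Xerces: removes a leading UTF-8 BOM.
final class BomSkipper {

    /** Returns a stream positioned after a leading UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // Read up to three bytes; a single read() call may return fewer.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            // Not a BOM: push the bytes back so the parser still sees them.
            pin.unread(head, 0, n);
        }
        return pin;
    }

    public static void main(String[] args) throws Exception {
        // A tiny document preceded by the UTF-8 BOM.
        byte[] doc = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '<', 'a', '/', '>'};
        InputSource source = new InputSource();
        source.setEncoding("UTF-8");
        source.setByteStream(skipUtf8Bom(new ByteArrayInputStream(doc)));
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
        System.out.println("parsed without error");
    }
}
```

Whether a given parser version needs this at all is something to verify
against that version; the sketch only guarantees that the parser never sees
the three BOM bytes.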

David







Ravi Varanasi <rvaranasi@(protected)>
09/30/2003 01:04 PM
Please respond to xerces-j-user

       To:     xerces-j-user@(protected)
       cc:
       Subject:        RE: UTF-16 encoding problem







Hi,
      I do not see anything wrong in your code. I have the following piece
of code in my program, which has been working perfectly fine for the last
couple of months. I am using UTF-8, though. I do NOT think the change in
encoding will make any difference, provided your file is actually in the
encoding passed to the setEncoding method.

What I suspect is that your input file is not UTF-16 encoded, but you are
setting the stream encoding to UTF-16. Check it and let us know if that
fixes the problem. I use UniPad UTF editor to check encoding.

   try {
     InputSource ipSource = new InputSource();
     ipSource.setEncoding("UTF-8");
     ipSource.setByteStream( new FileInputStream( new File(inputFile) ) );
     parser.parse(ipSource);
     return true;
   } catch (SAXParseException e) {
     e.printStackTrace();
     return false;
   } catch (Exception e) {
     e.printStackTrace();
     return false;
   }
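
If no Unicode-aware editor is at hand, the check suggested above (is the
file really in the encoding given to setEncoding?) can be approximated by
looking at the first bytes of the file. This is a rough sketch only: the
class BomSniffer and its methods are hypothetical names, it recognizes only
the UTF-8 and UTF-16 BOMs, and a file without a BOM yields null, which says
nothing either way about its encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: guesses an encoding from a leading byte order mark.
final class BomSniffer {

    /** Returns "UTF-8", "UTF-16BE" or "UTF-16LE" if head starts with that BOM, else null. */
    static String detectBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2) {
            int b0 = head[0] & 0xFF;
            int b1 = head[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                // Assumption: UTF-32LE (FF FE 00 00) is not expected here.
                return "UTF-16LE";
            }
        }
        return null; // no BOM found; the encoding is simply unknown
    }

    /** Reads at most the first three bytes of the file and inspects them. */
    static String detectBom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[3];
            int n = 0;
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
            return detectBom(Arrays.copyOf(head, n));
        }
    }
}
```

Comparing the result of detectBom with the string passed to setEncoding
would at least catch the case where a UTF-8 file is being declared as
UTF-16, or the other way round.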



Thanks,

Ravi Varanasi

408 517 7675 (Work)
408 394 3273 (Mobile)


---------------------------------------------------------------------
To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
For additional commands, e-mail: xerces-j-user-help@(protected)



