UTF-16 encoding problem and UTF8 with BOM 2003-09-30 - By David M Williams
Actually, I tried the original test example with UTF-8 encoding and the (optional) 3-byte BOM at the beginning, and received the following error, but only when the setEncoding method was used:
org.xml.sax.SAXParseException: Content is not allowed in prolog.
    at org.apache.xerces.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1172)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:345)
    at test.SimpleSaxParser.parse(SimpleSaxParser.java:43)
    at test.SimpleSaxParser.main(SimpleSaxParser.java:64)
Perhaps calling setEncoding causes BOM handling to be skipped altogether?
David
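One way to sidestep the question, offered here as an editorial assumption rather than anything proposed in the thread, is to remove a leading UTF-8 BOM yourself before building the InputSource. Then it no longer matters whether the parser skips the BOM once setEncoding has been called. The class BomStripper and its methods are hypothetical names; only standard JDK classes (PushbackInputStream, InputSource, SAXParserFactory) are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: strips a UTF-8 BOM before handing bytes to the parser. */
public class BomStripper {

    /** Returns a stream positioned just after a leading UTF-8 BOM (EF BB BF), if present. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: push the bytes back untouched
        }
        return pin;
    }

    /** Parses UTF-8 input with an explicit encoding, after removing any BOM. */
    public static void parseUtf8(InputStream raw, DefaultHandler handler) throws Exception {
        InputSource src = new InputSource(skipUtf8Bom(raw));
        src.setEncoding("UTF-8"); // same call as in the thread; the BOM is already gone
        SAXParserFactory.newInstance().newSAXParser().parse(src, handler);
    }
}
```

Because the stream handed to the parser starts at the first real character, the explicit setEncoding("UTF-8") call can stay in place.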
Ravi Varanasi <rvaranasi@(protected)>
09/30/2003 01:04 PM
Please respond to xerces-j-user
To: xerces-j-user@(protected)
cc:
Subject: RE: UTF-16 encoding problem
Hi,
I do not see anything wrong in your code. I have had the following code in my program for the last couple of months, and it works perfectly. I am using UTF-8, though. I do NOT think the change in encoding will make any difference, provided your file really is in the encoding passed to the setEncoding method.
What I suspect is that your input file is not actually UTF-16 encoded, but you are setting the stream encoding to UTF-16. Check that and let us know if it fixes the problem. I use the UniPad UTF editor to check encodings.
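If an editor such as UniPad is not at hand, a rough check can be done in code by looking at the first few bytes, in the spirit of the autodetection table in Appendix F of the XML 1.0 specification. This is a heuristic sketch, not a validator: EncodingSniffer and its return strings are hypothetical, it only distinguishes the common UTF-8, UTF-16 and UTF-32 cases, and sniff(Path) assumes Java 11+ for InputStream.readNBytes(int).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: a rough stand-in for checking a file's encoding in an editor. */
public class EncodingSniffer {

    /**
     * Guesses the encoding family from the first bytes, loosely following the
     * autodetection table in Appendix F of XML 1.0. Returns null if nothing
     * matches (for example, no BOM and no leading "<?xml").
     */
    public static String sniff(byte[] b) {
        int b0 = b.length > 0 ? b[0] & 0xFF : -1;
        int b1 = b.length > 1 ? b[1] & 0xFF : -1;
        int b2 = b.length > 2 ? b[2] & 0xFF : -1;
        int b3 = b.length > 3 ? b[3] & 0xFF : -1;
        // Byte order marks; the UTF-32 checks must run before the UTF-16 ones
        if (b0 == 0x00 && b1 == 0x00 && b2 == 0xFE && b3 == 0xFF) return "UTF-32BE (BOM)";
        if (b0 == 0xFF && b1 == 0xFE && b2 == 0x00 && b3 == 0x00) return "UTF-32LE (BOM)";
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8 (BOM)";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE (BOM)";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE (BOM)";
        // No BOM: look for "<?" in a 16-bit encoding, or "<?xm" in an ASCII-compatible one
        if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE (no BOM)";
        if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE (no BOM)";
        if (b0 == 0x3C && b1 == 0x3F && b2 == 0x78 && b3 == 0x6D) return "UTF-8 or other ASCII-compatible (no BOM)";
        return null;
    }

    /** Reads up to four bytes from the start of a file and sniffs them. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(4)); // readNBytes(int) needs Java 11+
        }
    }
}
```

For example, a file reported as "UTF-8 (BOM)" should not be parsed with setEncoding("UTF-16").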
try {
    InputSource ipSource = new InputSource();
    ipSource.setEncoding("UTF-8");
    ipSource.setByteStream(new FileInputStream(new File(inputFile)));
    parser.parse(ipSource);
    return true;
} catch (SAXParseException e) {
    e.printStackTrace();
    return false;
} catch (Exception e) {
    e.printStackTrace();
    return false;
}
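An alternative worth trying, assuming the documents carry a BOM or an accurate encoding declaration, is to leave out setEncoding entirely and let the parser autodetect the encoding from the first bytes, as XML 1.0 Appendix F describes. The sketch below is a hypothetical variant of the snippet above (AutoDetectParse, parse and parseFile are illustrative names, and it uses the JAXP SAXParser rather than whatever parser object the original code holds); it also closes the input stream, which the quoted code leaves open.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical variant that leaves encoding detection to the parser. */
public class AutoDetectParse {

    public static boolean parse(InputStream bytes, DefaultHandler handler) {
        try {
            InputSource src = new InputSource(bytes);
            // No setEncoding(...) call: the parser inspects the BOM and/or the
            // XML declaration itself to pick the encoding.
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(src, handler);
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // SAXParseException is a subclass of SAXException, so it lands here too
            e.printStackTrace();
            return false;
        }
    }

    public static boolean parseFile(String inputFile, DefaultHandler handler) {
        // try-with-resources closes the stream once parsing is done
        try (InputStream in = new FileInputStream(inputFile)) {
            return parse(in, handler);
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }
}
```

If autodetection works for a given file, that suggests the problem lies in the explicitly supplied encoding rather than in the file itself.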
Thanks,
Ravi Varanasi
408 517 7675 (Work) 408 394 3273 (Mobile)