encoding and saxparser 2003-01-20 - By Voytenko, Dimitry
Hi Joseph,
Could you change a couple of lines of your code and try to run it again?
old >> ByteArrayOutputStream bos = new ByteArrayOutputStream();
new >> StringWriter bos = new StringWriter ();

old >> public java.io.PrintStream out = System.out;
new >> public java.io.PrintWriter out; // = new PrintWriter (System.out);

old >> public TestContentHandler (java.io.ByteArrayOutputStream bos){
old >>     out = new java.io.PrintStream(bos);
new >> public TestContentHandler (StringWriter bos){
new >>     out = new PrintWriter (bos);
I think the problem is in this fragment:

public void characters(char[] ch, int start, int length){
    out.print(new String(ch,start,length));
}
You implicitly convert the string to bytes (using the default encoding), because out is a PrintStream. Then you convert the bytes back to a string when you implicitly call bos.toString(), again using the default encoding. Only then is this string written to the console, using the console/output encoding. You can check your default character encoding using:

System.err.println (sun.io.CharToByteConverter.getDefault());
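The same check can probably be done without the internal sun.io class, which is not part of the public API and may be missing on other or newer JVMs. A minimal sketch, assuming a JVM that has java.nio.charset.Charset.defaultCharset(); the class name DefaultEncodingCheck is made up for this illustration:

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Name of the charset the JVM falls back to when no encoding is given.
    static String defaultEncoding() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.err.println("default charset: " + defaultEncoding());
        // The system property usually reports the same thing on older JVMs.
        System.err.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```

If the reported name is something like US-ASCII, the character 0xE9 cannot survive a trip through the default encoding.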
Your default encoding might be different from ISO-8859-1 and may not support the character 0xE9. Your console/output encoding apparently does support this character, since you can see it in the second case. And the second case has no extra char-to-byte-to-char conversions, which is why this is the first thing to suspect.
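To make the suspected loss concrete, here is a small sketch that pushes the same string through a PrintStream/ByteArrayOutputStream pair and through a PrintWriter/StringWriter pair. The encodings are passed explicitly only to simulate different defaults (US-ASCII stands in for a default that lacks 0xE9); the class and method names are invented for the example and are not from the original code:

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.UnsupportedEncodingException;

public class RoundTripDemo {
    // chars -> bytes -> chars, both steps through the named encoding.
    static String viaBytes(String text, String encoding) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(bos, true, encoding);
            out.print(text);
            out.flush();
            return bos.toString(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + encoding, e);
        }
    }

    // chars stay chars: no encoding is involved at all.
    static String viaWriter(String text) {
        StringWriter sw = new StringWriter();
        PrintWriter out = new PrintWriter(sw);
        out.print(text);
        out.flush();
        return sw.toString();
    }

    public static void main(String[] args) {
        String name = "D\u00e9cio"; // 0xE9 is e with an acute accent
        System.out.println(viaBytes(name, "US-ASCII").equals(name));   // false
        System.out.println(viaBytes(name, "ISO-8859-1").equals(name)); // true
        System.out.println(viaWriter(name).equals(name));              // true
    }
}
```

With US-ASCII the accented character comes back as '?', which matches the kind of damage described in this thread.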
In any case, it's dangerous to rely on default encodings here, because you might run into deployment problems. Plus, you have several extra conversions, which don't come for free and are entirely unnecessary.
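When bytes really are needed (for example at the point where the result is finally written out), one way to avoid depending on the default is to name the encoding at that single spot. A rough sketch using OutputStreamWriter; the helper name encode and the class name are assumptions made for illustration only:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class ExplicitEncodingDemo {
    // Encode text once, with an encoding chosen by the caller
    // instead of whatever the platform default happens to be.
    static byte[] encode(String text, String encoding) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(sink, encoding);
            out.write(text);
            out.close(); // flushes the encoder into the sink
            return sink.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String name = "D\u00e9cio";
        System.out.println(encode(name, "ISO-8859-1").length); // 5
        System.out.println(encode(name, "UTF-8").length);      // 6
    }
}
```

The result is then the same on every machine, whatever its locale settings are.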
In conclusion, I ran your example (with my changes) using Xalan 2.4.1 and Xerces 2.2.1, and everything was just fine in both cases.
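For reference, the idea behind the change can be sketched as a stripped-down handler that collects character data into a StringWriter. This is not the exact code that was run: it uses the standard JAXP SAXParserFactory instead of org.apache.xerces.parsers.SAXParser, it parses an in-memory document rather than /tmp/temp1.xml, and the names WriterHandlerDemo, TextCollector and parseText are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.PrintWriter;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WriterHandlerDemo {
    // Collects character data as chars; no byte conversion takes place.
    static class TextCollector extends DefaultHandler {
        private final PrintWriter out;

        TextCollector(StringWriter target) {
            out = new PrintWriter(target);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.write(ch, start, length);
        }

        @Override
        public void endDocument() {
            out.flush();
        }
    }

    // The parser reads raw bytes and applies the encoding named in the XML declaration.
    static String parseText(byte[] xmlBytes) {
        try {
            StringWriter sw = new StringWriter();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new ByteArrayInputStream(xmlBytes)), new TextCollector(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<data><firstname>D\u00e9cio</firstname></data>";
        String text = parseText(xml.getBytes("ISO-8859-1"));
        System.out.println(text.equals("D\u00e9cio")); // true
    }
}
```

The only decoding step left is the one the parser performs itself, based on the encoding declared in the document.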
Thanks, Dimitry
-----Original Message-----
From: Joseph Shraibman [mailto:jks@(protected)]
Sent: Monday, January 20, 2003 17:44
To: xerces-j-user@(protected)
Subject: Re: encoding and saxparser
OK here is what I used to test:
My jsp:
===============================
<% response.setContentType("text/plain"); %>
<%@ page import="java.io.*" %>
<%@ page import="org.w3c.dom.Document" %>
<%@ page import="org.apache.xerces.parsers.*" %>
<%@ page import="org.xml.sax.*" %>
<%@ page import="org.apache.xerces.dom.*" %>
<%@ page import="javax.xml.transform.stream.*" %>
<%@ page import="javax.xml.transform.dom.*" %>
<%@ page import="javax.xml.transform.*" %>

Décio: <%= "Décio" %>

<%
{
    File file = new java.io.File("/tmp/temp1.xml");
    String xml_str = com.xtenit.control.SQLUtils.getFromFile(file);

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    TestContentHandler tch = new TestContentHandler(bos);

    SAXParser sp = new SAXParser();
    sp.setFeature("http://apache.org/xml/features/allow-java-encodings",true);
    sp.setContentHandler(tch);
    InputSource is = null;
    is = new InputSource("/tmp/temp1.xml");
    sp.parse(is);

%>is encoding is: <%= is.getEncoding() %>
xml: <br>
<%= bos %><br>
now encoding is: <%= is.getEncoding() %>

================================================================<% }
{
    File file = new java.io.File("/tmp/temp1.xml");
    DOMParser _dp = new DOMParser();
    InputStream is = new FileInputStream(file);

    _dp.parse(new InputSource(is));
    Document doc = _dp.getDocument() ;

    StringWriter sw = new StringWriter();
    TransformerFactory.newInstance().newTransformer().transform(new DOMSource(doc), new StreamResult(sw));
%>
xml: <br>
<%= sw %><br>

<% } %>
===================== end of jsp

TestContentHandler.java:
package com.xtenit.xml;

/**
 * TestContentHandler.java
 *
 *
 * Created: Mon Jan 13 19:59:00 2003
 *
 * @(protected) Joseph Shraibman
 * @(protected)
 */
import org.xml.sax.*;
import javax.xml.transform.stream.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.*;
import org.apache.xerces.parsers.*;

public class TestContentHandler implements org.xml.sax.ContentHandler{

    public java.io.PrintStream out = System.out;

    public void endDocument(){}
    public void startElement(java.lang.String namespaceURI, java.lang.String localName,
                             java.lang.String qName, Attributes atts){
        StringBuffer sb = new StringBuffer();
        sb.append('<');
        if (namespaceURI != null && namespaceURI.length() > 0){
            sb.append(namespaceURI+':');
        }
        sb.append(localName);
        for ( int i = 0, atts_len = atts.getLength() ; i < atts_len ; i++ )
            sb.append(' ').append(atts.getLocalName(i)).append("=\"").append(atts.getValue(i)).append('"');
        sb.append('>');
        out.print(sb.toString());
    }
    public void characters(char[] ch, int start, int length){
        out.print(new String(ch,start,length));
    }
    public void endElement(java.lang.String namespaceURI, java.lang.String localName,
                           java.lang.String qName){
        StringBuffer sb = new StringBuffer();
        sb.append("</");
        if (namespaceURI != null && namespaceURI.length() > 0 ){
            sb.append(namespaceURI+':');
        }
        sb.append(localName+">");
        out.print(sb.toString());
    }
    public void endPrefixMapping(java.lang.String prefix){}
    public void ignorableWhitespace(char[] ch, int start, int length){}
    public void processingInstruction(java.lang.String target, java.lang.String data){}
    public void setDocumentLocator(Locator locator){}
    public void skippedEntity(java.lang.String name){
        if (true)
            out.println("DEGUG: skipped Entity: '"+name+"'");
    }
    public void startDocument(){}
    public void startPrefixMapping(java.lang.String prefix, java.lang.String uri){}

    public TestContentHandler (){

    }
    public TestContentHandler (java.io.ByteArrayOutputStream bos){
        out = new java.io.PrintStream(bos);
    }

    public static void main(String[] args)throws Exception{
        SAXParser sp = new SAXParser();
        TestContentHandler xc = new TestContentHandler();
        sp.setContentHandler(xc);
        String filename = args[0];
        InputSource is = null;
        if (filename.equals("-"))
            is = new InputSource(System.in); //use standard input
        else
            is = new InputSource(new java.io.FileReader(filename));
        sp.parse(is);
    }

}// TestContentHandler
===============================
my xml file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<data>
<firstname>Décio</firstname>
</data>
My Xerces is 2.2.1. In my test the first one does not work but the last one does. You can see what the JSP looks like at http://xis.xtenit.com/temp.jsp (except that it has an old version of TestContentHandler that puts colons in the output).
Voytenko, Dimitry wrote:
> Hi Joseph,
>
> I'm afraid nobody will be able to answer this w/o seeing your code (the
> one with the SAXParser).
> So if you could send it (or just a fragment where you initialize SAXParser,
> start parsing and process the SAX events) it would be helpful.
> Plus, include Xerces version information.
>
> thanks,
> Dimitry
>
> -----Original Message-----
> From: Joseph Shraibman [mailto:jks@(protected)]
> Sent: Monday, January 20, 2003 11:58
> To: xerces-j-user@(protected)
> Subject: Re: encoding and saxparser
>
>
> neilg@(protected) wrote:
>
>>Hi Joseph,
>>
>>I had a feeling something like that might have been the case. I'll bet
>>there's some difference in the way you're viewing the SAX output as
>>compared to the DOM output.
>
>
> No, I made sure that that isn't a problem.
--------------------------------------------------------------------- To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected) For additional commands, e-mail: xerces-j-user-help@(protected)
_____________________________________________________ Sector Data, LLC, is not affiliated with Sector, Inc., or SIAC