XMLSerializer and Character Encoding 2003-11-12 - By Adrian Sutton
Hi all, I'm probably dreaming here and what I'm doing is just outside of the realms of possibility but let me try anyway. :)
I'm in a situation where I need to take XML documents in different encodings (anything supported by the particular Java instance), make a few changes to them, and output them all using the same encoding. If these were arbitrary text files, that would necessarily involve corrupting any characters that aren't supported in the target charset. With XML, however, any character can be represented in entity form (a numeric character reference such as &#x20AC;), so the process I'd like is:
1. Create a DOM object by parsing with the input encoding for the document (taken from the <?xml ...?> declaration).
2. Manipulate the DOM
3. Use XMLSerializer to serialize the DOM in the target encoding - converting any characters not supported by the target encoding to their entity form.
Unfortunately, XMLSerializer doesn't convert the unrepresentable characters to their entity form and they wind up getting corrupted.
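To make "corrupted" concrete, here is a minimal sketch using only the JDK, not Xerces. It shows what a plain charset conversion does with a character the target encoding cannot represent, and how such characters can be detected beforehand. The class and method names are made up for illustration, and this is not a claim about what XMLSerializer does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {
    // Encode to bytes in the target charset and decode again,
    // mirroring a byte-array round trip.
    public static String roundTrip(String text, Charset target) {
        return new String(text.getBytes(target), target);
    }

    // True if every character in the text can be encoded in the target charset.
    public static boolean isRepresentable(String text, Charset target) {
        return target.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "cafe" with an e-acute, which US-ASCII lacks
        // String.getBytes(Charset) silently substitutes the charset's
        // replacement byte for unmappable characters.
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII));       // prints "caf?"
        System.out.println(isRepresentable(text, StandardCharsets.US_ASCII)); // prints "false"
    }
}
```

The substitution happens without any exception, which is why the damage only shows up when someone reads the output.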
The particular piece of code I'm using for serializing the DOM is:
    XMLSerializer ser = new XMLSerializer(new OutputFormat(doc, _outputCharset, false));
    ser.setNamespaces(true);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    ser.setOutputByteStream(out);
    ser.serialize(doc);
    String result = new String(out.toByteArray(), _outputCharset);
    return result;
A couple of notes:
a. I realize it's pointless to convert to a byte array and back to a String. Mostly I want XMLSerializer to convert the unsupported characters to entities, and the output will likely be redirected to a "real" output stream at some point anyway.
b. I'm already very much tied into Xerces, so I have no qualms about using XNI or any other unsupported trickery to get this done. I don't see us changing the version of Xerces we're using any time soon.
So is there a way to escape characters that aren't supported in a particular encoding or should I extend XMLSerializer and do it myself? Am I completely insane for attempting this?
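For reference, the "do it myself" route could look something like the sketch below. It is only a rough illustration using the JDK's CharsetEncoder. The class name CharRefEscaper and the method escapeUnencodable are hypothetical, and hooking this into XMLSerializer (or an XNI filter) is not shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class CharRefEscaper {
    // Replaces each character the target charset cannot encode with a
    // hexadecimal numeric character reference (e.g. &#x20AC;).
    public static String escapeUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // Work per code point so surrogate pairs are kept together.
            int codePoint = text.codePointAt(i);
            int length = Character.charCount(codePoint);
            String piece = text.substring(i, i + length);
            if (encoder.canEncode(piece)) {
                out.append(piece);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            i += length;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // e-acute and the euro sign are both outside US-ASCII.
        System.out.println(escapeUnencodable("caf\u00e9 \u20ac5", StandardCharsets.US_ASCII));
        // prints "caf&#xE9; &#x20AC;5"
    }
}
```

This would only be safe for text node content and attribute values. Character references are not recognized inside comments or CDATA sections, and element and attribute names cannot be escaped this way at all, so those cases would need separate handling.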
Regards,
Adrian Sutton, Software Engineer Ephox Corporation www.ephox.com