xerces always escapes ampersands 2003-08-04 - By Williams, Erskine BGI SF
I'm finding that Xerces always escapes ampersands, even when they are part of a character reference. For example, if I want to define a text element like so: <someText>&#x20AC;</someText> (where "&#x20AC;" is the hexadecimal character reference for the euro "EUR" sign), then when Xerces writes this out to a file I invariably get: <someText>&amp;#x20AC;</someText>. Xerces is always escaping ampersands into the entity ref "&amp;".
Perhaps my confusion arises out of poor understanding of xml, but I should think that xerces would only escape ampersands that aren't a part of a valid entity reference, i.e., if an ampersand is immediately followed by a pound (#) sign, it should leave it alone. Is there a more reliable way to reference extended ascii characters in xml, so that they will pass through xerces unmolested?
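The behaviour described above can be illustrated with a minimal sketch of the escaping rule a serializer typically applies to text content. This is not Xerces code: the class EscapeSketch and the method escapeText are hypothetical names invented for illustration. The sketch assumes the serializer receives text as plain characters, so a string such as "&#x20AC;" is just eight literal characters whose leading "&" must be escaped, whereas the actual euro character (U+20AC) is written as a character reference when the target encoding cannot represent it.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EscapeSketch {

    // Hypothetical helper (not part of Xerces, Castor or dom4j):
    // escapes text content for an XML document written in the given encoding.
    public static String escapeText(String text, String encoding) {
        CharsetEncoder encoder = Charset.forName(encoding).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '&') {
                out.append("&amp;");   // every literal ampersand is escaped
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (encoder.canEncode(new String(Character.toChars(cp)))) {
                out.appendCodePoint(cp);   // representable: write the character itself
            } else {
                // not representable in the target encoding: numeric character reference
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Literal text that merely looks like a character reference
        System.out.println(escapeText("&#x20AC;", "ISO-8859-1"));
        // The real euro character, written as a Java Unicode escape
        System.out.println(escapeText("\u20AC", "ISO-8859-1"));
    }
}
```

Under these assumptions, the way to get a character through the serializer is to put the character itself into the string (for example "\u20AC" or "\u00A3" in Java source) rather than the text of a character reference, and let the serializer decide how to write it for the chosen encoding.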
I use castor and dom4j to manipulate my xml in my application, but these both use Xerces under the covers if I am not mistaken. Some simple test cases are below. Any guidance is very much appreciated. Cheers, Erskine
/***********************
 * Castor example
 ************************/
import java.io.File;
import java.io.FileWriter;

import org.exolab.castor.xml.Marshaller;

public class CastorTest {

    public static void main(String[] args) {

        // populate an arbitrary data object with special characters
        // (Factsheet, ContentSections and Content are data classes not shown here)
        Factsheet fs = new Factsheet();
        ContentSections cs = new ContentSections();
        Content c = new Content();
        c.addPara("&#xA3; &#xA9; &#xAE;");
        cs.addContent(c);
        fs.setContentSections(cs);

        // now use the Castor marshalling framework to write the data object out to XML
        try {
            FileWriter fw = new FileWriter(new File("tmp.xml"));
            Marshaller m = new Marshaller(fw);
            m.setEncoding("iso-8859-1");
            m.marshal(fs);
            fw.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The resulting xml file looks like:
<?xml version="1.0" encoding="iso-8859-1"?>
<factsheet>
  <content>
    <para>&amp;#xA3; &amp;#xA9; &amp;#xAE;</para>
  </content>
</factsheet>
/********************************
 *
 * Dom4J example
 *
 ********************************/
import org.dom4j.Document;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;

import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

public class JDomTest {

    public static void main(String[] args) {
        // build a small document whose text contains character references
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("root");
        Element test = root.addElement("test").addText("&#xA3;,&#xAE;");

        // write the document out to a file
        try {
            Writer w = new FileWriter("tmp.xml");
            document.write(w);
            w.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
The result document is:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <test>&amp;#xA3;,&amp;#xAE;</test>
</root>
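For comparison, here is a hedged sketch of the same experiment using only the JAXP classes bundled with the JDK, so neither Castor nor dom4j is involved (that substitution is an assumption made to keep the example self-contained). The class name CharRefDemo and the method serialize are invented for illustration. It serializes one element twice to US-ASCII: once with text that spells out a character reference, and once with the real euro character. The first should come out with its ampersand escaped; the second should come out as a numeric character reference, whose exact form (decimal or hexadecimal) depends on the serializer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class CharRefDemo {

    // Illustrative helper: builds <someText>text</someText> and serializes it as US-ASCII.
    public static String serialize(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("someText");
            root.appendChild(doc.createTextNode(text));
            doc.appendChild(root);

            // identity transform: DOM in, serialized bytes out
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // text that only looks like a character reference: the "&" gets escaped
        System.out.println(serialize("&#x20AC;"));
        // the actual euro character: the serializer chooses how to represent it
        System.out.println(serialize("\u20AC"));
    }
}
```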