How to keep entities unresolved in the result ? 2003-01-08 - By Andy Clark
Aurelien Pernoud wrote:
> Everything works fine, except that the entities found are always translated
> by the parser to their equivalent in the characters() method:
> &amp; becomes &
> &nbsp; becomes space
> &eacute; becomes é
> This is fine, but how do I get the ref back? I must in my case keep the
> existing references, otherwise I get errors in the XHTML generated.
As Joe mentioned, it's probably better to allow the parser to do its job and pass the text of the entity to the application. If you're dealing with XHTML, then it should be the serializer's job to turn those characters back into their entity references.
However...
If you want to know exactly what entity references appear in the document (including character entity refs like &nbsp;) then you can turn on a feature in Xerces to notify the application of all entity refs. See the following page for information on the feature:
http://xml.apache.org/xerces2-j/features.html
But this would still pass on the characters between the start/end entity ref calls. If you don't want this, then you should extend the DOMParser or SAXParser class to filter out this unwanted content. However, realize that this would be a non-standard way of dealing with these references.
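The filtering idea can be sketched without subclassing the parser, by using the standard SAX2 LexicalHandler callbacks instead. This is only a minimal sketch under some assumptions: the class name EntityPreservingHandler and the extractText helper are made up for illustration, the entities are declared in the document's DTD, and the parser is whatever JAXP returns by default. General entities such as &nbsp; are reported through startEntity/endEntity; built-in references (&amp;) and character references (&#160;) are not reported unless the extra Xerces scanner features are enabled (I believe these are notify-builtin-refs and notify-char-refs, but please check the features page).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: collects character data, but writes "&name;" back
// out for each general entity instead of its expanded replacement text.
class EntityPreservingHandler extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private int entityDepth = 0; // > 0 while inside an entity's replacement text

    // Names such as "[dtd]" or "%param" are not general entity references.
    private static boolean isGeneralEntity(String name) {
        return !name.startsWith("[") && !name.startsWith("%");
    }

    @Override
    public void startEntity(String name) {
        if (!isGeneralEntity(name)) return;
        // Emit the reference only for the outermost entity.
        if (entityDepth++ == 0) out.append('&').append(name).append(';');
    }

    @Override
    public void endEntity(String name) {
        if (isGeneralEntity(name)) entityDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Drop the expanded text that arrives between start/end entity calls.
        if (entityDepth == 0) out.append(ch, start, length);
    }

    String text() {
        return out.toString();
    }

    // Parses the XML string and returns its text with entity refs kept.
    static String extractText(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        EntityPreservingHandler handler = new EntityPreservingHandler();
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>";
        System.out.println(extractText(xml));
    }
}
```

Note that references inside attribute values are not reported this way, and the caveat above still applies: this is a non-standard way of handling the references, and letting the serializer re-create them is usually the cleaner route.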
> Moreover, depending on the encoding, some entities such as &rsquo; are
> translated to "?". I've set the encoding to ISO-8859-1, and didn't find
> which one to use to get back the ’ ...
The appearance of a '?' is either a display issue (i.e. the font doesn't have a glyph for that char) or a serialization issue (i.e. that character cannot be represented in the specified encoding). I'm guessing your problem is the latter -- please use an encoding that can represent all the Unicode characters, like UTF-8.
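To see the serialization side of this in isolation, here is a small sketch using only the standard java.nio.charset classes. The class and method names (EncodingCheck, canRepresent, roundTrip) are made up for illustration. It checks whether the right single quote (U+2019, the character behind &rsquo;) can be encoded in a given charset, and shows what survives an encode/decode round trip.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking whether text survives a given encoding.
class EncodingCheck {
    // True if every character of the text can be encoded in the charset.
    static boolean canRepresent(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Encodes and decodes again; String.getBytes substitutes the charset's
    // replacement byte for anything it cannot map.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String quote = "\u2019"; // right single quotation mark
        System.out.println(canRepresent(quote, StandardCharsets.ISO_8859_1));
        System.out.println(canRepresent(quote, StandardCharsets.UTF_8));
        System.out.println(roundTrip(quote, StandardCharsets.ISO_8859_1));
    }
}
```

With ISO-8859-1 the character has no mapping, so it is replaced on output; with UTF-8 it round-trips unchanged.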
-- Andy Clark * andyc@(protected)