UTF-8 encoding errors are not always detected

2004-02-20       - By Michael Glavassevich

Hi Bob,

I'm not sure what Xerces 1.4.2 was doing, but Xerces2 has its own
specialized UTF-8 reader which is used instead of the one provided by
Java. It will throw an exception when it encounters malformed UTF-8 byte
sequences.
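
As a rough illustration of the kind of strict decoding discussed in this thread, here is a minimal sketch using the JDK's own java.nio.charset.CharsetDecoder with CodingErrorAction.REPORT, which makes the decoder throw on malformed input instead of silently substituting a replacement character. This is not Xerces code and is not how Xerces2's UTF-8 reader is implemented; the class name StrictUtf8Check and the helper isWellFormedUtf8 are made up for the example. The byte array is the hex dump from the original report quoted below.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Xerces.
public class StrictUtf8Check {

    /** Returns true if the bytes decode as well-formed UTF-8, false otherwise. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        // REPORT makes the decoder throw rather than replace bad sequences.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The comment from the report, as the raw bytes given in its hex dump.
        // 0xFC is a lone byte that cannot start any UTF-8 sequence.
        byte[] reported = {
            0x3C, 0x21, 0x2D, 0x2D, 0x20, 0x66, (byte) 0xFC, 0x72,
            0x20, 0x4F, 0x4E, 0x43, 0x20, 0x2D, 0x2D, 0x3E
        };
        // The same comment text, correctly encoded as UTF-8.
        byte[] correct = "<!-- f\u00FCr ONC -->".getBytes(StandardCharsets.UTF_8);

        System.out.println("reported bytes well-formed: " + isWellFormedUtf8(reported));
        System.out.println("re-encoded bytes well-formed: " + isWellFormedUtf8(correct));
    }
}
```

A check like this only answers whether the byte stream is valid UTF-8; it says nothing about where in the XML document the problem sits, which is what a parser's own reader can report.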

On Fri, 20 Feb 2004, Bob Foster wrote:

> Encoding detection happens when the document is opened; after that, a
> conversion error may cause a well-formedness error, but it cannot be
> identified as a charset problem.
>
> Most likely the parser isn't detecting the non-UTF-8 characters because
> Java isn't. I have seen mention that you can ask Java's encoding
> converters to throw if they encounter invalid character sequences. Does
> anyone know if this is true? And if so, why doesn't Xerces do it?
>
> Bob
>
> DeSmet_Ringo@(protected) wrote:
> > Maybe because the bad character is in the comment. I suspect the parser
> > skips everything until the closing comment tag. What happens when the bad
> > character is in an attribute value for example?
> >
> > Ringo
> >
> > -----Original Message-----
> > From: Berchner Matthias ICM Berlin
> > [mailto:matthias.berchner@(protected)]
> > Sent: Friday, 20 February 2004 15:15
> > To: 'xerces-j-user@(protected)'
> > Subject: UTF-8 encoding errors are not always detected
> >
> >
> > Hi,
> >
> > I'm using Xerces 1.4.2; unfortunately, UTF-8 encoding errors are not
> > always detected:
> >
> > Example:
> >
> > --------------------------------------------
> > <?xml version="1.0" encoding="UTF-8"?>
> > <Project>
> >   <!-- für ONC -->
> > </Project>
> > --------------------------------------------
> >
> > <!-- für ONC --> corresponds to
> >   hex 3C 21 2D 2D 20 66 FC 72 20 4F 4E 43 20 2D 2D 3E
> >
> > Non-UTF-8 character: ü <-> FC
> >
> >
> > Kind Regards,
> > Matthias
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: xerces-j-user-unsubscribe@(protected)
> For additional commands, e-mail: xerces-j-user-help@(protected)

---------------------------
Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@(protected)
