More on (Re: (XERCESJ-589) Bug with pattern restriction on long strings)

2007-07-08 - By Michael Glavassevich

Hi Geoff,

"Geoff M. Granum" <geoff.granum@(protected)> wrote on
07/05/2007 10:17:42 PM:

> Hello Michael,
>
> Everything works so far, with the exception of UNION, which technically
> works fine but exposes an infinite loop bug which was being broken by a
> stack overflow before.
>
> The regex: (((((boy)|(girl))[0-1][x-z]{2})?)|(man|woman)[0-1]?[y|n])*
> With Target: boy0xxwoman1ygirl1xyman
>
> Loops forever (the ending is invalid; it needs [0-1]?[y|n]). Well, forever
> being 'until you run out of memory'. I didn't notice this until I set my
> memory back to a reasonable size, as I had the -Xmx flag set to 1 GB
> earlier while experimenting.
>
> So the options are
> a.) Fix it. Since this is a serious edge case, I don't think it's really
> worth the risk. More bluntly, it would take me far more time than I want
> to spend on it to become comfortable with any fix I might produce.
>
> b.) Ignore the case (let the system die of OOM *eventually*, recognizing
> your CPU will be pegged for a while before this happens).
>
> c.) Throw an exception if the stack depth gets to some crazy level (a
> million deep, for instance), perhaps adding a system property to set the
> allowable depth.
>
> d.) Other?

I would go with option d: open a new JIRA issue [1] to make folks aware
that this is a bug. Perhaps someone will fix it one day.
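For anyone weighing option c against simply filing the bug, it could be sketched roughly as below. This is a hypothetical illustration, not Xerces code: the class name `DepthGuard` and the property name `example.xerces.regex.maxStackDepth` are invented, and the one-million default only mirrors the figure suggested in the thread. A matcher using an explicit stack would call `checkPush` before each push.

```java
// Hypothetical sketch of option c; the class and property names are invented
// for illustration and are not existing Xerces APIs or settings.
public final class DepthGuard {
    static final String PROPERTY = "example.xerces.regex.maxStackDepth";
    static final int DEFAULT_LIMIT = 1000000; // "a million deep", as suggested in the thread

    private final int limit;

    public DepthGuard() {
        // Integer.getInteger reads the system property, or returns the default
        // if the property is unset or not a valid integer.
        this(Integer.getInteger(PROPERTY, DEFAULT_LIMIT).intValue());
    }

    DepthGuard(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        this.limit = limit;
    }

    /** Call before pushing onto the matcher's stack; fails fast instead of exhausting the heap. */
    public void checkPush(int currentDepth) {
        if (currentDepth >= limit) {
            throw new IllegalStateException("regex backtracking stack exceeded " + limit
                    + " entries; raise -D" + PROPERTY + " to allow deeper matching");
        }
    }

    public int limit() {
        return limit;
    }
}
```

Starting the JVM with `-Dexample.xerces.regex.maxStackDepth=500000` would lower the limit in this sketch. A real implementation would more likely report a dedicated exception type through the schema validator's error handling than throw `IllegalStateException`.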

> For reference, using the standard JVM (no flags, 64MB heap space, IIRC), I
> can check the DNA string appended to itself ~425 times before reaching a
> depth of a million stack elements. DNA_STRING is 2273 characters long.
>
> I run out of memory (heap space) around 650*DNA_STRING, with a depth of
> 1,477,450. (the string being 1,477,450 characters long... big shock).

Cool. That's leaps and bounds better than the current code. I think folks
were getting the stack overflow with around 2000 to 3000 characters
(without increasing the stack size from the default).
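Two separate ideas come up here: avoiding call-stack recursion (the old code overflowed at a few thousand characters) and the union pattern looping. The sketch below illustrates both in a deliberately simplified, hypothetical model, not the Xerces implementation: it full-matches input against `(alt1|alt2|...)*` where each alternative is a plain literal, backtracks with an explicit `ArrayDeque` instead of recursion, and skips star iterations that consume no characters. The empty literal stands in for the `(...)?` branch of the union pattern, which can match the empty string; in this model, removing the guard makes the loop push an identical frame forever.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Simplified, hypothetical model (not Xerces code): full-match input against
// (alt1|alt2|...)* where each alternative is a literal string, possibly "".
public class StarMatcher {

    static boolean fullMatch(List<String> alternatives, String input) {
        // Explicit backtracking stack instead of recursion; each frame is
        // {position in input, index of the next alternative to try there}.
        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] {0, 0});
        while (!stack.isEmpty()) {
            int[] frame = stack.peek();
            int pos = frame[0];
            if (pos == input.length()) {
                return true; // the star may stop after consuming all input
            }
            if (frame[1] == alternatives.size()) {
                stack.pop(); // no alternatives left at this position: backtrack
                continue;
            }
            String alt = alternatives.get(frame[1]++);
            if (alt.isEmpty()) {
                // Empty-iteration guard: an iteration that consumes nothing would
                // push an identical frame, so without this check the loop never ends.
                continue;
            }
            if (input.startsWith(alt, pos)) {
                stack.push(new int[] {pos + alt.length(), 0});
            }
        }
        return false; // every path was tried and none consumed the whole input
    }

    public static void main(String[] args) {
        // Literals taken from the target string in the thread.
        List<String> alts = Arrays.asList("boy0xx", "", "woman1y", "girl1xy", "man1y");
        System.out.println(fullMatch(alts, "boy0xxwoman1ygirl1xyman"));   // false
        System.out.println(fullMatch(alts, "boy0xxwoman1ygirl1xyman1y")); // true
    }
}
```

The first call returns `false` because the trailing `man` cannot be consumed by any alternative; appending `1y` makes it succeed. Stack depth here grows by one frame per matched item, and memory, not the thread's call stack, becomes the limit, which is consistent with the heap-bound behaviour reported for the DNA string.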

> DNA String runs out of memory very quickly, whereas the above regex is
> pretty slow about it because it pushes and pops a huge number of stack
> items, *eventually* achieving an overflow. The DNA string pushes items
> until it gets an answer, then backs out in the same order.
>
> Also, and oddly, the 'CAPTURE' option doesn't get hit a single time in the
> 7000 or so regex tests provided by the W3C group. I'm now testing it with the
> full suite, which takes a painfully long time.
>
> I'm moving into optimization mode now that the regex tests check out; once
> I hear back as to how the group wants the new bug handled, I'll implement
> it and post the code for review.
>
> Oh, did you want the test suite code I had to implement? There's a lot of
> generics code in it, and it's incredibly hackish, but it's free to whoever
> wants it. It is by no means robust or complete. IntelliJ has an 'Export
> to Eclipse' setting for modules, if that interests you.

If your test suite can be back-ported to Java 1.3, perhaps it could be
included with the other sanity/unit tests which run off the build.xml
'test' target.
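If a regression test is added for the looping union case, its expected answer can be cross-checked against a different engine. A minimal sketch using java.util.regex, which is not the Xerces engine and requires Java 1.4 or later, so it is not usable as-is in a Java 1.3 suite. It assumes the two dialects agree for this particular pattern, which uses only groups, alternation, simple character classes and quantifiers. XML Schema patterns are implicitly anchored to the whole value, so `matches()` is the relevant call.

```java
import java.util.regex.Pattern;

// Cross-check with java.util.regex; this is not the Xerces regex engine.
public class UnionCaseCheck {
    // Pattern and target copied verbatim from the message quoted above.
    static final Pattern PATTERN =
        Pattern.compile("(((((boy)|(girl))[0-1][x-z]{2})?)|(man|woman)[0-1]?[y|n])*");
    static final String TARGET = "boy0xxwoman1ygirl1xyman";

    public static void main(String[] args) {
        // matches() requires the whole input to match, like an XML Schema pattern facet.
        System.out.println(PATTERN.matcher(TARGET).matches());        // false: trailing "man" lacks [y|n]
        System.out.println(PATTERN.matcher(TARGET + "1y").matches()); // true: "man1y" completes the last item
    }
}
```

Both calls return promptly, so the expected outcome for the Xerces test would be "no match" rather than a hang or an OutOfMemoryError.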

> Cheers,
> --
> Geoff M. Granum
> Portland, Oregon

Thanks.

[1] http://issues.apache.org/jira/browse/XERCESJ

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@(protected)
