Request for your Regular Expressions (Re: (XERCESJ-589) Bug with pattern res

2007-06-25 - By Michael Glavassevich

Hi Geoff,

The W3C test suite [1] contains many regex tests, particularly this large
bucket [2] of tests contributed last year. That should give you a pretty
good selection, though beware that some of the tests are invalid. The known
problems are documented in the W3C's Bugzilla here [3].

As for the code, one thing that may not be obvious is that it needs to be
thread-safe. This is because the RegularExpression objects are cached in
the schema grammar which could be shared with several parsers and
validators. To avoid having many large synchronized blocks, the matching
code keeps its state local to the call stack. Hoping that's the approach
you've been taking.
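
To illustrate what "state local to the call stack" can look like, here is
a minimal Java sketch. It is not the Xerces RegularExpression code. The
class SharedWildcardMatcher, its Context helper and the toy wildcard
syntax ('*' for any run of characters, '?' for one character) are
hypothetical, chosen only to keep the example short. The point is that the
shared object holds nothing mutable, so one cached instance can serve many
threads without synchronized blocks.

```java
/**
 * Hypothetical example: an immutable matcher that can be cached and shared.
 * All mutable match state lives in a Context created per call.
 */
final class SharedWildcardMatcher {
    // Compiled form of the pattern; never modified after construction.
    private final char[] pattern;

    SharedWildcardMatcher(String pattern) {
        this.pattern = pattern.toCharArray();
    }

    /** Per-call scratch state. Each matches() call owns its own instance. */
    private static final class Context {
        final char[] input;
        int steps; // mutable, but visible only to the calling thread

        Context(String input) {
            this.input = input.toCharArray();
        }
    }

    /** Whole-string match. No fields of this object are written here. */
    boolean matches(String input) {
        Context ctx = new Context(input); // reachable only from this call
        return match(ctx, 0, 0);
    }

    private boolean match(Context ctx, int p, int i) {
        ctx.steps++;
        if (p == pattern.length) {
            return i == ctx.input.length;
        }
        char c = pattern[p];
        if (c == '*') {
            // Try every possible length for the run consumed by '*'.
            for (int k = i; k <= ctx.input.length; k++) {
                if (match(ctx, p + 1, k)) {
                    return true;
                }
            }
            return false;
        }
        if (i < ctx.input.length && (c == '?' || c == ctx.input[i])) {
            return match(ctx, p + 1, i + 1);
        }
        return false;
    }
}
```

If the position or step counter were instance fields instead, two parsers
sharing the same cached object could overwrite each other's state in the
middle of a match.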

Thanks.

[1] http://www.w3.org/XML/2004/xml-schema-test-suite/index.html#releases
[2] http://dev.w3.org/cvsweb/XML/xml-schema-test-suite/2004-01-14/xmlschema2006-11-06/msMeta/Regex_w3c.xml
[3] http://www.w3.org/Bugs/Public/buglist.cgi?query_format=specific&order=relevance+desc&bug_status=__open__&product=XML+Schema+Test+Suite&content=

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: mrglavas@(protected)

"Geoff M. Granum" <geoff.granum@(protected)> wrote on
06/25/2007 04:15:27 AM:

> (If you don't care about the particulars, but have some regexes you can
> contribute, jump to the code bit. Thanks)
>
> I have two implementations to test; one is a (somewhat) naive linked
> list stack manager, the other is (as yet) still recursive.
>
> The former works, but I put it together as a proof of concept and don't
> trust it much. Fifty-two return points in one method is a tad much.
> Implemented with a raw java.util.Stack, it is ten times as slow as the
> original, and with a private static LocalStack class implemented as a
> LinkedList it is twice as slow.
>
> Though, 10K runs of the first thousand chars of the two example regex
> patterns take ~1.2 and 2.6 seconds, respectively. So 0.12 ms and 0.26 ms
> per run. I'm rather set against ANY performance decrement, or I'd have
> just verified that code and moved on.
>
> The latter implementation is a refactor of the method to a single point
> of exit. THAT goal is working; now I have to make sure that I can add
> values to an internal stack manager without blowing away any state. Some
> of the CASE statements are a mite obtuse, and I don't like using breaks
> much. Breaks also seem to affect the ability of the optimizer to do its
> job, as the last CASE I modified (op.CLOSURE) gave a 10% performance
> boost without it. Although I'm suspicious, as it's late and now the
> stack overflows somewhat (ok, a lot) earlier than before. I did add a
> number of variables, so it's possible I made no mistake in the logic
> (I'd better not have!).
>
> --- The request part ---
>
> Regardless of the final form, I need to populate a test library:
>
> I have a few regular expressions lying around, and I figure I'll parse
> in a few of my XML files and modify the RegularExpression class to dump
> anything it sees to a file... I still doubt I'd have more than 20, and
> none of them shockingly complex.
>
> So if you could send me your favorite regular expressions, along with a
> couple of strings to match them against (some pass, some fail, but
> indicate which), it would be a big help.
>
> Even better, if you could format them like this sample:
>
> testCases.add(new TestCase(
>    "Overall description",
>    "Your Regex Pattern",
>    new SubCase("A description", shouldPass, "matchString" ),
>    new SubCase("A description2", shouldPass, "matchString2" ),
>    ... more SubCases ...
> ));
>
> I would be able to paste them straight into the unit test and run them.
> The SubCase argument uses varArgs, so add as many as you want/will. Feel
> free to add your 'contributed by:' to the overall description area for
> credit... Though I'd remind you not to include a parsable (or any, lest
> random-someone ask you for help later) e-mail address on this list, as
> it is public and archived.
>
> My own direct e-mail address is (my first name @ my last name).biz. And
> if someone has written a parser for THAT, they can have it.
>
> The more complex your tests the better, for the beat down. Tailored
> regexes would be grand for focused testing (e.g. the simplest lookahead,
> lookbehind, singleline, multiline, etc). But I figure that's asking for
> real work.
>
> Also, or instead, if you have a 'regular expression rich' schema and
> conforming XML file that you can send (think 'might become public'), I
> should be able to parse those out without much trouble.
>
> And yes (obviously), my test library uses 1.5 features... I'll convert
> it if the changes are approved for commit. Keeps me sane.
>
> Of course the changes to RegularExpression are using JDK 1.3 as a
> target, as that is the lowest I have available. My memory of the
> differences between 1.2 and 1.3 is fuzzy, but I don't think anything I'm
> using has changed since 1.0. My only real concern is that my JVM has a
> better optimizer and could be hiding poor performance that I induce.
>
>
> Thanks much,
> --
> Geoff M. Granum
> 760-534-1636
> Portland, Oregon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-dev-unsubscribe@(protected)
> For additional commands, e-mail: j-dev-help@(protected)
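
For anyone preparing contributions, the sample format quoted above could
be backed by classes along these lines. This is a sketch under
assumptions, not Geoff's actual test library: only the constructor shapes
come from his sample, while the field names and the run() method are
hypothetical. java.util.regex.Pattern stands in for the Xerces
RegularExpression class so that the example is self-contained. The two
engines differ in syntax details, so a result here does not guarantee the
same result in Xerces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** One input string plus the expected outcome (hypothetical field names). */
final class SubCase {
    final String description;
    final boolean shouldPass;
    final String input;

    SubCase(String description, boolean shouldPass, String input) {
        this.description = description;
        this.shouldPass = shouldPass;
        this.input = input;
    }
}

/** A pattern with any number of SubCases, passed as varargs. */
final class TestCase {
    final String description;
    final String pattern;
    final List<SubCase> subCases;

    TestCase(String description, String pattern, SubCase... subCases) {
        this.description = description;
        this.pattern = pattern;
        this.subCases = Arrays.asList(subCases);
    }

    /**
     * Runs every SubCase and returns a description of each one whose
     * outcome differs from shouldPass. An empty list means all passed.
     * ASSUMPTION: java.util.regex is a stand-in for the engine under test.
     */
    List<String> run() {
        Pattern compiled = Pattern.compile(pattern);
        List<String> failures = new ArrayList<String>();
        for (SubCase sc : subCases) {
            // matches() requires the whole input to match the pattern.
            boolean matched = compiled.matcher(sc.input).matches();
            if (matched != sc.shouldPass) {
                failures.add(description + " / " + sc.description);
            }
        }
        return failures;
    }
}
```

A contribution in the requested format would then read, for example,
new TestCase("US ZIP code", "[0-9]{5}(-[0-9]{4})?", new SubCase("five
digits", true, "97201"), new SubCase("too short", false, "9720")), and
calling run() on it would return an empty list when every SubCase behaves
as declared.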

