Regular expressions in Java
I have written a package named "pat" to do regular expressions
in Java. It supports most of the Perl 5 syntax and is documented
in pages generated by javadoc. It works by treating each pattern
element as a class which knows how to match itself and ask the next
element to match itself. Because of this, you can extend class Regex
to match new syntax and pattern types.
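The element-chain idea described above can be sketched roughly as follows. This is a minimal illustration and not the actual pat source: the names ElementChainSketch, Element, RangePlus, Literal and chain are all hypothetical, and the real package may organize its classes quite differently. Each element matches at a position and then asks the next element to match the rest; adding a new pattern type just means adding another subclass.

```java
public class ElementChainSketch {
    /** One pattern element (hypothetical name, not a pat class). */
    public static abstract class Element {
        Element next; // null marks the end of the chain

        /** Returns the end index of a match beginning at pos, or -1 on failure. */
        public abstract int match(String text, int pos);

        /** Asks the rest of the chain to match from pos. */
        protected int matchNext(String text, int pos) {
            return next == null ? pos : next.match(text, pos);
        }
    }

    /** Matches one or more characters in [lo, hi], like "[a-c]+". */
    public static class RangePlus extends Element {
        private final char lo, hi;

        public RangePlus(char lo, char hi) { this.lo = lo; this.hi = hi; }

        @Override
        public int match(String text, int pos) {
            int end = pos;
            while (end < text.length() && text.charAt(end) >= lo && text.charAt(end) <= hi) {
                end++;
            }
            // Try the longest run first, then give characters back one at a
            // time until the rest of the chain succeeds.
            for (int i = end; i > pos; i--) {
                int result = matchNext(text, i);
                if (result >= 0) return result;
            }
            return -1;
        }
    }

    /** A "new pattern type" added by subclassing: matches a fixed string. */
    public static class Literal extends Element {
        private final String s;

        public Literal(String s) { this.s = s; }

        @Override
        public int match(String text, int pos) {
            return text.startsWith(s, pos) ? matchNext(text, pos + s.length()) : -1;
        }
    }

    /** Links the elements in order and returns the head of the chain. */
    public static Element chain(Element... elements) {
        for (int i = 0; i + 1 < elements.length; i++) {
            elements[i].next = elements[i + 1];
        }
        return elements[0];
    }

    public static void main(String[] args) {
        Element p = chain(new RangePlus('a', 'c'), new RangePlus('x', 'z'));
        int end = p.match("abcxyz", 0);
        System.out.println("match => " + (end < 0 ? null : "abcxyz".substring(0, end)));
    }
}
```

The sketch only matches at a fixed start position; searching would mean retrying match() at each successive offset.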
Example of use
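    import pat.Regex;

    public class tstRegex {
        public static void main(String[] notused) {
            // One or more of a-c, followed by a captured run of x-z.
            Regex r = new Regex("[a-c]+([x-z]+)");
            r.search("abcxyz");
            // substring() is the whole match; substring(0) is the first backreference.
            System.out.println("match => " + r.substring());
            System.out.println("backreference 0 => " + r.substring(0));
        }
    }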
Which produces the output:
match => abcxyz
backreference 0 => xyz
To install this software you can either use the standard
pat.tar.gz or do the following:
- Download suitcase.class.
- Execute the command:
      java suitcase
  This will create a directory named "pat" in the current directory and
  install all needed files there.
- Set CLASSPATH appropriately.
You can also download a few test files if you like:
- deriv.java: demonstrates how you can extend class Regex to
  compile new patterns.
- tokenTest.java: an example of what the regexTokenizer does.
Documentation can be found in the directory pat/doc if you install,
or it may be read online. The best place to start is
pat.Regex.html.
If you wish to track new developments with this software, please send
me email.
If you don't want to go to the trouble of downloading it and would
rather just type in perverse patterns to try to break my library,
you can do that below. Simply type in a pattern, then some
text, then hit the return key to see the results of the match.
Or, if you are a really perverse individual, you could play my new
regular expression game.
Differences from the alpha version
This is now the beta version, although I still make the alpha accessible.
The new version differs in the following ways:
- It contains a few bug fixes (though I have received very few bug reports)
- My naming convention has changed to
accommodate capitals for classes and lower case for methods. The class
name "regex" is now "Regex".
- There is a new class RegRes, from which Regex is now derived.
RegRes is short for "Regular expression match result". A RegRes object
containing information about the last successful match can be obtained
from Regex's result() method.
- Backreferences (things in ()'s) are now treated more like patterns.
In other words, Regex.left() returns what's left of the match,
and Regex.left(1) returns what's left of backreference 1.
The only difference is that a method that takes no arguments refers
to the match, and a method that takes an int refers to a backreference.
This convention applies to left(), right(), substring(), matchFrom(),
and charsMatched().
- Return values make more sense: an unmatched pattern gives null from left() and -1 from charsMatched().
- The beta source is not being distributed. The alpha source is still available and free, however.
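The no-argument versus int convention in the list above can be illustrated with a small stand-in. This is only a sketch built on java.util.regex, a different library, and not pat's implementation. The class name ConventionSketch is hypothetical. It assumes that left() and right() mean the text before and after the matched span, that matchFrom() is the start index, that charsMatched() is the matched length, and that backreferences are numbered from 0 as in the earlier example output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConventionSketch {
    private final String text;
    private final Matcher m;
    private final boolean matched;

    public ConventionSketch(String pattern, String text) {
        this.text = text;
        this.m = Pattern.compile(pattern).matcher(text);
        this.matched = m.find(); // first match anywhere in the text
    }

    // No-argument forms refer to the whole match.
    public String substring()  { return matched ? m.group() : null; }
    public String left()       { return matched ? text.substring(0, m.start()) : null; }
    public String right()      { return matched ? text.substring(m.end()) : null; }
    public int matchFrom()     { return matched ? m.start() : -1; }
    public int charsMatched()  { return matched ? m.end() - m.start() : -1; }

    // Assumption: backreference n is 0-based, so it maps to java.util.regex group n + 1.
    private boolean has(int n) {
        return matched && n >= 0 && n < m.groupCount() && m.start(n + 1) >= 0;
    }

    // int forms refer to backreference n and fail the same way (null or -1).
    public String substring(int n) { return has(n) ? m.group(n + 1) : null; }
    public String left(int n)      { return has(n) ? text.substring(0, m.start(n + 1)) : null; }
    public String right(int n)     { return has(n) ? text.substring(m.end(n + 1)) : null; }
    public int matchFrom(int n)    { return has(n) ? m.start(n + 1) : -1; }
    public int charsMatched(int n) { return has(n) ? m.end(n + 1) - m.start(n + 1) : -1; }

    public static void main(String[] args) {
        ConventionSketch r = new ConventionSketch("[a-c]+([x-z]+)", "..abcxyz!");
        System.out.println("left()  => " + r.left());   // text before the match
        System.out.println("left(0) => " + r.left(0));  // text before backreference 0

        ConventionSketch miss = new ConventionSketch("q+", "abc");
        System.out.println("unmatched left() => " + miss.left());
        System.out.println("unmatched charsMatched() => " + miss.charsMatched());
    }
}
```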
Release 1.0
This will focus on making the package faster. I am also considering
including a few more new pattern types (always beginning with a "(?").
For example, I am considering a type of backreference with a variable
name: "(?'a'" would store the backreference in a variable named "a",
rather than in "0", "1", "2", etc.
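To give a feel for what a named backreference looks like in use, here is a sketch using java.util.regex, a different library, which writes named groups as "(?<a>...)". It does not use the "(?'a'" syntax under consideration here and says nothing about how pat would implement the feature. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSketch {
    /** Returns the text captured by the named group, or null if the pattern does not match. */
    public static String capture(String pattern, String groupName, String text) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        return m.find() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        // The capture is looked up by the name "a" instead of by position.
        System.out.println(capture("[a-c]+(?<a>[x-z]+)", "a", "abcxyz")); // prints "xyz"
    }
}
```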