Friday, April 01, 2005

Regular Expressions in Java

I started playing around with regular expressions this past week. I am going to use them in the personal project that I am working on to validate different user input data strings. I recently purchased a very good book from Apress called Regular Expression Recipes: A Problem-Solution Approach by Nathan A. Good. They have another book specifically for Java called Java Regular Expressions: Taming the java.util.regex Engine, but I did not know about it before I purchased the first book. It may have helped me with this problem. :)

Anyway, the book that I purchased has examples for Perl, Python, and shell scripting. Since it was the first example for each recipe, I picked off the regular expressions from the Perl code for the particular items that I wanted to validate and used them in Java. I was dumbfounded when they did not work. As an example, I used a regular expression to validate an email address. The regular expression looked like this in the Perl example:

/^[-\w.]+@([A-z0-9][-A-z0-9]+\.)+[A-z]{2,4}$/

I spent a couple of evenings after work at home on this problem before I finally figured it out. You should not use the starting and ending slash ("/") when using any regular expression in Java. In Perl those slashes are only delimiters around the pattern, not part of it. Java's regular expression compiler sees each slash as a literal character when it compiles the expression, and the corresponding validation ends up failing. When I removed those slashes, the validation unit test passed and things were grand!
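
Here is a small sketch of the slash problem, using a deliberately simple pattern with no backslashes in it so that only the slashes are in play. The class name SlashDemo and the helper finds are made up for illustration; only java.util.regex.Pattern and its Matcher come from the JDK.

```java
import java.util.regex.Pattern;

public class SlashDemo {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    public static boolean finds(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Perl-style delimiters left in: Java treats each "/" as a literal
        // character, so the input would have to contain slashes, and "^"
        // can no longer match because it sits after the first slash.
        System.out.println(finds("/^[a-z]+$/", "hello")); // prints false

        // Delimiters removed: the pattern behaves as intended.
        System.out.println(finds("^[a-z]+$", "hello"));   // prints true
    }
}
```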

One other thing to remember in Java with regular expressions is that the backslash ("\") is the escape character in regular expressions. Unfortunately, it is also the escape character in Java string literals. Thus, you have to double every backslash when you write a regular expression as a Java string. So, in the example above, \w should be written in Java as \\w, and each \. should be written as \\. as well.
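
Putting both fixes together, the email pattern from the Perl example might look something like this in Java. This is only a sketch: the class name EmailCheck, the constant EMAIL, and the method isValid are names I made up for illustration, and the pattern itself is copied unchanged apart from the dropped slashes and the doubled backslashes.

```java
import java.util.regex.Pattern;

public class EmailCheck {
    // Slashes dropped, and every regex backslash doubled for the Java
    // string literal. A single backslash here ("\w" or "\.") would not
    // even compile, because Java rejects it as an illegal escape.
    private static final Pattern EMAIL =
        Pattern.compile("^[-\\w.]+@([A-z0-9][-A-z0-9]+\\.)+[A-z]{2,4}$");

    // Hypothetical helper: true if the whole input matches the pattern.
    public static boolean isValid(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user@example.com")); // prints true
        System.out.println(isValid("user@example"));     // prints false
    }
}
```

One caveat about the pattern as copied: in ASCII the range A-z also covers the few punctuation characters that sit between "Z" and "a", so A-Za-z would be the stricter spelling if that matters for your validation.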
