Regex Tutorial

Regular expressions, or regex, are a very flexible way to process strings in Java (and other programming languages). A single regex call can often replace dozens of lines of hand-written string-handling code. Regex relies on pattern matching, and this tutorial gives a quick overview of the basics of those patterns. If you would like to become a regex wizard, look up a more advanced tutorial on the internet. If you'd like practice, go to regexr.com.

First, a table of general Java regex methods. Note that x, y, and z are all strings in this table. Also note that no method modifies x, y, or z.

Method                 Function
x.matches(y)           Returns true if the whole of x matches the regex y (a match on only part of x is not enough)
x.split(y)             Splits x around every substring that matches the regex y and returns an array of the resulting pieces
x.replaceAll(y,z)      Returns a copy of x in which every substring that matches the regex y is replaced with z
x.replaceFirst(y,z)    Returns a copy of x in which the first substring that matches the regex y is replaced with z
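
As a quick illustration of the table, here is a minimal sketch that runs all four methods on one string. The class name RegexMethodsDemo and the helper runAll are made up for this example; the pattern is a plain comma, so no special symbols are needed yet.

```java
import java.util.Arrays;

class RegexMethodsDemo {
    // Hypothetical helper: applies the four methods from the table and
    // returns their results as text, followed by x itself.
    static String[] runAll(String x, String y, String z) {
        return new String[] {
            String.valueOf(x.matches(y)),  // does ALL of x match y?
            Arrays.toString(x.split(y)),   // pieces of x between matches of y
            x.replaceAll(y, z),            // every match of y replaced by z
            x.replaceFirst(y, z),          // only the first match replaced
            x                              // x is unchanged by the calls above
        };
    }

    public static void main(String[] args) {
        for (String line : runAll("a,b,c", ",", ";")) {
            System.out.println(line);
        }
        // Prints:
        // false
        // [a, b, c]
        // a;b;c
        // a;b,c
        // a,b,c
    }
}
```

Note that matches returns false here because "," describes only part of "a,b,c", not the whole string.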

Now, time to learn the regex symbols.

The first symbol to learn is . which matches any single character (by default, any character except a line terminator). For example, "abc".matches("ab.") and "abc".matches("...") both return true, but "abc".matches("a.bc") and "abc".matches("..") both return false.
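
A small sketch of the same idea, using a made-up helper aAnyC that checks for "a", then any one character, then "c":

```java
class DotDemo {
    // Hypothetical helper: true when s is "a", any one character, then "c".
    static boolean aAnyC(String s) {
        return s.matches("a.c");
    }

    public static void main(String[] args) {
        System.out.println(aAnyC("abc"));   // true
        System.out.println(aAnyC("a-c"));   // true: . is not limited to letters
        System.out.println(aAnyC("ac"));    // false: . must consume exactly one character
        System.out.println(aAnyC("abbc"));  // false: one . cannot cover two characters
    }
}
```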

Next are +*? which match repetitions of the previous token. A token is a single character or a group of characters bounded by parentheses. + matches one or more. * matches zero or more. ? matches zero or one. For example, "abbbbd".matches("a*b*c*d"), "baccc".matches("b+ac+") and "ac".matches("a?b?c?") all return true.
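
To compare the three symbols side by side, the following sketch applies each one to the same token, b, between a fixed "a" and "c". The class and method names are invented for the example.

```java
class QuantifierDemo {
    static boolean oneOrMore(String s)  { return s.matches("ab+c"); }  // at least one b
    static boolean zeroOrMore(String s) { return s.matches("ab*c"); }  // any number of b's
    static boolean zeroOrOne(String s)  { return s.matches("ab?c"); }  // at most one b

    public static void main(String[] args) {
        for (String s : new String[] {"ac", "abc", "abbc"}) {
            System.out.println(s + ": " + oneOrMore(s) + " " + zeroOrMore(s) + " " + zeroOrOne(s));
        }
        // Prints:
        // ac: false true true
        // abc: true true true
        // abbc: true true false
    }
}
```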

As mentioned, () can be used to group multiple characters into a single token. For example, "abacad".matches("(a.)+") and "acccaacacccc".matches("(ac*)+") both return true.
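
The effect of grouping is easiest to see by comparing a pattern with and without the parentheses. This is a minimal sketch with made-up method names:

```java
class GroupDemo {
    // + applies to the whole group: "ab" repeated one or more times.
    static boolean repeatedAb(String s) { return s.matches("(ab)+"); }

    // + applies only to b: one "a" followed by one or more b's.
    static boolean aThenBs(String s) { return s.matches("ab+"); }

    public static void main(String[] args) {
        System.out.println(repeatedAb("ababab")); // true
        System.out.println(aThenBs("ababab"));    // false
        System.out.println(repeatedAb("abbb"));   // false
        System.out.println(aThenBs("abbb"));      // true
    }
}
```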

The symbol | will match either the portion to its left or the portion to its right. For example, "bdbdacbd".matches("(bd|ac)+") returns true. The symbol can also be chained, so (ab|cd|ef|gh) will match ab, cd, ef, or gh.
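
One detail worth trying out is that | splits everything around it unless parentheses limit its reach. The sketch below, with invented method names, shows the difference:

```java
class AlternationDemo {
    // The group limits | to "cat" or "dog"; "fish" must follow either one.
    static boolean grouped(String s) { return s.matches("(cat|dog)fish"); }

    // Without the group, the alternatives are the whole "cat" or the whole "dogfish".
    static boolean ungrouped(String s) { return s.matches("cat|dogfish"); }

    public static void main(String[] args) {
        System.out.println(grouped("catfish"));   // true
        System.out.println(grouped("dogfish"));   // true
        System.out.println(ungrouped("catfish")); // false
        System.out.println(ungrouped("dogfish")); // true
        System.out.println(ungrouped("cat"));     // true
    }
}
```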

Use \ if you want to escape special regex symbols (i.e. you want Java to interpret the symbol literally, not as a special symbol). Because \ is also the escape character in Java string literals, each regex \ must be written as \\ in Java source code. For example, "+?|\\".matches("\\+\\?\\|\\\\") and "****a".matches("\\**a") both return true.
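
A common place where escaping matters is the dot. The sketch below contrasts an escaped and an unescaped dot; the helper splitOnDots is a made-up name for this example.

```java
import java.util.Arrays;

class EscapeDemo {
    // Hypothetical helper: splits a dotted name on literal dots.
    // The Java literal "\\." is the two-character regex \. (a literal dot).
    static String[] splitOnDots(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        System.out.println("3x14".matches("3.14"));    // true: unescaped . matches any character
        System.out.println("3x14".matches("3\\.14"));  // false: \. matches only a literal dot
        System.out.println("3.14".matches("3\\.14"));  // true
        System.out.println(Arrays.toString(splitOnDots("java.util.regex"))); // [java, util, regex]
    }
}
```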

There are two sets of symbols that need special explanation.

First are {}, which generalize the symbols +*?. There are three ways to use them. {n} matches exactly n of the previous token. {n,} matches n or more of the previous token. {n,m} matches anywhere from n to m (inclusive) of the previous token. For example, + is equivalent to {1,}, * is equivalent to {0,}, and ? is equivalent to {0,1}.
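
The three forms can be tried out with a sketch like the following, where the method names are invented for illustration:

```java
class BraceDemo {
    static boolean exactlyTwoAb(String s) { return s.matches("(ab){2}"); } // {n}
    static boolean atLeastTwoAs(String s) { return s.matches("a{2,}"); }   // {n,}
    static boolean twoToThreeAs(String s) { return s.matches("a{2,3}"); }  // {n,m}

    public static void main(String[] args) {
        System.out.println(exactlyTwoAb("abab"));   // true
        System.out.println(exactlyTwoAb("ababab")); // false: three repetitions, not two
        System.out.println(atLeastTwoAs("a"));      // false
        System.out.println(atLeastTwoAs("aaaaa"));  // true
        System.out.println(twoToThreeAs("aaa"));    // true
        System.out.println(twoToThreeAs("aaaa"));   // false: four is above the upper bound
    }
}
```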

Next are [], which extend the symbol |. For example, [abc] is equivalent to (a|b|c). Also, putting a - between two characters will match any character between them (inclusive) on the ASCII table. So [A-Za-z] will match any ASCII letter, uppercase or lowercase, and [A-M] will match any uppercase letter in the first half of the alphabet. Putting a ^ at the beginning will match any single character not listed between the brackets. For example, [^A-Za-z0-9_] will match any character that is not a letter, a digit, or an underscore.
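
As a small sketch of brackets, ranges, and negation working together, the example below defines two made-up helpers. The "identifier" rule here is a simplified one chosen for illustration, not Java's actual identifier rules.

```java
class CharClassDemo {
    // Hypothetical helper: a letter or underscore, then any number of
    // letters, digits, or underscores.
    static boolean looksLikeIdentifier(String s) {
        return s.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    // Hypothetical helper: true when s is non-empty and contains no digits.
    static boolean hasNoDigits(String s) {
        return s.matches("[^0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("count_2")); // true
        System.out.println(looksLikeIdentifier("2count"));  // false: starts with a digit
        System.out.println(hasNoDigits("hello!"));          // true
        System.out.println(hasNoDigits("h3llo"));           // false
    }
}
```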

To put it all together, the following Java statement returns true only if str is a valid email as described in the problem.

str.matches("[A-Za-z0-9_]+@[A-Za-z0-9_]+\\.(com|net|org)")
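
To see which strings this pattern accepts, it can be wrapped in a method and tried on a few inputs. This is only a sketch: the class name, the method name isValidEmail, and the sample addresses are made up, and the pattern is the tutorial's own, which is much stricter than real-world email rules.

```java
class EmailDemo {
    // Hypothetical wrapper around the tutorial's pattern:
    // name characters, @, name characters, a literal dot, then com, net, or org.
    static boolean isValidEmail(String str) {
        return str.matches("[A-Za-z0-9_]+@[A-Za-z0-9_]+\\.(com|net|org)");
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("alice_1@example.com")); // true
        System.out.println(isValidEmail("x@y.org"));             // true
        System.out.println(isValidEmail("bob@site.edu"));        // false: edu is not in the list
        System.out.println(isValidEmail("a.b@site.com"));        // false: . is not allowed before @
        System.out.println(isValidEmail("@site.com"));           // false: + needs at least one character
    }
}
```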

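Here are some examples that use the other regex methods. To compile as written, they assume import java.util.Arrays; and import static java.lang.System.out; at the top of the file: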

out.println("Output: " + Arrays.toString("-12, true  ,90,3, hello, 0.6".split(" *, *")));
// The call to split returns an array of strings that contains the comma-separated data of the 
// original string, regardless of the funky spacing around the commas.
Output: [-12, true, 90, 3, hello, 0.6]

out.println("Output: " + "There are 4 cats, 102 dogs, and 90 birds.".replaceAll("[0-9]+", "a number of"));
// Replaces all integers in the string with "a number of"
Output: There are a number of cats, a number of dogs, and a number of birds.

out.println("Output: " + "There are 4 cats, 102 dogs, and 90 birds.".replaceFirst("[0-9]+", "a number of"));
// Replaces just the first integer in the string with "a number of"
Output: There are a number of cats, 102 dogs, and 90 birds.
