Java Regex - Possessive Quantifiers

[Last Updated: Jan 23, 2016]

As we saw in previous tutorials that the Greedy quantifier causes the regex engine to eat the input string as much as possible, even before attempting for a match. To have a match, the engine may back off to try different permutations. Possessive quantifiers are a way to prevent the regex engine from trying all permutations. This is primarily useful for performance reasons.

We can change a greedy quantifier into possessive by appending + at the end.

Let's explore details along with examples

Regex Construct/Terms Examples
The Plus Possessive Quantifier:

It matches one or more times possessively.

How a possessive quantifier works?

The possessive quantifier's first step is same as of greedy quantifiers. They possibly eat entire input string before doing a match. After eating the input string, they match the 'entire' regex expression against the eaten input. If there's a match then good!. If there's no match then that's it, no more steps. The engine will declare final no match found output. They don't back off like greedy quantifiers do.

Consider the pattern ".++x" (matches any character one or more times followed by x) and the input string "abx". The first step, the regex engine will do is exactly same as greedy quantifier. The part ".++" (notice no x here) will allow regex engine to eat the entire input string before doing a match. Then next step, engine will try to match the entire pattern ".++x" against the eaten input "abx". It's not a match because the patten is expecting "abxx". Also there's no backing off as mentioned before. So the match will failed.

Now consider the pattern "x++" (matches one or more Xs) and the input string "xxbxxx". The greedy part "x+" doesn't allow to eat the entire input because the presence of "b" doesn't match the greedy pattern. In that case, the engine will trying to match the longest possible parts against the entire pattern at once. The first "xx" is a match. "b" is not a match. The last part "xxx" is a match too.

As we saw in above two examples the possessive quantifier is same as greedy quantifier except for they don't do any backing off

.find();//matches: 'abc' at 0-3 //'abc' /* No match because there's no backing off after reading/eating the entire input against '.++'*/ Pattern.compile(".++x")
.find();//no matches /* Making it greedy*/ Pattern.compile(".+x")
.find();//matches: 'abcx' at 0-4 //'abcx' /* It is just like greedy 'x+'. The longest matches are found in this case, which is different than the reluctant x+?*/ Pattern.compile("x++")
.find();//matches: 'xx' at 0-2, 'xxx' at 3-6 //'xxbxxx' /* Making it reluctant*/ Pattern.compile("x+?")
.find();//matches: 'x' at 0-1, 'x' at 1-2, 'x' at 3-4, 'x' at 4-5,
//'x' at 5-6 //'xxbxxx' Pattern.compile("A.++")
.find();//matches: 'AEG' at 0-3 //'AEG' Pattern.compile("((cat)|(dog))++")
.matcher("cat dog catdog dogcat")
.find();//matches: 'cat' at 0-3, 'dog' at 4-7, 'catdog' at 8-14,
//'dogcat' at 15-21 //'cat dog catdog dogcat' /* Making it reluctant*/ Pattern.compile("((cat)|(dog))+?")
.matcher("cat dog catdog dogcat")
.find();//matches: 'cat' at 0-3, 'dog' at 4-7, 'cat' at 8-11,
//'dog' at 11-14, 'dog' at 15-18, 'cat' at 18-21 //'cat dog catdog dogcat'
The Asterisk Possessive Quantifier:

It matches zero or more times possessively.

It works same as ++ except that they supports zero-length matches too.

.find();//matches: '' at 0-0 /* No backing off that's why no match*/ Pattern.compile(".*+x")
.find();//no matches /* This is same as greedy x**/ Pattern.compile("x*+")
.find();//matches: 'xx' at 0-2, '' at 2-2, 'xx' at 3-5, '' at 5-5 //'xxbxx' /* Making it reluctant. No matches because the smallest match of 'xx' is empty string.*/ Pattern.compile("x*?")
.find();//matches: '' at 0-0, '' at 1-1, '' at 2-2, '' at 3-3,
//'' at 4-4, '' at 5-5 /* No matches becuse, the part '.*+' will eat the entire input and there's no backing off.*/ Pattern.compile("a.*+z")
.find();//no matches Pattern.compile("a*+z")
.find();//matches: 'z' at 2-3, 'az' at 6-8 //'abztstaz' Pattern.compile("a*+z")
.find();//no matches /* The part 'z?' can match zero number of times*/ Pattern.compile("a*+z?")
.find();//matches: 'aaaaaaa' at 0-7, '' at 7-7 //'aaaaaaa' /* This pattern matches the single quoted strings. We can use greedy quantifier by removing '+', which will do the same thing. But making it possessive we are telling engine not to backoff and fail fast. Pattern of this kind, after all, won't accept any single quote inside the string, notice [^']. Also the primary purpose of the possessive quantifiers to get some performance benefits*/ Pattern.compile("'[^']*+'")
.matcher("'Reason and Rationality'")
.find();//matches: ''Reason and Rationality'' at 0-24 //''Reason and Rationality''
The Question Mark Possessive Quantifier:

matches zero or one, possessively.

.find();//matches: '' at 0-0 Pattern.compile(".?+")
.find();//matches: 'a' at 0-1, 'b' at 1-2, 'c' at 2-3, '' at 3-3 //'abc' Pattern.compile(".?+x")
.find();//matches: 'yx' at 3-5 //'yyyyx' /* x doesn't match because '.?+' will eat the entire string which is only 'x' here and there's no backing off for possessive '+'*/ Pattern.compile(".?+x")
.find();//no matches Pattern.compile("x?+")
.find();//matches: 'x' at 0-1, '' at 1-1, '' at 2-2, '' at 3-3 //'xyz' Pattern.compile("x?+z")
.find();//matches: 'z' at 2-3, 'z' at 4-5, 'z' at 5-6, 'xz' at 8-10 //'xyzyzzxyxzy'
The Curly Brackets Possessive Quantifier.
Here are the details on the curly bracket constructs.
.find();//matches: 'abcd' at 0-4 //'abcd' Pattern.compile("a.{0,}+z")
.find();//no matches Pattern.compile(".*+abc")
.find();//no matches

Example Project

Dependencies and Technologies Used:

  • JDK 1.8
  • Maven 3.0.4

Regex Possessive Quantifier Select All Download
  • possessive-quantifier
    • src
      • main
        • java
          • com
            • logicbig
              • example

    See Also