regex: how can Matcher.matches return true, but Matcher.find return false?

807580, Jan 8 2010, edited Jan 12 2010
Consider the class below:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexBugDemo {
	
	private static final Pattern numberPattern;
	static {
			// The BigDecimal grammar below was adapted from the BigDecimal(String) constructor.
			// See also p. 46 of http://www.javaregex.com/RegexRecipesV1.pdf for a regex that matches Java floating point literals; uses similar techniques as below.
		String Sign = "[+-]";
		String Sign_opt = "(?:" + Sign + ")" + "?";	// Note: the "(?:" causes this to be a non-capturing group
		String Digits = "\\p{Digit}+";
		String IntegerPart = Digits;
		String FractionPart = Digits;
		String FractionPart_opt = "(?:" + FractionPart + ")" + "?";
		String Significand = "(?:" + IntegerPart + "\\." + FractionPart_opt + ")|(?:" + "\\." + FractionPart + ")|(?:" + IntegerPart + ")";
		String ExponentIndicator = "[eE]";
		String SignedInteger = Sign_opt + Digits;
		String Exponent = ExponentIndicator + SignedInteger;
		String Exponent_opt = "(?:" +Exponent + ")" + "?";
		numberPattern = Pattern.compile(Sign_opt + Significand + Exponent_opt);
	}
//	private static final Pattern numberPattern = Pattern.compile("\\p{Digit}+");
	
	public static void main(String[] args) throws Exception {
		String s = "0";
//		String s = "01";
		
		Matcher m1 = numberPattern.matcher(s);
		System.out.println("m1.matches() = " + m1.matches());
		
		Matcher m2 = numberPattern.matcher(s);
		if (m2.find()) {
			int i0 = m2.start();
			int i1 = m2.end();
			System.out.println("m2 found this substring: \"" + s.substring(i0, i1) + "\"");
		}
		else {
			System.out.println("m2 NOT find");
		}
		System.exit(0);
	}
	
}
Look at the main method: it constructs two Matchers from numberPattern for the String "0" (a single zero), then reports the result of Matcher.matches and of Matcher.find. When I ran this code on my box just now, I got:
m1.matches() = true
m2 NOT find
How the heck can matches work and find NOT work? matches has to match the entire input sequence, whereas find may match any subsequence of it, so whenever matches succeeds, find should succeed too! I am really pulling my hair out over this one. Is it a bug in the JDK regex engine? I searched the bug database and nothing seemed to turn up...

There are at least 2 things that you can do to get Matcher.find to work.

First, you can change s to have more than one digit; for example, using the (originally commented-out) line
		String s = "01";
yields
m1.matches() = true
m2 found this substring: "01"
Second, I found that this simpler regex for numberPattern
	private static final Pattern numberPattern = Pattern.compile("\\p{Digit}+");
yields
m1.matches() = true
m2 found this substring: "0"
So, the problem seems to be triggered by a short source String and a complicated regex. But I need the complicated regex for my actual application, and cannot see why it is a problem.
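
One thing worth checking in the complicated regex, independent of the find result: Significand is built from three alternatives joined by "|", but it is concatenated into the final pattern without an enclosing group. Since "|" has the lowest precedence in a regex, the optional sign then belongs only to the first alternative and the optional exponent only to the last. The sketch below compares the pattern as composed above with a variant that wraps Significand in a non-capturing group. It assumes the intent was "sign, then any significand form, then exponent"; the class and field names are hypothetical, and this is not claimed to explain the find result.

```java
import java.util.regex.Pattern;

public class GroupedNumberPattern {

    // Same building blocks as in RegexBugDemo above.
    public static final String SIGN_OPT = "(?:[+-])?";
    public static final String DIGITS = "\\p{Digit}+";
    public static final String SIGNIFICAND =
            "(?:" + DIGITS + "\\." + "(?:" + DIGITS + ")?" + ")"
            + "|(?:" + "\\." + DIGITS + ")"
            + "|(?:" + DIGITS + ")";
    public static final String EXPONENT_OPT = "(?:[eE]" + SIGN_OPT + DIGITS + ")?";

    // As composed in the post: the '|' operators end up at the top level of the pattern.
    public static final Pattern UNGROUPED =
            Pattern.compile(SIGN_OPT + SIGNIFICAND + EXPONENT_OPT);

    // Assumption: sign and exponent were meant to apply to every significand form.
    public static final Pattern GROUPED =
            Pattern.compile(SIGN_OPT + "(?:" + SIGNIFICAND + ")" + EXPONENT_OPT);

    public static void main(String[] args) {
        // Printing the composed regex makes the top-level alternation visible.
        System.out.println(UNGROUPED.pattern());
        System.out.println(GROUPED.pattern());
        for (String s : new String[] {"0", "-5", "1.5e10", "+.5E-3"}) {
            System.out.println("\"" + s + "\": ungrouped matches = "
                    + UNGROUPED.matcher(s).matches()
                    + ", grouped matches = " + GROUPED.matcher(s).matches());
        }
    }
}
```

With the ungrouped form, an input such as "-5" cannot match in full, because the sign is only permitted in front of the alternative that requires a decimal point. The grouped form accepts it.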

Here is a version of main which has a lot more diagnostic printouts:
	public static void main(String[] args) throws Exception {
		String s = "0";
		Matcher m1 = numberPattern.matcher(s);
		System.out.println("m1.regionStart() = " + m1.regionStart());
		System.out.println("m1.regionEnd() = " + m1.regionEnd());
		System.out.println("m1.matches() = " + m1.matches());
		System.out.println("m1.hitEnd() = " + m1.hitEnd());
		
		m1.reset();
		System.out.println("m1.regionStart() = " + m1.regionStart());
		System.out.println("m1.regionEnd() = " + m1.regionEnd());
		System.out.println("m1.lookingAt() = " + m1.lookingAt());
		System.out.println("m1.hitEnd() = " + m1.hitEnd());
		
		Matcher m2 = numberPattern.matcher(s);
		System.out.println("m2.regionStart() = " + m2.regionStart());
		System.out.println("m2.regionEnd() = " + m2.regionEnd());
		if (m2.find()) {
			int i0 = m2.start();
			int i1 = m2.end();
			System.out.println("m2 found this substring: \"" + s.substring(i0, i1) + "\"");
		}
		else {
			System.out.println("m2 NOT find");
			System.out.println("m2.hitEnd() = " + m2.hitEnd());
		}
		System.out.println("m2.regionStart() = " + m2.regionStart());
		System.out.println("m2.regionEnd() = " + m2.regionEnd());
		System.out.println("m1 == m2: " + (m1 == m2));
		System.out.println("m1.equals(m2): " + m1.equals(m2));
		System.exit(0);
	}
Unfortunately, the output gave me no insights into what is wrong.
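
A possible next diagnostic step is to run find against each of the three top-level alternatives of the composed regex separately, to see which one (if any) behaves unexpectedly on the short input. The following is only a sketch: the class name, the helper firstMatch, and the descriptive labels are hypothetical, and the alternatives are written out by hand to mirror how the pattern above is assembled.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternativeProbe {

    /** Returns the first substring that regex finds in input, or null if find() fails. */
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String digits = "\\p{Digit}+";
        String signOpt = "(?:[+-])?";

        // The three alternatives as they appear once the pattern is flattened.
        Map<String, String> alternatives = new LinkedHashMap<>();
        alternatives.put("sign? integer '.' fraction?",
                signOpt + "(?:" + digits + "\\.(?:" + digits + ")?)");
        alternatives.put("'.' fraction",
                "(?:\\." + digits + ")");
        alternatives.put("integer exponent?",
                "(?:" + digits + ")(?:[eE]" + signOpt + digits + ")?");

        for (String input : new String[] {"0", "01"}) {
            for (Map.Entry<String, String> e : alternatives.entrySet()) {
                System.out.println("input \"" + input + "\", " + e.getKey()
                        + ": " + firstMatch(e.getValue(), input));
            }
        }
    }
}
```

If only the full alternation misbehaves while every alternative behaves as expected in isolation, that points at how the alternation as a whole is handled rather than at any single branch.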

I looked at the source code of Matcher: find ends up calling
boolean search(int from)
and it executes with NOANCHOR. In contrast, matches ends up calling
boolean match(int from, int anchor)
and executes almost the exact same code but with ENDANCHOR. Unfortunately, this too makes sense to me, and gives me no insight into solving my problem.
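
For reference, the documented difference between the three entry points can be seen with the simpler digits-only pattern from above. This is a minimal sketch using only the public Matcher API; the class name and the describe helper are hypothetical, and it says nothing about the internal NOANCHOR/ENDANCHOR code paths.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchModes {

    /** Summarizes how matches, lookingAt and find each treat the same input. */
    public static String describe(Pattern p, String input) {
        Matcher m = p.matcher(input);
        boolean matches = m.matches();      // the entire region must match
        m.reset();
        boolean lookingAt = m.lookingAt();  // must match at region start, need not reach the end
        m.reset();
        boolean find = m.find();            // may match any subsequence
        String found = find ? input.substring(m.start(), m.end()) : null;
        return "matches=" + matches + " lookingAt=" + lookingAt
                + " find=" + find + " found=" + found;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\p{Digit}+");
        for (String s : new String[] {"0", "12ab", "ab12"}) {
            System.out.println("\"" + s + "\": " + describe(digits, s));
        }
    }
}
```

By these definitions, any input for which matches returns true should also satisfy lookingAt and find, which is why the output reported above is so surprising.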
Locked on Feb 9 2010
Added on Jan 8 2010