regular expression – why does the regex “.*” matched against “abcd 1234 abcd” give two matches?
Why do I get two matches when using the regular expression .* on the string abcd 1234 abcd? See regex101.com/r/rV8jfz/1.
From the explanation given by regex101, I can see that the second match happens at position 14-14 and the matched value is null (an empty string). But why is a second match made at all? Is there a way I can avoid it?
I understand .* means zero or more of any character, so it’s trying to find zero occurrences. But I don’t understand why this null match is required.
The problem is that when this is used in any language (e.g. Java) with while(matcher.find()) { ... }, the loop runs twice, while I would want it to run only once.
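To make the two loop iterations concrete, here is a minimal sketch that records where each match starts and ends. The class name MatchSpans and the method spans are made up for illustration; only Pattern, Matcher, find(), start(), end() and group() come from java.util.regex. After the first match consumes the whole string, the matcher sits at index 14 (the end of the input), where .* can still match zero characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class MatchSpans {

    // Returns one entry per match, formatted as "start-end:matchedText".
    static List<String> spans(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints:
        // 0-14:abcd 1234 abcd
        // 14-14:
        for (String span : spans(".*", "abcd 1234 abcd")) {
            System.out.println(span);
        }
    }
}
```

The second entry has the same start and end index, which is the zero-length match that regex101 reports at position 14-14.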
I know this may not be a real-world matching situation, but I see it as a good case to study in order to understand and explore regex.
Edit – following @terdon's response.
I would like to keep the /g option in regex101; I am aware of it. I would like to know the total number of possible matches.
regex101.com/r/EvOoAr/1 -> the pattern abcd against the string abcd 1234 abcd gives two matches, and I want to know this information.
The problem I find is when dealing with this in a language like Java –
Ref – onecompiler.com/java/3xnax494k
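import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "abcd 1234 abcd";
        Pattern p = Pattern.compile(".*");
        Matcher matcher = p.matcher(str);
        int matchCount = 0;
        // find() returns true once per match, including zero-length matches
        while (matcher.find()) {
            matchCount++;
            System.out.println("match number: " + matchCount);
            System.out.println("matcher.groupCount(): " + matcher.groupCount());
            System.out.println("matcher.group(): " + matcher.group());
        }
    }
}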
The output is –
match number: 1
matcher.groupCount(): 0 //you can ignore this
matcher.group(): abcd 1234 abcd
match number: 2
matcher.groupCount(): 0
matcher.group(): //this is my concern. The program has to deal with this “nothing” match somehow.
It would be nice for me as a programmer if find() did not match against “nothing”. As it stands, I have to add extra code in the loop to catch this “nothing” case.
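The extra check described above could look something like the following sketch. It assumes that zero-length matches are never wanted, and the class name NonEmptyMatches and method findNonEmpty are hypothetical. It relies only on the standard Matcher.start() and Matcher.end() methods: a match is empty exactly when the two are equal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class NonEmptyMatches {

    // Collects every match of regex in input, skipping zero-length matches.
    static List<String> findNonEmpty(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            // A zero-length match starts and ends at the same index.
            if (matcher.start() == matcher.end()) {
                continue;
            }
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(findNonEmpty(".*", "abcd 1234 abcd"));     // [abcd 1234 abcd]
        System.out.println(findNonEmpty("[0-9]*", "abcd 1234 abcd")); // [1234]
    }
}
```

This keeps the pattern unchanged and filters in the loop. Whether that is preferable to changing the pattern itself depends on whether empty matches ever carry meaning for the caller.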
This null problem (in code) gets even worse with this regex case – regex101.com/r/5HuJ0R/1 -> the pattern [0-9]* against abcd 1234 abcd gives 12 matches.
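The counts can be reproduced in Java with a small sketch like the one below. The class MatchCounter and method countMatches are hypothetical names. For comparison it also counts matches for the + quantifier, which requires at least one character and therefore cannot produce a zero-length match. Swapping * for + changes what the pattern means, so treat it as an option only when empty matches are genuinely unwanted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class MatchCounter {

    // Counts how many times find() succeeds, zero-length matches included.
    static int countMatches(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "abcd 1234 abcd";
        System.out.println(countMatches("[0-9]*", input)); // 12: one "1234" plus 11 empty matches
        System.out.println(countMatches("[0-9]+", input)); // 1: only "1234"
        System.out.println(countMatches(".*", input));     // 2: whole string plus one empty match
        System.out.println(countMatches(".+", input));     // 1: whole string only
    }
}
```

With [0-9]*, the empty matches occur at indices 0 to 4 (before the digits) and at indices 9 to 14 (after them, including the end of the input), which accounts for the 11 zero-length results.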