Regular expression: why does the regex “.*” give two matches against “abcd 1234 abcd”?

Why do I get two matches when using the regular expression .* on the string abcd 1234 abcd? See regex101.com/r/rV8jfz/1.

From regex101's explanation, I can see that the second match occurs at position 14-14 and the matched value is null (an empty string). But why is there a second match at all? Is there a way to avoid it?

I understand that .* means zero or more of any character, so it can also succeed by matching zero characters. But I don't understand why this null (empty) match is needed.
This matters in practice: in a language such as Java, a while(matcher.find()) { ... } loop runs twice, whereas I want it to run only once.
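
To see exactly where the two matches start and end, a small Java sketch can print start() and end() for every successful find(), mirroring the 14-14 position regex101 reports. The class name MatchPositions and the helper positions() are made up for illustration. After .* consumes all 14 characters, the engine tries again at position 14 (the end of the string), where .* succeeds by matching zero characters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: list every match as "start-end: 'text'", similar to regex101's match info.
// MatchPositions and positions() are hypothetical names, not part of any library.
class MatchPositions {

    static List<String> positions(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // One entry per successful find(), including zero-length matches
        while (m.find()) {
            out.add(m.start() + "-" + m.end() + ": '" + m.group() + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : positions(".*", "abcd 1234 abcd")) {
            System.out.println(line);
        }
        // Prints:
        // 0-14: 'abcd 1234 abcd'
        // 14-14: ''
    }
}
```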

I know this is unlikely to be a real-world matching situation, but as a way to understand and explore regex, I see it as a good case to study.

Edit, following @terdon's response:
I would like to keep the /g option in regex101; I am aware of what it does. I want to know the total number of possible matches.
For example, at regex101.com/r/EvOoAr/1 the pattern abcd against the string abcd 1234 abcd gives two matches, and I want to know that information.
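
To get that total in Java, one option is to count the stream returned by Matcher.results(), which applies find() repeatedly. This is a sketch assuming Java 9 or later (results() does not exist in Java 8); the class CountMatches and method count() are hypothetical names. Note that the empty match produced by .* is counted too:

```java
import java.util.regex.Pattern;

// Sketch: count all matches of a pattern, as a global (/g) search would.
// Requires Java 9+ for Matcher.results(). CountMatches/count() are hypothetical names.
class CountMatches {

    static long count(String regex, String input) {
        return Pattern.compile(regex).matcher(input).results().count();
    }

    public static void main(String[] args) {
        System.out.println(count("abcd", "abcd 1234 abcd")); // 2
        System.out.println(count(".*", "abcd 1234 abcd"));   // 2 (the second match is empty)
    }
}
```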

The problem I run into is when handling this in a language like Java.
Ref: onecompiler.com/java/3xnax494k

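  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class Main {
    public static void main(String[] args) {
      String str = "abcd 1234 abcd";
      Pattern p = Pattern.compile(".*");
      Matcher matcher = p.matcher(str);
      int matchCount = 0;
      // find() is called until it returns false; for this input it succeeds twice
      while (matcher.find()) {
        matchCount++;
        System.out.println("match number: " + matchCount);
        System.out.println("matcher.groupCount(): " + matcher.groupCount());
        System.out.println("matcher.group(): " + matcher.group());
      }
    }
  }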

The output is:

match number: 1
matcher.groupCount(): 0  //you can ignore this
matcher.group(): abcd 1234 abcd
match number: 2
matcher.groupCount(): 0
matcher.group():  //this is my concern: the program has to handle this empty match somehow.

As a programmer, it would be nice if find() did not match against “nothing”. As it is, I have to add extra code in the loop to catch this “nothing” case.
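
A minimal sketch of that extra check, assuming the goal is simply to ignore zero-length matches: compare start() and end() and skip the iteration when they are equal. The class SkipEmptyMatches and method countNonEmpty() are hypothetical names used only for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: the same find() loop as above, but zero-length matches are skipped.
// SkipEmptyMatches/countNonEmpty() are hypothetical names.
class SkipEmptyMatches {

    static int countNonEmpty(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        int matchCount = 0;
        while (matcher.find()) {
            if (matcher.start() == matcher.end()) {
                continue; // empty match, such as the one at 14-14; ignore it
            }
            matchCount++;
            System.out.println("match number: " + matchCount + " -> " + matcher.group());
        }
        return matchCount;
    }

    public static void main(String[] args) {
        countNonEmpty(".*", "abcd 1234 abcd");
    }
}
```

For ".*" against "abcd 1234 abcd" this prints a single line, match number: 1 -> abcd 1234 abcd.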

This null problem (in code) gets even worse with this case: at regex101.com/r/5HuJ0R/1, [0-9]* against abcd 1234 abcd gives 12 matches.
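
To see where those 12 matches come from, a Java sketch (Java 9+ for Matcher.results(); DigitMatches and matches() are hypothetical names) can collect all match results and count the zero-length ones. Only one match, 1234 at 5-9, is non-empty; the other 11 are empty matches at positions 0 to 4 and 9 to 14, that is, everywhere the 1234 match did not consume input, including the end of the string:

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Sketch: break the matches of [0-9]* down into empty and non-empty ones.
// Requires Java 9+ for Matcher.results(). DigitMatches/matches() are hypothetical names.
class DigitMatches {

    static List<MatchResult> matches(String regex, String input) {
        // results() yields an immutable MatchResult snapshot per successful find()
        return Pattern.compile(regex).matcher(input).results().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MatchResult> all = matches("[0-9]*", "abcd 1234 abcd");
        long empty = all.stream().filter(r -> r.start() == r.end()).count();
        System.out.println("total: " + all.size()); // 12
        System.out.println("empty: " + empty);      // 11
        for (MatchResult r : all) {
            System.out.println(r.start() + "-" + r.end() + ": '" + r.group() + "'");
        }
    }
}
```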
