regex – grep PCRE still greedy
A greedy match would include everything from the first start to the last end, thus:
$ grep -Pzo '(?s)start.*end' file.txt
start
word1
word1
word1
word1
end
word2
word2
word2
start
word3
word3
word3
end
What you are actually seeing is two separate non-greedy matches, each output on its own “line” per the -o option, except that with -z, “lines” are terminated by the null character instead of the newline character:
$ grep -Pzo '(?s)start.*?end' file.txt
start
word1
word1
word1
word1
endstart
word3
word3
word3
end
Since we can’t see the null byte here, it’s clearer if you add -b to indicate the byte offsets of the two matches within the “line”:
$ grep -Pzo -b '(?s)start.*?end' file.txt
0:start
word1
word1
word1
word1
end52:start
word3
word3
word3
end
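Another way to confirm that there are two separate matches is to count the null terminators, or to make them visible with cat -v, which prints a null byte as ^@. The following is only a sketch: it assumes GNU grep built with PCRE support (-P), and the temporary file and the variable names tmp and match_count are made up for illustration.

```shell
# Recreate the sample input in a temporary file (hypothetical path from mktemp).
tmp=$(mktemp)
printf '%s\n' start word1 word1 word1 word1 end \
              word2 word2 word2 \
              start word3 word3 word3 end > "$tmp"

# With -z and -o, each match is terminated by a NUL byte.
# Keep only the NUL bytes and count them: one per match.
match_count=$(grep -Pzo '(?s)start.*?end' "$tmp" | tr -cd '\0' | wc -c)
echo "matches: $match_count"

# Show the NUL terminators as ^@ so the match boundaries are visible.
grep -Pzo '(?s)start.*?end' "$tmp" | cat -v

rm -f "$tmp"
```

With the non-greedy pattern the count should be two; swapping in the greedy '(?s)start.*end' should bring it down to one.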
Since the -o outputs are null-separated, you could pipe the result through head -z to get just the first match:
$ grep -Pzo '(?s)start.*?end' file.txt | head -z -n 1
start
word1
word1
word1
word1
end
Alternatively, you could use perl itself:
$ perl -0777 -nE 'say for /(start.*?end)/s' file.txt
which prints only one match in spite of the for loop, since the g flag is omitted.
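To see the effect of the g flag, the two forms can be compared side by side. In list context, a match without g returns only the capture groups of the first match, while a match with g returns every match. This is a minimal sketch; the temporary file and the variable names sample, first_only and all_matches are assumptions made for illustration.

```shell
# Recreate the sample input in a temporary file (hypothetical path from mktemp).
sample=$(mktemp)
printf '%s\n' start word1 word1 word1 word1 end \
              word2 word2 word2 \
              start word3 word3 word3 end > "$sample"

# Without g: the loop body runs once, for the first match only.
first_only=$(perl -0777 -nE 'say for /(start.*?end)/s' "$sample")

# With g: the loop body runs once per match.
all_matches=$(perl -0777 -nE 'say for /(start.*?end)/sg' "$sample")

# Count the matches in each result by counting their "start" lines.
printf '%s\n' "$first_only"  | grep -c '^start$'
printf '%s\n' "$all_matches" | grep -c '^start$'

rm -f "$sample"
```

The first count should be 1 and the second 2, so leaving out g plays the same role as head -z -n 1 in the grep pipeline.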
