regex – grep PCRE still greedy

A greedy match would include everything from the first start to the last end, thus:

$ grep -Pzo '(?s)start.*end' file.txt
start
word1
word1
word1
word1
end
word2
word2
word2
start
word3
word3
word3
end

What you are actually seeing is two separate non-greedy matches, each output on its own “line” per the -o option. With -z, however, “lines” are delimited by the null character rather than the newline character, so the two matches run together on screen:

$ grep -Pzo '(?s)start.*?end' file.txt
start
word1
word1
word1
word1
endstart
word3
word3
word3
end
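
One way to check that this output really contains two matches, rather than a single greedy one, is to count the null terminators that -o emits. The following is a minimal sketch, assuming GNU grep built with PCRE support; the sample input is a reconstruction of file.txt inferred from the output shown here, and the variable names are only illustrative.

```shell
# Assumed reconstruction of file.txt, inferred from the output shown in this answer.
tmp=$(mktemp)
printf '%s\n' start word1 word1 word1 word1 end \
              word2 word2 word2 \
              start word3 word3 word3 end > "$tmp"

# With -z, every match printed by -o is terminated by a NUL byte,
# so counting NUL bytes counts matches.
greedy_count=$(( $(grep -Pzo '(?s)start.*end' "$tmp" | tr -cd '\0' | wc -c) ))
lazy_count=$(( $(grep -Pzo '(?s)start.*?end' "$tmp" | tr -cd '\0' | wc -c) ))

echo "greedy matches: $greedy_count"
echo "non-greedy matches: $lazy_count"
rm -f "$tmp"
```

With this sample input, the greedy pattern should report one match and the non-greedy pattern two.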

Since we can’t see the null byte here, it’s clearer if you add -b to indicate the byte offsets of the two matches within the “line”:

$ grep -Pzo -b '(?s)start.*?end' file.txt
0:start
word1
word1
word1
word1
end52:start
word3
word3
word3
end
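
The null separators can also be made visible directly. As a sketch, assuming GNU grep with PCRE support and an xargs that accepts -0, each null-terminated match can be handed to printf as its own argument and followed by a marker line. The sample input is a reconstruction of file.txt inferred from the output shown here, and the "--" marker is an arbitrary choice.

```shell
# Assumed reconstruction of file.txt, inferred from the output shown in this answer.
tmp=$(mktemp)
printf '%s\n' start word1 word1 word1 word1 end \
              word2 word2 word2 \
              start word3 word3 word3 end > "$tmp"

# xargs -0 splits its input on NUL, so each match becomes one argument to printf.
# printf reuses its format once per argument, appending a "--" marker line each time.
separated=$(grep -Pzo '(?s)start.*?end' "$tmp" | xargs -0 printf '%s\n--\n')

printf '%s\n' "$separated"
rm -f "$tmp"
```

Each "--" line in the result marks the position of one null byte in the original output.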

Since the -o outputs are null-separated, you could pipe the result through head -z to get just the first match:

$ grep -Pzo '(?s)start.*?end' file.txt | head -z -n 1
start
word1
word1
word1
word1
end

Alternatively, you could use perl itself:

$ perl -0777 -nE 'say for /(start.*?end)/s' file.txt

This prints only the first match, in spite of the for loop, because the g flag is omitted.
