Regex with m flag in Perl vs. Python
Perl and Python’s regex engines differ slightly on the definition of a “line”; Perl does not consider the empty string following a trailing newline in the input string to be a line, Python does.
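To see the difference concretely, here is a minimal sketch. It assumes the original pattern was a bare "^" with re.M (which is what the fix below implies), and the variable names are mine, just for illustration:

```python
import re

text = "message\n"

# With re.M, Python's "^" also matches at the very end of the string,
# immediately after the trailing newline.
match_positions = [m.start() for m in re.finditer(r"^", text, flags=re.M)]
naive_result = re.sub(r"^", "[stamp]", text, flags=re.M)

print(match_positions)     # [0, 8]
print(repr(naive_result))  # '[stamp]message\n[stamp]'
```

The second match, at position 8, is the empty "line" after the trailing newline, which Perl would not count.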
The best solution I can come up with is to change "^" to r"^(?=.|\n)". The lookahead only lets "^" match when at least one character follows, so it cannot match the empty string after a trailing newline. (Note the r prefix on the string to make it a raw literal; all regexes should use raw literals.) You can also simplify a bit by calling sub directly on the compiled regex, or by calling re.sub with the uncompiled pattern, and since count=0 is already the default, you can omit it. Thus, the final code would be either:
re.compile(r"^(?=.|\n)", re.M).sub("[stamp]", "message\n")
or:
re.sub(r"^(?=.|\n)", "[stamp]", "message\n", flags=re.M)
Even better would be:
start_of_line = re.compile(r"^(?=.|\n)", re.M) # Done once up front
start_of_line.sub("[stamp]", "message\n") # Done on demand
This avoids recompiling the pattern (or rechecking the compiled-regex cache) on every call, because the compiled regex is created just once and reused.
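As a quick sanity check of the lookahead pattern on a few inputs (blank lines, with and without a trailing newline), something like the following should do. The helper name stamp_lines is hypothetical, not part of any library:

```python
import re

# Compiled once up front and reused for every call
start_of_line = re.compile(r"^(?=.|\n)", re.M)

def stamp_lines(text):
    """Prefix every line of text with "[stamp]", skipping the empty
    string that follows a trailing newline."""
    return start_of_line.sub("[stamp]", text)

print(repr(stamp_lines("message\n")))  # '[stamp]message\n'
print(repr(stamp_lines("a\n\nb")))     # '[stamp]a\n[stamp]\n[stamp]b'
print(repr(stamp_lines("a\nb\n")))     # '[stamp]a\n[stamp]b\n'
```

Blank lines in the middle are still stamped because the \n branch of the lookahead matches there; only the position at the very end of the string is skipped.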
An alternative solution would be to split the input into lines in a way that matches Perl’s definition of a line (at least for text that uses only \n as its line terminator), apply the non-re.MULTILINE version of the regex to each line, then join the results back together, e.g.:
start_of_line = re.compile(r"^") # Compile once up front without re.M
# Split lines, keeping ends, in a way that matches Perl's definition of a line
# then substitute on line-by-line basis
''.join([start_of_line.sub("[stamp]", line) for line in "message\n".splitlines(keepends=True)])
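One caveat worth checking before choosing between the two approaches: str.splitlines also splits on terminators other than \n (a lone \r, form feed, and so on), while "^" under re.M only looks at \n. A rough comparison sketch, with hypothetical helper names of my own:

```python
import re

lookahead = re.compile(r"^(?=.|\n)", re.M)  # multiline pattern with lookahead
plain = re.compile(r"^")                    # no re.M: matches only at position 0

def stamp_lookahead(text):
    return lookahead.sub("[stamp]", text)

def stamp_splitlines(text):
    # Stamp each piece returned by splitlines, then glue them back together
    return "".join(plain.sub("[stamp]", line)
                   for line in text.splitlines(keepends=True))

newline_only = "a\n\nb\n"
lone_cr = "a\rb\n"

print(stamp_lookahead(newline_only) == stamp_splitlines(newline_only))  # True
print(repr(stamp_lookahead(lone_cr)))   # '[stamp]a\rb\n'
print(repr(stamp_splitlines(lone_cr)))  # '[stamp]a\r[stamp]b\n'
```

So the two agree on \n-only text, but if the input might contain other line-boundary characters, the lookahead version is probably the closer match to what Perl does.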