Regex with m flag in Perl vs. Python

Perl’s and Python’s regex engines differ slightly on the definition of a “line”. Perl does not consider the empty string following a trailing newline in the input string to be a line; Python does.
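To see the difference concretely, here is a minimal sketch of what Python does with a bare "^" under re.M. The "[stamp]" replacement and the "message\n" input are the same ones used in the examples in this answer. The Perl result in the comment follows from the behavior described above; this snippet does not run Perl.

```python
import re

# With re.M, Python's "^" also matches at the very end of the string when
# the string ends in a newline, so the empty "line" after it gets stamped.
result = re.sub(r"^", "[stamp]", "message\n", flags=re.M)
print(repr(result))  # '[stamp]message\n[stamp]'

# Perl's s/^/[stamp]/mg on the same input stamps only the first line,
# giving "[stamp]message\n" (per the behavior described above; not run here).
```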

The best solution I can come up with is to change "^" to r"^(?=.|\n)" (note the r prefix, which makes the string a raw literal; all regex patterns should use raw literals). The lookahead requires at least one character after the line start, so the empty "line" after a trailing newline no longer matches. You can also simplify a bit by calling methods directly on the compiled regex, or by calling re.sub with the uncompiled pattern; since count=0 is already the default, you can omit it. Thus, the final code would be either:

re.compile(r"^(?=.|\n)", re.M).sub("[stamp]", "message\n")

or:

re.sub(r"^(?=.|\n)", "[stamp]", "message\n", flags=re.M)

Even better would be:

import re

start_of_line = re.compile(r"^(?=.|\n)", re.M)  # Done once up front

start_of_line.sub("[stamp]", "message\n")  # Done on demand

This avoids recompiling the pattern, or rechecking the compiled-regex cache, on every call, because the compiled regex is created just once and reused.
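As a quick check of the compile-once approach, the sketch below wraps the pattern in a helper and runs it over a few inputs, including one with a blank line in the middle. The name stamp_lines and the sample inputs are made up for illustration. Note the last case: an empty input gets no stamp at all, because the lookahead needs at least one character. Perl's "^" should still match at the start of an empty string, so handle that case separately if it matters to you.

```python
import re

# Compiled once at import time and reused on every call.
START_OF_LINE = re.compile(r"^(?=.|\n)", re.M)

def stamp_lines(text, stamp="[stamp]"):
    """Prefix every line of text with stamp (hypothetical helper)."""
    # A function replacement keeps any backslashes in stamp literal,
    # instead of having them parsed as replacement-template escapes.
    return START_OF_LINE.sub(lambda match: stamp, text)

print(repr(stamp_lines("message\n")))   # '[stamp]message\n'
print(repr(stamp_lines("a\n\nb\n")))    # '[stamp]a\n[stamp]\n[stamp]b\n'
print(repr(stamp_lines("no newline")))  # '[stamp]no newline'
print(repr(stamp_lines("")))            # ''
```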

An alternative is to split the input into lines in a way that matches Perl’s definition of a line, apply the non-re.MULTILINE version of the regex to each line, then join the results back together, e.g.:

start_of_line = re.compile(r"^")  # Compile once up front without re.M

# Split lines, keeping ends, in a way that matches Perl's definition of a line
# then substitute on line-by-line basis
''.join([start_of_line.sub("[stamp]", line) for line in "message\n".splitlines(keepends=True)])
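One caveat with the split approach: str.splitlines also breaks on "\r", "\v", "\f", and several Unicode separators, while "^" under the m flag only looks at "\n". If the input might contain those characters, a sketch like the following splits on "\n" alone. The helper name stamp_each_line and the re.findall-based splitter are my own choices for illustration, not part of the original code.

```python
import re

# Each match is either a (possibly empty) run of non-newline characters
# ending in "\n", or a final non-empty run with no trailing newline.
# A trailing "\n" therefore never produces an extra empty line.
LINE = re.compile(r"[^\n]*\n|[^\n]+")

def stamp_each_line(text, stamp="[stamp]"):
    """Prefix each newline-delimited line with stamp (hypothetical helper)."""
    return "".join(stamp + line for line in LINE.findall(text))

print(repr(stamp_each_line("message\n")))  # '[stamp]message\n'
print(repr(stamp_each_line("a\n\nb")))     # '[stamp]a\n[stamp]\n[stamp]b'
print(repr(stamp_each_line("a\rb\n")))     # '[stamp]a\rb\n'

# For comparison, splitlines treats the lone "\r" as a line break:
print("a\rb\n".splitlines(keepends=True))  # ['a\r', 'b\n']
```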
