Regex: Delete all instances of an HTML tag except the first one
I have several files that look like this:
<title>用正确方式打开 MyGainer增健肌粉! - MYPROTEIN™</title>
blah bhla
blah bhla
<title>Home is me</title>
blah bhla
<title>Payton is your name</title>
I want a regex that deletes every line containing <title>.*</title> except the first one.
My regex is not very good:
FIND: (<title>.*?</title>)(?=(?:<title>|$)) or (?s-i)\A.*\K<title>(.*?)(.*?</title>)
Replace by: \1
I wrote some Python code that works well, but I need a regex for this job:
import re
def keep_first_title_tag(extracted_content):
    # Collect the inner text of every <title>...</title> element
    title_tags = re.findall(r'<title>(.*?)</title>', extracted_content, re.DOTALL)
    # Keep only the text of the first <title> element
    extracted_content = title_tags[0]
    return extracted_content
extracted_content = """
<title>用正确方式打开 MyGainer增健肌粉! - MYPROTEIN™</title>
blah bhla
blah bhla
<title>Home is me</title>
blah bhla
<title>Payton is your name</title>
"""
extracted_content = keep_first_title_tag(extracted_content)
print(extracted_content)
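For comparison, here is a minimal sketch of a regex-based variant that keeps the rest of the file and only drops the later <title> lines. It assumes each <title>...</title> sits on a single line, as in the sample above. The names TITLE_LINE and remove_extra_titles are hypothetical and not part of the original code. Python's built-in re module does not support \K, so the sketch finds the first match and only substitutes in the text after it.

```python
import re

# A whole line that contains <title>...</title>, plus its trailing newline.
# Without re.DOTALL, "." never crosses a line break, so each match stays on one line.
TITLE_LINE = re.compile(r'^[^\n]*<title>.*?</title>[^\n]*\n?', re.MULTILINE)


def remove_extra_titles(text):
    """Delete every <title> line except the first one (hypothetical helper)."""
    first = TITLE_LINE.search(text)
    if first is None:
        # No <title> line at all: nothing to remove
        return text
    head = text[:first.end()]                         # everything up to and including the first title line
    tail = TITLE_LINE.sub('', text[first.end():])     # strip title lines from the remainder
    return head + tail


sample = (
    "<title>用正确方式打开 MyGainer增健肌粉! - MYPROTEIN™</title>\n"
    "blah bhla\n"
    "blah bhla\n"
    "<title>Home is me</title>\n"
    "blah bhla\n"
    "<title>Payton is your name</title>\n"
)
cleaned = remove_extra_titles(sample)
print(cleaned)
```

With the sample above, only the first <title> line and the three "blah bhla" lines should remain.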
