regex – Regexp to remove small numbers and leave large ones
In text files with many paragraphs, the sentences or phrases are numbered. I use a regex in Perl to remove those numbers. A number is always followed by the first letter of the sentence/phrase or by a space. But my pattern also matches numbers that are legitimately part of the text. If I could limit the match to a run of one or two digits, not more, that is not part of a number containing a comma, I could manually delete the rare three-digit marker the pattern would miss, or re-insert the rare two-digit number that shouldn't have been deleted.
I haven’t been able to figure out a regexp with those limitations. How can that be done?
Example:

```shell
perl -p -i -e 's:(\D)\d{1,2}(\w):\1\2:g;
s:\d+-\d+::g;
s:^\d{1,2} ?::g;' {filenames}
```

removed the markers but also removed the digits from “337,000”, leaving the comma.
