Python Regex split by full stop within Nth character
I believe you’re looking for something like this:
^.{0,99}(?:[.?]|$)|(?<=[.?]).{0,99}(?:[.?]|$)|.{100,}?(?:[.?]|$)
If it encounters a long sentence (more than 100 characters with no . or ?), it returns the smallest block that reaches the nearest . or ? (or the end of the string).
Here we are matching:
^.{0,99}(?:[.?]|$) matches a block at the beginning of the string: up to 99 characters followed by . or ?, or by the end of the string (so a short string is taken whole).
(?<=[.?]).{0,99}(?:[.?]|$) matches the same kind of block, but starting right after a . or ? (so the beginnings of long sentences are not dropped).
.{100,}?(?:[.?]|$) matches the smallest run of 100 or more characters that ends with ., ? or the end of the string; this covers long sentences.
Demo here.
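For reference, here is a minimal sketch of how the pattern above might be used from Python. The names build_pattern and split_blocks and the limit parameter are my own, not part of the original pattern; limit=100 reproduces it exactly. It assumes single-line input, since . does not match a newline unless re.DOTALL is set. Note that findall can return an empty string at the very end when the text finishes with . or ?, so empty matches are dropped.

```python
import re


def build_pattern(limit: int = 100) -> re.Pattern:
    """Build the three-alternative pattern; limit=100 gives the original regex."""
    # Up to limit-1 characters, then '.', '?' or end of string.
    short = rf".{{0,{limit - 1}}}(?:[.?]|$)"
    # Lazy run of at least `limit` characters, then '.', '?' or end of string.
    long_ = rf".{{{limit},}}?(?:[.?]|$)"
    return re.compile(rf"^{short}|(?<=[.?]){short}|{long_}")


def split_blocks(text: str, limit: int = 100) -> list[str]:
    """Return the non-empty blocks found by the pattern, in order."""
    return [block for block in build_pattern(limit).findall(text) if block]


# A small limit makes the behaviour easy to see.
print(split_blocks("Hi. Yes? A very long sentence here. End", limit=10))
# ['Hi. Yes?', ' A very long sentence here.', ' End']
```

Blocks keep their leading whitespace; call block.strip() on each one if you don't want it.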
UPDATE: if you just want blocks of exactly 100 characters whenever a sentence contains no . or ? within that range, you can use the simpler:
.{0,99}(?:[.?]|$)|.{100}
Demo here
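A similar sketch for the simpler pattern, again with a hypothetical helper name (split_blocks_hard) and a limit parameter that are not in the original; limit=100 gives the regex above. With this variant no block is longer than limit characters, but a block that follows a hard cut may start in the middle of a sentence. The first alternative can also match the empty string at the end of the text, so empty matches are filtered out. Single-line input is assumed here as well.

```python
import re


def split_blocks_hard(text: str, limit: int = 100) -> list[str]:
    """Split on '.' or '?' where possible, otherwise cut after `limit` characters."""
    pattern = re.compile(
        # Up to limit-1 characters, then '.', '?' or end of string ...
        rf".{{0,{limit - 1}}}(?:[.?]|$)"
        # ... otherwise a hard cut of exactly `limit` characters.
        rf"|.{{{limit}}}"
    )
    return [block for block in pattern.findall(text) if block]


# A small limit makes the hard cuts visible.
print(split_blocks_hard("abcdefghijklmnopqrstuvwxy. ok", limit=10))
# ['abcdefghij', 'klmnopqrst', 'uvwxy. ok']
```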
