Python Regex split by full stop within Nth character

I believe you’re looking for something like this:

^.{0,99}(?:[.?]|$)|(?<=[.?]).{0,99}(?:[.?]|$)|.{100,}?(?:[.?]|$)

If it encounters a long stretch (more than 100 characters) without . or ?, it returns the smallest block that reaches the nearest . or ?.

Here we are matching:

  • ^.{0,99}(?:[.?]|$) a block from the beginning of the string, at most 100 characters long, ending with ., ? or the end of the string.
  • (?<=[.?]).{0,99}(?:[.?]|$) a block starting right after . or ? (so that the beginning of a long sentence is not skipped).
  • .{100,}?(?:[.?]|$) the smallest run of 100 or more characters followed by ., ? or the end of the string: this covers long sentences.
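To see how the three alternatives cooperate, here is a minimal Python sketch using re.findall. The names BLOCK_RE and split_blocks and the sample text are made up for this illustration and are not part of the original question. Empty matches are filtered out because the pattern can match an empty string right after a final . or ?.

```python
import re

# Pattern copied from the answer; the name BLOCK_RE is only for this sketch.
BLOCK_RE = re.compile(
    r"^.{0,99}(?:[.?]|$)"           # block from the start, at most 100 chars
    r"|(?<=[.?]).{0,99}(?:[.?]|$)"  # block that starts right after . or ?
    r"|.{100,}?(?:[.?]|$)"          # oversized sentence, up to the nearest . or ?
)


def split_blocks(text):
    """Return the non-empty blocks matched by BLOCK_RE, in order."""
    # A match can be empty (e.g. right after a final full stop), so drop those.
    return [block for block in BLOCK_RE.findall(text) if block]


# Hypothetical sample data with known lengths.
short_a = "a" * 59 + "."   # 60 characters
short_b = "b" * 59 + "?"   # 60 characters
long_c = "c" * 149 + "."   # 150 characters, no terminator in the first 100
tail_d = "d" * 10          # trailing text with no terminator

blocks = split_blocks(short_a + short_b + long_c + tail_d)
print([len(block) for block in blocks])  # [60, 60, 150, 10]
```

The two 60-character sentences stay separate because together they would exceed 100 characters, and the 150-character sentence comes back whole through the third alternative. Note that . does not match a newline unless re.DOTALL is passed, so multi-line input may need that flag.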



UPDATE: if you just want blocks of exactly 100 characters when a sentence doesn’t contain . or ?, you can use the simpler:

.{0,99}(?:[.?]|$)|.{100}
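A small Python sketch of this simpler pattern, again with re.findall. The names SIMPLE_RE and split_fixed and the sample text are assumptions made for illustration only. The first alternative can match an empty string at the very end of the input, so empty results are dropped.

```python
import re

# Simpler pattern from the update; SIMPLE_RE is a name chosen for this sketch.
SIMPLE_RE = re.compile(r".{0,99}(?:[.?]|$)|.{100}")


def split_fixed(text):
    """Return blocks of at most 100 characters, cutting hard at 100 if needed."""
    # Filter out the empty match that can occur at the end of the input.
    return [block for block in SIMPLE_RE.findall(text) if block]


# Hypothetical sample data with known lengths.
short_a = "a" * 59 + "."   # 60 characters
short_b = "b" * 59 + "?"   # 60 characters
long_c = "c" * 149 + "."   # 150 characters, no terminator in the first 100
tail_d = "d" * 10          # trailing text with no terminator

blocks = split_fixed(short_a + short_b + long_c + tail_d)
print([len(block) for block in blocks])  # [60, 60, 100, 60]
```

Here the long sentence is cut after exactly 100 characters, and its remaining 50 characters are merged with the trailing 10 into one final block, since the first alternative greedily takes as much as fits within the limit.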

