regular expressions – Regex: Pre-determining the position of matching characters

For all regular expressions, is it possible to pre-determine the set of possible positions in which any given sub-expression may be found? If so, is there any existing research on this subject?

Here’s some examples of what I have in mind:

  1. For the expression /^f/, we can determine that f must be found as the first character in the input.
  2. For the expression /b$/, we can determine that b must be found as the last character in the input.
  3. For the expression /c.....b/, we cannot determine a fixed position for b, but we can pre-determine that once a c is encountered, a b must be found 6 characters later.

What I’ve got in mind is the notion of a “guided” regex. E.g. consider example 3 above. Given a regex implementation that can be streamed its input, and an input which can be randomly accessed, we should be able to do better than current implementations by having the regex implementation skip 6 characters ahead once it encounters a c.

Read more here: Source link