How to escape a hyphen in a regex character group in C
According to all the documentation I can find, in POSIX extended regexes, you should be able to escape the - in a character group…
This is not correct. From POSIX 9.3.5 RE Bracket Expression…
The special characters '.', '*', '[', and '\' (<period>, <asterisk>, <left-square-bracket>, and <backslash>, respectively) shall lose their special meaning within a bracket expression.
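To see what that rule means in practice, here is a minimal sketch using the standard regcomp/regexec calls from <regex.h>. The helper name bracket_accepts is made up for illustration and is not from the thread. It checks whether a bracket expression accepts one given character, so you can probe what "[a\-c]" really contains.

```c
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not from the thread): returns true if the
 * bracket expression, e.g. "[a\\-c]" as a C string literal, accepts
 * the single character c. Returns false on no match or on any error. */
static bool bracket_accepts(const char *bracket, char c)
{
    char pattern[64];
    char text[2] = { c, '\0' };
    regex_t re;
    bool ok;

    /* Anchor the bracket so that exactly one character must match it. */
    int n = snprintf(pattern, sizeof pattern, "^%s$", bracket);
    if (n < 0 || (size_t)n >= sizeof pattern)
        return false;

    /* REG_EXTENDED selects POSIX extended syntax; REG_NOSUB because
     * only a yes/no answer is needed. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

Calling bracket_accepts("[a\\-c]", '\\') should succeed, because the backslash is an ordinary list member here and in fact becomes the start of the range from '\' to 'c'. Calling bracket_accepts("[a\\-c]", '-') should fail in the default "C" locale, since the hyphen is consumed as the range operator rather than being escaped. Range contents outside the POSIX locale are unspecified, so treat this as a probe for your own platform rather than a guarantee.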
I understand that I can put the dash at the end of the character group without escaping at all, but I’m wondering how to properly escape if it’s in the middle of the character group.
There is no way to escape it. You have to work with the parsing rules instead, as explained in the same document.
The <hyphen-minus> character shall be treated as itself if it occurs first (after an initial '^', if any) or last in the list, or as an ending range point in a range expression. As examples, the expressions "[-ac]" and "[ac-]" are equivalent and match any of the characters 'a', 'c', or '-'; "[^-ac]" and "[^ac-]" are equivalent and match any characters except 'a', 'c', or '-'; the expression "[%--]" matches any of the characters between '%' and '-' inclusive; the expression "[--@]" matches any of the characters between '-' and '@' inclusive; and the expression "[a--@]" is either invalid or equivalent to '@', because the letter 'a' follows the symbol '-' in the POSIX locale. To use a <hyphen-minus> as the starting range point, it shall either come first in the bracket expression or be specified as a collating symbol; for example, "[][.-.]-0]", which matches either a <right-square-bracket> or any character or collating element that collates between <hyphen-minus> and 0, inclusive.
If a bracket expression specifies both '-' and ']', the ']' shall be placed first (after the '^', if any) and the '-' last within the bracket expression.
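The placement rules quoted above can be tried out with a short sketch like the following. The helper name ere_matches is an assumption for illustration; only regcomp, regexec and regfree come from POSIX. It compiles a POSIX extended regex and reports whether the text matches.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): returns true if text
 * matches the POSIX extended regex pattern. A pattern that fails to
 * compile is reported as "no match" to keep the sketch short. */
static bool ere_matches(const char *pattern, const char *text)
{
    regex_t re;
    bool ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}
```

With this helper, "^[a-c-]+$" (hyphen last, after the range a-c) and "^[-ac]+$" (hyphen first) should both accept strings containing a literal '-', for example ere_matches("^[a-c-]+$", "a-b"). To allow both ']' and '-', follow the rule above and write the ']' first and the '-' last, as in "^[]a-]+$".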
What a nightmare. It’s simplest to just put the dash at the front or back.
POSIX regexes are pretty crude. Consider PCRE or GRegex instead for anything serious.