C# / RegEx – How can I improve my RegEx expression?

I have a question about my code that uses RegEx. I have about 700 text files that I import into my tool, and the import sometimes takes under one second and sometimes up to 7 seconds.

So I profiled my tool, and it showed that File.ReadAllLines and my RegEx take the most time. Maybe a RegEx pro can help me and teach me how to improve my RegEx expression.

This is my code:

var entries = new List<SomeModelObject>();

try
{
    var allLines = File.ReadAllLines(filepath, Encoding.Default);

    var isHashCollected = false;

    // main pattern
    var pattern = @"[a-zA-Z<>:_]+\s+SYMBOL:\s(?<var>\w+)(|\s+)=\s(?<value>\W\w+|\w+)\s;\s\/\/(?<comment>.*)";
    Match match;
    Regex regex = new Regex(pattern, RegexOptions.Singleline);

    var secondPattern = @"<0:64:0>[\w\s;]+\/\/\s(?<second>\w+)";
    Regex regexSecondPattern = new Regex(secondPattern, RegexOptions.Singleline);

    var thirdPattern = @"<@\(#\)(?<third>\w+)";
    Regex regexThirdPattern = new Regex(thirdPattern, RegexOptions.Singleline);

    foreach (var line in allLines)
    {
        // the first hash only needs to be collected once per file
        if (!isHashCollected)
        {
            isHashCollected = GetHash(regexThirdPattern.Match(line), someObject, isHashCollected);
        }

        match = regex.Match(line);

        if (match.Success)
        {
            // get entries
            var someModelObject = new SomeModelObject();
            someModelObject.projCase = match.Groups["var"].Value;
            someModelObject.projValue = match.Groups["value"].Value;
            someModelObject.projComment = match.Groups["comment"].Value;
            entries.Add(someModelObject);
            continue;
        }

        // only lines that did not match the main pattern are checked for the second hash
        GetSecondHash(regexSecondPattern.Match(line), someObject);
    }

    return entries;
}
catch (Exception e)
{
    // any error results in an empty list
    return new List<SomeModelObject>();
}

For completeness:

private static bool GetHash(Match match, SomeObject someObject, bool isHashCollected)
{
    if (match.Success)
    {
        someObject.hash = match.Groups["third"].Value;
        return true;
    }
    return isHashCollected;
}

private static void GetSecondHash(Match match, SomeObject someObject)
{
    if (match.Success)
    {
        someObject.hash = match.Groups["second"].Value;
    }
}

So I have three expressions, but I think the “main pattern” is the one that must be improved. Maybe there is some other culprit here that I don't see (any help is appreciated!).

Here are some lines from one of the files I run the RegEx against; every file has the same format:

<@(#)123456788c81a76adc83466212345678 
// 
<ABC_Defg:Element>  SYMBOL: ABCDEF = 00001  ;   //Some text
<ABC_Defg:Value>    SYMBOL: ABCDEF = 00000  ;   //Some text
<ABC_Defg:Value>    SYMBOL: ABCDEF = 00000  ;   //Some text


<0:64:0> 0xABCDEF 0xABCDEF 0xABCDEF0 0xABCDEF ; // 12345678f0aa885f12345678

As I said, any help is appreciated! Thanks.

UPDATE

I updated my pattern to (?<var>[^:]*)[=](?<value>[^=]*)[;](?<comment>[^;]*) and also parallelized the reading of the files with Parallel.ForEach (thanks @charlieface), so it now takes ~1 second to read and display the input of all 700 files. Thanks, everyone, for helping me out!
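
For anyone reading later, here is a minimal sketch of what the update describes, not my exact code. The pattern is the one from the update. Everything else is an assumption made for illustration: the class name ParallelImportSketch, the helpers TryParseLine and ImportAll, the tuple used in place of SomeModelObject, RegexOptions.Compiled, and the Trim() calls (the negated character classes also capture the surrounding whitespace). The two hash patterns are left out to keep it short.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper class; all names here are illustrative only.
public static class ParallelImportSketch
{
    // Pattern from the update. One shared instance is created up front;
    // RegexOptions.Compiled is an assumption, not something I measured.
    private static readonly Regex EntryRegex = new Regex(
        @"(?<var>[^:]*)[=](?<value>[^=]*)[;](?<comment>[^;]*)",
        RegexOptions.Compiled);

    // Parses one line; returns false if the line is not a SYMBOL entry.
    public static bool TryParseLine(
        string line,
        out (string Var, string Value, string Comment) entry)
    {
        var match = EntryRegex.Match(line);
        if (!match.Success)
        {
            entry = default;
            return false;
        }

        // The groups include surrounding whitespace, so trim each capture.
        entry = (
            match.Groups["var"].Value.Trim(),
            match.Groups["value"].Value.Trim(),
            match.Groups["comment"].Value.Trim());
        return true;
    }

    // Reads all files in parallel and returns the parsed entries per file path.
    public static ConcurrentDictionary<string, List<(string Var, string Value, string Comment)>>
        ImportAll(IEnumerable<string> filePaths)
    {
        var results =
            new ConcurrentDictionary<string, List<(string Var, string Value, string Comment)>>();

        Parallel.ForEach(filePaths, path =>
        {
            // Each file gets its own list, so no locking is needed while parsing.
            var entries = new List<(string Var, string Value, string Comment)>();

            // File.ReadLines streams the file instead of loading every line first.
            foreach (var line in File.ReadLines(path, Encoding.Default))
            {
                if (TryParseLine(line, out var entry))
                {
                    entries.Add(entry);
                }
            }

            results[path] = entries;
        });

        return results;
    }
}
```

With the first sample line from above, TryParseLine yields ABCDEF, 00001 and //Some text after trimming. Note that the comment group of this pattern keeps the leading //, unlike the original main pattern. The two hash lines contain no = and are therefore skipped by this pattern.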
