C# / Regex – How can I speed up my regular expressions?
I have a question about my regex code. I import about 700 text files into my tool, and the import sometimes takes under one second and sometimes up to 7 seconds.
So I profiled my tool, and it reports that File.ReadAllLines and my regexes take the most time. Maybe a regex pro can help me and teach me how to improve my expressions.
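Because the profiler also points at File.ReadAllLines, one thing worth trying is File.ReadLines, which streams lines lazily instead of building a string[] for the whole file first. This is a minimal sketch, not the tool's actual code: the class LineSource and the method CountMatchingLines are hypothetical, and the predicate stands in for whatever per-line work the import does. Whether it helps measurably depends on file size; for many small files the gain may be small.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: shows File.ReadLines as a streaming alternative to File.ReadAllLines.
public static class LineSource
{
    // File.ReadLines returns a lazy IEnumerable<string>: lines are read from
    // disk while the loop runs, so no string[] for the whole file is allocated
    // up front (unlike File.ReadAllLines).
    public static int CountMatchingLines(string path, Func<string, bool> predicate)
    {
        int count = 0;
        foreach (string line in File.ReadLines(path, Encoding.Default))
        {
            if (predicate(line))
            {
                count++;
            }
        }
        return count;
    }
}
```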
This is my code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical signature: the snippet originally showed only the method body.
private static List<SomeModelObject> ReadEntries(string filepath, SomeObject someObject)
{
    var entries = new List<SomeModelObject>();
    try
    {
        var allLines = File.ReadAllLines(filepath, Encoding.Default);
        var isHashCollected = false;

        // main pattern: <Tag> SYMBOL: NAME = VALUE ; //comment
        var pattern = @"[a-zA-Z<>:_]+\s+SYMBOL:\s(?<var>\w+)(|\s+)=\s(?<value>\W\w+|\w+)\s;\s\/\/(?<comment>.*)";
        Match match;
        Regex regex = new Regex(pattern, RegexOptions.Singleline);

        // line holding the second hash: <0:64:0> ... ; // HASH
        var secondPattern = @"<0:64:0>[\w\s;]+\/\/\s(?<second>\w+)";
        Regex regexSecondPattern = new Regex(secondPattern, RegexOptions.Singleline);

        // hash header line: <@(#)HASH
        var thirdPattern = @"<@\(#\)(?<third>\w+)";
        Regex regexThirdPattern = new Regex(thirdPattern, RegexOptions.Singleline);

        foreach (var line in allLines)
        {
            // only look for the header hash until it has been found once
            if (!isHashCollected)
            {
                isHashCollected = GetHash(regexThirdPattern.Match(line), someObject, isHashCollected);
            }

            match = regex.Match(line);
            if (match.Success)
            {
                // get entries
                var someModelObject = new SomeModelObject();
                someModelObject.projCase = match.Groups["var"].Value;
                someModelObject.projValue = match.Groups["value"].Value;
                someModelObject.projComment = match.Groups["comment"].Value;
                entries.Add(someModelObject);
                continue;
            }

            // not an entry line: try the second-hash pattern
            GetSecondHash(regexSecondPattern.Match(line), someObject);
        }
        return entries;
    }
    catch (Exception)
    {
        // any failure (I/O or otherwise) returns an empty list
        return new List<SomeModelObject>();
    }
}
For completeness, here are the helper methods:
private static bool GetHash(Match match, SomeObject someObject, bool isHashCollected)
{
    if (match.Success)
    {
        someObject.hash = match.Groups["third"].Value;
        return true;
    }
    return isHashCollected;
}

private static void GetSecondHash(Match match, SomeObject someObject)
{
    if (match.Success)
    {
        someObject.hash = match.Groups["second"].Value;
    }
}
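In the code above, all three Regex objects are constructed inside the method, so they are rebuilt for every one of the ~700 files. A possible alternative, sketched here under the assumption that the patterns never change at run time, is to build each Regex once as a static readonly field with RegexOptions.Compiled. The class name Patterns is hypothetical; the pattern strings are copied unchanged from the question. RegexOptions.Singleline is dropped because it only changes how `.` treats `\n`, and each input is already a single line.

```csharp
using System.Text.RegularExpressions;

// Hypothetical holder class: each Regex is constructed once per process and
// reused for every file instead of being rebuilt on each call.
public static class Patterns
{
    // RegexOptions.Compiled makes construction slower once, in exchange for
    // faster matching on every later call.
    public static readonly Regex Main = new Regex(
        @"[a-zA-Z<>:_]+\s+SYMBOL:\s(?<var>\w+)(|\s+)=\s(?<value>\W\w+|\w+)\s;\s\/\/(?<comment>.*)",
        RegexOptions.Compiled);

    public static readonly Regex Second = new Regex(
        @"<0:64:0>[\w\s;]+\/\/\s(?<second>\w+)",
        RegexOptions.Compiled);

    public static readonly Regex Third = new Regex(
        @"<@\(#\)(?<third>\w+)",
        RegexOptions.Compiled);
}
```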
So I have three expressions, but I think the “main pattern” is the one that must be improved. Maybe there is some other culprit here that I don't see (any help is appreciated!).
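One way the main pattern might be tightened, assuming every entry line starts with a `<...>` tag exactly as in the sample: anchor it with `^` so a non-matching line fails at the start of the string instead of the engine retrying from later start positions, replace `(|\s+)` with `\s*` (optional whitespace without a capturing group), and use `<[^>]*>` for the tag. `\W?\w+` plays the role of the original `\W\w+|\w+`. This is slightly more permissive about whitespace than the original and has only been checked against the sample lines; the class name TightPattern is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class holding a tightened variant of the "main pattern".
public static class TightPattern
{
    // ^        : without RegexOptions.Multiline this anchors to the start of the
    //            string, so lines that do not begin with a <...> tag fail at once
    // <[^>]*>  : the leading tag, e.g. <ABC_Defg:Element>
    // \s*      : optional whitespace, replacing the original (|\s+) group
    public static readonly Regex Main = new Regex(
        @"^<[^>]*>\s+SYMBOL:\s+(?<var>\w+)\s*=\s*(?<value>\W?\w+)\s*;\s*//(?<comment>.*)$",
        RegexOptions.Compiled);
}
```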
Here are some lines from one of the files (the lines I run my regexes on); every file has the same format:
<@(#)123456788c81a76adc83466212345678
//
<ABC_Defg:Element> SYMBOL: ABCDEF = 00001 ; //Some text
<ABC_Defg:Value> SYMBOL: ABCDEF = 00000 ; //Some text
<ABC_Defg:Value> SYMBOL: ABCDEF = 00000 ; //Some text
<0:64:0> 0xABCDEF 0xABCDEF 0xABCDEF0 0xABCDEF ; // 12345678f0aa885f12345678
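Several of the sample lines (the `<@(#)` hash line, the `//` line, the `<0:64:0>` line) cannot be entries because they do not contain `SYMBOL:`. A cheap ordinal substring check before the regex lets such lines skip the regex engine entirely. Minimal sketch below; LineFilter and TryParse are hypothetical names, and the regex is the original main pattern, unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: pre-filters lines with a plain string search before
// running the (more expensive) main regex.
public static class LineFilter
{
    private static readonly Regex Main = new Regex(
        @"[a-zA-Z<>:_]+\s+SYMBOL:\s(?<var>\w+)(|\s+)=\s(?<value>\W\w+|\w+)\s;\s\/\/(?<comment>.*)",
        RegexOptions.Compiled);

    public static bool TryParse(string line, out string name, out string value, out string comment)
    {
        name = value = comment = string.Empty;

        // Lines without "SYMBOL:" are rejected here without touching the regex engine.
        if (line.IndexOf("SYMBOL:", StringComparison.Ordinal) < 0)
        {
            return false;
        }

        Match m = Main.Match(line);
        if (!m.Success)
        {
            return false;
        }

        name = m.Groups["var"].Value;
        value = m.Groups["value"].Value;
        comment = m.Groups["comment"].Value;
        return true;
    }
}
```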
As I said, any help is appreciated! Thanks
UPDATE
I updated my pattern to this: (?<var>[^:]*)[=](?<value>[^=]*)[;](?<comment>[^;]*)
and I also parallelized the reading of the files with Parallel.ForEach
(thanks @charlieface), so reading and displaying the input of all 700 files now takes about 1 second. Thanks, everyone, for helping me out!
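A minimal sketch of the Parallel.ForEach approach from the update, assuming each file can be parsed independently: every worker builds its own list and stores it in a ConcurrentDictionary keyed by path, and a single static Regex is shared because Regex instances are safe to use from multiple threads. ParallelImport, ImportAll and the tuple field names are hypothetical; the pattern is the updated one from the question. Its captures include the surrounding spaces, hence the Trim() calls.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical importer: parses many files in parallel with the updated pattern.
public static class ParallelImport
{
    // One compiled instance shared by all worker threads; matching is thread-safe.
    private static readonly Regex EntryPattern = new Regex(
        @"(?<var>[^:]*)[=](?<value>[^=]*)[;](?<comment>[^;]*)",
        RegexOptions.Compiled);

    public static ConcurrentDictionary<string, List<(string Name, string Value, string Comment)>> ImportAll(IEnumerable<string> paths)
    {
        var results = new ConcurrentDictionary<string, List<(string Name, string Value, string Comment)>>();

        Parallel.ForEach(paths, path =>
        {
            // Each file gets its own list, so no list is shared between threads.
            var entries = new List<(string Name, string Value, string Comment)>();
            foreach (var line in File.ReadLines(path))
            {
                var m = EntryPattern.Match(line);
                if (m.Success)
                {
                    entries.Add((m.Groups["var"].Value.Trim(),
                                 m.Groups["value"].Value.Trim(),
                                 m.Groups["comment"].Value.Trim()));
                }
            }
            // ConcurrentDictionary's indexer is safe to call from many threads.
            results[path] = entries;
        });

        return results;
    }
}
```

Note that with this pattern the comment capture still contains the leading `//` (for example `//Some text`).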