REGEX: splitting up bank transaction statements
I’m writing a C# program to convert PDF bank statements into CSV using regex. The statement only ever contains three kinds of line:
01 Sep 2025 Opening balance $10,000.00 DR
02 Sep 2025 Repayment/Payment $100.00 $10,100.00 DR
03 Sep 2025 Interest charged -300.00 $9,800.00 DR
I’m using this regex:
((\d{2}) (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (20\d{2}))(.*)(\-[\d.,]+)?(\$[\d.,]+)?\s*(\$[\d.,]+)
It seems to split out the date and the balance correctly. However, it fails to separate the description from the debit/credit amount.
e.g. 02 Sep 2025 Repayment/Payment $100.00 $10,100.00 DR
RESULT: Group[5] is "Repayment/Payment $100.00" (the debit amount is not split from the description), and groups 6 and 7 are empty. What have I done wrong?
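One likely cause is that (.*) is greedy. It first consumes the rest of the line and then backtracks only far enough for the mandatory final \s*(\$[\d.,]+) to match. The two optional amount groups are allowed to match nothing, so the regex engine never has a reason to take the amount back out of Group[5]. Below is a minimal sketch of one possible fix, not a definitive implementation. It assumes every amount has exactly two decimals, that the middle amount always starts with "-" or "$", and that the line ends in an optional DR/CR marker. The names StatementLineParser, TryToCsv and Quote, and the named groups, are hypothetical and not from the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StatementLineParser
{
    // Assumed line layout: date, description, optional amount, balance, optional DR/CR.
    // The description is lazy (.*?) and the pattern is anchored with ^ and $,
    // so the description stops at the first point where the rest of the line fits.
    private static readonly Regex LinePattern = new Regex(
        @"^(?<date>\d{2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) 20\d{2})\s+" +
        @"(?<desc>.*?)\s+" +
        @"(?:(?<amount>(?:-\$?|\$)[\d,]+\.\d{2})\s+)?" +   // e.g. $100.00 or -300.00
        @"(?<balance>\$[\d,]+\.\d{2})" +
        @"(?:\s+(?<side>DR|CR))?\s*$");

    // Converts one statement line to a CSV row; returns false if the line does not match.
    public static bool TryToCsv(string line, out string csv)
    {
        Match m = LinePattern.Match(line);
        if (!m.Success)
        {
            csv = "";
            return false;
        }

        // The amount group is unmatched on lines such as "Opening balance".
        string amount = m.Groups["amount"].Success ? m.Groups["amount"].Value : "";

        csv = string.Join(",",
            m.Groups["date"].Value,
            Quote(m.Groups["desc"].Value),
            Quote(amount),                      // quoted because amounts may contain commas
            Quote(m.Groups["balance"].Value),
            m.Groups["side"].Value);
        return true;
    }

    // Wraps a field in double quotes and doubles any embedded quotes.
    private static string Quote(string s)
    {
        return "\"" + s.Replace("\"", "\"\"") + "\"";
    }
}

public static class Program
{
    public static void Main()
    {
        string[] lines =
        {
            "01 Sep 2025 Opening balance $10,000.00 DR",
            "02 Sep 2025 Repayment/Payment $100.00 $10,100.00 DR",
            "03 Sep 2025 Interest charged -300.00 $9,800.00 DR",
        };

        foreach (string line in lines)
        {
            string csv;
            Console.WriteLine(StatementLineParser.TryToCsv(line, out csv) ? csv : "NO MATCH: " + line);
        }
    }
}
```

With the three sample lines, the amount field comes out empty for the opening balance, "$100.00" for the repayment and "-300.00" for the interest line. Named groups are used here so the code does not depend on group numbering. The same lazy-plus-anchored idea should also work with numbered groups if you prefer to keep those.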
