Wednesday, March 23, 2011

Regex: Parse a Fixed Length Text File

In the C language you can use sscanf(), but .NET has no directly equivalent function. One alternative is to use a Regex that maps each fixed-length field to a named GROUP, then read the values back through the GROUP NAMES.



// Create the pattern using Regex.
// (?<NAME>...) is a named GROUP.
// ".{22}" means any character (.) repeated exactly 22 times {22}.

// SAMPLE INPUT FIXED LENGTH TEXT FILE
// 123456789012345678901212345671234567891234567812345678123123
// 123456789012345       12345  1234567  2010041020100411ABC001
// <----22 characters---><--7--><---9---><--8---><--8---><3><3>
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{0}", @"^");
sb.AppendFormat("{0}", @"(?<ORDERKEY>.{22})");
sb.AppendFormat("{0}", @"(?<ORDERNUM>.{7})");
sb.AppendFormat("{0}", @"(?<ID>.{9})");
sb.AppendFormat("{0}", @"(?<DATEORDERED>.{8})");
sb.AppendFormat("{0}", @"(?<DATERECIEVED>.{8})");
sb.AppendFormat("{0}", @"(?<INITIALS>.{3})");
sb.AppendFormat("{0}", @"(?<PRIORITY>.{3})");

string pattern = sb.ToString();
int OrderSequence = 0;
using (StreamReader sr = new StreamReader(this.FileNamePathToProcess))
{
    Regex re = new Regex(pattern);
    while (!sr.EndOfStream)
    {
        Match ma = re.Match(sr.ReadLine());
        if (!ma.Success) continue; // skip lines shorter than the fixed-length layout

        OrderData myOrderData = new OrderData();
        myOrderData.OrderSequence = ++OrderSequence;
        myOrderData.OrderKey = ma.Groups["ORDERKEY"].Value.TrimEnd();
        myOrderData.OrderNumber = ma.Groups["ORDERNUM"].Value.TrimEnd();
        myOrderData.PatientSocialSecurity = ma.Groups["ID"].Value.TrimEnd();

        myOrderData.D_ORD = ma.Groups["DATEORDERED"].Value.TrimEnd();
        myOrderData.D_RX = ma.Groups["DATERECIEVED"].Value.TrimEnd();
        myOrderData.OrderTechnicianInitials = ma.Groups["INITIALS"].Value.TrimEnd();
        myOrderData.OrderPriority = ma.Groups["PRIORITY"].Value.TrimEnd();

        this.PackageOrderData.Add(myOrderData);
    }
}
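
If you want to try the idea without the OrderData class or an input file, here is a minimal, self-contained sketch that runs the same seven-field layout against the sample record shown above. The class name FixedWidthDemo and the method ParseLine are made up for this illustration; they are not part of the original project. It returns the trimmed fields in a dictionary keyed by GROUP NAME, and returns null when a line does not match the layout.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original project.
public static class FixedWidthDemo
{
    // Field widths taken from the sample record: 22, 7, 9, 8, 8, 3, 3 characters.
    private static readonly Regex RecordPattern = new Regex(
        @"^(?<ORDERKEY>.{22})(?<ORDERNUM>.{7})(?<ID>.{9})" +
        @"(?<DATEORDERED>.{8})(?<DATERECIEVED>.{8})" +
        @"(?<INITIALS>.{3})(?<PRIORITY>.{3})");

    // Returns the trimmed field values keyed by group name,
    // or null when the line is too short to match the layout.
    public static Dictionary<string, string> ParseLine(string line)
    {
        Match m = RecordPattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        var fields = new Dictionary<string, string>();
        foreach (string name in RecordPattern.GetGroupNames())
        {
            if (name == "0")
            {
                continue; // group "0" is the whole match, not a field
            }
            fields[name] = m.Groups[name].Value.TrimEnd();
        }
        return fields;
    }

    public static void Main()
    {
        // The sample record from the comment block above (60 characters).
        string line = "123456789012345       12345  1234567  2010041020100411ABC001";

        Dictionary<string, string> fields = ParseLine(line);
        foreach (KeyValuePair<string, string> pair in fields)
        {
            Console.WriteLine("{0} = '{1}'", pair.Key, pair.Value);
        }
    }
}
```

Checking Match.Success before reading the groups matters here, because a failed match still returns empty strings for every group rather than throwing, so a short or blank line would otherwise produce an empty record without any warning.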

1 comment:

  1. For anyone coming along after me and having a slight problem with the above example, the GROUP NAMES aren't defined in the pattern section of this example. The pattern should look something like this, but without the spaces around the group name (that was the only way I could find to keep them from being stripped from the post).

    sb.AppendFormat("{0}", @"(?< ORDERKEY >.{22})");
    sb.AppendFormat("{0}", @"(?< ORDERNUM >.{7})");
    sb.AppendFormat("{0}", @"(?< ID >.{9})");
