// Requires: using System.IO; using System.Text; using System.Text.RegularExpressions;
//
// Create the pattern using Regex. (?<NAME>...) is a named GROUP.
// ".{22}" means any character (.) repeated 22 times {22}.
// SAMPLE INPUT FIXED LENGTH TEXT FILE
// 123456789012345678901212345671234567891234567812345678123123
// 123456789012345       12345  1234567  2010041020100411ABC001
// <----22 characters---><--7--><---9---><--8---><--8---><3><3>
StringBuilder sb = new StringBuilder();
sb.AppendFormat("{0}", @"^");
sb.AppendFormat("{0}", @"(?<ORDERKEY>.{22})");
sb.AppendFormat("{0}", @"(?<ORDERNUM>.{7})");
sb.AppendFormat("{0}", @"(?<ID>.{9})");
sb.AppendFormat("{0}", @"(?<DATEORDERED>.{8})");
sb.AppendFormat("{0}", @"(?<DATERECIEVED>.{8})");
sb.AppendFormat("{0}", @"(?<INITIALS>.{3})");
sb.AppendFormat("{0}", @"(?<PRIORITY>.{3})");
string pattern = sb.ToString();

int OrderSequence = 0;
using (StreamReader sr = new StreamReader(this.FileNamePathToProcess))
{
    Regex re = new Regex(pattern);
    while (!sr.EndOfStream)
    {
        // Each line is one fixed-length record; the groups slice it into fields.
        Match ma = re.Match(sr.ReadLine());
        if (!ma.Success) continue; // skip lines shorter than the 60-character record

        OrderData myOrderData = new OrderData();
        myOrderData.OrderSequence = ++OrderSequence;
        // TrimEnd() removes the trailing padding from each field.
        myOrderData.OrderKey = ma.Groups["ORDERKEY"].Value.TrimEnd();
        myOrderData.OrderNumber = ma.Groups["ORDERNUM"].Value.TrimEnd();
        myOrderData.PatientSocialSecurity = ma.Groups["ID"].Value.TrimEnd();
        myOrderData.D_ORD = ma.Groups["DATEORDERED"].Value.TrimEnd();
        myOrderData.D_RX = ma.Groups["DATERECIEVED"].Value.TrimEnd();
        myOrderData.OrderTechnicianInitials = ma.Groups["INITIALS"].Value.TrimEnd();
        myOrderData.OrderPriority = ma.Groups["PRIORITY"].Value.TrimEnd();
        this.PackageOrderData.Add(myOrderData);
    }
}
nug·get (nug!it)
n.
1. A small, solid lump, especially of gold.
2. A small compact portion or unit: nuggets of information.
Wednesday, March 23, 2011
Regex: Parse a Fixed Length Text File
In the C language you can use sscanf(), but there is no similar function in .NET. The equivalent approach is to use Regex to map each fixed-length field to a named GROUP, then access the GROUP NAMES to get the values.
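As a self-contained illustration of the idea, the sketch below applies a named-group pattern to a 60-character record laid out like the sample in this post. The class name FixedWidthDemo and the local variable names are placeholders chosen for the example; they are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

class FixedWidthDemo
{
    static void Main()
    {
        // One named group per fixed-width field: 22 + 7 + 9 + 8 + 8 + 3 + 3 = 60 characters.
        string pattern =
            @"^(?<ORDERKEY>.{22})(?<ORDERNUM>.{7})(?<ID>.{9})" +
            @"(?<DATEORDERED>.{8})(?<DATERECIEVED>.{8})(?<INITIALS>.{3})(?<PRIORITY>.{3})";

        // Build a sample record; short fields are padded with trailing spaces.
        string line = "123456789012345".PadRight(22)
                    + "12345".PadRight(7)
                    + "1234567".PadRight(9)
                    + "20100410" + "20100411" + "ABC" + "001";

        Match m = Regex.Match(line, pattern);
        if (!m.Success)
        {
            Console.WriteLine("Line is shorter than the expected 60 characters.");
            return;
        }

        // TrimEnd() drops the padding before the value is used.
        Console.WriteLine(m.Groups["ORDERKEY"].Value.TrimEnd());  // 123456789012345
        Console.WriteLine(m.Groups["ORDERNUM"].Value.TrimEnd());  // 12345
        Console.WriteLine(m.Groups["DATEORDERED"].Value);         // 20100410
        Console.WriteLine(m.Groups["PRIORITY"].Value);            // 001
    }
}
```

Checking Match.Success first matters because a line shorter than the pattern's total width does not match at all, and every group would then come back empty.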
Labels:
.NET
For anyone coming along after me and having a slight problem with the above example: the GROUP NAMES aren't defined in the pattern section of this example. The pattern should look something like this, but without the spaces around the group name (that was the only way I could find to keep them from being stripped from the post).
sb.AppendFormat("{0}", @"(?< ORDERKEY >.{22})");
sb.AppendFormat("{0}", @"(?< ORDERNUM >.{7})");
sb.AppendFormat("{0}", @"(?< ID >.{9})");
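One way to catch this kind of problem early is to ask the compiled Regex which group names it actually contains before any lines are processed. The sketch below uses Regex.GetGroupNames() for that. The class name GroupNameCheck and the expected array are placeholders for illustration, and only the first three fields are shown.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class GroupNameCheck
{
    static void Main()
    {
        Regex re = new Regex(@"^(?<ORDERKEY>.{22})(?<ORDERNUM>.{7})(?<ID>.{9})");

        // Names the parsing code expects to find (hypothetical list for this example).
        string[] expected = { "ORDERKEY", "ORDERNUM", "ID" };

        // GetGroupNames() also returns "0", the group for the whole match.
        string[] actual = re.GetGroupNames();

        foreach (string name in expected)
        {
            if (!actual.Contains(name))
                throw new InvalidOperationException("Pattern is missing group: " + name);
        }
        Console.WriteLine("All expected groups are defined.");
    }
}
```

Asking a Match for a group name the pattern does not define does not throw: Groups["..."] returns a group whose Success is false and whose Value is empty, so a missing name only shows up as blank fields.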