Address Parser
The Address Parser is a smart operation that automatically splits a full address string into its individual components: street number, street name, city, state, and zip code. Parsing addresses into components improves match quality because you can compare like with like (city to city, zip to zip) instead of relying on fuzzy matching of entire address strings.
How Address Parsing Works
The Address Parser analyzes each address value and identifies its structural components. For example:
- Input: "123 Main Street, Springfield, IL 62704"
- Output columns:
  - Parsed_StreetNumber: "123"
  - Parsed_StreetName: "Main Street"
  - Parsed_City: "Springfield"
  - Parsed_State: "IL"
  - Parsed_ZipCode: "62704"
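To make the input-to-output mapping concrete, here is a minimal Python sketch that splits a simple US address into the five components. It is an illustration only: the regular expression and the `parse_address` function are assumptions made for this example and do not represent the Address Parser's actual implementation, which is not described here.

```python
import re

# Hypothetical pattern for "<number> <street>, <city>, <ST> <zip>".
# It covers only the simple format shown in the example above.
ADDRESS_RE = re.compile(
    r"^\s*(?P<StreetNumber>\d+)\s+(?P<StreetName>[^,]+?)\s*,\s*"
    r"(?P<City>[^,]+?)\s*,?\s+(?P<State>[A-Za-z]{2})\s+"
    r"(?P<ZipCode>\d{5}(?:-\d{4})?)\s*$"
)

COMPONENTS = ("StreetNumber", "StreetName", "City", "State", "ZipCode")


def parse_address(address: str, prefix: str = "Parsed_") -> dict:
    """Return one prefixed output column per component.

    Addresses that do not fit the pattern yield empty strings,
    mirroring the "empty value for a missing component" behavior.
    """
    match = ADDRESS_RE.match(address)
    if match is None:
        return {prefix + name: "" for name in COMPONENTS}
    return {prefix + name: match.group(name).strip() for name in COMPONENTS}


if __name__ == "__main__":
    row = parse_address("123 Main Street, Springfield, IL 62704")
    for column, value in row.items():
        print(f"{column}: {value!r}")
```

A single regular expression is enough for this illustration, but it fails on anything outside the one format it encodes, which is why a dedicated parser is preferable for real data.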
Configuring the Address Parser
- Drag the Address Parser node from the Smart Operations category onto the canvas.
- Connect it into your workflow.
- Click the node and configure the properties:
- Target field -- Select the column containing full address strings.
- Prefix -- Enter a prefix for the output column names (e.g., "Parsed_"). This prefix is prepended to each component name to create the output column names.
The parser creates new columns automatically. With a prefix of "Parsed_", you will get columns named Parsed_StreetNumber, Parsed_StreetName, Parsed_City, Parsed_State, and Parsed_ZipCode.
Prefix Requirements
The prefix serves two purposes:
- It avoids naming conflicts with existing columns in your datasource.
- It makes the parsed columns easy to identify in the data preview and in match definition configuration.
Important
Each Address Parser node in your workflow must use a unique prefix. If you parse two different address fields (e.g., "HomeAddress" and "WorkAddress"), use different prefixes like "Home_" and "Work_" to avoid column name collisions.
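The naming rule and the uniqueness requirement can be checked with a short Python sketch. The helper names `output_columns` and `find_collisions` are hypothetical and exist only for this example; the product may detect or report collisions differently.

```python
# Component names as listed in this article.
COMPONENTS = ("StreetNumber", "StreetName", "City", "State", "ZipCode")


def output_columns(prefix: str) -> list:
    """Column names produced by one parser node with the given prefix."""
    return [prefix + name for name in COMPONENTS]


def find_collisions(existing_columns, prefixes) -> list:
    """Return output columns that clash with existing columns or with
    columns produced by an earlier parser node in the same workflow."""
    seen = set(existing_columns)
    collisions = []
    for prefix in prefixes:
        for column in output_columns(prefix):
            if column in seen:
                collisions.append(column)
            seen.add(column)
    return collisions


if __name__ == "__main__":
    source = ["HomeAddress", "WorkAddress"]
    print(find_collisions(source, ["Home_", "Work_"]))      # distinct prefixes
    print(find_collisions(source, ["Parsed_", "Parsed_"]))  # reused prefix
```

With distinct prefixes the collision list is empty; reusing "Parsed_" for both nodes reports all five column names from the second node.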
Why Parse Addresses for Matching
Matching on full address strings is unreliable because of the many ways addresses can be written:
- "123 Main St" vs "123 Main Street" vs "123 Main St."
- "Springfield, IL 62704" vs "Springfield IL 62704" vs "SPRINGFIELD, IL 62704"
By parsing into components, you can:
- Exact match on zip code -- A strong first-pass filter.
- Fuzzy match on street name -- Handles abbreviation differences.
- Exact match on street number -- Usually consistent.
- Weight components differently -- Give zip code and street number higher weights, street name moderate weight, and city/state lower weight (since they correlate with zip).
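The following Python sketch shows one way the component-level strategy above could be combined into a single score. The weights, the `match_score` function, and the use of `difflib.SequenceMatcher` for fuzzy comparison are all assumptions chosen for illustration; they are not the product's scoring method, and real weights should be tuned in your match definition.

```python
from difflib import SequenceMatcher

# Hypothetical weights: zip code and street number highest, street name
# moderate, city/state lowest because they correlate with zip code.
WEIGHTS = {
    "ZipCode": 0.3,
    "StreetNumber": 0.3,
    "StreetName": 0.2,
    "City": 0.1,
    "State": 0.1,
}
FUZZY_COMPONENTS = {"StreetName"}  # every other component is compared exactly


def component_score(name: str, a: str, b: str) -> float:
    """Score one component between 0.0 and 1.0, ignoring case."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0  # a missing component contributes nothing
    if name in FUZZY_COMPONENTS:
        return SequenceMatcher(None, a, b).ratio()
    return 1.0 if a == b else 0.0


def match_score(record_a: dict, record_b: dict) -> float:
    """Weighted sum of component scores for two parsed addresses."""
    return sum(
        weight * component_score(name, record_a.get(name, ""), record_b.get(name, ""))
        for name, weight in WEIGHTS.items()
    )


if __name__ == "__main__":
    a = {"StreetNumber": "123", "StreetName": "Main St", "City": "Springfield",
         "State": "IL", "ZipCode": "62704"}
    b = dict(a, StreetName="Main Street", City="SPRINGFIELD")
    print(round(match_score(a, b), 3))
```

Because street name is the only fuzzy component, "Main St" versus "Main Street" lowers the score slightly, while a different zip code or street number removes that component's full weight.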
Handling Edge Cases
The parser handles most standard US address formats. Be aware of these edge cases:
- PO Boxes -- "PO Box 1234" may parse differently than street addresses.
- Apartment/suite numbers -- In "123 Main St Apt 4B", the apartment component may be included with the street name.
- International addresses -- The parser is optimized for US address formats. International addresses may not parse correctly.
- Incomplete addresses -- Missing components will result in empty values in the corresponding output columns.
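One way to find records affected by these edge cases is to flag them before or after parsing so they can be reviewed separately. The sketch below is a rough heuristic written for this example; the patterns and the `edge_case_flags` function are assumptions, not part of the product.

```python
import re

# Hypothetical patterns; extend them to fit your own data.
PO_BOX_RE = re.compile(r"\bP\.?\s*O\.?\s*Box\s+\d+", re.IGNORECASE)
UNIT_RE = re.compile(
    r"\b(?:Apt|Apartment|Suite|Ste|Unit)\b\.?\s*\w+|#\s*\w+",
    re.IGNORECASE,
)


def edge_case_flags(address: str) -> dict:
    """Flag addresses that may not parse like a standard street address."""
    return {
        "po_box": bool(PO_BOX_RE.search(address)),
        "unit": bool(UNIT_RE.search(address)),
        "empty": not address.strip(),
    }


if __name__ == "__main__":
    for sample in ("PO Box 1234", "123 Main St Apt 4B", "   "):
        print(repr(sample), edge_case_flags(sample))
```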
Tip
After parsing, apply additional cleansing to the parsed components. For example, use case conversion (see #standardization-case-conversion) on street names, and use replace and remove operations (https://help.matchlogic.io/article/238-replace-and-remove-operations) to standardize abbreviations such as "St" to "Street" in the Parsed_StreetName column.
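As an illustration of what this cleansing achieves, the Python sketch below applies a case conversion and then expands a trailing street-type abbreviation. The abbreviation table and the `standardize_street_name` function are assumptions for this example; in the product you would configure the equivalent case conversion and replace operations instead of writing code.

```python
# Hypothetical abbreviation table; add entries to fit your data.
ABBREVIATIONS = {
    "st": "Street",
    "ave": "Avenue",
    "rd": "Road",
    "blvd": "Boulevard",
}


def standardize_street_name(value: str) -> str:
    """Capitalize each word, then expand an abbreviation in the last word.

    Only the final word is expanded so that a leading "St" (as in
    "St Charles Ave") is not mistaken for "Street".
    """
    words = value.strip().split()
    if not words:
        return ""
    result = [word.capitalize() for word in words]
    last = words[-1].rstrip(".").lower()
    if last in ABBREVIATIONS:
        result[-1] = ABBREVIATIONS[last]
    return " ".join(result)


if __name__ == "__main__":
    for name in ("Main St", "MAIN ST.", "main street", "St Charles Ave"):
        print(repr(name), "->", repr(standardize_street_name(name)))
```

After this step, "Main St", "MAIN ST.", and "main street" all become the same value, so even an exact comparison on the street name would match them.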