Regex Pattern Extraction
The Regex Pattern node is an advanced Smart Operations tool that uses regular expressions to extract or transform data within a field. If the built-in cleansing nodes do not cover your specific transformation need, a custom regex pattern gives you full control over how text is matched and extracted.
What Regex Patterns Can Do
Regular expressions (regex) are powerful text-matching patterns that can identify and extract structured data from free-text fields. With the Regex Pattern node, you can:
- Extract specific portions of a value, such as pulling an email domain from a full email address.
- Identify and capture groups — match parts of a value using parenthesized groups and output only the captured portions.
- Validate data format — flag values that do not match an expected pattern.
- Transform complex patterns — reformat data using capture groups and output templates.
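As a rough illustration of these four operations, the following Python sketch uses the standard `re` module. It is not MatchLogic's implementation: the helper names (`extract`, `capture_groups`, `is_valid`, `transform`) are hypothetical, and Python's regex flavor may differ in details from the engine the node uses.

```python
import re
from typing import Optional, Tuple

# Hypothetical helpers (not MatchLogic APIs) sketching the four operations
# with Python's built-in re module.

def extract(value: str, pattern: str) -> Optional[str]:
    """Return the first capture group, or the whole match if the pattern has no groups."""
    m = re.search(pattern, value)
    if m is None:
        return None  # no match: nothing to extract
    return m.group(1) if m.re.groups else m.group(0)

def capture_groups(value: str, pattern: str) -> Optional[Tuple[str, ...]]:
    """Return every captured group, or None when the value does not match."""
    m = re.search(pattern, value)
    return m.groups() if m else None

def is_valid(value: str, pattern: str) -> bool:
    """Validation: the whole value must match, not just a substring of it."""
    return re.fullmatch(pattern, value) is not None

def transform(value: str, pattern: str, template: str) -> str:
    """Rewrite matches using Python's \\1-style backreferences in the template."""
    return re.sub(pattern, template, value)

print(extract("jane.doe@corp.example", r"@(.+)$"))            # corp.example
print(capture_groups("ID-2024-0042", r"ID-(\d{4})-(\d{4})"))  # ('2024', '0042')
print(is_valid("ABC-123", r"[A-Z]{3}-\d{3}"))                 # True
print(is_valid("AB-123", r"[A-Z]{3}-\d{3}"))                  # False
print(transform("Doe, Jane", r"(\w+), (\w+)", r"\2 \1"))      # Jane Doe
```

Note the use of `re.fullmatch` for validation: `re.search` would accept a value that merely contains a valid fragment.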
Configuring the Regex Node
- Drag the Regex Pattern node from the Smart Operations category onto the canvas.
- Connect it to the other nodes in your workflow.
- Click the node to open the properties panel.
- Select the input field: the column to apply the pattern to.
- Define the pattern: enter your regular expression in the pattern field, using standard regex syntax.
- Configure flags (optional):
  - Case-insensitive: match regardless of upper or lower case.
  - Multiline: treat each line in a multi-line value as a separate match target.
- Set the output format (optional): define a template for how matched groups are assembled in the output.
- Run the node to apply the pattern and view results.
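The configuration steps above can be sketched in Python to show how the input field, pattern, flags, and output template interact. This is a minimal sketch under stated assumptions: `RegexNodeConfig` and `run_regex_node` are hypothetical names, the flags are mapped to Python's `re.IGNORECASE` and `re.MULTILINE`, and the template uses Python's `\1` backreference syntax, which may not match the template syntax MatchLogic expects.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class RegexNodeConfig:
    # Hypothetical mirror of the properties panel; field names are illustrative only.
    input_field: str
    pattern: str
    case_insensitive: bool = False
    multiline: bool = False
    output_template: Optional[str] = None  # Python backreference syntax, e.g. r"\3-\1-\2"

def run_regex_node(rows: List[Dict[str, str]], config: RegexNodeConfig) -> List[Optional[str]]:
    """Apply the configured pattern to one column and return one output per row."""
    flags = 0
    if config.case_insensitive:
        flags |= re.IGNORECASE
    if config.multiline:
        flags |= re.MULTILINE  # ^ and $ also match at line boundaries inside a value
    compiled = re.compile(config.pattern, flags)  # compile once, reuse for every row
    results: List[Optional[str]] = []
    for row in rows:
        value = row.get(config.input_field)
        m = compiled.search(value) if isinstance(value, str) else None
        if m is None:
            results.append(None)  # non-matching values produce empty output
        elif config.output_template is not None:
            results.append(m.expand(config.output_template))  # assemble groups via template
        elif compiled.groups:
            results.append(m.group(1))  # first capture group
        else:
            results.append(m.group(0))  # whole match when there are no groups
    return results

rows = [{"email": "Alice@Example.COM"}, {"email": "bob@test.org"}, {"email": "no-at-sign"}]
config = RegexNodeConfig(input_field="email", pattern=r"@(example\.com)$", case_insensitive=True)
print(run_regex_node(rows, config))  # ['Example.COM', None, None]
```

Without the case-insensitive flag, the first row would not match, because the value contains `Example.COM` rather than `example.com`.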
Examples
Here are some practical regex patterns you might use:
- Extract email domain: pattern `@(.+)$` applied to "user@example.com" extracts "example.com".
- Extract zip code from address: pattern `\b(\d{5})\b` applied to "Springfield, IL 62701" extracts "62701".
- Extract area code from phone: pattern `\((\d{3})\)` applied to "(555) 123-4567" extracts "555".
- Reformat dates: pattern `(\d{2})/(\d{2})/(\d{4})` with an output template to rearrange the date components.
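The examples above can be checked quickly with Python's `re` module before entering them in the node. The date example assumes, for illustration only, an MM/DD/YYYY input rearranged to YYYY-MM-DD; the `\3-\1-\2` template is Python syntax, and MatchLogic's output template syntax may differ. `reformat_date` is a hypothetical helper.

```python
import re

# Extraction examples from the list above; group(1) is the first capture group.
examples = [
    (r"@(.+)$", "user@example.com"),
    (r"\b(\d{5})\b", "Springfield, IL 62701"),
    (r"\((\d{3})\)", "(555) 123-4567"),
]
for pattern, value in examples:
    match = re.search(pattern, value)
    print(value, "->", match.group(1) if match else None)

# Date reformat: rearrange the three captured components with a template.
# Python writes backreferences as \1, \2, \3 (assumed MM/DD/YYYY -> YYYY-MM-DD).
def reformat_date(value: str) -> str:
    return re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", value)

print(reformat_date("03/15/2024"))  # 2024-03-15
```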
Using Pre-Defined Patterns
MatchLogic includes a library of pre-defined regex patterns for common data types such as email addresses, phone numbers, zip codes, and dates. You can select these from the pattern library instead of writing your own regex. Pre-defined patterns can also be customized — use them as a starting point and modify as needed.
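Customizing a pre-defined pattern usually means extending it rather than rewriting it. The sketch below shows the idea in Python; `PATTERN_LIBRARY` and its entries are hypothetical stand-ins, not the actual contents of MatchLogic's library.

```python
import re

# Hypothetical stand-ins for a pre-defined pattern library; these are not
# MatchLogic's actual library entries.
PATTERN_LIBRARY = {
    "us_zip": r"\d{5}",
    "iso_date": r"\d{4}-\d{2}-\d{2}",
}

# Customize a library pattern: also accept an optional ZIP+4 suffix.
custom_zip = PATTERN_LIBRARY["us_zip"] + r"(?:-\d{4})?"

for value in ["62701", "62701-1234", "6270"]:
    # fullmatch: the entire value must fit the pattern
    print(value, bool(re.fullmatch(custom_zip, value)))
```

The non-capturing group `(?:...)` keeps the suffix optional without adding an extra capture group to the output.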
To manage the pattern library, open the regex patterns administration area from the Data Profiling module. For technical details on how regex patterns integrate with the backend, see #node-categories-backend-mapping.
Important
Regex patterns are powerful but can be complex. Test your pattern on a few sample values before running it on the full dataset. An incorrect pattern may produce unexpected results or empty outputs for values that do not match.
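One way to follow this advice is to preview a pattern on a handful of representative values and inspect what it extracts. The Python sketch below uses a hypothetical `preview_pattern` helper and shows a common mistake: without word boundaries, a five-digit zip code pattern also extracts part of a longer number.

```python
import re
from typing import Dict, List, Optional

def preview_pattern(pattern: str, samples: List[str]) -> Dict[str, Optional[str]]:
    """Hypothetical helper: show what a pattern extracts from a few sample values."""
    compiled = re.compile(pattern)
    preview: Dict[str, Optional[str]] = {}
    for value in samples:
        m = compiled.search(value)
        if m is None:
            preview[value] = None  # would produce empty output
        else:
            preview[value] = m.group(1) if compiled.groups else m.group(0)
    return preview

samples = ["Springfield, IL 62701", "Order 1234567", "No zip here"]
print(preview_pattern(r"(\d{5})", samples))      # "Order 1234567" wrongly yields "12345"
print(preview_pattern(r"\b(\d{5})\b", samples))  # only the real zip code is extracted
```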
Tip
If you are not familiar with regular expressions, start with the pre-defined patterns in the library. They cover the most common extraction scenarios and are tested for accuracy.