Custom Regex Patterns

MatchLogic uses regular expression (regex) patterns to validate field values during data profiling. The platform includes built-in patterns for common data types, and you can create custom patterns to validate formats that are specific to your organization.

Important

This is an advanced feature. Custom regex patterns require familiarity with regular expression syntax. If you are not comfortable writing regex, the built-in patterns handle most common validation scenarios.

Figure: Pattern Options modal showing the Default, Custom, and Add New tabs

Accessing Pattern Options

To manage regex patterns, open the Pattern Options modal from the Data Profiling page. The modal has three tabs:

  1. Default Patterns -- Built-in patterns that ship with MatchLogic. Their definitions are read-only, and they cover common formats like email addresses, phone numbers, dates, postal codes, and URLs.
  2. Custom Patterns -- Patterns you have created. These are fully editable and can be toggled on or off.
  3. Add New Pattern -- A form for creating new custom patterns.

Built-In Default Patterns

The default patterns include validators for:

  • Email addresses
  • US and international phone numbers
  • US Social Security numbers
  • ZIP codes (5-digit and ZIP+4)
  • Date formats (multiple variations)
  • URLs
  • IP addresses

Default patterns cannot be edited or deleted, but you can toggle them on or off to control which patterns are applied during profiling.

Creating a Custom Pattern

To create a new pattern:

  1. Open the Pattern Options modal.
  2. Select the Add New Pattern tab.
  3. Enter a Pattern Name -- a descriptive label like "Employee ID" or "Product SKU".
  4. Enter the Regex Pattern -- the regular expression that values should match.
  5. Click Save.
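
Before saving a pattern, it can help to check the expression outside MatchLogic against a few values you expect to pass and fail. The following is a minimal sketch in Python, not part of MatchLogic: the helper name check_pattern is hypothetical, and Python's re module may differ in small ways from the regex engine MatchLogic uses, so treat the result as a first check only.

```python
import re

def check_pattern(regex, should_match, should_not_match):
    """Return a list of problems with a candidate pattern (empty list means OK).

    Hypothetical helper for local testing; it does not talk to MatchLogic.
    """
    try:
        compiled = re.compile(regex)
    except re.error as exc:
        # The expression itself is malformed, so no values can be tested.
        return [f"invalid regex: {exc}"]

    problems = []
    for value in should_match:
        if compiled.fullmatch(value) is None:
            problems.append(f"expected a match but got none: {value!r}")
    for value in should_not_match:
        if compiled.fullmatch(value) is not None:
            problems.append(f"unexpected match: {value!r}")
    return problems

# Try the "Employee ID" style of pattern with one good and two bad values.
problems = check_pattern(
    r"^EMP-\d{6}$",
    should_match=["EMP-001234"],
    should_not_match=["EMP-1234", "emp-001234"],
)
print(problems or "pattern behaves as expected")
```

An empty list means the pattern compiled and every sample behaved as expected; any other result lists what to fix before you click Save.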

Example Custom Patterns

Here are examples of common custom patterns:

  • SSN format: ^\d{3}-\d{2}-\d{4}$ -- Matches Social Security numbers in XXX-XX-XXXX format.
  • Employee ID: ^EMP-\d{6}$ -- Matches IDs like EMP-001234.
  • Product SKU: ^[A-Z]{2,4}-\d{4,8}$ -- Matches codes like ABC-12345.
  • Canadian postal code: ^[A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d$ -- Matches formats like K1A 0B1.
  • US state abbreviation: ^[A-Z]{2}$ -- Matches any two uppercase letters, such as CA or NY. It checks the format only, not whether the code is a real state.
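
The patterns above can be tried against sample values with a short script. This is an illustrative sketch using Python's re module, which is an assumption for demonstration purposes; the sample values are made up, and MatchLogic's own regex engine may treat some edge cases differently.

```python
import re

# The example patterns from the list above.
EXAMPLE_PATTERNS = {
    "SSN format": r"^\d{3}-\d{2}-\d{4}$",
    "Employee ID": r"^EMP-\d{6}$",
    "Product SKU": r"^[A-Z]{2,4}-\d{4,8}$",
    "Canadian postal code": r"^[A-Za-z]\d[A-Za-z]\s?\d[A-Za-z]\d$",
    "US state abbreviation": r"^[A-Z]{2}$",
}

# Illustrative samples: (value that should match, value that should not).
SAMPLES = {
    "SSN format": ("123-45-6789", "123456789"),
    "Employee ID": ("EMP-001234", "EMP-1234"),
    "Product SKU": ("ABC-12345", "abc-12345"),
    "Canadian postal code": ("K1A 0B1", "K1A-0B1"),
    "US state abbreviation": ("CA", "Cal"),
}

results = {}
for name, regex in EXAMPLE_PATTERNS.items():
    good, bad = SAMPLES[name]
    # Each entry records whether the good and the bad sample matched.
    results[name] = (bool(re.search(regex, good)), bool(re.search(regex, bad)))
    print(f"{name}: {good!r} -> {results[name][0]}, {bad!r} -> {results[name][1]}")
```

Note that a format-only pattern accepts anything with the right shape: the state abbreviation pattern also matches a value like ZZ. If you need to reject such values, list the allowed codes explicitly in the expression.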

Managing Custom Patterns

On the Custom Patterns tab, you can:

  • Toggle patterns on/off -- Disabled patterns are not applied during profiling. This lets you keep patterns for future use without affecting current analysis.
  • Edit patterns -- Update the name or regular expression of any custom pattern.
  • Delete patterns -- Remove patterns you no longer need.
  • Reset to defaults -- Restore the built-in default patterns to their original settings, for example after some of them have been toggled off.

How Patterns Affect Profiling

When you run a data profile, all enabled patterns (both default and custom) are tested against every field. Fields whose values match a pattern are reported as valid for that pattern type. This drives the validity scores and pattern classification shown in the Validity and Pattern Analysis charts (https://help.matchlogic.io/article/228-validity-and-pattern-analysis).
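
To make the idea concrete, the sketch below tests every enabled pattern against every field and reports the share of non-empty values that match. This is an assumption about how such a score could be computed, not MatchLogic's actual formula; the function name validity_scores, the field names, and the sample rows are all hypothetical.

```python
import re

def validity_scores(rows, patterns):
    """Return {(field, pattern_name): share of non-empty values that match}.

    Assumed scoring rule for illustration only.
    """
    compiled = {name: re.compile(regex) for name, regex in patterns.items()}
    fields = sorted({field for row in rows for field in row})
    scores = {}
    for field in fields:
        # Skip missing and empty values so they do not count as failures.
        values = [str(row[field]) for row in rows if row.get(field) not in (None, "")]
        for name, regex in compiled.items():
            matched = sum(1 for value in values if regex.search(value))
            scores[(field, name)] = matched / len(values) if values else 0.0
    return scores

# Hypothetical sample data with two fields.
rows = [
    {"emp_id": "EMP-001234", "state": "CA"},
    {"emp_id": "EMP-98765", "state": "NY"},
    {"emp_id": "EMP-555555", "state": "Texas"},
    {"emp_id": "", "state": "WA"},
]
patterns = {
    "Employee ID": r"^EMP-\d{6}$",
    "US state abbreviation": r"^[A-Z]{2}$",
}

scores = validity_scores(rows, patterns)
for (field, name), score in sorted(scores.items()):
    print(f"{field} / {name}: {score:.2f}")
```

Because every pattern runs against every field, the amount of work grows with the number of enabled patterns multiplied by the number of fields and rows.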

Tip

Start with the default patterns and only add custom patterns when you have domain-specific formats that the defaults do not cover. Too many custom patterns can slow down profiling on very large datasources.