Removing Characters by Type

MatchLogic provides a set of character-type removal nodes that strip specific categories of characters from field values. These operations are essential for isolating the relevant content in a field: extracting just the digits from a phone number, removing numbers from a name field, or stripping special characters from an address.

Character removal nodes listed in the Cleansing category

Available Character Removal Nodes

  • Remove Numbers

    Strips all digit characters (0-9) from values. "Suite 100" becomes "Suite " (note the trailing space). Use on name or text fields where numeric content is unwanted.

  • Remove Letters

    Strips all alphabetic characters (A-Z, a-z) from values. "Phone: 555-1234" becomes ": 555-1234". Use to drop alphabetic labels from mixed fields while keeping digits and punctuation; to keep digits only, use Remove Non-Numeric.

  • Remove Special Characters

    Strips punctuation and symbols (periods, commas, hyphens, slashes, brackets, etc.) while preserving letters, numbers, and spaces. "O'Brien-Smith" becomes "OBrienSmith". Use to normalize names and addresses that contain inconsistent punctuation.

  • Remove Non-Alphanumeric

    Removes everything except letters and numbers. "John Smith (Jr.)" becomes "JohnSmithJr". Note that spaces are also removed. Useful for creating clean keys for comparison.

  • Remove Non-Alpha

    Removes everything except letters. "John123 Smith!" becomes "JohnSmith". Useful when you need a purely alphabetic value for phonetic matching.

  • Remove Non-Numeric

    Removes everything except digits. "(555) 123-4567" becomes "5551234567". The most common use case is normalizing phone numbers to a consistent digit-only format.

  • Remove Non-Printable

    Removes invisible control characters, zero-width characters, and other non-printable content. These characters do not show on screen, but they cause otherwise identical strings to compare as different. Always apply this operation if your data came from legacy systems or web scraping, or was copied and pasted from documents.
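The descriptions above map naturally onto character classes. The following Python sketch approximates each node with a regular expression or a Unicode category check. It is not MatchLogic's implementation: the function names are hypothetical, letters and digits are assumed to be the ASCII ranges listed above (how the product treats accented or non-Latin letters is not covered in this section), and "non-printable" is approximated as any character in a Unicode "Other" category.

```python
import re
import unicodedata

# Hypothetical stand-ins for the removal nodes.
# Assumption: letters and digits mean the ASCII ranges A-Z, a-z, and 0-9.

def remove_numbers(value: str) -> str:
    return re.sub(r"[0-9]", "", value)

def remove_letters(value: str) -> str:
    return re.sub(r"[A-Za-z]", "", value)

def remove_special_characters(value: str) -> str:
    # Keep letters, digits, and the space character; drop everything else.
    return re.sub(r"[^A-Za-z0-9 ]", "", value)

def remove_non_alphanumeric(value: str) -> str:
    # Same as above, except spaces are removed too.
    return re.sub(r"[^A-Za-z0-9]", "", value)

def remove_non_alpha(value: str) -> str:
    return re.sub(r"[^A-Za-z]", "", value)

def remove_non_numeric(value: str) -> str:
    return re.sub(r"[^0-9]", "", value)

def remove_non_printable(value: str) -> str:
    # Assumption: drop every Unicode "Other" character (Cc controls, Cf format
    # characters such as zero-width space U+200B and BOM U+FEFF, Cs, Co, Cn).
    # Tab and newline are Cc, so this sketch removes them as well.
    return "".join(ch for ch in value if not unicodedata.category(ch).startswith("C"))

print(repr(remove_numbers("Suite 100")))             # 'Suite '
print(remove_letters("Phone: 555-1234"))             # : 555-1234
print(remove_special_characters("O'Brien-Smith"))    # OBrienSmith
print(remove_non_alphanumeric("John Smith (Jr.)"))   # JohnSmithJr
print(remove_non_alpha("John123 Smith!"))            # JohnSmith
print(remove_non_numeric("(555) 123-4567"))          # 5551234567
print(repr(remove_non_printable("Smith\u200b")))     # 'Smith'
```

The last line shows why non-printable removal matters: "Smith\u200b" looks identical to "Smith" on screen, yet the two strings are not equal until the zero-width space is removed.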

Common Use Cases

Normalizing Phone Numbers

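Phone numbers are stored in countless formats: (555) 123-4567, 555-123-4567, 555.123.4567, +1 555 123 4567. Apply Remove Non-Numeric to reduce them to digit strings such as "5551234567", which can then be matched exactly. Note that a country code survives as digits: "+1 555 123 4567" becomes "15551234567", so numbers entered with and without a country code need an additional step before they are consistent.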
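A minimal Python sketch of this normalization follows. The `re.sub` call plays the role of Remove Non-Numeric; the country-code step afterwards is a hypothetical addition, not a MatchLogic node, and it assumes 10-digit national numbers with country code "1".

```python
import re

def normalize_phone(raw: str, country_code: str = "1") -> str:
    # Step 1 (Remove Non-Numeric): keep digits only.
    digits = re.sub(r"[^0-9]", "", raw)
    # Step 2 (hypothetical): drop a leading country code when the result is
    # exactly one national number plus that code. Assumes 10-digit numbers.
    if len(digits) == 10 + len(country_code) and digits.startswith(country_code):
        digits = digits[len(country_code):]
    return digits

samples = ["(555) 123-4567", "555-123-4567", "555.123.4567", "+1 555 123 4567"]
print({normalize_phone(s) for s in samples})  # {'5551234567'}
```

Without step 2, the last sample would produce "15551234567" and fail an exact match against the other three.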

Cleaning Name Fields

Name fields sometimes contain numbers (from concatenated IDs) or special characters (from encoding issues). Apply Remove Non-Printable to handle invisible characters, then Remove Numbers to strip digits.
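As a sketch, the two steps can be chained in Python. Remove Non-Printable and Remove Numbers strip disjoint sets of characters, so their relative order does not change the result; the chain follows the non-printable-first convention anyway. The helper names are hypothetical, and the non-printable check assumes Unicode "Other" categories approximate what the node removes.

```python
import re
import unicodedata

def remove_non_printable(value: str) -> str:
    # Approximation: drop control, zero-width, and other Unicode "Other" characters.
    return "".join(ch for ch in value if not unicodedata.category(ch).startswith("C"))

def remove_numbers(value: str) -> str:
    return re.sub(r"[0-9]", "", value)

def clean_name(raw: str) -> str:
    # Non-printable first, then digits.
    return remove_numbers(remove_non_printable(raw))

print(repr(clean_name("Smith\u200b1047")))     # 'Smith'
print(repr(clean_name("Maria\x00 Lopez 88")))  # 'Maria Lopez '
```

The second result keeps a trailing space where the digits used to be, which is why whitespace trimming follows character removal.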

Creating Match Keys

Use Remove Non-Alphanumeric to create simplified versions of fields for blocking or exact comparison. "123 Main St., Apt #4" becomes "123MainStApt4".
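The sketch below builds such a key in Python and uses it for a simple blocking pass that groups records whose keys agree. The `match_key` helper and the grouping logic are hypothetical illustrations, and lowercasing stands in for a case-conversion step applied last.

```python
import re
from collections import defaultdict

def match_key(value: str) -> str:
    # Remove Non-Alphanumeric: keep only A-Z, a-z, and 0-9.
    return re.sub(r"[^A-Za-z0-9]", "", value)

print(match_key("123 Main St., Apt #4"))  # 123MainStApt4

# Hypothetical blocking step: group addresses by lowercased key.
addresses = ["123 Main St., Apt #4", "123 Main St Apt 4", "123 main st apt 4", "9 Elm Rd."]
blocks = defaultdict(list)
for address in addresses:
    blocks[match_key(address).lower()].append(address)

for key, members in blocks.items():
    print(key, len(members))
# 123mainstapt4 3
# 9elmrd 1
```

Because spaces disappear, different values can collide: "12 3 Main" and "123 Main" both become "123Main". That is usually acceptable for blocking, where candidates are still compared in detail, but riskier for exact comparison.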

Order of Operations

When combining character removal with other transformations, consider the order:

  1. Remove non-printable characters first (always safe).
  2. Apply character-type removal next.
  3. Then trim whitespace (removal operations may leave extra spaces).
  4. Apply case conversion last.
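The ordering can be sketched in Python as a list of steps applied in sequence. The sample value begins with an invisible byte-order mark (U+FEFF), which Python's `str.split()` does not treat as whitespace, and contains a hyphen between spaces. Trimming before removal therefore leaves a leading space and a doubled space behind. The step functions are hypothetical stand-ins for the nodes, trimming is assumed to also collapse internal runs of spaces, and uppercase is an arbitrary choice of case conversion.

```python
import re
import unicodedata

def remove_non_printable(value: str) -> str:
    # Approximation: drop all Unicode "Other" characters (controls, zero-width, BOM).
    return "".join(ch for ch in value if not unicodedata.category(ch).startswith("C"))

def remove_special_characters(value: str) -> str:
    # Keep letters, digits, and spaces.
    return re.sub(r"[^A-Za-z0-9 ]", "", value)

def trim_whitespace(value: str) -> str:
    # Assumption: trimming also collapses internal runs of whitespace.
    return " ".join(value.split())

def to_upper(value: str) -> str:
    return value.upper()

RECOMMENDED = [remove_non_printable, remove_special_characters, trim_whitespace, to_upper]
TRIM_FIRST = [trim_whitespace, remove_non_printable, remove_special_characters, to_upper]

def cleanse(value: str, steps) -> str:
    # Apply each step to the output of the previous one.
    for step in steps:
        value = step(value)
    return value

SAMPLE = "\ufeff O'Brien - Smith "  # starts with an invisible byte-order mark
print(repr(cleanse(SAMPLE, RECOMMENDED)))  # 'OBRIEN SMITH'
print(repr(cleanse(SAMPLE, TRIM_FIRST)))   # ' OBRIEN  SMITH'
```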

Tip

Before applying a character removal operation, use Copy Field to duplicate the field. Apply the removal to the copy and keep the original for reference, so you can compare original and cleansed values side by side in the preview.
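The copy-then-cleanse pattern can be sketched in Python with records as dictionaries. The field names are hypothetical, and the regular expression stands in for Remove Non-Numeric; in MatchLogic the preview provides the side-by-side view.

```python
import re

# Hypothetical records; "phone_clean" stands in for the field created by Copy Field.
rows = [
    {"phone": "(555) 123-4567"},
    {"phone": "+1 555 123 4567"},
]

for row in rows:
    row["phone_clean"] = row["phone"]                               # duplicate the field
    row["phone_clean"] = re.sub(r"[^0-9]", "", row["phone_clean"])  # cleanse only the copy

for row in rows:
    # Original and cleansed values side by side.
    print(f"{row['phone']:<18} -> {row['phone_clean']}")
```

Comparing both columns makes surprises visible: the second row, for example, keeps the leading 1 from its country code, while its original value is still available for reference.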

Important

Character removal operations cannot be undone once applied to your data: the removed characters cannot be recovered from the cleansed values. Within the flow builder, however, you can use Undo, Redo, and Workflow History to reverse workflow changes, and you can always re-import or re-profile your original data.