Standardization: Case Conversion

Case conversion is one of the simplest and most impactful data cleansing operations you can apply. Inconsistent capitalization is a common reason for missed matches -- "JOHN SMITH", "john smith", and "John Smith" refer to the same person, but they will not score 100% on exact matching unless the case is standardized first.

Figure: Case conversion nodes on the canvas, with the properties panel showing the field selector.

Available Case Conversion Nodes

MatchLogic provides five case conversion operations:

  • Uppercase

    Converts all characters to uppercase. "john smith" becomes "JOHN SMITH". Best for identifiers, codes, and fields where you want a consistent canonical form for exact matching.

  • Lowercase

    Converts all characters to lowercase. "John SMITH" becomes "john smith". Useful for email addresses and fields where lowercase is the standard format.

  • Title Case

    Capitalizes the first letter of each word. "john smith" becomes "John Smith". Appropriate for person names and place names in most Western conventions.

  • Proper Case

    Context-aware capitalization that goes beyond simple title case. Handles exceptions like "McDonald", "O'Brien", and "van der Berg", which simple title case can get wrong (for example, "Mcdonald"). Use this for person names when accuracy matters more than speed.

  • Reverse Case

    Swaps uppercase to lowercase and vice versa. "John Smith" becomes "jOHN sMITH". This is a specialized operation rarely needed in standard data cleansing, but can be useful for correcting data that was accidentally entered with Caps Lock on.
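
The five operations above can be approximated in a few lines of Python. This is only an illustrative sketch: the function names and the small exception list are hypothetical, and the rules MatchLogic actually applies for Proper Case are not described here, so proper_case below is a stand-in that covers just the three examples mentioned.

```python
# Hypothetical exception list; a real proper-case routine would need a much
# larger, locale-aware set of rules.
LOWERCASE_PARTICLES = {"van", "der", "de", "von"}


def uppercase(value: str) -> str:
    return value.upper()


def lowercase(value: str) -> str:
    return value.lower()


def title_case(value: str) -> str:
    # Capitalize the first letter of each space-separated word, lowercase the rest.
    return " ".join(w[:1].upper() + w[1:].lower() for w in value.split(" "))


def proper_case(value: str) -> str:
    # Assumed rules only: lowercase particles, the "Mc" prefix, and apostrophes.
    words = []
    for word in value.lower().split(" "):
        if word in LOWERCASE_PARTICLES:
            words.append(word)
        elif word.startswith("mc") and len(word) > 2:
            words.append("Mc" + word[2:].capitalize())
        elif "'" in word:
            words.append("'".join(part.capitalize() for part in word.split("'")))
        else:
            words.append(word.capitalize())
    return " ".join(words)


def reverse_case(value: str) -> str:
    # Swap the case of every letter.
    return value.swapcase()


print(title_case("ronald mcdonald"))    # Ronald Mcdonald
print(proper_case("RONALD MCDONALD"))   # Ronald McDonald
print(proper_case("anna van der berg")) # Anna van der Berg
print(reverse_case("jOHN sMITH"))       # John Smith
```

The last line shows the Caps Lock case: applying Reverse Case to "jOHN sMITH" restores "John Smith".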

How to Use a Case Conversion Node

  1. Drag the desired case conversion node from the Standardization category in the left sidebar onto the canvas.
  2. Connect it into your workflow by drawing an edge from the previous node's output to this node's input.
  3. Click the node to open its properties panel on the right.
  4. Select the target field -- the column you want to convert.
  5. The conversion applies automatically when the workflow is executed.
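
Conceptually, a configured node behaves like a function applied to one column of every record when the workflow runs. The sketch below illustrates that idea only; run_case_node, the record layout, and the field names are assumptions made for the example, not MatchLogic's API.

```python
from typing import Callable

Record = dict[str, str]


def run_case_node(
    records: list[Record],
    target_field: str,
    convert: Callable[[str], str],
) -> list[Record]:
    """Apply `convert` to `target_field` in every record; other fields pass through."""
    output = []
    for record in records:
        updated = dict(record)  # work on a copy so the input rows are not mutated
        if updated.get(target_field) is not None:
            updated[target_field] = convert(updated[target_field])
        output.append(updated)
    return output


rows = [
    {"name": "JOHN SMITH", "email": "John.Smith@Example.com"},
    {"name": "john smith", "email": "JOHN.SMITH@EXAMPLE.COM"},
]

# Step 4 selects the target field; step 5 runs the conversion over all rows.
lowered = run_case_node(rows, "email", str.lower)
print(lowered[0]["email"] == lowered[1]["email"])  # True
```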

Which Conversion to Choose

The right choice depends on the field type and your matching strategy:

  • For name fields -- Use Proper Case for the best results, or Uppercase if you plan to use exact matching and want a simple canonical form.
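  • For email fields -- Use Lowercase. The domain part of an email address is case-insensitive by standard, and almost all mail providers treat the part before the @ as case-insensitive in practice.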
  • For address fields -- Use Uppercase to match postal service conventions, or Title Case for readability.
  • For code/ID fields -- Use Uppercase to standardize alphanumeric codes.
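
The guidance above amounts to a small decision table. A minimal sketch, assuming hypothetical field-type labels ("name", "email", "address", "code", "id") and two illustrative flags that are not part of the product:

```python
def choose_conversion(
    field_type: str,
    exact_matching: bool = False,
    readable: bool = False,
) -> str:
    """Return the name of the recommended case conversion for a field type."""
    if field_type == "name":
        # Proper Case by default; Uppercase as a simple canonical form.
        return "Uppercase" if exact_matching else "Proper Case"
    if field_type == "email":
        return "Lowercase"
    if field_type == "address":
        # Uppercase follows postal conventions; Title Case reads better.
        return "Title Case" if readable else "Uppercase"
    if field_type in ("code", "id"):
        return "Uppercase"
    raise ValueError(f"no recommendation for field type: {field_type!r}")


print(choose_conversion("name"))                       # Proper Case
print(choose_conversion("name", exact_matching=True))  # Uppercase
print(choose_conversion("address", readable=True))     # Title Case
```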

Tip

If you are using fuzzy matching (Jaro-Winkler, phonetic), case differences have less impact on match scores than they do on exact matching. Standardizing case before matching is still worthwhile: fuzzy algorithms score higher when both inputs are consistently formatted, because differences in capitalization are no longer counted as differences in content.
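
To see why consistent case helps a character-based similarity score, the sketch below compares strings before and after lowercasing. It uses Python's difflib.SequenceMatcher as a stand-in, case-sensitive measure; it is not Jaro-Winkler, and whether MatchLogic's own matchers normalize case internally is not stated here.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Ratio in [0, 1]; characters are compared exactly, so "J" != "j".
    return SequenceMatcher(None, a, b).ratio()


raw = similarity("JOHN SMITH", "john smith")
normalized = similarity("JOHN SMITH".lower(), "john smith".lower())
typo = similarity("JOHN SMYTH".lower(), "john smith".lower())

print(raw)         # 0.1  (only the space matches)
print(normalized)  # 1.0
print(typo)        # 0.9  (a real spelling difference, after normalizing case)
```

With case standardized, the remaining score difference reflects the actual spelling variation rather than capitalization.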

Important

Case conversion is a destructive operation -- the original casing is lost. If you need to preserve the original values, first use the Copy Field operation to create a copy of the field, then apply the case conversion to the copy.
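
The copy-then-convert pattern looks like this in outline. The copy_field helper and the name_match field name are hypothetical, chosen only to illustrate the order of operations:

```python
def copy_field(record: dict, source: str, target: str) -> dict:
    """Return a new record with `source` duplicated under the name `target`."""
    if target in record:
        raise ValueError(f"field {target!r} already exists")
    copied = dict(record)
    copied[target] = record[source]
    return copied


row = {"name": "Ronald McDonald"}

# 1. Copy the field first, so the original casing survives.
row = copy_field(row, "name", "name_match")

# 2. Apply the case conversion to the copy only.
row["name_match"] = row["name_match"].upper()

print(row)  # {'name': 'Ronald McDonald', 'name_match': 'RONALD MCDONALD'}
```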