Standardization: Case Conversion
Case conversion is one of the simplest and most impactful data cleansing operations you can apply. Inconsistent capitalization is a common reason for missed matches -- "JOHN SMITH", "john smith", and "John Smith" refer to the same person, but they will not score 100% on exact matching unless case is standardized first.
Available Case Conversion Nodes
MatchLogic provides five case conversion operations:
- Uppercase
Converts all characters to uppercase. "john smith" becomes "JOHN SMITH". Best for identifiers, codes, and fields where you want a consistent canonical form for exact matching.
- Lowercase
Converts all characters to lowercase. "John SMITH" becomes "john smith". Useful for email addresses and fields where lowercase is the standard format.
- Title Case
Capitalizes the first letter of each word. "john smith" becomes "John Smith". Appropriate for person names and place names in most Western conventions.
- Proper Case
Context-aware capitalization that goes beyond simple title case. Handles exceptions such as "McDonald", "O'Brien", and "van der Berg", which capitalizing only the first letter of each word can get wrong (for example, "Mcdonald" or "Van Der Berg"). Use this for person names when accuracy matters more than speed.
- Reverse Case
Swaps uppercase to lowercase and vice versa. "John Smith" becomes "jOHN sMITH". This is a specialized operation rarely needed in standard data cleansing, but can be useful for correcting data that was accidentally entered with Caps Lock on.
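The five operations above can be approximated in a few lines of Python. This is a minimal sketch for illustration only, not MatchLogic's implementation: the function names, the `CONVERSIONS` table, and the small rule set used for proper case (an "Mc" prefix, apostrophes, and a handful of lowercase particles) are all assumptions.

```python
import re

# Hypothetical particle list for the proper-case sketch; a real rule set would be larger.
LOWERCASE_PARTICLES = {"van", "der", "de", "von"}


def title_case(text):
    """Capitalize the first letter of each space-separated word, lowercase the rest."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))


def proper_case(text):
    """Title case plus a few name-specific exceptions (assumed rules)."""
    words = []
    for word in text.lower().split(" "):
        if word in LOWERCASE_PARTICLES:
            words.append(word)  # keep particles such as "van" and "der" lowercase
            continue
        word = word[:1].upper() + word[1:]
        # Capitalize the letter that follows an "Mc" prefix: "Mcdonald" -> "McDonald"
        word = re.sub(r"^Mc([a-z])", lambda m: "Mc" + m.group(1).upper(), word)
        # Capitalize the letter that follows an apostrophe: "O'brien" -> "O'Brien"
        word = re.sub(r"'([a-z])", lambda m: "'" + m.group(1).upper(), word)
        words.append(word)
    return " ".join(words)


# One callable per operation described above.
CONVERSIONS = {
    "uppercase": str.upper,
    "lowercase": str.lower,
    "title_case": title_case,
    "proper_case": proper_case,
    "reverse_case": str.swapcase,
}

if __name__ == "__main__":
    print(CONVERSIONS["uppercase"]("john smith"))      # JOHN SMITH
    print(CONVERSIONS["title_case"]("mcdonald"))       # Mcdonald
    print(CONVERSIONS["proper_case"]("mcdonald"))      # McDonald
    print(CONVERSIONS["reverse_case"]("John Smith"))   # jOHN sMITH
```

The contrast between the `title_case` and `proper_case` results for "mcdonald" shows why the extra rules matter for person names.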
How to Use a Case Conversion Node
- Drag the desired case conversion node from the Standardization category in the left sidebar onto the canvas.
- Connect it into your workflow by drawing an edge from the previous node's output to this node's input.
- Click the node to open its properties panel on the right.
- Select the target field -- the column you want to convert.
- The conversion applies automatically when the workflow is executed.
Which Conversion to Choose
The right choice depends on the field type and your matching strategy:
- For name fields -- Use Proper Case for the best results, or Uppercase if you plan to use exact matching and want a simple canonical form.
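- For email fields -- Use Lowercase. Domain names are case-insensitive, and although the mail standard technically permits case-sensitive local parts (the text before the @), virtually all mail providers treat addresses as case-insensitive in practice.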
- For address fields -- Use Uppercase to match postal service conventions, or Title Case for readability.
- For code/ID fields -- Use Uppercase to standardize alphanumeric codes.
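The recommendations above amount to a per-field conversion plan. The sketch below shows one way to express such a plan in Python, assuming an exact-matching strategy (so names use uppercase). The field names, the `PLAN` mapping, and the `standardize` helper are hypothetical and are not part of MatchLogic.

```python
import string

# Conversions referenced by the plan; string.capwords capitalizes each whitespace-separated word.
CONVERTERS = {
    "upper": str.upper,
    "lower": str.lower,
    "title": string.capwords,
}

# Hypothetical plan: field name -> conversion, following the guidance above.
PLAN = {
    "name": "upper",         # simple canonical form for exact matching
    "email": "lower",        # lowercase is the conventional format
    "address": "upper",      # postal-service style
    "customer_id": "upper",  # standardize alphanumeric codes
}


def standardize(record, plan):
    """Return a new record with each planned field converted; other fields pass through."""
    return {
        field: CONVERTERS[plan[field]](value) if field in plan else value
        for field, value in record.items()
    }


if __name__ == "__main__":
    raw = {
        "name": "John Smith",
        "email": "John.Smith@Example.COM",
        "address": "12 main st",
        "customer_id": "ab-1029x",
        "notes": "Prefers email",
    }
    print(standardize(raw, PLAN))
```

Switching the address entry to `"title"` would favor readability over postal conventions, mirroring the trade-off described in the list.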
Tip
If you are using fuzzy matching (Jaro-Winkler, phonetic), case differences have less impact on match scores than they do with exact matching. Standardizing case before matching is still worthwhile: character-based algorithms such as Jaro-Winkler typically compare characters exactly, so "SMITH" and "smith" can score as dissimilar unless both values are in the same case.
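The effect of case on a character-based similarity score can be illustrated with Python's standard-library `difflib.SequenceMatcher`. It is used here only as a stand-in: it is not Jaro-Winkler and does not reflect how MatchLogic computes scores. The `similarity` helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Character-based similarity ratio between 0.0 and 1.0 (case-sensitive)."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    # Same name, different case: compared raw, then after lowercasing both sides.
    raw = similarity("JOHN SMITH", "john smith")
    normalized = similarity("JOHN SMITH".lower(), "john smith".lower())
    print(f"raw={raw:.2f} normalized={normalized:.2f}")

    # A genuine spelling difference is scored fairly only once case is aligned.
    typo_raw = similarity("JOHN SMYTH", "john smith")
    typo_normalized = similarity("JOHN SMYTH".lower(), "john smith".lower())
    print(f"typo_raw={typo_raw:.2f} typo_normalized={typo_normalized:.2f}")
```

Once both inputs share a case, the score reflects only real spelling differences rather than capitalization noise.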
Important
Case conversion is a destructive operation -- the original casing is lost. If you need to preserve the original values, use the Copy Field operation first to create a copy of the field, then apply the case conversion to the copy.
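The copy-then-convert pattern can be sketched in Python as follows. This is an illustration of the idea only, under the assumption that records are plain dictionaries; the `convert_copy` helper and the `name_std` field name are hypothetical and do not correspond to MatchLogic nodes.

```python
def convert_copy(records, field, copy_field, convert):
    """Copy `field` into `copy_field` and convert only the copy, leaving the original intact."""
    result = []
    for record in records:
        new_record = dict(record)                      # do not mutate the input record
        new_record[copy_field] = convert(record[field])
        result.append(new_record)
    return result


if __name__ == "__main__":
    customers = [{"name": "John McDonald"}, {"name": "jane o'brien"}]
    for row in convert_copy(customers, "name", "name_std", str.upper):
        print(row)
```

Matching can then run against the standardized copy while reports and exports continue to use the original field.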