Match Type — Similar Text

Similar Text, also known as fuzzy matching, is a comparison method that tolerates minor differences between values. Instead of requiring an exact character-by-character match, it calculates a similarity score that reflects how close two values are to each other.

Figure: A match criterion row with Similar Text selected as the match type, showing the Match Strictness slider ranging from lenient on the left to strict on the right.

How Similar Text Works

Similar Text uses edit-distance algorithms to measure how many changes (insertions, deletions, substitutions) are needed to transform one value into the other. The result is a similarity score from 0% to 100%, where 100% means the values are identical and lower scores indicate increasing differences.

Examples:

  • John vs Jon — approximately 85% similarity (one character difference)
  • Robert vs Robart — approximately 83% similarity (one substitution)
  • 123 Main Street vs 123 Main St — high similarity despite the abbreviation
  • Smith vs Johnson — very low similarity (completely different values)
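
The scoring idea can be sketched in a few lines of Python. This is only an illustration, not the product's actual algorithm, which this article does not specify. The function name `similarity` is hypothetical, and the sketch assumes one common variant of edit distance in which a substitution counts as a deletion plus an insertion. That variant happens to land close to the approximate figures listed above.

```python
def similarity(a: str, b: str) -> float:
    """Return a 0-100 score from an insert/delete edit distance (assumed variant)."""
    if not a and not b:
        return 100.0  # two empty values are treated as identical

    # Compute the longest common subsequence (LCS) length row by row.
    # prev[j] is the LCS length of the characters of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr

    total = len(a) + len(b)
    # Every character outside the LCS must be deleted or inserted.
    distance = total - 2 * prev[-1]
    return 100.0 * (total - distance) / total


print(f"{similarity('John', 'Jon'):.1f}")                       # 85.7
print(f"{similarity('Robert', 'Robart'):.1f}")                  # 83.3
print(f"{similarity('123 Main Street', '123 Main St'):.1f}")    # 84.6
print(f"{similarity('Smith', 'Johnson'):.1f}")                  # 16.7
```

Note that the comparison in this sketch is case-sensitive. Whether the product ignores case is not stated here.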

Configuring Match Strictness

The Match Strictness slider controls the minimum similarity score required for two values to be considered a match. This is the most important setting for fuzzy matching:

  • Lower strictness (e.g., 60-70%) — more lenient. Catches more potential matches, including ones with significant differences. Higher risk of false positives.
  • Higher strictness (e.g., 85-95%) — stricter. Only very similar values match. Fewer results but higher accuracy.
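
The effect of the slider can be illustrated with a small Python sketch. It uses `difflib.SequenceMatcher` from the standard library as a stand-in scorer. That is an assumption made for illustration, and the product's own scoring may differ. The helper names `similarity` and `is_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score on a 0-100 scale (not the product's algorithm)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def is_match(a: str, b: str, strictness: float) -> bool:
    """Two values match when their score reaches the strictness threshold."""
    return similarity(a, b) >= strictness


pairs = [("John", "Jon"), ("Robert", "Robart"), ("Smith", "Johnson")]

# The same pairs evaluated at a lenient, a moderate, and a strict setting.
for strictness in (65, 85, 95):
    matched = [pair for pair in pairs if is_match(*pair, strictness)]
    print(strictness, matched)
```

With this stand-in scorer, the lenient setting accepts both near-duplicate name pairs, the moderate setting keeps only the closest pair, and the strict setting accepts none of them.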

Tip

A good starting point for name fields is 75-85% strictness. For address fields, try 70-80%. Run a match, review the results, and adjust the strictness up or down based on the quality of matches you see.

Best Use Cases

Similar Text is the right choice for fields that may contain:

  • Typos — data entry errors like "Micheal" instead of "Michael"
  • Abbreviations — "St" vs "Street", "Corp" vs "Corporation"
  • Minor formatting differences — extra spaces, missing punctuation
  • Transliterations — slightly different spellings of the same word

When to Use a Different Match Type

If the values sound alike but are spelled differently (such as "Catherine" vs "Katherine"), consider using the Sounds Alike match type (#match-type-sounds-alike) instead. If the values are structured identifiers that must be identical, use Exact Match (https://help.matchlogic.io/article/257-match-type-exact-match).

Important

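Very short field values (1-3 characters) can produce misleading similarity scores, because each character carries a large share of the score. "Al" and "Bo" have no characters in common and score 0% even though both are short names, and a single-character difference between two-letter values can cut the score to about 50%. For very short fields, Exact Match may be more appropriate.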
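
One way to picture this guidance is a comparison that falls back to exact matching for short values. The Python sketch below is an assumption-laden illustration, not a product feature: `SHORT_FIELD_LIMIT`, `score`, and `values_match` are hypothetical names, and `difflib.SequenceMatcher` stands in for whatever scoring the product actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff, taken from the "1-3 characters" guidance above.
SHORT_FIELD_LIMIT = 3


def score(a: str, b: str) -> float:
    """Stand-in similarity score on a 0-100 scale."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def values_match(a: str, b: str, strictness: float = 80.0) -> bool:
    """Require an exact match for very short values, a fuzzy match otherwise."""
    if min(len(a), len(b)) <= SHORT_FIELD_LIMIT:
        return a == b
    return score(a, b) >= strictness


print(score("Al", "Ab"))                    # 50.0: one changed character halves the score
print(values_match("Al", "Ab"))             # False: short values must be identical
print(values_match("Michael", "Micheal"))   # True: longer values are compared fuzzily
```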

Combining with Other Match Types

Similar Text works well as one criterion among several in a definition. For example, pair a fuzzy name comparison with an exact postal code match to find people with similar names at the same location. See Setting Field Weights (#setting-field-weights) to learn how to balance multiple criteria.
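
The name-plus-postal-code pairing might look like the following Python sketch. Everything here is illustrative: the `Person` record, the `records_match` function, and the default strictness are hypothetical, `difflib.SequenceMatcher` stands in for the product's scoring, and the sketch simply requires every criterion to pass rather than modeling field weights.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Person:
    name: str
    postal_code: str


def records_match(a: Person, b: Person, name_strictness: float = 80.0) -> bool:
    """Exact match on postal code combined with a fuzzy match on name."""
    # Exact criterion: the postal codes must be identical.
    if a.postal_code != b.postal_code:
        return False
    # Fuzzy criterion: the name score must reach the strictness threshold.
    name_score = 100.0 * SequenceMatcher(None, a.name, b.name).ratio()
    return name_score >= name_strictness


first = Person("Michael Smith", "90210")
second = Person("Micheal Smith", "90210")
third = Person("Micheal Smith", "10001")

print(records_match(first, second))  # True: similar names, same postal code
print(records_match(first, third))   # False: postal codes differ
```

Checking the exact criterion first is a cheap way to skip the fuzzy comparison for records that cannot match anyway.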