Probabilistic Mode

MatchLogic supports two matching modes: deterministic and probabilistic. This setting changes how the system evaluates and scores record comparisons, and it can have a significant impact on match quality.

[Figure: Match settings panel showing the Probabilistic mode toggle switch with Deterministic and Probabilistic options]

Deterministic Matching (Default)

In deterministic mode, MatchLogic applies your match criteria directly. Each criterion produces a similarity score, and the weighted composite of those scores determines whether a pair matches. Every occurrence of a value is treated equally: the score for matching "John" to "John" is the same regardless of how common the name "John" is in your data.

Deterministic matching is:

  • Predictable — the same inputs always produce the same score
  • Fast — no additional statistical computation is required
  • Easy to understand — scores map directly to the similarity of the field values
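
The idea of a weighted composite score can be sketched in a few lines of Python. This is an illustration only, not MatchLogic's actual implementation: the similarity function (Python's `difflib.SequenceMatcher`), the field names, the weights, and the threshold are all assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a case-insensitive similarity score in [0, 1] for two field values."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def composite_score(rec_a: dict, rec_b: dict, weights: dict) -> float:
    """Weighted average of the per-field similarity scores."""
    total_weight = sum(weights.values())
    weighted = sum(w * similarity(rec_a[f], rec_b[f]) for f, w in weights.items())
    return weighted / total_weight


# Hypothetical criteria weights and match threshold, for illustration only.
WEIGHTS = {"first_name": 0.4, "last_name": 0.6}
THRESHOLD = 0.85

a = {"first_name": "John", "last_name": "Smith"}
b = {"first_name": "John", "last_name": "Smith"}

score = composite_score(a, b, WEIGHTS)
is_match = score >= THRESHOLD
```

In this sketch the score depends only on the two records being compared. Nothing about the rest of the dataset enters the calculation, which is why the same inputs always give the same score.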

Probabilistic Matching

In probabilistic mode, MatchLogic uses statistical models to adjust match scores based on the frequency of values in your data. The key insight is that matching on a rare value is stronger evidence of a true duplicate than matching on a common value.

For example:

  • Two records both have the last name Przybylski — this is a rare name, so matching on it is strong evidence they are the same person. The score receives a boost.
  • Two records both have the last name Smith — this is extremely common, so matching on it alone is weak evidence. The score is adjusted downward.

Probabilistic mode accounts for this by weighting evidence based on how discriminating each value is within the dataset.
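
One common way to express this kind of weighting is to make the evidence for an agreement grow as the value becomes rarer, for example with the logarithm of the inverse relative frequency. The sketch below uses that formula as an assumption. The statistical model MatchLogic actually applies is not described here, and the name counts and dataset size are invented for the example.

```python
import math


def agreement_weight(value: str, counts: dict, total: int) -> float:
    """Evidence weight for two records agreeing on `value`.

    Computed as log2(1 / relative frequency), so rarer values
    yield larger weights. This formula is an assumption for illustration.
    """
    rel_freq = counts[value] / total
    return math.log2(1.0 / rel_freq)


# Hypothetical last-name counts in a dataset of 100,000 records.
LAST_NAME_COUNTS = {"Smith": 1000, "Przybylski": 2}
TOTAL = 100_000

smith_w = agreement_weight("Smith", LAST_NAME_COUNTS, TOTAL)
rare_w = agreement_weight("Przybylski", LAST_NAME_COUNTS, TOTAL)
```

With these invented counts, agreeing on Przybylski carries a weight of roughly 15.6 against roughly 6.6 for Smith, so the rare name contributes more than twice as much evidence.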

When to Use Probabilistic Mode

Probabilistic matching is most beneficial for:

  • Large datasets (tens of thousands of records or more) where statistical patterns are meaningful
  • Messy, inconsistent data where deterministic thresholds produce too many false positives on common values
  • Name-heavy matching where the frequency of names varies dramatically
  • Scenarios where you need higher accuracy and are willing to accept longer processing times

When to Use Deterministic Mode

Stick with deterministic matching when:

  • Your dataset is small — statistical frequency analysis is less meaningful with few records
  • You are matching on unique identifiers (IDs, emails) where frequency is not a factor
  • Speed is a priority — deterministic mode runs faster
  • You want simple, explainable scores — easier to communicate results to stakeholders

Tip

Try running the same match in both modes and compare the results. Probabilistic mode often reduces false positives on common values while still catching true duplicates that share rare values. The Match Results summary report makes it easy to compare quality metrics.

Important

Probabilistic mode increases processing time because MatchLogic must first analyze the frequency distribution of values across your dataset before scoring pairs. For very large datasets, expect the match to take noticeably longer.
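
To see where the extra time goes, consider a minimal sketch of the frequency-analysis step: a full pass over the dataset that counts how often each value occurs, completed before any pair is scored. The function name, the normalization (trimming and lowercasing), and the sample records are assumptions for illustration, not MatchLogic internals.

```python
from collections import Counter


def build_frequency_table(records: list, field: str) -> Counter:
    """Count normalized values of `field` in one full pass over the records.

    Empty or missing values are skipped.
    """
    return Counter(
        r[field].strip().lower() for r in records if r.get(field)
    )


# Hypothetical sample records.
records = [
    {"last_name": "Smith"},
    {"last_name": "smith "},
    {"last_name": "Przybylski"},
    {"last_name": ""},
]

freq = build_frequency_table(records, "last_name")
```

Every record has to be read once for each field being analyzed, so this step grows with the size of the dataset.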

Finding This Setting

The probabilistic/deterministic toggle is located in the match settings area of the Match Definitions page. Select your preferred mode before running the match. You can switch modes and re-run at any time to compare results.