Setting Field Weights

Each criterion in a match definition has a weight that controls how much it contributes to the overall match score. Weights let you express which fields are more important for determining whether two records are duplicates.

Figure: Match definition showing multiple criteria with weight sliders set at different levels, with Low and High labels visible and percentage values displayed next to each slider.

Figure: Close-up of a criterion weight configuration showing both the visual slider and the numeric input field, which accepts values from 0.0 to 1.0.

How Weights Affect Scoring

When MatchLogic evaluates a pair of records against a definition, it scores each criterion individually (0-100%) and then computes a weighted composite score. A criterion with a higher weight has a proportionally larger influence on the final score.

For example, consider a definition with two criteria:

  • Name — weight 80%, score 90%
  • Address — weight 50%, score 40%

Because Name has the higher weight, its score influences the composite more than the Address score does. In this example, the strong Name score counts for more than the weak Address score. If the scores were reversed (Name 40%, Address 90%), the composite would be lower, because the low score would then fall on the higher-weight field.
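To make the example concrete, the sketch below computes a composite score for the two criteria above. It assumes the composite is a normalized weighted average (the sum of weight times score, divided by the sum of weights). This page does not state the exact formula MatchLogic uses, so treat the function `composite_score` and its result as an illustration only.

```python
def composite_score(criteria):
    """Weighted average of per-criterion scores.

    criteria: list of (weight, score) pairs, both as fractions in [0.0, 1.0].
    ASSUMPTION: composite = sum(weight * score) / sum(weight).
    The formula the product actually uses may differ.
    """
    total_weight = sum(weight for weight, _ in criteria)
    if total_weight == 0:
        return 0.0
    return sum(weight * score for weight, score in criteria) / total_weight


# Name: weight 80%, score 90%. Address: weight 50%, score 40%.
example = [(0.8, 0.9), (0.5, 0.4)]
print(f"{composite_score(example):.1%}")  # prints 70.8%
```

Under this assumption the composite is about 70.8%, which sits closer to the Name score (90%) than the plain average of the two scores (65%) does.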

Setting Weights

You can set weights using either:

  • The slider — drag it between 1% and 100%
  • The numeric input — type a value between 0.0 and 1.0 (where 1.0 equals 100%)
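The two input forms describe the same value on different scales. As a small sketch of the conversion, the helpers below (hypothetical names, not part of MatchLogic) translate between a slider percentage and the numeric form, and reject values outside the numeric range.

```python
def percent_to_weight(percent):
    """Convert a slider percentage (for example 80) to the numeric form (0.8)."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent must be between 0 and 100, got {percent}")
    return percent / 100


def weight_to_percent(weight):
    """Convert a numeric weight (for example 0.5) to a percentage (50.0)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError(f"weight must be between 0.0 and 1.0, got {weight}")
    return weight * 100


print(percent_to_weight(80))   # 0.8
print(weight_to_percent(0.5))  # 50.0
```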

Tip

Weights do not need to sum to 100%. Each weight is independent. You can have three criteria all set to 90% if they are equally important, or set one to 100% and others to 30% to heavily prioritize a single field.
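One way to see why the total does not matter: if the composite is a normalized weighted average (an assumption, since this page does not state the formula), only the ratio between weights determines how much each criterion contributes. The sketch below computes each criterion's share of influence for the two configurations mentioned in the tip. The function `influence_shares` and the field names are hypothetical.

```python
def influence_shares(weights):
    """Return each field's share of the total weight.

    ASSUMPTION: influence is proportional to weight / sum(weights),
    which holds for a normalized weighted average.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one weight must be greater than zero")
    return {field: weight / total for field, weight in weights.items()}


equal = {"name": 0.9, "email": 0.9, "address": 0.9}
prioritized = {"email": 1.0, "name": 0.3, "address": 0.3}

for label, weights in [("equal", equal), ("prioritized", prioritized)]:
    shares = influence_shares(weights)
    summary = ", ".join(f"{field} {share:.1%}" for field, share in shares.items())
    print(f"{label}: {summary}")
```

With all three weights at 90%, each criterion gets one third of the influence, the same as if all were set to 50%. With one weight at 100% and two at 30%, the prioritized field gets 62.5% of the influence.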

Guidelines for Choosing Weights

Consider these guidelines when assigning weights:

  • Unique identifiers (email, customer ID, phone) — set high weights (80-100%). These fields strongly indicate a duplicate when they match.
  • Names (first name, last name) — set moderate to high weights (60-90%). Names are important but common names may produce false positives.
  • Addresses — set moderate weights (40-70%). Addresses can vary in format and people move, so they are supportive but not definitive.
  • Demographic fields (age, gender, date of birth) — set lower weights (20-50%). These provide supporting evidence but are rarely sufficient alone.
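The guideline ranges above can be captured as a lookup table to produce starting weights. The sketch below is illustrative only: the category keys, the field names, and the choice of the range midpoint as a starting value are assumptions, not MatchLogic settings.

```python
# Guideline ranges from the list above, as (low, high) fractions.
GUIDELINE_RANGES = {
    "unique_identifier": (0.80, 1.00),
    "name": (0.60, 0.90),
    "address": (0.40, 0.70),
    "demographic": (0.20, 0.50),
}


def starting_weight(category):
    """Pick the midpoint of the guideline range as a first guess."""
    low, high = GUIDELINE_RANGES[category]
    return round((low + high) / 2, 2)


# Hypothetical fields mapped to guideline categories.
field_categories = {
    "email": "unique_identifier",
    "last_name": "name",
    "street_address": "address",
    "date_of_birth": "demographic",
}

starting_weights = {
    field: starting_weight(category)
    for field, category in field_categories.items()
}
print(starting_weights)
```

These midpoints are only a starting point. The right values depend on how clean each field is in your data.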

Impact of High-Weight Low-Score Fields

Be aware that a high-weight field with a low score will significantly reduce the overall match score. This can be useful or problematic depending on your intent:

  • Useful: If you want non-matching names to strongly penalize the score, give Name a high weight. Pairs with different names will score low even if other fields match.
  • Problematic: If the Name field contains inconsistent data (nicknames, abbreviations), a high weight might cause valid duplicates to score below the threshold.
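The sketch below illustrates both cases with a pair of records whose names differ (for example, a nickname) but whose addresses match closely. It assumes a normalized weighted average for the composite and a hypothetical match threshold of 60%. Neither the formula nor the threshold value is taken from MatchLogic.

```python
def composite_score(criteria):
    """Weighted average of (weight, score) pairs. ASSUMED formula."""
    total_weight = sum(weight for weight, _ in criteria)
    if total_weight == 0:
        return 0.0
    return sum(weight * score for weight, score in criteria) / total_weight


THRESHOLD = 0.60  # hypothetical match threshold

name_score = 0.30     # names differ, e.g. a nickname against a full name
address_score = 0.95  # addresses are nearly identical

# Same scores, two different Name weights; Address stays at 50%.
high_name = composite_score([(0.9, name_score), (0.5, address_score)])
low_name = composite_score([(0.4, name_score), (0.5, address_score)])

for label, score in [("Name weight 90%", high_name), ("Name weight 40%", low_name)]:
    verdict = "match" if score >= THRESHOLD else "no match"
    print(f"{label}: composite {score:.1%} -> {verdict}")
```

Under these assumptions the pair scores about 53% with Name at 90% and about 66% with Name at 40%, so the same pair falls on opposite sides of the threshold. Whether that is the useful case or the problematic one depends on whether the two records really are duplicates.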

Important

After adjusting weights, re-run your match and review the results. Small weight changes can significantly shift which records are identified as duplicates. Iterate until the balance of precision and recall meets your needs.

Testing Weight Configurations

A practical approach to finding the right weights:

  1. Start with equal weights across all criteria
  2. Run the match and review results
  3. Increase weights for fields that consistently distinguish true duplicates
  4. Decrease weights for fields that produce noise or false positives
  5. Re-run and compare results
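If you have a small set of record pairs that you have already labeled by hand, the comparison in step 5 can be made measurable. The sketch below scores two weight configurations by precision and recall. Everything in it is hypothetical: the sample pairs, the 60% threshold, the field names, and the weighted-average formula are assumptions for illustration, and in practice the per-criterion scores would come from your actual match run.

```python
def composite_score(weights, scores):
    """Weighted average of per-field scores. ASSUMED formula."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    return sum(weights[f] * scores[f] for f in weights) / total_weight


def evaluate(weights, labeled_pairs, threshold):
    """Return (precision, recall) for one weight configuration."""
    tp = fp = fn = 0
    for scores, is_duplicate in labeled_pairs:
        predicted = composite_score(weights, scores) >= threshold
        if predicted and is_duplicate:
            tp += 1
        elif predicted and not is_duplicate:
            fp += 1
        elif not predicted and is_duplicate:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hand-labeled sample: (per-field scores, is it a true duplicate?)
labeled_pairs = [
    ({"name": 0.95, "email": 1.00}, True),
    ({"name": 0.40, "email": 1.00}, True),   # nickname, same email
    ({"name": 1.00, "email": 0.30}, False),  # common name, different person
    ({"name": 0.20, "email": 0.05}, False),
]
THRESHOLD = 0.60  # hypothetical

equal = {"name": 0.5, "email": 0.5}      # step 1: equal weights
adjusted = {"name": 0.5, "email": 0.9}   # step 3: raise the distinguishing field

for label, weights in [("equal", equal), ("adjusted", adjusted)]:
    precision, recall = evaluate(weights, labeled_pairs, THRESHOLD)
    print(f"{label}: precision {precision:.2f}, recall {recall:.2f}")
```

In this made-up sample, equal weights let the common-name pair through as a false positive, and raising the email weight removes it without losing either true duplicate.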