Setting Field Weights
Each criterion in a match definition has a weight that controls how much it contributes to the overall match score. Weights let you express which fields are more important for determining whether two records are duplicates.
How Weights Affect Scoring
When MatchLogic evaluates a pair of records against a definition, it scores each criterion individually (0-100%) and then computes a weighted composite score. A criterion with a higher weight has a proportionally larger influence on the final score.
For example, consider a definition with two criteria:
- Name — weight 80%, score 90%
- Address — weight 50%, score 40%
Because Name carries the higher weight, its 90% score influences the composite more than the 40% Address score does. Conversely, a low score on the high-weight Name field would drag the overall score down significantly, while a low score on the lower-weight Address field would have less impact.
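To make the arithmetic concrete, here is a minimal sketch of the example above. It assumes the composite is a plain weighted average (the sum of weight × score divided by the sum of weights). This page does not state the exact formula MatchLogic uses, so treat `composite_score` as a hypothetical helper for illustration only.

```python
def composite_score(criteria):
    """Weighted average of per-criterion scores.

    criteria: list of (weight, score) pairs, both on a 0.0-1.0 scale.
    Assumption: composite = sum(w * s) / sum(w). The real formula may differ.
    """
    total_weight = sum(weight for weight, _ in criteria)
    if total_weight == 0:
        raise ValueError("at least one criterion needs a non-zero weight")
    return sum(weight * score for weight, score in criteria) / total_weight


# Name: weight 80%, score 90%. Address: weight 50%, score 40%.
example = [(0.80, 0.90), (0.50, 0.40)]
print(f"{composite_score(example):.1%}")  # 70.8%
```

Under this assumption the composite lands closer to the Name score (90%) than to the Address score (40%), which is what "proportionally larger influence" means in practice.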
Setting Weights
You can set weights using either:
- The slider — drag it between 1% and 100%
- The numeric input — type a value between 0.0 and 1.0 (where 1.0 equals 100%)
Tip
Weights do not need to sum to 100%. Each weight is independent. You can have three criteria all set to 90% if they are equally important, or set one to 100% and others to 30% to heavily prioritize a single field.
Guidelines for Choosing Weights
Consider these guidelines when assigning weights:
- Unique identifiers (email, customer ID, phone) — set high weights (80-100%). These fields strongly indicate a duplicate when they match.
- Names (first name, last name) — set moderate to high weights (60-90%). Names are important, but common names may produce false positives.
- Addresses — set moderate weights (40-70%). Addresses can vary in format and people move, so they are supportive but not definitive.
- Demographic fields (age, gender, date of birth) — set lower weights (20-50%). These provide supporting evidence but are rarely sufficient alone.
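If you keep weights in a script or spreadsheet before entering them, the guideline ranges above can be captured as data and used as a quick sanity check. The sketch below is illustrative only: the category labels, field names, and the `out_of_range` helper are hypothetical and are not MatchLogic identifiers.

```python
# Suggested weight ranges from the guidelines, on the 0.0-1.0 scale.
# Category labels are hypothetical, chosen for this example.
SUGGESTED_RANGES = {
    "unique_identifier": (0.80, 1.00),
    "name": (0.60, 0.90),
    "address": (0.40, 0.70),
    "demographic": (0.20, 0.50),
}


def out_of_range(weights, categories):
    """Return fields whose weight falls outside the suggested range."""
    flagged = {}
    for field, weight in weights.items():
        low, high = SUGGESTED_RANGES[categories[field]]
        if not low <= weight <= high:
            flagged[field] = (weight, low, high)
    return flagged


categories = {
    "email": "unique_identifier",
    "last_name": "name",
    "date_of_birth": "demographic",
}
weights = {"email": 0.95, "last_name": 0.75, "date_of_birth": 0.80}

print(out_of_range(weights, categories))  # {'date_of_birth': (0.8, 0.2, 0.5)}
```

A flagged field is not necessarily wrong. The ranges are starting points, and your data may justify a weight outside them.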
Impact of High-Weight Low-Score Fields
A high-weight field with a low score significantly reduces the overall match score. This can be useful or problematic depending on your intent:
- Useful: If you want non-matching names to strongly penalize the score, give Name a high weight. Pairs with different names will score low even if other fields match.
- Problematic: If the Name field contains inconsistent data (nicknames, abbreviations), a high weight might cause valid duplicates to score below the threshold.
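The two cases above can be illustrated with a small comparison. This sketch assumes a weighted-average composite and a hypothetical match threshold of 0.75. Neither the formula nor the threshold value comes from this page, and the scores are made-up illustrative numbers.

```python
def composite_score(weights, scores):
    """Assumed formula: weighted average of per-field scores (0.0-1.0 scale)."""
    total_weight = sum(weights.values())
    return sum(weights[f] * scores[f] for f in weights) / total_weight


# A true duplicate whose Name score is low because one record uses a nickname.
scores = {"name": 0.30, "email": 0.95, "address": 0.90}

heavy_name = {"name": 1.00, "email": 0.50, "address": 0.50}
light_name = {"name": 0.40, "email": 0.90, "address": 0.60}

THRESHOLD = 0.75  # hypothetical threshold, for illustration only

for label, weights in [("heavy Name weight", heavy_name),
                       ("light Name weight", light_name)]:
    score = composite_score(weights, scores)
    verdict = "match" if score >= THRESHOLD else "no match"
    print(f"{label}: {score:.2f} -> {verdict}")
```

With the heavy Name weight the pair scores roughly 0.61 and falls below the assumed threshold. With the lighter Name weight it scores roughly 0.80 and is kept as a match.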
Important
After adjusting weights, re-run your match and review the results. Small weight changes can significantly shift which records are identified as duplicates. Iterate until the balance of precision and recall meets your needs.
Testing Weight Configurations
A practical approach to finding the right weights:
- Start with equal weights across all criteria
- Run the match and review results
- Increase weights for fields that consistently distinguish true duplicates
- Decrease weights for fields that produce noise or false positives
- Re-run and compare results
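If you have a small set of record pairs that you have already reviewed by hand, the loop above can be approximated offline before touching the real configuration. The following sketch compares two weight configurations by precision and recall. It assumes a weighted-average composite and a hypothetical threshold of 0.70. The labeled pairs are invented for illustration and are not real results.

```python
def composite_score(weights, scores):
    """Assumed formula: weighted average of per-field scores (0.0-1.0 scale)."""
    total_weight = sum(weights.values())
    return sum(weights[f] * scores[f] for f in weights) / total_weight


def precision_recall(weights, labeled_pairs, threshold):
    """Count true/false positives and false negatives at the given threshold."""
    tp = fp = fn = 0
    for scores, is_duplicate in labeled_pairs:
        predicted = composite_score(weights, scores) >= threshold
        if predicted and is_duplicate:
            tp += 1
        elif predicted and not is_duplicate:
            fp += 1
        elif not predicted and is_duplicate:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# (per-field scores, reviewed label). Illustrative data only.
labeled_pairs = [
    ({"name": 0.95, "email": 1.00, "address": 0.40}, True),
    ({"name": 0.30, "email": 1.00, "address": 0.60}, True),   # nickname
    ({"name": 0.95, "email": 0.30, "address": 0.90}, False),  # common name
    ({"name": 0.20, "email": 0.05, "address": 0.95}, False),  # same household
]

equal = {"name": 0.5, "email": 0.5, "address": 0.5}   # step 1: equal weights
tuned = {"name": 0.5, "email": 1.0, "address": 0.3}   # steps 3-4: adjusted

THRESHOLD = 0.70  # hypothetical
print(precision_recall(equal, labeled_pairs, THRESHOLD))  # (0.5, 0.5)
print(precision_recall(tuned, labeled_pairs, THRESHOLD))  # (1.0, 1.0)
```

In this toy data set, raising the Email weight and lowering the Address weight recovers the nickname duplicate and drops the common-name false positive. Real data will rarely separate this cleanly, so expect several rounds of adjustment.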