Understanding Match Quality Scores
Every pair of records that MatchLogic identifies as a potential match receives a numeric score. Understanding how scores work helps you interpret results accurately, prioritize your review effort, and fine-tune your matching rules for better precision.
How Scores Are Calculated
When the matching engine compares two records, it evaluates each criterion in your match definition independently. Each criterion produces a similarity score between 0 (no match) and 100 (perfect match) based on the selected match type:
- Exact match — Returns 100 if values are identical, 0 otherwise.
- Fuzzy match — Uses string similarity algorithms (such as Jaro-Winkler) to produce a score reflecting how close the two values are. "John Smith" vs. "Jon Smith" might score 92.
- Phonetic match — Encodes values by pronunciation and compares the encodings. "Stephen" vs. "Steven" would score highly because they sound alike.
- Numeric match — Compares numbers and scores based on proximity within a configured tolerance.
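The per-criterion scoring described above can be sketched in Python. This is an illustrative approximation, not MatchLogic's actual implementation: the fuzzy scorer uses Python's standard-library `difflib` as a stand-in for Jaro-Winkler, and the linear taper in the numeric scorer is an assumption about how tolerance-based proximity might be scored.

```python
from difflib import SequenceMatcher

def exact_score(a: str, b: str) -> float:
    # Exact match: 100 if the values are identical, 0 otherwise.
    return 100.0 if a == b else 0.0

def fuzzy_score(a: str, b: str) -> float:
    # Stand-in for Jaro-Winkler: scale difflib's similarity ratio to 0-100.
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def numeric_score(a: float, b: float, tolerance: float) -> float:
    # Proximity scoring (assumed linear taper): full score at equality,
    # falling to 0 once the difference reaches the configured tolerance.
    diff = abs(a - b)
    if diff == 0:
        return 100.0
    if diff >= tolerance:
        return 0.0
    return 100.0 * (1 - diff / tolerance)
```

With this sketch, `fuzzy_score("John Smith", "Jon Smith")` lands in the low 90s, in line with the example above, though the exact value depends on the similarity algorithm the engine actually uses.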
The overall score for a definition is the weighted average of all criteria scores. If you have three criteria with weights of 40, 30, and 30 that score 100, 85, and 90 respectively, the overall score is: (100 × 40 + 85 × 30 + 90 × 30) / 100 = 92.5.
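The weighted-average calculation can be sketched as a short function (a minimal illustration, not the engine's internal code):

```python
def overall_score(criteria: list[tuple[float, float]]) -> float:
    # Each criterion is a (score, weight) pair; the overall score
    # is the weighted average of the criteria scores.
    total_weight = sum(w for _, w in criteria)
    return sum(s * w for s, w in criteria) / total_weight

# The worked example from the text: weights 40/30/30, scores 100/85/90.
overall_score([(100, 40), (85, 30), (90, 30)])  # 92.5
```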
Multiple Definitions and MaxScore
If your project has multiple match definitions, each pair receives a separate score for every definition. The MaxScore is the highest score across all definitions. This is the primary score displayed in the results view and used for sorting and filtering.
For example, a pair might score 45 on "Name + Address" but 95 on "Email + Phone." The MaxScore would be 95, and the winning definition would be "Email + Phone." The match key displayed in results reflects the fields from the winning definition.
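The MaxScore selection amounts to taking the highest-scoring definition for each pair, which can be sketched as (illustrative only):

```python
def max_score(definition_scores: dict[str, float]) -> tuple[str, float]:
    # The MaxScore is the highest score across all definitions;
    # the definition that produced it is the winning definition.
    winner = max(definition_scores, key=definition_scores.get)
    return winner, definition_scores[winner]

# The example from the text:
max_score({"Name + Address": 45, "Email + Phone": 95})
# ("Email + Phone", 95)
```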
Confidence Bands
MatchLogic groups scores into confidence bands to help you quickly assess result quality:
| Band | Score Range | Color | Interpretation |
|---|---|---|---|
| Excellent | 95 – 100 | Green | Very high confidence. These are almost certainly true matches. Typically safe to accept without manual review. |
| High | 80 – 94 | Teal | Strong match. Most of these are true matches, but a quick spot-check is recommended. |
| Good | 60 – 79 | Amber | Moderate confidence. Many are true matches, but some false positives are likely. Review a sample. |
| Moderate | 40 – 59 | Orange | Borderline. A mix of true and false matches. Manual review is recommended for this band. |
| Low | 20 – 39 | Red | Low confidence. Most pairs in this band are likely false positives, but some true matches may exist. |
| Poor | 0 – 19 | Red | Very low confidence. These are unlikely to be true matches. Consider adjusting your match rules. |
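The band boundaries in the table translate directly into a lookup, sketched here as a simple threshold walk (the band names and ranges come from the table; the function itself is illustrative):

```python
def confidence_band(score: float) -> str:
    # Map a score to its confidence band per the table above.
    bands = [(95, "Excellent"), (80, "High"), (60, "Good"),
             (40, "Moderate"), (20, "Low")]
    for floor, name in bands:
        if score >= floor:
            return name
    return "Poor"

confidence_band(92.5)  # "High"
```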
Using the Summary Report
The Summary tab in Match Results shows the distribution of scores across confidence bands as a visual chart. A healthy match run typically shows a concentration of pairs in the Excellent and High bands, with fewer pairs in the lower bands. If most pairs fall in the Moderate or Low bands, consider:
- Adjusting your match criteria -- fields may need different match types or weights.
- Cleansing your data -- inconsistent formatting can suppress scores.
- Adding more criteria -- additional fields provide more signal for the engine.
Practical Review Strategy
An efficient approach to reviewing match results:
- Start with the Summary to understand the overall score distribution.
- Accept Excellent matches (95+) with minimal review -- these are high-confidence matches.
- Spot-check High matches (80-94) to confirm accuracy.
- Manually review Good and Moderate matches (40-79) -- this is where your review time has the most impact.
- Investigate Low and Poor matches only if you suspect the engine is missing true matches at lower thresholds.
Tip
Use the confidence band filter in the Detailed Analysis tab to isolate pairs within a specific score range. This lets you work through borderline cases systematically without wading through high-confidence pairs that need no attention.
Improving Scores
If your scores are lower than expected, revisit your match setup:
- Weights — Make sure high-quality identifying fields (like email or national ID) have the highest weights.
- Match types — Use Fuzzy for name fields that may have typos. Use Phonetic for names with spelling variations. Use Exact for identifiers like email.
- Data cleansing — Standardize formatting, remove extra whitespace, and normalize abbreviations before matching. Even small inconsistencies can reduce fuzzy scores significantly.
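A minimal pre-matching cleanup might look like the sketch below. The abbreviation mapping is a hypothetical example; in practice you would tailor the normalization rules to your own data:

```python
import re

def normalize(value: str) -> str:
    # Illustrative cleanup before matching: lowercase, collapse
    # whitespace, and expand common abbreviations (hypothetical map).
    abbreviations = {"st.": "street", "ave.": "avenue", "inc.": "incorporated"}
    value = re.sub(r"\s+", " ", value.strip().lower())
    return " ".join(abbreviations.get(tok, tok) for tok in value.split())

normalize("  123 Main   St. ")  # "123 main street"
```

Normalizing both sides before comparison keeps fuzzy scores from being dragged down by formatting noise rather than genuine differences.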
For a complete walkthrough of configuring match rules, see https://help.matchlogic.io/article/202-quick-start-first-dedup-project.