Project Statistics
MatchLogic provides statistical summaries at both the project level and the global workspace level to help you understand your data landscape and track progress across your matching projects.
Per-Project Statistics
Each project card on the dashboard displays the following key metrics:
- Total Records Imported — The combined number of records across all datasources in the project. This count reflects the raw data as imported, before any cleansing or deduplication.
- Number of Datasources — How many separate datasources (files, database tables, or cloud files) have been imported into this project. A project can have one datasource for within-source deduplication or multiple datasources for cross-source matching.
- Match Rate — The percentage of records that were identified as duplicates or matches after the matching stage completes. This metric only appears once a match job has been run successfully.
- Duplicate Count — The total number of records flagged as duplicates. This is the absolute number behind the match rate percentage.
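Here is a minimal Python sketch of how these four card metrics relate to one another. The `ProjectStats` class and its field names are hypothetical and are not part of any MatchLogic API. It assumes, as described above, that the match rate is the duplicate count expressed as a percentage of total imported records, and that no match figures exist until a match job has succeeded.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectStats:
    """Hypothetical model of the figures shown on one project card."""

    datasource_record_counts: dict[str, int]  # datasource name -> raw imported rows
    duplicate_count: Optional[int] = None     # None until a match job has succeeded

    @property
    def total_records(self) -> int:
        # Raw imported rows summed over every datasource (before cleansing).
        return sum(self.datasource_record_counts.values())

    @property
    def datasource_count(self) -> int:
        return len(self.datasource_record_counts)

    @property
    def match_rate(self) -> Optional[float]:
        # Only defined once matching has run; expressed as a percentage.
        if self.duplicate_count is None or self.total_records == 0:
            return None
        return 100.0 * self.duplicate_count / self.total_records
```

For example, a project with two datasources of 6,000 and 4,000 records and 1,500 flagged duplicates would show 10,000 records, 2 datasources, and a 15% match rate.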
Global Statistics
At the top of the Project Management dashboard, a global statistics row aggregates data across all projects in your workspace:
- Total Records — The sum of all records across all datasources in every project. This gives you a sense of the overall data volume being managed in MatchLogic.
- Total Duplicates — The aggregate count of duplicate records found across all projects that have completed matching. This number helps you understand the overall impact of your data quality efforts.
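The aggregation behind the global row can be sketched as follows. The dictionary keys and the `global_stats` function are hypothetical. The sketch assumes that every project contributes to Total Records, while only projects that have completed matching contribute to Total Duplicates.

```python
def global_stats(projects: list[dict]) -> dict[str, int]:
    """Aggregate workspace-wide totals.

    Each project is a hypothetical dict with:
      "records":    total imported records (int)
      "duplicates": duplicate count, or None if matching has not completed
    """
    total_records = sum(p["records"] for p in projects)
    # Projects that have not finished matching contribute records but no duplicates.
    total_duplicates = sum(
        p["duplicates"] for p in projects if p["duplicates"] is not None
    )
    return {"total_records": total_records, "total_duplicates": total_duplicates}
```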
Tip
Global statistics are especially useful for reporting to stakeholders. You can quickly cite the total volume of data processed and the total number of duplicates found across all your matching projects.
When Statistics Update
Statistics are not static. They refresh automatically at specific points in the pipeline:
- After Data Import — Record counts and datasource counts update as soon as an import job completes.
- After Matching — Match rate and duplicate count appear (or update) after a match job finishes.
- After Merge and Survivorship — If master record determination changes which records are flagged as duplicates, the counts reflect the updated state.
- After Export — Statistics remain stable after export, as export does not change the underlying data.
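The refresh rules listed above can be modeled as a lookup from a completed stage to the fields it updates. This is an illustrative sketch only: the stage names, field names, and `on_stage_completed` function are hypothetical. It assumes a new value simply replaces the old one, and that a merge and survivorship run only supplies new counts when master record determination actually changed which records are flagged.

```python
# Hypothetical mapping: pipeline stage -> project-card fields it refreshes on completion.
REFRESHED_BY_STAGE: dict[str, set[str]] = {
    "import": {"total_records", "datasource_count"},
    "match": {"match_rate", "duplicate_count"},
    "merge_survivorship": {"match_rate", "duplicate_count"},
    "export": set(),  # export leaves the underlying data, and so the statistics, unchanged
}


def on_stage_completed(stats: dict, stage: str, results: dict) -> dict:
    """Return updated statistics after `stage` finishes.

    Only the fields that stage is responsible for are taken from `results`;
    each one replaces the previous value outright (no history is kept).
    """
    updated = dict(stats)
    for name in REFRESHED_BY_STAGE[stage]:
        if name in results:
            updated[name] = results[name]
    return updated
```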
Understanding Match Rate
The match rate is the number of records involved in at least one match pair divided by the total number of records, expressed as a percentage. A higher match rate means more potential duplicates were found. Keep in mind:
- A high match rate does not necessarily mean poor data quality. It depends on your matching criteria and thresholds.
- If you are running cross-source matching, the match rate reflects matches across all datasources, not just within a single source.
- Adjusting match definitions (adding or removing criteria, changing weights) and re-running the match will update these statistics.
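As a worked illustration of the match rate definition, the following sketch counts each record once, no matter how many pairs it appears in. The pair format and record identifiers such as `crm:1` are hypothetical. In a cross-source project the pairs would simply include records from different datasources.

```python
def match_rate(total_records: int, match_pairs: list[tuple[str, str]]) -> float:
    """Percentage of records that appear in at least one match pair."""
    if total_records == 0:
        return 0.0
    matched: set[str] = set()
    for left, right in match_pairs:
        # A set ensures a record in several pairs is still counted once.
        matched.add(left)
        matched.add(right)
    return 100.0 * len(matched) / total_records
```

With 10 records and the pairs (crm:1, crm:2), (crm:2, crm:3), and (crm:9, erp:4), five distinct records are involved, so the match rate is 50%. Note that crm:2 appears in two pairs but is counted only once.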
Important
Statistics reflect the most recent completed run for each pipeline stage. If you re-import data or re-run a match, the statistics will update to reflect the latest results, and previous values are not retained.
Using Statistics to Guide Your Workflow
Project statistics can inform your next steps. If the match rate seems unexpectedly low, consider reviewing your match definitions or profiling your data to check for quality issues. If the duplicate count is very high, you may want to tighten your matching criteria or review the results in the Match Results module before proceeding to merge. See https://help.matchlogic.io/article/207-the-project-dashboard for more about the dashboard layout.
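If you want to apply these rules of thumb consistently, for example in your own reporting script, a sketch might look like this. MatchLogic does not define what counts as an "unexpectedly low" rate or a "very high" duplicate count, so `min_expected_rate` and `max_duplicates_to_review` are hypothetical limits you would choose from your own knowledge of the data.

```python
def review_hints(
    match_rate: float,
    duplicate_count: int,
    min_expected_rate: float,
    max_duplicates_to_review: int,
) -> list[str]:
    """Turn project statistics into suggested follow-ups.

    Both limits are hypothetical, user-chosen values, not MatchLogic settings.
    """
    hints = []
    if match_rate < min_expected_rate:
        hints.append("Review match definitions or profile the data for quality issues.")
    if duplicate_count > max_duplicates_to_review:
        hints.append("Tighten matching criteria or inspect Match Results before merging.")
    return hints
```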