Quick Start: Cross-File Matching
This tutorial shows you how to match records across two different files. Cross-file matching is useful when you need to link records between systems that share no common identifier -- for example, matching a CRM export against an ERP customer list to find overlapping contacts.
What You Will Need
- Two data files (CSV or Excel) containing records you want to match against each other.
- The files should share at least some comparable fields (such as name, email, address, or phone number), even if the column names differ.
Step 1: Create a Project
Navigate to Project Management and create a new project. Give it a name that reflects the matching task, such as "CRM vs ERP Customer Linking." Select the project to make it active.
Step 2: Import Both Files
Navigate to Data Import. Import your first file by selecting the file type, uploading the file, previewing the data, and clicking Import. Wait for the import job to complete (watch for the notification), then repeat the process for your second file.
When both imports are done, you should see two data sources listed on the Data Import page.
Tip
Give each data source a clear, distinctive name during import (for example, "CRM Contacts" and "ERP Customers"). This makes it easier to identify them in later steps.
Step 3: Profile Both Sources
Navigate to Data Profiling. Run profiling on each data source to understand data quality, column completeness, and value patterns. Compare the two profiles to identify which fields are most suitable for matching. Fields with high completeness and consistency in both sources will produce the best results.
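The sketch below makes "high completeness in both sources" concrete. It computes the share of non-blank values in each column of each source, then ranks candidate field pairs by the weaker of the two sides. It runs on small in-memory sample rows. The column names, the sample values, and the "minimum of both sides" ranking rule are illustrative assumptions. They are not what the Data Profiling page computes internally.

```python
# Hypothetical sample rows standing in for the two imported sources.
crm_contacts = [
    {"FirstName": "Ana", "Email": "ana@example.com", "Phone": ""},
    {"FirstName": "Ben", "Email": "", "Phone": "555-0100"},
    {"FirstName": "Cara", "Email": "cara@example.com", "Phone": ""},
    {"FirstName": "Dev", "Email": "dev@example.com", "Phone": "555-0101"},
]
erp_customers = [
    {"First_Name": "Ana", "Email_Address": "ana@example.com", "Tel": "555-0199"},
    {"First_Name": "", "Email_Address": "ben@example.com", "Tel": "555-0100"},
]

def completeness(rows, column):
    """Fraction of rows whose value in `column` is non-blank after trimming."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if str(row.get(column, "")).strip())
    return filled / len(rows)

def profile(rows):
    """Completeness for every column that appears in the rows."""
    columns = sorted({col for row in rows for col in row})
    return {col: completeness(rows, col) for col in columns}

# Candidate field pairs (Source A column, Source B column), chosen by hand here.
candidate_pairs = [("FirstName", "First_Name"), ("Email", "Email_Address"), ("Phone", "Tel")]

crm_profile = profile(crm_contacts)
erp_profile = profile(erp_customers)
for a_col, b_col in candidate_pairs:
    # Assumption: a field pair is only as useful as its less complete side.
    weaker = min(crm_profile[a_col], erp_profile[b_col])
    print(f"{a_col} / {b_col}: {crm_profile[a_col]:.0%} vs {erp_profile[b_col]:.0%} -> {weaker:.0%}")
```

In this sample, Email / Email_Address ranks highest because it is reasonably complete on both sides. FirstName is complete in one source but half-empty in the other.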
Step 4: Configure Cross-File Matching
Navigate to Match Configuration. This is where you tell MatchLogic how to compare your data sources. Select the Cross-File Matches strategy. This instructs the engine to compare records between the two sources only -- it will not look for duplicates within the same file.
Verify that the pair is listed correctly: Source A paired with Source B. If you have more than two sources and only want to match specific pairs, you can adjust the configuration here. Click Save.
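The difference between a cross-file strategy and within-file deduplication comes down to which record pairs are considered at all. This Python sketch uses hypothetical record IDs. It generates cross-file candidates with itertools.product, taking one record from each source. For contrast, it also generates the within-source pairs that deduplication would compare instead. It illustrates the idea only and is not MatchLogic's engine.

```python
from itertools import combinations, product

# Hypothetical record IDs for each source.
source_a = ["A1", "A2", "A3"]
source_b = ["B1", "B2"]

def cross_file_pairs(a, b):
    """Every (Source A record, Source B record) pair; no pair has both sides in one source."""
    return list(product(a, b))

def within_file_pairs(records):
    """Every unordered pair inside a single source (what deduplication would compare)."""
    return list(combinations(records, 2))

cross = cross_file_pairs(source_a, source_b)
print(len(cross))  # 6, i.e. 3 * 2
print(cross[0])    # ('A1', 'B1')

# Pairs a cross-file strategy never generates:
print(within_file_pairs(source_a))  # [('A1', 'A2'), ('A1', 'A3'), ('A2', 'A3')]
```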
Step 5: Map Fields Between Sources
Navigate to Match Definitions. The field mapping step is especially important for cross-file matching because column names in your two files are likely different.
- Review the Field Mapping table. Each row associates a column from Source A with a column from Source B.
- Use the Auto Map button to let MatchLogic suggest mappings based on column names. Review and correct any mistakes.
- Manually map any fields that auto-mapping missed. For example, map ContactEmail in Source A to Email_Address in Source B.
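This guide does not document the exact rules Auto Map uses. As a rough illustration of name-based suggestion, the sketch below first normalizes column names by lowercasing them and removing punctuation. It then greedily pairs each Source A column with its most similar unused Source B column, using difflib's SequenceMatcher. A column stays unmapped if no candidate reaches the similarity threshold. The column lists, the 0.6 threshold, and the greedy strategy are all assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase and drop non-alphanumerics, so 'First_Name' and 'FirstName' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def auto_map(columns_a, columns_b, threshold=0.6):
    """Suggest a Source B column for each Source A column by name similarity.

    Greedy: each A column takes its best-scoring unused B column if the
    similarity ratio reaches `threshold` (a hypothetical cut-off); otherwise
    it is left unmapped for manual review.
    """
    mapping, used = {}, set()
    for col_a in columns_a:
        best, best_score = None, 0.0
        for col_b in columns_b:
            if col_b in used:
                continue
            score = SequenceMatcher(None, normalize(col_a), normalize(col_b)).ratio()
            if score > best_score:
                best, best_score = col_b, score
        if best is not None and best_score >= threshold:
            mapping[col_a] = best
            used.add(best)
    return mapping

# Hypothetical column lists for the two sources.
crm_columns = ["FirstName", "LastName", "ContactEmail", "Phone"]
erp_columns = ["First_Name", "Last_Name", "Email_Address", "Telephone"]
mapping = auto_map(crm_columns, erp_columns)
print(mapping)  # {'FirstName': 'First_Name', 'LastName': 'Last_Name', 'Phone': 'Telephone'}
```

With these names, ContactEmail finds no sufficiently similar partner and stays unmapped. That is the kind of gap you fill in by hand.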
Step 6: Define Match Criteria
Still in Match Definitions, create a new definition and add criteria. A good starting point for person matching:
- FirstName → Fuzzy match, weight 25
- LastName → Fuzzy match, weight 25
- Email → Exact match, weight 30
- Phone → Exact match, weight 20
Adjust weights based on which fields you trust most. Save the definition.
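One way to read these weights is as a weighted average of per-field similarities, scaled to 0-100. The sketch below assumes that model, which is not necessarily MatchLogic's exact formula. Exact fields score 1 or 0. Fuzzy fields use difflib's SequenceMatcher ratio as a stand-in for whatever fuzzy comparator the product applies. Blank values contribute nothing. Records are assumed to already use shared field names after mapping. The function also returns each field's contribution, which is similar in spirit to the contributing-fields view used when reviewing results.

```python
from difflib import SequenceMatcher

# Criteria from the starting point above: (field, method, weight). Weights sum to 100.
CRITERIA = [
    ("FirstName", "fuzzy", 25),
    ("LastName", "fuzzy", 25),
    ("Email", "exact", 30),
    ("Phone", "exact", 20),
]

def field_similarity(a, b, method):
    """Similarity in [0, 1] for one field, compared case-insensitively."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a or not b:
        return 0.0  # assumption: a blank on either side contributes nothing
    if method == "exact":
        return 1.0 if a == b else 0.0
    return SequenceMatcher(None, a, b).ratio()  # stand-in fuzzy comparator

def score_pair(rec_a, rec_b, criteria=CRITERIA):
    """Return (score on a 0-100 scale, {field: weighted contribution})."""
    total_weight = sum(weight for _, _, weight in criteria)
    contributions = {}
    for field, method, weight in criteria:
        sim = field_similarity(rec_a.get(field), rec_b.get(field), method)
        contributions[field] = weight * sim
    total = 100 * sum(contributions.values()) / total_weight
    return total, contributions

# Hypothetical records after field mapping.
rec_a = {"FirstName": "Jon", "LastName": "Smith", "Email": "jon.smith@example.com", "Phone": "555-0100"}
rec_b = {"FirstName": "John", "LastName": "Smith", "Email": "jon.smith@example.com", "Phone": ""}
total, contributions = score_pair(rec_a, rec_b)
print(f"{total:.1f}")  # 76.4
```

In this example, the near-miss first name still earns most of its 25 points and the blank phone costs the full 20. This shows why fields with poor completeness deserve lower weights.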
Step 7: Run the Match
Navigate to Match Results and click Run Match. The engine compares every record in Source A against every record in Source B using your criteria and weights. Wait for the background job to complete.
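Because every record in Source A is compared against every record in Source B, the number of comparisons is the product of the two file sizes. For hypothetical files of 20,000 and 50,000 rows:

```latex
N_{\text{cross}} = |A| \times |B| = 20{,}000 \times 50{,}000 = 1{,}000{,}000{,}000
```

Doubling either file doubles the number of comparisons, so expect matches on large files to take noticeably longer to complete.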
Step 8: Review Cross-File Results
The results show pairs where one record comes from Source A and the other from Source B. In the Summary tab, review the score distribution. In the Detailed Analysis tab:
- Each pair shows the source of each record so you can confirm cross-file links.
- Review the match score and the contributing fields to understand why two records were paired.
- Use confidence band filters to focus on borderline cases that need manual review.
- Mark pairs as confirmed matches or non-duplicates as needed.
Important
Cross-file matching can produce a large number of pairs if both files are large. Use filters and confidence bands to focus your review on the most relevant results. Start with the highest-scoring pairs and work downward.
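The review order recommended above can be sketched as follows: filter to a confidence band, then work from the highest score down. The band names, the 90/70 boundaries on a 0-100 score, and the sample pairs are hypothetical. Use the confidence bands MatchLogic actually shows in the Detailed Analysis tab.

```python
# Hypothetical band boundaries on a 0-100 score, checked from highest to lowest.
BANDS = [
    (90, "high"),    # score >= 90
    (70, "review"),  # 70 <= score < 90: borderline, check manually
    (0, "low"),      # score < 70
]

def band_for(score):
    """Name of the first band whose lower bound the score reaches."""
    for lower, name in BANDS:
        if score >= lower:
            return name
    return "low"

def review_queue(pairs, band=None):
    """Pairs sorted highest score first, optionally restricted to one band."""
    selected = [p for p in pairs if band is None or band_for(p["score"]) == band]
    return sorted(selected, key=lambda p: p["score"], reverse=True)

# Hypothetical cross-file results: one record from each source per pair.
pairs = [
    {"a": "A1", "b": "B7", "score": 95.0},
    {"a": "A2", "b": "B3", "score": 76.4},
    {"a": "A3", "b": "B9", "score": 52.0},
    {"a": "A4", "b": "B1", "score": 88.0},
]
for p in review_queue(pairs, band="review"):
    print(p["a"], p["b"], p["score"])  # A4 B1 88.0, then A2 B3 76.4
```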
For guidance on choosing between cross-file and other strategies, see #choosing-the-right-matching-strategy. For details on interpreting scores, see #understanding-match-quality-scores.