Key Concepts and Terminology

This article defines the core terms and concepts used throughout MatchLogic. Familiarizing yourself with this vocabulary will help you navigate the platform and understand how each module works.

Projects and Data

  • Project — The top-level container in MatchLogic. A project holds one or more data sources along with all matching rules, results, merge settings, and exports. Every action in the platform takes place within a project.
  • Data Source — A dataset that has been imported into a project. A data source can originate from a CSV file, an Excel spreadsheet, a database table, or a cloud storage file. A project may contain multiple data sources.
  • Field / Column — A single attribute within a data source, such as FirstName, Email, or PostalCode. Fields are the building blocks of match rules.
  • Record / Row — A single entry in a data source. Each record is made up of values across one or more fields.

Matching Concepts

  • Field Mapping — The association between fields in different data sources that represent the same real-world attribute. For example, mapping FirstName in Source A to First_Name in Source B tells the matching engine these columns should be compared.
  • Match Definition — A named set of match criteria that the engine evaluates together. Each definition produces its own score. You can create multiple definitions within a project to cover different matching scenarios.
  • Match Criteria — A single comparison rule within a definition. A criterion specifies a mapped field pair, a match type, and a weight.
  • Match Type — The comparison algorithm applied to a field pair:
    • Exact — Values must be identical (after standardization) to count as a match.
    • Fuzzy — Uses string similarity algorithms (such as Jaro-Winkler) to measure how close two values are. Tolerates typos, abbreviations, and minor variations.
    • Phonetic — Encodes values by how they sound and compares the encodings. Useful for names with multiple spellings (e.g., "Jon" vs. "John").
    • Numeric — Compares numeric values, accounting for differences within a configurable tolerance.
  • Weight — A number assigned to each criterion that indicates its relative importance. Criteria with higher weights contribute more to the overall score. For example, you might give Email a weight of 40 and City a weight of 10.
  • Score — A value between 0 and 100 that indicates how closely two records match for a given definition. The score is calculated from the weighted combination of all criteria within that definition.
  • Confidence Band — A labeled range that categorizes scores into quality tiers: Excellent (95–100), High (80–95), Good (60–80), Moderate (40–60), Low (20–40), and Poor (0–20). Confidence bands help you prioritize review efforts.
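
The relationship between criteria, weights, scores, and confidence bands can be illustrated with a short Python sketch. It is not MatchLogic's implementation. It assumes the score is the weight-normalized average of per-criterion similarities, uses Python's difflib ratio as a stand-in for a fuzzy algorithm such as Jaro-Winkler, and treats each band's lower bound as inclusive (the article does not say which band a boundary value such as 80 belongs to). The field names, CRITERIA, and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical criteria: (field in Source A, field in Source B, match type, weight).
# The first two elements of each tuple act as the field mapping.
CRITERIA = [
    ("Email", "Email_Address", "exact", 40),
    ("FirstName", "First_Name", "fuzzy", 20),
    ("City", "City", "exact", 10),
]

def similarity(a: str, b: str, match_type: str) -> float:
    """Return a similarity between 0.0 and 1.0 for one mapped field pair."""
    # Trimming and lowercasing stand in for standardization (an assumption).
    a, b = a.strip().lower(), b.strip().lower()
    if match_type == "exact":
        return 1.0 if a == b else 0.0
    if match_type == "fuzzy":
        # difflib's ratio is a stand-in here, not Jaro-Winkler.
        return SequenceMatcher(None, a, b).ratio()
    raise ValueError(f"unsupported match type: {match_type}")

def score(record_a: dict, record_b: dict, criteria: list) -> float:
    """Weighted combination of all criteria, scaled to 0-100 (assumed formula)."""
    total_weight = sum(weight for *_, weight in criteria)
    weighted = sum(
        similarity(record_a[field_a], record_b[field_b], match_type) * weight
        for field_a, field_b, match_type, weight in criteria
    )
    return 100.0 * weighted / total_weight

# Lower bounds of each band, checked from highest to lowest.
BANDS = [(95, "Excellent"), (80, "High"), (60, "Good"),
         (40, "Moderate"), (20, "Low"), (0, "Poor")]

def confidence_band(value: float) -> str:
    """Map a 0-100 score to its band, treating lower bounds as inclusive."""
    for lower, label in BANDS:
        if value >= lower:
            return label
    raise ValueError("score must be between 0 and 100")

a = {"Email": "jon@example.com", "FirstName": "Jon", "City": "Austin"}
b = {"Email_Address": "JON@example.com", "First_Name": "John", "City": "Austin"}
s = score(a, b, CRITERIA)
print(round(s, 1), confidence_band(s))
```

Because Email carries most of the weight in this sketch, a mismatch on Email lowers the score far more than a mismatch on City would.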

Results and Grouping

  • Pair — Two records that the matching engine identified as a potential match. Each pair has one score per definition, plus a maximum score taken across all definitions.
  • Group — A cluster of records that are transitively linked through pairs. If Record A matches Record B, and Record B matches Record C, all three form a group even if A and C did not directly match.
  • Match Key — The field names from the winning definition for a pair, that is, the definition that produced the pair's highest score. For example, a match key of "FirstName + LastName" tells you which criteria produced the highest score.
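
Transitive grouping and match keys can be sketched in Python as follows. This is an illustration only: it assumes groups are the connected components of the pair graph, computed here with a union-find structure, and that the match key joins the winning definition's field names with " + ". The definition names, record IDs, and function names are hypothetical.

```python
from collections import defaultdict

def build_groups(pairs):
    """Cluster record IDs that are linked, directly or transitively, by pairs."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(set)
    for record in parent:
        groups[find(record)].add(record)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical definitions and the fields their criteria use.
DEFINITIONS = {
    "Name match": ["FirstName", "LastName"],
    "Contact match": ["Email"],
}

def match_key(scores_by_definition):
    """Return (match key, maximum score) for one pair."""
    winner = max(scores_by_definition, key=scores_by_definition.get)
    return " + ".join(DEFINITIONS[winner]), scores_by_definition[winner]

# A matches B and B matches C, so A, B, and C share a group
# even though A and C were never paired directly.
print(build_groups([("A", "B"), ("B", "C"), ("D", "E")]))
print(match_key({"Name match": 92.0, "Contact match": 60.0}))
```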

Merge and Survivorship

  • Master Record — The single record within a group that is designated as the primary or authoritative version. Master determination is based on rules you configure, such as preferring the longest or most complete record.
  • Golden Record — The final merged output for a group after survivorship rules have been applied. The golden record may combine field values from multiple records in the group.
  • Survivorship Rule — A rule that determines which value wins for a specific field during the merge process. Examples include keeping the longest value, the most popular value, or the value from the master record.
  • Overwrite Rule — A field-level rule that specifies when and how a value should be overwritten during merge. Overwrite rules can include conditions, such as only overwriting when the target field is empty.
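
As a rough illustration of how a master record and survivorship rules might combine into a golden record, consider the Python sketch below. It assumes the master is the most complete record (the one with the most non-empty fields) and supports three example rules taken from the list above: longest value, most popular value, and value from the master record. The rule names, field names, and functions are hypothetical, and overwrite conditions are not modeled.

```python
from collections import Counter

def pick_master(group):
    """Pick the most complete record; ties go to the earliest record."""
    return max(group, key=lambda rec: sum(1 for value in rec.values() if value))

def survive(group, field, rule, master):
    """Choose the surviving value for one field under one rule."""
    values = [rec[field] for rec in group if rec.get(field)]
    if not values:
        return ""
    if rule == "longest":
        return max(values, key=len)
    if rule == "most_popular":
        return Counter(values).most_common(1)[0][0]
    if rule == "master":
        # Returns an empty string if the master itself has no value.
        return master.get(field, "")
    raise ValueError(f"unknown survivorship rule: {rule}")

def golden_record(group, rules):
    """Apply one survivorship rule per field to build the merged output."""
    master = pick_master(group)
    return {field: survive(group, field, rule, master)
            for field, rule in rules.items()}

group = [
    {"Name": "Jon Smith", "Email": "jon@example.com", "City": ""},
    {"Name": "Jonathan Smith", "Email": "jon@example.com", "City": "Austin"},
    {"Name": "Jon Smith", "Email": "j.smith@example.com", "City": "Austin"},
]
rules = {"Name": "longest", "Email": "most_popular", "City": "master"}
print(golden_record(group, rules))
```

In this example the golden record takes Name by length, Email by frequency, and City from the master, so its values can come from more than one record in the group.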

Infrastructure

  • Pipeline — The ordered sequence of modules that data flows through within a project. See https://help.matchlogic.io/article/196-the-data-matching-pipeline for details on each step.
  • Background Job — A long-running operation (import, profiling, matching, or export) that executes on the server while you continue working. See #understanding-background-jobs.