Data Type Distribution
The Data Type Distribution chart shows the data types that MatchLogic has detected for each field in your datasource, along with a confidence percentage for each classification. Understanding detected types helps you spot data quality issues and plan appropriate matching strategies.
Detected Data Types
During profiling, MatchLogic analyzes the values in each field and classifies them into one or more of the following types:
- Text -- General string values (names, descriptions, free-form text)
- Number -- Numeric values (integers, decimals)
- Date -- Date or datetime values in various formats
- Email -- Values matching email address patterns
- Phone -- Values matching telephone number patterns
- Boolean -- True/false or yes/no values
- Identifier -- Values that appear to be codes or IDs (account numbers, SKUs)
Reading the Chart
Each field shows its primary detected type along with a confidence percentage. A label such as "Email (94%)" means that 94% of the non-null values in that field match the email pattern. The remaining 6% are either malformed email addresses or non-email values mixed into the field.
Some fields may show mixed types. For example, a "Phone" field might display as "Text (65%) / Number (35%)" if some records contain formatted phone numbers with dashes and parentheses while others contain only digits.
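The percentages described above can be sketched as a share of non-null values per type. The following is a minimal illustration, not MatchLogic's actual detection logic: the regular expressions, the `classify` function, and the `type_distribution` function are all hypothetical and cover only three of the types listed earlier.

```python
import re
from collections import Counter

# Illustrative patterns only (assumption): the real detection rules are not documented here.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")


def classify(value):
    """Assign one illustrative type label to a single value."""
    text = str(value).strip()
    if EMAIL_RE.match(text):
        return "Email"
    if NUMBER_RE.match(text):
        return "Number"
    return "Text"


def type_distribution(values):
    """Return {type: percentage of non-null values}, largest share first."""
    # Nulls and blank strings are excluded from the denominator.
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    if not non_null:
        return {}
    counts = Counter(classify(v) for v in non_null)
    total = len(non_null)
    return {label: round(100 * n / total) for label, n in counts.most_common()}


# A phone column mixing formatted values with digit-only values.
phones = ["(555) 123-4567", "5551234567", "555-987-6543", None, "5550001111"]
print(type_distribution(phones))
```

In this sample, the formatted values fall through to Text while the digit-only values count as Number, which is how a single column ends up with a mixed result.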
What Type Mismatches Reveal
Type distribution issues often indicate data quality problems:
- Phone numbers classified as Text -- May contain letters, dashes, or parentheses. Consider cleansing to extract only digits before matching.
- Numeric fields with Text detected -- Some values may contain non-numeric characters (currency symbols, units, or data entry errors).
- Date fields with mixed formats -- Different date formats (MM/DD/YYYY vs DD-MM-YYYY) can cause matching failures. Standardize before matching.
- ID fields classified as Number -- Numeric IDs with leading zeros (like "00123") may lose their leading zeros if treated as numbers. Ensure they are treated as text for matching.
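The cleansing steps suggested in the list above might look like the following outside of MatchLogic. This is a sketch under stated assumptions: `digits_only`, `standardize_date`, and the `DATE_FORMATS` tuple are hypothetical names, and the two candidate date formats are simply the ones used as examples above.

```python
import re
from datetime import datetime


def digits_only(phone):
    """Strip every non-digit character from a phone value."""
    return re.sub(r"\D", "", phone)


# Candidate formats are an assumption; list the ones actually present in your data.
DATE_FORMATS = ("%m/%d/%Y", "%d-%m-%Y")


def standardize_date(value):
    """Return the date as YYYY-MM-DD, or None if no candidate format parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(digits_only("(555) 123-4567"))
print(standardize_date("03/25/2024"))
print(standardize_date("25-03-2024"))

# Converting an ID to a number drops its leading zeros; keeping it as text does not.
print(str(int("00123")), "00123")
```

Returning None for unparseable dates, rather than guessing, keeps bad values visible so they can be reviewed before matching.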
Impact on Matching Configuration
The detected data type directly influences how you should configure matching for each field:
- Text fields use exact text matching or fuzzy matching with algorithms like Jaro-Winkler.
- Number fields can use numeric range matching or exact number comparison.
- Phonetic matching works best on name fields that are confirmed as Text type.
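To make the distinction between text and number comparison concrete, here is a small sketch of choosing a comparison by detected type. It is not MatchLogic's configuration or API: `field_matches`, the threshold, and the tolerance are hypothetical, and Python's `difflib.SequenceMatcher` is used as a stand-in for fuzzy similarity rather than Jaro-Winkler.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Case-insensitive similarity in [0, 1] using difflib (a stand-in, not Jaro-Winkler)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numbers_match(a, b, tolerance=0.0):
    """Exact comparison when tolerance is 0, otherwise a range match."""
    return abs(a - b) <= tolerance


def field_matches(detected_type, a, b, threshold=0.85, tolerance=0.0):
    """Pick a comparison based on the detected type (hypothetical dispatch)."""
    if detected_type == "Number":
        return numbers_match(float(a), float(b), tolerance)
    return text_similarity(str(a), str(b)) >= threshold


# Fuzzy text comparison tolerates a one-letter spelling difference.
print(field_matches("Text", "Jonathan Smith", "Jonathon Smith"))
# Numeric range comparison accepts values within the tolerance.
print(field_matches("Number", "100.0", "100.4", tolerance=0.5))
# With no tolerance, the same numbers must be equal.
print(field_matches("Number", "100.0", "100.4"))
```

A field that is wrongly treated as Number here would fail on any non-numeric value, which is one reason to confirm the detected type before choosing a comparison.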
Tip
If a field's detected type does not match what you expect, inspect the actual values using the Detailed Analysis view. This will help you determine whether the data needs cleansing before it can be used effectively in matching.
Important
Type detection is based on pattern analysis of the values, not on column metadata from the source system. Even if your database defines a column as an integer, profiling reports it as a mixed type if the imported values include text.