Validity and Pattern Analysis

The Validity and Pattern Analysis charts work together to show you how well your data conforms to expected formats. The validity bar chart counts valid versus invalid values per field, while the Pattern Classification treemap visualizes the structural patterns discovered in your data.

Validity bar chart and Pattern Classification treemap

Data Validity

The validity chart displays a bar for each field, split into valid and invalid segments. Validity is determined by comparing field values against detected or custom regex patterns:

  • Valid values conform to the expected pattern for that field's detected type (e.g., a properly formatted email address in an email field).
  • Invalid values do not match the expected pattern, suggesting malformed data, entry errors, or mixed data types.

Hover over any bar segment to see the exact count and percentage. A field with 92% validity means that 8% of its non-null values do not match the expected format.
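
As a rough illustration of how a validity percentage like this can be derived, the sketch below counts non-null values that fully match a regex. The email pattern, the function name validity_summary, and the choice to treat only None as null are assumptions made for this example; they are not MatchLogic's built-in pattern or internal logic.

```python
import re

# Hypothetical, deliberately simplified email pattern (an assumption,
# not the pattern MatchLogic ships with).
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")

def validity_summary(values, pattern):
    """Count valid and invalid non-null values against a regex."""
    # Assumption: only None counts as null; nulls are excluded from the totals.
    non_null = [v for v in values if v is not None]
    valid = sum(1 for v in non_null if pattern.fullmatch(v))
    invalid = len(non_null) - valid
    pct_valid = 100.0 * valid / len(non_null) if non_null else 0.0
    return {"valid": valid, "invalid": invalid, "pct_valid": pct_valid}

emails = ["ann@example.com", "bob@example", None, "carol@example.org", "not an email"]
summary = validity_summary(emails, EMAIL_RE)
print(summary)
```

Here two of the four non-null values match, so the field would show 50% validity; the null value does not count toward either segment.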

Pattern Classification Treemap

The treemap visualization groups the structural patterns discovered across your data. Each block represents a distinct pattern, and the block's size reflects how frequently that pattern appears. Common patterns you might see include:

  • Email patterns -- AAAA@AAAA.AAA with variations in length and domain structure
  • Phone patterns -- (NNN) NNN-NNNN, NNN-NNN-NNNN, NNNNNNNNNN
  • Name patterns -- AAAA AAAA (first last), AAAA A. AAAA (first middle-initial last)
  • Date patterns -- NN/NN/NNNN, NNNN-NN-NN
  • Zip code patterns -- NNNNN, NNNNN-NNNN

Click on any treemap block to see which records match that particular pattern.
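
The A/N notation used in the patterns above can be approximated with a small sketch: letters become A, digits become N, and every other character is kept as-is. The function name pattern_of and the exact mapping rules are assumptions for illustration; the product's own classifier may group or shorten patterns differently.

```python
from collections import Counter

def pattern_of(value):
    """Map a string to a structural pattern: letters -> A, digits -> N."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("N")
        else:
            out.append(ch)  # punctuation and spaces are kept literally
    return "".join(out)

phones = ["(555) 123-4567", "555-123-4567", "5551234567", "555-987-6543"]

# Frequency of each pattern; in a treemap, a larger count means a larger block.
counts = Counter(pattern_of(p) for p in phones)
for pattern, n in counts.most_common():
    print(pattern, n)
```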

What Patterns Reveal

Pattern analysis is especially valuable for identifying:

  1. Format inconsistencies -- Multiple phone number patterns in the same field indicate a lack of standardization. Apply cleansing to normalize them to a single format before matching.
  2. Data entry errors -- Rare patterns with very small blocks may represent typos or malformed entries.
  3. Mixed content -- If a "Name" field contains patterns that look like addresses or codes, records may have been entered in the wrong column.
  4. International variations -- Different country formats for phones, dates, or postal codes appearing in the same field.
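
One way to act on the second point in the list above is to flag patterns whose share of the field falls below a cutoff. The sketch below assumes you already have per-pattern counts for a field; the rare_patterns helper, the sample counts, and the 1% threshold are all hypothetical choices, not values the product prescribes.

```python
def rare_patterns(pattern_counts, max_share=0.05):
    """Return patterns whose share of all values is at or below max_share."""
    total = sum(pattern_counts.values())
    if total == 0:
        return []
    return sorted(p for p, n in pattern_counts.items() if n / total <= max_share)

# Made-up pattern counts for a single phone field.
counts = {"NNN-NNN-NNNN": 940, "(NNN) NNN-NNNN": 55, "NNN-NNNN": 4, "AAA": 1}

# Flag anything that makes up 1% or less of the field.
flagged = rare_patterns(counts, max_share=0.01)
print(flagged)
```

Flagged patterns are only candidates for review: a rare pattern may be a typo, or it may be a legitimate international format.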

Custom Validation Patterns

MatchLogic comes with built-in patterns for common data types. You can also define custom regex patterns for domain-specific validation, such as internal product codes or employee ID formats. See Custom Regex Patterns for details on creating and managing custom patterns.
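
To give a sense of what such a pattern might look like, the sketch below tests two invented formats in plain Python: a product code written as PRD- followed by six digits, and an employee ID written as E followed by five digits. Both formats and the is_valid helper are hypothetical; substitute the formats your organization actually uses, and check them against the product's own regex dialect before relying on them.

```python
import re

# Hypothetical formats, for illustration only.
CUSTOM_PATTERNS = {
    "product_code": re.compile(r"PRD-[0-9]{6}"),  # e.g. PRD-004217
    "employee_id": re.compile(r"E[0-9]{5}"),      # e.g. E10482
}

def is_valid(kind, value):
    """True if the whole value matches the custom pattern for this kind."""
    return CUSTOM_PATTERNS[kind].fullmatch(value) is not None

print(is_valid("product_code", "PRD-004217"))
print(is_valid("product_code", "PRD-4217"))
print(is_valid("employee_id", "E10482"))
```

Matching the whole value (rather than searching for the pattern anywhere inside it) keeps values with extra leading or trailing characters from being counted as valid.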

Tip

Use pattern analysis to guide your data cleansing strategy. If you see three different phone number formats, you know exactly which formats to target with replace and remove operations in the Flow Builder (see Introduction to the Flow Builder).
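
The effect of such a cleansing step can be sketched outside the product as follows: remove every non-digit character, then rewrite ten-digit values in one target format. The function name normalize_us_phone, the NNN-NNN-NNNN target, and the ten-digit rule are assumptions for this example and do not represent Flow Builder operations.

```python
import re

def normalize_us_phone(value):
    """Rewrite a 10-digit phone number as NNN-NNN-NNNN, or return None."""
    digits = re.sub(r"[^0-9]", "", value)  # "remove" step: drop non-digits
    if len(digits) != 10:
        return None  # leave unexpected lengths for manual review
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"

for raw in ["(555) 123-4567", "555-123-4567", "5551234567", "123-4567"]:
    print(raw, "->", normalize_us_phone(raw))
```

The three formats from the pattern list collapse to a single pattern, while values that do not contain exactly ten digits are returned as None instead of being guessed at.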

Important

Low validity does not always mean bad data. It may instead mean that the auto-detected type is incorrect for that field, or that the field has no consistent structure to validate against. For example, low validity on a free-form "Notes" field is expected, because free-form text does not follow structural patterns.