Completeness: Filled vs Null

completeness filled vs null

Completeness: Filled vs Null

The Completeness chart shows the ratio of filled values to null (missing) values for each field in your datasource. Understanding which fields have high null rates is essential for planning your matching strategy, because fields with excessive missing data cannot reliably contribute to finding matches.

Reading the Chart

Each field in your datasource is represented as a bar. The bar is split into two segments:

  • Filled -- Records where this field has a value (shown in the primary color).
  • Null -- Records where this field is empty, null, or missing (shown in a contrasting color).

The percentage next to each bar indicates the fill rate. A field with 100% filled means every record has a value. A field with 60% filled means 40% of records are missing that value.

What Completeness Tells You

Completeness is one of the most important data quality dimensions for matching. Consider these scenarios:

  • A name field at 99% filled -- Excellent. This field is highly complete and can be used confidently as a matching criterion.
  • An email field at 45% filled -- Less than half of records have an email address. Using email as a required matching criterion would immediately exclude over half your data from potential matches.
  • A middle name field at 30% filled -- Common for optional fields. If you include middle name in matching, use it as a supplementary criterion rather than a primary one.

Impact on Matching

Fields with high null rates affect matching in several ways:

  1. Missed matches -- If a matching definition requires a field that is frequently null, records with missing values cannot be compared on that criterion and may fail to match.
  2. Lower match scores -- When a field is weighted in your matching strategy but one or both records have null values, the overall match score decreases.
  3. Skewed results -- A field that appears unique simply because most values are null can mislead your matching configuration.

What to Do About High Null Rates

If key fields have significant null rates, you have several options:

  • Apply default values through data cleansing -- replace nulls with a placeholder or derived value where appropriate.
  • Exclude the field from required matching criteria -- use it only as a supplementary or optional criterion.
  • Weight it lower in your match definitions -- so null values have less impact on the overall match score.
  • Investigate the source -- high null rates may indicate an import issue or a data entry problem that can be fixed upstream.

Tip

Click on a field in the completeness chart to drill down and see which specific records have null values. This can help you determine whether the nulls follow a pattern (e.g., all from one data source) or are randomly distributed.

Important

Do not confuse empty strings with null values. A field that contains blank spaces may appear filled but is effectively empty. Use the #whitespace-cleaning operations in data cleansing to trim whitespace, which may reveal additional null-equivalent values.