(Advanced) Step-Level Execution Statistics
When you expand an entry in the Execution History, you see a per-step breakdown of that execution. This granular view shows what happened inside each pipeline step — how long it ran, how many records it processed, and exactly what went wrong if it failed. Understanding this data helps you optimize workflows and diagnose problems precisely.
Accessing Step Statistics
Navigate to a workflow's detail page and open the Execution History tab. Click any execution row to expand it. The step breakdown appears below the row summary.
What Each Column Shows
- Step — the pipeline step name (Import, Profile, Cleanse, Match, Merge, Export).
- Status — Success, Failed, Skipped, or Not Run. Steps after a failed step show "Not Run" since execution halts at the failure point.
- Started At — the timestamp when the step began processing.
- Duration — how long the step took to complete (or fail). Displayed as seconds or minutes.
- Records In — the number of records passed into the step.
- Records Out — the number of records produced by the step. For Cleanse this usually equals Records In unless cleansing rules remove rows; for Match it may be lower (records without matches) or represent matched pairs/groups.
- Error — the error message for a failed step. Empty for successful steps.
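If you pull these statistics out for scripting or alerting, a small record type keeps the columns straight. This is a minimal sketch; the field names and sample values are illustrative assumptions, not the product's actual export schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class StepStats:
    step: str                       # e.g. "Import", "Match", "Export"
    status: str                     # "Success", "Failed", "Skipped", or "Not Run"
    started_at: Optional[datetime]  # None for steps that never started
    duration_seconds: float
    records_in: int
    records_out: int
    error: str = ""                 # empty for successful steps

# One expanded execution: Import succeeded, Match failed, so Merge never ran.
run = [
    StepStats("Import", "Success", datetime(2024, 5, 1, 2, 0), 120.0, 0, 50_000),
    StepStats("Match", "Failed", datetime(2024, 5, 1, 2, 4), 30.0, 50_000, 0,
              "No match configuration found for project"),
    StepStats("Merge", "Not Run", None, 0.0, 0, 0),
]

# Find the failure point: the first (and only) step with status "Failed".
failed = next(s for s in run if s.status == "Failed")
print(failed.step, "-", failed.error)
```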
Identifying Bottlenecks
Duration data across steps helps you identify where execution time is being spent. In a typical large-dataset workflow, the distribution might look like:
- Import: 15% of total time
- Profile: 10% of total time
- Cleanse: 5% of total time
- Match: 65% of total time
- Merge: 3% of total time
- Export: 2% of total time
If Match consistently consumes the majority of total execution time on a large dataset, this is expected behavior — matching compares candidate record pairs and is the most computationally intensive step. If Profile takes unexpectedly long relative to the dataset size, consider whether it needs to run in every execution or whether it can be scheduled less frequently.
Comparing step durations across multiple executions over time reveals trends. A gradual increase in Import duration might indicate the source dataset is growing. A sudden spike in Match duration might indicate a change in match configuration that dramatically increased the number of candidate pairs being evaluated.
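Computing each step's share of total duration makes the bottleneck obvious at a glance. A minimal sketch, using hypothetical per-step durations chosen to match the distribution above:

```python
# Hypothetical durations (in seconds) for one execution of a large-dataset
# workflow; the numbers are illustrative, not measured values.
durations = {
    "Import": 180, "Profile": 120, "Cleanse": 60,
    "Match": 780, "Merge": 36, "Export": 24,
}

total = sum(durations.values())
shares = {step: secs / total for step, secs in durations.items()}
bottleneck = max(shares, key=shares.get)  # step with the largest share

for step, share in shares.items():
    print(f"{step:<8}{share:6.1%}")
print("Bottleneck:", bottleneck)
```

Running the same calculation over several executions and comparing the shares run-to-run is one simple way to spot the trends described above.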
Diagnosing Failures with Step Statistics
Step statistics are the fastest path to diagnosing a workflow failure. Rather than reading through logs, the Error column in the step breakdown gives you the specific failure message for the step that went wrong. Common failure messages and what they mean:
- "No match configuration found for project" — the Match step is included but the project has no saved match definitions. Go to Match Definitions and configure them.
- "Connection refused" or "Timeout" during Import — the source database was unreachable. Check network connectivity and whether the source system was available at the scheduled run time.
- "No data sources found for project ID" — the Import step found no data sources configured for the project. Verify the project has at least one data source in Data Import.
- "Export destination not configured" — the Export step ran but has no destination settings. Configure export settings in the Final Export module.
- "Field not found: [FieldName]" during Merge — a survivorship rule references a column that was renamed or removed from the dataset. Update the survivorship rules in Merge and Survivorship.
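A lookup table like the one above lends itself to automation — for example, attaching a suggested fix to a failure notification. This is a hedged sketch: the substring matching and remediation wording are assumptions, and real error messages may carry extra detail such as a field name.

```python
# Known failure-message fragments (from the Error column) mapped to where
# the fix lives. Order matters: the first matching fragment wins.
REMEDIATIONS = [
    ("No match configuration found", "Configure match rules in Match Definitions"),
    ("Connection refused", "Check network connectivity to the source database"),
    ("Timeout", "Check that the source system was available at run time"),
    ("No data sources found", "Add at least one data source in Data Import"),
    ("Export destination not configured", "Set a destination in Final Export"),
    ("Field not found", "Update survivorship rules in Merge and Survivorship"),
]

def suggest_fix(error_message: str) -> str:
    """Return a remediation hint for a step's error message."""
    for fragment, fix in REMEDIATIONS:
        if fragment.lower() in error_message.lower():
            return fix
    return "Unrecognized error; consult the full execution log"

print(suggest_fix("Field not found: CustomerEmail"))
```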
Records In vs. Records Out
Tracking record counts through the pipeline can reveal data quality issues:
- If Records Out from Import is significantly lower than the previous run, the source dataset may have lost records (a truncated table, a file with fewer rows, or a filtered query).
- If Records Out from Match drops to near zero, the matching criteria may be too strict — check your match definitions for configuration changes.
- If Records Out from Cleanse is lower than Records In, cleansing rules may be removing rows. Review the transformation workflow in Data Cleansing.
A sudden drop in records processed between runs is often a leading indicator of an upstream data problem. Set up failure notifications and review step statistics regularly on critical production workflows to catch data issues before they affect downstream consumers.
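The drop check described above is easy to script if you can pull each step's Records Out for the current and previous execution. A minimal sketch, assuming a simple step-to-count mapping per run; the 50% threshold is illustrative and should be tuned to your dataset's normal variation.

```python
# Flag any step whose Records Out fell by more than the threshold
# relative to the previous run — a likely sign of an upstream data problem.
DROP_THRESHOLD = 0.5  # alert when current output is under 50% of previous

def record_count_alerts(previous: dict, current: dict) -> list:
    alerts = []
    for step, prev_out in previous.items():
        curr_out = current.get(step, 0)
        if prev_out > 0 and curr_out < prev_out * DROP_THRESHOLD:
            alerts.append(f"{step}: Records Out fell from {prev_out} to {curr_out}")
    return alerts

# Hypothetical counts: Match output collapsed, suggesting overly strict
# match criteria or an upstream data change.
prev_run = {"Import": 50000, "Cleanse": 50000, "Match": 12000}
curr_run = {"Import": 50000, "Cleanse": 50000, "Match": 300}

for alert in record_count_alerts(prev_run, curr_run):
    print(alert)
```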