The Problem
The Solution
With AnomalyArmor, you’ll know about schema changes before your pipelines run.

Setup Guide
Step 1: Connect Your Source Databases
Connect the databases that your pipelines read from, not just your warehouse. Common sources to monitor:
- Production application databases (the ones your dbt models read from)
- Third-party data sources
- Shared data lakes
Step 2: Schedule Frequent Discovery
For pipeline-critical databases, run discovery frequently:

| Database Type | Recommended Schedule | Why |
|---|---|---|
| Application databases | Hourly | Changes can happen anytime |
| Shared warehouses | Every 6 hours | Less frequent changes |
| Third-party sources | Daily | Usually stable |
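The schedules above can be written as cron strings for whatever scheduler drives discovery. A minimal sketch: the database names and the config shape here are illustrative assumptions, not an AnomalyArmor API.

```python
# Hypothetical mapping of monitored sources to discovery cron schedules.
# Only "production-app-db" appears in this guide; the other names are
# made-up examples.
DISCOVERY_SCHEDULES = {
    "production-app-db": "0 * * * *",    # hourly: changes can happen anytime
    "shared-warehouse": "0 */6 * * *",   # every 6 hours: less frequent changes
    "partner-feed": "0 2 * * *",         # daily at 2 AM: usually stable
}
```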
Step 3: Create Breaking Change Alerts
Set up alerts specifically for changes that break pipelines:

Rule: Breaking Schema Changes (Production)

| Field | Value |
|---|---|
| Event | Schema Change Detected |
| Data Source | production-app-db |
| Schema | public |
| Assets | All (or list specific tables) |
| Change Type | Column Removed, Table Removed, Type Changed |
| Destinations | Slack #data-engineering, Email data-team@company.com |
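The rule above can also be written down as a config dict for review or version control. This structure simply mirrors the table; the field names are our own, not a documented AnomalyArmor format.

```python
# Illustrative config-as-code for the "Breaking Schema Changes" rule.
# Field names mirror the table above; they are assumptions, not an API.
breaking_schema_rule = {
    "name": "Breaking Schema Changes (Production)",
    "event": "Schema Change Detected",
    "data_source": "production-app-db",
    "schema": "public",
    "assets": "all",  # or a list of specific tables
    "change_types": ["Column Removed", "Table Removed", "Type Changed"],
    "destinations": ["Slack #data-engineering", "Email data-team@company.com"],
}
```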
Step 4: Time Alerts Before Pipeline Runs
If your dbt job runs at 3 AM, schedule discovery at 2 AM so alerts fire before the run starts.

Advanced: Pre-dbt Validation
Option 1: Webhook Integration
Use webhooks to fail your pipeline early if breaking changes are detected:
- Set up a webhook destination in AnomalyArmor
- Point it at a validation endpoint in your orchestrator
- If the webhook fires, block the dbt run

The flow:
1. AnomalyArmor alert fires on schema change
2. Webhook sent to Airflow/Dagster
3. Flag set: `schema_changes_detected = true`
4. dbt task checks the flag before running
5. If the flag is true: fail fast with a meaningful error
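The flag-checking step can be sketched in plain Python. The function names, payload fields, and in-memory flag store are all illustrative; in Airflow you would typically persist the flag as a Variable, and AnomalyArmor's actual webhook payload schema may differ.

```python
# Sketch of the webhook flow above (assumed payload shape, not
# AnomalyArmor's documented schema).
BREAKING_CHANGES = {"Column Removed", "Table Removed", "Type Changed"}

flags = {}  # in practice: an Airflow Variable, Redis key, or DB row


def handle_webhook(payload: dict) -> None:
    """Set the block flag when the alert describes a breaking change."""
    if payload.get("change_type") in BREAKING_CHANGES:
        flags["schema_changes_detected"] = True


def run_dbt_if_safe() -> str:
    """Fail fast, with a meaningful error, before dbt runs."""
    if flags.get("schema_changes_detected"):
        raise RuntimeError(
            "Blocking dbt run: breaking schema change detected upstream"
        )
    return "dbt run started"
```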
Option 2: Discovery Schedule Alignment
Align discovery with your orchestration schedule so every pipeline run is preceded by a fresh discovery pass.

What to Do When Alerts Fire
Immediate Actions
- Acknowledge the alert: Let your team know you’re investigating
- Check the change details: View what changed, when, and on which asset in AnomalyArmor
- Assess impact: Which models/dashboards use this table?
If the Change is Breaking
- Pause affected pipelines (if possible before they run)
- Update your dbt models to handle the change
- Test locally with the new schema
- Deploy the fix before the next scheduled run
If the Change is Expected
- Document it: Note in AnomalyArmor or your team wiki
- Update downstream: Ensure all dependents are updated
- Consider communication: Should you announce to stakeholders?
Model Dependency Mapping
Know which models depend on which tables:

Source Table: `production.orders`
- `stg_orders` (staging model)
  - `int_orders_enriched` (intermediate)
    - `fct_orders` (fact table)
      - `monthly_revenue` (dashboard)
      - `customer_lifetime_value` (analytics)
  - `rpt_daily_orders` (report)
  - `dim_order_status` (dimension)

When `production.orders` changes, all of these are potentially impacted.
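The lineage above can be checked programmatically. A minimal sketch, assuming a hand-maintained adjacency map; in a real project you would derive this from dbt's `manifest.json` instead.

```python
# Downstream dependency map for the example lineage above.
# The dict format is ours, not dbt's manifest format.
DOWNSTREAM = {
    "production.orders": ["stg_orders"],
    "stg_orders": ["int_orders_enriched", "rpt_daily_orders", "dim_order_status"],
    "int_orders_enriched": ["fct_orders"],
    "fct_orders": ["monthly_revenue", "customer_lifetime_value"],
}


def impacted(table: str) -> set:
    """Everything transitively downstream of a table (DFS over the map)."""
    seen, stack = set(), [table]
    while stack:
        for child in DOWNSTREAM.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

For example, `impacted("production.orders")` returns all seven downstream models and dashboards, which is exactly the blast radius to assess when an alert fires.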
Alert Configuration Examples
| Priority | Rule Name | Event | Scope | Conditions | Destinations |
|---|---|---|---|---|---|
| High | Revenue Table Changes | Schema Change | orders, payments, transactions | Any change | Slack #data-critical, PagerDuty |
| Medium | Dimension Table Changes | Schema Change | dim_*, *_lookup | Column removed or type changed | Slack #data-engineering |
| Low | External Source Changes | Schema Change | external.*, partner_* | Any change | Email (daily digest) |
Troubleshooting
Pipeline failed but I didn't get an alert
- Check discovery timing: Did discovery run before the pipeline?
- Check scope: Is the table included in the alert rule?
- Check conditions: Does the change type match your conditions?
- Verify destination: Is the destination configured correctly?
Too many alerts for non-breaking changes
- Filter change types: Alert only on Column Removed, Table Removed, Type Changed
- Exclude test schemas: Filter out test_*, dev_*
- Separate environments: Different rules for prod vs. staging
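The first two fixes can be sketched as a single filter function. This is our own helper, not an AnomalyArmor API; glob-style matching via `fnmatch` is an assumption about how you might implement the schema exclusions.

```python
from fnmatch import fnmatch

BREAKING = {"Column Removed", "Table Removed", "Type Changed"}
EXCLUDED_SCHEMAS = ["test_*", "dev_*"]  # noise sources to skip


def should_alert(change_type: str, schema: str) -> bool:
    """Alert only on breaking change types, and never for test/dev schemas."""
    if change_type not in BREAKING:
        return False
    return not any(fnmatch(schema, pattern) for pattern in EXCLUDED_SCHEMAS)
```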
Can't connect to production database
- Use a read replica: Monitor the replica instead of primary
- Create a dedicated user: With read-only permissions
- Check network access: Firewall rules, security groups
Checklist
Before going live:
- Connected all source databases that feed pipelines
- Discovery scheduled to run before pipeline runs
- Alert rules for breaking changes (column/table removed)
- Alerts routed to the right channel (data engineering team)
- Team knows what to do when alerts fire
- Documented critical table dependencies
Related Resources
- Schema Monitoring: Deep dive into schema change detection
- Alert Rules: Configure alert conditions
