Why Data Quality Matters
Bad data leads to bad decisions.
Data Quality Dimensions
AnomalyArmor monitors key data quality dimensions:
| Dimension | What It Means | How We Monitor |
|---|---|---|
| Freshness | Is data up to date? | Timestamp monitoring, SLAs |
| Completeness | Did the right amount arrive? | Row count monitoring, ML anomaly detection |
| Metrics | Are column values correct? | Statistical monitoring, anomaly detection |
| Schema | Is structure correct? | Schema drift detection |
| Availability | Is data accessible? | Discovery success/failure |
Monitoring Capabilities
Freshness Monitoring
Track when data was last updated and detect stale data before it impacts downstream consumers.
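AnomalyArmor handles this check for you, but conceptually a freshness check reduces to comparing a table's last-update timestamp against an SLA window. A minimal sketch (`is_stale` is a hypothetical helper, not part of the product):

```python
from datetime import datetime, timedelta, timezone

def is_stale(last_updated, sla, now=None):
    """Return True when data has not been updated within the SLA window."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated > sla

# Example: a table expected to refresh hourly, last updated 3 hours ago.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = now - timedelta(hours=3)
print(is_stale(last, timedelta(hours=1), now))  # True: the data is stale
```

In practice the `last_updated` value comes from the timestamp column you select when configuring the monitor.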
Row Count Monitoring
Monitor row counts with ML-based anomaly detection or explicit thresholds.
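The "explicit threshold" mode is a simple bound check; the ML-based mode learns a baseline from history. As an illustration only (not the product's actual model), one of the simplest statistical baselines is a z-score against recent counts:

```python
from statistics import mean, stdev

def row_count_anomalous(history, current, z_threshold=3.0):
    """Flag the current row count if it deviates from the historical
    mean by more than z_threshold standard deviations."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > z_threshold

history = [10_000, 10_200, 9_900, 10_100, 10_050]
print(row_count_anomalous(history, 10_080))  # False: a normal day
print(row_count_anomalous(history, 3_000))   # True: a sudden drop
```

Auto-learned thresholds adapt as the baseline shifts, which is why they suit tables with gradual growth better than fixed bounds.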
Data Quality Metrics
Monitor null percentages, distinct counts, and other column-level statistics. Detect anomalies automatically.
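To make the metric concrete, here is what a null-percentage calculation looks like on a column sample (a hypothetical sketch, not how AnomalyArmor computes it internally):

```python
def null_percent(values):
    """Percentage of null (None) values in a column sample."""
    if not values:
        return 0.0
    return 100.0 * sum(v is None for v in values) / len(values)

column = [1, None, 3, None, 5, 6, 7, 8, 9, 10]
pct = null_percent(column)
print(f"{pct:.1f}% null")   # 20.0% null
print(pct > 5.0)            # True: exceeds a 5% alert threshold
```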
Schema Monitoring
Detect structural changes to your database that could break pipelines and reports.
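Schema drift detection amounts to diffing the structure captured on one discovery run against the next. A minimal sketch of that comparison, assuming schemas are represented as `{column: type}` mappings (a simplification of what discovery actually records):

```python
def schema_diff(old, new):
    """Compare two {column: type} mappings and report structural drift."""
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "retyped": sorted(c for c in old.keys() & new.keys()
                          if old[c] != new[c]),
    }

old = {"id": "bigint", "email": "text", "created_at": "timestamp"}
new = {"id": "bigint", "email": "varchar(255)", "signup_ip": "inet"}
print(schema_diff(old, new))
# {'added': ['signup_ip'], 'removed': ['created_at'], 'retyped': ['email']}
```

Removed and retyped columns are the changes most likely to break downstream pipelines, which is why they are worth alerting on even when added columns are not.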
Report Badges
Embed data quality status indicators in dashboards, wikis, and operational tools.
How Data Quality Monitoring Works
- Discovery runs on your configured schedule, and metrics are captured at defined intervals
- Metadata is collected, including schema, timestamps, and metric values (row counts, null %, etc.)
- Observations are compared against expectations (SLAs, statistical baselines, previous state)
- Alerts fire when expectations aren’t met or anomalies are detected
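The compare-and-alert step above can be sketched as a single evaluation pass over collected metadata. The field names and expectation structure here are illustrative assumptions, not the product's data model:

```python
def evaluate(observed, expectations):
    """Compare observed metadata against expectations; return alert messages."""
    alerts = []
    if observed["age_minutes"] > expectations["max_age_minutes"]:
        alerts.append("freshness SLA violated")
    if observed["row_count"] < expectations["min_row_count"]:
        alerts.append("row count below expected minimum")
    if observed["columns"] != expectations["columns"]:
        alerts.append("schema drift detected")
    return alerts

observed = {"age_minutes": 190, "row_count": 9_800,
            "columns": ["id", "amount", "status"]}
expectations = {"max_age_minutes": 120, "min_row_count": 5_000,
                "columns": ["id", "amount", "status", "created_at"]}
print(evaluate(observed, expectations))
# ['freshness SLA violated', 'schema drift detected']
```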
Getting Started
Set Up Freshness Monitoring
- Navigate to an asset
- Click Freshness tab
- Select a timestamp column
- Set your expected update frequency
- Configure alert threshold
Set Up Row Count Monitoring
- Navigate to an asset
- Click Data Quality tab
- Scroll to Row Count Monitoring section
- Click Create Schedule
- Configure time window and check interval
- Choose auto-learn or explicit thresholds
Set Up Data Quality Metrics
- Navigate to an asset
- Click Metrics tab
- Click Create Metric
- Select metric type (null %, distinct count, etc.)
- Configure capture interval
- Enable anomaly detection (optional)
Set Up Schema Monitoring
Schema monitoring is automatic once you:
- Connect a data source
- Run discovery
- Configure alert rules for schema changes
Best Practices
Start with Critical Assets
Don’t monitor everything at once. Focus on:
- Revenue-impacting tables: Orders, payments, transactions
- Customer-facing data: Data that powers dashboards and reports
- Compliance-required data: Audit logs, regulatory reports
Set Realistic Expectations
Match SLAs to actual data patterns:
| Data Type | Typical Freshness |
|---|---|
| Real-time events | Minutes |
| Hourly ETL | 1-2 hours |
| Daily batches | Same-day |
| Weekly reports | 1 week |
Layer Your Monitoring
Combine multiple checks for comprehensive coverage. Critical table (orders):
- Freshness: Alert if >2 hours stale
- Completeness: Alert if row count drops >50%
- Metrics: Alert if null_percent exceeds 5%
- Schema: Alert on any column removed
- Availability: Alert if discovery fails
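The layered setup above can be thought of as one declarative bundle of checks per table. This structure is purely illustrative (AnomalyArmor configures each layer through its UI, as described in Getting Started):

```python
# Hypothetical layered-monitoring spec for a critical "orders" table.
orders_checks = {
    "freshness":    {"max_staleness_hours": 2},
    "completeness": {"max_row_count_drop_pct": 50},
    "metrics":      {"null_percent_max": 5.0},
    "schema":       {"alert_on": ["column_removed"]},
    "availability": {"alert_on_discovery_failure": True},
}
print(f"{len(orders_checks)} layered checks configured")  # 5 layered checks configured
```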
Data Quality Dashboard
View overall data health in the Assets section:
| Indicator | Meaning |
|---|---|
| Green | All checks passing |
| Yellow | Warning threshold reached |
| Red | SLA violated or issue detected |
| Gray | Not monitored |
Related Topics
Freshness Monitoring
Set up freshness SLAs
Row Count Monitoring
Monitor row counts with ML anomaly detection
Data Quality Metrics
Track column-level statistics and detect anomalies
Alert Rules
Configure data quality alerts
Report Badges
Embed quality status in external tools
