Data Quality Metrics

Data quality metrics let you track statistical properties of your columns over time. AnomalyArmor captures metric values on a schedule, builds historical baselines, and automatically detects when values fall outside expected ranges.

Looking for row count monitoring? Use Row Count Monitoring for tracking row counts with ML-based anomaly detection or explicit thresholds.

Prerequisites: Before creating metrics, you need:

A connected data source with discovery completed
At least one asset (table/view) to monitor

Example scenario: The customer_email column normally has ~3% null values. On Jan 30, null percentage jumped to 12.3%, well outside the expected range band. AnomalyArmor flags this as an anomaly, indicating a potential data quality issue in the source system.

Why Use Metrics

Freshness tells you when data was updated. Completeness tells you how much arrived. Metrics tell you what changed at the column level:

Issue	Freshness	Completeness	Metrics
ETL job failed completely	Detects it	Detects it	Detects it
ETL ran but loaded 0 rows	Might miss it	Catches it	N/A
Data loaded but 50% nulls	Misses it	Misses it	Catches it
Unexpected duplicates	Misses it	Misses it	Catches it
Values outside valid range	Misses it	Misses it	Catches it

Use freshness for “did data arrive on time?” Use row count monitoring for “did the right amount of data arrive?” Use metrics for “is the column-level data quality correct?”

Metric Types

All metrics require a specific column to monitor:

Type	Description	Best For
`null_percent`	Percentage of null values	Detecting missing data
`distinct_count`	Count of unique values	Cardinality monitoring
`duplicate_count`	Count of repeated values	Data quality checks
`min_value`	Minimum numeric value	Range validation
`max_value`	Maximum numeric value	Outlier detection
`avg_value`	Average numeric value	Central tendency
`percentile`	Nth percentile value	Distribution analysis
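To make the definitions in the table concrete, here is a minimal local sketch that computes each metric type for one numeric column held in memory. It is not how AnomalyArmor computes metrics (that happens in your warehouse). Two details are assumptions: `duplicate_count` is taken as the number of extra occurrences beyond the first copy of each non-null value, and `percentile` uses linear interpolation (like SQL `PERCENTILE_CONT`). The function names `compute_metrics` and `percentile_cont` are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence


def percentile_cont(sorted_vals: Sequence[float], pct: float) -> float:
    """Linear-interpolation percentile (like SQL PERCENTILE_CONT); pct is 0-100."""
    rank = (pct / 100.0) * (len(sorted_vals) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (rank - lo)


def compute_metrics(values: Sequence[Optional[float]], pct: float = 95.0) -> Dict[str, Optional[float]]:
    """Compute every metric type for one numeric column. None stands in for SQL NULL."""
    total = len(values)
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)  # like COUNT(DISTINCT col), NULLs are excluded
    has_data = bool(non_null)
    return {
        "null_percent": 100.0 * (total - len(non_null)) / total if total else 0.0,
        "distinct_count": len(distinct),
        # Assumption: count of extra occurrences beyond the first copy of each value.
        "duplicate_count": len(non_null) - len(distinct),
        "min_value": min(non_null) if has_data else None,
        "max_value": max(non_null) if has_data else None,
        "avg_value": sum(non_null) / len(non_null) if has_data else None,
        "percentile": percentile_cont(sorted(non_null), pct) if has_data else None,
    }


# Example: one NULL out of five rows, and the value 20.0 appears twice.
print(compute_metrics([10.0, None, 20.0, 20.0, 30.0], pct=50))
```

Other definitions of duplicates are common (for example, counting every row whose value appears more than once), so check which one matches your expectations before setting thresholds on `duplicate_count`.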

Creating a Metric

Navigate to the Asset

Go to Assets and select the table you want to monitor.

Open Metrics Tab

Click the Metrics tab on the asset detail page.

Create New Metric

Click Create Metric to open the metric configuration form.

Select Metric Type

Choose the type of metric you want to track:

null_percent: Percentage of null values in a column
distinct_count: Number of unique values
duplicate_count: Number of duplicate values
min_value/max_value/avg_value: Numeric range and central tendency
percentile: Distribution analysis

Need to monitor row counts? Use Row Count Monitoring instead.

Configure Capture Interval

Choose how often to capture the metric:

Interval	Best For
Hourly	High-frequency data, real-time tables
Daily	Most batch ETL pipelines
Weekly	Slowly changing data

Enable Anomaly Detection

Toggle Anomaly Detection on and set sensitivity:

Sensitivity	Meaning	Use When
1.0	Alert at 1 standard deviation	Very sensitive
2.0	Alert at 2 standard deviations	Balanced (recommended)
3.0	Alert at 3 standard deviations	Less sensitive

Start with sensitivity 2.0. Adjust based on false positive rate.

Save Metric

Click Create to save the metric. The first capture will run immediately.
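The choices made in the form above amount to a small configuration record. The sketch below captures them as a Python dataclass with basic consistency checks. The field names, the `MetricConfig` class, and the `percentile` parameter field are hypothetical illustrations, not AnomalyArmor's API schema; see the Metrics API reference for the real request format.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values mirroring the form; not AnomalyArmor's API schema.
METRIC_TYPES = {
    "null_percent", "distinct_count", "duplicate_count",
    "min_value", "max_value", "avg_value", "percentile",
}
INTERVALS = {"hourly", "daily", "weekly"}


@dataclass
class MetricConfig:
    asset: str                          # table or view, e.g. "orders"
    column: str                         # every metric type targets one column
    metric_type: str
    interval: str = "daily"             # capture interval
    anomaly_detection: bool = True
    sensitivity: float = 2.0            # band width in standard deviations
    percentile: Optional[float] = None  # hypothetical: N for "percentile" metrics

    def validate(self) -> None:
        """Raise ValueError if the configuration is inconsistent."""
        if self.metric_type not in METRIC_TYPES:
            raise ValueError(f"unknown metric type: {self.metric_type}")
        if not self.column:
            raise ValueError("a column is required for every metric type")
        if self.interval not in INTERVALS:
            raise ValueError(f"interval must be one of {sorted(INTERVALS)}")
        if self.anomaly_detection and self.sensitivity <= 0:
            raise ValueError("sensitivity must be positive")
        if self.metric_type == "percentile" and not (
            self.percentile is not None and 0 <= self.percentile <= 100
        ):
            raise ValueError("percentile metrics need a percentile between 0 and 100")


# Example: the customer_email null check from the scenario at the top of this page.
MetricConfig(asset="customers", column="customer_email", metric_type="null_percent").validate()
```

Keeping the recommended defaults (daily capture, sensitivity 2.0) in the record means most metrics only need an asset, a column, and a type.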

Viewing Metric History

Each metric tracks historical values and displays them as a trend chart:

Value line: Actual metric values over time
Anomaly band: Expected range (mean ± sensitivity × standard deviation)
Anomaly points: Values outside the band are flagged
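The band rule above can be sketched in a few lines. This is an illustration of the formula, not AnomalyArmor's implementation: it assumes the sample standard deviation (the page does not say which variant is used), needs at least two historical values, and uses a made-up history of about 3% nulls to echo the customer_email scenario. The function names are hypothetical.

```python
import statistics
from typing import Sequence, Tuple


def anomaly_band(history: Sequence[float], sensitivity: float = 2.0) -> Tuple[float, float]:
    """Return (lower, upper) = mean -/+ sensitivity * stddev of the history."""
    mean = statistics.mean(history)
    # Assumption: sample standard deviation; requires at least two values.
    stddev = statistics.stdev(history)
    return mean - sensitivity * stddev, mean + sensitivity * stddev


def is_anomaly(value: float, history: Sequence[float], sensitivity: float = 2.0) -> bool:
    """Flag a value that falls strictly outside the band."""
    lower, upper = anomaly_band(history, sensitivity)
    return value < lower or value > upper


# Hypothetical daily null_percent history for customer_email (around 3%).
history = [2.9, 3.1, 3.0, 2.8, 3.2]
print(is_anomaly(12.3, history))      # True: a spike like the Jan 30 example
print(is_anomaly(3.4, history, 2.0))  # True: band is roughly 2.68 to 3.32
print(is_anomaly(3.4, history, 3.0))  # False: band widens to roughly 2.53 to 3.47
```

The last two calls show why a higher sensitivity value produces fewer alerts: the same observation falls inside the wider 3.0 band.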

Reading the Chart

Indicator	Meaning
Green line within band	Normal values
Red dot outside band	Anomaly detected
Gray dashed lines	Upper/lower bounds

Which Metric Type Should I Use?

Is my table growing or shrinking unexpectedly?

Use Row Count Monitoring. It provides ML-based pattern learning, time-windowed counting, and explicit thresholds.

Are there unexpected null values?

Use null_percent on the column that shouldn’t have nulls. Example: Monitor customer_email for null percentage. Alert if nulls exceed the historical baseline (e.g., a jump from 2% to 15%).

Are values within expected range?

Use min_value and max_value on numeric columns. Example: Monitor the price column. Alert if the minimum drops below 0 (invalid) or the maximum exceeds historical norms.

Is data being duplicated?

Use duplicate_count on columns that should be unique. Example: Monitor order_id for duplicates. Any duplicates indicate a data quality issue.

How many unique values exist?

Use distinct_count on categorical columns. Example: Monitor the distinct count of country_code. A sudden increase might indicate invalid data.

Best Practices

Start with High-Impact Metrics

Focus on metrics that catch real problems. For a critical table such as orders:

Completeness: Catch data loss or duplication (see Row Count Monitoring)
null_percent on order_id: Should never be null
null_percent on customer_id: Should never be null
min_value on total_amount: Should never be negative
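The three column-level expectations above are fixed rules rather than statistical ranges. The sketch below checks them against a batch of order rows held in memory, purely to show what each expectation means; AnomalyArmor itself evaluates captured metric values against baselines. The `check_orders` function and the dict-per-row input are hypothetical; the column names come from the orders example.

```python
from typing import Any, Dict, List


def check_orders(rows: List[Dict[str, Any]]) -> List[str]:
    """Return a description of each violated expectation for a batch of order rows."""
    problems: List[str] = []
    if not rows:
        return problems
    # null_percent on order_id and customer_id: expected to be exactly 0%.
    for col in ("order_id", "customer_id"):
        nulls = sum(1 for row in rows if row.get(col) is None)
        if nulls:
            problems.append(f"null_percent on {col}: {100.0 * nulls / len(rows):.1f}% (expected 0%)")
    # min_value on total_amount: expected to be >= 0.
    amounts = [row["total_amount"] for row in rows if row.get("total_amount") is not None]
    if amounts and min(amounts) < 0:
        problems.append(f"min_value on total_amount: {min(amounts)} (expected >= 0)")
    return problems


rows = [
    {"order_id": 1, "customer_id": 10, "total_amount": 25.0},
    {"order_id": 2, "customer_id": None, "total_amount": -5.0},
]
for problem in check_orders(rows):
    print(problem)
```

Because any nonzero null percentage or any negative amount is already a failure here, these columns are good candidates for tight settings even before a long baseline exists.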

Match Capture Interval to Data Freshness

Data Update Pattern	Recommended Interval
Real-time streaming	Hourly
Hourly batch jobs	Hourly
Daily batch jobs	Daily
Weekly aggregates	Weekly

Use Meaningful Sensitivity Values

Scenario	Sensitivity	Rationale
New table, learning patterns	3.0	Reduce noise while learning
Established table, stable patterns	2.0	Balanced detection
Critical data, low tolerance	1.5	More sensitive alerting

Troubleshooting

Metric shows 'No data'

Causes:

Metric was just created and hasn’t captured yet
Capture job failed
Table is empty

Solutions:

Wait for the next scheduled capture (check interval)
Trigger a manual capture: Actions > Capture Now
Confirm that the table contains data

Too many false positive anomalies

Causes:

Sensitivity value is too low (a lower value flags more points)
Normal data patterns are highly variable
Seasonality not accounted for

Solutions:

Increase the sensitivity value (e.g., 2.0 to 3.0)
Allow more baseline data to accumulate (30+ days)
Consider whether the variation is actually expected

Missing real anomalies

Causes:

Sensitivity value is too high (a higher value flags fewer points)
Baseline includes anomalous data
Capture interval too infrequent

Solutions:

Decrease the sensitivity value (e.g., 3.0 to 2.0)
Reset baseline after fixing data issues
Increase capture frequency

Metric capture failing

Causes:

Database connection issues
Column was renamed or removed
Permission changes

Solutions:

Check data source connection status
Verify column still exists
Check database user permissions

What’s Next

Set Up Metric Alerts

Get notified when metrics detect anomalies

Metrics API

Automate metric management with the API

Report Badges

Embed metric status in dashboards

Alert Rules

Configure where alerts are sent
