Row Count Monitoring

Row Count Monitoring tracks row counts in your tables over time. It detects when data volumes drop unexpectedly (data loss) or spike unusually (duplicate loads), helping you catch ETL issues before they impact downstream consumers.

Why Row Count? Row count monitoring used to be part of Data Quality Metrics. We moved it to its own feature with enhanced capabilities: ML-based pattern learning, time-windowed counting, and explicit threshold support.

Example scenario: The orders table typically receives 45,000-55,000 rows daily. On Jan 30, only 15,234 rows were loaded, roughly a 70% drop from the typical volume and a sign of a potential ETL failure.

Two Monitoring Modes

Row Count Monitoring offers two approaches to fit different needs:

Auto-Learn Mode (Recommended)

Let AnomalyArmor learn your table’s normal row count patterns:

Aspect	How It Works
Learning period	Collects data for 7+ days to establish baseline
Pattern detection	Identifies daily, weekly, and seasonal trends
Anomaly detection	Uses statistical analysis (mean +/- stddev * sensitivity)
Best for	Tables with consistent, predictable patterns

Auto-learn example (orders table):

Day 1-7:    Learning... collecting baseline data
Day 8+:     Baseline established (avg: 48,000, stddev: 3,200)
            Alerts if row count deviates significantly
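
The check described in the table (mean +/- stddev * sensitivity) can be sketched in a few lines of Python. This is only an illustration of the formula, not AnomalyArmor's implementation: the function names, the week of sample counts, and the use of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev


def expected_range(baseline_counts, sensitivity):
    """Return (low, high) computed as mean -/+ stddev * sensitivity."""
    center = mean(baseline_counts)
    spread = stdev(baseline_counts)  # sample standard deviation (an assumption)
    return center - spread * sensitivity, center + spread * sensitivity


def is_anomaly(row_count, baseline_counts, sensitivity=3):
    """True when row_count falls outside the learned range."""
    low, high = expected_range(baseline_counts, sensitivity)
    return not (low <= row_count <= high)


# Hypothetical daily counts collected during a 7-day learning period.
baseline = [47_200, 51_900, 44_800, 49_500, 52_300, 46_100, 48_700]

print(is_anomaly(15_234, baseline))  # True: far below the learned range
print(is_anomaly(49_000, baseline))  # False: a typical day
```

A larger sensitivity value widens the range, which is why lower values are described as more sensitive.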

Explicit Mode

Set specific row count thresholds when you know exactly what to expect:

Setting	Description
Min rows	Alert if row count falls below this value
Max rows	Alert if row count exceeds this value
Best for	Tables with known, fixed expectations

Explicit example (daily_summary table):

Expected: Exactly 1 row per day
Min: 1, Max: 1
Alert if row count != 1
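
As a rough sketch, the explicit check amounts to a simple bounds test. The function name below is hypothetical and only mirrors the Min rows and Max rows settings described above.

```python
def violates_thresholds(row_count: int, min_rows: int, max_rows: int) -> bool:
    """True when row_count is below min_rows or above max_rows."""
    return row_count < min_rows or row_count > max_rows


# daily_summary: exactly one row per day is expected (Min: 1, Max: 1).
for count in (0, 1, 2):
    print(count, violates_thresholds(count, min_rows=1, max_rows=1))
# 0 True
# 1 False
# 2 True
```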

Time-Windowed Counting

For tables that accumulate data over time, use a timestamp column to count rows within a specific window:

Window	Use Case
1 hour	Real-time event streams
6 hours	Frequent batch loads
12 hours	Twice-daily pipelines
24 hours	Daily batch ETL (most common)
168 hours	Weekly aggregates

Time-windowed counting (orders table with created_at):

Without time window:  COUNT(*) = 5,000,000 (all time)
With 24h window:      COUNT(*) WHERE created_at >= now() - 24h = 48,000

Use time-windowed counting for append-only tables. Without it, row counts only grow, making anomaly detection less useful.
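
To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table. The helper name, the fixed "now" value, and the sample rows are assumptions for illustration; the query AnomalyArmor runs against your warehouse may differ.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def count_rows_in_window(conn, table, ts_column, window_hours, now):
    """Count rows whose timestamp falls within the last window_hours."""
    cutoff = now - timedelta(hours=window_hours)
    # Identifiers cannot be bound as parameters; here they are trusted constants.
    query = f"SELECT COUNT(*) FROM {table} WHERE {ts_column} >= ?"
    return conn.execute(query, (cutoff.isoformat(),)).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")

now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)  # arbitrary fixed clock
ages_in_hours = [1, 5, 23, 30, 100, 500]  # hypothetical row ages
conn.executemany(
    "INSERT INTO orders (created_at) VALUES (?)",
    [((now - timedelta(hours=h)).isoformat(),) for h in ages_in_hours],
)

total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
recent = count_rows_in_window(conn, "orders", "created_at", 24, now)
print(total, recent)  # 6 3
```

The all-time count keeps growing as rows accumulate, while the windowed count reflects only the most recent load.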

Setting Up Row Count Monitoring

Navigate to the Asset

Go to Assets and select the table you want to monitor.

Open Data Quality Tab

Click the Data Quality tab on the asset detail page, then scroll to the Row Count Monitoring section.

Create Schedule

Click Create Schedule and configure:

Table: Select the table to monitor
Timestamp column: (Optional) For time-windowed counting
Time window: How far back to count rows
Check interval: How often to check (1h, 6h, 12h, 24h)

Choose Monitoring Mode

Select your monitoring approach:

Auto-Learn Mode:

Toggle Auto-learn on
Set Sensitivity (1-4, lower = more sensitive)
Wait for learning period to complete

Explicit Mode:

Toggle Auto-learn off
Set Expected min rows
Set Expected max rows

Save and Monitor

Click Create. The first check runs immediately, then continues on your configured interval.
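
The settings from the steps above can be summarized as a small data structure. The sketch below is purely illustrative: the class, its field names, and the validation rules are hypothetical and do not represent an AnomalyArmor API. It encodes only the options listed in this guide, plus the assumption that a time window needs a timestamp column.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_INTERVALS_HOURS = {1, 6, 12, 24}  # check intervals listed above


@dataclass
class RowCountSchedule:  # hypothetical container, not a real API object
    table: str
    check_interval_hours: int
    timestamp_column: Optional[str] = None
    time_window_hours: Optional[int] = None
    auto_learn: bool = True
    sensitivity: Optional[int] = None        # 1-4, lower = more sensitive
    expected_min_rows: Optional[int] = None  # explicit mode only
    expected_max_rows: Optional[int] = None  # explicit mode only

    def problems(self) -> List[str]:
        """Return a list of configuration problems (empty when valid)."""
        found = []
        if self.check_interval_hours not in ALLOWED_INTERVALS_HOURS:
            found.append("check interval must be 1, 6, 12, or 24 hours")
        if self.time_window_hours is not None and not self.timestamp_column:
            found.append("a time window needs a timestamp column")
        if self.auto_learn:
            if self.sensitivity not in (1, 2, 3, 4):
                found.append("auto-learn needs a sensitivity from 1 to 4")
        elif self.expected_min_rows is None or self.expected_max_rows is None:
            found.append("explicit mode needs both min and max rows")
        elif self.expected_min_rows > self.expected_max_rows:
            found.append("min rows must not exceed max rows")
        return found


orders = RowCountSchedule(
    table="orders", check_interval_hours=24,
    timestamp_column="created_at", time_window_hours=24,
    auto_learn=True, sensitivity=3,
)
daily_summary = RowCountSchedule(
    table="daily_summary", check_interval_hours=24,
    auto_learn=False, expected_min_rows=1, expected_max_rows=1,
)
print(orders.problems())         # []
print(daily_summary.problems())  # []
```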

Understanding Results

Status Indicators

Status	Meaning	Action
Healthy	Row count within expected range	None needed
Anomaly	Row count outside expected range	Investigate the cause
Learning	Collecting baseline data	Wait for learning to complete
No Data	No checks have run yet	Check will run on next interval

Anomaly Types

Anomaly	Possible Causes
Row count too low	ETL failure, data loss, filter bug, source issue
Row count too high	Duplicate load, removed filter, upstream spike
Row count zero	Complete ETL failure, wrong table, permissions
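
Read together, the two tables above describe a small decision procedure. The sketch below is one way to express it. The function name and the order in which conditions are checked are assumptions, not documented behavior.

```python
from typing import Optional, Tuple


def classify(
    latest_count: Optional[int],
    expected_range: Optional[Tuple[float, float]],
) -> Tuple[str, Optional[str]]:
    """Return (status, anomaly_type) for the most recent check."""
    if latest_count is None:
        return "No Data", None    # no checks have run yet
    if expected_range is None:
        return "Learning", None   # baseline not established yet
    low, high = expected_range
    if low <= latest_count <= high:
        return "Healthy", None
    if latest_count == 0:
        return "Anomaly", "Row count zero"
    if latest_count < low:
        return "Anomaly", "Row count too low"
    return "Anomaly", "Row count too high"


learned = (38_400, 57_600)  # avg 48,000 +/- 3 * stddev 3,200
print(classify(48_000, learned))  # ('Healthy', None)
print(classify(15_234, learned))  # ('Anomaly', 'Row count too low')
print(classify(0, learned))       # ('Anomaly', 'Row count zero')
```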

Best Practices

Choose the Right Mode

Scenario	Recommended Mode
Data patterns vary naturally	Auto-learn with sensitivity 2-3
Exact expectations known	Explicit with min/max thresholds
New table, unknown patterns	Auto-learn with sensitivity 3-4
Critical data, low tolerance	Auto-learn with sensitivity 1-2

Set Appropriate Windows

Data Pattern	Recommended Window
Real-time streaming	1 hour
Hourly batch jobs	6 hours
Daily batch jobs	24 hours
Weekly aggregates	168 hours

Start Conservative, Then Tighten

Week 1: Use auto-learn with sensitivity 3 (less sensitive)
Week 2-4: Review any anomalies, adjust if too noisy
Month 2+: Tighten to sensitivity 2 once patterns are stable
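
To see what tightening does, the sketch below applies the mean +/- stddev * sensitivity formula to the orders baseline from earlier (avg: 48,000, stddev: 3,200). The helper name and the sample count of 41,000 are illustrative assumptions.

```python
AVG, STDDEV = 48_000, 3_200  # baseline from the auto-learn example


def band(avg, stddev, sensitivity):
    """Expected range for a given sensitivity value."""
    return avg - stddev * sensitivity, avg + stddev * sensitivity


sample_count = 41_000  # hypothetical slow day
for sensitivity in (1, 2, 3, 4):
    low, high = band(AVG, STDDEV, sensitivity)
    flagged = not (low <= sample_count <= high)
    print(f"sensitivity {sensitivity}: {low:,} to {high:,} flagged={flagged}")
# sensitivity 1: 44,800 to 51,200 flagged=True
# sensitivity 2: 41,600 to 54,400 flagged=True
# sensitivity 3: 38,400 to 57,600 flagged=False
# sensitivity 4: 35,200 to 60,800 flagged=False
```

Moving from sensitivity 3 to 2 narrows the range by one standard deviation on each side, so a day with 41,000 rows goes from passing to being flagged.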

Row Count vs. Metrics

Feature	Row Count	Data Quality Metrics
Purpose	Monitor row counts	Monitor column statistics
Scope	Table-level	Column-level
ML-based	Yes (auto-learn)	Yes (anomaly detection)
Time windows	Yes	No
Explicit thresholds	Yes	Via checks

Use Row Count Monitoring for: “Did the right amount of data arrive?”

Use Metrics for: “Is the data quality correct?” (nulls, duplicates, ranges)

Troubleshooting

Status shows 'Learning' for too long

Causes:

Not enough data points collected yet
Check interval is very long (weekly)

Solutions:

Wait for at least 7 data points (7 days for daily checks)
Consider switching to explicit mode if you know expected values

Too many false positive anomalies

Causes:

Sensitivity is too low (too sensitive)
Natural data variation is high
Seasonality not yet learned

Solutions:

Raise the sensitivity value (e.g., from 2 to 3) to make detection less sensitive
Allow more baseline data (30+ days)
Switch to explicit mode with wider thresholds

Missing real anomalies

Causes:

Sensitivity is too high (not sensitive enough)
Baseline includes anomalous data

Solutions:

Lower the sensitivity value (e.g., from 3 to 2) to make detection more sensitive
Switch to explicit mode with tighter thresholds

Row count always zero with time window

Causes:

Timestamp column has no recent data
Wrong timestamp column selected
Time window too narrow

Solutions:

Verify timestamp column has data in the window
Check column data type (should be timestamp/datetime)
Widen the time window
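
One way to work through these checks by hand is to look at the newest value in the timestamp column and compare its age with the window. The sketch below uses Python's sqlite3 module with an in-memory table. The helper name, messages, and sample data are hypothetical, and it assumes timestamps are stored as ISO 8601 strings.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def diagnose_zero_count(conn, table, ts_column, window_hours, now):
    """Suggest why a windowed count is zero, based on the newest timestamp."""
    # Identifiers cannot be bound as parameters; here they are trusted constants.
    latest = conn.execute(f"SELECT MAX({ts_column}) FROM {table}").fetchone()[0]
    if latest is None:
        return "column has no values: wrong column or empty table"
    age_hours = (now - datetime.fromisoformat(latest)).total_seconds() / 3600
    if age_hours > window_hours:
        return f"latest row is {age_hours:.0f}h old, outside the {window_hours}h window"
    return "recent rows exist: the zero count has another cause"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)  # arbitrary fixed clock
conn.execute(
    "INSERT INTO orders (created_at) VALUES (?)",
    ((now - timedelta(hours=30)).isoformat(),),
)

print(diagnose_zero_count(conn, "orders", "created_at", 24, now))
# latest row is 30h old, outside the 24h window
```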

What’s Next

Set Up Alerts

Get notified when row count anomalies are detected

Data Quality Metrics

Monitor column-level statistics like null percentages

Freshness Monitoring

Track when data was last updated

Report Badges

Embed row count status in dashboards
