06. Data Quality and Testing Layers

⏱️ Estimated time: 22 min | Level: advanced

ELI5 callback: In the car factory, the loading dock sets arrival rhythm, the conveyor belt sets work rhythm, the showroom exposes finished output, the reject bin protects trust, and the manifest explains every move. This file teaches how the factory keeps bad parts from spreading.

Trust is built in layers

See. Data quality is not only a final checkpoint.

Quality must exist at ingestion, transformation, and serving.

One perfect dashboard cannot rescue poisoned upstream records.

Great Expectations, dbt tests, and schema registries each solve a different slice of the problem.

Quality work starts by defining what can go wrong.

Missing values, duplicate keys, and broken types are common.

Business rule violations are even more painful.

So what to do?

Define critical tables and critical columns first.

Attach explicit expectations to them.

Schema contracts catch structural drift early.

Content tests catch logical drift later.

Freshness tests catch silent pipeline stalls.

Volume tests catch load anomalies.

Distribution tests catch suspicious behavior shifts.

Simple, no?
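The checks above can be sketched in plain Python. This is a minimal illustration, not the API of any tool named in this file. The `orders` table, its columns, the one-hour freshness limit, and the 50% volume tolerance are all assumptions made up for the example. Distribution checks are left out here because they need history to compare against.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract for an "orders" table: column name -> expected type.
ORDERS_SCHEMA = {"order_id": int, "amount": float, "loaded_at": datetime}


def check_schema(rows, schema):
    """Schema check: every row has exactly the expected columns and types."""
    return all(
        row.keys() == schema.keys()
        and all(isinstance(row[col], typ) for col, typ in schema.items())
        for row in rows
    )


def check_unique(rows, key):
    """Content check: no duplicate values in the key column."""
    values = [row.get(key) for row in rows]
    return len(values) == len(set(values))


def check_freshness(rows, ts_col, max_age, now):
    """Freshness check: the newest row is no older than max_age."""
    return bool(rows) and now - max(row[ts_col] for row in rows) <= max_age


def check_volume(rows, expected, tolerance=0.5):
    """Volume check: row count is within +/- tolerance of the expected count."""
    return abs(len(rows) - expected) <= tolerance * expected


def run_checks(rows, now, expected_rows):
    """Run every check and return a name -> passed mapping."""
    return {
        "schema": check_schema(rows, ORDERS_SCHEMA),
        "unique_order_id": check_unique(rows, "order_id"),
        "freshness": check_freshness(rows, "loaded_at", timedelta(hours=1), now),
        "volume": check_volume(rows, expected_rows),
    }
```

Each check returns a plain boolean, so the results can be stored per run and trended over time instead of being read as a single pass-fail light.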

Trust grows when failures are classified and routed clearly.

Quality without ownership becomes background noise.

Use the right quality control at the right layer

Schema registry works well for strongly governed event streams.

Producers know what fields are allowed before publishing.

dbt tests work well inside modeled warehouse layers.

Great Expectations helps express richer validation suites.

Quarantine tables preserve bad records for later inspection.

Hard fails stop corruption, but can block critical flows.

Soft fails keep flows alive, but may hide risk.

Now watch.

Severity must match business consequence.

┌──────────┐  schema   ┌──────────┐   tests   ┌───────────┐
│  Ingest  │──────────▶│ Raw zone │──────────▶│  Models   │
└──────────┘           └──────────┘           └────┬──────┘
                                                   │
                                     fail hard ────┤
                                     quarantine ──▶│ Bad rows
                                                   └─────────

A missing optional field is not the same as a broken primary key.

One outlier order may be harmless.

A thousand outlier orders may mean upstream logic changed.
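One way to encode "severity must match business consequence" is a small classification function. This is only a sketch. The check names (`primary_key`, `optional_field_missing`, `amount_outlier`) and the 1% outlier limit are hypothetical; real rules come from the consumers of the table.

```python
from enum import Enum


class Severity(Enum):
    WARN = "warn"              # soft fail: log and continue
    QUARANTINE = "quarantine"  # set bad rows aside, keep the flow alive
    HARD_FAIL = "hard_fail"    # stop the pipeline


def classify(check_name, failed_rows, total_rows, outlier_rate_limit=0.01):
    """Map a failed check to a severity. Rules and the limit are illustrative."""
    if failed_rows == 0:
        return None  # nothing failed, nothing to route
    if check_name == "primary_key":
        return Severity.HARD_FAIL  # a broken key corrupts every join downstream
    if check_name == "optional_field_missing":
        return Severity.WARN
    if check_name == "amount_outlier":
        # A few outliers are set aside; a large share suggests upstream logic changed.
        rate = failed_rows / total_rows
        return Severity.HARD_FAIL if rate > outlier_rate_limit else Severity.QUARANTINE
    return Severity.QUARANTINE  # unknown checks default to the cautious middle
```

With these assumed rules, one outlier in 100,000 orders is quarantined, while 1,000 outliers in 50,000 orders stops the run.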

Quality metrics need trend views, not only pass-fail lights.

Store failed samples with reasons.

Operators need examples to debug quickly.
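Storing failed samples with reasons can look like the sketch below. The rule names and the `order_id` and `amount` columns are assumptions for illustration. The point is that every quarantined row carries the list of rules it broke.

```python
def split_rows(rows, rules):
    """Return (clean, quarantined).

    Each quarantined entry keeps the original row plus the name of every
    rule it failed, so an operator can see why it was rejected.
    """
    clean, quarantined = [], []
    for row in rows:
        reasons = [name for name, rule in rules.items() if not rule(row)]
        if reasons:
            quarantined.append({"row": row, "reasons": reasons})
        else:
            clean.append(row)
    return clean, quarantined


# Hypothetical rules for an orders feed: rule name -> predicate on one row.
RULES = {
    "order_id_present": lambda r: r.get("order_id") is not None,
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}
```

Calling `split_rows(rows, RULES)` sends clean rows onward and leaves the rest ready to write to a quarantine table, reasons included.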

Tests should run close to the transformation that knows the intent.

Late testing increases blast radius.

Early testing increases confidence.

Anomaly detection helps, but basics come first

Anomaly detection helps when exact rules are hard to encode.

It is useful for volume, drift, and unusual null spikes.

But anomaly tools should not replace basic contracts.

See.

Start with deterministic rules before statistical ones.

Statistical alerts can be noisy during seasonality or launches.

Backtest thresholds on historical data.

Calibrate by consumer impact, not mathematical novelty.
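Backtesting a threshold means replaying the alert rule over past data and counting how often it would have fired. The sketch below uses a simple rolling z-score on daily row counts. The z-score rule, the 7-day window, and the default threshold of 3.0 are assumptions chosen for the example, not a recommendation.

```python
from statistics import mean, stdev


def backtest_zscore(history, window=7, threshold=3.0):
    """Replay a z-score alert over past daily row counts.

    Each day is compared with the mean and sample standard deviation of the
    preceding `window` days (window must be at least 2). Returns the indexes
    of the days that would have alerted.
    """
    alerts = []
    for i in range(window, len(history)):
        past = history[i - window:i]
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            continue  # flat history: z-score is undefined, skip the day
        if abs(history[i] - mu) / sigma > threshold:
            alerts.append(i)
    return alerts
```

Running this with several thresholds on the same history shows the trade-off directly: a lower threshold catches more days but also alerts on ordinary variation. Pick the value by asking which of those alerts a consumer would have cared about.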

Data quality incidents need the same rigor as service incidents.

Track detection time, scope, and recovery steps.

Root cause may sit in code, source, or business process.

Quality dashboards must separate a failing source from a failing test suite.

Otherwise teams debug the wrong layer.

So what to do?

Build runbooks for common failures.

Define when to quarantine, when to drop, and when to continue.

Document acceptable data debt for non-critical tables.

That prevents drama during minor blips.

Make failures clear and actionable

Make tests cheap enough to run frequently.

Make failures informative enough to act immediately.

Make ownership obvious on every asset.

Tie alerts to services, not to vague team names.

Sample bad data safely for humans to inspect.

Keep PII handling compliant even in error stores.
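One possible way to keep bad samples inspectable without exposing raw PII is to mask sensitive columns before the row reaches the error store. This is a sketch under assumptions: the column list and the salted SHA-256 approach are hypothetical, and whether hashing is acceptable at all depends on your own policy. Some policies require dropping the column entirely.

```python
import hashlib

# Hypothetical list of sensitive columns for this table.
PII_COLUMNS = {"email", "phone"}


def mask_for_error_store(row, salt):
    """Return a copy of the row with PII values replaced by a digest prefix.

    The same input and salt always give the same token, so bad rows can still
    be grouped by customer without storing the raw value.
    """
    masked = {}
    for col, value in row.items():
        if col in PII_COLUMNS and value is not None:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            masked[col] = "sha256:" + digest[:16]
        else:
            masked[col] = value
    return masked
```

The non-sensitive fields that triggered the failure, such as a negative amount, stay readable for debugging.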

Version schemas and expectations together where possible.

Review false positives every month.

Think again using the factory analogy.

The loading dock screens incoming parts, the conveyor belt checks fit at every station, the showroom should display trusted cars only, the reject bin holds suspicious pieces, and the manifest records every rule breach.

Simple, no?

Teams trust data when surprises become visible early.

They distrust data when quality rules are secret or inconsistent.

Invest first in high-value tables, not everything equally.

Quality maturity is layered, not magical.

Contracts, tests, quarantine, and review must work together.

That stack protects business decisions.

That is why quality is platform work.

Where this lives in the wild

  • Great Expectations often fronts custom validation suites in Python-heavy teams.
  • dbt tests are common for warehouse-centric analytics models.
  • Schema registries are crucial where event contracts drive many consumers.
  • Quarantine tables are common when bad data must be inspected, not discarded.

Pause and recall

  • Why should quality checks exist before the serving layer?
  • When is a hard fail better than a soft fail?
  • Why are trend views better than isolated pass-fail checks?
  • What must anomaly detection never replace?

Interview Q&A

Q: What is the first step in a quality program?
A: Define critical assets and the failures that matter most.
Common wrong answer to avoid: Add fancy anomaly tooling everywhere.

Q: Why quarantine bad records?
A: It preserves evidence and avoids silent downstream corruption.
Common wrong answer to avoid: Because storage is cheap.

Q: Where should tests run?
A: As close as possible to the layer that knows the intent.
Common wrong answer to avoid: Only on final dashboards.

Q: How do you reduce alert fatigue?
A: Classify severity and review false positives regularly.
Common wrong answer to avoid: Lower every threshold until nothing alerts.

Apply now (5 min)

  • Pick one important table and write three concrete expectations.
  • Decide which failure should hard fail and which should quarantine.
  • Add one freshness check and one volume check.
  • Write the owner and escalation path for a failed test.
  • Note what bad sample you would retain for debugging.

Bridge. Quality enforced. But where did this data come from? → 07