AI Product Evals

Use this track for the measurement and release layer: evals, metrics, experimentation, release gates, judge calibration, drift checks, dashboards, and feedback loops.

This track is deliberately separate from agent architecture. Evals are the proof layer: they tell you whether an AI product is ready to ship, has regressed, needs a rollback, or has improved.
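As a rough illustration of what a release gate decides, here is a minimal sketch. The names `GateDecision` and `release_gate`, the single aggregate score, and the thresholds `min_score` and `max_regression` are all hypothetical and chosen for illustration; they are not taken from the modules in this track.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateDecision:
    action: str  # "ship", "hold", or "roll_back"
    reason: str


def release_gate(
    baseline: float,
    candidate: float,
    min_score: float = 0.80,       # hypothetical absolute quality floor
    max_regression: float = 0.02,  # hypothetical tolerated drop vs. baseline
) -> GateDecision:
    """Compare a candidate's eval score with the baseline currently in production."""
    delta = candidate - baseline
    # A drop larger than the tolerance counts as a regression.
    if delta < -max_regression:
        return GateDecision("roll_back", f"regressed by {-delta:.3f}")
    # No regression, but the candidate still misses the absolute floor.
    if candidate < min_score:
        return GateDecision("hold", f"score {candidate:.3f} is below floor {min_score:.3f}")
    return GateDecision("ship", f"delta {delta:+.3f} is within tolerance")


if __name__ == "__main__":
    print(release_gate(baseline=0.85, candidate=0.86))
    print(release_gate(baseline=0.85, candidate=0.80))
```

A real gate would likely combine several metrics and account for judge calibration and sampling noise; a single score with fixed thresholds is only the simplest case.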

| Module | Focus | Folder |
| --- | --- | --- |
| 00 | AI evals and release gates | 00_ai_evals_release_gates/ |
| 01 | Dataset and golden set operations | 01_dataset_golden_set_operations/ (placeholder) |
| 02 | Telemetry and feedback loops | 02_telemetry_feedback_loops/ (placeholder) |
| 03 | AI release management | 03_ai_release_management/ (placeholder) |