Level 1: Foundations
24 modules · 50+ hours · 800+ practice questions
Core Competencies & Mental Models
Master the 4 core competencies of AI evaluators and learn cognitive load management strategies for consistent evaluation work.
Modality-Specific Assessment
Evaluate AI outputs across text, code, image, and multimodal tasks. Recognize modality-specific quality cues and failure patterns.
How AI Training Works
Learn how RLHF works, the three stages of AI training, the main evaluation task types, and the HHH (Helpful, Harmless, Honest) framework.
Core Evaluation Skills
Master comparison methodology, identifying unrateable prompts, and reading & interpreting rubrics effectively.
Evaluation Dimensions
Learn to evaluate AI responses across key dimensions: accuracy, hallucination detection, and instruction following. Rank problems using the failure type hierarchy.
Safety Fundamentals
Understand AI safety principles, harm categories, and how to evaluate responses for safety compliance.
Prompt Engineering & Writing
Master prompt decomposition, quality criteria identification, and structured evaluation approaches for different prompt types.
Justification Writing
Write clear, defensible justifications using the SPEC framework. Master evidence-based reasoning for evaluation decisions.
Data Annotation Fundamentals
Master annotation taxonomies, labeling guidelines, and quality control for structured AI training data.
Ideal Response Description & Rubric Properties
Create Ideal Response Descriptions through prompt analysis. Master criterion properties: atomicity, self-containment, objectivity, specificity, weighting, and golden responses.
Atomicity Bootcamp
Recognize and decompose non-atomic criteria. Write atomic criteria from scratch and apply atomicity reliably across any task type.
Instance-Specific Mastery
Write criteria tied to the specific instance, not to generic task templates. Surface what this prompt requires that others would not.
Self-Containment
Write criteria a reviewer can apply without consulting external materials. Eliminate hidden dependencies and implicit context.
Objectivity & Thresholds
Convert subjective judgments into concrete thresholds. Build criteria two reviewers would score the same way.
Applying Rubrics in Practice
Apply rubric properties to real tasks. Master modality-aware evaluation, platform patterns, speed drills, and integration practice.
Platform Rubric Patterns
Recognize the rubric patterns each major platform expects. Adapt your rubric writing to platform conventions without losing rigor.
Rubric Speed Drills
Build the muscle to draft sound rubrics under platform-realistic time limits. Cut deliberation time without dropping rubric quality.
Integration Practice
Combine rubric writing, evaluation, and justification on end-to-end tasks that mirror real platform workflows.
Citation & Fact-Checking Skills
Find and format citations rapidly. Master source reliability evaluation and platform citation formats.
Source Reliability
Distinguish strong, weak, and unreliable sources. Apply a repeatable check to every citation before you accept it as evidence.
Platform Citation Formats
Format citations to match each platform's expected style. Move fast without tripping on platform-specific format rules.
Platform Navigation & Tools
Navigate platform interfaces and tooling efficiently. Reduce friction so your evaluation time goes into the work, not the UI.
Time Management & Productivity
Run an evaluation session with a sustainable cadence. Protect attention across the hours where careless mistakes pile up.
Gating Test Simulations
Walk into the gating exam knowing what to expect. Practice on representative tasks under representative time pressure.
Level 2: Advanced
15 modules · 35+ hours
Advanced RLHF Concepts
Deep dive into RLHF failure modes, reward hacking, specification gaming, and alignment faking. Learn to detect and prevent these issues.
Inter-Annotator Agreement
Master inter-annotator agreement (IAA) metrics, including Cohen's Kappa and weighted Kappa. Learn platform-specific thresholds and strategies to improve agreement.
Model Failure Prompting & Adversarial Testing
Design prompts that systematically expose model failures. Categorize failures, test abstention with logic traps, and use the knowledge check follow-up technique.
Dimension Tensions
Navigate HHH (Helpful, Harmless, Honest) conflicts using the priority hierarchy. Master helpfulness vs safety trade-offs.
Ambiguous Prompt Interpretation
Apply the CLEAR framework for resolving unclear intent. Distinguish user error from intentional ambiguity and evaluate fairly.
Complex Safety Scenarios
Handle dual-use content, professional claims, cross-cultural safety, and novel harm recognition in challenging scenarios.
Hierarchical Criteria Structures
Master the GCM framework (Global Category → Meta-Criteria → Unit Properties) for building complex rubrics systematically.
Criterion Tension Resolution
Learn the SAFE framework for resolving tensions between criteria when atomicity, instance-specificity, and other properties pull in different directions.
Novel Task Rubric Creation
Master the five-step process for creating rubrics for unfamiliar task types. Build a pattern library for rapid rubric development.
Expert Speed Optimization
Complete complex tasks in 10-15 minutes at 95%+ quality. Master speed techniques while avoiding time sinks.
Advanced Source Evaluation
Handle conflicting sources with the CONFLICT framework. Detect misinformation and evaluate controversial topics objectively.
Reviewer & QA Fundamentals
Master the quality rubric for reviewing contributors, feedback frameworks (sandwich, evidence-based, STAR), and SBQ decisions.
Calibration & Drift Detection
Master Cohen's Kappa for inter-annotator agreement, detect personal drift patterns, and develop self-calibration techniques.
Task Difficulty Assessment
Apply the Marimba checklist for project assessment. Recognize time sinks and calibrate task difficulty before committing.
Cross-Platform Optimization
Develop efficient multi-platform workflows and a sustainable evaluation career. Master rapid onboarding, tool proficiency, reviewer-track progression, and specialization.
Ready to start?
Your first module is free. No credit card required.