Level 1: Foundations

24 modules · 50+ hours · 800+ practice questions

01
1h

Core Competencies & Mental Models

Master the four core competencies of AI evaluators and learn cognitive load management strategies for consistent evaluation work.

Included in Level 1
02
4h

Modality-Specific Assessment

Evaluate AI outputs across text, code, image, and multimodal tasks. Recognize modality-specific quality cues and failure patterns.

03
2h

How AI Training Works

Learn about reinforcement learning from human feedback (RLHF), the three stages of AI training, different evaluation task types, and the HHH (Helpful, Harmless, Honest) framework.

04
2h

Core Evaluation Skills

Master comparison methodology, identifying unrateable prompts, and reading & interpreting rubrics effectively.

05
2h

Evaluation Dimensions

Learn to evaluate AI responses across key dimensions, including accuracy, hallucination detection, and instruction following, and learn how the failure type hierarchy ranks errors.

06
2h

Safety Fundamentals

Understand AI safety principles, harm categories, and how to evaluate responses for safety compliance.

07
2h 30m

Prompt Engineering & Writing

Master prompt decomposition, quality criteria identification, and structured evaluation approaches for different prompt types.

08
3h

Justification Writing

Write clear, defensible justifications using the SPEC framework. Master evidence-based reasoning for evaluation decisions.

09
2h 30m

Data Annotation Fundamentals

Master annotation taxonomies, labeling guidelines, and quality control for structured AI training data.

10
4h

Ideal Response Description & Rubric Properties

Create Ideal Response Descriptions through prompt analysis. Master criterion properties: atomicity, self-containment, objectivity, specificity, weighting, and golden responses.

11
3h 30m

Atomicity Bootcamp

Recognize and decompose non-atomic criteria. Write atomic criteria from scratch and apply atomicity reliably across any task type.

12
2h 30m

Instance-Specific Mastery

Write criteria tied to the specific instance, not to generic task templates. Surface what this prompt requires that others would not.

13
2h

Self-Containment

Write criteria a reviewer can apply without consulting external materials. Eliminate hidden dependencies and implicit context.

14
2h

Objectivity & Thresholds

Convert subjective judgments into concrete thresholds. Build criteria two reviewers would score the same way.

15
3h

Applying Rubrics in Practice

Apply rubric properties to real tasks. Master modality-aware evaluation, platform patterns, speed drills, and integration practice.

16
2h

Platform Rubric Patterns

Recognize the rubric patterns each major platform expects. Adapt your rubric writing to platform conventions without losing rigor.

17
2h

Rubric Speed Drills

Build the muscle to draft sound rubrics under platform-realistic time limits. Cut deliberation time without dropping rubric quality.

18
3h

Integration Practice

Combine rubric writing, evaluation, and justification on end-to-end tasks that mirror real platform workflows.

19
2h

Citation & Fact-Checking Skills

Find and format citations rapidly. Master source reliability evaluation and platform citation formats.

20
1h 30m

Source Reliability

Distinguish strong, weak, and unreliable sources. Apply a repeatable check to every citation before you accept it as evidence.

21
1h

Platform Citation Formats

Format citations to match each platform's expected style. Move fast without tripping on platform-specific format rules.

22
2h

Platform Navigation & Tools

Navigate platform interfaces and tooling efficiently. Reduce friction so your evaluation time goes into the work, not the UI.

23
1h 30m

Time Management & Productivity

Run an evaluation session with a sustainable cadence. Protect attention across the hours where careless mistakes pile up.

24
1h

Gating Test Simulations

Walk into the gating exam knowing what to expect. Practice on representative tasks under representative time pressure.

Level 2: Advanced

15 modules · 35+ hours

Requires Level 1 completion. Level 2 unlocks once you earn your Level 1 certificate.
01
3h

Advanced RLHF Concepts

Deep dive into RLHF failure modes, reward hacking, specification gaming, and alignment faking. Learn to detect and prevent these issues.

02
3h

Inter-Annotator Agreement

Master IAA metrics including Cohen's Kappa and weighted Kappa. Learn platform-specific thresholds and strategies to improve agreement.
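
For readers new to the metric, here is a minimal sketch of unweighted Cohen's Kappa for two annotators, computed as (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The function name, the sample labels, and the handling of the degenerate single-label case are illustrative assumptions, not course material or any platform's implementation.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's Kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: for each label, multiply the two annotators' usage rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label for everything; Kappa is
        # undefined here. Returning 1.0 is an assumption made for this sketch.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings: ten responses judged "good" or "bad" by two annotators.
annotator_a = ["good"] * 6 + ["bad"] * 4
annotator_b = ["good"] * 5 + ["bad", "good"] + ["bad"] * 3

kappa = cohens_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

In this example the annotators agree on 8 of 10 items (p_o = 0.8), but chance alone would produce p_e = 0.52, so Kappa is noticeably lower than raw agreement. Weighted Kappa, which gives partial credit for near-misses on ordinal scales, is not covered by this sketch.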

03
3h

Model Failure Prompting & Adversarial Testing

Design prompts that systematically expose model failures. Categorize failures, test abstention with logic traps, and use the knowledge-check follow-up technique.

04
2h 30m

Dimension Tensions

Navigate HHH (Helpful, Harmless, Honest) conflicts using the priority hierarchy. Master helpfulness vs safety trade-offs.

05
2h

Ambiguous Prompt Interpretation

Apply the CLEAR framework for resolving unclear intent. Distinguish user error from intentional ambiguity and evaluate fairly.

06
2h 30m

Complex Safety Scenarios

Handle dual-use content, professional claims, cross-cultural safety, and novel harm recognition in challenging scenarios.

07
3h 30m

Hierarchical Criteria Structures

Master the GCM framework (Global Category → Meta-Criteria → Unit Properties) for building complex rubrics systematically.

08
3h

Criterion Tension Resolution

Learn the SAFE framework for resolving tensions when atomicity, instance-specificity, and other criterion properties conflict.

09
2h 30m

Novel Task Rubric Creation

Master the five-step process for creating rubrics for unfamiliar task types. Build a pattern library for rapid rubric development.

10
2h 30m

Expert Speed Optimization

Complete complex tasks in 10-15 minutes at 95%+ quality. Master speed techniques while avoiding time sinks.

11
1h 30m

Advanced Source Evaluation

Handle conflicting sources with the CONFLICT framework. Detect misinformation and evaluate controversial topics objectively.

12
2h 30m

Reviewer & QA Fundamentals

Master the quality rubric for reviewing contributors, feedback frameworks (sandwich, evidence-based, STAR), and SBQ decisions.

13
2h 30m

Calibration & Drift Detection

Master Cohen's Kappa for inter-annotator agreement, detect personal drift patterns, and develop self-calibration techniques.

14
2h

Task Difficulty Assessment

Apply the Marimba checklist for project assessment. Recognize time sinks and calibrate task difficulty before committing.

15
2h

Cross-Platform Optimization

Develop efficient multi-platform workflows and a sustainable evaluation career. Master rapid onboarding, tool proficiency, reviewer-track progression, and specialization.

Ready to start?

Your first module is free. No credit card required.