The Full Calibration.
A complete statistical calibration across all eight RLHF task types. It produces your reliability score with confidence intervals on every metric — the number AI companies see when they match you to paid work.
At a glance
- Duration: 60–90 min
- Task types: 8
- Output: Reliability Report
- Price: Free
The score on the record.
Where Phase 1 is a private diagnostic, Phase 2 is the procurement-grade measurement. It produces your reliability score with statistical confidence intervals and the buyer-facing Reliability Report — the evidence that gets you matched to paid evaluation work.
All eight task types.
Real RLHF work spans more than rating answers. Phase 2 exercises the full range, so your score reflects how you actually perform across the work buyers need.
Rating
Score a single AI response against a quality scale.
Comparison
Choose the better of two responses, and say why.
Ranking
Order several responses from best to worst.
Rubric
Assess against structured, dimensional criteria.
Correction
Fix an unsafe or incorrect response.
Annotation
Label spans and structure in clinical text.
Justification
Write the clinical reasoning behind a judgement.
Red-team
Probe for failure modes across the safety taxonomy.
Measured, not asserted.
Every metric is reported with uncertainty, not as a bare number — so a score means what it says.
Per-metric confidence intervals
Beta-Binomial intervals for proportions, bootstrap intervals for continuous metrics — uncertainty quantified on every score.
Lower confidence bounds
Certification reads the lower bound of each confidence interval, not the point estimate, so a certified score reflects the worst performance the data plausibly supports.
Per-category coverage
Performance broken down across task types and the safety taxonomy, recorded in the report.
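For readers who want to see what the interval methods above look like in practice, here is a minimal sketch, not the platform's actual scoring code. It assumes a uniform Beta(1, 1) prior for proportions, a percentile bootstrap of the mean for continuous metrics, and a 95% level. The function names, the sample numbers, and the 0.90 threshold are all hypothetical.

```python
import random
import statistics


def beta_binomial_interval(successes, trials, level=0.95, draws=20000, seed=0):
    """Interval for a proportion from a Beta posterior.

    Assumption: uniform Beta(1, 1) prior, so the posterior is
    Beta(1 + successes, 1 + trials - successes). Quantiles are
    approximated by Monte Carlo sampling to stay within the stdlib.
    """
    rng = random.Random(seed)
    a = 1 + successes
    b = 1 + trials - successes
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1 - level) / 2
    return samples[int(tail * draws)], samples[int((1 - tail) * draws) - 1]


def bootstrap_interval(values, level=0.95, resamples=5000, seed=0):
    """Percentile bootstrap interval for the mean of a continuous metric."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(resamples)
    )
    tail = (1 - level) / 2
    return means[int(tail * resamples)], means[int((1 - tail) * resamples) - 1]


def meets_threshold(interval, threshold):
    """Certify on the lower bound, not the point estimate."""
    lower, _upper = interval
    return lower >= threshold


# Hypothetical example: 46 of 50 judgements agree with the reference answer.
agreement = beta_binomial_interval(successes=46, trials=50)
# Hypothetical continuous metric, e.g. per-task rubric scores.
rubric = bootstrap_interval([4.1, 3.8, 4.5, 4.0, 3.9, 4.3, 4.2, 3.7])

print("agreement interval:", agreement)
print("rubric interval:", rubric)
print("certified at 0.90:", meets_threshold(agreement, 0.90))
```

In this example the point estimate is 46/50 = 0.92, which clears a 0.90 bar, but the lower bound does not, so the check fails. Reading the lower bound in this way makes a score from a small sample harder to certify than the same score from a large one.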
A Reliability Report on record.
You finish Phase 2 with a procurement-grade Reliability Report — PDF for people, JSON for systems — that you can share with employers and that backs every match we make.
Get calibrated.
Work through Foundation and the Quick Screen, then complete the Full Calibration to put a reliability score on your profile — all free.
Clinicians powering AI alignment, training & safety.