EnterTheLoop

Clinicians powering AI alignment, training & safety.

Verified against: GMC, NMC, GPhC, HCPC

© 2026 EnterTheLoop Ltd  ·  Built in Britain
Phase 2 · Full Calibration

The Full Calibration.

A complete statistical calibration across all eight RLHF task types. It produces your reliability score with confidence intervals on every metric — the number AI companies see when they match you to paid work.


At a glance

Duration: 60–90 min
Task types: 8
Output: Reliability Report
Price: Free

Why it matters

The score on the record.

Where Phase 1 is a private diagnostic, Phase 2 is the procurement-grade measurement. It produces your reliability score with statistical confidence intervals and the buyer-facing Reliability Report — the evidence that gets you matched to paid evaluation work.

Coverage

All eight task types.

Real RLHF work spans more than rating answers. Phase 2 exercises the full range, so your score reflects how you actually perform across the work buyers need.

01 Rating: Score a single AI response against a quality scale.

02 Comparison: Choose the better of two responses, and say why.

03 Ranking: Order several responses from best to worst.

04 Rubric: Assess against structured, dimensional criteria.

05 Correction: Fix an unsafe or incorrect response.

06 Annotation: Label spans and structure in clinical text.

07 Justification: Write the clinical reasoning behind a judgement.

08 Red-team: Probe for failure modes across the safety taxonomy.

The statistics

Measured, not asserted.

Every metric is reported with uncertainty, not as a bare number — so a score means what it says.

Per-metric confidence intervals

Beta-Binomial intervals for proportions, bootstrap intervals for continuous metrics — uncertainty quantified on every score.
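As a rough illustration of the two interval types named above, here is a minimal sketch rather than the platform's actual scoring code. It assumes a uniform Beta(1, 1) prior, a 95% level, and Monte Carlo approximation of the Beta quantiles so that only the Python standard library is needed. The function names and the sample data are hypothetical.

```python
import random
import statistics


def beta_binomial_interval(successes, trials, level=0.95, prior=(1.0, 1.0),
                           draws=20000, seed=0):
    """Equal-tailed interval for a proportion under a Beta prior.

    With a Beta(a, b) prior and a Binomial likelihood, the posterior is
    Beta(a + successes, b + failures). The quantiles are approximated by
    sampling from that posterior (an assumption made for portability).
    """
    rng = random.Random(seed)
    a = prior[0] + successes
    b = prior[1] + (trials - successes)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


def bootstrap_interval(values, level=0.95, resamples=5000, seed=0):
    """Percentile bootstrap interval for the mean of a continuous metric."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(resamples)
    )
    tail = (1.0 - level) / 2.0
    return means[int(tail * resamples)], means[int((1.0 - tail) * resamples) - 1]


# Hypothetical data: 85 of 100 comparison tasks agreed with the reference,
# plus a handful of made-up rubric scores on a 0-1 scale.
agreement_ci = beta_binomial_interval(85, 100)
rubric_scores = [0.70, 0.80, 0.90, 0.60, 0.75, 0.85, 0.80, 0.70]
rubric_ci = bootstrap_interval(rubric_scores)
print(agreement_ci, rubric_ci)
```

The proportion interval narrows as the number of completed tasks grows, and the bootstrap interval narrows as more continuous scores are collected.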

Lower confidence bounds

Certification reads the lower bound of the confidence interval, not the point estimate, so your certified score is a conservative figure: the lowest level of performance still plausible given your results.
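To see why reading the lower bound matters, consider two people with the same 90% point estimate but different amounts of evidence. The sketch below is an assumption-laden illustration, not the platform's rule: the 0.80 threshold, the 95% one-sided confidence, the uniform prior, and the function names are all hypothetical.

```python
import random


def lower_bound(successes, trials, confidence=0.95, draws=20000, seed=0):
    """One-sided lower bound on a proportion, assuming a Beta(1, 1) prior."""
    rng = random.Random(seed)
    a = 1.0 + successes
    b = 1.0 + (trials - successes)
    # Approximate the posterior quantile by sampling (stdlib only).
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    return samples[int((1.0 - confidence) * draws)]


def certify(successes, trials, threshold=0.80):
    """Pass only if the lower bound, not the point estimate, clears the bar."""
    return lower_bound(successes, trials) >= threshold


# Same 90% point estimate, very different amounts of evidence.
print(certify(9, 10))    # small sample: wide interval, low lower bound
print(certify(90, 100))  # larger sample: tighter interval, higher lower bound
```

Under these assumptions, only the larger sample clears the threshold, because ten tasks leave too much uncertainty about the underlying rate.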

Per-category coverage

Performance broken down across task types and the safety taxonomy, recorded in the report.
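A per-category breakdown can be pictured as a simple grouping step. This is a minimal sketch with made-up records; the category labels, the record shape, and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def per_category_accuracy(records):
    """Group (category, passed) records and report counts and pass rate."""
    totals = defaultdict(lambda: [0, 0])  # category -> [passed, attempted]
    for category, passed in records:
        totals[category][0] += int(passed)
        totals[category][1] += 1
    return {
        category: {"passed": p, "attempted": n, "rate": p / n}
        for category, (p, n) in totals.items()
    }


# Hypothetical results across two of the eight task types.
records = [
    ("rating", True), ("rating", True), ("rating", False),
    ("red-team", True), ("red-team", False),
]
breakdown = per_category_accuracy(records)
for category, stats in breakdown.items():
    print(category, stats)
```

Keeping the attempted count next to each rate makes thin coverage visible: a category with few attempts has a rate that should be read with caution.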

The deliverable

A Reliability Report on record.

You finish Phase 2 with a procurement-grade Reliability Report — PDF for people, JSON for systems — that you can share with employers and that backs every match we make.
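The page does not publish the JSON schema, so the following is only a guess at what a machine-readable report could look like and how a downstream system might read it. Every field name and number here is a hypothetical placeholder, not the real format.

```python
import json

# Hypothetical report structure; the real schema is not given in the source.
report = {
    "phase": "Full Calibration",
    "metrics": [
        {
            "name": "comparison_agreement",
            "estimate": 0.85,   # placeholder value
            "ci_lower": 0.77,   # placeholder value
            "ci_upper": 0.91,   # placeholder value
            "level": 0.95,
        },
    ],
    "coverage": {"rating": {"attempted": 3, "passed": 2}},
}


def to_json(data):
    """Serialise the report deterministically for storage or exchange."""
    return json.dumps(data, indent=2, sort_keys=True)


def lower_bounds(report_json):
    """Read back the lower bound of each metric, as a consumer might."""
    data = json.loads(report_json)
    return {m["name"]: m["ci_lower"] for m in data["metrics"]}


encoded = to_json(report)
print(lower_bounds(encoded))
```

Storing the interval bounds alongside each estimate lets a consuming system apply its own threshold to the lower bound without recomputing anything.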


Get calibrated.

Work through Foundation and the Quick Screen, then complete the Full Calibration to put a reliability score on your profile — all free.
