Deterministic multi-agent clinical RL environment: 11 autonomous physician agents, 47 patient cases, dense process-supervision rewards at every step.

| ID | Title | Division | Tier |
|---|---|---|---|
| C01 | Clinical Research Expert | Evidence-Based Medicine & Clinical Research | Consultant |
| D01 | Clinic Director | Executive Leadership | Director |
| S01 | Emergency Physician | Emergency Medicine | Specialist |
| S02 | Intensivist | Intensive Care Unit | Specialist |
| S03 | Consultant Cardiologist | Cardiovascular Medicine | Specialist |
| S04 | Consultant Surgeon | General & Acute Surgery | Specialist |
| S05 | Consultant Nephrologist | Nephrology & Renal Medicine | Specialist |
| S06 | Senior Hospitalist | General Internal Medicine | Specialist |
| S07 | Consultant Neurologist | Neurology | Specialist |
| S08 | Infectious Disease Physician | Infectious Disease & Microbiology | Specialist |
| S09 | Clinical Pharmacist | Pharmacy & Medicines Optimisation | Specialist |

Guessing without investigation is actively penalised: GRPO must learn process quality independently of correctness.
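
To illustrate how a dense, per-step reward can separate process quality from correctness, here is a minimal sketch. It is not the environment's actual reward function: the `Step` type, the action names `"investigate"` and `"diagnose"`, and all three reward constants are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass

# All names and values below are hypothetical, for illustration only.
INVESTIGATION_BONUS = 0.1         # assumed reward for an investigative step
CORRECT_DIAGNOSIS = 1.0           # assumed outcome reward for a correct diagnosis
UNSUPPORTED_GUESS_PENALTY = -0.5  # assumed penalty for diagnosing with no prior investigation


@dataclass(frozen=True)
class Step:
    action: str            # "investigate" or "diagnose" (assumed action names)
    correct: bool = False  # only meaningful when action == "diagnose"


def step_rewards(episode):
    """Return one reward per step, so every step carries a training signal."""
    rewards = []
    investigated = False
    for step in episode:
        if step.action == "investigate":
            investigated = True
            rewards.append(INVESTIGATION_BONUS)
        elif step.action == "diagnose":
            # Outcome term: depends only on whether the diagnosis is right.
            outcome = CORRECT_DIAGNOSIS if step.correct else 0.0
            # Process term: depends only on whether investigation came first.
            process = 0.0 if investigated else UNSUPPORTED_GUESS_PENALTY
            rewards.append(outcome + process)
        else:
            rewards.append(0.0)
    return rewards


lucky_guess = [Step("diagnose", correct=True)]
careful = [Step("investigate"), Step("diagnose", correct=True)]
print(step_rewards(lucky_guess), step_rewards(careful))
```

Under these assumed constants, a correct diagnosis reached by guessing scores lower than the same diagnosis reached after investigation, so the process term contributes to the reward regardless of whether the final answer is right.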