OpenEnv · V10 · GRPO Training Environment

๐Ÿฅ Digital Hospital

Deterministic multi-agent clinical RL environment: 11 autonomous clinical agents, 47 patient cases, and dense process-supervision rewards at every step.

👥 11 Roles
🗂 47 Patient Cases
📋 550 MCQ Questions
🔧 50+ Tools
⚡ Dense Rewards [0.01–0.99]
👥 Role Registry: All 11 Agents
ID   Title                         Division                                     Tier
C01  Clinical Research Expert      Evidence-Based Medicine & Clinical Research  Consultant
D01  Clinic Director               Executive Leadership                         Director
S01  Emergency Physician           Emergency Medicine                           Specialist
S02  Intensivist                   Intensive Care Unit                          Specialist
S03  Consultant Cardiologist       Cardiovascular Medicine                      Specialist
S04  Consultant Surgeon            General & Acute Surgery                      Specialist
S05  Consultant Nephrologist       Nephrology & Renal Medicine                  Specialist
S06  Senior Hospitalist            General Internal Medicine                    Specialist
S07  Consultant Neurologist        Neurology                                    Specialist
S08  Infectious Disease Physician  Infectious Disease & Microbiology            Specialist
S09  Clinical Pharmacist           Pharmacy & Medicines Optimisation            Specialist
🔄 Episode Structure: 3 Phases
Phase 1: All Roles
MCQ Assessment
50 role-specific questions
Binary reward per question: 0.99 if correct, 0.01 if wrong
25–30% of score
→
Phase 2: Role-Dependent
Clinical Operations
S01–S09: 5 patient cases (Easy / Medium / Hard)
D01: 2 own cases + full specialist review
C01: inbox-driven research responses
40–70% of score
→
Phase 3: S01–S09 and C01 only
Review & Reflection
Agent receives a feedback email from D01
Its reply is scored on clinical depth
5–10% of score
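A minimal sketch of which phases each role takes part in, together with the binary per-question MCQ reward, written in the same shell register as the Quick Start. Both helpers (`mcq_reward`, `phases_for_role`) are hypothetical illustrations, not server commands; the role-to-phase mapping is copied from the phase descriptions above.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; neither is a server command.

# mcq_reward CHOSEN CORRECT
# Prints the binary per-question reward used in Phase 1.
mcq_reward() {
  if [ "$1" = "$2" ]; then
    echo 0.99
  else
    echo 0.01
  fi
}

# phases_for_role ROLE_ID
# Prints one line per phase the role takes part in.
# Returns 1 for an unknown role ID.
phases_for_role() {
  case "$1" in
    S0[1-9])
      printf '%s\n' "1 MCQ Assessment (50 questions)" \
        "2 Clinical Operations: 5 patient cases (Easy/Medium/Hard)" \
        "3 Review & Reflection: reply to D01 feedback email" ;;
    D01)
      printf '%s\n' "1 MCQ Assessment (50 questions)" \
        "2 Clinical Operations: 2 own cases + full specialist review" ;;
    C01)
      printf '%s\n' "1 MCQ Assessment (50 questions)" \
        "2 Clinical Operations: inbox-driven research responses" \
        "3 Review & Reflection: reply to D01 feedback email" ;;
    *)
      echo "unknown role: $1" >&2
      return 1 ;;
  esac
}
```

For example, `phases_for_role D01` prints only two lines, because the Clinic Director has no Phase 3.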
💰 Treatment Grading: 4 Tiers
Tier            Reward      Criteria
Comprehensive   0.55–0.85   Full workup + correct diagnosis
Inconclusive    0.15–0.25   Thorough workup, wrong diagnosis
Expedited       0.10        Correct diagnosis guessed, no workup
Insufficient    0.01        No workup + wrong diagnosis

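Guessing without investigation is actively penalised: a correct diagnosis reached without any workup (0.10) scores below a wrong diagnosis reached after a thorough workup (0.15 at minimum). GRPO must therefore learn process quality independently of correctness.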
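The four tiers amount to a lookup on two yes/no questions: was a workup done, and is the diagnosis correct? The sketch below is a hypothetical illustration (`treatment_tier` is not a server command). It assumes "full workup" and "thorough workup" describe the same condition, and because the page does not say how a score is placed inside the Comprehensive or Inconclusive ranges, it returns the range rather than a single number.

```bash
#!/usr/bin/env bash
# treatment_tier WORKUP_DONE DX_CORRECT   (each argument is 0 or 1)
# Hypothetical lookup; prints "<tier> <min reward> <max reward>".
# Assumption: "full workup" and "thorough workup" are one condition.
treatment_tier() {
  case "$1,$2" in
    1,1) echo "Comprehensive 0.55 0.85" ;;  # full workup + correct diagnosis
    1,0) echo "Inconclusive 0.15 0.25" ;;   # thorough workup, wrong diagnosis
    0,1) echo "Expedited 0.10 0.10" ;;      # correct guess, no workup
    0,0) echo "Insufficient 0.01 0.01" ;;   # no workup + wrong diagnosis
    *)
      echo "usage: treatment_tier <0|1> <0|1>" >&2
      return 1 ;;
  esac
}
```

For example, `treatment_tier 0 1` prints `Expedited 0.10 0.10`.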

📊 Baseline: google/gemma-3-4b-it · seed=42 · All 11 Roles
ID   Role                 Total  MCQ  ClinOps
S01  Emergency Physician  0.210  28%  0.20
S02  Intensivist          0.240  54%  0.15
S03  Cardiologist         0.220  40%  0.18
S04  Surgeon              0.250  52%  0.17
S05  Nephrologist         0.280  58%  0.19
S06  Sr. Hospitalist      0.260  52%  0.18
S07  Neurologist          0.260  66%  0.13
S08  Infect. Disease      0.280  70%  0.15
S09  Pharmacist           0.200  38%  0.15
C01  Research Expert      0.450  74%  0.38
D01  Clinic Director      0.540  38%  0.32
AVG  All 11 roles         0.290  (untrained; GRPO training starting point)

Total = overall episode score · MCQ = Phase 1 accuracy · ClinOps = Phase 2 clinical-operations score
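As a quick consistency check, the AVG row equals the unweighted mean of the eleven per-role totals. The sketch below recomputes it with awk from the table values; `baseline_mean` is only an illustrative name, and the per-column means for MCQ and ClinOps are derived here, not reported on this page.

```bash
#!/usr/bin/env bash
# baseline_mean: illustrative helper. Recomputes unweighted means from the
# baseline table above; columns are ID, total, MCQ %, ClinOps.
baseline_mean() {
  awk '{ total += $2; mcq += $3; ops += $4; n++ }
       END { printf "%d roles: total %.3f, MCQ %.1f%%, ClinOps %.3f\n",
             n, total / n, mcq / n, ops / n }' <<'EOF'
S01 0.210 28 0.20
S02 0.240 54 0.15
S03 0.220 40 0.18
S04 0.250 52 0.17
S05 0.280 58 0.19
S06 0.260 52 0.18
S07 0.260 66 0.13
S08 0.280 70 0.15
S09 0.200 38 0.15
C01 0.450 74 0.38
D01 0.540 38 0.32
EOF
}
```

Running `baseline_mean` prints `11 roles: total 0.290, MCQ 51.8%, ClinOps 0.200`. Because the mean is unweighted, the two non-specialist roles (C01 at 0.450, D01 at 0.540) pull it above every specialist's total, which all fall between 0.200 and 0.280.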
🔌 API Endpoints
POST  /reset
POST  /step
GET   /state
POST  /close
GET   /health
GET   /metadata
GET   /schema
GET   /action_space
POST  /inject_email        (orch.)
POST  /resume              (orch.)
POST  /inject_clinic_data  (orch.)
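A small curl wrapper keeps the endpoint paths above in one place. This is a sketch under assumptions: `hospital_call` and its `DRY_RUN` switch are hypothetical helpers, and the server address is read from a `BASE_URL` variable because this page does not state a host or port. A JSON body is only attached when one is passed.

```bash
#!/usr/bin/env bash
# hospital_call METHOD PATH [JSON_BODY]
# Hypothetical curl wrapper. BASE_URL must point at the server (its
# address is not given on this page). With DRY_RUN=1 the curl arguments
# are printed instead of sent (a preview only; arguments are not re-quoted).
hospital_call() {
  local method="$1" path="$2" body="${3:-}"
  if [ -z "${BASE_URL:-}" ]; then
    echo "hospital_call: BASE_URL is not set" >&2
    return 1
  fi
  # Build the curl argument list in the function's positional parameters.
  if [ -n "$body" ]; then
    set -- -s -X "$method" "${BASE_URL}${path}" \
      -H "Content-Type: application/json" -d "$body"
  else
    set -- -s -X "$method" "${BASE_URL}${path}"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'curl'
    printf ' %s' "$@"
    printf '\n'
  else
    curl "$@"
  fi
}
```

Typical use, after `export BASE_URL=http://<host>:<port>`, is `hospital_call GET /health` or `hospital_call POST /reset '{"role_id": "S01", "seed": 42}'`.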
🚀 Quick Start
# The server address is not given on this page; export it first, e.g.
#   export BASE_URL=http://<host>:<port>
: "${BASE_URL:?set BASE_URL to the environment server address}"

# 1. Start an episode for the Emergency Physician (S01)
curl -X POST "$BASE_URL/reset" -H "Content-Type: application/json" \
  -d '{"role_id": "S01", "seed": 42}'

# 2. Answer an MCQ question (Phase 1)
curl -X POST "$BASE_URL/step" -H "Content-Type: application/json" \
  -d '{"command": "answer_mcq", "arguments": {"answer": "B"}, "reasoning": "..."}'

# 3. Run a diagnostic tool (Phase 2)
curl -X POST "$BASE_URL/step" -H "Content-Type: application/json" \
  -d '{"command": "get_lab_results", "arguments": {}, "reasoning": "Need CBC + troponin"}'

# 4. Submit a treatment and receive its grade
curl -X POST "$BASE_URL/step" -H "Content-Type: application/json" \
  -d '{"command": "submit_treatment", "arguments": {"icd10_code": "I21.9", "severity": "CRITICAL", "treatment_plan": "Aspirin 300mg stat, heparin, urgent PCI..."}}'
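Hand-writing `/step` bodies breaks as soon as the free-text `reasoning` contains a double quote. A minimal sketch, assuming single-line text: the hypothetical helpers `json_escape` and `step_payload` build a body in the same `command` / `arguments` / `reasoning` shape as the Quick Start, escaping only backslashes and double quotes (newlines and other control characters are not handled).

```bash
#!/usr/bin/env bash
# json_escape STRING
# Escapes backslashes, then double quotes, for use inside a JSON string.
# Assumption: STRING is a single line without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# step_payload COMMAND [ARGUMENTS_JSON] [REASONING]
# Hypothetical helper. ARGUMENTS_JSON is inserted verbatim, so it must
# already be valid JSON; it defaults to {} when empty.
step_payload() {
  local command="$1" arguments="${2:-}" reasoning="${3:-}"
  if [ -z "$arguments" ]; then
    arguments='{}'
  fi
  printf '{"command": "%s", "arguments": %s, "reasoning": "%s"}\n' \
    "$(json_escape "$command")" "$arguments" "$(json_escape "$reasoning")"
}
```

It plugs into the Quick Start calls as `-d "$(step_payload get_lab_results '{}' 'Need CBC + troponin')"`. Where `jq` is installed, `jq -n --arg` is a more complete way to build the same body, since it handles every JSON escape.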