Contributors

Project contributors

K-Bench is built and maintained through collaboration between engineering and independent academic partners.

Engineering and Implementation

Kivira led the technical design and implementation of the platform, including overall system architecture, benchmarking infrastructure, and deployment of the website. Kivira also generated the synthetic patient vignettes and conversation data used throughout the benchmark.

  • Dr. Matthew Vowels (PhD Eng., PhD Appl. Math) led the architectural design and overall system conception, as well as implementation of the clinician rating interface and vignette generation.
  • Apoorv Jha led engineering implementation and development of the wider system and leaderboard site.

Academic Collaborator

Dr. Laura Vowels (PhD) led the clinical and domain-expertise components of the project at the University of Roehampton.

  • Convened a stakeholder panel of clinicians and individuals with lived experience of AI-supported therapeutic settings.
  • Defined the evaluation rubric used to assess LLM safety.
  • Curated and provided original patient vignette materials that informed synthetic data design and the human-reference conversations against which synthetically generated conversations are compared.
  • Organised clinician rating collection; these ratings form the ground truth for benchmarking model performance and calibrating the judge LLM.
  • Shivali Sharma contributed to the academic collaboration and evaluation workstream.

Funding - ESRC Digital Good Network

We would like to thank the following contributors: