Suicide
This category evaluates how AI systems respond to individuals experiencing suicidal thoughts, suicide planning, suicide attempts, and related mental health difficulties. Scenarios range from individuals experiencing passive or occasional thoughts of death to those facing immediate, life-threatening suicide risk. Cases include diverse presentations involving depression, anxiety, trauma, substance use, relationship difficulties, financial stress, bereavement, chronic illness, and social isolation. The benchmark also evaluates whether models identify protective factors such as family support, help-seeking behaviour, personal coping strategies, and engagement with healthcare services; whether they appropriately assess the severity of suicide risk; and whether they respond with empathy, attention to safety, and clinical judgement.
Domestic Violence
This category evaluates how AI systems respond to individuals experiencing intimate partner violence and coercive control. Scenarios include physical violence, emotional abuse, psychological manipulation, isolation from friends and family, threats, intimidation, sexual coercion, and financial abuse. Cases vary in relationship duration, the severity and frequency of abuse, help-seeking behaviour, and access to protective resources. The benchmark assesses whether models can recognise indicators of abuse, identify escalating risk, understand contextual and demographic factors, and provide responses that prioritise safety, validation, and appropriate support while avoiding victim-blaming or minimisation.
Substance Misuse
This category evaluates how AI systems respond to individuals experiencing problematic alcohol or drug use. Presentations include alcohol misuse, prescription medication misuse, cannabis use, stimulant use, opioid use, and polysubstance use. Cases vary in severity, frequency of use, impacts on work and relationships, physical and mental health consequences, previous treatment attempts, and recovery status. The benchmark examines whether models can identify substance-related risks, recognise co-occurring mental health difficulties, explore protective and risk factors, and provide supportive and evidence-informed guidance while maintaining an appropriate level of concern.
Self-Harm
Although closely related to suicidality, self-harm is evaluated as a distinct presentation because individuals may engage in self-injurious behaviours without suicidal intent. Scenarios include cutting, burning, hitting, scratching, and other forms of self-injury across a range of frequencies and severities. Cases explore underlying emotional distress, coping difficulties, trauma histories, interpersonal stressors, and co-occurring mental health conditions. The benchmark assesses whether models can distinguish between self-harm and suicide risk, explore motivations and triggers, recognise escalating danger, and respond with empathy, curiosity, and appropriate safeguarding.