Note: This article discusses sensitive topics like suicide and self-harm. If you or someone you know is in danger, please call or text the 988 Suicide & Crisis Lifeline.
LLM-powered chatbots have brought humans and technology closer together than ever before, but at what cost? Many people have begun turning to LLMs for advice, seeking guidance on everything from fitness plans to interpersonal relationships. But for society’s most vulnerable users (e.g., adolescents, the elderly, and people with mental health conditions), this intimacy presents a hidden danger.
These tools can become something darker: enablers of suicide and self-harm (SSH). Chatbots have been known to reinforce SSH ideation, even encouraging users to self-harm. Most (if not all) LLMs have policies covering SSH, but those policies often don’t go far enough. To keep users safe, the industry cannot merely write better policies; we must build systems capable of applying clinical nuance at scale. We need a clinically and technically sound approach to successfully prevent harm.
Here’s what that looks like.
Medical Misalignment: How current models fall short
What’s missing from chatbots’ underlying models is a demonstrated clinical understanding of how SSH and other harm types (e.g., delusions or dementia) actually present. Today, conversations are typically flagged and escalated to a human reviewer only when the user types explicit language like “I want to kill myself. How many pills should I take?” But that’s almost never how it happens.
In reality, conversations involving SSH often start benignly, with a teenager asking for homework help or an elderly person asking for scheduling assistance. Over the course of several sessions, the user might express that they feel lonely, like a burden, or misunderstood.
The danger lies in how standard LLMs process conversational timelines. While modern LLMs have memory and can recall previous prompts, they suffer from a context deficit when it comes to safety evaluation: they fail at cumulative risk synthesis. If a user hints at hopelessness in prompt one and asks about painkillers in prompt four, the LLM evaluates the safety of the latter largely in a vacuum. It remembers the words, but it fails to connect the psychological dots and recognize the escalating threat.
What does this lack of clarity and nuance mean? Classic warning signs get missed, and vulnerable users may act on their SSH ideation. To improve user safety, LLMs must be trained to evaluate user risk cumulatively, over time.
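To make “cumulative risk synthesis” concrete, here is a minimal sketch of what session-level aggregation could look like. It is written in Python under heavy assumptions: the signal labels, weights, and escalation threshold are invented placeholders, and the per-turn signals are assumed to come from some upstream classifier rather than from keyword matching.

```python
from dataclasses import dataclass, field

# Hypothetical signal labels an upstream classifier might emit per turn.
# The labels, weights, and threshold below are illustrative, not clinical guidance.
SIGNAL_WEIGHTS = {
    "hopelessness": 2.0,
    "burdensomeness": 2.0,
    "isolation": 1.0,
    "means_inquiry": 3.0,   # e.g., asking about medication quantities
}

@dataclass
class SessionRiskTracker:
    """Accumulates risk signals across a whole conversation rather than
    evaluating each prompt in a vacuum."""
    history: list[set[str]] = field(default_factory=list)

    def add_turn(self, signals: set[str]) -> float:
        self.history.append(signals)
        return self.cumulative_score()

    def cumulative_score(self) -> float:
        # Sum weights over every turn so earlier hints still count later.
        return sum(SIGNAL_WEIGHTS.get(s, 0.0)
                   for turn in self.history for s in turn)

tracker = SessionRiskTracker()
tracker.add_turn({"hopelessness"})           # turn 1: "nothing ever gets better"
tracker.add_turn(set())                      # turn 2: a homework question
tracker.add_turn({"isolation"})              # turn 3: "no one would notice"
score = tracker.add_turn({"means_inquiry"})  # turn 4: a question about pills
if score >= 5.0:                             # illustrative escalation threshold
    print(f"Escalate to human review (cumulative score: {score})")
```

The point is structural: the number that triggers escalation is computed over the whole history, so the hint of hopelessness in turn one still counts when the painkiller question arrives in turn four.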
As part of their risk assessment, clinicians continuously monitor the following factors:
- Biopsychosocial history: The deep context provided during intake.
- Non-verbal and presentation cues: Changes in affect, mood, tone of voice, or even physical presentation (e.g., appearing disheveled).
- Behavioral shifts: Changes in life engagement, activity levels, and evolving symptomatology that shift the diagnostic picture.
While LLMs will never be able to provide the degree of care and attention clinicians do, we can use savvy engineering to move the needle substantially in the right direction.
Technical Targeting: How clinically grounded engineering can make a difference
Standard LLMs are essentially language predictors. They generate responses based on the statistical probability of one word following another. Because of this, when tasked with evaluating user safety, an out-of-the-box LLM defaults to generalized assumptions, scanning for explicit danger words (e.g., “suicide” or “kill”) rather than subtle behavioral shifts.
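For contrast, the kind of legacy filter described above might look like this minimal sketch (the term list is purely illustrative): explicit phrasing trips the flag, while the indirect statements that often precede a crisis sail through.

```python
# A minimal sketch of a legacy, keyword-only safety filter (term list is illustrative).
DANGER_TERMS = {"suicide", "kill myself", "overdose"}

def naive_flag(prompt: str) -> bool:
    """Flags a single prompt only if it contains an explicit danger term."""
    text = prompt.lower()
    return any(term in text for term in DANGER_TERMS)

print(naive_flag("I want to kill myself."))                    # True: explicit phrasing
print(naive_flag("Lately I feel like a burden to everyone."))  # False: warning sign missed
```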
Pairing AI systems design with clinical psychology can swap this probabilistic modeling for clinical precision. By embedding strict clinical rubrics into the model’s architecture, we can force the AI to evaluate intent, situational stressors, and vulnerability the way a clinician would. This means translating clinical guidelines into an operational scoring matrix with a dynamic, dimensional framework built on definitions for:
- Acute risk: The immediate presence of a plan, intent, and the means to carry out SSH. This is the quantitative baseline for a user’s danger level.
- Contextual multipliers: The overall weight of a user’s stressors. Are they in a cycle of chronic ideation? Have they recently experienced a severe setback like a job loss or eviction? These act as risk escalators.
- Protective factors: A critical clinical component often ignored by standard AI. Does the user mention dependents, a desire for therapy, or use recognized harm-reduction techniques? These mitigate the immediate risk score.
- Improper facilitation: A common flaw in LLM safety is permitting users to extract harmful instructions by disguising them as fiction, roleplay, or research; this is one of the main vectors for enabling off-platform harm. Regardless of whether a request is framed as a screenplay or a school project, the LLM must refuse to provide actionable details such as dosages, injury methods, or concealment tactics. When physical harm is at stake, stated context never outweighs real-world safety.
Rather than relying on basic keyword identification as a trigger for escalation, the engine weighs a user’s acute risk and contextual vulnerabilities against their protective factors to determine a final, total risk acuity score, an approach that can substantially outperform legacy keyword filters.
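Here is one way such a scoring matrix might be wired together, sketched in Python under heavy assumptions: every field name, multiplier, offset, and threshold below is an invented placeholder, and the real definitions and values would need to come from clinical guidelines rather than from engineers.

```python
from dataclasses import dataclass, field

# Hypothetical escalators and mitigators; real weights belong to clinicians.
STRESSOR_MULTIPLIERS = {"chronic_ideation": 1.5, "job_loss": 1.2, "eviction": 1.3}
PROTECTIVE_OFFSETS = {"dependents": 1.0, "seeking_therapy": 1.5, "harm_reduction": 1.0}

@dataclass
class RiskAssessment:
    """Dimensional inputs the engine scores for a conversation.
    All fields and scales here are illustrative placeholders."""
    acute_risk: float                                     # 0-10: plan, intent, access to means
    stressors: list[str] = field(default_factory=list)    # contextual multipliers
    protective: list[str] = field(default_factory=list)   # protective factors
    seeks_actionable_harm: bool = False                   # improper facilitation, however framed

def risk_acuity(a: RiskAssessment) -> tuple[float, bool]:
    """Return a total risk acuity score and whether to escalate."""
    if a.seeks_actionable_harm:
        # Stated context (fiction, roleplay, research) never outweighs real-world safety.
        return 10.0, True
    score = a.acute_risk
    for s in a.stressors:
        score *= STRESSOR_MULTIPLIERS.get(s, 1.0)         # risk escalators
    score -= sum(PROTECTIVE_OFFSETS.get(p, 0.0) for p in a.protective)
    score = max(score, 0.0)
    return score, score >= 6.0                            # illustrative escalation threshold

score, escalate = risk_acuity(RiskAssessment(
    acute_risk=5.0,
    stressors=["chronic_ideation", "job_loss"],
    protective=["seeking_therapy"],
))
print(f"Total risk acuity: {score:.1f}; escalate to human review: {escalate}")
```

Two design choices carry the clinical logic in this sketch: contextual stressors multiply the acute baseline rather than adding to it, so they behave as true escalators, and improper facilitation is a hard override, so no amount of protective context can unlock actionable details.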
But building a clinically sound model is just the first step. Human moderators have a big role to play, too. They are the ones who review the cases escalated by LLMs. To help prepare these teams, engineers and clinicians can work together to build training modules that help moderators understand cumulative risk acuity, recognize user danger, and protect their own mental health as they navigate emotionally impactful scenarios.
If left unaddressed, SSH will become increasingly prevalent in LLM interactions. Getting prevention and intervention right requires collaboration between clinicians and engineers, and between chatbots and human moderators: a true “two sides of the same coin” approach. The good news is that we’re seeing some momentum in the field, and technology companies have begun seeking expert clinical counsel on how they can enrich their AI offerings to double down on user safety.
Safe Strategy: A smarter, better future for AI
This dual strategy, built on both mental health practice and technological savvy, should be the standard for all AI tools. Any technology company that builds conversational AI tools (or white-labels them for integration into other systems) has a vested interest here: it is potentially liable for its tool’s behavior.
We can no longer afford to treat SSH as an afterthought; it must be treated as a critical safety vector. We need to engineer protections for high-acuity crises into the foundation of our AI tools. While SSH incidents may represent a smaller fraction of total traffic, they are the highest severity interactions a model will ever handle. The ramifications of failure are enormous, resulting in lasting emotional and physical damage or loss of life.
This work is the ultimate “yes, and.” It’s advanced technology and evidence-based psychological care. It’s work that’s difficult and profoundly good for humanity. It’s how we protect the mental health of vulnerable users and the human moderators who intervene. It’s how we all stay safe together.
