#75: What the FDA is hearing about mental health AI
What I learned from reading all 31 submissions to the FDA DHAC on AI in Mental Health
Hi friends,
As you read this, the FDA Digital Health Advisory Committee is meeting to discuss generative AI in mental health.
The outcomes of this meeting will influence how AI in mental health is regulated, which will, in turn, directly influence the strategies for everyone building AI in this space.
As part of this process, the committee invited people to submit comments. Thirty-one organisations and individuals did so, including Spring, Talkspace, Big Health, Slingshot, Headspace, Otsuka and more.
I’ve read all these comments, and in today’s report, I share what I learned.
I’ll discuss where there is consensus and where there is conflict in how organisations think this technology should be regulated. I’ll also highlight some of the more interesting ideas I found in these submissions and share my reflections on what this might mean for the industry.
If you want to read the original submissions yourself, I’ve collated all of them in a database here.
But if you have a better way to spend ten hours of your life (which I clearly don’t), then just take the ten minutes to read this report instead.
Let’s get into it.
Join The Hemingway Community
I run a vetted community for mental health innovators, and we now have over 230 members, including founders, clinicians, investors and researchers. If you’re interested in networking and collaborating with these kinds of people, I’d suggest checking it out.
We discuss important topics, share learnings and host events. We’re actually hosting a private dinner for community members in San Diego next week, so if you join the community and will be in San Diego, let me know, and I’ll share the invite with you.
More consensus than conflict
When I finished reading every submission to the DHAC, I had one clear reaction. Despite significant diversity in the types of organisations submitting comments, there was far more consensus than conflict.
Most organisations are broadly aligned in how they think mental health AI should be regulated. Here are the areas where I found significant consensus.
A risk-based, tiered framework
No surprises here: this sensible approach was widely recommended. Most submissions agreed that low-risk wellness tools should be lightly regulated, while high-risk clinical AI should be subject to stricter review. Oversight should be proportionate and flexible, focusing on intended use and risk level. This is also highly aligned with the FDA’s current approach to regulating medical devices.
Non-negotiable safety guardrails
There was a significant focus on ensuring safety. Models must be able to detect crisis cues (e.g., suicidality), block unsafe advice, and maintain clear escalation pathways. Almost everyone agreed that those escalation pathways should lead to some form of human support.
- Talkspace urged that the FDA require “human-in-the-loop” escalation protocols and routine testing of crisis recognition functions.
- SonderMind called for clear system limits - for example, chatbots should never attempt to manage crises autonomously - and for labelling that explains how escalation works.
- Click Therapeutics described how systems should monitor conversation patterns and semantic drift to catch early signs of psychosis or delusional reinforcement, triggering escalation to human oversight.
- The American Psychiatric Association urged that crisis management protocols be standardised and auditable, and NASW-Texas warned that AI should never replace the human therapeutic alliance in crisis contexts.
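To make this concrete, here’s a rough sketch of the kind of guardrail layer these submissions describe: a check that runs before the model replies, looks for crisis cues, and routes the conversation to a human escalation pathway instead. The cue list, function names and escalation logic are purely illustrative assumptions on my part - real systems use far more sophisticated detection - but the shape of the control is the point.

```python
# Hypothetical sketch of a pre-response safety guardrail, in the spirit of the
# submissions above. The cue list and names are illustrative, not any vendor's
# actual implementation.
from dataclasses import dataclass

CRISIS_CUES = ("suicide", "kill myself", "end my life", "self-harm")  # illustrative, not exhaustive


@dataclass
class GuardrailDecision:
    allow_ai_reply: bool        # may the model respond at all?
    escalate_to_human: bool     # should a human take over?
    reason: str


def check_message(user_message: str) -> GuardrailDecision:
    """Route a user message before the model is allowed to respond."""
    text = user_message.lower()
    if any(cue in text for cue in CRISIS_CUES):
        # Consensus position: the chatbot never manages a crisis autonomously;
        # it hands off to a human escalation pathway.
        return GuardrailDecision(allow_ai_reply=False, escalate_to_human=True,
                                 reason="crisis cue detected")
    return GuardrailDecision(allow_ai_reply=True, escalate_to_human=False,
                             reason="no crisis cue detected")


if __name__ == "__main__":
    print(check_message("I want to end my life"))
```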
Transparency is essential
Everyone agreed that AI tools must clearly disclose that they are non-human and describe their purpose, capabilities, and limitations. Some organisations suggested standardised “model cards” (similar to nutrition labels) that describe training data provenance, model performance, validation, and known risks and limitations. I’m on board with some version of this. I think one of the most powerful things regulation can do is enable better customer and provider choice by enforcing transparency. I wrote about this in more depth in last week’s report.
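For illustration, here’s what a minimal model card could look like as a simple data structure. The fields follow what the submissions mention - provenance, performance, validation, known risks and limitations - but the schema and the example values are my own invention, not a proposed FDA standard.

```python
# Hypothetical "model card" structure for a mental health AI. Field names and
# example values are illustrative only, not a standardised schema.
import json
from dataclasses import dataclass, field, asdict


@dataclass
class MentalHealthModelCard:
    model_name: str
    intended_use: str                      # e.g. wellness support vs. clinical adjunct
    training_data_provenance: str          # where the training data came from
    validation_summary: str                # how safety and effectiveness were evaluated
    performance_metrics: dict = field(default_factory=dict)
    known_risks: list = field(default_factory=list)
    limitations: list = field(default_factory=list)


card = MentalHealthModelCard(
    model_name="example-support-bot-v1",
    intended_use="Low-acuity emotional support; not for diagnosis or crisis care",
    training_data_provenance="Licensed counselling transcripts (illustrative)",
    validation_summary="Red-team testing plus clinician review (illustrative)",
    performance_metrics={"crisis_cue_recall": 0.97},
    known_risks=["may miss indirect expressions of suicidality"],
    limitations=["English only", "not validated for under-18s"],
)
print(json.dumps(asdict(card), indent=2))
```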
Continuous lifecycle oversight is required, but adaptations are needed
There was broad agreement that continuous oversight of models is required and that some version of the FDA’s Predetermined Change Control Plan (PCCP) is essential to make regulation workable for adaptive, learning systems in mental health.
However, some raised questions and concerns about how this would operate in practice. Kooth and Click noted that the current process is vague about what kinds of retraining or fine-tuning count as “significant” changes that would trigger new FDA review. It was also pointed out that most developers rely on third-party foundation models, leaving them with limited visibility into when those base models change. Several organisations proposed ways to strengthen or complement the PCCP:
- SecuraAI recommended quantitative “drift gates”, which would require each model update to pass regression testing on its proposed “Safety-5” metrics (Guardrail Persistence, Time-to-Escalation, Escalation Completeness, Critical Safety Event Rate, and Equity Gap).
- Kooth and Spring Health both supported continuous, transparent real-world monitoring of AI safety and performance. Kooth suggested that results be shared publicly, and Spring promoted open, repeatable evaluation frameworks such as their own VERA-MH.
- Click and the Connected Health Initiative called for a certification pathway for “validated foundation models” so downstream developers can rely on trusted baselines.
- The American Psychiatric Association suggested periodic third-party or FDA spot audits to verify compliance.
Overall, there was a push for the FDA to evolve PCCP into a more dynamic system that blends pre-specified change plans with real-world monitoring, quantitative safety thresholds, and independent verification.
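To illustrate what a quantitative “drift gate” could look like in practice, here’s a small sketch that checks a candidate model update against a baseline on the five Safety-5 metrics. The metric names come from SecuraAI’s submission; the thresholds, the direction assumed for each metric, and the gate logic are my own assumptions, not their specification.

```python
# Hedged sketch of a "drift gate": a model update ships only if it does not
# regress on the Safety-5 metrics. Metric directions and the single shared
# tolerance are illustrative; a real gate would set unit-appropriate
# thresholds per metric.
SAFETY_5 = [
    "guardrail_persistence",
    "time_to_escalation",
    "escalation_completeness",
    "critical_safety_event_rate",
    "equity_gap",
]

# Assumed direction for each metric (True = higher is better).
HIGHER_IS_BETTER = {
    "guardrail_persistence": True,
    "time_to_escalation": False,
    "escalation_completeness": True,
    "critical_safety_event_rate": False,
    "equity_gap": False,
}


def passes_drift_gate(baseline: dict, candidate: dict, tolerance: float = 0.02) -> bool:
    """Return True only if the candidate does not meaningfully regress on any metric."""
    for metric in SAFETY_5:
        old, new = baseline[metric], candidate[metric]
        if HIGHER_IS_BETTER[metric]:
            if new < old - tolerance:   # got meaningfully worse
                return False
        else:
            if new > old + tolerance:
                return False
    return True


baseline = {"guardrail_persistence": 0.98, "time_to_escalation": 12.0,
            "escalation_completeness": 0.95, "critical_safety_event_rate": 0.001,
            "equity_gap": 0.03}
candidate = {"guardrail_persistence": 0.97, "time_to_escalation": 11.5,
             "escalation_completeness": 0.96, "critical_safety_event_rate": 0.001,
             "equity_gap": 0.04}
print(passes_drift_gate(baseline, candidate))  # True under these illustrative thresholds
```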
Privacy, fairness and data protection are table stakes
Nearly all submissions called for strict data encryption, informed consent, and transparency on data use. The American Psychiatric Association, ABHW, and NASW-Texas all emphasised that mental health data is uniquely sensitive and should never be used for marketing or retraining without consent. NASW-Texas went further, arguing that users’ data should remain encrypted and non-transferable even if the company is sold. SecuraAI and Livio Labs also proposed regular bias and equity audits to prevent demographic disparities in performance.
Some areas of divergence
Despite this significant consensus, there were a few areas where submissions diverged in their recommendations.
The level of human oversight
No one outright rejects the need for human oversight, but opinions diverge on how universal it should be. Clinician-led organisations like the American Psychiatric Association and NASW-Texas insist that AI should never operate autonomously in mental health contexts. Companies like Talkspace and SonderMind largely align with this approach and believe that AI should support, not replace, clinicians. Wellness-oriented products like Slingshot, however, argue for more proportionality: low-risk products shouldn’t be forced to maintain human supervision.
Liability
The APA and NASW-Texas want the FDA to shift liability from clinicians to developers. If an AI system gives harmful advice, they argue, the company - not the provider using it - should bear responsibility. No one wants to discourage clinical adoption by making practitioners the scapegoats, but regulators still need to define who’s legally responsible when AI goes wrong.
The boundary of wellness vs. clinical care
This is the hardest regulatory line to draw. Kooth and M.Cert argue that “subclinical” tools, built for prevention or early distress, need their own category. Kooth makes some strong statements here, saying that “The current regulatory framework does not adequately capture these ‘subclinical’ tools that influence mental states and behaviors but do not make formal medical claims” and that “FDA should consider a new classification for subclinical digital mental health technologies, ensuring minimum safety, transparency, and evidence standards without imposing full SaMD requirements.” Slingshot disagrees, warning that dragging wellness tools into FDA territory would crush low-risk innovation. Deciding whether a new category is required, and where to draw the line between wellness and SaMD, will be top of mind for the FDA.
Insights and questions
A new regulatory category for Mental Wellness AI?
As discussed, several submissions (including Headspace’s) envision a formal middle ground between wellness and clinical tools: a category for subclinical products that don’t diagnose or treat disorders but meet defined standards for safety, privacy, and efficacy. In their public post, Headspace outlined what this might look like.

[Image: Headspace’s proposed framework]
This would have interesting implications for general-use chatbots, like ChatGPT. For example, if this framework were to be adopted, ChatGPT would be required to shut down and redirect sensitive mental health conversations. If it wanted to provide mental wellness support, it would need to adhere to the suggested safeguards around transparency and safety. Defining “mental wellness support” and “sensitive mental health conversations” could be challenging, however.
Now, I’m no regulatory expert, but I would be surprised if the FDA created a new category like this.
What are the right standards and evaluations for Mental Health AI?
Several organisations questioned what the right evaluations should be for mental health AI. Some proposed their own standards, like Spring’s VERA-MH. What is clear is that there is currently no widely accepted standard for this technology. Last week, I wrote about how this lack of consistent standards is holding back progress. One of the most impactful things organisations could do is to collaborate on this topic and align on a shared set of standards that can be used to evaluate and improve AI products in mental health. I’m sure regulators and payers would thank us for this too.
The challenge of evidence generation
Evidence generation is a complex challenge for generative AI in mental health. We would all agree that AI in a clinical context should be supported by high-quality clinical research. Big Health made this point explicitly in their submission. But the generative nature of these AIs makes this difficult - in short, because every interaction is unique and difficult to predict.
Traditional trial designs struggle to capture that variability. They also tend to run over relatively short periods, which makes their results highly sensitive to engagement rates, something that can fluctuate greatly over longer time horizons. Big Health proposed using validated reporting tools to collect large volumes of real-world evidence, helping to reduce statistical noise and provide a clearer picture of effectiveness across diverse users.
While current approaches to evidence generation do an OK job of guaranteeing safety and effectiveness, they are expensive, slow, difficult to run, and limited in their ability to capture real-world, long-term impacts. I don’t know what a better approach to evidence generation looks like in this space, but I know we need one. I’d be very curious to hear people’s thoughts on this topic.
Could AI create a two-tiered mental health system?
Some comments suggested that AI might deepen inequities in mental health care. NASW-Texas warned that if AI is used as a cheap substitute for clinicians, low-income and publicly insured patients might end up with bots while wealthier people get humans. This is a fair concern. We don’t want human therapy to be available only to those with privilege (although arguably, to some extent it already is). However, that must not stop us from trying to develop more accessible forms of treatment. My view? We can do both - work to increase access to human treatment, whilst also innovating on more accessible AI solutions.
Behind every AI is a human making choices
In its submission, Slingshot AI reminded us that every AI is designed, and that the people building them make choices about how those models behave: “Every foundational AI is designed, and through that design process, specific choices are made that shape an AI’s behaviors, reward incentives, and impact on users.”
Every model reflects its creators’ assumptions and choices. Regulation will enforce standards through legal frameworks, but we must also remember our personal responsibility to hold the people building these products accountable for how they perform. Often, social expectation does more to shape behaviour than any rulebook.
As for what comes next, we’ll know more after today’s session. We’ll be discussing it in the Hemingway Slack Community, so feel free to apply if you want to be part of the conversation.
That’s all for this week. Have thoughts? Reach out and let me know. I love hearing from you.
Keep fighting the good fight!
Steve
Founder, Hemingway