Hi friend,

Conversational AI has become one of the most active areas of mental health innovation. Startups, incumbents, and research institutions are building products in this space, hoping to extend access to support, reduce costs, and improve outcomes. Builders have moved fast, and the broader landscape is moving just as quickly, with regulations, safety standards and the available evidence base changing weekly.

This report on conversational AI in mental health is an attempt to bring greater transparency to the field. We map the landscape from a business, product and clinical perspective, covering thirty-one conversational AI products, each analysed across nine dimensions including clinical evidence, business model and product features. The report divides the market into three segments, shares the primary insights from our analysis, and discusses the key trends and gaps we observed.

Let’s get into it.

Disclaimer: This market map is unlikely to be exhaustive. It relies on publicly available information (which comes with limitations) and does not rank products or make treatment recommendations. Its aim is to bring transparency to a fast-moving area of innovation, and we encourage ecosystem members to contribute to it as a living project. Details on how to do so and additional notes can be found at the end of this report.

Methodology

First, a quick note on our methodology.

We identified 31 products from an initial set of 70+ by filtering for generative, multi-turn conversational AI with mental health as its primary purpose and a verifiable market presence. This excluded general-purpose AI (like ChatGPT), clinician-only tools, and assessment-only tools. Each product was coded across clinical positioning, evidence, human involvement, business model, and regulatory status by a cross-functional team from the Hemingway community with academic, clinical, and commercial backgrounds. The full methodology, product database, and links for feedback are included at the end of this report.
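For readers who want to picture the coding exercise, below is a rough sketch of what a per-product record might look like, written in Python. The field names and evidence levels are illustrative assumptions, not the report's actual database schema.

from dataclasses import dataclass, field
from enum import Enum

class EvidenceLevel(Enum):
    # Illustrative tiers only; the report's own evidence categories may differ.
    NONE = "no published evidence"
    INTERNAL = "internal validation"
    CONTROLLED = "controlled study"
    RCT = "randomised controlled trial"

@dataclass
class ProductRecord:
    # One row per product, coded by the review team.
    name: str
    segment: str                 # e.g. "Consumer-First AI"
    positioning: str             # e.g. "wellness/coaching" vs "treatment"
    evidence: EvidenceLevel
    clinician_in_loop: bool      # any human clinical involvement in delivery
    business_model: str          # e.g. "subscription", "health system licensing"
    regulatory_status: str
    modalities: list[str] = field(default_factory=list)   # e.g. ["CBT", "ACT"]
    use_cases: list[str] = field(default_factory=list)    # e.g. ["anxiety", "stress"]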

Market Map

The 31 products were clustered into three segments: Scaled Hybrid Platforms, Consumer-First AI, and Clinical Infrastructure.

Segment 1: Scaled Hybrid Platforms (n=6) — Headspace, Spring Health, Lyra, Grow Therapy, SonderMind, Sword Health

Scaled Hybrid Platforms are established, later-stage companies — typically Series B and beyond — that have added conversational AI capabilities to an existing therapy or clinical services business. Their AI products sit alongside a broader offering that includes human clinicians, and all operate with some level of clinician-in-the-loop. They reach users primarily through employer-sponsored or health system channels and carry the infrastructure — compliance, clinical governance, etc. — that institutional buyers require. Evidence for the efficacy of these businesses' conversational AI products is limited, with only Spring Health reporting controlled study-level evidence. The strategic question for this segment is whether their AI tools become a meaningful product in their own right or remain a feature of the broader platform. Many more businesses beyond those listed in this market map are likely working on products in this space that have not yet been publicly released. Talkspace, for example, has announced that it is building a conversational AI product, but that it won't be released until the summer of 2026.

Segment 2: Consumer-First AI (n=16) — Slingshot/Ash, Sonia, Noah AI, Youper, Yana, Kin, Rosebud, Manifest, Yuna, Earkick, Ahead, Kai, Feeling Great, Inner Vault, Elomia, Flourish/Sunnie

This is the largest and most heterogeneous segment. These products go direct to consumers via freemium or subscription models and almost universally operate with zero human involvement. Evidence levels vary widely — from RCT (Elomia, Flourish) to none at all. The level of clinician involvement in product development also varies widely. The segment is defined less by a shared clinical philosophy than by a shared distribution strategy: reach users directly, at low cost, without the friction of institutional sales cycles. Most are pre-seed or seed stage.

Segment 3: Clinical Infrastructure (n=9) — Wysa, ieso/Velora, Alongside/Kiwi, Wayhaven, Limbic, Jimini Health, Brightn, Affiniti, Xaia

This segment describes companies building conversational AI products designed to sit within or alongside another organisation's clinical services and workflows. These tools augment what clinicians and care systems already do — whether through between-session support, measurement-based care, or structured intervention delivery that feeds back into a treatment plan. The user may interact with the AI directly, but like most of the Scaled Hybrid Platform organisations, the product is architected around the clinician relationship, not as a substitute for it. This segment contains the highest concentration of clinical evidence in the dataset, including products with RCT or multiple-RCT-level validation (Wysa, ieso, Limbic). Business models skew toward health system licensing and API/integration plays, reflecting the segment's orientation toward institutional buyers. They are distinct from Scaled Hybrid Platforms in that they do not tend to employ clinicians to deliver care directly, instead relying on their clients — provider organisations, health systems, educational institutions — to deliver care alongside their technical tools.

Key Insights

After reviewing the collected data, we noted eleven key insights on the market.

1. This is still an emerging market.

The market clearly sees potential in these products and is investing to build them out. But conversational AI for mental health is still in its early innings, and our understanding of its impact is similarly nascent. Only five products have RCT-level evidence or above (Elomia, Flourish/Sunnie, ieso/Velora, Limbic, and Wysa), while eleven (more than a third) have no published clinical evidence at all. Much of the activity in this space is being driven by very young companies that have not yet been through rigorous clinical validation: nineteen of thirty-one products (61%) are Pre-Seed or Seed stage. There is strong evidence for the underlying therapeutic modalities that most products claim to use (e.g., CBT, ACT); what is largely unknown in this new AI context is how these modalities are being delivered and what impact the new elements in the equation have (e.g., therapeutic drift from a conversational agent, or the potential replacement of important offline activities and interactions). While this evidence base is now being rapidly developed, this is a market still in the early stages of its definition.

2. Evidence and scale do not correlate.

The largest companies in this market have generated little clinical evidence for their AI products specifically. Among the Scaled Hybrid Platforms, most have no published clinical evidence for their conversational AI features. These companies are not asking their AI to be the intervention — at least, not at this point — so an important question is what evidence they do need. Their core clinical products are human-delivered support, coaching, or blended care models, many of which carry substantial evidence bases of their own. The AI layer typically serves an adjacent function: intake, between-session engagement, or content delivery that supports the primary clinical relationship. In that context, the evidentiary question is not whether the AI works as a standalone treatment but whether it increases access or improves retention, engagement, and clinical efficiency within an already-validated care model.

For smaller companies pursuing a clinical route, the calculus is inverted. If your AI product is the intervention — if there is no clinician behind it — then evidence is the entire basis of your credibility with payers, regulators, and health systems. However, because many of these companies are currently going direct to consumer, credibility with those parties is less important.

The asymmetry in evidence generation is notable: the companies with the most resources to run trials may need them least, while the companies with the least resources need them most.

3. No dominant player has emerged.

Despite significant consumer demand, we found no product that has gained significantly more users than the rest of the peer set. What's more, none of the products analysed comes close to the popularity of existing general-purpose AI chatbots (e.g. ChatGPT, Gemini, Claude), which we know are often used for mental health purposes despite not being designed for them. Many products have coalesced around a text-based chat interface, and CBT and emotional support dominate the underlying therapeutic modalities. As a result, many of these products look similar and attempt to serve similar purposes for users. So far, no product has clearly separated itself in the market in terms of adoption, and all are dwarfed by the usage of general-purpose chatbots.

From both an impact and commercial perspective, there is a large opportunity for a well-designed product to achieve large-scale adoption, replacing general-purpose model usage for emotional and mental health use cases.

This market is still all to play for.

4. CBT is the dominant therapeutic modality and most products cluster around the same use cases.

CBT appears in twenty-one of thirty-one products, followed by mindfulness (sixteen) and ACT (thirteen). On the use-case side, anxiety (twenty-three), general well-being (twenty-three), stress (twenty), and depression (fifteen) account for the vast majority. Meanwhile, substance use, eating disorders, trauma/PTSD, and chronic pain each appear in only one or two products. The market is densely concentrated on mild-to-moderate anxiety and stress, with very little coverage of more complex or less prevalent conditions.

5. Nearly every product positions as "wellness/coaching".

Almost all products position themselves as wellness or coaching rather than treatment. Feeling Great explicitly claims a treatment positioning, yet it has no published clinical evidence and operates with zero human involvement. Another product, Inner Vault, calls itself AI Therapy. But these are the exceptions. The near-universal wellness framing reflects regulatory caution and go-to-market strategy: it avoids the higher costs and regulatory burdens associated with clinical treatment claims. But for clinicians and policymakers, it creates a transparency problem. Users may be receiving something that looks and feels like a therapeutic interaction without the oversight or evidence requirements that a treatment designation would demand.

6. Nearly two-thirds of products operate with zero human clinical involvement.

Nineteen of thirty-one products operate with no clinician in the loop. Among those fully autonomous products, six have no clinical evidence. Fully autonomous models are more prevalent among early-stage, consumer-first AI products, while the larger hybrid care providers all have a human in the loop. However, for the products that do involve clinicians, the nature of that involvement isn't always defined or consistent, and it's difficult to know what it actually implies for a product's validation or safety.

7. Most products include clinicians in their development.

While many products don't include humans in the product experience itself, most involved clinicians in the development of their product. Twenty-six of thirty-one products have documented clinician involvement of some kind in their development, whether as co-founders, early team members, clinical advisors, scientists/researchers, or design partners. Of the five for which we could not find evidence of clinician involvement in their design, four were in the Consumer-First AI category.

8. The line between a support tool and a substitute for human connection is largely unaddressed.

Several products in the dataset emphasise emotional closeness, continuous availability, and affirmation as core features, especially within the Consumer-First AI category. The clinical implications of this design orientation — particularly for vulnerable users — are not well understood, and such designs have been linked to potential harm for users. Few products make explicit distinctions between tools designed to scaffold human connection and those that may inadvertently replace it.

9. Products serving adolescents vary enormously in evidence and safety infrastructure.

Nine products serve adolescents (ages 13–17), but their evidence levels range from RCT (Flourish, Wysa, Elomia) down to zero published evidence (Noah AI). Privacy practices are equally uneven: some products have no publicly posted privacy information. Given the heightened duty of care owed to minors, this variability is a notable finding for regulators, parents, and institutional purchasers alike. It is a significant gap given the well-documented importance of early intervention in mental health, and the reality that younger users who cannot find appropriate tools will default to general-purpose generative AI chatbots that were not designed for their needs or their vulnerabilities.

10. Products are largely built on top of general-purpose models.

A small minority of products describe using a "proprietary or custom" model. The vast majority are built on top of general-purpose models from providers like OpenAI, Anthropic, and Google, with model behaviour adjusted through system prompts, fine-tuning, retrieval-augmented generation (RAG), external safety filters and other processes. These tools give product teams meaningful control over model behaviour — therapeutic tone, modality adherence, crisis escalation. However, they cannot change the base model's architecture, core training data, or deeply learned patterns, and they cannot prevent the model provider from updating the foundation underneath them.

At the infrastructure level, then, products are far less differentiated than one might imagine. Any potential moat for these companies does not sit at the model layer — it sits in the clinical design, safety architecture, the quality of proprietary data feeding back into the system, the evidence base, and the distribution relationships built on top of it. We did not observe any difference in available clinical evidence, or on any other dimension, between products built on proprietary models and those built on general-purpose models.
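To make that layering concrete, here is a minimal sketch of the wrapper pattern described above, assuming the OpenAI Python SDK. The system prompt, keyword list, and crisis message are illustrative placeholders, not any product's actual safety logic; real products typically rely on classifier-based moderation and region-specific escalation paths rather than simple keyword matching.

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Product-level control lives in the prompt, not in the model weights.
SYSTEM_PROMPT = (
    "You are a supportive wellness companion. Use a warm, CBT-informed "
    "style. Do not diagnose, prescribe, or claim to provide therapy."
)

# A deliberately crude external safety filter, for illustration only.
CRISIS_KEYWORDS = {"suicide", "kill myself", "self-harm"}
CRISIS_MESSAGE = (
    "It sounds like you may be in crisis. Please contact a local crisis "
    "line or emergency services right away."
)

def reply(user_message: str) -> str:
    # The escalation decision sits outside the model: the wrapper, not the
    # base model, decides when to interrupt the conversation.
    if any(keyword in user_message.lower() for keyword in CRISIS_KEYWORDS):
        return CRISIS_MESSAGE
    response = client.chat.completions.create(
        model="gpt-4o",  # the foundation layer the product does not control
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

Everything differentiating in this sketch sits above the model call, which is the point: if the provider upgrades or changes the underlying model, the product's behaviour changes without a single line of this code being touched.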

Whether a foundation model built specifically for therapy, or for mental health more broadly, can outperform general-purpose models remains an open question for the field.

11. The market remains remarkably opaque. 

Despite surveying thirty-one products across nine dimensions, basic questions about quality, safety, and underlying architecture remain difficult to answer with confidence. Four products — including one serving adolescents — have no publicly available privacy information. Evidence claims are often difficult to verify independently, and the line between internal validation and peer-reviewed research is rarely made clear. Clinical positioning language varies so widely that products offering substantively similar experiences can describe themselves in very different terms. For clinicians looking to refer patients, employers selecting vendors, policymakers writing guidelines, and users making choices about their own care, incomplete public information makes informed decision-making difficult. Transparency is not yet the norm, and until it is, independent efforts to map and evaluate this landscape will remain necessary — and necessarily imperfect.

Discussion

This report set out to map the conversational AI landscape in mental health. The aim was to add transparency to the ecosystem and encourage thoughtful innovation to improve population mental health. Based on our analysis, there are a number of interesting discussion areas for the field.

1. More insight is needed on safety, therapeutic quality and impact

This report did not assess product safety in any systematic way, so we cannot say which products handle crisis situations well, which have robust content moderation, or which have been tested against adversarial use. In our efforts to track safety, the best objective measure we could manage was to record which safety features each product offers, and even that was challenging given how little companies publish on the topic. A number of safety benchmarks are emerging in this space that would offer more consistent and transparent safety assessments, but product performance on those benchmarks is, for the most part, currently unavailable. This report also did not evaluate therapeutic quality or user outcomes. These are the most important questions for this space, and we hope to see them addressed in other work.

2. Greater transparency can enable better clinician and consumer choice

This project found it genuinely difficult to assess many of these products, even with a cross-functional team and a deliberate methodology. If it is hard for a group of researchers, clinicians, and industry professionals to evaluate these tools, it is functionally impossible for an individual user downloading an app from a store. The opacity is not necessarily intentional — many companies are early-stage and still building — but the effect is the same. Users cannot make informed choices about products they cannot meaningfully evaluate.

3. How to ensure AI addresses underserved areas

As noted in the insights section, many of the solutions in the market appear quite similar. Most cluster around the same therapeutic modalities, the same use cases, the same user demographics, and the same wellness positioning. Most are text-based, most draw on CBT, most target anxiety and stress in young and working-age adults, and most avoid making clinical claims. This convergence likely reflects a rational response to where the evidence is strongest, where the regulatory risk is lowest, and where the commercial opportunity is most immediate. But it also means that the populations and conditions most underserved by traditional mental health systems — children, older adults, people with substance use disorders, eating disorders, trauma, or serious mental illness — may remain underserved by AI too. This pattern likely reflects not only market incentives but also appropriate clinical caution: these populations often involve greater complexity and risk, and current evidence suggests AI is best suited, at least for now, to more structured or lower-risk use cases.

4. The opportunity (and risks) of more disruptive innovation

This also raises a deeper question about the scope of innovation. Almost every product in this dataset is attempting to deliver or augment an existing therapeutic protocol — CBT, ACT, mindfulness — using a new technology. That is a very reasonable approach, and it allows builders to draw on decades of clinical evidence for the underlying methods. Most of the market is trying to answer the question: can AI deliver elements of existing, human-based care at lower cost, greater scale and perhaps with some additional efficacy? What remains largely unexplored is whether conversational AI could enable fundamentally new forms of intervention — approaches that would not be possible, practical, or even conceivable in traditional care settings. Very few products in this dataset are exploring that frontier, suggesting the field still thinks of AI as an adjunct or comparable alternative to human-based care, rather than an entirely new medium for the prevention and treatment of mental illness. Whether industry is the right place to test fundamentally new forms of intervention is an open and important question. Incentive structures are not always set up with effectiveness as a top priority, and new AI-powered modalities may need to be developed in a clinical-research environment first. That said, startups can be useful vehicles for turning nascent science into products people actually use. Where the best breeding ground for such innovation lies is a question worth considering.

5. What evidence, for whom?

The evidence question is more nuanced than a simple league table of who has trials and who does not. As this report noted, the largest platforms have little published evidence for their AI features specifically, but their AI is not the intervention — it sits within a broader clinical model. For smaller companies targeting the healthcare system, evidence is an important part of their strategy to build credibility. Then there are consumer-facing products operating outside institutional channels entirely. Each of these segments has different user expectations and commercial strategies, raising an interesting question: what evidence should each be generating?

Not every product needs to complete RCTs and pursue FDA approval to deliver meaningful population health benefits, and many of these businesses could not do so given the cost and duration of such pursuits. So what sort of evidence should we demand from the different segments of this market? All products should be able to demonstrate an acceptable level of safety, although we are yet to clearly define what that should be, and their claims should be in line with what they are capable of delivering. But what else? Providing clear guidance to the market, especially to this category of consumer-focused products, is important. However, it must be done in a way that encourages innovation while ensuring safety and user outcomes are prioritised.

6. Competing with general-purpose models

The products in this report are purpose-built for mental health. But general-purpose models — from OpenAI, Anthropic, Google, and others — are already being used for mental health support by millions of people, dwarfing the usage of every product mentioned in this report. If a general-purpose chatbot is liked by users and can deliver passable interventions (e.g., CBT-informed conversations), the value proposition of a dedicated mental health product rests on what it adds beyond the base model: clinical governance, safety protocols, evidence, human oversight, integration with care systems and, importantly, the clinical context of the user themselves.

As mentioned before, the majority of products in this dataset are built on top of those same general-purpose models. These products may benefit from improvements in the underlying models. But those improvements may also make the general-purpose models more attractive to users.

The products that gain adoption will be those that can take the best of the underlying models and layer meaningful improvements that users, clinicians and payers care about. If that added value isn't apparent, especially to the user, general-purpose models will remain the dominant source of AI mental health support.

Conclusion

This is an innovation area with the potential to significantly extend and improve mental health support. It has attracted capital, attention and some of the world's best talent. But potential is not impact. The work ahead is thorough evidence generation, transparency, product innovation, and listening to real users.

We intend for this report to support the ecosystem by creating greater transparency in the field. It is a living project and we welcome contributions and collaboration from across the ecosystem. To do so, please submit this form.

Acknowledgements and Contributors

This report was produced by The Hemingway Community. The following individuals contributed to research, data collection, product coding, clinical review, and/or editorial input. Their participation does not imply endorsement of the report's conclusions, nor of any product or company discussed within it. Contributions reflect individual professional expertise, are not based on primary data collection or direct verification of individual company claims, and do not constitute a clinical audit or safety validation. Some contributors may hold roles or be affiliated with the organisations included in this report.

Danielle Vaeth - Contributor
Jackie Ourman, LMHC-D - Contributor
Michael Trang - Project Co-Lead
Michiel van Vliet - Contributor
Molly Fuller - Contributor
Nathaniel Hundt - Contributor
Nikki Huang - Contributor
Pavithra Ramesh - Contributor
Steve Duke - Project Co-Lead & Author

Many thanks to everyone who contributed to this report.

Notes

  • This report does not rank products or make any treatment recommendations. Rather, our goal is to provide increased transparency for this emerging and rapidly developing market and to encourage thoughtful innovation that can improve population mental health.

  • Full details on our methodology and approach are linked below.

  • Some contributors may hold roles or be affiliated with the organisations included in this report.

  • This map is unlikely to be exhaustive and the data has limitations. It is entirely possible that some products and data were missed, especially given the fast-moving nature of this space. Many companies disclose little about their technology, clinical oversight, or safety protocols. Some fields are unknown not because the information doesn't exist, but because it isn't public. Our aim is for this to be a living database, and we encourage contributions from the broader ecosystem.

Resources and Feedback

If you’d like to discuss this report with other mental health leaders, or work on future projects like this, consider joining The Hemingway Community.

Keep fighting the good fight!

Steve

Founder of Hemingway
