#78: Reflections on Slingshot's Real World Study
What does it mean to internalise an AI and other important questions...
Hi friend,
Last month, Slingshot released a pre-print paper on their real-world evaluation of Ash, their AI app for mental health.
It’s one of the first papers exploring how modern generative AI applications are used for mental health in the real world.
So it got people’s interest.
I’ve been reading, re-reading and discussing this paper, including with members of the Hemingway Community.
In today’s report, I share my reactions: the most interesting implications of this research and the questions it raises for people building AI for mental health. Specifically, I discuss:
The implications of internalising an AI
The risk that AI may not help the people who need it most
Why AI does not have to lead to social isolation (and how it could be designed to do the opposite)
And the challenge of non-responders and what to do about it
Let’s get into it.
Join The Hemingway Community
We now have over 270 members in our vetted community for mental health innovators. This includes founders, clinicians, investors and researchers, all passionate about improving population mental health. We discuss important topics, share learnings and host events. If you’re interested in networking and collaborating with these kinds of people, feel free to check it out.
If you aren’t aware, Slingshot AI is the mental health AI company (disclosure: I am a consultant to Slingshot) that developed Ash, an emotional support chatbot built on a foundation model. In November, they published this pre-print paper, which caught a lot of people’s attention. The paper analysed how people use Ash in the real world.
So what can we learn from this paper?
First, we need to understand this study for what it is: a single-arm observational study that can demonstrate feasibility, correlations between engagement and outcomes in a specific population, and safety system performance.
We also need to understand what it is not: a controlled efficacy trial that can make causal claims about whether Ash (or AI companions in general) can improve someone’s mental health.
You can draw your own conclusions from the study. But here’s my take on what we can and can’t say, having read this research.
What can we say from this study?
Among the 305 participants in this study, mental health symptom scores improved. The study used imputation to handle missing data from the 47% who didn't complete all assessments.
Among people who completed this study protocol, measures of social connection improved: loneliness decreased, perceived social support increased, and users reported more social behaviours - more time spent with others, more phone calls, and more attendance at social activities.
Users who engaged more (measured by active days) showed greater depression reduction, though engagement metrics didn't predict anxiety improvement.
Nearly half the sample showed limited or no improvement.
The safety system functioned without documented failures in this deployment.
And what can’t we say?
We can’t conclusively say that Ash caused these improvements. The study reports 36.4% of users were on psychiatric medication and 23.9% were in concurrent psychotherapy (with unknown overlap between these groups). With no control group, and with these concurrent treatments controlled for using only binary yes/no variables, we cannot isolate Ash's independent contribution.
We also can't say whether the results generalise beyond the study completers.
We don't know if improvements are sustained long-term.
And finally, we can't make claims around mechanisms. While the study analysed some mechanisms (working alliance, engagement metrics, social connection measures), there was no analysis of what actually happened in the conversations with Ash.
Every study has limitations. And not every piece of research can (or should) be an RCT. Observational studies like this are important - they teach us how people engage with these interventions and generate signals for further investigation. We just can’t make claims beyond what the research allows us to.
Once we are aligned on this, we can have a more interesting conversation. One focused on what we learned from this research, what signals it provides and what questions it poses for those building in this field.
Here are the five things that stood out to me.
1. People internalised Ash
75% of users reported that Ash came to mind when they felt distressed or confused. That rate is comparable to rates of internalisation reported for human therapists. That’s a huge finding.
Users imagined Ash's voice, thought of specific statements Ash had made, and sensed Ash was "with them" emotionally. In human therapy, internalisation is considered a positive therapeutic outcome. You're not just learning techniques. You're internalising a supportive presence and a way of relating to your struggles.
The fact that this happens with an AI is genuinely interesting and raises questions we don't yet have good answers to. Questions like:
What are users actually internalising? With a human therapist, you internalise a relationship - their way of being with you, their stance toward your struggles, the particular quality of how they understand you. With Ash, users are internalising... what exactly? A conversational pattern? A persona or avatar they've constructed of what they imagine Ash to be? An algorithmic representation of therapeutic presence? The study measured internalisation using adapted items from the Therapist Internalisation Scale (users imagined "a particular quality to the sound of Ash's voice" and "Ash sitting in his/her office or in the app"), suggesting users may be creating a personified representation. But this is new territory, and we don’t know what it is they are actually internalising and how that impacts them.
On that note, I had another question…
Is this the same psychological process as internalising a human therapist, or something fundamentally different? Does it work through the same mechanisms? Will it predict long-term outcomes the same way human therapist internalisation does?
What happens when what you've internalised is available 24/7? With a human therapist, internalisation means carrying forward their perspective when they're not available - you ask yourself, "what would my therapist say?". One of the benefits of AI support is that it is always accessible. But does that accessibility bring downsides? Could it short-circuit the process of developing internal resources and capacity? Instead of thinking “what would my therapist say?”, if someone has internalised an AI for emotional support, they could just open their app and find out exactly what their AI would say. Every time we have a difficult thought or a negative experience, will we reflexively reach for our AI rather than exercising our own capacity to cope? If not dealt with correctly, this may lead users to become dependent on AI for validation, emotional regulation or decision making.
Having tested a lot of these apps, I’ve noticed this pattern emerge in my own behaviour: have uncomfortable thought > open app > dump in the thought > see what it has to say.
Interestingly, the Slingshot study suggests that internalisation did not predict over-reliance. Users naturally disengaged after improvement, even those who had internalised Ash's presence. Whatever is driving the over-reliance we see in how some people use generic chatbots, it seems like it has to be more than simply internalising the chatbot as a caring, sympathetic 'other'.
A lot of this comes down to design choices. Ash is built on Self-Determination Theory, which values autonomy, competence and relatedness. When this is reflected in user conversations, it pushes people away from over-dependency on Ash.
For those designing AI interventions in mental health, it’s important to be thinking through all of these user behaviours and how they impact design choices.
For example, could models be prompted to encourage reflection on past strategies rather than always providing new answers? Could they sometimes tell the user, “Hey, I think this is something you need to process by yourself”? Could they focus on helping users internalise approaches and skills rather than fostering dependence on the AI's ongoing presence? And how can they normalise discontinuation after improvement?
2. 82% of users were women
82% of users were women, and only 12.5% were men. While this may not be reflective of overall usage of Ash or other AI solutions, it does raise a question: will these interventions reach new, underserved populations or continue to engage the people already open to care?
Men have higher suicide rates, lower treatment-seeking rates, and face specific barriers to accessing mental health care. If AI-based mental health support replicates the same gender disparities as traditional therapy, we're just digitising existing inequities.
The client base of most therapy businesses[1] skews female. While we have made progress in serving men and many other demographic groups, the real opportunity in mental health innovation is to find new ways to reach these groups with interventions they engage with and that work.
3. Lower income predicted worse outcomes
Speaking of helping the underserved… In this study, lower income was associated with lower odds of being in the Improving group compared to non-responders, even after controlling for other factors. This is for a free, accessible, digital intervention.
It's worth noting that while AI is a new technology, the way it impacts health outcomes doesn’t appear (at least from this small study) to be different from how normal healthcare works. If your income is low, it's harder for your symptoms to improve - regardless of the intervention type. Lower-income populations consistently show worse health outcomes across virtually all types of care, from medication adherence to therapy completion to surgical recovery.
The question is whether digital AI interventions can overcome these structural barriers or if they are destined to operate within them.
4. AI does not have to lead to social isolation
The study showed that among participants, time spent with others increased, phone calls to friends and family went up, and attendance at social activities rose. These were changes in real, functional measures of people’s lives. Again, because of the study design, we can’t say for sure that Ash was directly responsible for these results. But the interesting thing is that they run counter to the dominant concern about AI companions - that they will replace human relationships and worsen problems of isolation and loneliness.
There is no fundamental law that means AI will lead to human isolation. It is purely a question about how the product is designed. You can tune a model to direct users toward relationships in their lives, not away from them. You can end long sessions. You can monitor for dependency and change the conversation to reduce the chances it becomes a problem. This is what the Slingshot team have tried to bake into Ash, and the early data is encouraging.
Design decisions are driven by incentives. If an organisation is incentivised to create an AI that promotes human relationships, it can build an AI that attempts to do so. Incentives, incentives, incentives!
5. What to do with non-responders?
Nearly half the sample showed minimal improvement. The study identified three distinct groups: Rapid Improving (9.5%), Improving (42.3%), and Non-Responders (48.2%). These groups had similar baseline severity. The difference was how they responded to the intervention.
This heterogeneity mirrors 70 years of psychotherapy research. We know that therapy doesn't work uniformly, and for many, it doesn’t work at all. The question is whether AI solutions can change this or if the interventions will work for the same people for whom therapy would work.
Can AI products be better at identifying non-response early, adapting the intervention or escalating to other forms of care that may be more likely to work? Organisations that are building AI interventions within a broader clinical service offering will have more options here.
When it comes to AI in mental health, there are so many hard, unanswered questions. Our obligation is to explore those questions with honesty, curiosity and a deep sense of responsibility for the people we aim to serve. It is only by asking these questions, discussing them with peers and rigorously researching them that we can start to understand what might work, and what might not.
I would love to hear your thoughts on the topics raised in today’s report. If you are a Hemingway member, jump into our Slack group and let me know what’s on your mind. And if you’re not, just reply to this email - I’d love to hear from you.
That’s all for this week. Until next week…
Keep fighting the good fight!
Steve
Founder of Hemingway
P.S. Many thanks to everyone who shared their analysis of this research with me. It was incredibly valuable in my thinking.
Notes:
[1] Calling therapy practices “human therapy” or “traditional therapy” businesses seems weird, so I’m going to try and avoid that where I can.