Dean does QA

Klarna's $40B AI Gamble: Why Even Tech Giants Are Bringing Humans Back + What It Means for YOUR Business!

Subscriber Episode Dean Bodart Season 1 Episode 19

This episode is only available to subscribers.


The promise of AI: unprecedented efficiency and massive cost savings. But what happens when the AI dream meets the messy reality of human interaction? Klarna, the fintech powerhouse, offers a compelling, cautionary tale. After a bold move to replace hundreds of human customer service agents with AI, they're now making a strategic U-turn, openly admitting that prioritizing cost over quality led to a decline in customer experience. This isn't just Klarna's story; it's a critical lesson for every business navigating the AI revolution.

In this episode, we dive deep into:

  • The Klarna Experiment: What went wrong when AI went too far in customer service.
  • Beyond Klarna: Widespread recalibration across industries, from IBM's HR bot to Air Canada's costly chatbot and DPD's erratic AI.
  • The AI Paradox in Software Testing: A real-life experiment by Carnegie Mellon researchers that shows AI's "common sense" gap and its limitations in practical scenarios.
  • The Indispensable Role of Human Intelligence: Why a "human-in-the-loop" approach is critical for success.
  • Key Takeaways for Your Business: Actionable strategies for embracing a phased, human-centric approach to AI, investing in data governance, designing seamless AI-human collaboration, and cultivating AI literacy in your workforce.

Tune in to understand why the future of AI isn't about replacing humans, but about empowering them, and how your business can harness AI's true transformative potential by prioritizing quality, ethics, and the invaluable human element.

Thanks for tuning into this episode of Dean Does QA!

  • Connect with Dean: Find Dean's latest written content and connect on LinkedIn: @deanbodart
  • Support the Podcast: If you found this episode valuable, please subscribe, rate, share, and review us on your favorite podcast platform. Your support helps us reach more listeners!
  • Subscribe to DDQA+: Elevate your AI knowledge with DDQA+, our premium subscription! Subscribe and get early access to new episodes and exclusive content to keep you ahead.
  • Got a Question? Send us your thoughts or topics you'd like us to cover at dean.bodart@conative.be
SPEAKER_01:

You're listening to The Deep Dive. And for this one, episode 19 in our Dean Does QA podcast series, we're tackling something huge: Klarna's $40B AI gamble, why even tech giants are bringing humans back. Yeah, it's a

SPEAKER_00:

big

SPEAKER_01:

topic. For a while now, the promise of AI. It's just captivated everyone, hasn't it? You hear about unprecedented efficiency, massive cost savings, this like fully automated future where robots handle everything.

SPEAKER_00:

The dream.

SPEAKER_01:

And companies, I mean, from tiny startups to the massive players, they've jumped on these AI first strategies, sometimes leading to, well, significant workforce cuts.

SPEAKER_00:

That's definitely been part of the narrative.

SPEAKER_01:

But here's the core tension we really want to dig into today. What happens when that shiny AI dream bumps up against the messy, unpredictable reality of human interaction?

SPEAKER_00:

You know, have you ever stopped to think that maybe the future of work isn't really AI replacing us? But maybe it's more about a kind of partnership, one that actually frees us up from the boring stuff.

SPEAKER_01:

That's a really interesting way to frame it. We hear so much fear, right? Job losses, automation taking over.

SPEAKER_00:

Exactly. The headlines are always pretty dramatic. But what if the, you know, the real story is actually more about how we work with AI, how human skills and AI can, well, thrive together?

SPEAKER_01:

It's a critical point because, yeah, the common narrative often skips right to replacement. Right. But the truly exciting developments, the ones making a real difference, they aren't about sidelining people. They're designed to boost what we humans do best. This really gets to the heart of what people call the human in the loop idea. AI assists, it collaborates with human expertise, it doesn't just take over.

SPEAKER_00:

That tees things up perfectly for what we want to achieve today. We're going to dive deep into how AI is shaking up certain industries, but crucially, focusing on how it's built to enhance us, not replace us. This deep dive is really about unpacking that human in the loop concept in real world tech.

SPEAKER_01:

And to help us navigate this, We're drawing our insights today from a really specific source. It's titled SkySuite, GenAI-powered test automation with human oversight.

SPEAKER_00:

Ah, SkySuite. And actually, speaking of them, we should definitely mention that they're actively supporting this deep dive.

SPEAKER_01:

The point.

SPEAKER_00:

SkySuite is an AI-powered software testing platform. And, well, as you'll hear, they really champion this human-in-the-loop approach for AI adoption, especially in testing workflows.

SPEAKER_01:

Their whole model seems built around that balance we're discussing.

SPEAKER_00:

Exactly. And if you want to learn more about how they actually apply it... Yeah, what tasks it takes

SPEAKER_01:

on. And maybe most importantly, why human expertise is still so vital, absolutely central to the whole thing.

SPEAKER_00:

I think by the end, you'll probably have a much clearer, maybe even more optimistic view of how AI is reshaping jobs. So let's get into

SPEAKER_01:

it. So our mission in this deep dive is to unpack Klarna's fascinating story. Here's a company that made this really dramatic move, replacing human customer service with AI. And then maybe even more dramatically, did a sharp U-turn.

SPEAKER_00:

A really interesting case study.

SPEAKER_01:

And we'll see how this isn't just Klarna's tale. It's playing out across other major industries, even, believe it or not, in super technical areas like software testing.

SPEAKER_00:

Yeah, it's broader than many people think.

SPEAKER_01:

Ultimately, we're going to discover why the future of AI probably isn't about wholesale human replacement, but actually about deeply empowering humans.

SPEAKER_00:

This human AI collaboration angle.

SPEAKER_01:

Precisely. So get ready for some surprising facts, maybe some real aha moments. And hopefully practical insights will help you navigate your own business's AI journey. Sounds good. Okay, so when we talk about companies betting big on AI, Klarna's name, it often comes up first. This fintech powerhouse truly went all in. What exactly did that gamble look like?

SPEAKER_00:

Well, Klarna's journey into, let's call it, aggressive AI adoption, it started with some incredibly ambitious goals. Back in 2022, they partnered with OpenAI.

SPEAKER_01:

Right, the ChatGPT folks.

SPEAKER_00:

Exactly. And then in a very bold, very public move, they laid off around 700 employees and announced a halt to human recruitment by 2023. Wow.

SPEAKER_01:

OK, that's decisive.

SPEAKER_00:

They even boasted pretty loudly that their AI assistant quickly took over like two thirds of customer service chats, claiming it performed the work of 700 human agents.

SPEAKER_01:

700.

SPEAKER_00:

Yep. The primary drivers for all this? Pretty clearly efficiency and substantial cost reduction. It was a headline-grabbing move, no doubt about it.

SPEAKER_01:

That's a bold play, especially in customer service. But what happened when all that AI ambition met, you know, actual customers?

SPEAKER_00:

Well, what's fascinating here is how quickly that initial enthusiasm met, let's say, stark reality, that promise of efficiency. It had some critical blind spots. Customers became incredibly frustrated. They were getting these irrelevant, sometimes incorrect or just overly scripted AI responses. The AI, sure, it recognized keywords, but it completely missed the nuances of human intent. That led to these really rigid, unhelpful answers and those endless loops where customers just keep rephrasing their questions, trying to trick the bot into understanding.

SPEAKER_01:

Oh, I think we've all been there. Trying to explain something to a chatbot that just doesn't get it. It feels like you're talking to a brick wall. Especially with financial stuff, I can imagine that frustration just skyrockets.

SPEAKER_00:

Absolutely. The AI really struggled with complex, nuanced, or emotionally charged queries, like disputes or potential fraud cases.

SPEAKER_01:

Right, the tricky stuff.

SPEAKER_00:

It frequently just got stuck. It forced these frustrating, totally unproductive conversations before eventually escalating to a human.

SPEAKER_01:

If it escalated.

SPEAKER_00:

Right, if it even got that far. And when it did escalate, the handoffs were, well, far from seamless. Customers often had to repeat their entire problem from scratch.

SPEAKER_01:

The worst.

SPEAKER_00:

Which of course led to longer resolution times and just more frustration. In many cases, honestly, it made the experience worse than if AI hadn't been involved at all.

SPEAKER_01:

That's a pretty brutal reality check. What did Klarna's leadership actually say about why things went wrong? Their CEO, Sebastian Siemiatkowski, he was surprisingly candid, wasn't he?

SPEAKER_00:

He was. Remarkably transparent, actually. He openly admitted, and this is a direct quote, cost unfortunately seems to have been a too predominant evaluation factor when organizing this. What you end up having is lower quality.

SPEAKER_01:

Wow. Okay. Lower quality. That's blunt.

SPEAKER_00:

It really is. And that quote, it really hits on a common pitfall. The singular focus on immediate cost savings, especially with AI, can just inadvertently erode customer satisfaction and ultimately brand trust. You gain efficiency on paper, but you lose something invaluable.

SPEAKER_01:

So when he says cost was a too predominant factor, what's the deeper truth they're hitting on there, the thing maybe other companies also miss?

SPEAKER_00:

I think it's the understanding that while AI can handle volume, scale, repetitive stuff, it really struggles with the qualitative aspects.

SPEAKER_01:

The human stuff.

SPEAKER_00:

Exactly. The empathy, the nuance, the judgment calls things that are essential for good customer service, particularly when you're dealing with sensitive topics like people's money. They realize that sacrificing that quality for cost ultimately backfires badly.

SPEAKER_01:

Right. Okay. So what does this all mean for Klarna now? They've clearly learned some tough lessons. They made this significant strategic U-turn. What's their refined approach look like today?

SPEAKER_00:

That's right. Klarna is now actively rehiring human agents, particularly for remote customer support roles. And they're even using a gig economy model to help them scale flexibly.

SPEAKER_01:

Interesting. So bringing people back, but maybe in a different structure.

SPEAKER_00:

Precisely. And their refined AI strategy, it now restricts the AI only to those repetitive transactional inquiries. Things like, you know, order tracking, simple refunds, payment reminders. The

SPEAKER_01:

easy stuff.

SPEAKER_00:

The easy stuff. Critically, complex cases are now swiftly escalated to human experts. No more getting stuck in those endless loops.

SPEAKER_01:

That sounds much better. And they're refining the AI's capabilities for those simple tasks too, right? Making it better at what it can actually do.

SPEAKER_00:

Exactly. They've introduced things like confidence scoring, so the AI basically knows when it's out of its depth and automatically escalates if it's uncertain about a query. Yeah, and they're training it with real customer-agent interactions to get a more natural tone. They're using sentiment analysis to try and detect frustration earlier, plus personalization, referencing past interactions to make the AI feel a bit more tailored.
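The confidence-scoring-plus-escalation idea described here can be sketched in a few lines. This is purely a hypothetical illustration, not Klarna's actual system: the `classify` stub, the intent names, and the 0.80 threshold are all assumptions for the sake of the example.

```python
# Hypothetical sketch of confidence-based escalation (not Klarna's real code).
# An intent classifier returns (intent, confidence); anything below the
# threshold, or any intent flagged as sensitive, goes straight to a human.

SENSITIVE_INTENTS = {"dispute", "fraud", "complaint"}
CONFIDENCE_THRESHOLD = 0.80  # assumed cutoff; a real system would tune this

def classify(message: str) -> tuple[str, float]:
    """Stand-in for a real intent model: keyword match with a toy score."""
    rules = {
        "track": ("order_tracking", 0.95),
        "refund": ("simple_refund", 0.90),
        "fraud": ("fraud", 0.85),
    }
    for keyword, result in rules.items():
        if keyword in message.lower():
            return result
    return ("unknown", 0.20)  # low confidence when nothing matches

def route(message: str) -> str:
    """Keep the bot on easy, high-confidence queries; escalate the rest."""
    intent, confidence = classify(message)
    if intent in SENSITIVE_INTENTS or confidence < CONFIDENCE_THRESHOLD:
        return "human"  # swift escalation instead of an endless bot loop
    return "bot"

print(route("Where can I track my parcel?"))   # bot
print(route("I think this charge is fraud"))   # human: sensitive intent
print(route("asdf qwerty"))                    # human: low confidence
```

The design point is that uncertainty itself is a routing signal: the bot never has to pretend it understood.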

SPEAKER_01:

Okay, so it's AI assisting, not replacing in those cases.

SPEAKER_00:

That's the key shift, and it all supports the CEO's reiterated commitment. It's so critical that you are clear to your customer that there will always be a human if you want. That's a fundamental shift in philosophy from where they started.

SPEAKER_01:

You know, Klarna's story, while dramatic, it doesn't sound like an isolated incident anymore. Are other big names facing similar wake-up calls with their AI strategies?

SPEAKER_00:

Oh, it's far from unique. We've seen really similar patterns across various industries. Take IBM's HR bot, AskHR. Okay. It initially replaced a lot of their HR functions, but they later had to reintroduce human staff because the AI just couldn't handle the empathy or the objective judgment needed in employee relations.

SPEAKER_01:

Makes sense. HR needs that human touch. Yeah,

SPEAKER_00:

absolutely. Then there's Air Canada. They actually faced legal action when their chatbot gave a passenger incorrect refund information and a court ruled the airline was liable for the bot's mistake.

SPEAKER_01:

Wow. Liable for the bot. That sets a precedent.

SPEAKER_00:

It really does. And then there was the DPD chatbot incident. That was quite a public relations nightmare, wasn't it? I remember that one going viral.

SPEAKER_01:

Oh, absolutely. The delivery company bot that went rogue.

SPEAKER_00:

That's the one. The parcel delivery firm DPD had this truly bizarre incident where its chatbot, instead of helping, started swearing, insulting itself, and even composed a critical poem about the company. A

SPEAKER_01:

poem? You can't make this stuff up.

SPEAKER_00:

Right. They were forced to disable the bot entirely. And it's not just customer service or HR. Meta's content moderation AI has been widely criticized for failing to adequately handle harmful content and misinformation. It just lacks the necessary cultural and political context. Yeah,

SPEAKER_01:

nuance again.

SPEAKER_00:

Always nuance, even in really high stakes areas like health care. Remember IBM Watson for Oncology? Huge investment, something like $4 billion.

SPEAKER_01:

Right, supposed to revolutionize cancer treatment.

SPEAKER_00:

Yeah, well, it was quietly scaled back. Why? Because it was found to be providing dangerous and sometimes ineffective treatment recommendations.

SPEAKER_01:

Good grief. Okay, so whether it's financial services, HR, airlines, parcel delivery, social media, even health care. These cases reveal some strikingly common underlying issues. It's not just about what the AI can do technically, but what it consistently struggles with. What are these common threads you're seeing?

SPEAKER_00:

That's the crucial question, isn't it? The common root causes driving this, let's call it recalibration, often boil down to the AI's inherent limitations. A big one is just the lack of genuine empathy and contextual understanding.

SPEAKER_01:

You keep coming back to that.

SPEAKER_00:

Because it's fundamental. AI struggles with emotional cues, sarcasm, the subtle meanings in human language. Then there's data quality and bias. We always say AI is only as good as its training data.

SPEAKER_01:

Garbage in, garbage out. Pretty

SPEAKER_00:

much. Poor or biased data leads straight to inaccurate or even discriminatory outputs.

SPEAKER_01:

So it's not just about the raw computing power, is it? It's how the AI is fed and instructed and really for what purpose?

SPEAKER_00:

Exactly. And strategic misalignment is a huge factor. So many AI initiatives seem to lack clear KPIs beyond just cutting costs. They don't focus enough on the qualitative impact on the actual customer experience.

SPEAKER_01:

Right. Measuring the wrong thing.

SPEAKER_00:

And frankly, customer preference plays a huge role here. A significant chunk of consumers, you see figures like 75 percent in some studies, they still prefer human interaction for complex or sensitive issues. They just do.

SPEAKER_01:

Can't argue with that.

SPEAKER_00:

And finally, you can't ignore the very real legal and reputational risks. Companies are being held liable for AI errors, as we saw with Air Canada. And these public AI fizzles like DPD's swearing bot, they cause significant, sometimes lasting brand damage. It all adds up to a pretty sobering reality check for the AI hype.

SPEAKER_01:

You might think these issues, the empathy gap, the context problem, are mostly limited to customer service or public-facing interactions. But what's really fascinating is how these same challenges pop up even in highly technical domains like software testing. We hear so much about AI's promise there, right? Automating tedious tasks, generating test cases, detecting anomalies. People point to things like Google's smart test selection or Facebook's fuzzy visual testing framework as examples of what's possible.

SPEAKER_00:

And there is potential there, definitely. But this raises an important question. If AI is supposedly so good at logic and data processing, why does it still fall short in a field like software testing where precision seems like it should be its strength?

SPEAKER_01:

Yeah, good question.

SPEAKER_00:

There was a particularly illuminating example from a Carnegie Mellon experiment. Researchers created this simulated company staffed entirely by AI agents.

SPEAKER_01:

Wow, completely AI run.

SPEAKER_00:

Completely. Using top models from OpenAI, Anthropic, Meta, and Google. Okay, so running a basic

SPEAKER_01:

company with just AI. What did they find? How did these AI agents actually perform?

SPEAKER_00:

Well, the researchers didn't mince words. They described the results as disastrous.

SPEAKER_01:

Disastrous. Not just suboptimal, but disastrous.

SPEAKER_00:

Yeah. Not a single AI model achieved even moderate success. The highest performer, which was Claude, completed only 24% of its tasks. Others, like Gemini and ChatGPT, were hovering around 10%. And Amazon's Nova, a mere 1.7%. That's

SPEAKER_01:

shockingly low. What went wrong?

SPEAKER_00:

A key issue they identified was the AI's inherent lack of common sense.

SPEAKER_01:

Common sense.

SPEAKER_00:

For instance, one example they gave, an AI couldn't access a file it needed because there was an unexpected pop-up on the screen, like an ad or a notification.

SPEAKER_01:

Oh, like something we just click X on without thinking.

SPEAKER_00:

Exactly. Any human would have instantly recognized it, clicked X and moved on. But the AI, lacking that intuitive real-world understanding, just abandoned the task, gave up.

SPEAKER_01:

That's such a simple yet profound example of AI's blind spot, isn't it? It highlights how much just goes unsaid and unprogrammed and basic human interaction and context. And these AI-powered companies, they weren't exactly cheap to run in the simulation, were they?

SPEAKER_00:

Not at all. The experiment also found these AI setups were problematically expensive. They averaged about $6 per task completed.

SPEAKER_01:

$6 per task.

SPEAKER_00:

Yeah, which really challenges that whole notion of immediate slam-dunk cost efficiency from just plugging in AI everywhere.

SPEAKER_01:

Definitely complicates the ROI calculation.

SPEAKER_00:

And there are other limitations specific to AI in software testing, like its inability to effectively assess visuals and the user experience, the UX. Right. AI can confirm functionality, like, does the button work when clicked? Yes or no? But it frequently misses subtle visual glitches, or it just cannot truly judge user friendliness.

SPEAKER_01:

So it can tell you if a button works functionally, but not if it looks good or feels right to a human user.

SPEAKER_00:

Precisely. And then you get into the ethical blind spots. AI lacks moral judgment. There was a case of an AI recruiting app, for example, it passed all the automated functional tests, but was later found to be discriminating against specific groups of users because it had inadvertently learned biases from its training data.

SPEAKER_01:

That's a big one.

SPEAKER_00:

Huge. Plus, AI models can suffer from something called overfitting. That's where they learn the training data too well, too rigidly. So if the product changes significantly or if they encounter data that looks different from training, they can become ineffective. They can't adapt well. And finally, many advanced AI models are essentially black boxes.

SPEAKER_01:

Meaning we don't know how they reach conclusions.

SPEAKER_00:

Exactly. Their internal workings are opaque. That makes it really difficult to understand why they made a certain decision or flagged a certain issue, which complicates debugging and, frankly, building trust in the system.

SPEAKER_01:

So if AI has these, I mean, surprisingly fundamental limitations, even in technical areas like testing... What does that mean for the role of humans in testing or really in any AI driven process? What's the takeaway here?

SPEAKER_00:

It really underscores the indispensable role of human intelligence. Humans excel at things like exploratory testing, just poking around, trying things out creatively.

SPEAKER_01:

The stuff you can't easily script.

SPEAKER_00:

Exactly. Creative problem solving, assessing the feel of the user experience, applying ethical judgment. Humans identify those subtle bugs, those usability issues that AI just sails past, ensuring the software truly meets user needs and is robust out there in the real world. So the overarching lesson, really, from all these diverse scenarios, from Klarna's customer service woes to the complexities of software testing, it seems clear. The future of AI isn't about replacing humans wholesale. It's much more about empowering them.

SPEAKER_01:

Right.

SPEAKER_00:

And this is what we often call the human in the loop or HITL approach.

SPEAKER_01:

HITL. Okay. So what exactly is HITL and why is it becoming so critical now?

SPEAKER_00:

At its core, it's about consciously designing processes that integrate human judgment and human oversight. So humans are actively

SPEAKER_01:

involved?

SPEAKER_00:

Actively involved. They provide crucial feedback, they correct errors the AI makes, and they continuously help improve the data quality the AI learns from. Why is it critical? Well, for customer service, humans provide that essential empathy, that contextual understanding, and the ability to handle the complex, sensitive stuff. That's how you build trust.

SPEAKER_01:

Which Klarna found out the hard way.

SPEAKER_00:

Indeed. And for software testing, as we just discussed, humans excel at that exploratory testing. The UX assessment, the ethical judgment, the creative problem solving, all things AI still really struggles with.

SPEAKER_01:

So it's really a hybrid model then. You have AI handling the repetitive tasks, the large scale data crunching, and humans focusing on the higher value strategic, more nuanced roles. Essentially working together.

SPEAKER_00:

That's it. Exactly. It's about synergy, combining AI speed and scale with human intuition, creativity and adaptability. That combination usually leads to superior outcomes compared to either AI alone or humans alone.

SPEAKER_01:

Makes sense.

SPEAKER_00:

However, there is a cautionary note here. We need to be wary of something called automation bias.

SPEAKER_01:

Automation bias. What's that?

SPEAKER_00:

It's where human supervisors, maybe after seeing some initial successes with an AI system, become complacent. They start to overtrust the AI.

SPEAKER_01:

They stop double checking. Exactly.

SPEAKER_00:

Their reviews become cursory rather than critical oversight. You have to actively design processes and training to guard against that tendency, to keep the human meaningfully in the loop.

SPEAKER_01:

That makes perfect sense. Don't let the pendulum swing too far the other way.

SPEAKER_00:

So with all this in mind, thinking about our listeners, what does this all mean for your business? How can you navigate this AI frontier successfully, maybe avoiding the pitfalls Klarna and others clearly encountered?

SPEAKER_01:

Yeah. Based on these experiences, we have several key takeaways and maybe some recommendations for you. First, really embrace a phased, human-centric approach. Start small, measure carefully, refine. Gradually expand AI's role through pilot programs. Don't go for that big bang, replace-everything approach. Second, invest seriously in robust data governance and AI ethics right from day one.

SPEAKER_00:

Not as an afterthought. Definitely not. Emphasize quality, unbiased training data and integrate ethical considerations into every single stage. And

SPEAKER_01:

making sure that handover, that connection between AI and human, is seamless seems absolutely vital. That DPD chatbot story, or even Klarna's escalation issues, really drove that home.

SPEAKER_00:

It's paramount. You have to design seamless AI-human collaboration. Ensure there are clear escape hatches, easy ways for users or employees to get out of the AI loop, and smooth handoffs from AI to humans, so customers or employees always have a straightforward option to connect with a person when they need or want to. Fourth, cultivate AI literacy within your workforce. Don't just drop AI on them. Train employees to work with AI, understanding its capabilities and its limitations, and importantly, reskill them for those higher value, more human-centric roles that AI will free them up for.
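The escape-hatch and smooth-handoff advice can be made concrete with a small sketch. Everything here is hypothetical (the escape phrases, the two-failure policy, the class and field names): the router keeps a transcript, escalates on an explicit request or after repeated bot failures, and hands the full history to the human so the customer never has to repeat themselves.

```python
# Hypothetical sketch of an "escape hatch" plus context-preserving handoff.
from dataclasses import dataclass, field

MAX_BOT_FAILURES = 2  # assumed policy: two failed bot turns trigger a handoff
ESCAPE_PHRASES = ("human", "agent", "person")  # explicit escape hatch words

@dataclass
class Conversation:
    transcript: list[str] = field(default_factory=list)
    bot_failures: int = 0
    handler: str = "bot"

    def handle(self, message: str, bot_understood: bool) -> str:
        """Route one customer message; escalate with full context when needed."""
        self.transcript.append(message)  # history travels with the handoff
        wants_human = any(p in message.lower() for p in ESCAPE_PHRASES)
        if not bot_understood:
            self.bot_failures += 1
        if wants_human or self.bot_failures >= MAX_BOT_FAILURES:
            self.handler = "human"
        return self.handler

convo = Conversation()
convo.handle("Where is my refund?", bot_understood=False)  # bot keeps trying
handler = convo.handle("That's not what I asked", bot_understood=False)
print(handler)                 # "human" after repeated failures
print(len(convo.transcript))   # 2: the agent sees the whole conversation
```

The key design choice is that escalation hands over the `transcript`, not just the latest message, which is exactly the repeat-yourself-from-scratch problem Klarna's customers ran into.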

SPEAKER_01:

Makes sense. Empower your people.

SPEAKER_00:

Exactly. And finally, define clear, comprehensive KPIs. Look beyond those simplistic efficiency metrics like calls handled. Measure the actual customer experience. That's a really

SPEAKER_01:

powerful shift in perspective. Feels like the era of maybe overly aggressive human replacement AI is clearly giving way to a far more nuanced understanding of its role. It really is about AI as an intelligent co-pilot, maybe not a full substitute pilot.

SPEAKER_00:

Yes, I think that's a great way to put it. And the core message really is that successful AI adoption requires prioritizing quality, ethics, and that invaluable human element alongside efficiency. That's how you build stronger customer relationships, how you foster innovation, and ultimately how you drive truly sustainable growth in this digital age.

SPEAKER_01:

So a final thought for you, our listeners. As you think about the AI systems touching your own work, your own life, what unexpected pop-ups, literal or metaphorical, might they miss? And what human touch, what specific element of human intelligence or empathy, is truly irreplaceable in your processes? We encourage you to continue your own deep dives into these fascinating and rapidly evolving topics.
