Recent research reveals a troubling trend: only one in three organizations are implementing sufficient testing practices in the development of AI applications. This has triggered calls within the industry for a stronger emphasis on red teaming to mitigate risks associated with emerging technologies.
A study conducted by Applause indicates that a staggering 70% of developers are currently working on AI applications, with chatbots and customer support tools being the primary focus for more than half (55%) of these developers. Yet, despite this surge in AI development, many organizations are neglecting crucial quality assurance (QA) measures, leading to concerns over product quality and diminished long-term return on investment (ROI).
Chris Sheehan, EVP of High Tech & AI at Applause, commented, “The results of our annual AI survey underscore the need to raise the bar on how we test and roll out new generative AI models and applications.” The time is ripe to rethink our strategies for ensuring that these technologies meet the high expectations users have.
The Need for Human Involvement in AI Development
The Applause study emphasizes the necessity of human involvement in the AI development lifecycle. As developers increasingly integrate generative AI tools into their workflows, the human touch has never been more critical for identifying and correcting issues such as inaccuracy, bias, and toxic outputs.
Among the most vital QA activities that benefit from human testing are:
- Prompt and response grading (61%)
- Accessibility testing (54%)
- User experience (UX) testing (57%)
Furthermore, humans play an essential role in training niche or industry-specific AI models, especially as agentic AI applications become more prevalent and interact directly with users. Alarmingly, less than a third (33%) of organizations are currently employing red team testing—an adversarial quality control method borrowed from cybersecurity that helps to uncover potential weaknesses in applications.
The need for enhanced red teaming in AI development is clear and should focus on revealing issues like model bias and inaccuracies that can severely impact user experience and trust in AI systems.
Addressing Persistent Application Flaws
Applause’s research also highlights that customer-related issues appear to be on the rise, with nearly two-thirds of customers using generative AI in 2025 reporting some sort of complication. Among these, over a third faced biased responses (35%), hallucinations (32%), and offensive outputs (17%).
Although improvements have been seen since the initial rush of generative AI popularity, hallucinations remain a significant concern, creating uncertainty for enterprise IT leaders. In a KPMG study released in August 2024, six out of ten tech leaders identified hallucinations as one of their main worries when implementing or building generative AI technologies.
However, there is a silver lining. Many of the enterprises surveyed have begun to integrate AI testing measures into their development process earlier on, with more sophisticated model training techniques that utilize diverse and high-quality datasets. Some organizations are also starting to adopt red teaming practices more eagerly, according to Sheehan.
“While every generative AI use case requires a tailored approach to quality, human intelligence can be woven throughout several phases of the development process, from data modeling to comprehensive real-world testing,” he stated. “As AI becomes more entwined in our daily lives, we must ensure these solutions meet user expectations while also addressing the inherent risks that accompany their use.”
MORE FROM ITPRO
The AI Buzz Hub team is excited to see where these breakthroughs take us. Want to stay in the loop on all things AI? Subscribe to our newsletter or share this article with your fellow enthusiasts.