Securing Generative AI: Embracing Open Source Tools to Combat Threats

As generative artificial intelligence (GenAI), especially large language models (LLMs), continues to expand its influence, companies are urged to harness a growing array of open-source tools designed to uncover security vulnerabilities. These tools focus on critical threats like prompt-injection attacks and jailbreaks, helping organizations safeguard their AI deployments.

The Rise of Open Source Security Tools

This year has witnessed an impressive surge in open-source tools crafted by academic researchers, cybersecurity firms, and AI security experts. Among these innovations is Broken Hill, developed by cybersecurity consultancy Bishop Fox, which aims to bypass the restrictions of nearly any LLM with a chat interface. This tool can be trained on a locally hosted LLM, generating prompts that are capable of overriding the built-in protective measures of other instances of the same model.

Derek Rush, a senior managing consultant at Bishop Fox, explains that Broken Hill intelligently manipulates prompts to navigate around existing guardrails, thereby exposing sensitive information. "It can change characters and add various suffixes to prompts, creating variations that evade detection," he notes, highlighting the tool’s capability to reveal secrets even in the face of additional security measures.

The Constant Struggle for Security

Despite rapid advancements in AI technology, security efforts appear to lag. New methods to circumvent AI protections regularly emerge. For instance, a team of researchers introduced "greedy coordinate gradients" (GCG) earlier this year, which effectively bypassed safeguards. More recently, the "Tree of Attacks with Pruning" (TAP) method and a less technical approach called "Deceptive Delight" further demonstrate the evolving landscape of security challenges.

Michael Bargury, CTO and co-founder of AI security firm Zenity, emphasizes the unpredictability of building secure AI applications, "We don’t really know how to build secure AI applications… and we are figuring that out while building them with real data and with real-world consequences."

Are Your AI Guardrails Working?

In their quest to fortify defenses, many companies are investing in prompt analysis models, like PromptGuard and LlamaGuard, which assess prompt validity. However, the effectiveness of these defenses remains uncertain. To address this, researchers and AI engineers have developed diagnostic tools to help organizations assess their security frameworks.

For instance, Microsoft introduced the Python Risk Identification Toolkit for generative AI (PyRIT) earlier this year. This AI penetration testing framework allows businesses to simulate attacks against their LLMs, helping them probe vulnerabilities effectively. Zenity’s Bargury mentions that they frequently use PyRIT internally for their research endeavors.

Additionally, Zenity has released its own open-source tool called PowerPwn, which specifically targets Azure-based services and Microsoft 365 products. Through PowerPwn, researchers have already unearthed multiple vulnerabilities in Microsoft Copilot, showcasing the necessity of ongoing security assessments.

How Attackers Evade Detection

Bishop Fox’s Broken Hill exemplifies how security strategies are evolving. By starting with a valid prompt and subtly altering characters, it guides the LLM toward disclosing secrets. According to Rush, this tool is adept at working across various GenAI models, enhancing its utility for organizations keen to understand their AI’s weaknesses.

For companies navigating the complex landscape of AI security, utilizing tools like Broken Hill, PyRIT, and PowerPwn is crucial. Bargury warns, "If your AI is useful, it also means it’s vulnerable…because anyone who can influence the data can exploit prompt injection and perform jailbreaking."

Conclusion

In a world where generative AI is becoming increasingly prevalent, understanding and mitigating the security risks associated with these technologies is imperative. As more open-source tools become available, companies have the opportunity to proactively test and improve their defenses against potential vulnerabilities.

The AI Buzz Hub team is excited to see where these breakthroughs take us. Want to stay in the loop on all things AI? Subscribe to our newsletter or share this article with your fellow enthusiasts.