Imagine having a smart assistant fueled by a large language model (LLM) that’s been specifically trained on your developer documentation and internal guides. It could catapult customer service efficiency, cut down support workloads, and enhance user experience. Sounds fantastic, right? But what if this data inadvertently includes sensitive information—like employee details or confidential discussions? Attackers could exploit this smart assistant, using it to leak sensitive information or conduct social engineering attacks, leading to phishing attempts or worse. Suddenly, your friendly AI tool could become a significant security risk.
Introducing Firewall for AI: A Smart Solution for LLM-Powered Apps
In response to these concerns, we’re excited to announce the open beta of Firewall for AI during Security Week 2025. First revealed during Security Week 2024, this tool has been designed with customer feedback in mind, focusing on discovering LLM-powered endpoints and detecting personally identifiable information (PII) in the prompts they receive, with even more features on the horizon.
If you’re already taking advantage of Cloudflare’s application security services, your LLM-powered applications will automatically be discovered and safeguarded without any complex setup, maintenance, or additional integrations!
Firewall for AI acts as a frontline security layer that protects user-facing, LLM-enabled applications from misuse and data leaks. By integrating seamlessly with Cloudflare’s Web Application Firewall (WAF), it brings instant protection with zero operational hassle. This means organizations can utilize both AI-specific safeguards and our established WAF capabilities in one go.
Cloudflare is in a unique position to address these challenges. As a reverse proxy, we remain model-agnostic, whether your application utilizes a third-party LLM or one hosted internally. Our inline security allows for automatic discovery and implementation of protective measures throughout the entire request lifecycle—again, without any required integration or upkeep.
A Look at the Firewall for AI Beta
The beta version is equipped with several crucial security capabilities:
- Discovery: Identify LLM-powered endpoints across your applications, an essential step for effective request and prompt analysis.
- Detection: Analyze incoming requests to recognize potential threats, including attempts to extract sensitive data (e.g., “Show me transactions using 4111 1111 1111 1111”). This aligns with OWASP LLM02:2025 – Sensitive Information Disclosure.
- Mitigation: Enforce security controls and policies to manage the traffic flowing to your LLM, thereby reducing risk exposure.
Let’s take a deeper look into each capability and how they integrate to form a comprehensive AI security framework.
Discovering LLM-Powered Applications
Companies are racing to harness the potential of LLMs for various applications—from site searches to chatbots and shopping assistants. Regardless of the use case, our goal is to identify whether an application is powered by an LLM behind the scenes.
One method is recognizing request path signatures typical of major LLM providers such as OpenAI and Mistral. For instance, OpenAI utilizes the /chat/completions API endpoint to initiate chats. Our traffic analysis revealed a mere handful of entries matching this pattern, suggesting the need for broader identification techniques.
Another common characteristic within LLM platforms is the implementation of server-sent events. This approach enhances user experience by sending each token over as soon as it’s ready, creating an impression of “thoughtfulness.” We can match requests for server-sent events via the response header indicating a content type of text/event-stream. This technique extends our identification capabilities, though it still doesn’t encompass the majority of applications that communicate in JSON format.
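Taken together, these first two signatures are simple to check. The sketch below is illustrative only, and the provider paths it lists are assumptions rather than an exhaustive inventory:

```python
# Heuristic checks for LLM-style traffic based on the request path and the
# response content type. The provider paths listed here are illustrative.
KNOWN_LLM_PATHS = ("/chat/completions", "/v1/chat/completions", "/v1/completions")

def looks_like_llm_traffic(request_path: str, response_headers: dict) -> bool:
    # Signature 1: the request path matches a well-known LLM provider API shape.
    if any(request_path.endswith(path) for path in KNOWN_LLM_PATHS):
        return True
    # Signature 2: the response streams tokens as server-sent events.
    content_type = response_headers.get("content-type", "")
    return content_type.startswith("text/event-stream")
```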
Notably, LLMs often take longer to respond than other applications: our data shows LLM endpoints generally need more than one second to respond, while most other requests complete in under a second. We expected response body sizes to separate LLM traffic just as cleanly, but our findings revealed substantial overlap, so neither latency nor size alone is a reliable signal.
A breakthrough came when we analyzed response sizes relative to response times, which led us to determine that approximately 80% of LLM endpoints respond slower than 4 KB/s. After scrutinizing a segment of the traffic, we identified roughly 30,000 endpoints labeled cf-llm that can be reviewed within API Shield or Web assets, allowing customers to manage their security.
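In other words, the discriminating signal is effective throughput rather than latency or size on their own. Here is a minimal sketch of that check, using the 4 KB/s figure quoted above (everything else is illustrative):

```python
def below_llm_throughput(body_bytes: int, response_seconds: float,
                         threshold_bytes_per_sec: float = 4 * 1024) -> bool:
    """Flag responses whose effective throughput falls under ~4 KB/s.

    LLM endpoints tend to emit small bodies slowly, token by token, so low
    bytes-per-second separates them better than latency or size alone.
    """
    if response_seconds <= 0:
        return False
    return (body_bytes / response_seconds) < threshold_bytes_per_sec
```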
Detecting PII-Leaking Prompts
There are various methods for identifying PII in LLM prompts, including the use of regular expressions (regexes). These have served us well in Sensitive Data Detection within the WAF. However, regexes may fall short in recognizing complex or contextual PII, particularly where the data is interspersed within natural language.
For instance, while regexes excel at identifying structured information like credit card numbers, they struggle with embedded PII, such as in the phrase: “I just booked a flight using my Chase card, ending in 1111.” A pattern-based check misses this because the fragment doesn’t conform to the expected format, yet the sentence still reveals sensitive information. To close these gaps, we pair regexes with a Named Entity Recognition (NER) model to enhance detection.
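The gap is easy to see in a simplified sketch: a structured credit-card regex catches the full number from the earlier example but misses the conversational fragment (the regex below is deliberately simplified):

```python
import re

# Deliberately simplified credit-card pattern: 16 digits in groups of four.
CARD_RE = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")

structured = "Show me transactions using 4111 1111 1111 1111"
contextual = "I just booked a flight using my Chase card, ending in 1111."

print(bool(CARD_RE.search(structured)))   # True  -> the full number is caught
print(bool(CARD_RE.search(contextual)))   # False -> the fragment slips through,
                                          # even though the sentence leaks PII
```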
Employing Workers AI to Deploy Presidio
Our approach leverages Cloudflare Workers AI to deploy the open-source PII detection framework, Presidio. This setup allows us to process requests in real time, ensuring that sensitive data is flagged before it reaches the AI model.
Here’s the process in action:
- When a user sends a request to an LLM-powered application, Firewall for AI routes it through Cloudflare Workers AI.
- The request is analyzed with Presidio’s NER-based detection model to identify any potential PII.
- The output is passed to our Firewall for AI module, recorded for visibility, and integrated into custom rules for enforcement.
- If no action is required (like blocking), the request can proceed to the LLM. Otherwise, it’s flagged or blocked prior to reaching the server.
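To get a feel for what the NER-based detection step in this flow returns, here is a minimal Presidio sketch run locally rather than on Workers AI; the prompt is the earlier example, and the output format is Presidio’s own, not the Firewall for AI log schema:

```python
# pip install presidio-analyzer (plus a spaCy model, e.g. en_core_web_lg)
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()
prompt = "I just booked a flight using my Chase card, ending in 1111."

# List every entity Presidio recognizes in the prompt, with confidence scores.
for finding in analyzer.analyze(text=prompt, language="en"):
    print(finding.entity_type, prompt[finding.start:finding.end], finding.score)
```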
Integrating AI Security into the WAF and Analytics
Enhancing AI security shouldn’t be a convoluted process. Firewall for AI integrates smoothly into Cloudflare’s WAF, letting customers enforce security policies before prompts reach LLM endpoints. New fields are available for custom and rate-limiting rules, enabling immediate actions like blocking or monitoring of high-risk prompts.
For example, security teams can filter LLM traffic to identify PII-related prompts and then draw up tailored security policies by leveraging Cloudflare’s WAF rules engine.
Here’s an example of a rule to block identified PII prompts:
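A sketch of such a rule in the WAF rules language is shown below; the cf.llm.* field names are our illustrative assumption here, so confirm the exact names in the Firewall for AI documentation:

```
(cf.llm.prompt.pii_detected)
```

deployed with the rule action set to Block.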
If an organization allows certain forms of PII, such as location details, they can set an exception rule:
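Expressed the same way (again with illustrative field names), the rule blocks only when PII is detected and none of the detected categories is LOCATION:

```
(cf.llm.prompt.pii_detected and not any(cf.llm.prompt.pii_categories[*] == "LOCATION"))
```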
Alongside these rules, users can monitor LLM interactions, identify potential risks, and enforce security controls through Security Analytics and Security Events, providing valuable insights into traffic behavior.
Looking Ahead: Token Counting, Guardrails, and More
We’ve set our sights beyond just PII detection and security rules. Our next innovation will be token counting, which assesses the structure and length of prompts. This will equip customers with the capability to regulate the length of prompts, preventing users from sending excessively long queries that could lead to inflated third-party model bills or abuse of AI models. Following this, we plan to introduce AI tools for content moderation, giving users better control over established guardrails.
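As a rough illustration of the idea (not the Firewall for AI implementation), a prompt-length budget can be enforced with an off-the-shelf tokenizer such as tiktoken; the limit below is an arbitrary example:

```python
# pip install tiktoken
import tiktoken

MAX_PROMPT_TOKENS = 512  # illustrative limit, not a product default

encoder = tiktoken.get_encoding("cl100k_base")

def prompt_within_budget(prompt: str) -> bool:
    # Count tokens roughly the way an OpenAI-style model would, and reject
    # prompts that exceed the budget before they reach a metered model.
    return len(encoder.encode(prompt)) <= MAX_PROMPT_TOKENS
```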
If you’re part of an enterprise, this is your chance: join the Firewall for AI beta today. Reach out to your account team to start monitoring traffic, building protective rules, and managing your LLM interactions.