Strategies for Safeguarding LLMs and Mitigating Novel Risks
As the adoption of AI models, particularly large language models (LLMs), continues to accelerate, enterprises are growing increasingly concerned about implementing proper security measures to protect these systems. Integrating LLMs into internet-connected applications exposes new attack surfaces that malicious actors could potentially exploit.
While traditional web and API applications share some common vulnerabilities like data exfiltration risks, the unique operational mechanisms of LLMs introduce a novel category of threats. This necessitates robust, tailored security frameworks to address the intricate risk profiles of LLM deployments.
In this article, we'll examine the differences between LLMs and traditional applications that call for tailored security solutions, explore some of the emerging risk vectors, and outline current strategies for mitigating these novel risks via the concept of the Firewall for AI.
How do LLMs differ from traditional applications?
Large Language Models (LLMs) differ from traditional applications in several key aspects, particularly in how they handle data and operations, and in their non-deterministic nature.
Traditional applications are deterministic, meaning they will consistently produce the same output given the same input. This predictability comes from the application's code, which explicitly defines the operations and logic to be followed. The data plane (the database or data storage) and the control plane (the application's code) are clearly separated, allowing for specific, predefined interactions with the data.
Here's an example: Consider a banking app that allows users to view their transaction history. The app's code (control plane) contains specific instructions on how to retrieve transaction data from the database (data plane). When a user requests their transaction history, the app executes a predefined query to the database, which returns the same transaction history every time, provided the data hasn't changed.
Now, let's explore how LLMs differ from this deterministic approach:
- Non-Deterministic: LLMs are non-deterministic to some extent. Given a prompt, an LLM might produce slightly different responses each time due to the probabilistic nature of the model's design.
- Integrated Data and Operations: In LLMs, the data (training corpus) and the operations (model inferences) are integrated. The model generates responses based on patterns learned from the training data, without separate queries to a database.
- Dynamic Responses: LLMs create responses dynamically, considering the context and nuances of the input prompt. This allows for a wide range of outputs from a single model, as opposed to the fixed operations in traditional apps.
- Learning Capability: LLMs can learn and adapt over time, improving their responses based on new data and feedback. Traditional applications do not learn or adapt unless explicitly updated by developers.
Data Plane vs Control Plane
Let's consider a traditional web application and a large language model (LLM) application, both protected by a traditional firewall.
Traditional Web Application
- Data Plane: The firewall inspects and filters incoming/outgoing network traffic based on defined rules.
- Control Plane: Administrators set firewall rules to allow/block specific IP addresses, ports, protocols etc.
- Example: The firewall could block unauthorized access attempts by filtering traffic on port 80 (HTTP) from untrusted IP ranges.
In this scenario, the traditional firewall effectively secures the web application by operating on the data plane (network traffic) using rules defined in the control plane.
LLM Application
- Data Plane: The firewall still inspects network traffic, but cannot effectively analyze the LLM's dynamically generated text outputs.
- Control Plane: The same firewall rules apply, but do not address the unique risks of LLM outputs like misinformation, toxic language, leaked sensitive data, or taking downstream privileged actions.
- Example: An innocuous prompt like "Tell me about the company's financial performance" could lead the LLM to output sentences containing trade secrets or personally identifiable information (PII). A traditional firewall cannot detect or prevent such outputs.
For the LLM application, while the traditional firewall secures network traffic, it cannot mitigate risks inherent to the LLM's text generation process itself. Harmful outputs can stem from seemingly benign inputs, requiring analysis of both prompts and outputs.
The key distinction is that for traditional applications, risks primarily manifest in the data plane (network traffic), which firewalls effectively handle. However, for LLMs, the risks go beyond just the traffic itself and originate from the model's language processing and text generation capabilities.
The distinction between the data plane and control plane for traditional applications versus large language models (LLMs) is important because it highlights a fundamental difference in how security controls need to be applied.
For traditional applications like firewalls:
- The data plane deals with the actual flow of data/traffic
- The control plane is separate and sets the rules/policies to govern that data flow
- Security controls in the control plane (e.g. block rules) determine what gets blocked/allowed in the data plane
This separation allows for a clear delineation of where security policies are defined (control plane) and where they are enforced (data plane).
However, for LLMs, the data plane and control plane are more tightly integrated:
- The data plane is where inputs are processed and outputs generated dynamically
- But the "control" that determines safe outputs is embedded within the model itself via its training data and parameters
There isn't a separate, cleanly delineated control plane to define security policies externally. The model's behavior controls emerge from its internal training/configuration.
This integration of "control" within the LLM model makes it challenging to enforce traditional security controls that operate on the data flow itself (like a firewall). The risks can manifest from the model's generated outputs based on seemingly innocuous inputs.
So for LLMs, security controls can't just be applied at the "data plane" level by inspecting and blocking traffic flows. They need to be embedded at multiple stages:
- The training data and processes to instill safer behaviors
- Filters/detectors analyzing both input prompts and generated outputs
- Continuous monitoring and retraining as new risks emerge
The blurring of data/control boundaries for LLMs means security needs to be an integrated part of the entire AI system lifecycle, not just applied as an external control on the data flow. It’s important to not only inspect data flows but also employ AI/NLP techniques to analyze inputs, outputs, and model behavior.
This fundamental operational difference from traditional application architectures is why novel AI security approaches like Nightfall are needed. Simply extending existing data/control plane separation concepts falls short for LLMs.
What is a Firewall for AI?
A "Firewall for AI" or an "AI Firewall" is distinct from a traditional web application firewall (WAF) that leverages AI to enhance its operations. An AI Firewall refers to a class of security solutions designed to protect AI models by acting as an intermediary, monitoring and operating on the model's inputs (prompts) and outputs. The service sits in front of the AI model, inspecting prompts for various risk vectors such as malicious injections, sensitive data exposure, and other threats discussed in more detail below.
It's important to note that the term "Firewall for AI" is somewhat of a misnomer, as it implies a network-based security control akin to a traditional firewall instrumented on the network. However, AI Firewall solutions can be deployed in various ways, not just as an inline network component. In the following section, we'll explore different deployment models and integration patterns for incorporating AI Firewall capabilities into your AI infrastructure.
While an AI Firewall shares some high-level concepts with traditional firewalls, such as acting as a gatekeeper and enforcing security policies, its implementation and functionality are distinctly tailored for the unique risks and characteristics of AI models, particularly large language models (LLMs). AI Firewalls employ advanced techniques like natural language processing, machine learning models, and contextual analysis to detect and mitigate threats that manifest in the form of textual inputs and outputs, rather than focusing solely on network traffic patterns.
What are the common risks of concern?
- Model Abuse and Prompt Injection: This risk involves malicious actors crafting prompts that manipulate the LLM into providing unintended responses or actions. Prompt injection is largely harmful in two ways: (a) eliciting inappropriate responses visible to the end-user, or (b) causing unintended downstream actions. When LLMs are integrated with other applications or services (e.g., via LangChain), a malicious prompt poses the greatest risk as it could cause the LLM to orchestrate deleterious actions, like deleting records from a database.
- Current Methods: Solutions for prompt injection remain fairly rudimentary and an ongoing area of research. Many current approaches involve using rules, heuristics, and predefined filters, which can be prone to false positives and require frequent updates. No single solution holistically addresses this threat. We'll explore this topic deeper in a subsequent article.
- Harmful or Toxic Content: The dissemination of harmful or toxic content, such as offensive, discriminatory, or damaging material.
- Current Methods: Content moderation, sentiment analysis, and ethical guidelines help detect and filter unacceptable content generation. The nuances of language, context, and cultural differences make accurately identifying all instances of harmful content difficult.
- Sensitive Data Exposure: LLMs can inadvertently leak sensitive information due to overfitting or memorization during training, improper filtering, or errors in processing inputs, resulting in unauthorized disclosure of confidential data.
- Current Methods: Existing solutions involve filtering and redacting sensitive data from prompts and outputs. At Nightfall, we leverage LLMs paired with a number of other machine learning based techniques to do this. Legacy approaches are more focused on rules, regular expressions, and pattern matching.
- Traditional Risks: While not LLM-specific, these models can integrate into applications vulnerable to traditional risks like Distributed Denial of Service (DDoS) attacks that disrupt services through traffic overload to the model. Solutions for such risks are well-established via traditional firewall capabilities.
Shared Responsibility Model for AI
As large language models (LLMs) continue to evolve, their capabilities in detecting and mitigating model abuse, prompt injection, toxic content, and sensitive data exposure will likely improve. However, solely relying on the LLM provider's built-in safeguards may not be sufficient for enterprises with stringent data security and compliance requirements. This is where a third-party service that provides an additional layer of protection can add significant value, aligning with the shared responsibility model for data security.
The shared responsibility model is a framework that delineates the security responsibilities between the cloud service provider and the customer. In the context of AI models, the LLM provider is responsible for securing the underlying infrastructure, platform, and services. However, enterprises are accountable for securing their data, applications, and workloads that interact with the AI models.
By incorporating a third-party service that both redacts sensitive data before it ever reaches the LLM provider and inspects LLM outputs, enterprises can maintain control over their confidential information and ensure compliance with data protection regulations. This approach aligns with the shared responsibility model, where the enterprise takes proactive measures to protect their data, while the LLM provider focuses on securing the AI model and its underlying infrastructure.
Moreover, enterprises may have specific data governance policies, industry-specific regulations, or unique security requirements that necessitate an additional layer of protection beyond what the LLM provider offers. A third-party service can tailor its data redaction and filtering capabilities to meet these specific needs, providing enterprises with greater control and customization over their data security posture.
Note that for the LLM provider to filter out sensitive data, it still has to receive that data in the first place. So an upstream security solution can detect and prevent the information from reaching the LLM provider at all, adding an extra layer of protection.
By embracing the shared responsibility model, enterprises can leverage the power of AI models while maintaining a strong security posture and ensuring that their sensitive data remains confidential throughout the data processing lifecycle. This approach not only mitigates potential data breaches but also builds trust with customers, partners, and regulatory bodies, demonstrating the enterprise's commitment to data privacy and security.
How do I deploy an AI Firewall?
AI Firewalls are typically deployed via an API, an SDK, or a reverse proxy. The recommended approach is via API, as this provides flexibility, aligns with how developers interact with third-party LLMs, and is easiest to instrument.
- API Deployment: Direct integration with AI services via API calls to a cloud hosted content inspection service, typically wrapping API calls to the AI service and evaluating both inputs and outputs, providing flexibility and ease of use for developers.
- SDK Deployment: Software development kits with tools and libraries to incorporate into applications, often simply wrapping API calls when the content inspection service is cloud based.
- Reverse Proxy Deployment: Deploying as a network intermediary between the AI service and users, suitable for traditional firewall or gateway implementations with features like rate limiting.
At Nightfall, we recommend an API-centric approach for integrating Firewall for AI capabilities into your existing workflows when working with third-party large language models (LLMs) like those offered by OpenAI. This approach complements developers' established processes and makes for smoother integration.
APIs offer composability and flexibility. This method enables the use of powerful AI-based techniques for content inspection, rather than relying solely on rules-based approaches that could introduce latency when inspecting traffic inline within the network via a reverse proxy.
The ability to leverage AI for detecting prompt injection, toxic content, and sensitive data exposure is crucial because these security concerns are highly context-dependent. Traditional rules-based techniques often struggle to accurately identify such nuanced threats, resulting in an unacceptable number of false positives that can overwhelm security teams and disrupt operations.
By employing advanced AI models, Nightfall's API-centric solution can effectively analyze the context and intent behind prompts and responses, enabling precise detection and redaction of sensitive information, malicious content, or potentially harmful prompts. This approach significantly reduces false positives while providing a robust layer of security tailored to the unique risks associated with large language models. Moreover, API based methods allow for seamless scalability and customization. As your AI model usage grows, a cloud hosted solution can scale more easily to handle increasing volumes of traffic, ensuring consistent security and performance.
Ultimately, a firewall for AI need not be a standalone solution but part of a comprehensive data leak prevention platform. Nightfall's Firewall for AI provides tools and capabilities for detecting, classifying, and protecting sensitive data across various SaaS and GenAI applications, as well as email and endpoints. By leveraging Nightfall's Firewall for AI, organizations can safeguard their AI applications against a wide range of security risks, maintaining data integrity and confidentiality.
In summary, Firewalls for AI are an important component of modern data security strategies as organizations increasingly build AI applications. By choosing the appropriate deployment method and leveraging advanced platforms like Nightfall, developers can ensure robust protection for their AI models. While AI Firewalls provide one crucial layer of defense against risks inherent to AI model consumption, they are not a catch-all solution for all potential AI-related risks. A comprehensive AI security strategy should incorporate additional safeguards and best practices throughout the entire AI lifecycle, from model development and training to deployment and monitoring. In subsequent articles, we will delve deeper into these techniques.