This post can also be found in our developer documentation at: https://docs.nightfall.ai/docs/content-filtering-sensitive-data-chatgpt
You can also see a live demonstration of this functionality in the Nightfall Playground: https://playground.nightfall.ai/filtergpt
The Data Sprawl Problem with Generative AI
Advancements in AI have led to the creation of generative AI systems like ChatGPT, which can generate human-like responses to text-based inputs. However, these inputs are at the discretion of the user and they aren’t automatically filtered for sensitive data.
This means that these systems can also be used to generate content from sensitive data, such as medical records, financial information, or personal details. In these cases, content filtering is crucial to prevent the unauthorized disclosure of sensitive data.
Similarly, content filtering is essential for ensuring compliance with data privacy laws and regulations. These laws require companies to protect sensitive data and prevent its unauthorized disclosure.
For example, consider a few real-world scenarios:
- You are using OpenAI to help debug code or for code completion. If the code you input contains an API key or other hardcoded secret, that secret will be transmitted to OpenAI.
- You are using OpenAI to help customer service agents respond to customer inquiries and troubleshoot issues. Support tickets often contain customers’ sensitive PII, such as credit card numbers and Social Security numbers. That data may get transmitted by your service agents to OpenAI.
- You are using OpenAI to moderate content sent by patients or doctors in a health app you are building. These queries may contain sensitive protected health information (PHI) that gets transmitted unnecessarily to OpenAI.
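To make the first scenario concrete, here is a hypothetical sketch of how a secret can ride along inside a debugging prompt. The code, the key, and the endpoint are all invented for illustration:

```python
# Hypothetical snippet a developer might paste into a ChatGPT prompt while
# asking for debugging help. The hardcoded API key travels with the prompt.
buggy_code = '''
import requests

API_KEY = "sk_live_4eC39HqLyjWDarjtT1zdp7dc"  # hypothetical secret

def charge_customer(amount):
    resp = requests.post(
        "https://api.example.com/v1/charges",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"amount": amount},
    )
    return resp.json()
'''

prompt = "Why does this function return a 401 error?\n\n" + buggy_code
# The entire prompt, secret included, is what gets sent to OpenAI.
```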
Content filtering can be used to remove any sensitive data before it is processed by the AI system, ensuring that only the necessary information is used to generate content. This prevents sensitive data sprawl to AI systems.
In this guide, we will walk through an example of how to add content filtering to a service that uses an OpenAI GPT model through its APIs.
Standard Pattern for Using OpenAI Model APIs
A typical pattern for leveraging GPT is as follows:
- Get an API key and set environment variables
- Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request
- Construct your prompt and decide which endpoint and model is most applicable
- Send the request to OpenAI
Let's look at a simple example in Python. We’ll ask a GPT model for an auto-generated response we can send to a customer who is asking our customer support team about an issue with their payment method. Note how easy it is to send sensitive data, in this case a credit card number, to ChatGPT.
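The sketch below uses the OpenAI Python SDK's v1-style client. The model name and ticket contents are illustrative, and the card number is a fake test value:

```python
import os

# A support ticket containing a customer's (fake) credit card number.
ticket = (
    "Hi, my payment isn't going through. My card number is "
    "4916-6734-7572-5015 -- can you check what's wrong?"
)

prompt = "Draft a polite reply to this customer support ticket:\n\n" + ticket

# The raw card number is embedded in the outgoing prompt. The API call only
# runs when credentials are available.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # OpenAI Python SDK (v1-style client)

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
```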
🚨 This is a risky practice because now we are sending sensitive customer information to OpenAI. Next, let’s explore how we can prevent this while still getting the full benefit of using ChatGPT.
Adding Content Filtering to the Pattern
You can use Nightfall’s Firewall for AI to check for sensitive findings and ensure sensitive data isn’t sent out. Here’s how:
Step 1: Setup Nightfall
Get an API key for Nightfall and set environment variables. Learn more about creating a Nightfall API key here. In this example, we’ll use the Nightfall Python SDK.
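As a minimal setup sketch, assuming the Nightfall Python SDK is installed (`pip install nightfall`) and the key lives in the `NIGHTFALL_API_KEY` environment variable:

```python
import os

# Read the Nightfall API key from the environment.
api_key = os.environ.get("NIGHTFALL_API_KEY")

# Only initialize the client when credentials are available.
if api_key:
    from nightfall import Nightfall  # Nightfall Python SDK

    nightfall = Nightfall(api_key)  # client used for all scan requests
```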
Step 2: Configure Detection
Create a pre-configured detection rule in the Nightfall dashboard, or define an inline detection rule with the Nightfall API or SDK client.
💡 Consider using Redaction
Note that if you specify a redaction config, you can automatically get de-identified data back, including a reconstructed, redacted copy of your original payload. Learn more about redaction here.
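An inline detection rule with a redaction config might look like the following sketch. The class and parameter names follow the Nightfall Python SDK's documented interface at the time of writing; verify them against the current SDK docs:

```python
import os

# Only construct the rule when the SDK and credentials are available.
if os.environ.get("NIGHTFALL_API_KEY"):
    from nightfall import (
        Confidence, DetectionRule, Detector, LogicalOp,
        MaskConfig, RedactionConfig,
    )

    detection_rule = DetectionRule(
        [
            Detector(
                min_confidence=Confidence.LIKELY,
                nightfall_detector="CREDIT_CARD_NUMBER",
                # Mask findings, leaving the last 4 characters visible.
                redaction_config=RedactionConfig(
                    remove_finding=False,
                    mask_config=MaskConfig(
                        masking_char="X",
                        num_chars_to_leave_unmasked=4,
                    ),
                ),
            )
        ],
        LogicalOp.ANY,
    )
```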
Step 3: Classify, Redact, Filter
Send your outgoing prompt text in a request payload to the Nightfall API text scan endpoint.
The Nightfall API will respond with any sensitive findings it detected and, if you specified a redaction config, a redacted copy of your original payload.
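Here is a hedged sketch of that round trip. The payload is invented for illustration, the `DETECTION_RULE_UUID` environment variable is hypothetical, and the exact return shape of `scan_text` should be checked against the Nightfall SDK docs:

```python
import os

# Illustrative only: a support message containing a fake credit card number.
customer_message = (
    "The customer said: My card 4916-6734-7572-5015 was declined twice."
)

if os.environ.get("NIGHTFALL_API_KEY"):
    from nightfall import Nightfall

    nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])

    # scan_text takes a list of payloads. With a redaction config on the
    # rule, it returns findings plus a redacted copy of each payload.
    findings, redacted_payloads = nightfall.scan_text(
        [customer_message],
        # UUID of a pre-configured detection rule (hypothetical env var).
        detection_rule_uuids=[os.environ.get("DETECTION_RULE_UUID")],
    )

    # The redacted copy might look like (exact masking depends on the rule):
    # "The customer said: My card XXXXXXXXXXXXXX5015 was declined twice."
    print(redacted_payloads[0])
```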
Send Redacted Prompt to OpenAI
- Review the response to see if Nightfall has returned sensitive findings
- If there are sensitive findings:
- You can choose to specify a redaction config in your request so that sensitive findings are redacted automatically
- Without a redaction config, you can simply break out of the conditional statement, throw an exception, etc.
- If there are no sensitive findings, or you chose to redact findings with a redaction config:
- Initialize the OpenAI SDK client (e.g. OpenAI Python client), or use the API directly to construct a request
- Construct your outgoing prompt
- If you specified a redaction config and want to replace raw sensitive findings with redacted ones, use the redacted payload that Nightfall returns to you
- Use the OpenAI API or SDK client to send the prompt to the AI model
Python Example
Let's take a look at what this would look like in a Python example using the OpenAI and Nightfall Python SDKs:
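Below is one way this could look end to end. Treat it as a sketch, not a verbatim implementation: the SDK class names, the `scan_text` return shape, and the OpenAI model name are assumptions to verify against current documentation.

```python
import os

# The outgoing prompt embeds a support ticket with a fake card number.
prompt_template = "Draft a polite reply to this customer support ticket:\n\n{ticket}"
ticket = (
    "Hi, my payment isn't going through. My card number is "
    "4916-6734-7572-5015 -- can you check what's wrong?"
)
outgoing_prompt = prompt_template.format(ticket=ticket)

# The full flow only runs when both sets of credentials are available.
if os.environ.get("NIGHTFALL_API_KEY") and os.environ.get("OPENAI_API_KEY"):
    from nightfall import (
        Confidence, DetectionRule, Detector, LogicalOp,
        MaskConfig, Nightfall, RedactionConfig,
    )
    from openai import OpenAI

    nightfall = Nightfall(os.environ["NIGHTFALL_API_KEY"])

    # Inline detection rule: find credit card numbers and mask them,
    # leaving the last 4 characters visible.
    detection_rule = DetectionRule(
        [
            Detector(
                min_confidence=Confidence.LIKELY,
                nightfall_detector="CREDIT_CARD_NUMBER",
                redaction_config=RedactionConfig(
                    remove_finding=False,
                    mask_config=MaskConfig(
                        masking_char="X",
                        num_chars_to_leave_unmasked=4,
                    ),
                ),
            )
        ],
        LogicalOp.ANY,
    )

    # Classify and redact the outgoing prompt before it leaves.
    findings, redacted_payloads = nightfall.scan_text(
        [outgoing_prompt], detection_rules=[detection_rule]
    )

    # Prefer the redacted copy whenever Nightfall found anything.
    if findings and findings[0]:
        safe_prompt = redacted_payloads[0]
    else:
        safe_prompt = outgoing_prompt

    # Only the redacted prompt is ever sent to OpenAI.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": safe_prompt}],
    )
    print("Original:", outgoing_prompt)
    print("Sent:    ", safe_prompt)
    print("Response:", response.choices[0].message.content)
```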
When you run this, the console output shows the original message, the redacted message that was actually sent, and the model’s response.
Safely Leveraging Generative AI
You'll see that the message we originally intended to send contained sensitive data. The message we ultimately sent, and the only one OpenAI received, was redacted.
OpenAI sends us the same response either way because it doesn’t need to receive the sensitive data to generate a cogent response. This means we were able to leverage ChatGPT just as easily but we didn’t risk sending OpenAI any unnecessary sensitive data. Now you are one step closer to leveraging generative AI safely in an enterprise setting.