
The ultimate guide to security for AI

by Brian Hutchins, December 22, 2023

How many of us use ChatGPT? And how many of us use SaaS applications as part of our daily workflows? Whether you know it or not, if you use either of these tools, your data has likely traveled beyond the boundaries of your “fort.”

What do I mean by “fort,” exactly? For this guide, consider your “fort” to be anywhere you can monitor and secure your data. When data leaks outside your “fort,” it presents a myriad of risks. Cloud Data Leak Prevention (DLP) tools are useful not only for combating those risks, but also for preventing data leaks in the first place.

Where is your data?

As with any step-function technology, AI presents both opportunities and dangers. For instance, we’ve already seen a number of early hacks involving generative AI (GenAI).

All of these hacks involve data, either in a GenAI model itself or in a model’s training data. A diagram from Lightspeed succinctly covers both the vulnerabilities and the opportunities for protecting GenAI.

For the remainder of this guide, we’ll home in on two very important areas involved in the hacks mentioned above: data extraction and data leakage.

When you interact with any given GenAI model, it’s all too easy to submit a prompt that includes sensitive data. Once you do that, your sensitive data can be sent off to train the model. But that’s not all: GenAI companies can’t possibly label or validate every incoming prompt, so your data might also be sent to annotation platforms. Those annotation platforms may also contract third-party vendors to annotate that data. Either way, once that data is annotated, it may then be entered back into GenAI models. The bottom line? A single data leak can result in up to four new attack surfaces.

These attack surfaces aren’t just limited to GenAI tools; they also apply to SaaS apps. Say you’re using GitHub and accidentally commit a secret like an API key. Alternatively, you’re using Zendesk, and a customer files a request including their credit card information. As both of these apps have GenAI capabilities, it’s possible for sensitive data to make it into models this way, too.

So, how do we prevent data leaks? The short answer is that we must stop them at the source, whether in GenAI prompts or SaaS apps.

How can you tell what data is sensitive and what’s not?

DLP tools can identify and prevent sensitive data sharing via GenAI and SaaS apps—and they can do so without generating false alarms or disrupting workflows. Below are some interesting and challenging use cases that DLP solutions must address.

Credit cards

Take a look at the following two sample prompts that have been submitted to ChatGPT.

  • Sample prompt #1: Please create a polite email I can send to a customer whose credit card was declined. Verify that his VISA card number is correct. The customer’s name is Tony Smith. I have his number as 2235-5978-0999-0987.
  • Sample prompt #2: Please create a polite email that I can send to Tony Smith telling him $254.09 was charged to his VISA ***2345. He can expect delivery on Tuesday, 9-13-2023. The transaction # is 2235-5978-0999-0987.

In the latter prompt, we see a transaction ID formatted just like a credit card number. This presents a challenge for certain DLP solutions, since many may incorrectly flag the transaction ID as PCI data. A false alarm like this would clutter security teams’ inboxes while also blocking legitimate business workflows.

API keys

Disambiguating data gets even more difficult when it comes to API keys. Imagine if the following sample prompts were also submitted to ChatGPT.

  • Prompt #1: Please generate a Stripe code in Python to charge $1000 for a designer backpack. Use my Stripe API key, sk_live_4eC39HqLyjWDarjtTlzdp7dcTYooM54NiTphI7jx.
  • Prompt #2: Rewrite this description of a backpack: “With this Mickey Mouse backpack, you’ll be storing your laptop in style…” <a href = http://us.store.com/123?itemKey=lAv8Qxu6NGdChCgubVc95f1HFnJ0NW_eHD3AagO>

Both prompts include the word “key” near high-entropy words, but only the first prompt includes an active API key. With regex rules only, the latter prompt would generate a false alarm.
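
To see why, here’s a minimal sketch (in Python, with a hypothetical token pattern and threshold) of the regex-plus-entropy style of heuristic many rule-based scanners use. Both the live Stripe key and the URL’s item key are long, high-entropy tokens, so the heuristic flags both:

```python
import math
import re
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    counts = Counter(s)
    total = len(s)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Naive rule: flag any long alphanumeric token with high entropy.
CANDIDATE = re.compile(r"[A-Za-z0-9_]{20,}")
THRESHOLD = 4.0  # bits per character; an illustrative cutoff

for text in [
    "Use my Stripe API key, sk_live_4eC39HqLyjWDarjtTlzdp7dcTYooM54NiTphI7jx.",
    "http://us.store.com/123?itemKey=lAv8Qxu6NGdChCgubVc95f1HFnJ0NW_eHD3AagO",
]:
    for token in CANDIDATE.findall(text):
        flagged = shannon_entropy(token) > THRESHOLD
        print(f"{token[:12]}... entropy={shannon_entropy(token):.2f} flagged={flagged}")
```

Both tokens come back with similarly high entropy, so pattern matching and entropy alone can’t tell a live secret from a random-looking URL parameter.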

PHI

HIPAA defines Protected Health Information (PHI) as any data that uniquely identifies a patient and includes corresponding health conditions or treatments. PHI presents an even tougher problem for DLP models to detect.

  • Sample prompt #1: Patient Anthony Smith (DOB 05/10/1983, address: 123 Westbrook Road in Miami, Florida, age 40), presents with a sustained elevated heart rate. The patient has a past medical history of cancer. Attending physician: Harwood, Andrew MD.
  • Sample prompt #2: I met with Anthony Smith from Harwood Construction about re-paving the parking lot at the Cancer Center at 123 Westbrook Road in Miami, Florida. We aim to begin construction on 5/10/2023.

Our first example includes a patient's name, address, date of birth, and past diagnosis. This information is considered PHI as it uniquely identifies a patient as well as their health condition. Therefore, it is sensitive and should not be shared openly. However, in the second example, the same entities are unrelated to patient health and do not constitute PHI. Many DLP solutions will mistakenly flag both examples as PHI.

Are rule-based approaches enough?

Each of the examples above presents a different—and increasingly difficult—challenge for models to solve. First, let's attempt to solve them using a rule-based approach.

At a glance, some of the most common rule-based approaches might include:

  • Regexes
  • Custom rules
  • Checksums
  • Bags of words
  • Entropy tests
  • Keyword matching

All of these options excel at recognizing patterns. However, they can’t use the surrounding context to understand what a pattern actually means, which tends to lead to false positive findings.

For instance, let’s apply these rules to our credit card example.

  • Sample prompt #1: Please create a polite email that I can send to a customer whose credit card was declined. Verify that his VISA card number is correct. The customer’s name is Tony Smith. I have the number as 2235-5978-0999-0987.
  • Sample prompt #2: Please create a polite email that I can send to Tony Smith telling him $254.09 was charged to his VISA ***2345. He can expect delivery Tuesday 9-13-2023. The transaction # is 2235-5978-0999-0987.

In this case, we can use a regex to search for 16-digit numbers. Additionally, we can apply the Luhn algorithm to check for validity, search for nearby keywords (such as "VISA"), and verify a valid Bank Identification Number (BIN). The 16-digit numbers in both prompts will pass all of these tests. However, the 16-digit number in the second prompt is a transaction ID, which means this rule-based model will raise a false positive alert.
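
As a concrete illustration, here’s a minimal Python sketch of that pipeline: just the 16-digit regex and the Luhn checksum, with the keyword and BIN checks omitted for brevity. Note how the transaction ID from the second prompt sails right through:

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right; if the
    doubled digit exceeds 9, subtract 9; a valid number sums to 0 mod 10."""
    digits = [int(d) for d in number if d.isdigit()]
    checksum = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        checksum += d
    return checksum % 10 == 0

# Matches four groups of four digits, optionally separated by dashes or spaces.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")

prompt = "The transaction # is 2235-5978-0999-0987."
for candidate in CARD_PATTERN.findall(prompt):
    print(candidate, "passes Luhn:", luhn_valid(candidate))  # passes Luhn: True
```

Because this particular ID happens to satisfy the Luhn checksum, every rule in the stack agrees that it’s a card number, and the alert fires anyway.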

This example illustrates the limits of rule-based approaches. We can begin to address those limits with the help of word vectors.

How do word vectors work?

Say that a model is trying to understand the context of a word within a sentence. To “understand” that word, the model translates the word into a data point called a word vector. Word vectors provide another way for models to differentiate true positive findings from false positive findings—however, they’re by no means a perfect solution.

When setting up word vector matching, the first step involves telling your model to search for certain base phrases. For instance, you might use “social security” and “credit card” as your base phrases. Then, when you submit a sentence containing one of those base phrases, or a phrase whose vector sits close to one, your model assigns it a high match score.

Suppose a prompt doesn’t mention a “social security number,” or even an “SSN”—but it does mention the “SSA,” short for the Social Security Administration. Even though that’s not a direct match to the specified base phrase, the model still recognizes the similarity and gives it a high score. The same goes for “debit card”: even though it wasn’t one of our original base phrases, the model would still surface a nearby number as a likely credit card number.

In short, word vectors help us get more accurate responses without needing to constantly update our base phrases. So why don’t we stop here? There’s still plenty of room for our model to improve.
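
Under the hood, this is just vector similarity. The sketch below uses toy, hand-made vectors (a real system would load pretrained embeddings such as GloVe or word2vec) to show how candidates that were never base phrases can still score highly:

```python
import numpy as np

# Hypothetical word vectors for illustration only; real embeddings
# have hundreds of dimensions and are learned from large corpora.
VECTORS = {
    "social security": np.array([0.9, 0.1, 0.0]),
    "credit card":     np.array([0.1, 0.9, 0.1]),
    "ssa":             np.array([0.8, 0.2, 0.1]),
    "debit card":      np.array([0.2, 0.85, 0.1]),
    "transaction":     np.array([0.1, 0.4, 0.8]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means identical direction, 0.0 unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

BASE_PHRASES = ["social security", "credit card"]
for candidate in ["ssa", "debit card", "transaction"]:
    best = max(cosine(VECTORS[candidate], VECTORS[base]) for base in BASE_PHRASES)
    print(f"{candidate!r}: best similarity to a base phrase = {best:.2f}")
```

With these toy vectors, “ssa” and “debit card” score well above “transaction,” mirroring how a vector-based matcher surfaces near-synonyms without explicit rules.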

How do you measure and compare models?

Up until now, we’ve been relying on a primarily rule-based approach. But how can we evaluate our model’s performance so far?

When evaluating any given model, our primary metrics are:

  • Precision: Of everything the model flags, the percentage that are true positives
  • Recall: Of all actual positive samples, the percentage the model catches
  • F-Score: The harmonic mean of precision and recall

Let’s see how these numbers play out when we ask our rule-based model to detect bank routing numbers. Suppose it scores 29% precision and 67% recall.

If we have 29% precision, this means that 71% of the alerts our model raises are false positives. And at 67% recall, 33% of real bank routing numbers go undetected. In short, the cons of our rule-based model significantly outweigh the pros.
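
For reference, here’s how the F-score (specifically F1) falls out of those two figures:

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# The rule-based routing-number detector above: 29% precision, 67% recall.
print(round(f1(0.29, 0.67), 2))  # 0.4
```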

So what’s a better approach? You guessed it: Machine learning (ML).

How can ML help?

Here’s a little experiment: What’s the first thing that comes to mind when you hear the word “bank”? Do you think of the sandy area by a river or lake? Or perhaps a vault in a financial institution? Using ML, a model can capture the meaning of the word “bank” depending on the sentence it’s in.

  • John stood by the “bank” of the river.
  • Jane deposited her check at the “bank.”

One way models can get this valuable context is by using Convolutional Neural Networks (CNNs).

In a CNN, the input layer represents the words, or tokens, of a sentence. Successive convolutional layers then redefine each word based on its proximity to other nearby words. The more layers a CNN has, the better it is at understanding the context of a word within a sentence.
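
For the curious, here’s a minimal sketch of such a model in Python using PyTorch; all of the dimensions and the two-class output are illustrative placeholders. Each convolutional filter slides over a window of neighboring token embeddings, so every token’s representation absorbs its local context:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    """A minimal 1D CNN text classifier."""
    def __init__(self, vocab_size: int = 10_000, embed_dim: int = 64,
                 num_classes: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # kernel_size=3: each filter sees a token plus its two neighbors.
        self.conv = nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(token_ids).transpose(1, 2)  # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))               # contextualized features
        x = x.max(dim=2).values                    # global max pool over tokens
        return self.classifier(x)                  # (batch, num_classes)

logits = TextCNN()(torch.randint(0, 10_000, (1, 12)))  # one 12-token "sentence"
print(logits.shape)  # torch.Size([1, 2])
```

Stacking more convolutional layers widens each token’s effective context window, which is exactly the property described above.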

Time to revisit our previous example—except this time, we’ll use a CNN.

Now, our model is scoring individual words instead of just base phrases. This leads our model to be 95% sure that the first example contains a social security number and 96% sure that the second does not. We also get similar results with the debit card number.

Now that we understand the advantages of CNNs, let’s see if there’s any change to our precision and recall.

We can see that in moving from a primarily rule-based approach to a CNN approach, our precision improved by nearly 60%. Similarly, our recall improved by over 20%. With these results, it’s clear that using ML results in significant improvements across the board. However, we’re not done yet.

How does GenAI fit in?

Transformer models are on the cutting edge of GenAI technology. They offer significant advancements over CNNs, with the main trade-offs being heavier compute requirements and a need for large training datasets.

Neural network models like CNNs may struggle with complex data types, leading to poor performance. In such cases, transformer models, which pair an attention mechanism with those large training datasets, can achieve high performance. Let’s see how this higher performance affects our precision, recall, and F-score.

With another sample rule-based model, our precision is only 7%. If we’re using a neural model that’s analogous to a CNN (an LSTM), we see a slight bump to 31% precision. And lastly, when we switch to a sample large language transformer model, our precision jumps to nearly 90%.

With this unparalleled precision, we can use transformer models to accurately identify complex data types like PHI. As discussed earlier, PHI is a combination of PII and diagnostic information. However, these two entities must be directly related to be considered PHI.

Let’s return to our earlier example:

  • Sample prompt #1: Patient Anthony Smith (DOB 05/10/1983, address: 123 Westbrook Road in Miami, Florida, age 40), presents with a sustained elevated heart rate. The patient has a past medical history of cancer. Attending physician: Harwood, Andrew MD.
  • Sample prompt #2: I met with Anthony Smith from Harwood Construction about re-paving the parking lot at the Cancer Center at 123 Westbrook Road in Miami, Florida. We aim to begin construction on 5/10/2023.

As our transformer model reads through the first prompt, it discovers a name, a date of birth, and an address, satisfying the PII criteria. It then determines that the “cancer” diagnosis is directly related to that patient, which completes the PHI criteria.

On the other hand, our transformer model would also be able to determine that the second prompt is not PHI. Even though there’s a name, a date, an address, and a possible diagnosis (“cancer”) present, the transformer model doesn’t detect a patient-care relationship between these entities and, therefore, doesn’t flag them as PHI.
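
As a rough sketch of how this might look in practice, the snippet below pairs an off-the-shelf NER model from the Hugging Face hub with a deliberately crude relation heuristic. The model name, the condition list, and the “patient” check are all illustrative stand-ins for the learned entity-and-relation modeling a production detector would use:

```python
from transformers import pipeline

# A general-purpose NER model; a real PHI detector would use a model
# fine-tuned on clinical entities and their relationships.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

CONDITION_TERMS = {"cancer", "diabetes", "elevated heart rate"}  # toy list

def looks_like_phi(text: str) -> bool:
    """Toy heuristic: a person entity, a health condition, and patient-care
    framing must all co-occur. Real systems model this relation directly."""
    entities = ner(text)
    has_person = any(e["entity_group"] == "PER" for e in entities)
    has_condition = any(term in text.lower() for term in CONDITION_TERMS)
    patient_context = "patient" in text.lower()
    return has_person and has_condition and patient_context

print(looks_like_phi(
    "Patient Anthony Smith presents with a sustained elevated heart rate. "
    "The patient has a past medical history of cancer."))  # True
print(looks_like_phi(
    "I met with Anthony Smith from Harwood Construction about re-paving "
    "the parking lot at the Cancer Center."))              # False
```

The point isn’t the heuristic itself; it’s that the transformer’s contextual entity recognition gives us reliable building blocks that rules and regexes can’t.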

What are our key takeaways?

If we’re looking to detect complex sensitive data types like social security numbers, credit card numbers, API keys, or even PHI, it’s vital to use a model that can understand the context of important entities within a sentence as well as within a document. Transformer models do this exceedingly well—which is precisely why Nightfall uses them to power our latest generation of detectors. Ultimately, these detectors are vital DLP tools that can help you to secure your “fort” from the threats posed by GenAI and SaaS apps.

Curious to see Nightfall’s AI-powered detectors in action? Schedule a demo today!
