Redacting Sensitive Data in 4 Lines of Code with Nightfall Data Loss Prevention (DLP) API

by isaacmadan Published Nov 19, 2021

In this tutorial, we’ll demonstrate how easy it is to redact sensitive data. We’ll take a more in-depth look at several redaction techniques, explain how Nightfall’s data loss prevention (DLP) API works, and touch on use cases for each technique.

Before we get started, let’s set our Nightfall API key as an environment variable and install our dependencies for our code samples in Python. If you don’t have a Nightfall API key, generate one on your Nightfall Dashboard. If you don’t have a Nightfall account, sign up for a free Nightfall Developer Platform account.
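
Before running the samples, it can help to confirm that the key is actually visible to Python. The check below is a minimal sketch: require_api_key is a hypothetical helper, not part of the Nightfall SDK, and the only thing it assumes is the NIGHTFALL_API_KEY variable name that the SDK reads by default.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ,
                    name: str = "NIGHTFALL_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the samples")
    return key


# Demonstrated with an explicit dictionary so the example does not depend on your shell.
print(require_api_key({"NIGHTFALL_API_KEY": "example-key"}))
```

In your own script you would call require_api_key() with no arguments before constructing Nightfall().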

Masking

Mask sensitive data with a configurable character. You can optionally leave some characters unmasked and exclude certain characters from masking.

Examples with Nightfall DLP API

| Case | Additional config | Before | After |
| --- | --- | --- | --- |
| Default | None | my ssn is 518-45-7708 | my ssn is *********** |
| Mask with custom character | masking_char="#" | my ssn is 518-45-7708 | my ssn is ########### |
| Leave first four characters unmasked | num_chars_to_leave_unmasked=4 | my ssn is 518-45-7708 | my ssn is 518-******* |
| Leave last four characters unmasked | num_chars_to_leave_unmasked=4, mask_right_to_left=True | my ssn is 518-45-7708 | my ssn is *******7708 |
| Don’t mask - characters | chars_to_ignore=["-"] | my ssn is 518-45-7708 | my ssn is ***-**-**** |
| All of the above! | masking_char="#", num_chars_to_leave_unmasked=4, mask_right_to_left=True, chars_to_ignore=["-"] | my ssn is 518-45-7708 | my ssn is ###-##-7708 |
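
To make the masking options concrete, here is a small local sketch of the behavior the table describes. It is an illustration only and not Nightfall’s implementation: mask_finding is a hypothetical function, it operates on the finding alone (not the surrounding text), and it assumes that ignored characters do not count toward the number of characters left unmasked, which the examples above do not settle either way.

```python
from typing import Iterable


def mask_finding(value: str,
                 masking_char: str = "*",
                 num_chars_to_leave_unmasked: int = 0,
                 mask_right_to_left: bool = False,
                 chars_to_ignore: Iterable[str] = ()) -> str:
    """Mask a finding, optionally keeping some characters at one end visible."""
    ignore = set(chars_to_ignore)
    chars = list(value)
    # Walk from the end whose characters should stay visible.
    if mask_right_to_left:
        order = range(len(chars) - 1, -1, -1)
    else:
        order = range(len(chars))
    kept = 0
    for i in order:
        if chars[i] in ignore:
            continue  # assumption: ignored characters are skipped and not counted
        if kept < num_chars_to_leave_unmasked:
            kept += 1  # leave this character visible
            continue
        chars[i] = masking_char
    return "".join(chars)


ssn = "518-45-7708"
print(mask_finding(ssn))
print(mask_finding(ssn, num_chars_to_leave_unmasked=4))
print(mask_finding(ssn, masking_char="#", num_chars_to_leave_unmasked=4,
                   mask_right_to_left=True, chars_to_ignore=["-"]))
```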

Let’s put this together in Python with the Nightfall SDK. In our example, we have an input string with a credit card number (4916-6734-7572-5015 is my credit card number). We want to mask it with asterisks, leave the last 4 digits unmasked, and leave the hyphens in place.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, MaskConfig, Nightfall

nightfall = Nightfall()  # reads API key from NIGHTFALL_API_KEY environment variable by default
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule(
        [Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                mask_config=MaskConfig(
                    masking_char="*",
                    num_chars_to_leave_unmasked=4,
                    mask_right_to_left=True,
                    chars_to_ignore=["-"]))
        )]
    )]
)
print(findings)
print(redacted_payload)

We’ll see our findings look like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='****-****-****-5015', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

We’ve also received the input payload back as a redacted string in our redacted_payload object:

['****-****-****-5015 is my credit card number']
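
The findings structure above is a list of lists, which suggests one inner list per payload item. As a hedged sketch of how you might log findings without echoing the raw value, the code below uses small stand-in classes (SampleRange and SampleFinding are hypothetical and only mirror the field names shown in the output) rather than the SDK’s own objects. It also assumes codepoint_range indexes into the Python string with an exclusive end, which matches the 0 to 19 range for the 19-character finding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SampleRange:
    start: int
    end: int


@dataclass
class SampleFinding:
    finding: str
    redacted_finding: str
    detector_name: str
    codepoint_range: SampleRange


def safe_summary(findings: List[List[SampleFinding]]) -> List[str]:
    """Describe each finding using only its redacted form."""
    lines = []
    for payload_index, payload_findings in enumerate(findings):
        for f in payload_findings:
            r = f.codepoint_range
            lines.append(
                f"payload[{payload_index}] chars {r.start}-{r.end}: "
                f"{f.detector_name} -> {f.redacted_finding}"
            )
    return lines


payload = ["4916-6734-7572-5015 is my credit card number"]
sample = [[SampleFinding("4916-6734-7572-5015", "****-****-****-5015",
                         "Credit Card Number", SampleRange(0, 19))]]

for line in safe_summary(sample):
    print(line)

# The range locates the finding inside the original payload string.
r = sample[0][0].codepoint_range
print(payload[0][r.start:r.end] == sample[0][0].finding)
```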

When to use Masking

Masking is especially useful in scenarios where you want to retain some of the original format of the data or a certain amount of non-sensitive information as context. For example, it’s common to refer to credit card numbers by their last 4 digits, so masking everything but the last 4 digits would ensure that the output is still useful to the viewer.

Substitution

Replace sensitive data findings with the InfoType (the name of the detector), a custom phrase, or an empty string.

Examples with Nightfall DLP API

  • Default case (“my email is sam@nightfall.ai.” → “my email is .”)
  • Custom phrase “[REDACTED BY NIGHTFALL]” (“my email is sam@nightfall.ai.” → “my email is [REDACTED BY NIGHTFALL].”)
  • Substitute with InfoType (“my email is sam@nightfall.ai.” → “my email is [EMAIL].”)

Let’s try this in Python with the same credit card payload as before, this time replacing the finding with a custom phrase.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                substitution_phrase="SubMeIn")
        )]
    )]
)
print(findings)
print(redacted_payload)

We’ll see that the returned findings object looks like this (with line formatting added for clarity):

[[Finding(
  finding='4916-6734-7572-5015', 
  redacted_finding='SubMeIn', 
  before_context=None, 
  after_context=None, 
  detector_name='Credit Card Number', 
  detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
  confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
  byte_range=Range(start=0, end=19), 
  codepoint_range=Range(start=0, end=19), 
  matched_detection_rule_uuids=[], 
  matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object:

['SubMeIn is my credit card number']

Instead of using a custom string as the substitution (SubMeIn), we may want to use the name of the detector for additional context. We can make a one-line change to the example above, replacing substitution_phrase="SubMeIn" with infotype_substitution=True.

This yields:

['[CREDIT_CARD_NUMBER] is my credit card number']
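
The three substitution outcomes can be reproduced locally to see exactly what changes in the string. This is a sketch for illustration, not the API’s implementation: substitute_finding is a hypothetical function that takes the finding’s start and end positions (end exclusive, as in the ranges shown in the findings output) and builds the replacement the way the examples above suggest.

```python
from typing import Optional


def substitute_finding(text: str, start: int, end: int,
                       substitution_phrase: str = "",
                       infotype: Optional[str] = None) -> str:
    """Replace text[start:end] with a bracketed InfoType, a phrase, or nothing."""
    replacement = f"[{infotype}]" if infotype else substitution_phrase
    return text[:start] + replacement + text[end:]


text = "4916-6734-7572-5015 is my credit card number"
print(substitute_finding(text, 0, 19, substitution_phrase="SubMeIn"))
print(substitute_finding(text, 0, 19, infotype="CREDIT_CARD_NUMBER"))
print(substitute_finding(text, 0, 19))  # default: the finding is replaced by an empty string
```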

When to use Substitution

Substitution is effective in scenarios where you intend to replace sensitive data with a contextual label. For example, you may wish to replace a literal credit card number with the label “Credit Card Number”. This tells the reader that the data is a credit card number without exposing the actual value.

Encryption

Encrypt sensitive data findings with a public encryption key that you pass via the API.

Encryption is a complex topic, so we’ll go into a more in-depth tutorial on encrypting and decrypting sensitive data with Nightfall in a separate post, but let’s run through the basics below.

Nightfall uses RSA encryption, which is asymmetric, meaning it works with two different keys: a public one and a private one. Anyone with your public key can encrypt data, but encrypted data can only be decrypted with the private key. So, you’ll pass Nightfall your public key to encrypt with, and only you will hold the private key needed to decrypt the encrypted data.

Example with Nightfall DLP API

  • Default case public_key=”MIG…AQAB” (“my ssn is 518-45-7708” → “my ssn is EhOp/DphEIA0LQd4q1BUq8FtuxKj66VA381Z9DtbiQaaHvy5Wlvtxg0je91DFXEJncOWbhgPbt7EvBl36k5MFlFdPbc5+bg40FxP676SnllEClEO+DDsuiRCk9VC4noAd0zLxgvV8qD/NPE/XhTfOpscqlKhllfTg7G5jZYYSG8=”)

For our example, we’ll use the cryptography package in Python, so let’s install it first:

pip3 install cryptography

Let’s first generate a public/private RSA key pair in PEM format on the command line. We’ll cover how to generate keys programmatically in Python in our encryption-specific tutorial.

First, we’ll generate our private key and write it to a file called example_private.pem:

openssl genrsa -out example_private.pem 2048

Next, we’ll generate our public key in PEM format from this private key:

openssl rsa -in example_private.pem -outform PEM -pubout -out example_public.pem

Let’s take a look at our public key with cat example_public.pem:

-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAnszkbHNclOhYgEc1lMPn
6KLm3cXS+w2CRBSEC5HFlqOUmdcXWnBFa9tlJYvXhQYuMFhXBcjUYgVUSAftK703
oTFMwRGZNnBjcUnNSK+pD4iaCEmdskkSA85GFCPsO1yrcfJp4965c43FrgWqyo7A
Aka5sGW9gX2wibQpQhil9TS0vtWHvEOq1TZnFAJD/DEJFN7zIQhglA/53Vd5PEL9
8fSfXxzbtu68wwhRtRqTaVRjzslx6i2Xs/QWcS/sWnKhnuF/enjlcll+SLyDEoPO
6iGp8MpHkZzJHmjATQJBA1vyu+mqo+G3wWm7WPME6V83VBNfG4wdkZCx/n9N5KzH
yQIDAQAB
-----END PUBLIC KEY-----

Remember to keep your private key safe. Anyone with this key can decrypt your encrypted data.
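
The split between the two keys can be tried locally before involving the API. The sketch below is an illustration under stated assumptions: it generates a throwaway key pair in memory instead of reading the PEM files, the helper names are hypothetical, and the OAEP padding with SHA-256 is a choice made for this example only, not a statement about how Nightfall encrypts findings.

```python
import base64

from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Assumption for this local demo only: OAEP padding with SHA-256.
OAEP = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)


def encrypt_with_public_key(public_key, plaintext: str) -> str:
    """Encrypt with the public key and return base64 text."""
    ciphertext = public_key.encrypt(plaintext.encode("utf-8"), OAEP)
    return base64.b64encode(ciphertext).decode("ascii")


def decrypt_with_private_key(private_key, token: str) -> str:
    """Decrypt base64 text with the matching private key."""
    plaintext = private_key.decrypt(base64.b64decode(token), OAEP)
    return plaintext.decode("utf-8")


# Throwaway 2048-bit key pair, the same size as the openssl example.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# The public half serialized as a PEM string, safe to share.
public_pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
).decode("utf-8")

token = encrypt_with_public_key(public_key, "518-45-7708")
print(public_pem.splitlines()[0])
print(len(token))  # base64 length of a 256-byte RSA-2048 ciphertext
print(decrypt_with_private_key(private_key, token))
```

Because OAEP padding is randomized, encrypting the same value twice produces different ciphertexts, so encrypted output should not be compared for equality.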

Now we can use our public key to encrypt any content with Nightfall! To do so, we’ll first read the public key into a string.

from nightfall import Confidence, DetectionRule, Detector, RedactionConfig, Nightfall
from cryptography.hazmat.primitives import serialization

# Load the PEM-encoded public key we generated above
with open("example_public.pem", "rb") as key_file:
    public_key = serialization.load_pem_public_key(key_file.read())

# Serialize the key back to PEM bytes in SubjectPublicKeyInfo format
pem = public_key.public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)

# Decode to a string so it can be passed in the redaction configuration
pem_str = pem.decode("utf-8")

Now, we’ll pass the public key into our redaction configuration, similar to the above examples, so Nightfall can use it to encrypt your sensitive data.

nightfall = Nightfall()
payload = ["4916-6734-7572-5015 is my credit card number"]
findings, redacted_payload = nightfall.scan_text(
    payload,
    [DetectionRule([
        Detector(
            min_confidence=Confidence.LIKELY,
            nightfall_detector="CREDIT_CARD_NUMBER",
            display_name="Credit Card Number",
            redaction_config=RedactionConfig(
                remove_finding=False,
                public_key=pem_str)
        )]
    )]
)
print(findings)
print(redacted_payload)

We’ll see our findings look like this (with line formatting added for clarity):

[[
  Finding(
    finding='4916-6734-7572-5015', 
    redacted_finding='ar4PGD1T3yCBjBdgJ+iX2Ak3hZYIyaaKcRY+AcNS3RjsGnss9hUA9Q0ycLtBOaMjFMeTdCupCEPNUFVYyzeWhHmL009DwWshV47Vkm84zB5O6HroJHAG0JpKHb6bLL58hAb9FHZ73usU4bI67ZEtJhX41HovlOfSCaeUnH4y3pPqRnh7d5roX7EIYQ39wzPGGo2TNbeyqm2pluC1G4Mqt9hLqy0tCwfbmKPXro41i9i1xED9GkVcnxTu0gS8bCMFkvAK4S+Hw0K/gqPq0hu2JGoryKo335IYBCit6S39JESJdNh7IafuE6mrmvYMlR9l4c60VkowEMZAPkUjOelPDw==', 
    before_context=None, 
    after_context=None, 
    detector_name='Credit Card Number', 
    detector_uuid='74c1815e-c0c3-4df5-8b1e-6cf98864a454', 
    confidence=<Confidence.VERY_LIKELY: 'VERY_LIKELY'>, 
    byte_range=Range(start=0, end=19), 
    codepoint_range=Range(start=0, end=19), 
    matched_detection_rule_uuids=[], 
    matched_detection_rules=['Inline Detection Rule #1'])
]]

And our redacted input payload in our redacted_payload object (truncated for clarity):

['GpcjUg74...BpQHw== is my credit card number']

When to use Encryption

Encryption is well-suited for use cases where you want to preserve the original sensitive data but ensure that it is only visible to sanctioned parties that hold your private key. For example, if you are storing the data or passing it to a sanctioned third party for processing, encrypting the sensitive tokens adds a layer of security while still allowing a downstream processor with the key to access the raw data as required.

Congrats! You’ve now learned about and implemented multiple redaction techniques in just a few lines of code. You’re ready to start adding redaction into your apps.
