Scanning Amazon S3 Buckets with Nightfall Data Loss Prevention (DLP)

by Isaac Madan Published Nov 14, 2021

Tutorial: Learn how to automatically discover and classify sensitive data in your Amazon S3 buckets with Nightfall’s sensitive data scanner to facilitate data loss prevention (DLP).

In this tutorial, we will walk through the end-to-end process of scanning your Amazon S3 buckets for sensitive data with Nightfall’s S3 Sensitive Data Scanner. By the end of this tutorial, you will have an exported spreadsheet report (CSV) of the sensitive data in your S3 buckets. You can then use this report in your data loss prevention (DLP) efforts to remediate/remove sensitive content for better security/privacy or use it as part of your compliance efforts, for example in relation to PCI-DSS.

Background

Organizations store high volumes of business-critical information in Amazon S3, such as personally identifiable information (PII), credit card information, secrets & credentials, and more. Identifying and protecting sensitive data in Amazon S3 is increasingly time-consuming, complex, and expensive, especially as your organization takes on more data.

Data leaks and improper storage in S3 can lead to violations of compliance requirements such as PCI-DSS, HIPAA, and FedRAMP. This tutorial will help you get visibility into the sensitive data that lives in your S3 buckets, which is the first step in building a DLP strategy for S3 and developing a strong security posture in the cloud.

Prerequisites

This tutorial will take about 15 minutes to complete, and in order to complete it you will need access to your AWS Management Console with sufficient permissions to create/edit IAM roles, permission policies, user groups, and if applicable, encryption keys.

You’ll also need a Nightfall account, but don’t worry if you don’t have one yet, as we’ll create one later in the tutorial. You’ll be able to scan up to 3 GB of data per month for free (no credit card required), and can upgrade at any time to the Usage tier to scan an unlimited amount of data, starting at $3 per GB per month (and scaling down with volume).

Create an IAM User

First we will create an IAM user, along with a user group and permission policy, for the Nightfall Sensitive Data Scanner to use to access S3 buckets.

  • Navigate to IAM in the AWS Management Console. Select Users in the left navigation and click Add users
  • In the User name field, enter a name for Nightfall’s user, such as nightfall-scanner
  • Under Select AWS access type, select Access key – Programmatic access and click Next

  • On the Permissions page, select Add user to group and click Create group
  • Give the group a name like nightfall-scanner-group and click Create policy
  • This will take you to the Policy creator in a new tab. Click the JSON tab, and paste the following policy:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:ListAllMyBuckets",
                "s3:ListBucket"
            ],
            "Resource": "*"
        }
    ]
}

It will end up looking like this in the UI:

  • Click Next. Skip the Tags page and click Next: Review
  • Give your policy a name like nightfall-scanner-policy and click Create
  • Now flip back to the Create group modal in your other tab, click the Refresh button, and search for your newly created policy. Check the box next to this policy and click Create group
  • Click Next: Tags and leave this blank, and click Next: Review
  • Confirm that you’ve completed the steps above correctly and click Create user

Now you will be presented with your Access key ID and Secret access key. Download the CSV file with these credentials or otherwise copy them locally. Keep this information safe as it allows for access to your S3 account, per the permissions we just assigned. You will need these credentials to run your S3 scan in a later step.
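The policy we pasted earlier uses "Resource": "*", which grants read access to every bucket in the account. If you would rather limit the scanner to particular buckets, a narrower policy can be generated along these lines. This is a minimal sketch, not something the tutorial requires: the bucket names are hypothetical, and we have not confirmed here how the scanner behaves when it can list buckets it is not allowed to read, so you may also want to add those buckets to the scan's exclusion list.

```python
import json


def build_scanner_policy(bucket_names):
    """Build a read-only S3 policy limited to the given buckets.

    bucket_names is a hypothetical list; replace it with your own buckets.
    """
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    object_arns = [f"{arn}/*" for arn in bucket_arns]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing bucket names is an account-level action, so it
                # cannot be restricted to individual bucket ARNs.
                "Sid": "ListBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListAllMyBuckets"],
                "Resource": "*",
            },
            {
                # Listing the objects in a bucket applies to the bucket ARN.
                "Sid": "ListObjects",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arns,
            },
            {
                # Reading an object applies to object ARNs (bucket/*).
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": object_arns,
            },
        ],
    }


policy = build_scanner_policy(["bucket1", "bucket3"])
print(json.dumps(policy, indent=4))
```

The printed JSON can be pasted into the JSON tab of the policy editor in place of the broader policy.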

Assign Key Access (if needed)

Note: If you don’t have KMS encryption enabled on the files that you wish to scan, you can skip to the next section.

If the files you wish to scan in S3 have server-side encryption enabled with Amazon KMS, you will need to grant this user access to the relevant KMS key so that it can open these files. You can check whether a file has server-side encryption enabled with Amazon KMS by navigating to the file in the S3 file browser in the AWS Management Console.
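If you have many files to check, the same information is available programmatically. The sketch below assumes you have the boto3 library installed and AWS credentials configured locally; it reads the ServerSideEncryption and SSEKMSKeyId fields that S3 returns in an object's metadata. The bucket and key names you pass in are your own, and the sample values printed at the end are made up for illustration.

```python
def describe_encryption(head_response):
    """Summarize encryption settings from an S3 HeadObject response dict."""
    algorithm = head_response.get("ServerSideEncryption")
    return {
        "algorithm": algorithm,
        # KMS-backed encryption is reported with an "aws:kms" prefix.
        "uses_kms": algorithm is not None and algorithm.startswith("aws:kms"),
        "kms_key_id": head_response.get("SSEKMSKeyId"),
    }


def check_object(bucket, key):
    """Fetch an object's metadata with boto3 and report its encryption."""
    import boto3  # imported lazily so describe_encryption works without boto3

    s3 = boto3.client("s3")
    return describe_encryption(s3.head_object(Bucket=bucket, Key=key))


# Sample responses shaped like HeadObject output (values are made up).
print(describe_encryption({
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "arn:aws:kms:us-east-1:111122223333:key/example",
}))
print(describe_encryption({"ServerSideEncryption": "AES256"}))
```

Any object that reports uses_kms as True needs its key shared with the scanner user, as described in the steps that follow.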

To grant access to the key, navigate to KMS in the AWS Management Console.

  • Click Customer managed keys in the left sidebar and select the relevant key from the list
  • Under Key users, click Add and select the IAM user we created above (nightfall-scanner). Nightfall will then be able to scan files encrypted with this key.

Repeat these steps for every key that encrypts files you want Nightfall to scan.

We have now completed the steps required for Nightfall to access and scan the appropriate resources in your AWS account. Next, we will configure Nightfall itself.

Configure Detection

You have the ability to customize Nightfall’s detection engine by inputting your own Nightfall API key. This gives you granular control over what Nightfall detects and is powered by the Nightfall Developer Platform. You can select from our library of high-accuracy, pre-built detectors, or build your own custom detectors.

If you don’t specify an API key, Nightfall will fall back to a default detection rule that detects likely credit card numbers, US social security numbers, and API keys.

How does pricing work? It’s free to get started with the Nightfall Developer Platform and to create your own API key. You’ll be able to scan up to 3 GB of data per month for free, with no credit card required.

On the Free tier, scanning will stop when your API key reaches its 3 GB monthly limit. Add your credit card to upgrade to our Usage tier, which is billed at a starting rate of $3 per GB scanned. You’ll pay only for what you use with your Nightfall API key.
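For a rough sense of what a scan might cost, the arithmetic can be sketched as follows. This only uses the two figures stated above (a 3 GB monthly Free tier limit and a starting rate of $3 per GB) and assumes the starting rate applies to every gigabyte with no volume discount, so treat the result as an estimate at the starting rate rather than a quote.

```python
FREE_TIER_GB = 3.0        # monthly Free tier limit stated above
STARTING_RATE_USD = 3.0   # starting Usage tier rate per GB stated above


def fits_free_tier(gb_to_scan):
    """Return True if the monthly volume stays within the Free tier limit."""
    return gb_to_scan <= FREE_TIER_GB


def usage_cost_at_starting_rate(gb_to_scan):
    """Cost in USD at the starting rate, ignoring any volume discounts."""
    return gb_to_scan * STARTING_RATE_USD


for gb in (2, 50):
    print(gb, fits_free_tier(gb), usage_cost_at_starting_rate(gb))
```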

If you have questions or concerns about costs based on the data volumes you are looking to scan, reach out to us at support@nightfall.ai to discuss our Enterprise plan.

Read more about pricing in our API Docs.

  • First, log in to your Nightfall Dashboard (app.nightfall.ai). If you don’t have an account, sign up for a free account here (app.nightfall.ai/sign-up).
  • Once you’re logged in to the Dashboard, click Detection Rules in the left sidebar.
  • Click New detection rule and give the detection rule a name like My First Detection Rule
  • In this example, we’ll create a detection rule that matches the default detection rule described above
  • Click + Detectors and scroll through the list or type in the search bar to find Credit card number, US Social Security number, and API key – select all three
  • Click Confirm and you should see the three detectors added to your detection rule
  • We will leave the rest of the default settings as they are, since we can always come back to fine-tune our detection rule later. Click Save Detection Rule
  • You’ll now see the detection rule in your list of detection rules. Note down the Detection Rule UUID as we will need this later when configuring our scan and telling Nightfall what detection rule we want to use. If you hover over it, you’ll be able to copy it to your clipboard.

Did you know? The S3 scanner supports multiple detection rules. Create up to 10 detection rules and list them all in your scan settings for more advanced detection configuration.

Create API Key

  • Next, navigate to the Overview tab under the Developer Platform header in the left sidebar.
  • We’ll create our first API key that we’ll use for the scan. Click Create key and give it a name like my-first-key
  • Copy the key locally to a file or environment variable, as you will not be able to see the key again and you will need it when starting your scan
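If you keep the API key and the Detection Rule UUID in environment variables, a small helper can confirm both are present and well formed before you start a scan. This is a minimal sketch: the variable names NIGHTFALL_API_KEY and NIGHTFALL_DETECTION_RULE_UUID are hypothetical choices for this example, not names that Nightfall defines.

```python
import os
import uuid


def load_scan_settings(env=os.environ):
    """Read the API key and detection rule UUID from environment variables.

    The variable names are hypothetical; use whatever names you exported.
    """
    api_key = env.get("NIGHTFALL_API_KEY", "").strip()
    if not api_key:
        raise ValueError("NIGHTFALL_API_KEY is not set")
    # uuid.UUID raises ValueError if the string is not a well-formed UUID.
    rule_uuid = str(uuid.UUID(env.get("NIGHTFALL_DETECTION_RULE_UUID", "")))
    return {"api_key": api_key, "detection_rule_uuid": rule_uuid}


# Example with placeholder values passed in directly instead of os.environ.
settings = load_scan_settings({
    "NIGHTFALL_API_KEY": "placeholder-key",
    "NIGHTFALL_DETECTION_RULE_UUID": "123e4567-e89b-12d3-a456-426614174000",
})
print(settings["detection_rule_uuid"])  # avoid printing the key itself
```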

Start Scan

We’ve now authorized Nightfall to access our S3 buckets and configured our detection settings, so we are ready to start our scan. Navigate to our S3 Sensitive Data Scanner (playground.nightfall.ai/s3).

  • First, enter your AWS Access Key ID and AWS Secret Access Key which you generated earlier in the tutorial.
  • Next, input the AWS Region in which the buckets you wish to scan are located. For example, enter us-east-1 if this is the region you’d like to scan buckets in. If you wish to scan buckets across regions and your buckets are only accessible from a specific region, you’ll need to trigger multiple scans.
  • Next, we will list the buckets we wish to exclude from the scan. By default, the S3 scanner will scan all buckets, so you have the option to specify any buckets that you want to explicitly exclude. For example, let’s say you have the following S3 buckets in your S3 account:
bucket1
bucket2
bucket3
bucket4
bucket5

In our scan, let’s say we know that we do not want to scan buckets 2 and 5, so we will input the following in the text box, one bucket name per line.

bucket2
bucket5
  • Next, input your email address. This is where scan results will be sent as a CSV attachment.
  • Input your Nightfall API key and the Detection Rule UUID corresponding to the Detection Rule we created earlier.
  • All fields should now be filled out, and you’re ready to click Start Scan!

The scan will run in the background and you’ll receive an email with the results when it’s complete.
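To double-check an exclusion list before starting a scan, the logic described above can be mirrored in a few lines. This is only an illustration of the described behavior (scan everything except the listed buckets), not Nightfall’s implementation, and it uses the example bucket names from this tutorial.

```python
def buckets_to_scan(all_buckets, excluded):
    """Return the buckets left after removing excluded ones, keeping order."""
    excluded_set = {name.strip() for name in excluded if name.strip()}
    return [name for name in all_buckets if name not in excluded_set]


all_buckets = ["bucket1", "bucket2", "bucket3", "bucket4", "bucket5"]
exclude_box = "bucket2\nbucket5"  # one bucket per line, as typed in the text box

print(buckets_to_scan(all_buckets, exclude_box.splitlines()))
# prints ['bucket1', 'bucket3', 'bucket4']
```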

Review Results

You’ll receive a spreadsheet (CSV) via email with your scan results when complete. The results will have the following info:

  • ItemID – S3 Object key
  • ParentID – S3 Bucket key
  • ItemType – For S3, will always be Object
  • FewCharsBefore – 5 characters before the sensitive token, to provide context
  • Finding – The sensitive token, with all but the last 4 characters redacted with *
  • FewCharsAfter – 5 characters after the sensitive token, to provide context
  • Detector – Name of the detector triggered, e.g. Credit Card Number
  • StartChar – Character location within the file where the token starts, e.g. 100 means the token starts at the 100th character read in the file
  • EndChar – Character location within the file where the token ends, e.g. 119 means the token ends at the 119th character read in the file
  • Confidence – Confidence level of the detection triggered, will be one of Possible, Likely, or Very Likely
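Once you have the report, a short script can summarize it, for example by counting findings per bucket and per detector above a chosen confidence level. The sketch below assumes the CSV header row uses the column names listed above exactly as written; the sample rows are hypothetical and exist only to make the example runnable.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows using the column names listed above.
SAMPLE_CSV = """ItemID,ParentID,ItemType,FewCharsBefore,Finding,FewCharsAfter,Detector,StartChar,EndChar,Confidence
logs/app.log,bucket1,Object,card=,***************1111,;exp=,Credit Card Number,100,119,Very Likely
exports/users.csv,bucket1,Object,ssn: ,*******6789, name,US Social Security Number,42,53,Likely
notes.txt,bucket3,Object,key: ,************abcd, end.,API Key,7,23,Possible
"""

# Confidence levels from lowest to highest, as listed above.
CONFIDENCE_ORDER = ["Possible", "Likely", "Very Likely"]


def summarize(csv_text, min_confidence="Likely"):
    """Count findings per bucket and per detector at or above a confidence."""
    threshold = CONFIDENCE_ORDER.index(min_confidence)
    per_bucket = Counter()
    per_detector = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if CONFIDENCE_ORDER.index(row["Confidence"]) < threshold:
            continue  # skip findings below the requested confidence
        per_bucket[row["ParentID"]] += 1
        per_detector[row["Detector"]] += 1
    return per_bucket, per_detector


per_bucket, per_detector = summarize(SAMPLE_CSV)
print(dict(per_bucket))
print(dict(per_detector))
```

To run this against a real report, read the emailed file into a string (for example with open(path, newline="").read()) and pass it to summarize in place of SAMPLE_CSV.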

What’s Next?

Now that you’ve run your first scan, you can:

  • Fine-tune your detection rules and add your own custom detectors
  • Identify opportunities to remove, encrypt, or restrict sensitive content
  • Use generated reports for compliance with regimes like PCI-DSS, HIPAA, and GDPR
  • Scan additional AWS accounts, regions, and buckets to cover more of your cloud footprint
  • Run S3 DLP scans on a routine basis, such as monthly
  • Consider scanning other systems with Nightfall. We have similar scanners for apps like Zendesk, Jira, and more here (playground.nightfall.ai/scanners), and we provide complete, native DLP functionality for apps like Slack and Google Drive
