

The Essential Guide to Secrets Scanning

by Michael Osakwe Published Nov 02, 2022

In today’s digital world, data breaches are becoming more and more common. In fact, recent studies found that a large majority of breaches are caused by stolen secrets & credentials, such as API keys. API keys are used to access data and resources from another application or service. They are typically used to connect two applications so that they can share data. For example, if you use a weather app on your phone, that app likely uses an API key to access the Weather Channel’s data.

While API keys are a necessary part of modern software development, they can also be a major security risk. If an attacker is able to steal an API key, they can gain access to the data and resources that key is meant to protect. There are a number of steps you can take to protect your API keys, secrets, and credentials and prevent them from being stolen. One such method that we will cover in depth in this guide is secret scanning.

What is Secret Scanning?

Secret scanning is the practice of automatically scanning text and files for secrets, such as passwords or API keys. Most commonly, secret scanning applies to code: scanning the code you’re pushing to GitHub for secrets, typically by enforcing policies in the CI/CD pipeline. Even if you think your repository is private, there’s always a chance that someone could gain access to it, either by compromising authorized accounts or by otherwise bypassing your repos’ security.

But I don’t have secrets in my code

While scanning code is one important way to find secrets, the reality is that secrets sprawl far beyond code repositories, so it’s important to have an approach that can find them in the other places they most commonly end up. Consider the following scenarios. Do any of them apply to your organization? If so, they can be addressed with a secret scanning approach.

  • Passwords shared in Slack. This has caused hacks like 2020’s Twitter breach.
  • Engineers paste credentials into Jira tickets while logging issues and bugs.
  • CSVs of password manager exports stored in Google Drive.
  • And more…

How big is the risk?

The risk posed by secrets can be hard to quantify because secrets refer to a wide variety of items, including:

  • Passwords
  • API keys
  • Encryption keys (SSH, PGP, etc.)
  • Certificates (SSL, TLS, etc.)
  • Authentication tokens

However, some preliminary assessments indicate that managing secrets and credentials can be a difficult problem. Studies like Verizon’s 2022 Data Breach Investigations Report reveal that nearly half of all attacks result from insecure credentials, while other studies have found that nearly 40% of businesses experienced an API key-related security incident in the last year.

Where should I scan for secrets?

As we established, secrets and credentials can live in a variety of places, because large or distributed organizations have many independent collaborators who may or may not follow security policies and best practices. Below are some of the places where you can start searching for secrets and credentials to narrow the scope of your risk.

Scan for secrets in code

The most obvious place to start scanning for secrets is in code. You might want to first do this by conducting a historical scan of code that’s already been committed to your repositories. Secrets can exist:

  • In code itself (i.e., hardcoded)
  • In configuration files on the file system (e.g., files that define environment variables)
  • In containers

You’ll also want to ensure that secrets and credentials won’t continue to be added to your repositories by implementing a code review process and following coding best practices for mitigating secrets exfiltration from repositories.

Scan for secrets across your SDLC tech stack

You’ll also want to scan for secrets and credentials within the applications that your developers use across your organization’s software development lifecycle. This can include things like SaaS applications used in the SDLC, such as:

  • Ticketing systems – Jira, Linear
  • Knowledge management/wikis – Confluence
  • Bug tracking – Sentry, Bugsnag
  • Collaboration/chat – Slack, MS Teams
  • Support – Zendesk

Scan for secrets within observability pipelines

We’ve talked before about the issue of sensitive data appearing within observability platforms. This can happen when stack traces or other issues surface while logging application activity and performance, and it is a pretty common problem within the software industry.

This is a problem that has to be actively addressed, as secrets and other sensitive information can live within logs for years. But where should you start looking for secrets and credentials within data infrastructure?
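
One possible starting point is the application's own logging layer. The sketch below assumes Python's standard logging module; SECRET_PATTERN, MASK, and RedactSecretsFilter are hypothetical names, and the pattern is only an illustration of a prefixed key, not a vendor format. It masks matching strings in a log message before the record is written, so they never reach the observability platform.

```python
import logging
import re

# Illustrative pattern (an assumption): a "sk_" prefix followed by 16 or more
# key-like characters. Real key formats vary by service.
SECRET_PATTERN = re.compile(r"\bsk_[A-Za-z0-9_]{16,}\b")
MASK = "[REDACTED]"


class RedactSecretsFilter(logging.Filter):
    """Mask strings that match SECRET_PATTERN before a record is emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # msg with its %-style args applied
        redacted = SECRET_PATTERN.sub(MASK, message)
        if redacted != message:
            # Store the already-formatted, masked text on the record.
            record.msg = redacted
            record.args = ()
        # Always keep the record; only its text changes.
        # Note: traceback text attached via exc_info is not inspected here.
        return True
```

Attaching the filter to a handler with handler.addFilter(RedactSecretsFilter()) applies it to every record that handler writes. Masking at write time complements scanning but does not replace it, since logs that already exist still need a historical scan.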

How do I know if it’s a secret?

This is the hard part. Secrets are often long, complex, and random, even when you know they are following a pattern. For example, all Stripe secret keys begin with “sk_”. However, a prefix like this might not be enough to distinguish a key or secret from gibberish.

There are three major types of detection methods that are in use today. These include:

  • Regular expressions – Also known as regex, regular expressions search for strings by letting you define the characters and patterns you expect the target text to contain. However, regex struggles to capture the variation in key formats across different services, e.g. AWS or GCP.
  • Entropy – Entropy refers to the amount of complexity or variability in a string of characters (see Shannon Entropy). Setting thresholds for entropy can help build an informed determination about the likelihood that a string is a credential/secret, as opposed to any other piece of information.
  • Machine learning – In the context of secrets, machine learning refers to algorithms trained on features extracted from a broad set of API key patterns and their surrounding context in code. ML can determine whether a character string is a credential/secret based on the context of the finding, without relying on indicators like naming conventions, regexes, or entropy thresholds. With techniques like natural language processing (NLP) and deep learning, naming conventions don’t matter; only meaning does.
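
The first two methods can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production detector: the PREFIXED_KEY pattern, the CANDIDATE token shape, and the ENTROPY_THRESHOLD value are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Regex method. Illustrative pattern only: a "sk_" prefix plus 16 or more
# key-like characters. The length and character set are assumptions.
PREFIXED_KEY = re.compile(r"\bsk_[A-Za-z0-9_]{16,}\b")

# Entropy method. Candidate tokens are long runs of key-like characters.
CANDIDATE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")
ENTROPY_THRESHOLD = 4.0  # bits per character; assumed, tune on your own data


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_secrets(line: str) -> list[tuple[str, str]]:
    """Return (method, match) pairs for one line of text."""
    findings = [("regex", m.group()) for m in PREFIXED_KEY.finditer(line)]
    regex_hits = [match for _, match in findings]
    for m in CANDIDATE.finditer(line):
        token = m.group()
        # Skip tokens the regex already reported.
        if any(hit in token for hit in regex_hits):
            continue
        if shannon_entropy(token) >= ENTROPY_THRESHOLD:
            findings.append(("entropy", token))
    return findings
```

The regex check only finds formats someone has written a pattern for, while the entropy check catches unknown formats at the cost of false positives on hashes, UUIDs, and other random-looking strings. That trade-off is why the two are often combined.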

How do I scan for secrets?

Scanning for secrets and credentials can be done in multiple ways:

  • At-rest scanning, which includes historical scans of:
    • Full repositories
    • All data at rest in SaaS applications
  • Real-time scanning:
    • On each new code push or within the CI/CD process
    • Whenever data is entered or changed in SaaS applications
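
The difference between the two modes can be sketched as follows. This is a hedged illustration: SECRET_PATTERN is an assumed example pattern, and scan_text, scan_at_rest, and scan_diff are hypothetical helper names. The at-rest function walks files that already exist, while the real-time function inspects only the lines a change adds, given as unified diff text.

```python
import re
from pathlib import Path

# Illustrative detector; the pattern is an assumption, not a vendor format.
SECRET_PATTERN = re.compile(r"\bsk_[A-Za-z0-9_]{16,}\b")


def scan_text(text: str) -> list[str]:
    """Return every substring of text that matches SECRET_PATTERN."""
    return SECRET_PATTERN.findall(text)


def scan_at_rest(root: Path) -> dict[str, list[str]]:
    """At-rest scan: read every file under root and report findings by path."""
    findings = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" drops bytes that are not valid UTF-8 (e.g. binaries).
        hits = scan_text(path.read_text(encoding="utf-8", errors="ignore"))
        if hits:
            findings[str(path)] = hits
    return findings


def scan_diff(unified_diff: str) -> list[str]:
    """Real-time scan: check only the lines that a change adds."""
    added = [
        line[1:]
        for line in unified_diff.splitlines()
        # "+" marks an added line; "+++" is the file header, not content.
        if line.startswith("+") and not line.startswith("+++")
    ]
    return scan_text("\n".join(added))
```

Note that scan_at_rest only sees the files currently on disk. A full historical scan of a repository would also need to cover earlier commits, which this sketch does not attempt.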

How can Nightfall help?

Nightfall can accomplish all the methods described above in a single platform, as it is the industry’s first cloud-native data loss prevention solution designed to discover, classify, and protect sensitive data in cloud environments. This includes all manner of secrets (API keys, encryption keys, and more) and credentials like passwords.

Nightfall integrates via API with some of the most popular SaaS applications, like Slack, GitHub, Jira, Confluence, Google Drive, and more. Plus, with the Nightfall developer platform, Nightfall can be integrated anywhere with just a few lines of code. Reach out to us to learn how you can get started in just minutes.

What else can I do beyond Secret Scanning to protect secrets & credentials?

There are a number of security policies and best practices that can help protect your environments from secret exfiltration: 

  • Enforce MFA
    • One way to protect your secrets and credentials is to implement multi-factor authentication (MFA). With MFA in place, even if an attacker manages to steal a user’s credentials, they will not be able to access the account unless they also have possession of the second factor, which could be a one-time password (OTP) generated by an authenticator app or hardware token.
  • Use a password manager
    • Another way to protect your secrets and credentials is to use a password manager. A password manager is a software application that helps users generate strong passwords and securely store them. When using a password manager, users only need to remember one master password—the password manager takes care of the rest. 
  • Use a key management system
    • Store your keys in a secret management service like AWS Secrets Manager or HashiCorp Vault. These services will encrypt your keys and store them in a secure location.
  • Encrypt credentials & secrets
    • Encryption is another great way to protect your secrets and credentials. When data is encrypted, it is turned into a code that can only be decrypted with the proper key. This means that even if someone obtains your data, they will not be able to read it unless they have the key. Encryption is a very effective way to protect data, but it can be time-consuming and expensive to implement.
  • Restrict ACLs to least privilege
    • An access control list (ACL) is a list of permissions that determine who can access what resources. ACLs can be used to restrict access to files, directories, and other resources. ACLs are a great way to control access to sensitive data. They can also be used to limit the amount of damage that can be done if a credential is compromised.
  • Employee training & coaching
    • It’s also important to educate your employees about security best practices. Your employees are one of your organization’s greatest assets—but they can also be one of your greatest vulnerabilities if they’re not properly trained on how to safeguard sensitive information. Make sure your employees know how to identify phishing emails, spot social engineering attacks, and follow proper password hygiene practices. 
  • Develop a password policy
    • Password policies are rules that dictate how passwords must be chosen and used. Password policies can include requirements such as minimum length, maximum length, complexity, and expiration date. Password policies are a great way to ensure that passwords are strong and secure. They can also help to prevent credential stuffing attacks, in which stolen passwords are used to gain access to multiple accounts. 
  • Regular key rotation
    • Change keys regularly so that any exposed secret soon becomes invalid.

For a fuller list of policies and processes that can help keep your organization secure, check out Security Playbook for Remote-first Organizations, which you can read in its entirety for free online.
