The Essential Guide to Secrets Scanning

Last Updated Dec. 31, 2024

What is Secrets Scanning?

Secret scanning is the practice of automatically scanning text, messaging systems, and files for secrets, both to establish a baseline of secret exposure and to launch a remediation strategy. In an engineering context, secret scanning often applies to code, where well-meaning employees can expose secrets during the development process. One example is scanning the code you push to GitHub by enforcing policies in the CI/CD pipeline.

Even if your code repositories are private rather than public, they're a juicy target for attackers. Threat actors have repeatedly proven ingenious at getting in, so most security teams recognize that unauthorized access is bound to happen at some point, private repo or not.

But I don’t have secrets in my code.

Scanning code is one important way to find secrets, but the reality is that secrets of many types sprawl far beyond coding contexts, so it’s important to have an approach that can find them in the places they most commonly end up. Consider the following scenarios. Do any of them apply to your organization? If so, secret scanning can address them, as we'll cover later.

Passwords shared in Slack. Disney was completely pwned in a 2024 hack that exposed numerous secrets, including passwords employees shared in channels they assumed to be secure because they were private / invite-only.
API keys and tokens pasted into Jira. Engineers log issues and bugs there and often include credentials to help developers execute fixes.
CSVs of password manager exports stored in Google Drive.
GitHub app tokens that grant access to accounts holding source code or other sensitive information.
And more…

How big is the risk?

The risk posed by secrets can be hard to quantify because the term covers credentials for a wide variety of non-human identities (NHIs). However, preliminary assessments indicate that managing secrets and credentials is a difficult problem. Studies like the Verizon Data Breach Investigations Report (DBIR) indicate that 74-80% (depending on the study) of all cyber attacks result from unsecured credentials, while other studies have found that nearly 35% of exposed cloud service API keys are still active.

Depending on which part of your cloud environment an active API key grants access to, the damage from an exposed key can be significant. If the key leads to a cloud storage location containing your organization’s production or staging environments for software you build or manage, for example, one stolen key could set potentially catastrophic events in motion.

Needless to say, exposure of secrets is among the most critical vulnerabilities facing security teams today, especially because secrets tend to be shared in locations where standard data loss prevention (DLP) tools provide no visibility. Even worse, most DLP and secret scanning tools struggle to identify secrets with any degree of accuracy.

Where should I scan for secrets?

Secrets and credentials can live in a variety of places due to the nature of managing a large or distributed organization with a number of independent collaborators who may or may not be following security policies and best practices. The truth is most employees are doing their best to prevent accidental exposure and simply need more real-time training in how to protect your organization's security posture.

Below are some of the places where you can start searching for secrets and credentials to narrow the scope of your risk.

Scan for secrets in code.

The most obvious place to start scanning for secrets is in code. You might want to first do this by conducting a historical scan of code that’s already been committed to your repositories. Secrets can exist:

In code itself (i.e., hardcoded secrets)
In configuration files on the file system (e.g., files that set environment variables)
In containers
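As a rough illustration of a historical scan over files on disk, the Python sketch below walks a directory tree and flags lines where a credential-like name is assigned a quoted literal. The HARDCODED pattern, the function names, and the list of file suffixes are all assumptions made for this example. A real scanner would also need to cover Git history and container images, and would use far richer detection.

```python
import re
from pathlib import Path

# Hypothetical rule: a credential-like name assigned a quoted literal of 8+ characters.
HARDCODED = re.compile(
    r'''(?i)(password|passwd|secret|api[_-]?key|token)\w*\s*[:=]\s*["']([^"']{8,})["']'''
)


def scan_text(text):
    """Return (line_number, matched_keyword) for each suspicious assignment.

    The secret value itself is deliberately left out of the findings.
    """
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = HARDCODED.search(line)
        if match:
            findings.append((number, match.group(1)))
    return findings


def scan_tree(root, suffixes=(".py", ".env", ".yaml", ".yml", ".json")):
    """Scan every file under root whose name ends with one of the suffixes."""
    results = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name.endswith(suffixes):
            findings = scan_text(path.read_text(errors="ignore"))
            if findings:
                results[str(path)] = findings
    return results
```

Calling scan_tree(".") from the root of a checked-out repository would return a dictionary that maps each flagged file to its findings. Expect false positives and misses from a single pattern like this one.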

You’ll also want to ensure that secrets and credentials don’t continue to be added to your repositories by implementing a code review process and following coding best practices that keep secrets out of repositories.

Scan for secrets across your SDLC tech stack.

You’ll also want to scan for secrets and credentials within the applications your developers use across your organization’s software development lifecycle (SDLC). These include SaaS applications such as:

Ticketing systems - Jira, Linear
Knowledge management/wikis - Confluence
Bug tracking - Sentry, Bugsnag
Collaboration/chat - Slack, MS Teams
Support - Zendesk

Scan for secrets within observability pipelines.

We’ve talked before about the issue of sensitive data appearing within observability platforms. This can happen when stack traces or other diagnostic output, captured while logging application activity and performance, include sensitive values. It is a pretty common issue within the software industry:

Meta was fined $101 million in 2024 for admittedly storing hundreds of millions of users' passwords in plaintext and allowing broad access to employees.
Incorrectly configured Firebase instances led to the exposure of nearly 19 million passwords in plaintext.
More than 1 million active Google API keys, 250,000 Google Cloud credentials, and 140,000 AWS authentication secrets were verified as circulating on GitHub.

This is a problem that has to be actively addressed, as secrets and other sensitive information can live within logs for years. Attackers with access to secrets represent more than just "potential security threats". But where should you start looking for secrets and credentials within data infrastructure?

Observability & logging - Datadog, Fluent Bit, Splunk, Cribl
Data stores - Amazon RDS, MongoDB, Kafka
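One way to reduce what reaches a logging pipeline in the first place is to redact secret-looking values before a log record is emitted. The Python sketch below does this with a standard-library logging filter. The two patterns in REDACTIONS and the names redact and RedactingFilter are assumptions chosen for illustration. Redaction of this kind complements scanning the logs already stored in the platforms listed above and does not replace it.

```python
import logging
import re

# Illustrative patterns only: an AWS-style access key ID and key=value pairs
# whose key name suggests a credential.
REDACTIONS = [
    (re.compile(r"\bAKIA[0-9A-Z]{16}\b"), "[REDACTED]"),
    (re.compile(r"(?i)\b(password|token|api[_-]?key)=\S+"), r"\1=[REDACTED]"),
]


def redact(text):
    """Replace anything matching a redaction pattern with a placeholder."""
    for pattern, replacement in REDACTIONS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrite each record's message before a handler formats it."""

    def filter(self, record):
        # getMessage() merges msg and args; store the redacted result back.
        record.msg = redact(record.getMessage())
        record.args = None
        return True  # keep the record, now redacted


handler = logging.StreamHandler()
handler.addFilter(RedactingFilter())
logger = logging.getLogger("redaction-demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False
```

Attaching the filter to the handler rather than to a single logger means every record that passes through that handler is redacted, whichever logger produced it.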

How do I know if it’s a secret?

This is the hard part. Secrets are often long, complex, and random, even when you know they follow a pattern. For example, all Stripe secret keys begin with “sk_”. However, a prefix alone might not be enough to distinguish a key or secret from gibberish.

Three major types of detection methods are in use today:

Regular expressions - Also known as regex, regular expressions search for text that matches a pattern you define from characters you know are part of the string you’re looking for. But regex is bad at capturing variation across different types of services, e.g. AWS or GCP. It's also sorely insufficient for the unstructured formats common to actual secrets and is more appropriate for highly structured data like Social Security numbers or credit card numbers.
Entropy - Entropy measures how random or unpredictable a string of characters is (see Shannon entropy). Setting entropy thresholds helps estimate how likely it is that a string is a credential or secret, as opposed to any other piece of information.
Machine learning - In the context of secrets, machine learning refers to algorithms trained on features extracted from a broad set of API key patterns and their surrounding context in code. ML can determine whether a character string is a credential or secret based on the context of the finding, without relying on indicators like naming conventions, regexes, or entropy thresholds. With techniques like natural language processing (NLP) and deep learning, naming conventions don’t matter; only meaning does.

When a detector layers these methods, using the simpler ones to filter out obviously sensitive information and then applying a robust layer of properly trained and maintained AI, it can continue to learn and improve over time.
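The first two methods are simple enough to sketch in a few lines of Python. The example below layers them: known-format regexes run first, then a Shannon entropy threshold catches long tokens that match no known format. The two patterns, the 20-character minimum, and the 4.0 bits-per-character threshold are illustrative assumptions rather than recommended settings. The machine learning layer is not shown.

```python
import math
import re
from collections import Counter

# Illustrative patterns for two well-known key formats (not an exhaustive rule set).
KNOWN_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "stripe_secret_key": re.compile(r"\bsk_(?:live|test)_[0-9a-zA-Z]{16,}\b"),
}

# Candidate tokens for the entropy check: 20+ characters from a key-like alphabet.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s):
    """Shannon entropy of s, in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in Counter(s).values())


def classify(line, entropy_threshold=4.0):
    """Return (detector_name, matched_text) findings for one line of text."""
    findings = []
    # Layer 1: exact formats we already know about.
    for name, pattern in KNOWN_PATTERNS.items():
        findings += [(name, text) for text in pattern.findall(line)]
    already_found = {text for _, text in findings}
    # Layer 2: long tokens that look random, unless layer 1 already caught them.
    for token in TOKEN.findall(line):
        if token not in already_found and shannon_entropy(token) >= entropy_threshold:
            findings.append(("high_entropy", token))
    return findings
```

Entropy alone also flags harmless strings such as hashes and long identifiers, which is one reason these methods are combined with context-aware detection instead of being used in isolation.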

How do I scan for secrets?

Scanning for secrets and credentials can be done in multiple ways. Most tools use scanning rules that tell them what to look for, where, how often, and what constitutes a true positive. More advanced tools not only have AI-powered detection models but also support granular policy configuration, so you can adapt them to your infrastructure, data types, and compliance needs.

Thorough protection requires comprehensive scanning (scans of data at rest and in motion/in use), but it's okay to start with your biggest areas of risk based on your organization's unique business model, secret types, and most vulnerable locations.

At-rest scanning, which includes historical scans of:

Any potential repositories for secrets
All data at rest in SaaS applications
All data in cloud workspaces

Real-time scanning:

On each new code push or in the CI/CD process
When data is entered or changed in SaaS applications
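For the code-push case, real-time scanning can be approximated with a check that inspects only the lines a change adds. The Python sketch below parses a unified diff and applies a small rule set to the added lines. The RULES dictionary and all function names are hypothetical. check_staged_changes assumes it runs inside a Git working tree and relies on the standard `git diff --cached` command to obtain the staged changes.

```python
import re
import subprocess

# Illustrative rule set; a real deployment would load rules from policy.
RULES = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def added_lines(diff_text):
    """Yield (path, text) for every line that a unified diff adds."""
    path = None
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # "+++ b/src/app.py" names the new version of the file.
            path = line[4:].removeprefix("b/")
        elif line.startswith("+") and path is not None:
            yield path, line[1:]


def violations(diff_text):
    """Return (path, rule_name) pairs for added lines that match a rule."""
    found = []
    for path, text in added_lines(diff_text):
        for name, pattern in RULES.items():
            if pattern.search(text):
                found.append((path, name))
    return found


def check_staged_changes():
    """Return 1 (block the commit) if staged changes match a rule, else 0."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    found = violations(diff)
    for path, name in found:
        print(f"possible secret ({name}) in {path}")
    return 1 if found else 0
```

A Git pre-commit hook or a CI job could call check_staged_changes() and exit with its return value. Keeping the diff parsing in pure functions makes the rules easy to test without a repository.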

How do I mitigate potential security risks of exposed secrets?

Once you have identified a secret, you need a way to remediate it. You could route every alert from your secrets scanning tools to your security team, but the team is likely to become overwhelmed fairly quickly, given all the other security alerts it is tasked with investigating.

Self-remediation Makes Employees the First Line of Defense

Another approach is to invite users to self-remediate. This can help proactively reduce insider risk in three ways. First, it removes the data from a vulnerable location. Second, allowing users to fix their own mistakes improves their understanding of how to improve personal data hygiene. Over time, this reduces the risk of a breach across the entire company. Third, partnering with users improves their sense of collaboration and well-being as it relates to cybersecurity. Rather than feeling like there is a security team watching over their shoulders for mistakes, which can actually increase insider risk, collaboration fosters trust.

How can Nightfall help?

It's important to find the right partner who uses powerful scanning techniques and advanced detection models.

Nightfall supports all the methods described above in a single platform. It is the industry’s first cloud-native data loss prevention solution designed to discover, classify, and protect sensitive data in cloud environments. This includes all manner of secrets (API keys, encryption keys, and more) and credentials like passwords. Nightfall's best-in-class API & secrets detection capabilities can identify whether secrets are active to help you prioritize what to remediate.

Nightfall integrates via API to some of the most popular SaaS applications like Slack, GitHub, Jira, Confluence, Google Drive and more. Plus, with the Nightfall developer platform, Nightfall can be integrated anywhere with just a few lines of code. Reach out to us to learn how you can get started in just minutes.

What else can I do beyond Secret Scanning or secret management tools to protect secrets & credentials?

There are a number of security policies and best practices that can help protect your environments from the severe risks associated with compromised secrets:

Enforce MFA. One way to protect your secrets and credentials is to implement multi-factor authentication (MFA). With MFA in place, even if an attacker manages to steal a user's credentials, they will not be able to access the account unless they also have possession of the second factor, which could be a one-time password (OTP) generated by an authenticator app or hardware token.
Use a password manager. Another way to protect your secrets and credentials is to use a password manager. A password manager is a software application that helps users generate strong passwords and securely store them. When using a password manager, users only need to remember one master password—the password manager takes care of the rest.
Use a key management system. Store keys in a secret management service like AWS Secrets Manager or HashiCorp Vault rather than in code or configuration files. These services encrypt your keys and store them in a secure location.
Encrypt credentials & secrets. Encryption is another great way to protect your secrets and credentials. Encrypted data is transformed into ciphertext that can only be decrypted with the proper key, so even if someone obtains your data, they will not be able to read it unless they also have the key. Encryption is very effective, but it can be time-consuming and expensive to implement.
Restrict ACLs to least privilege. An access control list (ACL) is a list of permissions that determines who can access which resources, such as files and directories. Granting each identity only the access it needs protects sensitive data and limits the damage that can be done if a credential is compromised.
Employee training & coaching. It's also important to educate your employees about security best practices. Your employees are one of your organization's greatest assets—but they can also be one of your greatest vulnerabilities if they're not properly trained on how to safeguard sensitive information. Make sure your employees know how to identify phishing emails, spot social engineering attacks, and follow proper password hygiene practices.
Develop a password policy. Password policies are rules that dictate how passwords must be chosen and used, with requirements such as minimum length, maximum length, complexity, and expiration. They help ensure that passwords are strong, and they can also help prevent credential stuffing attacks, in which stolen passwords are used to gain access to multiple accounts.
Regular key rotation. Change keys regularly so that any exposed secret soon stops working.
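Key rotation in particular is easy to track with a small script. The Python sketch below lists the keys whose age has reached a rotation window. The inventory structure, the key names, and the 90-day default are assumptions for illustration only. In practice the creation dates would come from your secret management service.

```python
from datetime import datetime, timedelta, timezone


def keys_due_for_rotation(inventory, now, max_age=timedelta(days=90)):
    """Return the names of keys at least max_age old, oldest first.

    inventory maps a key name to its creation time (timezone-aware datetime).
    """
    due = [(created, name) for name, created in inventory.items()
           if now - created >= max_age]
    return [name for _, name in sorted(due)]


# Hypothetical inventory of key names and creation times.
inventory = {
    "ci-deploy": datetime(2024, 6, 1, tzinfo=timezone.utc),
    "billing-api": datetime(2024, 12, 1, tzinfo=timezone.utc),
    "legacy-export": datetime(2023, 1, 1, tzinfo=timezone.utc),
}
overdue = keys_due_for_rotation(inventory, now=datetime.now(timezone.utc))
```

Passing now explicitly keeps the function deterministic, which makes it straightforward to test and to run on a schedule.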

For a fuller list of policies and processes that can help keep your organization secure, check out Security Playbook for Remote-first Organizations, which you can read in its entirety for free online.
