GitHub DLP Remediation Guide
GitHub hosts code under git version control, which means that it preserves a full, searchable history of code changes. Sensitive data can proliferate in these code changes and is not always easily discoverable.
Credentials & secrets that are hard-coded in GitHub repositories pose risk if repos are leaked or accessed via social engineering attacks, as they can provide access to infrastructure, databases, and third-party APIs. Likewise, sensitive data like customer PII can end up in code repos. This can raise significant security, compliance, and brand risk.
Nightfall automatically detects sensitive data in GitHub repos in real-time across any code push event to GitHub. With Nightfall’s context-rich detection results, you’ll be able to prioritize remediation efforts based on the type and location of sensitive data. However, because git is designed to preserve a full historical trail of commits, removing data that has already been committed is non-trivial. As a result, we’ve outlined best practices for remediating violations below.
1. Rotate Your Credentials
In their data security guide, GitHub states the following: “Once you have pushed a commit to GitHub, you should consider any data it contains to be compromised. If you committed a password, change it! If you committed a key, generate a new one.” Secrets such as API keys and cryptographic keys should be considered high priority risks.
If you have pushed credentials & secrets to a repo, your first step should be to immediately revoke the compromised credentials and generate new ones.
After doing so, please remember to update your application code or config/environment variables accordingly to ensure things continue to work with your new credentials. If your credentials are used by other developers or deployed in your infrastructure, make sure they all get a new version of it.
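Rotation is easier when the application reads its credentials from the environment, because issuing a new key then only requires a configuration change. The following is a minimal sketch of that pattern, not a prescribed implementation; the variable name `PAYMENTS_API_KEY` and the helper `load_credential` are hypothetical names chosen for illustration.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a required credential is not present in the environment."""


def load_credential(name: str) -> str:
    """Return the credential stored in the environment variable `name`.

    Failing fast on a missing or empty value avoids silently running
    with a blank key after a rotation.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingCredentialError(f"environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # PAYMENTS_API_KEY is a hypothetical variable name; the placeholder
    # value below only exists so the sketch runs on its own.
    os.environ.setdefault("PAYMENTS_API_KEY", "placeholder-value")
    api_key = load_credential("PAYMENTS_API_KEY")
    # Never print the key itself; report only that it was loaded.
    print(f"loaded credential of length {len(api_key)}")
```

With this approach, the rotated key is updated in the deployment configuration or secrets store, and no source file needs to change.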
Most common services that issue API keys, such as Twilio, Stripe, and Twitter, have mechanisms to revoke a key and generate a new one, usually in the admin panel of their API consoles. If you have questions about how to rotate a specific service’s token, please reach out to Nightfall Support at email@example.com and we would be happy to look into it.
2. Remove Historical References
Simply because a GitHub repo is private doesn’t mean that sensitive data can or should be stored there safely. On GitHub, developers associate their personal GitHub accounts with their corporate organizations. This means that it can be easy for the lines to blur. You can view who can access your repos on GitHub.
Similarly, an organization’s repos can have collaborators from outside the organization. Check on this access by navigating to your GitHub organization and clicking the People tab. From here, click Outside Collaborators to review who can access your organization’s repos. Consider removing or modifying access.
You can customize access to each repository in your organization with granular permission levels, giving people access to the features and tasks they need. Likewise, you can set base permissions for the repositories that your organization owns.
GitHub repos should only be made public when strictly necessary. If a repo is intended to be open-sourced, it should undergo a thorough code review to confirm no sensitive information or trade secrets are revealed. If a repo is public that shouldn’t be, you can make it private.
Navigate to your GitHub repository and click Settings. In the “Danger Zone” section, click “Make private” if the repository is public. If the repo is no longer necessary, click “Delete this repository” to permanently remove it.
If the sensitive data can be successfully rotated per above, it may not be necessary to rewrite git history. However, it may be helpful to go the extra mile to keep your repositories clean and avoid future fire drills if the credentials are discovered at a later point and it is unclear if the secret has been correctly rotated or not.
If the sensitive data cannot be rotated, for example PII, then rewriting git history is necessary.
There are two recommended methods for rewriting git history: the BFG Repo-Cleaner tool, which we describe below, and git filter-branch, to which we’ve linked steps.
The BFG Repo-Cleaner is a tool that’s built and maintained by the open source community. It provides a faster, simpler alternative to git filter-branch for removing unwanted data. For example, to remove your file with sensitive data and leave your latest commit untouched, run:
$ bfg --delete-files YOUR-FILE-WITH-SENSITIVE-DATA
To replace all text listed in passwords.txt wherever it can be found in your repository’s history, run:
$ bfg --replace-text passwords.txt
After the sensitive data is removed, you must force push your changes to GitHub.
$ git push --force
See the BFG Repo-Cleaner‘s documentation for full usage and download instructions.
There are additional ways to modify git history, which you can review in git documentation for advanced & specific use cases: Git Tools – Rewriting History.
3. Review Access
Various third-party services provide access logs describing when secrets & credentials are used or called. Where possible, review these access logs to determine if any exposed secrets or credentials had been leveraged. If so, you may need to undergo a deeper investigation around a potential security incident.
For example, Zendesk provides the following Activity portal for their API. If you identified that a Zendesk API key was leaked, you could subsequently check this portal to determine if there is any anomalous usage activity beyond what you would normally expect.
4. Establish Your Workflow & Process
If you are collaborating on code with others in your organization or have limited permissions, it may not be feasible to remediate the compromised secrets on your own or complete the steps outlined above. In this case, it’s beneficial to set up a process by which you can log compromised secrets and notify the responsible party who will be able to act on them.
Project management and bug tracking tools, such as Jira and Linear, are good options for managing the process of remediating compromised credentials & secrets. Consider creating tickets that include the following information so the Assignee has sufficient context:
- Repository – e.g. nightfalldlp/sample
- Commit reference – e.g. d3cce9f
- File path – e.g. sample.py
- Branch – e.g. main
- Link to the specific line of code on GitHub – e.g. https://github.com/nightfalldlp/sample/blob/master/sample.py#L2
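The fields above can be collected into a small structure that renders a consistent ticket description. This is only a sketch under assumptions: the class `SecretFinding` and its method names are hypothetical, and it builds the link from the commit reference rather than a branch name so that the link keeps pointing at the same content as the branch moves on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretFinding:
    """One detected secret, with the context an assignee needs."""

    repository: str  # "owner/name", e.g. "nightfalldlp/sample"
    commit: str      # commit SHA (a full SHA is the safest choice)
    file_path: str   # path relative to the repository root
    branch: str
    line: int        # 1-based line number of the finding

    def permalink(self) -> str:
        # Built from the commit SHA, not the branch, so the link is stable.
        return (
            f"https://github.com/{self.repository}"
            f"/blob/{self.commit}/{self.file_path}#L{self.line}"
        )

    def ticket_body(self) -> str:
        # Plain-text body suitable for pasting into a tracker ticket.
        return "\n".join(
            [
                f"Repository: {self.repository}",
                f"Commit reference: {self.commit}",
                f"File path: {self.file_path}",
                f"Branch: {self.branch}",
                f"Link: {self.permalink()}",
            ]
        )


if __name__ == "__main__":
    finding = SecretFinding("nightfalldlp/sample", "d3cce9f", "sample.py", "main", 2)
    print(finding.ticket_body())
```

Keep the secret value itself out of the ticket; the link and line number are enough for the assignee to locate it.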
When deciding who to notify and what to say in your message, it’s important to first consider the following:
- Is the repo still actively used or maintained? If not, it may be best to advise the repo owner to archive or delete the repo.
- Who committed the compromised credentials? The author who committed the secrets into the codebase may have the ability to follow the remediation steps above.
- Who owns or manages the repo? Perhaps the original author of the commit that introduced the compromised credentials is no longer in your organization or has changed roles/responsibilities. In this case, it may be best to reach out to the repo’s owners or admins. You can see who has access to the repo by navigating to Settings then Manage Access.
5. Preventing Credential Exposure Going Forward
Now that you know what types of sensitive data tend to get exposed on GitHub and have begun to remediate these violations, you can be proactive about mitigating these risks moving forward via the following recommendations.
Use a visual program like GitHub Desktop or gitk to commit changes.
Visual programs generally make it easier to see exactly which files will be added, deleted, and modified with each commit.
- Avoid the catch-all commands git add . and git commit -a on the command line. Use git add filename and git rm filename to individually stage files instead.
- Use git add --interactive to individually review and stage changes within each file.
- Use git diff --cached to review the changes that you have staged for commit. This is the exact diff that git commit will produce as long as you don’t use the -a flag.
- Name sensitive files in .gitignore and .npmignore to avoid checking them into git. Learn more about gitignore.
- Use local environment or configuration variables so the application retrieves these variables dynamically instead of hard-coding them into files
- Centralize credentials with a secrets management (“secrets as a service”) solution.
- Services like Stripe, Twilio, SendGrid, Zendesk, etc. whose API keys you may be using may have different security controls you can take advantage of:
- Enable IP allowlisting if the service allows it.
- Restrict permissions associated with the key.
- Certain services like SendGrid and Stripe have granular permissions management across API keys, similar to IAM roles on AWS.
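Reviewing staged changes before each commit can be partly automated. The sketch below scans diff text, such as the output of git diff --cached, for a couple of well-known secret formats on added lines. It is an illustration only, not Nightfall's detection logic: the function name `find_secrets_in_diff` is hypothetical, and the two patterns are far from exhaustive.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; real detectors cover many more formats.
PATTERNS = {
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private key header": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
}


def find_secrets_in_diff(diff_text: str) -> List[Tuple[int, str]]:
    """Return (line number within the diff, pattern name) for each match."""
    findings = []
    for number, line in enumerate(diff_text.splitlines(), start=1):
        # Inspect added lines only, and skip "+++ b/file" header lines.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((number, name))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
    sample_diff = (
        "+++ b/config.py\n"
        '+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"\n'
        " unchanged line\n"
    )
    for number, name in find_secrets_in_diff(sample_diff):
        print(f"diff line {number}: possible {name}")
```

A check like this could run from a pre-commit hook and block the commit when it returns any findings. Pattern matching of this kind produces both false positives and misses, so it complements the practices above and does not replace them.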
Treat secrets equally. Protect dev/test secrets in addition to production secrets.
It’s easy to confuse dev/test secrets with production ones, and they can end up in the wrong environment – treat all credentials as if they are sensitive and high-risk to avoid this.
Tools like Slack and email make communication seamless, but that also means they can open up pathways for easily transmitting credentials & secrets and sprawling this information to more data silos.
Nightfall has native integrations to identify sensitive data in other applications such as Slack and Confluence, where sensitive data is commonly proliferated by developers.
As software development proliferates across every industry and use case, code security is more important than ever. Consider a formal training program or service to ensure developers understand the risk and best practices.
Nightfall provides multiple ways to scan for sensitive data in code repositories – in near real-time upon code commit to GitHub, historically across all code changes, upon Pull Request, and via a git-hook on the developer endpoint device. Reach out to our Support team to learn more about these options and determine what fits best in your SDLC.
Reach out to Nightfall Support at firstname.lastname@example.org.