How does sensitive information end up in observability platforms?

Observability (logs, traces, and metrics) is a core tenet of building strong software systems. Logs are used to debug issues and monitor system activity, traces provide valuable insight into system performance and architecture, and metrics allow engineering teams to closely track business metrics within their systems.

Observability stacks come as SaaS offerings such as Datadog or New Relic, or as self-hosted stacks such as ELK. Below we give some examples of how sensitive data can end up stored in these systems, either intentionally or unintentionally. Observability systems usually have fewer access controls than their associated application environments, are multi-tenant, and are often open to the internet. This exacerbates security issues when observability data contains sensitive information, because there is no fine-grained access control over who can see it. For example, Engineering and Product teams usually have full access to observability data such as logs and dashboards, which widens the attack surface for anyone trying to reach sensitive information. These vulnerabilities happen regularly, even at the world's biggest tech companies:

Facebook admits it stored ‘hundreds of millions’ of account passwords in plaintext - TechCrunch
You should change your Twitter password right now - TechCrunch
GitHub says bug exposed some plaintext passwords - ZDNet
Google admits to storing plaintext passwords - Tech Beacon

Many of these incidents went undetected for months or even years, as there is often no automated system monitoring for data leakage. Such leaks are commonly found during audits or stumbled upon in unrelated log searches. Instead of relying on audits or pure luck to find sensitive data in logs, companies can opt for a continuous scanning solution, such as Nightfall's Fluent Bit integration, to scan logs for sensitive information automatically in near real time.

Engineers might deliberately add sensitive data to their logs and traces, because it is useful! We generally collect that data for a business purpose, and sometimes we need to correlate it with a bug or some other symptom of how the system is performing. For example, one might log a credit card number that fails a verification check, either to debug it later or to follow up with the user. One might also tag traces with a user's social security number or name as a way to uniquely identify them across the system, tying their actions together to a single individual. This is not good stewardship of the data: it is now printed and stored everywhere the logs or traces are saved, leaving many places to remediate once it is found.
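When events genuinely need to be tied to one person or one card, a keyed hash of an identifier (a pseudonym) or a masked value usually serves the debugging purpose without copying the raw data into logs. The sketch below uses only the standard library; `PSEUDONYM_KEY`, `pseudonymize`, and `mask_card_number` are hypothetical names, and the hard-coded key is a placeholder for one loaded from a secret store.

```python
import hashlib
import hmac

# Hypothetical placeholder: in practice, load this key from your secret store.
PSEUDONYM_KEY = b"replace-with-a-secret-from-your-secret-store"


def pseudonymize(value: str) -> str:
    """Return a stable, keyed pseudonym so events can be correlated without the raw value."""
    digest = hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "usr_" + digest[:16]


def mask_card_number(card_number: str) -> str:
    """Keep only the last four digits of a card number, masking the rest."""
    digits = "".join(ch for ch in card_number if ch.isdigit())
    if len(digits) <= 4:
        # Too short to partially reveal; mask everything.
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + digits[-4:]
```

A log call could then look like `logging.error("card check failed: card=%s user=%s", mask_card_number(card), pseudonymize(ssn))`. The hash is keyed (HMAC) rather than a plain SHA-256 because identifiers such as SSNs have so few possible values that an unkeyed hash could be reversed by brute force.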

However, much more commonly, sensitive user data, passwords, and API keys make their way into logs inadvertently, or into traces and metrics as metadata.

Sensitive data leaks in practice

Our first example shows how we can accidentally leak usernames, passwords, or API keys in logs by simply logging requests or errors:

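```python
import logging
import os

import requests
from requests.exceptions import RequestException

thing_api_key = os.getenv("EXAMPLE_COM_API_KEY")

def get_example_thing(thing_id):
    params = {"api_key": thing_api_key}

    try:
        resp = requests.get("https://api.example.com/thing/" + str(thing_id), params=params)
        resp.raise_for_status()
    except RequestException as e:
        # str(e) can contain the full request URL, including the query string
        logging.error("failed to run request: %s", e)
```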

If this API call fails, we log the API key. Even though the key is stored securely in our secret store and injected into our environment, it is sent as a query parameter on the request, and the exception message includes the full URL. The result is a log like this:

ERROR:root:failed to run request: HTTPSConnectionPool(host='api.example.com', port=443): Max retries exceeded with url: /thing/1234?api_key=adda9d800f92b4b2 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10f2b9af0>: Failed to establish a new connection [Errno 8] nodename nor servername provided, or not known'))

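We can resolve this issue by logging only the status code and body of the response. A connection failure produces no response at all, so in that case we log just the exception type:

```diff
 try:
     resp = requests.get("https://api.example.com/thing/" + str(thing_id), params=params)
     resp.raise_for_status()
 except RequestException as e:
-    logging.error("failed to run request: %s", e)
+    if e.response is not None:
+        logging.error("failed to run request: code: %s, body: %s", e.response.status_code, e.response.text)
+    else:
+        # No response (e.g. DNS or connection failure): log only the exception type
+        logging.error("failed to run request: %s", type(e).__name__)
```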

This leaves only the pertinent information in the log:

ERROR:root:failed to run request: code: 404, body: {"error": "not_found", "code": 404}
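Sometimes the URL itself is useful context in a failure log. In that case, one option is to redact secret query parameters before logging it, for example on `e.request.url` when the exception carries the request. A minimal sketch using the standard library's `urllib.parse`; `SENSITIVE_PARAMS` and `redact_url` are hypothetical names, and the parameter list is an assumption to adapt to the APIs you call.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of query parameter names treated as secrets.
SENSITIVE_PARAMS = {"api_key", "apikey", "access_token", "token", "password", "secret"}


def redact_url(url: str) -> str:
    """Replace the values of sensitive query parameters with a placeholder."""
    parts = urlsplit(url)
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_PARAMS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `redact_url("https://api.example.com/thing/1234?api_key=adda9d800f92b4b2")` yields `https://api.example.com/thing/1234?api_key=REDACTED`. Note that this only helps when you log the URL yourself; it does not scrub URLs already embedded in a free-text exception message.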

To give a more concrete example, here is a toy usage of the HubSpot API in which we try to retrieve a contact from the CRM and log the exception in a similar way, along with the client config to aid in debugging:

```python
import logging
import os

from hubspot import HubSpot
from hubspot.crm.contacts.exceptions import ApiException

api_client = HubSpot(api_key=os.getenv("HUBSPOT_KEY"))

def get_hubspot_contact(contact_id):
    try:
        return api_client.crm.contacts.basic_api.get_by_id(contact_id)
    except ApiException as e:
        logging.error("Exception when calling get_hubspot_contact: %s, id: %s", e, contact_id)
        # Meant to aid debugging, but the config includes the API key
        logging.error("API config: %s", api_client.config)
        raise
```

However, it may not be clear to the engineer that the API client config includes the secret used for authentication, and so it ends up in our logs!

ERROR:root:Exception when calling create_token method: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 02 Feb 2022 20:03:03 GMT', 'Content-Type': 'application/json;charset=utf-8', 'Content-Length': '280', 'Connection': 'keep-alive', 'CF-Ray': '6d76054878b56444-SJC', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Vary': 'Accept-Encoding', 'CF-Cache-Status': 'DYNAMIC', 'Access-Control-Allow-Credentials': 'false', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'X-HubSpot-Correlation-Id': 'd4b37343-47e2-4300-ad52-c10c0a5a4aeb', 'X-Trace': '2B4BA45245C9C1D610AC6F6E0A7DCCBAICE36EFFC2000000000000000000', 'Report-To': '{"endpoints": [{"url": "https:\\/\\/a.nel.cloudflare.com\\/reportI\\/v3?s=&2BQiKhghnvSOM&2BrBt"}], "group": "cf-nel", "max_age":604800}', 'NEL': '{"success_fraction":0.01,"report_to": "cf-nel","max_age":604800}', 'Server': 'cloudflare', 'alt-svc': 'h3=":443"; ma=86400, h3-29=":443"; ma=86400'})
HTTP response body: {"status": "error", "message": "The API key provided is invalid. View or manage your API key here: https://app.hubspot.com/l/api-key/","correlationId":"d4b37343-47e2-4300-ad52-c10c0a5a4aeb","category":"INVALID_AUTHENTICATION","Links": {"api key": "https://app.hubspot.com/l/api-key/"}}
ERROR:root:API config: {'api_key': 'your_api_key', 'access_token': None, 'retry': None}
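As a defense-in-depth measure, a logging filter attached to a handler can scrub known secret values from every record before it is written, which would have caught the API config line above. A minimal sketch built on the standard `logging.Filter` class; `RedactSecretsFilter` is a hypothetical name, and it assumes the secret values are available to the process (for example, from the same environment variables the clients read).

```python
import logging


class RedactSecretsFilter(logging.Filter):
    """Replace exact occurrences of known secret values in log messages."""

    def __init__(self, secrets):
        super().__init__()
        # Skip unset or empty values so we never "redact" the empty string.
        self._secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # apply %-style args first
        for secret in self._secrets:
            message = message.replace(secret, "[REDACTED]")
        # Store the redacted text and drop args so handlers don't re-format the original.
        record.msg = message
        record.args = None
        return True  # keep the record, just redacted
```

Usage could look like `handler.addFilter(RedactSecretsFilter([os.getenv("HUBSPOT_KEY")]))` on each handler. Attaching the filter to handlers rather than a single logger means records propagated from child loggers are covered too. It only matches exact values and does not touch formatted tracebacks, so it complements, rather than replaces, careful logging.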

Secrets and user data can also leak this way when request or query parameters are attached to SQL or HTTP traces.
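For SQL spans, a common mitigation is to record the statement with its literal values replaced by placeholders. The regex-based helper below is a hypothetical, deliberately simple sketch (`obfuscate_sql` is our own name); many APM agents ship more thorough obfuscators, and parameterized queries, where you record the statement text but not the bound values, avoid the problem at the source.

```python
import re

# Single-quoted string literals (allowing '' escapes) or standalone numeric literals.
_SQL_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def obfuscate_sql(statement: str) -> str:
    """Replace literal values in a SQL statement with '?' before attaching it to a span."""
    return _SQL_LITERAL.sub("?", statement)
```

This sketch does not handle double-quoted strings, hex literals, or dialect-specific syntax; it only illustrates the idea of keeping values out of trace metadata.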

Here is another contrived example in code that showcases how user data can be unintentionally logged in a database call:

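```python
import logging

from sqlalchemy import select

class UserService:
    ...  # session setup elided; User and UserIntegrityException are defined elsewhere

    def update_user(self, user_id, new_user):
        users = self.session.scalars(select(User).where(User.id == user_id)).all()
        if len(users) > 1:
            # Logs every field of every matching User object
            logging.error("more than one user: %s", users)
            raise UserIntegrityException()
        ...
```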

In this example we have a User model and a service method, update_user, which a backend might call when a user tries to update their information. The method has an error path for an integrity violation: more than one user with the same ID. In that path we log an error message along with the matching User objects, to help us identify which user failed to update.

The resulting log line in our logging system could look something like this:

ERROR:root:more than one user: [User(id=1234, name='John Smith', address='1234 Green Road, California', social_security='123 45 6789', birthdate='1990-09-09', hashed_password='286755fad04869ca523320acce0dc6a4'), User(id=1234, name='Jane Smith', address='...

As we can see, the developer inadvertently logged entire User objects without realizing that some of their fields hold sensitive information that isn't needed to debug the underlying error in production: the hashed password and the user's name, address, social security number, and birthdate. The user ID alone would have been enough to locate these rows in the database, keeping the rest of the data in its appropriately secured location. Though contrived, this is a very common mistake, because logging libraries typically make it convenient to pass whole objects, like our User objects, and print them in a readable format, rather than forcing engineers to choose which fields to log. It is also realistic in a large engineering organization, where the developers or teams who define these models are often not the ones who later use them.

If this were to happen in production, our logging system would now hold user password hashes and PII unnecessarily, posing a security and compliance risk. For example, if we were a healthcare company, this could be considered a HIPAA violation.
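Besides logging only the user ID, one way to make the safe behavior the default is to leave sensitive fields out of the object's representation. A minimal sketch, assuming a plain `dataclass` stand-in for the `User` model (with an ORM model you would typically write `__repr__` yourself or use the ORM's own dataclass integration): fields declared with `field(repr=False)` are omitted from `repr()`, and `%s` formatting of these objects, or of a list of them, goes through `repr()`.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    # repr=False keeps these fields out of repr(), and therefore out of "%s" log output.
    name: str = field(repr=False)
    address: str = field(repr=False)
    social_security: str = field(repr=False)
    birthdate: str = field(repr=False)
    hashed_password: str = field(repr=False)
```

With this definition, the error log from the example would print `[User(id=1234), User(id=1234)]`, while the fields remain available to application code.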

The same problem can occur when attaching objects or their fields as metadata to distributed traces or metrics.
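For traces and metrics, an allowlist of attribute keys fails safe: a new field such as an SSN is dropped unless someone explicitly approves it. A minimal sketch; `ALLOWED_ATTRIBUTES` and `safe_attributes` are hypothetical names, and the keys shown are only examples.

```python
from typing import Any, Dict, Mapping

# Hypothetical allowlist: only these keys may be attached to spans or metrics.
ALLOWED_ATTRIBUTES = frozenset({"user.id", "http.method", "http.status_code", "db.operation"})


def safe_attributes(attributes: Mapping[str, Any]) -> Dict[str, Any]:
    """Drop any attribute whose key is not explicitly allowlisted."""
    return {key: value for key, value in attributes.items() if key in ALLOWED_ATTRIBUTES}
```

With the OpenTelemetry Python API, for instance, the result could be passed to `span.set_attributes(...)`. An allowlist is chosen over a denylist here because a denylist silently lets through any sensitive field nobody thought to list.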

In reality, there are many best practices that developers can be coached on to minimize the risk to the company. However, none of them are foolproof, and it's important for security teams to have fallback mechanisms in place to ensure this does not become a problem for their organizations. This is a perfect use case for Nightfall's Developer Platform, which can help security teams identify when logs contain sensitive information and remediate any violations that occur. Nightfall also offers a Fluent Bit filter and a Cribl pack to help give you visibility into what's flowing through your observability platforms.
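To illustrate the kind of check such a fallback performs, here is a deliberately naive sketch that reads log lines and flags likely credit card numbers (a regex plus the Luhn checksum) and strings shaped like US SSNs. It is not Nightfall's detection engine or API; the patterns and function names are our own assumptions, and they will produce both false positives and misses.

```python
import re
from typing import Iterable, Iterator, Tuple

# Naive, illustrative detectors (our own assumptions, not any vendor's rules).
# 13-19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# US SSN shape: 3-2-4 digits separated by hyphens or spaces.
SSN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum used by card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, finding_type) for log lines that look like they hold sensitive data."""
    for lineno, line in enumerate(lines, start=1):
        for match in CARD_CANDIDATE.finditer(line):
            digits = re.sub(r"\D", "", match.group())
            if luhn_valid(digits):
                yield lineno, "credit_card"
        if SSN.search(line):
            yield lineno, "us_ssn"
```

Usage might look like `with open("app.log") as f: findings = list(scan_lines(f))`, where the file name is hypothetical. The Luhn check cuts down on matches for arbitrary long numbers, but values such as millisecond timestamps can still pass it by chance.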

Nightfall offers an API platform that lets developers add data protection to their own products and workflows, with more zero-code solutions coming in the future! Sign up here to get started with our detection engine API, no credit card needed.
