How it all began
Isaac and I have been friends since childhood, and we came up together in the tech industry. We would catch up on our adventures as professionals working in Silicon Valley: him in the VC world, and me as an engineer at Uber Eats. We’re both very interested in entrepreneurship, so we would always come back to discussing business ideas, including a topic that had come to intrigue us both: the challenges enterprises face with cloud data security.
My foray into the space began at Uber, where I observed that the combination of massive scale and rapid business change created significant challenges in managing and protecting sensitive data in cloud platforms. Similarly, while investing in security companies at Venrock, Isaac had become aware that CISOs were struggling to manage data security risk, especially with privacy compliance regulations ramping up across the globe.
Thus began our “listening tour”, during which we identified gaps in the market and ways we could leverage our applied machine learning (ML) and distributed systems backgrounds to create a next-generation cloud data security platform.
Where are we going?
Nightfall.ai started in 2018 as the first cloud-native data loss prevention (DLP) platform. It’s 2021 now, and our vision is shared by our many customers, who rely on Nightfall’s SaaS offering to detect and secure sensitive data within their organizations. We develop powerful ML-based data classification algorithms for various classes of sensitive data (e.g., PII, credentials, images, toxic comments, and many other types of information) and build native integrations between our platform and products like Slack, GitHub, Jira, Confluence, and Google Drive, with many more integrations in development for future release!
We have also built a Developer Platform with APIs and SDKs that allow our customers to embed data classification into their own services or protect applications that we don’t natively integrate with. Customers can thereby directly leverage the same backend data extraction and classification engine that powers Nightfall’s own native product integrations.
We know we have more to do as we strive to meet our goals and deliver on Nightfall’s value proposition. Here are just a few of the ideas we’re working on, based on our roadmap vision, feedback from customers, and other inputs:
- Deliver high-confidence, accurate classification for all data types and sources (text, images, audio, video) using cutting-edge machine learning techniques across natural language processing (NLP) and computer vision (CV)
- Scale our stack and platform to onboard all classes of cloud/SaaS app vendors (scanning petabytes of data per day in real time)
- Build ML infrastructure to support an array of different model architectures at scale, like long short-term memory networks (LSTMs), transformers, conditional random fields (CRFs), and more
- Rapidly expand native integrations to cover all cloud/SaaS apps used by our customers
- Create multi-region and multi-cloud setups that are highly available (HA) and fault-tolerant
- Craft rich user experiences with compelling insights and intuitive flows across very large data sets
In our journey so far, we have continuously evolved and scaled our technology stack. On the path ahead, there are many more lessons to learn and challenges to tackle as we work to realize our mission: reimagining data security and compliance through a cloud-native platform.
Why do we want to blog?
As we started our journey at Nightfall, we had to make the decisions that everyone tackles at the beginning: which languages to use for our services (Go, Python), containers and container orchestration, messaging infrastructure (Kafka), stateful workflow orchestration (temporal.io), distributed data stores (CockroachDB), infrastructure tooling (Terraform), and so on. All of these were (relatively) easy choices, as they are widely used tools and frameworks with large communities of users supporting each other. However, contrary to our expectations, we encountered interesting challenges with our chosen tech stack. As we come up with new ways to solve them, it is important to us to ease the path for others by sharing our learnings, solutions, and approaches through our engineering blog posts.
We will also periodically blog about the high-level architecture of the new components we are building to make data classification scale across a wide range of SaaS and PaaS services. Our intent is to openly share what we have achieved, along with the shortcomings we’ve encountered and addressed. We hope that others can use these shared learnings to build more scalable and reliable systems.