We've Been Solving the Wrong Problem: Why ContextOS Exists

November 17, 2025
Thomas Hatch, CEO

I've spent over 20 years building datacenter infrastructures. From the smallest startups to the largest corporations; from banks to non-profits; and, yes, even to chicken farms. SaltStack is used to build and manage the infrastructure for 20% of the Fortune 500. We've literally configured thousands of infrastructures across every industry you can imagine.

For years, I had this saying: "If you've seen one infrastructure, you've seen one infrastructure." I said it to acknowledge that everyone's needs seemed different, that every company required a bespoke solution tailored to their unique snowflake requirements.

I was deeply, profoundly wrong.

The 98% Realization

Here's what building thousands of infrastructures actually taught me: 98% of infrastructures have nearly identical goals.

Every single one needs compute, networking, and storage. Each needs these resources to be flexible and on-demand. Each needs security baked in. And each needs to deliver applications on top of this fabric. The implementation details might differ, but the fundamental requirements are virtually identical across banking, healthcare, manufacturing, tech, retail—you name it.

This realization was my light bulb moment. We've been treating infrastructure complexity as an unavoidable tax when it's actually a solved problem hiding in plain sight.

The Current State: We Made It Worse

Let's be direct about what happened. Kubernetes promised to solve our orchestration problems. Instead, it created 10 new ones.

Configuration drift has become endemic in K8s environments—the unintentional deviation of configurations from their desired states leads to operational issues, security vulnerabilities, and application failures. YAML manifests have become a maintenance nightmare, with developers spending valuable time debugging syntax errors rather than building features.

Kubernetes manifests are error-prone: typos, mismatched API versions, and incomplete resource definitions are common pitfalls, and they make clusters far harder to manage as workloads grow. The complexity builds gradually and invisibly, as emergency hotfixes get applied only to production while other environments drift out of sync.
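To make drift concrete, here is a minimal Python sketch that compares a desired configuration against what is actually running and reports every difference. It is an illustration only, not how Kubernetes or ContextOS detects drift. The function name find_drift, the configuration keys, and the image tags are all hypothetical.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any], prefix: str = "") -> list[str]:
    """Return one human-readable line per difference between desired and actual."""
    drift: list[str] = []
    for key in sorted(set(desired) | set(actual)):
        path = f"{prefix}.{key}" if prefix else key
        if key not in actual:
            drift.append(f"{path}: missing (expected {desired[key]!r})")
        elif key not in desired:
            drift.append(f"{path}: unexpected (found {actual[key]!r})")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Recurse into nested sections so the report names the exact field.
            drift.extend(find_drift(desired[key], actual[key], path))
        elif desired[key] != actual[key]:
            drift.append(f"{path}: expected {desired[key]!r}, found {actual[key]!r}")
    return drift


# Hypothetical example: a hotfix was applied to production only.
desired = {"image": "web:1.4.2", "replicas": 3, "resources": {"memory": "512Mi"}}
production = {"image": "web:1.4.2-hotfix", "replicas": 3, "resources": {"memory": "512Mi"}}
staging = dict(desired)

for line in find_drift(desired, production):
    print(line)
```

In this made-up scenario, staging still matches the desired state while production reports a changed image, which is exactly the kind of silent divergence a production-only hotfix leaves behind.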

Then we added service meshes to fix the networking problems Kubernetes introduced. Running a mesh proved hard in practice: sidecars added resource overhead and operational complexity ballooned. Many platform teams now explicitly avoid service meshes because they slow down developer workflows or introduce too much operational overhead. The complexity tax shows up as thousands of lines of policy YAML that live in separate repositories and require specialized knowledge to author, debug, and maintain.

The cloud platforms? Financial inefficiency runs rampant, with 30% of cloud spend often wasted on unused or underutilized resources. Organizations experience roughly nine complete application rebuilds annually, at an average of more than $200,000 per year in recovery labor alone.

The Real Problem: Developer Productivity

Here's what really matters: your developers shouldn't need to understand service meshes to ship code.

Microsoft Research recently published a comprehensive study on developer productivity that validates what we've all been experiencing. Their findings are damning: developers are spending excessive time on DevOps tasks rather than actual feature development.

Think about that. We've built this incredibly complex stack—Kubernetes, Helm, Kustomize, Istio, ArgoCD, Prometheus, Grafana, and dozens of other tools—supposedly to make development easier. Instead, we've created a world where your best engineers need to become experts in distributed systems just to deploy a simple web application.

Platform engineers face the nightmare of managing an explosion of configuration files, leading to slow deployments, frequent errors, and a steep learning curve. This isn't progress. It's just expensive complexity masquerading as innovation.

The Paradigm Shift: Locality, Scheduling, and Security

At ContextOS, we made a fundamental choice: instead of looking at infrastructure as compute, network, and storage, we reframed the problem around Locality, Scheduling, and Security.

This isn't just semantics. It's a complete reconceptualization of what infrastructure should provide.

Locality isn't about which server something runs on—it's about understanding context, proximity, and resource relationships in a way that makes sense for your application, not for the underlying hardware.

Scheduling isn't about pod placement algorithms—it's about dynamically allocating resources based on actual need, with the system understanding both current state and future requirements.

Security isn't bolted on through network policies and service meshes—it's fundamental to how contexts communicate, with zero trust built into the fabric of the system.

When you change this perspective, everything else falls into place. You're no longer managing infrastructure—you're describing what you want to accomplish, and the system figures out how to deliver it securely, reliably, and efficiently.

What ContextOS Actually Does

ContextOS delivers flexible and scalable compute, storage, and networking by creating something that looks like a distributed operating system—but crucially, you don't need to interact with it like a distributed OS. Your interfaces and tools still run on classic operating systems that just scale seamlessly across multiple hardware servers.

Unlike Kubernetes or DBOS, ContextOS lets you run full OS application code and systems on whatever OS you want. Unlike cloud platforms, ContextOS abstracts away the operational details entirely:

  • Storage allocation and sharing? Seamless.
  • Networking and firewalling? Completely automatic.
  • Naming and addressing? Completely abstracted.
  • Development environments and DevOps stages? Completely built in.

This abstraction creates the model of Locality, Scheduling, and Security. And it all starts with a simple but powerful construct we call CTX/ICC.

What's Next

In this series, I'm going to walk you through exactly how ContextOS works and why it's fundamentally different from anything else in the market:

  • Next post: The CTX/ICC Construct—the core innovation that makes everything else possible
  • Post 3: The VFS and how distributed state really works
  • Post 4: The DSM and Software Drivers—deployment done right
  • Post 5: Putting it all together

We've built this system on the foundation of 20+ years of infrastructure experience and the hard lessons learned from building Salt at scale. We know what works, we know what doesn't, and we've built ContextOS to solve the actual problem: making infrastructure disappear so your developers can focus on building products.

The 98% of infrastructure that has identical needs deserves a solution that acknowledges this reality. ContextOS is that solution. Sign up for our beta to keep up with our latest news and to try ContextOS in your business.

Tom Hatch is the CEO, CTO and Co-founder of ContextOS. Previously, he created SaltStack, used by 20% of Fortune 500 companies.