Why We Open-Sourced Our Schema Diffing Engine

Last week we published vxdiff — our internal schema diffing engine — under Apache 2.0. It’s 12,000 lines, three years old, and powers every migration we ship internally.

We’re going to be honest about why.

What it does

vxdiff reads two schemas in a supported format, such as Avro, Protobuf, or our own internal IDL, and produces a structured diff: added fields, removed fields, type narrowing, default changes. It then tells you whether the diff is forward-compatible, backward-compatible, or breaking, with a citation to the rule it matched.

Most schema tools tell you what changed. vxdiff tells you whether you can ship it.
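
To make that concrete, here is a minimal sketch of the idea in Python. It is not vxdiff's actual API or rule set: the Field and Change types, the rule IDs, and the promotion table are all hypothetical, and the compatibility rules are simplified from Avro-style reader/writer resolution. "Backward" here means a reader on the new schema can decode data written with the old one; "forward" is the reverse.

```python
from dataclasses import dataclass

_MISSING = object()  # sentinel meaning "this field declares no default"


@dataclass(frozen=True)
class Field:
    # Hypothetical, flattened field model; real schemas are nested.
    type: str
    default: object = _MISSING

    @property
    def has_default(self) -> bool:
        return self.default is not _MISSING


@dataclass(frozen=True)
class Change:
    kind: str       # e.g. "added", "removed", "type-widened"
    field: str
    backward: bool  # new readers can decode old data
    forward: bool   # old readers can decode new data
    rule: str       # hypothetical rule ID cited for the verdict


# Assumed lossless promotions, (narrow, wide), in the style of Avro.
PROMOTIONS = {("int", "long"), ("float", "double")}


def diff_schemas(old: dict, new: dict) -> list:
    """Return one Change per difference between two {name: Field} schemas."""
    changes = []
    for name in sorted(old.keys() | new.keys()):
        o, n = old.get(name), new.get(name)
        if o is None:
            # New readers need a default to fill in for old data.
            changes.append(Change("added", name, n.has_default, True,
                                  "R1: added field needs a default"))
        elif n is None:
            # Old readers need a default once writers stop sending the field.
            changes.append(Change("removed", name, True, o.has_default,
                                  "R2: removed field needs a default"))
        else:
            if o.type != n.type:
                widened = (o.type, n.type) in PROMOTIONS
                narrowed = (n.type, o.type) in PROMOTIONS
                kind = ("type-widened" if widened
                        else "type-narrowed" if narrowed
                        else "type-changed")
                changes.append(Change(kind, name, widened, narrowed,
                                      "R3: readers may only widen a type"))
            if o.default != n.default:
                # Defaults only apply when the writer lacks the field.
                changes.append(Change("default-changed", name, True, True,
                                      "R4: default change is decode-safe"))
    return changes


def verdict(changes: list) -> str:
    """Collapse per-field results into a single ship/no-ship answer."""
    backward = all(c.backward for c in changes)
    forward = all(c.forward for c in changes)
    if backward and forward:
        return "compatible"
    if backward:
        return "backward-compatible"
    if forward:
        return "forward-compatible"
    return "breaking"


old = {"id": Field("int"), "email": Field("string"), "age": Field("long")}
new = {"id": Field("long"), "email": Field("string"),
       "nickname": Field("string", default="")}

changes = diff_schemas(old, new)
for c in changes:
    print(f"{c.kind:<13} {c.field:<9} {c.rule}")
print(verdict(changes))  # backward-compatible
```

In this example, dropping "age" (which had no default) and widening "id" each rule out forward compatibility, so the overall verdict is backward-compatible only. The point of the sketch is the shape of the output: every change carries its own verdict and the rule that produced it, so the final answer can be traced back to a specific field.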

Why now

Three reasons:

1. It’s not a moat. Schema diffing is plumbing. The interesting part of Vaultix is what we do with the schema after diffing it.
2. We need contributors. We support five schema formats. Customers ask for six more. We can’t keep up internally, and the obvious answer is “let the people who use those formats maintain them.”
3. Hiring. Engineers who care about schema correctness are exactly the engineers we want to talk to. A repo is a better recruiting tool than a careers page.

What we’re not open-sourcing

The runtime. The router. The storage. Everything that’s actually a moat.

This isn’t a strategy shift. It’s vxdiff-shaped. If we open-source something else later, it’ll be for vxdiff-shaped reasons.
