From Cron Jobs to Streams: Migrating Off Legacy ETL

Cron is the duct tape of data infrastructure. Eventually it stops holding. Here's how we moved a 12-year-old pipeline to streams without a rewrite.

The customer’s setup, on day one of the audit:

  • 340 cron jobs, orchestrated by a 4,000-line bash script.
  • Half ran nightly, a third overlapped on the same tables.
  • Two of them silently corrupted data when run in the wrong order.

It worked, in the same way a Jenga tower works.

“If your crontab needs a comment to explain what comes after */5 *, the system is already telling you something.”

The strangler approach

We didn’t rewrite. We wrapped. Each cron job got a thin event emitter on completion: when the job finished writing to its target table, it emitted a table-X updated event to a stream.
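A minimal sketch of that wrapper, assuming an append-only event log as the stream (the post doesn’t name the backend, so the emitter here is an in-memory stand-in for a Kafka or Redis Streams publish; all names are illustrative):

```python
import subprocess
import time

# In-memory stand-in for the event stream. In production this would be
# a publish call to Kafka, Redis Streams, or similar.
EVENT_LOG = []

def emit(event: dict) -> None:
    """Append an event to the stream (placeholder for a real publish)."""
    EVENT_LOG.append({**event, "ts": time.time()})

def run_wrapped_job(command: list, table: str) -> int:
    """Run the legacy job unchanged; emit an event only on success."""
    result = subprocess.run(command)
    if result.returncode == 0:
        emit({"type": "table-updated", "table": table})
    return result.returncode

# The crontab entry swaps the job for this wrapper -- the job itself
# is untouched, which is the whole point of the strangler approach.
run_wrapped_job(["true"], table="orders")  # `true` stands in for the real job
```

The key property: the legacy job’s code never changes. The wrapper only observes success and announces it.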

Downstream consumers got two paths to the same data:

  1. Legacy path — read the target table. No code change for the consumer.
  2. New path — subscribe to the event stream. Get push semantics, retries, and replay for free.

Over six months, every consumer migrated to the stream at their own pace. Zero downtime, zero data migration, zero “big bang” weekend.
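Why the new path gets replay for free: if the stream is an append-only log and each consumer tracks its own offset, a consumer that goes down can rejoin and catch up from where it left off. A sketch under those assumptions (names are illustrative, not from the post):

```python
# A toy append-only event log, as the wrapper above would produce.
EVENTS = [
    {"offset": 0, "table": "orders"},
    {"offset": 1, "table": "users"},
    {"offset": 2, "table": "orders"},
]

class StreamConsumer:
    """New path: read from a stored offset, so restarts replay missed events."""
    def __init__(self, start_offset: int = 0):
        self.offset = start_offset

    def poll(self, log: list) -> list:
        new = [e for e in log if e["offset"] >= self.offset]
        if new:
            self.offset = new[-1]["offset"] + 1
        return new

# Consumer restarts after having processed only offset 0; it resumes
# from its saved offset and sees exactly the events it missed --
# no full-table re-scan, unlike the legacy path.
consumer = StreamConsumer(start_offset=1)
missed = consumer.poll(EVENTS)
```

The legacy path, by contrast, has no notion of “what changed since I last looked” beyond re-reading the table, which is exactly the polling cost the migration removes.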

The trap

Don’t let “we’ll migrate later” become permanent. We set a hard sunset date for the legacy paths from week one. Two consumers asked for extensions:

  • One legitimate — compliance review needed an extra month. Granted.
  • One political — a team didn’t want to schedule the work. Refused.

The team that was refused the extension had moved by month three. The team we extended took eight months. Lesson: deadlines move work; extensions just move deadlines.

What’s still legacy

Three jobs. They’re orchestration glue between two SaaS tools and have no real-time consumers. Cron is the right tool for them.

Migrating everything is a vanity project — migrate what’s painful.
