As stated in the post: these steps make complete sense if you're aiming for a nice, smooth zero-downtime transition from the old data model to the new.
However, if you can get away with some downtime, you can instead notify your users of a planned outage, shut down, update everything, and start up again. This eliminates several steps of the process, which has the added benefit of reducing the number of places where something can go wrong.
To put this in context: say you guarantee 99.9% uptime - that's still 43.2 min / month of downtime available to you! You can update quite a lot of machines / database rows in even 10-20 min, making this viable in many cases. (Obviously there is a scale threshold past which this is no longer viable, but if you're at that threshold you can probably also spare the engineering effort to go the zero-downtime route!)
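The arithmetic behind that number is straightforward (a minimal sketch, assuming a 30-day month; the function name is mine):

```python
# Rough downtime budget implied by an uptime SLA, assuming a 30-day month.
def downtime_budget_minutes(uptime_fraction, days=30):
    """Minutes of allowed downtime per period for a given uptime guarantee."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1 - uptime_fraction)

print(downtime_budget_minutes(0.999))   # 99.9% uptime: ~43.2 minutes/month
print(downtime_budget_minutes(0.9999))  # 99.99% uptime: ~4.32 minutes/month
```

Note how quickly the budget shrinks with each extra nine: at 99.99% a planned 10-20 minute outage already blows the budget, which is roughly where the zero-downtime machinery starts paying for itself.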
One upside to writing an intermediate version that supports both the old and new representations: your data migrations are neither an urgent race against the clock nor mission-critical to the release.
You can decouple migrations from the release, and rest easy knowing your app is working without praying that a massive migration happens successfully and promptly.
Also, since your UI or admin interface likely lets you inspect your data in either representation, you can more naturally verify that up/down migrations work in staging.
> This eliminates several steps of the process, which has the added benefit of reducing the number of places where something can go wrong.
It actually also removes some advantages, which the author fails to point out: with these small steps you can roll back to the previous version at any time without losing writes. If you find a production bug that does not impact data consistency, you can go back one version without losing the writes that happened while the new version was up.
> To put this in context: say you guarantee 99.9% uptime - that's still 43.2 min / month of downtime available to you!
On the other hand, at some point there might be a fatal bug and you'll need three hours of leeway to get up and running again. Shutting down is surely the easier method, but the available downtime budget is usually not free real estate for updates.
To be clear: data model migrations should still be tested and scripted to the extent possible; there should still be backups in place; there should still be a way to roll back application code and data schema. Just because a team / organization decides not to do zero-downtime updates - which I'm arguing is a completely reasonable choice in a lot of cases - doesn't mean they should abandon other software engineering practices.
Looking through the comments on this post, I see a lot of conflation between "not doing zero-downtime deployments" and "seat-of-your-pants engineering", as though fully orchestrated / containerized blue-green zero-downtime deployments were the One True Way for all teams / organizations / projects, and completely inseparable from other parts of good engineering practice.
There are also some DB updates that, even if strictly speaking non-backwards-compatible, run in just a matter of seconds. Additionally, in postgres you can set statement_timeout, which aborts any query that takes too long.
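For instance, Postgres lets you cap statement runtime at the session or role level (a sketch; the five-second value and the `migrator` role name are arbitrary choices of mine):

```sql
-- Abort any statement in this session that runs longer than 5 seconds.
SET statement_timeout = '5s';

-- Or set it persistently for the role that runs migrations.
ALTER ROLE migrator SET statement_timeout = '5s';
```

This turns a migration that would otherwise hold a long lock into a fast, visible failure you can retry off-peak.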
If you have a reasonable deployment strategy, the chance that someone talks to your application at the exact moment the app is in some invalid state might be quite small.