
hand-rolling backups like in the 90s

Toronto, 2018.01.12

I've once again found myself hand-rolling a backup solution because a large organization proved incapable of deploying a sensible one. I don't know why large companies fail at this particular thing, but it comes up time and again: they acquire expensive, highly capable storage and virtualization infrastructure, then dither and debate (and defer to audit!) and wind up not using it.

I've seen disuse of equipment carried to mind-blowing lengths. For instance, I watched a Japanese investment bank deploy a vast array of server equipment from which to launch remotely-run applications: people would run applications from images on the servers rather than from their own desktops. Those local servers sat completely idle, however, while the users ran the applications from servers in a distant city.

But back to the backup solution. I eventually realized that, due to the highly sensitive nature of the data on the servers we were using, no one wanted to do the legwork of determining where the backups could be safely stored. It's not an uncommon problem.

But it's also one with a solution: the virtualization software handles this out of the box, with replication and encrypted backups. In this environment, though, we'd learned that nothing of the sort had been deployed despite several attempts. We'd also discovered (the hard way) that even vMotion – the VMware feature that lets client virtual machines move from server to server – wasn't properly configured in the resource pool, a set of six physical servers. We lost some hardware, and things got ugly. Then the client had a data center go down in a hurricane, with a lengthy recovery that saw long service outages.

Dithering or no, backups started looking like a better and better idea.

So I ordered a VM outside the production resource pool and deployed GPG. After creating a key pair on the newly anointed "backup" server, I started exporting databases, copying configuration elements, encrypting the data at rest, and collecting all of it in an out-of-the-way spot on the production servers. Shell scripts running under cron on the "backup" server then archived the encrypted copies to that server, on a cycle that also cleaned up the staged files.
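
The scripts themselves were nothing exotic. Here's a minimal sketch of the approach, assuming a Postgres database; the hostnames, paths, database name, and key address below are illustrative stand-ins, not the actual production details:

  #!/bin/sh
  # nightly backup staging -- runs on each production server.
  # everything named here is a hypothetical stand-in for the
  # real environment.

  STAGING=/var/local/backup-staging    # out-of-the-way spot on the prod box
  RECIPIENT=backup@example.internal    # public key generated on the backup VM
  STAMP=$(date +%Y%m%d)

  mkdir -p "$STAGING"

  # export the database and encrypt it in one pass; only the backup
  # server's private key can decrypt the result, so nothing sensitive
  # sits on disk in the clear
  pg_dump proddb | gpg --encrypt --trust-model always \
      --recipient "$RECIPIENT" > "$STAGING/proddb-$STAMP.sql.gpg"

  # archive configuration elements the same way
  tar czf - /etc/myapp | gpg --encrypt --trust-model always \
      --recipient "$RECIPIENT" > "$STAGING/config-$STAMP.tar.gz.gpg"

  # the removal cycle: drop staged archives after a week, by which
  # point the backup server has already pulled them
  find "$STAGING" -name '*.gpg' -mtime +7 -delete

On the "backup" server, a cron entry along these lines pulls the staged archives in (again, names are illustrative):

  # crontab on the backup server
  30 2 * * * scp prod1:/var/local/backup-staging/*.gpg /srv/archive/prod1/

Because everything is encrypted to the backup server's public key, the production servers never hold anything that can decrypt the archives, and the archives themselves are safe wherever they land.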

With this little trick – barely a morning's work – I was ticking all kinds of boxes: enumerating the assets on the servers; cleaning up the production environment; encrypting sensitive data at rest; ensuring that the service could be rebuilt should the fragile hosting infrastructure pack it in; not distributing sensitive data or software beyond its intended production "home"; avoiding expense; avoiding bureaucracy.

All of this would have been much more manageable with a centrally managed scheduling system like BeyondCron, so that's my next target.
